25.09 PSR upgrade failure

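// This template is for reference only; modify it if it does not fit

Problem description

[Describe the problem you encountered in detail here]
The PSR upgrade fails on version 25.09

Error log from the failed upgrade: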

2026-01-31 15:05:54.604223 firmware_mgmt NOTICE: task_instance.lua(677): upgrade FirmwareMode Single
2026-01-31 15:05:54.705074 firmware_mgmt NOTICE: info_mgmt.lua(240): info_key(1_HWSR_1763126026) upgrade set_stage: COMMON_PREPARE → PROCESS, 15
2026-01-31 15:05:54.805321 firmware_mgmt NOTICE: task_mgmt.lua(418): Update task[Id: 1763126026, StartTime: 2026-01-31T15:05:52+08:00, Progress: 15, State: Starting] successfully
2026-01-31 15:05:54.805752 firmware_mgmt NOTICE: task_instance.lua(325): sys_id=1, fw_type=HWSR, filename=/dev/shm/upgrade/1763126026/Firmware1
2026-01-31 15:05:54.811101 firmware_mgmt NOTICE: hpm_package.lua(484): get obj table: 0x70fc5e781fa0 for Id=17_127
2026-01-31 15:05:54.812927 general_hardware NOTICE: upgrade_subject.lua(109): [on_upgrade_process] start upgrade HWSR
2026-01-31 15:05:54.813260 general_hardware NOTICE: sr_upg_service.lua(151): [SRUpgrade] process SR upgrade, firmware type: HWSR
2026-01-31 15:05:54.816741 general_hardware NOTICE: upgrade_subject.lua(116): [on_upgrade_process] end upgrade HWSR
2026-01-31 15:05:54.918297 firmware_mgmt NOTICE: task_mgmt.lua(418): Update task[Id: 1763126026, StartTime: 2026-01-31T15:05:52+08:00, Progress: 31, State: Running] successfully
2026-01-31 15:05:54.921599 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001040302003182.bin
2026-01-31 15:05:54.923725 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001050302057045.bin
2026-01-31 15:05:54.925671 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001100302023955.bin
2026-01-31 15:05:54.927667 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001030302023954.bin
2026-01-31 15:05:54.929749 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001020302068053.bin
2026-01-31 15:05:54.931815 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001010302044492.bin
2026-01-31 15:05:54.933946 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001040302023953.bin
2026-01-31 15:05:54.936299 firmware_mgmt ERROR: control.lua(192): Upgrade HWSR process failed, ret=-1
2026-01-31 15:05:54.936776 firmware_mgmt NOTICE: info_mgmt.lua(240): info_key(1_HWSR_1763126026) upgrade set_stage: PROCESS → COMMON_FINISH, 95
2026-01-31 15:05:55.008668 firmware_mgmt NOTICE: task_instance.lua(266): wait_msg_result stage=COMMON_FINISH, timeout=7200S, loop=1
2026-01-31 15:05:55.009104 firmware_mgmt NOTICE: info_mgmt.lua(350): info_key(1_HWSR_1763126026) upgrade failed, set_stage: COMMON_FINISH → COMPLETED
2026-01-31 15:05:55.039368 firmware_mgmt NOTICE: task_mgmt.lua(418): Update task[Id: 1763126026, StartTime: 2026-01-31T15:05:52+08:00, Progress: 95, State: Running] successfully
2026-01-31 15:05:55.131800 firmware_mgmt NOTICE: task_mgmt.lua(418): Update task[Id: 1763126026, StartTime: 2026-01-31T15:05:52+08:00, Progress: 95, State: Exception] successfully
2026-01-31 15:05:55.132229 firmware_mgmt NOTICE: task_instance.lua(580): Upgrade 1_HWSR_1763126026 completely, pre_version=
2026-01-31 15:05:55.132767 firmware_mgmt NOTICE: task_instance.lua(586): firmware(info key:1_HWSR_1763126026) do upgrade ret:-1, pre_version:
2026-01-31 15:05:55.194440 firmware_mgmt NOTICE: utils.lua(172): Does not exists the same Id obj
2026-01-31 15:05:55.445019 firmware_mgmt NOTICE: tasks_scheduling.lua(126): upgrade queue is empty, exit the tasks processer
2026-01-31 15:05:55.446333 firmware_mgmt NOTICE: tasks_scheduling.lua(138): stop tasks processer
2026-01-31 15:05:55.448537 firmware_mgmt NOTICE: active_fructl.lua(95): get host type is Singlehost
2026-01-31 15:05:55.449715 firmware_mgmt NOTICE: active_single_host_fructrl.lua(60): active_single_host_fructrl fructrl get power status
2026-01-31 15:05:55.452843 firmware_mgmt NOTICE: state_simple_upgrading.lua(101): simple upgraded, current active mode is:nil, wait restart seconds:360000
2026-01-31 15:05:55.457665 firmware_mgmt NOTICE: init.lua(40): update status to FS_IDLE.
2026-01-31 15:05:55.459747 firmware_mgmt NOTICE: init.lua(79): Upgrading_Flag is false

Log from an environment where the upgrade succeeded:
2000-01-28 11:50:59.074784 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001020302068053.bin
2000-01-28 11:50:59.076233 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001050302057045.bin
2000-01-28 11:50:59.077628 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001030302023954.bin
2000-01-28 11:50:59.080508 general_hardware NOTICE: sr_upgrade.lua(277): [SRUpgrade] upgrade start with version 1.18, uid: 000000010402315HFB
2000-01-28 11:50:59.179214 general_hardware NOTICE: sr_upgrade.lua(130): [write_protect_proc] get write protect == 0
2000-01-28 11:50:59.179574 general_hardware NOTICE: sr_upgrade.lua(295): [SRUpgrade]EEPROM write protection turn off
2000-01-28 11:51:00.341493 general_hardware NOTICE: sr_upgrade.lua(236): [SRUpgrade] write header completed, offset: 0, length: 128
2000-01-28 11:51:00.557065 general_hardware NOTICE: sr_upgrade.lua(242): [SRUpgrade] read back header and check it successfully
2000-01-28 11:51:05.446149 general_hardware NOTICE: sr_upgrade.lua(265): [SRUpgrade] write data completed, offset: 3200, length: 4960
2000-01-28 11:51:06.209596 general_hardware NOTICE: sr_upgrade.lua(271): [SRUpgrade] read back data and check it successfully
2000-01-28 11:51:06.210830 general_hardware NOTICE: sr_upgrade.lua(300): [SRUpgrade] write file to eeprom successfully, totally retry 1 times
2000-01-28 11:51:06.306747 general_hardware NOTICE: sr_upgrade.lua(130): [write_protect_proc] get write protect == 1
2000-01-28 11:51:06.307087 general_hardware NOTICE: sr_upgrade.lua(307): [SRUpgrade]EEPROM write protection turn on
2000-01-28 11:51:06.307451 general_hardware NOTICE: sr_upgrade.lua(313): [SRUpgrade] upgrade completed, uid: 000000010402315HFB
2000-01-28 11:51:06.312392 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001010302044492.bin
2000-01-28 11:51:06.313810 general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001100302023955.bin

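Environment information

  • Operating system: Ubuntu 24.04

  • Software version: OpenUBMC2509

  • Hardware configuration: [e.g. CPU, memory]

Steps to reproduce

  1. [Step 1]

  2. [Step 2]

  3. [Step 3]

Expected result

[What you wanted to achieve]

Actual result

[What actually happened, including error messages]

Solutions already tried

[List the solutions you have already tried and their results]

The answer below is AI-generated and for reference only: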

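PSR upgrade failure: analysis and solutions

Problem overview

When upgrading the PSR (Platform Service Record) firmware on OpenUBMC2509, the upgrade fails. The key error log line is:

general_hardware ERROR: sr_upg_service.lua(139): [SRUpgrade] upgrade path can not get file, file_name:00000001040302003182.bin

The same error repeats for several files and is followed by Upgrade HWSR process failed, ret=-1, after which the upgrade task ends in the Exception state.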
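To make the failure sequence easier to follow, here is a minimal sketch that pulls the stage transitions and the failure record out of a log excerpt. The regular expressions are derived only from the message formats visible in the log above; the function name `trace_upgrade` is hypothetical and not part of openUBMC.

```python
import re

# Patterns derived from the message formats in the log excerpt above.
STAGE_RE = re.compile(r"set_stage: (\w+) → (\w+)")
FAIL_RE = re.compile(r"Upgrade (\w+) process failed, ret=(-?\d+)")


def trace_upgrade(log_text):
    """Return (stage transitions, failure records) in log order."""
    stages = STAGE_RE.findall(log_text)
    failures = [(fw, int(ret)) for fw, ret in FAIL_RE.findall(log_text)]
    return stages, failures


# Four lines copied from the failed-upgrade log.
sample = """\
2026-01-31 15:05:54.705074 firmware_mgmt NOTICE: info_mgmt.lua(240): info_key(1_HWSR_1763126026) upgrade set_stage: COMMON_PREPARE → PROCESS, 15
2026-01-31 15:05:54.936299 firmware_mgmt ERROR: control.lua(192): Upgrade HWSR process failed, ret=-1
2026-01-31 15:05:54.936776 firmware_mgmt NOTICE: info_mgmt.lua(240): info_key(1_HWSR_1763126026) upgrade set_stage: PROCESS → COMMON_FINISH, 95
2026-01-31 15:05:55.009104 firmware_mgmt NOTICE: info_mgmt.lua(350): info_key(1_HWSR_1763126026) upgrade failed, set_stage: COMMON_FINISH → COMPLETED
"""

stages, failures = trace_upgrade(sample)
for src, dst in stages:
    print(f"{src} -> {dst}")
print("failures:", failures)
```

Run against the full log, this shows that the task still walks through every stage to COMPLETED even though the PROCESS stage returned -1.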


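Problem localization

:magnifying_glass_tilted_left: Key log analysis

  • Error location: sr_upg_service.lua(139) reports that a file required for the upgrade cannot be obtained.
  • Error type: upgrade path can not get file indicates that the system failed while trying to read a .bin upgrade data file.
  • Upgrade stage: the error occurs in the PROCESS stage (set_stage: COMMON_PREPARE → PROCESS), that is, during data preparation before any actual write.
  • Comparison with the successful log:
    • The successful log also contains the same "can not get file" errors, yet the upgrade still goes on to execute:
      [SRUpgrade] upgrade start with version 1.18
      [SRUpgrade] write header completed
      [SRUpgrade] write data completed
      [SRUpgrade] read back data and check it successfully
      [SRUpgrade] write file to eeprom successfully
      
    • This suggests the system tolerates some missing files and skips them. In the failing log, by contrast, no upgrade start line appears for any UID and the flow is aborted.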
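The comparison between the two logs can be made concrete with a small sketch. It extracts the file names reported as missing and the UIDs for which a write actually started, then shows which names are missing only in the failed run. The file names and the UID are copied from the logs above; the helper names (`missing_files`, `started_uids`, `as_log`) are hypothetical, and the sample text is reduced to the message portion of each line for brevity.

```python
import re

MISSING_RE = re.compile(r"can not get file, file_name:(\w+)\.bin")
START_RE = re.compile(r"upgrade start with version ([\d.]+), uid: (\w+)")


def missing_files(log_text):
    """Names (without .bin) that sr_upg_service reported as not found."""
    return set(MISSING_RE.findall(log_text))


def started_uids(log_text):
    """UIDs for which an SR write actually started."""
    return [uid for _version, uid in START_RE.findall(log_text)]


def as_log(names):
    """Rebuild the message portion of the error lines for the given names."""
    line = "[SRUpgrade] upgrade path can not get file, file_name:{}.bin"
    return "\n".join(line.format(n) for n in names)


failed_log = as_log([
    "00000001040302003182", "00000001050302057045", "00000001100302023955",
    "00000001030302023954", "00000001020302068053", "00000001010302044492",
    "00000001040302023953",
])
success_log = as_log([
    "00000001020302068053", "00000001050302057045", "00000001030302023954",
    "00000001010302044492", "00000001100302023955",
]) + "\n[SRUpgrade] upgrade start with version 1.18, uid: 000000010402315HFB"

only_in_failed = sorted(missing_files(failed_log) - missing_files(success_log))
print("missing only in the failed run:", only_in_failed)
print("UIDs started (failed run):", started_uids(failed_log))
print("UIDs started (successful run):", started_uids(success_log))
```

The names that are missing only in the failed run, and the UID that did start in the successful run, are the natural places to look when checking whether the package content matches the board. This is an observation from the excerpts, not a confirmed root cause.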

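:puzzle_piece: Nature of the problem

Core cause: after the SR files (HWSR type) in the firmware package are unpacked, the required .bin data files are not generated or cannot be located, so the sr_upg_service.lua script fails when it looks up the upgrade file path.

Possible reasons:

  1. The HPM package was not built correctly

    • When packaging with BMC Studio, not all binary data files (.bin) were included.
    • Or the packaging path is misconfigured, so the files cannot be found at the expected path at runtime.
  2. The PSR firmware content is incomplete or corrupted

    • The .sr file is missing the firmware segments for some components.
    • The .bin files were not embedded into the HPM package correctly.
  3. OpenUBMC2509 version compatibility

    • Version 25.09 tightened the integrity checks on upgrade files.
    • A "missing file" warning that older versions ignored may have been promoted to a hard failure in the new version.
  4. File system or memory permission problems

    • Files in the temporary upgrade directory /dev/shm/upgrade/... were not extracted correctly, or access to them is restricted.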

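Suggested solutions

:white_check_mark: 1. Check and repackage the HPM file

  • Rebuild the HPM package with BMC Studio:
    • Make sure all .bin files have been imported into the project.
    • Check that the file names declared in filelist.cfg or the packaging configuration match the files that actually exist.
    • Reference: BMC Studio User Guide
  • Verify that the packaged HPM contains the required binary resources:
    hpmtool list PSR_package.hpm
    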

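:white_check_mark: 2. Verify that the PSR configuration is complete

  • Compare against the actual hardware topology and confirm that the PSR file defines every module that needs upgrading (Riser, backplane, and so on).
  • If a module is physically absent (for example, only one CPU installed in a multi-CPU configuration), remove the corresponding upgrade item from the SR so that no empty path is accessed.

:warning: Reference: the community has reported anomalies caused by the community SR file defaulting to a dual-CPU configuration while the actual hardware has a single CPU (see the Kunpeng module adaptation constraints FAQ).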

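:white_check_mark: 3. Clean the environment before upgrading

# Clean the temporary upgrade directory
rm -rf /dev/shm/upgrade/*

# Check that shm has enough free space
df -h /dev/shm

Insufficient space can cause file writes to fail or be truncated.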
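If the space check needs to be scripted rather than read off `df` output, a minimal sketch using Python's standard `shutil.disk_usage` could look like this. The 2x safety margin and the function names are illustrative assumptions, not values taken from openUBMC.

```python
import os
import shutil


def free_mib(path):
    """Free space, in MiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_room(path, package_bytes, factor=2.0):
    """True if free space is at least `factor` times the package size.

    The default factor of 2 is an arbitrary margin chosen for illustration.
    """
    return shutil.disk_usage(path).free >= package_bytes * factor


# Only query /dev/shm where it exists (it is Linux-specific).
if os.path.isdir("/dev/shm"):
    print(f"/dev/shm free: {free_mib('/dev/shm'):.1f} MiB")
```

For example, `has_room("/dev/shm", os.path.getsize("/tmp/PSR.hpm"))` would compare the free space against the package path used in the command-line example in this answer.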

:white_check_mark: 4. Review the complete log chain

Use the "one-click collection" feature to obtain the full log package, then check:

  • /AppDump/sensor/app.log
  • /AppDump/hwdiscovery/connectors.txt
  • /logDump/general_hardware.log

Search for these keywords:

  • sr_upg_service
  • upgrade_subject
  • HWSR
  • FileNotFoundError
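The keyword search above can be sketched as a small script that reports which of the listed keywords each log line contains. The keyword list is taken from this answer; `scan_lines` and `scan_file` are hypothetical helper names, and the file path passed to `scan_file` would be wherever the collected log package was unpacked.

```python
KEYWORDS = ("sr_upg_service", "upgrade_subject", "HWSR", "FileNotFoundError")


def scan_lines(lines, keywords=KEYWORDS):
    """Yield (line_number, matched_keywords, line) for lines with any keyword."""
    for number, line in enumerate(lines, start=1):
        hits = [k for k in keywords if k in line]
        if hits:
            yield number, hits, line.rstrip("\n")


def scan_file(path, keywords=KEYWORDS):
    """Same as scan_lines, reading from a log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        yield from scan_lines(handle, keywords)


# Two lines copied from the failed-upgrade log above.
sample = [
    "2026-01-31 15:05:54.813260 general_hardware NOTICE: sr_upg_service.lua(151): [SRUpgrade] process SR upgrade, firmware type: HWSR\n",
    "2026-01-31 15:05:55.457665 firmware_mgmt NOTICE: init.lua(40): update status to FS_IDLE.\n",
]

results = list(scan_lines(sample))
for number, hits, line in results:
    print(number, hits, line)
```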

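:white_check_mark: 5. Try a command-line upgrade for more detailed output

# Run the upgrade with the ipmcset command (can be combined with debug output)
ipmcset -t firmwareupdate -d updatespt -v /tmp/PSR.hpm

Alternatively, trigger the upgrade manually through the Redfish API, which makes it easier to capture the exact return code.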


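Known related cases

| Issue | Similarity | Reference |
|---|---|---|
| SR upgrade failure on a self-developed RISER | Same log structure; reports check header failed and ultimately fails | Document chunk #3 |
| Firmware upgrade mechanism and common problems | Explains HPM parsing, signature verification, path matching, and related mechanisms | Search result #9 |
| Kunpeng module adaptation FAQ | Stresses that the SR configuration must match the hardware, otherwise functions misbehave | Search result #6 |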

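Temporary workarounds (for test environments)

If the packaging problem cannot be fixed right away, you can try:

  1. Upgrade only the required components: split the HPM package into independent modules and upgrade them one by one.
  2. Use a verified golden-sample SR file: obtain a certified PSR configuration template from the official source and modify it.
  3. Roll back to a stable version and test: confirm whether the restriction was introduced by OpenUBMC2509.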

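Summary

| Item | Details |
|---|---|
| Fault type | Firmware upgrade failure (PSR/HWSR) |
| Root cause | A required .bin file is missing from the upgrade path, or path resolution fails |
| System | OpenUBMC2509 + Ubuntu 24.04 |
| Key log | sr_upg_service.lua(139): upgrade path can not get file |
| Solution priority | 1. Repackage the HPM; 2. Verify SR integrity; 3. Check the BMC Studio configuration |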

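:pushpin: Recommended action: rebuild the firmware package with the correct prototype.py script and manifest.yml configuration, and make sure every firmware segment is included (see the build project documentation).

With more context or the complete log package, the problem can be narrowed down further.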

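Related links:

  1. Request for help: upgrade fails on real hardware with version 25.09
  2. After a manual web upgrade of the CPU module CPLD, the BMC shows the wrong version; the CPLD version is correct only after restarting the BMC
  3. SR upgrade failure on a self-developed RISER
  4. After upgrading with the 25.09 community version, the web page reports success but the upgrade actually failed
  5. Build project | Documentation Center | openUBMC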

The UIDs do not match.