Fan speed stays too high after OS power-off

BMC version: based on 2509

Symptom:
After the OS is powered off, the fan speed stays at 80% and the noise is excessive.

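Investigation:
With the OS powered on and the fans at their normal speed (30%), we run the power-off operation. About two seconds later, the temperature of one PCIe card triggers abnormal speed regulation. That regulation policy is configured as follows: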


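Judging from the symptom, after power-off the scanner detects the power state first and sets its status to disable, which triggers abnormal speed regulation and drives the fan speed up. Only afterwards does the cooling component determine that the power state is off and stop regulating, so the fans stay at the high speed.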

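Does this generalize to the following case? When the OS is powered off, some PCIe cards may lose power first, so their temperature reads fail and abnormal speed regulation is triggered. The cooling component then stops regulating once power-off completes, so the fans stay at high speed indefinitely, which produces a lot of noise and raises the power consumption of the whole machine.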

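Possible fix
Add a power-state monitoring thread to the thermal_mgmt component: when the state is off, check the current fan speed and set it to 30% if it is anything else.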

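That said, the problem should in principle be solved in the cooling component. We would like the community to help judge whether this behavior is abnormal and to suggest a fix.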

The following answer was generated by AI and is for reference only:

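Analysis and solutions: fan speed too high after OS power-off

Based on the details you provided, the following is an analysis of, and recommendations for, the problem that **the fan speed stays at 80% after OS power-off**.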


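Symptom summary

  • BMC version: based on 2509
  • Observed behavior
    • With the OS powered on, the fan speed is normal (about 30%);
    • After the OS power-off operation, the fan speed rises to 80% within about two seconds and stays there;
    • Noise increases noticeably, which hurts the user experience;
    • After power-off completes, the fans do not return to the expected low speed or stop.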

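Root cause analysis

1. How abnormal speed regulation is triggered

The configuration contains a CoolingRequirement policy for the temperature of a PCIe card (for example, a network card):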

{
  "RequirementID": "52",
  "MonitoringStatus": "<=/Scanner_IBTemp.Status",
  "MonitoringValue": "<=/Scanner_IBTemp.Value",
  "FailedValue": 80,
  "TargetTemperatureCelsius": 80,
  "MaxAllowedTemperatureCelsius": 100
}
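  • FailedValue: 80 means that when the MonitoringStatus of this temperature sensor becomes abnormal (for example, it changes to disable, becomes invalid, or a read fails), "abnormal speed regulation" is triggered and the fans are forced to run at 80%.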
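
As a rough illustration of the FailedValue behavior described above, here is a minimal Python sketch. It is not the thermal_mgmt implementation: the `CoolingRequirement` class, the `abnormal_speed_request` function, and the rule that any status other than "normal" counts as abnormal are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CoolingRequirement:
    """Hypothetical, trimmed-down view of one CoolingRequirement entry."""
    requirement_id: str
    failed_value: int        # fan duty (%) requested when the status is abnormal
    target_temperature: int  # degrees Celsius


def abnormal_speed_request(req, monitoring_status):
    """Return the fail-safe duty (%) if the status is abnormal, else None."""
    # Assumption: every status other than "normal" is treated as abnormal.
    if monitoring_status != "normal":
        return req.failed_value
    return None


req_52 = CoolingRequirement("52", failed_value=80, target_temperature=80)
```

With this model, a status of "disable" on requirement 52 yields a request for 80%, while a "normal" status yields no abnormal request at all.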

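2. Timing race during power-on and power-off

Taken together with the runtime flow, the problem is essentially a race condition:

| Time | Event |
|---|---|
| t=0 | OS power-off starts |
| t≈1s | A PCIe device, or its power supply, loses power before the mainboard → the temperature sensor Scanner_IBTemp disconnects or its status becomes invalid → Status becomes abnormal |
| t≈1.1s | The thermal management module detects the abnormal MonitoringStatus → regulation to FailedValue = 80% is triggered |
| t≈2s | The cooling component finally recognizes the system power state as OFF, but the high speed is already in effect and there is no fallback mechanism → the fans stay at 80% |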

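Core problem
Abnormal speed regulation (abnormal cooling) takes effect before the power-state check, and the abnormal policy that was already triggered is not cleared after power-off.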
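
The ordering dependence in this hypothesis can be shown with a small toy model. This is only a sketch of the race as described in this answer, not code from the cooling component; `LatchedFanController`, `run`, and the event names are hypothetical.

```python
class LatchedFanController:
    """Toy model: an abnormal request raises the duty and is never cleared."""

    def __init__(self, normal_duty=30):
        self.duty = normal_duty
        self.power_on = True

    def on_sensor_abnormal(self, failed_value):
        # Only acted on while the controller still believes power is on.
        if self.power_on:
            self.duty = max(self.duty, failed_value)

    def on_power_off(self):
        # Regulation stops here, but the duty already applied is not reset.
        self.power_on = False


def run(events, failed_value=80):
    """Replay a list of event names and return the final fan duty (%)."""
    ctrl = LatchedFanController()
    for name in events:
        if name == "sensor_abnormal":
            ctrl.on_sensor_abnormal(failed_value)
        elif name == "power_off":
            ctrl.on_power_off()
    return ctrl.duty
```

If the sensor goes abnormal before the power-off is observed, the model ends at 80%; if the power-off is observed first, it stays at 30%. The same two events in a different order give a different steady state, which is what makes this a race.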

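3. Extended risk

As you point out, this risk is general:

  • Any sensor that relies on a FailedValue policy (especially peripherals such as PCIe expansion cards, OCP network cards, and GPUs) may drop off before the main system during power-off, because the power sequencing is not uniform;
  • The sensor status becomes abnormal → abnormal speed regulation is triggered → the fans speed up;
  • The cooling component later determines that power is off, but abnormal speed regulation is "event-driven" rather than "state-synchronized", so there is no active clearing mechanism;
  • Result: the fans keep running at an unintended high speed.

Consequences:

  • Noise;
  • Higher power consumption for the whole machine;
  • Possibly shorter fan life;
  • In severe cases, a violation of power management conventions (for example, a powered-off system should be as quiet as possible).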

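Supporting background (from the contextual knowledge base)

What FailedValue does

FailedValue is a fail-safe mechanism: when a temperature sensor fails, its value is abnormal, or its monitoring status is abnormal, it sets a default fan speed so that the system still gets basic cooling.

  • The default is typically 80% to 100% (depending on the configuration);
  • It may be triggered whenever MonitoringStatus is not normal;
  • Once triggered, it stays in effect until a higher-priority policy overrides it or it is explicitly cleared.

ContBin_H20L5: used for detecting abnormal fan speed

  • contbin_H20L5 is a high-stability debounce mechanism (20 consecutive highs, 5 consecutive lows) for stable detection of important faults;
  • It is applied in key scenarios such as "abnormal fan speed" and "power supply presence monitoring";
  • It indicates that such state changes must be strictly confirmed, to prevent misjudgment.

The current problem, however, is a delayed response after a state misjudgment, not a matter of debounce precision.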
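
For readers unfamiliar with consecutive-count debouncing, the following Python sketch shows one plausible reading of "20 consecutive highs, 5 consecutive lows". The exact semantics of contbin_H20L5 are not given in this thread, so the `ConsecutiveDebounce` class and its behavior (switch to high after 20 consecutive high samples, back to low after 5 consecutive low samples) are assumptions.

```python
class ConsecutiveDebounce:
    """Debounce a boolean signal using consecutive-sample counts."""

    def __init__(self, high_count=20, low_count=5):
        self.high_count = high_count  # samples needed to switch low -> high
        self.low_count = low_count    # samples needed to switch high -> low
        self.state = False
        self._run = 0  # consecutive samples that disagree with the state

    def update(self, sample):
        """Feed one raw sample and return the debounced state."""
        if sample == self.state:
            # Any agreeing sample breaks the run of disagreeing ones.
            self._run = 0
            return self.state
        self._run += 1
        needed = self.high_count if sample else self.low_count
        if self._run >= needed:
            self.state = sample
            self._run = 0
        return self.state
```

Under this reading, a short glitch (fewer than 20 high samples in a row) never changes the output, which is why such a filter helps with momentary disconnects but does nothing for a sensor that stays abnormal for the rest of the power-off sequence.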


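Is this abnormal? Conclusion: yes, this is abnormal behavior

Although the intent behind FailedValue is sound, keeping the fans at high speed while the power is off clearly does not match the expected logic.

Basis for the judgment:

| Dimension | Analysis |
|---|---|
| Functional requirement | After power-off the system has no load and needs no active cooling → the fans should slow down or stop |
| Power budget | Power consumption should be minimized after power-off → a sustained 80% fan speed violates that principle |
| User experience | Excessive noise affects the machine room environment |
| Safety | It preserves cooling headroom, but that has no practical value when the system is powered off |

Conclusion
This is a typical functional defect caused by a mismatch between control-policy priority and execution timing, and it should be treated as an anomaly that needs fixing.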


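Suggested solutions

Rather than only adding a patch-style monitoring thread to the thermal_mgmt component, the fix should be made at the cooling component level.

Recommended option 1 (best): introduce a power-state priority policy in the cooling component

Approach:

The power state is the highest-priority status signal. When the system enters the "OFF" state, every speed regulation policy driven by a sensor anomaly (including FailedValue) should be forcibly suppressed or cleared.

Implementation: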

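-- Pseudocode sketch: PowerState and the helper functions below are hypothetical
if PowerState == "OFF" then
    -- Ignore every speed request caused by a sensor anomaly
    -- and clear the regulation state that FailedValue triggered
    clear_all_abnormal_cooling_requests()

    -- Apply a fixed low speed or 0%
    set_fan_speed(get_shutdown_fan_speed()) -- e.g. 30% or 0%
else
    -- Handle temperature and abnormal speed regulation as usual
end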

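Advantages:

  • Resolves the race at its root;
  • Keeps the logic in the cooling module, with clear responsibilities;
  • Reusable for similar scenarios later (such as standby mode or maintenance mode).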
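
The priority rule in option 1 can also be written as a small, testable function. The sketch below is in Python and is only illustrative: `arbitrate_fan_duty`, its parameters, and the 30% shutdown duty are assumptions, not part of the cooling component.

```python
def arbitrate_fan_duty(power_state, normal_duty, abnormal_requests, shutdown_duty=30):
    """Pick the fan duty (%); a powered-off system overrides abnormal requests.

    abnormal_requests maps a requirement ID to its fail-safe duty (%).
    """
    if power_state == "OFF":
        # Drop latched fail-safe requests so they cannot resurface later.
        abnormal_requests.clear()
        return shutdown_duty
    # Powered on: the highest demand wins.
    return max([normal_duty, *abnormal_requests.values()])
```

Because the power state is checked first, the result no longer depends on whether the sensor anomaly or the power-off notification arrives first.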

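Recommended option 2 (engineering mitigation): make FailedValue depend on the power state

Modify the activation condition of the CoolingRequirement so that it only applies while the power is on: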

{
  "RequirementID": "52",
  "MonitoringStatus": "<=/Scanner_IBTemp.Status",
  "MonitoringValue": "<=/Scanner_IBTemp.Value",
  "FailedValue": 80,
  "TargetTemperatureCelsius": 80,
  "MaxAllowedTemperatureCelsius": 100,
  "EnableCondition": "<=/SystemPower.State |> expr($1 == 'ON')"
}

If the current SR configuration does not support EnableCondition, consider adding support for it, since it matters for all peripheral monitoring.


Option 3: add a thermal_mgmt watchdog thread (temporary patch)

If the core cooling logic cannot be changed in the short term, your suggestion can serve as a stopgap:

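# Pseudocode sketch: get_system_power_state, get_fan_speed, set_fan_speed and log
# are hypothetical helpers supplied by the platform
import time

SHUTDOWN_THRESHOLD = 35  # percent; anything above this is treated as too fast
SHUTDOWN_SPEED = 30      # percent; or 0

while True:
    time.sleep(1)
    if get_system_power_state() == "OFF":
        current_speed = get_fan_speed()
        if current_speed > SHUTDOWN_THRESHOLD:
            set_fan_speed(SHUTDOWN_SPEED)
            log("Forced fan down after power-off due to potential race condition.")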

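Notes:

  • Make sure this thread runs after the cooling component has finished its state check;
  • Log the action so the problem can be traced;
  • Treat it as a temporary measure only; the long-term fix still belongs in the cooling layer.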

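Preventive suggestions: improving system robustness

| Measure | Description |
|---|---|
| Unified power sequencing | Coordinate the power sequencing of PCIe cards and the mainboard in the CPLD/FPGA so that sensors do not fail prematurely |
| More suitable debounce for peripheral sensors | For example, use a mechanism such as contbin_H20L5 so that a momentary disconnect is not mistaken for a permanent fault |
| More log tracing | Record the cause of every abnormal speed regulation (for example, Sensor X Status Changed → AbnormalCooling Activated) to ease diagnosis |
| SNMP/BMC API query for the dominant speed source | Helps operators quickly determine why the fans are running fast |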

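Summary

| Item | Content |
|---|---|
| Nature of the problem | Abnormal; needs fixing |
| Root cause | FailedValue is triggered before the power-state check, and there is no clearing mechanism |
| Best fix | In the cooling component, treat the power state as the highest priority and actively mask or clear abnormal speed regulation |
| Temporary measure | Add a thermal_mgmt watchdog thread to force the speed down (stopgap only) |
| Long-term suggestion | Add an EnableCondition bound to the power state to every CoolingRequirement |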

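Community suggestions
Consider raising this issue in the openUBMC community to drive the following improvements:

  1. Support the EnableCondition field in CoolingRequirement by default;
  2. Define a standard "power-off fan behavior policy";
  3. Add a log marker for the "sensor abnormal but system already powered off" scenario.

This would keep the same class of problem from recurring in other projects.

If you have more logs, or the handling logic from cooling_device.lua, feel free to share them for further analysis.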

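  1. Check whether the fans support standby speed regulation. If they do, speed regulation remains available after power-off, and once abnormal speed regulation is triggered the abnormal speed is applied;
  2. If standby speed regulation is not supported, the fans do not run after power-off;
  3. At present every speed regulation policy is computed, and the fan speed applied, in the thermal_mgmt component.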

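We have not configured ActiveInStandby, so in this case it should default to false.

Do you mean that the following problem does not exist at present:

The check for power-off on a speed regulation policy change is all implemented in the thermal_mgmt component.

Thanks, understood.

If the problem is resolved, please close out the issue.

As for the problem I described and the proposed change, I plan to have them reviewed at next Monday's SIG meeting to confirm whether the problem exists and whether the change is reasonable. I will close the issue once we reach a conclusion.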

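Hello. From what was said in the meeting, I now understand what ActiveInStandby does on a CoolingRequirement object: it only applies to MonitoringValue, and the logic is roughly as follows:

if powerstatus == off and ActiveInStandby == false then
    -- skip speed regulation based on MonitoringValue
end
-- abnormal speed regulation based on MonitoringStatus still runs

What we are asking for is that ActiveInStandby apply not only to MonitoringValue but also to MonitoringStatus, that is:

if powerstatus == off and ActiveInStandby == false then
    -- skip speed regulation based on MonitoringValue
    -- skip abnormal speed regulation based on MonitoringStatus
end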
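
To make the requested behavior concrete, here is a minimal Python sketch of the second variant. It only models the request, not the actual thermal_mgmt code; `requirement_requests` and its parameters are hypothetical names.

```python
def requirement_requests(power_status, active_in_standby, status_ok,
                         value_duty, failed_value):
    """Return (value_request, abnormal_request); None means no request."""
    if power_status == "off" and not active_in_standby:
        # Requested behavior: skip both the MonitoringValue-based regulation
        # and the MonitoringStatus-based abnormal regulation.
        return None, None
    value_request = value_duty if status_ok else None
    abnormal_request = None if status_ok else failed_value
    return value_request, abnormal_request
```

With power off and ActiveInStandby false, an abnormal status produces no request at all; with power on, the same abnormal status still requests the FailedValue duty.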

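We would like the community to review whether this request is reasonable.

I suggest reading the thermal_mgmt code first: look at what the check_requirement_status function does, and why the abnormal speed regulation is produced and takes effect.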

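Judging from the check_requirement_status method, when a speed regulation policy is not in effect, the abnormal speed regulation for the corresponding temperature point is deleted. In other words, the current implementation already satisfies our request:


This does not quite match what you described in the meeting, and it conflicts with the observed symptom.

Still, if the implementation behind the speed regulation configuration is open source, we can continue the investigation ourselves. Thanks.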

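Abnormal speed regulation is deleted when isValid is invalid. Logically, your speed regulation policy is currently in the valid state.

Our isValid is currently invalid: ActiveInStandby is false and the power state is off, so the "PowerNotMeet" condition is met.

As raised in yesterday's meeting, if the problem persists, please provide the corresponding logs. According to the current problem description, the 80% comes from abnormal speed regulation.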

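The cause has now been identified.
The log below shows that when an abnormal speed regulation becomes invalid, it is deleted from the exp_speed_t table. However, when the PWM is then fetched through PID, the OS is powered off, so the PID read returns: 66 0 0 6 64 0 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86. As a result, the fan_table used for regulation is empty, which in turn makes setting the PWM fail: cooling_mgmt.lua(1001): Set fan pwm failed, fan_table:nil.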

2026-01-13 09:17:13.170805 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.171722 thermal_mgmt NOTICE: cooling_policys.lua(386): [IsValid] Policy(Id:6) change to invalid
2026-01-13 09:17:13.174146 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x6) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.175483 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x7) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.176913 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x8) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.178200 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x14) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.179916 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x15) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.181295 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x16) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.182551 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x17) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.184580 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x18) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.186362 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x1e) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.187690 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x1f) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.189044 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x6207) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.190435 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x6107) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.191729 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x6307) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.193060 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x2a) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.194428 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x28) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.195735 thermal_mgmt DEBUG: cooling_requirememts.lua(576): Update requirement(id:0x29) IsValid to 0, Invalid cause: PowerNotMeet
2026-01-13 09:17:13.197274 thermal_mgmt NOTICE: cooling_pid_intf.lua(267): [cooling] send envtemp config to pid, cmd: 6 0 0 7 4 0
2026-01-13 09:17:13.205426 thermal_mgmt NOTICE: cooling_pid_intf.lua(166): Send target tempconfig to pid, cmd: 6 0 0 1 4 0
2026-01-13 09:17:13.210625 thermal_mgmt NOTICE: abnormal_fan.lua(128): Delete exception speed, key:(AbnormalRequirement:0x6207), policy(1:80 2:80 3:80 4:80 5:80 6:80 7:80 8:80 9:80 10:80 11:80 12:80 )
2026-01-13 09:17:13.212522 thermal_mgmt DEBUG: cooling_pid_intf.lua(342): [cooling] tell alarm speed to pid, cmd: 6 0 0 5 4 0
2026-01-13 09:17:13.222139 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.223141 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.225570 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.226567 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.231488 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.232372 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.239367 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.240512 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.241581 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.243983 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.248508 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.251356 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.252322 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.254915 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.256052 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.259035 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.260244 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.262790 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.263820 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.266572 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.267520 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.270176 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.271200 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.274301 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.275227 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.277720 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.279007 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.281997 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.282932 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.285693 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.286629 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.289421 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.290680 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.293122 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.294092 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.296848 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.297845 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.309356 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.310441 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.313276 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.314288 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.318548 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.319846 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.322636 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.324999 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_3_010103], interface[bmc.kepler.Systems.Fan], name[RearSpeed], value[18960]->[19020], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in function
2026-01-13 09:17:13.326609 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.328115 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_3_010103], interface[bmc.kepler.Systems.Fan], name[changed], value[nil]->[[“RearSpeed”]], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in func
2026-01-13 09:17:13.329878 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_3_010103], interface[bmc.kepler.Systems.Fan], name[in_emit_change], value[nil]->[true], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in functi
2026-01-13 09:17:13.348323 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_3_010103], interface[bmc.kepler.Systems.Fan], name[in_emit_change], value[nil]->[false], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in funct
2026-01-13 09:17:13.350082 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_3_010103], interface[bmc.kepler.Systems.Fan], name[changed], value[nil]->[false], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in function 'LO
2026-01-13 09:17:13.351354 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.356708 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.357712 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.363778 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.364774 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.373329 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.376701 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.390752 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.391791 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.394574 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.395676 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.398596 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.401741 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.408306 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.409234 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.417125 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_9_010103], interface[bmc.kepler.Systems.Fan], name[RearSpeed], value[18960]->[19020], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in function
2026-01-13 09:17:13.418723 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.420238 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_9_010103], interface[bmc.kepler.Systems.Fan], name[changed], value[nil]->[[“RearSpeed”]], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in func
2026-01-13 09:17:13.425845 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_9_010103], interface[bmc.kepler.Systems.Fan], name[in_emit_change], value[nil]->[true], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in functi
2026-01-13 09:17:13.428964 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_9_010103], interface[bmc.kepler.Systems.Fan], name[in_emit_change], value[nil]->[false], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in funct
2026-01-13 09:17:13.430635 thermal_mgmt DEBUG: object.lua(83): set property traceback: sender[bmc.kepler.thermal_mgmt], path[/bmc/kepler/Systems/1/Thermal/Fans/Fan_9_010103], interface[bmc.kepler.Systems.Fan], name[changed], value[nil]->[false], traceback[stack traceback: ./opt/bmc/libmc/lualib/traceback/object.lua:49: in function <./opt/bmc/libmc/lualib/traceback/object.lua:45> [C]: in function ‘pcall’ ./opt/bmc/libmc/lualib/mc/logging.lua:109: in function ‘format_msg’ ./opt/bmc/libmc/lualib/mc/logging.lua:253: in function 'LO
2026-01-13 09:17:13.431647 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.442014 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.443318 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.446117 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.447379 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.453266 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.454345 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.461403 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.462357 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.469727 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.470928 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x618899a99840]
2026-01-13 09:17:13.473890 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x618899a99840] exit
2026-01-13 09:17:13.474885 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.481232 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.482262 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.483507 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.486048 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.490363 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.495061 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c36c3c8]
2026-01-13 09:17:13.497622 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c99bdb0]
2026-01-13 09:17:13.503573 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c99bdb0] exit
2026-01-13 09:17:13.504544 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c36c3c8] exit
2026-01-13 09:17:13.505516 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c76fee0] exit
2026-01-13 09:17:13.510439 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c76fee0]
2026-01-13 09:17:13.511613 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c76fee0] exit
2026-01-13 09:17:13.512750 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c76fee0]
2026-01-13 09:17:13.514038 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c76fee0] exit
2026-01-13 09:17:13.515073 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c76fee0]
2026-01-13 09:17:13.516097 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c76fee0] exit
2026-01-13 09:17:13.553520 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_10_010102): (id: 9, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.604121 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_11_010102): (id: 10, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.618509 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_12_010102): (id: 11, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.631456 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_1_010102): (id: 0, presence: 1, protocol: 3, media type: 1, temperature: 255)
2026-01-13 09:17:13.644883 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_2_010102): (id: 1, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.667231 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_3_010102): (id: 2, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.692147 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_4_010102): (id: 3, presence: 1, protocol: 3, media type: 0, temperature: 255)
2026-01-13 09:17:13.726557 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_5_010102): (id: 4, presence: 1, protocol: 3, media type: 0, temperature: 255)
2026-01-13 09:17:13.743769 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_6_010102): (id: 5, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.758823 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_7_010102): (id: 6, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.775722 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_8_010102): (id: 7, presence: 1, protocol: 3, media type: 1, temperature: 255)
2026-01-13 09:17:13.793797 thermal_mgmt DEBUG: disks_data_keeping.lua(219): Get disk(path:/bmc/kepler/Systems/1/Storage/Drives/Drive_9_010102): (id: 8, presence: 0, protocol: 255, media type: 255, temperature: 255)
2026-01-13 09:17:13.796504 thermal_mgmt DEBUG: disks_data_keeping.lua(150): hdd maxTemp(0), ssd maxTemp(0), is_temp_avail(false), invalid_temp_num(4), all ssd maxTemp(0)
2026-01-13 09:17:13.797364 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c76fee0]
2026-01-13 09:17:13.809858 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c76fee0] exit
2026-01-13 09:17:13.812968 thermal_mgmt INFO: cooling_mgmt.lua(859): [Cooling] PID readinfo: 66 0 0 6 64 0 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86 255 255 0 0 86
2026-01-13 09:17:13.813788 thermal_mgmt DEBUG: cooling_mgmt.lua(676): No fan in fan table.
2026-01-13 09:17:13.814551 thermal_mgmt DEBUG: cooling_mgmt.lua(693): No pump in pump table.
2026-01-13 09:17:13.817667 thermal_mgmt DEBUG: cooling_mgmt.lua(780): policy() after the pwm is updated with manual speed.
2026-01-13 09:17:13.818385 thermal_mgmt DEBUG: cooling_mgmt.lua(964): Get pump ctrl mode property failed
2026-01-13 09:17:13.821627 thermal_mgmt DEBUG: app_preloader.lua(95): create co[thread: 0x61889c76fee0]
2026-01-13 09:17:13.824240 thermal_mgmt DEBUG: app_preloader.lua(206): …/apps/thermal_mgmt/lualib/basic_cooling/cooling_mgmt.lua:910: app(thermal_mgmt/service/main) fork func, count=1, delay=0
2026-01-13 09:17:13.827568 thermal_mgmt DEBUG: cooling_mgmt.lua(1001): Set fan pwm failed, fan_table:nil, fan_config:table: 0x6188991a6b60
2026-01-13 09:17:13.828330 thermal_mgmt DEBUG: app_preloader.lua(58): co[thread: 0x61889c76fee0] exit

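A speed value is read back, but it cannot be applied.

It has been confirmed that when the fans support standby, at least one CoolingRequirement must set ActiveInStandby to true in order to give PID a temperature point. Otherwise PID returns 255 for every fan ID, and there is no speed to apply.

Judging from the symptom: after the power-off command is issued, the Scanner Status changes first while the power state is still on, which triggers abnormal speed regulation at 80%. Power-off then takes effect, and thermal_mgmt deletes the abnormal speed regulation from the table and recomputes from normal standby regulation for devices (no device needs standby at this point) + device abnormal regulation + fan abnormal regulation, none of which yields an active request. thermal_mgmt therefore blocks the speed update, and the machine stays at the last speed that was applied.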
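
The sequence above can be reproduced with a small Python model. It is a simplified sketch, not the thermal_mgmt code: `FanHardware`, `apply_speeds`, and the way the three request sources are combined (take the maximum) are assumptions made for illustration.

```python
class FanHardware:
    """Stand-in for the fan board: it keeps the last duty it was given."""

    def __init__(self, duty):
        self.duty = duty

    def set_pwm(self, duty):
        self.duty = duty


def apply_speeds(hw, standby_request, device_abnormal, fan_abnormal):
    """Apply the highest active request; return False if there is none."""
    candidates = [
        d for d in (standby_request,
                    *device_abnormal.values(),
                    *fan_abnormal.values())
        if d is not None
    ]
    if not candidates:
        # Nothing to send, so the hardware keeps its previous duty.
        return False
    hw.set_pwm(max(candidates))
    return True
```

In this model, applying an 80% abnormal request and then deleting it leaves the hardware at 80%, because the next cycle has nothing to send. Adding a single standby request (for example from one CoolingRequirement with ActiveInStandby set to true) gives the next cycle a value to apply and brings the duty back down.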


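If abnormal speed regulation was triggered, was there no event for a failure to read the card temperature?

Take the 300I A2 card as an example:
the temperature reading that triggers abnormal speed regulation is 32768, whereas on the event side every value greater than 255 is converted to 40.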
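
A short Python sketch of why the two paths can disagree on the same raw reading. The conversion of values above 255 to 40 on the event side comes from the reply above; the function names and the assumption that the cooling path treats anything above 255 as an invalid reading are hypothetical.

```python
EVENT_FALLBACK_CELSIUS = 40  # per the reply above: values above 255 become 40


def reading_for_event(raw):
    """Value the event path would evaluate for a raw temperature reading."""
    return EVENT_FALLBACK_CELSIUS if raw > 255 else raw


def reading_is_valid_for_cooling(raw):
    """Assumption: the cooling path treats readings above 255 as invalid."""
    return raw <= 255
```

For a raw reading of 32768, the event path sees an unremarkable 40 and raises nothing, while the cooling path sees an invalid reading and falls back to abnormal speed regulation.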