Fan speed is unstable

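Problem description

The temperatures that feed speed control are stable, but the fan speed is not.
1 FanType is not identified, and the fan's HardwarePWM swings up and down, changing every few seconds
2 Switching between automatic and manual speed-control mode makes no difference
3 Deleting every CoolingArea makes no difference
4 RearSpeed and FrontSpeed of the Fan objects return values and the SMC command words work normally, but the log contains errors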

2026-04-01 08:32:48.100455 thermal_mgmt ERROR: synchronization.lua(187): set value failed when synchronizing property Fan_1_010103.RearSpeed, err:./opt/bmc/libmc/lualib/sd_bus/object.lua:832: ./opt/bmc/libmc/lualib/mc/signal.lua:289: emit signal: nesting is not allowed [repeated 30 times in 304s from 2026-04-01 08:27:43.837126 to 2026-04-01 08:32:48.100455]
2026-04-01 08:32:49.870611 thermal_mgmt ERROR: synchronization.lua(187): set value failed when synchronizing property Fan_2_010103.RearSpeed, err:./opt/bmc/libmc/lualib/sd_bus/object.lua:832: ./opt/bmc/libmc/lualib/mc/signal.lua:289: emit signal: nesting is not allowed [repeated 24 times in 308s from 2026-04-01 08:27:42.212551 to 2026-04-01 08:32:49.870611]

5 cooling_control.log is written only once after the device boots

2026-04-01 08:56:01.743519 DevID:0x101, Type:0, ReqId:20, T:43, TarT:90, TarPWM:55, EnvTNum:1, EnvReqId:10, EnvT:25, EnvPWM:68, PIDPWM:68, MaxAllowTemp:98, TarSensorName:, EnvSensorName:Inlet Temp, FrontSpeed:0, RearSpeed:0
2026-04-01 08:56:01.743783 DevID:0x202, Type:0, ReqId:20, T:43, TarT:90, TarPWM:55, EnvTNum:1, EnvReqId:10, EnvT:25, EnvPWM:68, PIDPWM:68, MaxAllowTemp:98, TarSensorName:, EnvSensorName:Inlet Temp, FrontSpeed:0, RearSpeed:0
2026-04-01 08:56:01.744317 DevID:0x303, Type:0, ReqId:20, T:43, TarT:90, TarPWM:55, EnvTNum:1, EnvReqId:10, EnvT:25, EnvPWM:68, PIDPWM:68, MaxAllowTemp:98, TarSensorName:, EnvSensorName:Inlet Temp, FrontSpeed:0, RearSpeed:0
2026-04-01 08:56:01.744786 DevID:0x404, Type:0, ReqId:20, T:43, TarT:90, TarPWM:55, EnvTNum:1, EnvReqId:10, EnvT:25, EnvPWM:68, PIDPWM:68, MaxAllowTemp:98, TarSensorName:, EnvSensorName:Inlet Temp, FrontSpeed:0, RearSpeed:0

6 In the PWM-setting log, the PWM that takes effect is stable
2026-04-01 09:29:26.784999 thermal_mgmt INFO: cooling_mgmt.lua(1008): Set all fan pwm, pwm(80 80 80 80 80) [repeated 143 times in 302s from 2026-04-01 09:24:24.020180 to 2026-04-01 09:29:26.784999]

Environment

  • Software version: OpenUBMC2512 + high availability

null_12344321_20260401-1048.zip (8.2 MB)
Please help analyze this.

The answer below was generated by AI and is for reference only:


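Analysis and localization of the unstable fan speed problem

Based on the problem description, logs, and system context you provided, this answer analyzes the "unstable fan speed" problem, narrows down the root cause, and proposes possible solutions.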


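I. Summary of symptoms

  • Stable temperature, fluctuating fan speed: the ambient temperature that feeds speed control is stable, but the fan speed is not.
  • HardwarePWM swings up and down: the fan's HardwarePWM value changes frequently within short intervals (once every few seconds).
  • FanType not identified: the system fails to identify the fan model.
  • Mode switching has no effect: the problem persists in both automatic and manual speed-control mode.
  • Deleting CoolingArea has no effect: the problem persists after all cooling area configuration is removed.
  • Log errors: emit signal: nesting is not allowed appears, tied to failed synchronization of the Fan_1_010103.RearSpeed and Fan_2_010103.RearSpeed properties.
  • Abnormal cooling_control.log output: it is written once at boot and never updated afterwards.
  • PWM-setting log is stable: the PWM values in the Set all fan pwm log stay at (80, 80, 80, 80, 80).
  • Software version: OpenUBMC2512.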

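II. Key log analysis

1. HardwarePWM fluctuation

Symptom: HardwarePWM swings up and down, while the PWM commanded by the upper layer is stable (always 80).

From the knowledge base:

  • HardwarePWM is the PWM duty-cycle value actually written to the hardware register, and it directly determines the physical fan speed.
  • An entity relation records that the HardwarePWM of Fan_2_010103 fluctuates sharply, which makes the fan speed unstable.
  • The fluctuation may come from the low-level driver, signal contention, or faulty control logic.

Although the upper-layer application (for example cooling_mgmt.lua) keeps sending a stable PWM value (80), the actual HardwarePWM value is unstable, so something in the middle of the control chain is at fault.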


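2. Unidentified FanType causes abnormal control

Symptom: FanType is not identified, and HardwarePWM is out of control.

According to the knowledge base and documentation:

  • Fan type identification sets the PWM to IdentifySpeedLevel (usually 35%), then samples the speed to decide which type the fan belongs to.
  • Until identification completes, the system may use a default or probing control strategy.
  • If identification fails, the system may take an unintended control path, for example falling back to the default MaxSupportedPWM or retrying repeatedly, which makes HardwarePWM jump frequently.

In addition, the log shows that Fan_1_010103 and Fan_2_010103 fail when synchronizing RearSpeed, which means their state is not updated correctly and further suggests that the fans did not complete identification or that registration went wrong.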
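
The identification step described in this section can be pictured as a range lookup. The following is a minimal Python sketch, not the thermal_mgmt implementation: `FanTypeSpec`, `identify_fan_type`, the type names, and the RPM ranges are all hypothetical. Only the idea of matching a speed sampled at IdentifySpeedLevel against an IdentifyRangeLow ~ IdentifyRangeHigh interval comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FanTypeSpec:
    """Hypothetical stand-in for one FanType configuration entry."""
    name: str
    identify_range_low: int   # lowest RPM accepted at IdentifySpeedLevel
    identify_range_high: int  # highest RPM accepted (inclusive)


def identify_fan_type(measured_rpm: int,
                      specs: Sequence[FanTypeSpec]) -> Optional[str]:
    """Return the first type whose identify range contains measured_rpm."""
    for spec in specs:
        if spec.identify_range_low <= measured_rpm <= spec.identify_range_high:
            return spec.name
    return None  # no range matched: the fan stays unidentified


# Made-up ranges, for illustration only.
SPECS = [
    FanTypeSpec("TypeA", 4000, 5000),
    FanTypeSpec("TypeB", 6000, 7500),
]

if __name__ == "__main__":
    print(identify_fan_type(4500, SPECS))  # falls inside the TypeA range
    print(identify_fan_type(0, SPECS))     # matches nothing
```

Under this sketch, a sampled speed that lands outside every configured range leaves the type unset, which is the "FanType not identified" state the thread reports.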


3. synchronization.lua synchronization error

Error message:

set value failed when synchronizing property Fan_1_010103.RearSpeed, err: ... emit signal: nesting is not allowed

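This error indicates that:

  • The BMC uses a D-Bus mechanism to synchronize object properties.
  • emit signal: nesting is not allowed is a restriction on nested signal emission: emitting a signal again from inside a signal handler is forbidden by the system.
  • It is commonly seen when multiple threads or asynchronous callbacks compete for resource-tree properties such as RearSpeed.

This suggests that:

  • Speed updates for Fan_1_010103 and Fan_2_010103 run into concurrent write conflicts.
  • Several modules (for example thermal_mgmt and fan_service.lua) may be trying to update the fan state at the same time.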
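
To make the "nesting is not allowed" restriction concrete, here is a toy Python model of a signal that refuses to emit while it is already emitting. It illustrates the general re-entrancy-guard pattern only. It is not the code in libmc's signal.lua, and `Signal`, `ToyObject`, `NestedEmitError`, and the `Derived` property are hypothetical names.

```python
class NestedEmitError(RuntimeError):
    """Raised when emit() is called from inside one of its own handlers."""


class Signal:
    """Toy signal with a re-entrancy guard."""

    def __init__(self):
        self._handlers = []
        self._emitting = False

    def connect(self, handler):
        self._handlers.append(handler)

    def emit(self, *args):
        if self._emitting:
            raise NestedEmitError("emit signal: nesting is not allowed")
        self._emitting = True
        try:
            for handler in list(self._handlers):
                handler(*args)
        finally:
            self._emitting = False  # always release the guard


class ToyObject:
    """Toy object whose property writes all emit one shared signal."""

    def __init__(self):
        self.changed = Signal()
        self.values = {}

    def set(self, name, value):
        self.values[name] = value
        self.changed.emit(name, value)


def demo():
    """Write a second property from inside a change handler."""
    obj = ToyObject()
    errors = []

    def on_change(name, value):
        if name == "RearSpeed":
            try:
                obj.set("Derived", value)  # nested write inside the handler
            except NestedEmitError as exc:
                errors.append(str(exc))

    obj.changed.connect(on_change)
    obj.set("RearSpeed", 6000)
    return errors
```

In this model the nested write is rejected even though only one thread is involved, so a property assignment made inside a change callback is one possible trigger to look for alongside true concurrency.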

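4. cooling_control.log is written only once

Symptom: cooling_control.log is written once after the device boots, and never again.

This log records the core speed decisions of the PID controller (fields such as T, TarT, TarPWM, and PIDPWM). Being written only once suggests that:

  • The main PID control thread may not be running normally, or may be stuck.
  • Fan control is stuck in the initialization stage (for example sending a fixed PWM) and never enters dynamic speed control.
  • Combined with the FanType identification failure, the BasicCoolingPID module may be refusing to take over control.

5. Upper-layer PWM is stable but the lower layer responds abnormally

Log: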

Set all fan pwm, pwm(80 80 80 80 80) [repeated 143 times...]

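This shows that:

  • The application layer periodically calls a method such as SetFanPWM or SetFansPWM and steadily sets 80%.
  • Whether these values actually reach the hardware depends on the full chain SetFanExpectedPWM → SetFanHardwarePWM → register write.

Possible break points:

  • ExpectedPwm and HardwarePWM disagree (for example 80 is expected but the hardware write goes wrong).
  • SetFanHardwarePWM is disturbed by another process that overwrites the value.
  • Register access is contended or fails.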
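
The "[repeated N times in Ts ...]" suffix on the Set all fan pwm line also gives a rough idea of how often the application layer sends the command. The Python sketch below estimates the average interval. It assumes the suffix means N occurrences spread over T seconds, which is an interpretation of the log format rather than something the thread states, and `mean_period_seconds` is a hypothetical helper.

```python
import re

# Matches the aggregation suffix, e.g. "[repeated 143 times in 302s from ..."
REPEAT_RE = re.compile(r"\[repeated (\d+) times in (\d+)s\b")


def mean_period_seconds(log_line: str):
    """Return the average seconds between repeats, or None if unavailable."""
    match = REPEAT_RE.search(log_line)
    if match is None:
        return None
    count, span = int(match.group(1)), int(match.group(2))
    if count < 2:
        return None
    # Assumption: N occurrences over `span` seconds leave N - 1 gaps.
    return span / (count - 1)


if __name__ == "__main__":
    line = ("2026-04-01 09:29:26.784999 thermal_mgmt INFO: "
            "cooling_mgmt.lua(1008): Set all fan pwm, pwm(80 80 80 80 80) "
            "[repeated 143 times in 302s from 2026-04-01 09:24:24.020180 "
            "to 2026-04-01 09:29:26.784999]")
    print(round(mean_period_seconds(line), 1))
```

For the line quoted in the problem description this works out to roughly 2.1 s between commands, so the upper layer sends a steady 80 about every two seconds.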

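III. Root cause analysis

:white_check_mark: Overall assessment:

The root cause is that fan model identification fails (FanType is not identified), so the system cannot enter the normal PID speed-control flow and falls back to temporary or default control logic, which causes the frequent HardwarePWM fluctuation and the synchronization errors.

Supporting evidence:

Evidence source | Content
Knowledge Graph | The HardwarePWM of Fan_2_010103 is explicitly annotated: "fluctuates sharply, causing instability"
Document Chunk 3 | Fan identification sends IdentifySpeedLevel=35% and matches the sampled speed against the identification ranges
Document Chunk 1 | The fault description states directly: "FanType is not identified, and the fan's HardwarePWM swings up and down"
KG Relationship | HardwarePWM is bound to IdentifySpeedLevel and should reflect that value during identification
Log | cooling_control.log has no further output → PID is not running normally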
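IV. Possible root-cause path

graph TD
A[Fan powered on] --> B{FanType identified?}
B -- No --> C[Use default/probing PWM]
C --> D[HardwarePWM changes frequently]
D --> E[contbin_H20L5 detects an anomaly]
E --> F[RearSpeed/FrontSpeed update fails]
F --> G[Signal sync reports: nesting not allowed]
G --> H[PWM commands are stable, actual speed fluctuates]
B -- Yes --> I[Enter PID speed-control flow]
I --> J[cooling_control keeps logging]
J --> K[Speed stable and controllable]

The system is currently on the C → D → E → F → G branch.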


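V. Suggested solutions

1. Check the fan hardware connection and model compatibility

  • Make sure the fan is inserted correctly and the physical connection is stable.
  • Confirm that the fan model is on the system's supported list (that is, a FanType object has a matching configuration).
  • Check that the fan board EEPROM is readable and contains valid ID information.

2. Check the fan identification logs (fan_service.lua related)

  • Check whether a log like the following exists: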
    fan identify: applying IdentifySpeedLevel=35%, measuring speed...
    
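  • Check whether a valid speed was sampled and whether it falls within the IdentifyRangeLow ~ IdentifyRangeHigh range of any FanType.

3. Verify that IdentifySpeedLevel takes effect

  • Query whether IdentifySpeedLevel of the Fan object is 35.
  • Check whether HardwarePWM equals MaxSupportedPWM * 0.35 during identification.

If it does not, the set_pwm → HardwarePWM chain is broken.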

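4. Investigate the contbin_H20L5 anomaly detection mechanism

  • contbin_H20L5 is a debounce module dedicated to "abnormal fan speed" detection (it requires 20 high triggers plus 5 low triggers) and is used to identify serious faults reliably.
  • If this module keeps triggering, it may force degraded control or an abnormal operating mode.
  • The following command can be used to check its status: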
    busctl get-property xyz.openbmc_project.FanDetection /path/to/contbin_H20L5 Status
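
The "20 high triggers plus 5 low triggers" rule reads like a consecutive-sample debounce. The Python sketch below shows one way such a rule can work, assuming that 20 consecutive abnormal samples assert the fault and 5 consecutive normal samples clear it. That reading, and the `ConsecutiveDebounce` class itself, are assumptions; the real contbin_H20L5 logic is not shown in this thread.

```python
class ConsecutiveDebounce:
    """Assert after `high_count` consecutive True samples and
    clear after `low_count` consecutive False samples."""

    def __init__(self, high_count: int = 20, low_count: int = 5):
        self.high_count = high_count
        self.low_count = low_count
        self.state = False
        self._run = 0  # length of the current run of samples opposing state

    def update(self, sample: bool) -> bool:
        if sample == self.state:
            self._run = 0  # sample agrees with the state: the run is broken
            return self.state
        self._run += 1
        needed = self.low_count if self.state else self.high_count
        if self._run >= needed:
            self.state = sample  # enough consecutive samples: flip the state
            self._run = 0
        return self.state


if __name__ == "__main__":
    detector = ConsecutiveDebounce()
    states = [detector.update(True) for _ in range(20)]
    print(states[18], states[19])  # flips only on the 20th abnormal sample
```

With asymmetric counts like these, a fault is slow to assert and quick to clear, so a speed that only dips briefly does not raise the alarm.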
    

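5. Fix the signal nesting problem (synchronization.lua)

  • This problem is a consequence rather than a cause; it is triggered by inconsistent low-level state.
  • Rather than modifying the synchronization code directly:
    • Make sure properties such as RearSpeed are updated only after fan identification completes.
    • Use a debounce mechanism (such as median filtering) to avoid frequent updates.
    • Check whether multiple threads call update_fan_status at the same time.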
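
The median-filter suggestion can be sketched in a few lines. This is a generic sliding-window median in Python, offered as an illustration of the technique and not as code from thermal_mgmt. `MedianFilter` and the sample speeds are hypothetical.

```python
from collections import deque
from statistics import median


class MedianFilter:
    """Sliding-window median over the most recent `window` samples."""

    def __init__(self, window: int = 5):
        self._samples = deque(maxlen=window)  # old samples drop off the left

    def update(self, value: float) -> float:
        self._samples.append(value)
        return median(self._samples)


if __name__ == "__main__":
    speed_filter = MedianFilter(window=3)
    for raw in (6000, 6100, 0, 6050):  # one spurious zero reading
        print(raw, speed_filter.update(raw))
```

Once the window is full, a single outlier such as the zero reading above does not change the filtered value, so a property fed from the filter is rewritten less often than one fed from raw samples.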

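6. Temporary verification: inject the fan type manually

To verify whether "identification failure" is the cause, try setting the fan type manually and skipping automatic identification: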

busctl set-property xyz.openbmc_project.FanConfig /xyz/openbmc_project/Fan/1 \
    xyz.openbmc_project.Configuration.Fan Type s "FanType_A"

Then observe:

  • whether HardwarePWM still fluctuates;
  • whether cooling_control.log resumes output;
  • whether RearSpeed updates normally.

VI. Recommended diagnostic commands

Purpose | Command
Check fan type identification status | busctl introspect xyz.openbmc_project.Fan /xyz/openbmc_project/Fan/1
Check the actual HardwarePWM value | busctl get-property xyz.openbmc_project.HWMon /fan1 hwmon fan1_pwm
Check ExpectedPwm | busctl get-property xyz.openbmc_project.FanCtrl /fan1 ExpectedPwm
Check IdentifySpeedLevel | busctl get-property xyz.openbmc_project.Fan /fan1 IdentifySpeedLevel
Check the current fan state | busctl call xyz.openbmc_project.FanSnapshot /xyz/openbmc_project/FanSnapshot GetActivatedPolicyFactors

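VII. Conclusion

Item | Details
Problem type | Abnormal fan control
Direct symptoms | Unstable speed, PWM fluctuation, log errors
Root cause | Fan type identification fails, so the normal PID control flow is never entered and the system degrades to an unstable control mode
Key evidence | cooling_control.log written only once, HardwarePWM fluctuation, FanType not identified, RearSpeed synchronization failure
Solution | Check whether the fan model is supported, make sure IdentifySpeedLevel=35% is sent correctly, fix the identification flow, and inject the type manually for verification if needed

:white_check_mark: Prioritize the fan model identification flow and restore the main PID control thread; that should resolve all of the current symptoms.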

Please provide the corresponding logs.

null_20260401-1407.zip (8.8 MB)
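Please help analyze this.

Add some logging to these three functions: fan_service:set_fans_pwm(l_pwm), fan_service:set_fan_pwm(system_id, fan_id, pwm_percent), and fan_obj_manager:set_pwm(pwm_percent). Print the speed being sent so we can see where the fan speed change comes from. The notice log level is enough.

None of the notice logs added in fan_obj_manager:set_pwm and fan_obj_manager:set_fan_hardware_pwm were printed. Presumably the fan model is never identified, so fan_service:set_fans_pwm cannot reach fan_obj:set_fan_hardware_pwm to apply the PWM.
Yet tracechip on the SMC object shows a large number of PWM set operations.
With debug logging enabled, the output is as follows: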

2026-04-08 18:54:43.130943 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 20 20 41 [repeated 63 times in 395s from 2026-04-08 18:48:08.349937 to 2026-04-08 18:54:43.130943]
2026-04-08 18:54:45.176027 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 30 30 40 [repeated 293 times in 301s from 2026-04-08 18:49:44.472495 to 2026-04-08 18:54:45.176027]
2026-04-08 18:54:51.349273 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 31 31 43 [repeated 294 times in 301s from 2026-04-08 18:49:49.586681 to 2026-04-08 18:54:51.349273]
2026-04-08 18:54:59.720897 thermal_mgmt DEBUG: disks_data_keeping.lua(150): hdd maxTemp(0), ssd maxTemp(0), is_temp_avail(true), invalid_temp_num(0), all ssd maxTemp(0) [repeated 20 times in 305s from 2026-04-08 18:49:54.690729 to 2026-04-08 18:54:59.720897]
2026-04-08 18:55:00.563538 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 30 30 41 [repeated 130 times in 518s from 2026-04-08 18:46:23.093860 to 2026-04-08 18:55:00.563538]
2026-04-08 18:55:03.912287 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:93 [repeated 2 times in 301s from 2026-04-08 18:50:02.891993 to 2026-04-08 18:55:03.912287]
2026-04-08 18:55:11.793743 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 21 21 43 [repeated 116 times in 529s from 2026-04-08 18:46:23.085185 to 2026-04-08 18:55:11.793743]
2026-04-08 18:55:43.447327 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 31 31 44 [repeated 186 times in 560s from 2026-04-08 18:46:23.083333 to 2026-04-08 18:55:43.447327]
2026-04-08 18:55:50.678936 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 10 10 30 [repeated 74 times in 556s from 2026-04-08 18:46:35.347665 to 2026-04-08 18:55:50.678936]
2026-04-08 18:55:53.753437 thermal_mgmt INFO: cooling_mgmt.lua(860): [Cooling] PID readinfo: 26 0 0 6 24 0 1 1 0 0 94 2 2 0 0 94 3 3 0 0 94 4 4 0 0 94 [repeated 138 times in 573s from 2026-04-08 18:46:21.071138 to 2026-04-08 18:55:53.753437]
2026-04-08 18:56:01.907024 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 20 20 42 [repeated 87 times in 577s from 2026-04-08 18:46:25.137753 to 2026-04-08 18:56:01.907024]
2026-04-08 18:56:11.726408 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:49 [repeated 963 times in 301s from 2026-04-08 18:51:10.535481 to 2026-04-08 18:56:11.726408]
2026-04-08 18:56:12.596831 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:99 [repeated 546 times in 301s from 2026-04-08 18:51:11.598173 to 2026-04-08 18:56:12.596831]
2026-04-08 18:56:12.623647 thermal_mgmt DEBUG: cooling_requirememts.lua(571): Update requirement(id:0x1f) IsValid to 1, Invalid cause: None [repeated 290 times in 301s from 2026-04-08 18:51:12.272733 to 2026-04-08 18:56:12.623647]
2026-04-08 18:56:12.624422 thermal_mgmt DEBUG: cooling_requirememts.lua(571): Update requirement(id:0x15) IsValid to 1, Invalid cause: None [repeated 290 times in 301s from 2026-04-08 18:51:12.273756 to 2026-04-08 18:56:12.624422]
2026-04-08 18:56:12.625061 thermal_mgmt DEBUG: cooling_requirememts.lua(571): Update requirement(id:0x14) IsValid to 1, Invalid cause: None [repeated 290 times in 301s from 2026-04-08 18:51:12.275408 to 2026-04-08 18:56:12.625061]
2026-04-08 18:56:12.625670 thermal_mgmt DEBUG: cooling_requirememts.lua(571): Update requirement(id:0x1e) IsValid to 1, Invalid cause: None [repeated 290 times in 301s from 2026-04-08 18:51:12.276147 to 2026-04-08 18:56:12.625670]
2026-04-08 18:56:12.627140 thermal_mgmt DEBUG: cooling_requirememts.lua(571): Update requirement(id:0xa) IsValid to 1, Invalid cause: None [repeated 290 times in 301s from 2026-04-08 18:51:12.278503 to 2026-04-08 18:56:12.627140]
2026-04-08 18:56:18.797440 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:37 [repeated 56 times in 304s from 2026-04-08 18:51:14.801820 to 2026-04-08 18:56:18.797440]
2026-04-08 18:56:24.391641 thermal_mgmt DEBUG: exception_policy.lua(174): Requirement exp check: old status:255, new status:0 [repeated 1000 times in 204s from 2026-04-08 18:52:59.771913 to 2026-04-08 18:56:24.391641]
2026-04-08 18:56:25.409652 thermal_mgmt DEBUG: cooling_pid_intf.lua(342): [cooling] tell alarm speed to pid, cmd: 6 0 0 5 4 0 [repeated 295 times in 301s from 2026-04-08 18:51:23.575547 to 2026-04-08 18:56:25.409652]
2026-04-08 18:56:25.915616 thermal_mgmt DEBUG: cooling_mgmt.lua(237): Current power state:ON, power is changed:false [repeated 291 times in 301s from 2026-04-08 18:51:24.594507 to 2026-04-08 18:56:25.915616]
2026-04-08 18:56:25.917075 thermal_mgmt DEBUG: cooling_mgmt.lua(189): SmartCooling mode has not changed, pre(EnergySaving), cur(EnergySaving) [repeated 291 times in 301s from 2026-04-08 18:51:24.595828 to 2026-04-08 18:56:25.917075]
2026-04-08 18:56:36.133719 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:133 [repeated 4 times in 489s from 2026-04-08 18:48:27.001311 to 2026-04-08 18:56:36.133719]
2026-04-08 18:56:36.912950 thermal_mgmt INFO: fan_object.lua(218): Fan1 is not present! [repeated 6 times in 360s from 2026-04-08 18:50:36.891478 to 2026-04-08 18:56:36.912950]
2026-04-08 18:56:36.913874 thermal_mgmt INFO: fan_object.lua(218): Fan3 is not present! [repeated 6 times in 360s from 2026-04-08 18:50:36.892904 to 2026-04-08 18:56:36.913874]
2026-04-08 18:56:36.914764 thermal_mgmt INFO: fan_object.lua(218): Fan2 is not present! [repeated 6 times in 360s from 2026-04-08 18:50:36.894318 to 2026-04-08 18:56:36.914764]
2026-04-08 18:56:40.785629 thermal_mgmt DEBUG: cooling_mgmt.lua(694): No pump in pump table. [repeated 147 times in 301s from 2026-04-08 18:51:40.018853 to 2026-04-08 18:56:40.785629]
2026-04-08 18:56:40.786220 thermal_mgmt DEBUG: cooling_mgmt.lua(781): policy() after the pwm is updated with manual speed. [repeated 147 times in 301s from 2026-04-08 18:51:40.019762 to 2026-04-08 18:56:40.786220]
2026-04-08 18:56:40.786747 thermal_mgmt DEBUG: cooling_mgmt.lua(965): Get pump ctrl mode property failed [repeated 147 times in 301s from 2026-04-08 18:51:40.020450 to 2026-04-08 18:56:40.786747]
2026-04-08 18:56:40.790805 thermal_mgmt DEBUG: fan_service.lua(81): Start fan group speed adjust. [repeated 147 times in 301s from 2026-04-08 18:51:40.023650 to 2026-04-08 18:56:40.790805]
2026-04-08 18:56:58.428875 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:203 [repeated 15 times in 302s from 2026-04-08 18:51:56.470825 to 2026-04-08 18:56:58.428875]
2026-04-08 18:57:09.306268 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:121 [repeated 18 times in 324s from 2026-04-08 18:51:45.053355 to 2026-04-08 18:57:09.306268]
2026-04-08 18:57:09.418786 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:117 [repeated 8 times in 301s from 2026-04-08 18:52:08.149931 to 2026-04-08 18:57:09.418786]
2026-04-08 18:57:22.370202 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:147 [repeated 15 times in 347s from 2026-04-08 18:51:34.627007 to 2026-04-08 18:57:22.370202]
2026-04-08 18:57:25.750493 thermal_mgmt NOTICE: fan_service.lua(142): [HTY] fan 4 80 [repeated 147 times in 301s from 2026-04-08 18:52:25.091038 to 2026-04-08 18:57:25.750493]
2026-04-08 18:57:29.810639 thermal_mgmt NOTICE: fan_service.lua(142): [HTY] fan 1 80 [repeated 147 times in 301s from 2026-04-08 18:52:29.171399 to 2026-04-08 18:57:29.810639]
2026-04-08 18:57:29.812522 thermal_mgmt NOTICE: fan_service.lua(142): [HTY] fan 2 80 [repeated 147 times in 301s from 2026-04-08 18:52:29.187865 to 2026-04-08 18:57:29.812522]
2026-04-08 18:57:29.814499 thermal_mgmt NOTICE: fan_service.lua(142): [HTY] fan 3 80 [repeated 147 times in 301s from 2026-04-08 18:52:29.191415 to 2026-04-08 18:57:29.814499]
2026-04-08 18:57:33.642182 thermal_mgmt ERROR: synchronization.lua(187): set value failed when synchronizing property Fan_4_010103.RearSpeed, err:./opt/bmc/libmc/lualib/sd_bus/object.lua:832: ./opt/bmc/libmc/lualib/mc/signal.lua:289: emit signal: nesting is not allowed [repeated 5 times in 308s from 2026-04-08 18:52:26.010044 to 2026-04-08 18:57:33.642182]
2026-04-08 18:57:45.329218 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:173 [repeated 15 times in 323s from 2026-04-08 18:52:22.501661 to 2026-04-08 18:57:45.329218]
2026-04-08 18:57:54.386945 thermal_mgmt INFO: cooling_mgmt.lua(1008): Set all fan pwm, pwm(80 80 80 80) [repeated 148 times in 302s from 2026-04-08 18:52:51.622124 to 2026-04-08 18:57:54.386945]
2026-04-08 18:57:54.387257 thermal_mgmt NOTICE: fan_service.lua(119): [HTY] task_count 0 [repeated 148 times in 302s from 2026-04-08 18:52:51.622537 to 2026-04-08 18:57:54.387257]
2026-04-08 18:57:54.388795 thermal_mgmt NOTICE: fan_service.lua(121): [HTY] task_count 0 [repeated 148 times in 302s from 2026-04-08 18:52:51.624139 to 2026-04-08 18:57:54.388795]
2026-04-08 18:57:54.389232 thermal_mgmt NOTICE: fan_service.lua(133): [HTY] task_count 0 [repeated 148 times in 302s from 2026-04-08 18:52:51.624627 to 2026-04-08 18:57:54.389232]
2026-04-08 18:57:54.389513 thermal_mgmt NOTICE: fan_service.lua(137): [HTY] pwm_percent 0 [repeated 148 times in 302s from 2026-04-08 18:52:51.624923 to 2026-04-08 18:57:54.389513]
2026-04-08 18:57:54.397130 thermal_mgmt NOTICE: fan_service.lua(152): [HTY] task_count 0 [repeated 148 times in 302s from 2026-04-08 18:52:51.636576 to 2026-04-08 18:57:54.397130]
2026-04-08 18:58:24.066044 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:29 [repeated 10 times in 346s from 2026-04-08 18:52:38.168878 to 2026-04-08 18:58:24.066044]
2026-04-08 18:58:37.408028 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:217 [repeated 9 times in 418s from 2026-04-08 18:51:38.757064 to 2026-04-08 18:58:37.408028]
2026-04-08 18:59:06.150398 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 10 10 29 [repeated 214 times in 301s from 2026-04-08 18:54:05.354803 to 2026-04-08 18:59:06.150398]
2026-04-08 18:59:19.537176 thermal_mgmt DEBUG: cooling_pid_intf.lua(320): [cooling] send temp to pid, cmd: 11 0 0 3 9 0 0 0 21 21 42 [repeated 134 times in 301s from 2026-04-08 18:54:18.609856 to 2026-04-08 18:59:19.537176]
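
Because the debug log is aggregated, it helps to tally how often each HardwarePWM value was reported. The Python sketch below sums the "repeated N times" counts per (fan, value) pair. `tally_hardware_pwm` is a hypothetical helper, the three sample lines are copied from the log above, and since each aggregated line covers its own time window the totals are only a rough guide.

```python
import re
from collections import Counter

DETECT_RE = re.compile(
    r"Detect fan\((\d+)\) HardwarePWM:(\d+)(?: \[repeated (\d+) times)?"
)


def tally_hardware_pwm(lines):
    """Map (fan_id, pwm) to the number of reported occurrences."""
    counts = Counter()
    for line in lines:
        match = DETECT_RE.search(line)
        if match:
            fan_id, pwm = int(match.group(1)), int(match.group(2))
            # A line without the suffix counts as a single occurrence.
            repeats = int(match.group(3)) if match.group(3) else 1
            counts[(fan_id, pwm)] += repeats
    return counts


SAMPLE = [
    "2026-04-08 18:55:03.912287 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:93 [repeated 2 times in 301s from 2026-04-08 18:50:02.891993 to 2026-04-08 18:55:03.912287]",
    "2026-04-08 18:56:11.726408 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:49 [repeated 963 times in 301s from 2026-04-08 18:51:10.535481 to 2026-04-08 18:56:11.726408]",
    "2026-04-08 18:56:12.596831 thermal_mgmt DEBUG: fan_object.lua(911): Detect fan(4) HardwarePWM:99 [repeated 546 times in 301s from 2026-04-08 18:51:11.598173 to 2026-04-08 18:56:12.596831]",
]

if __name__ == "__main__":
    for (fan_id, pwm), count in tally_hardware_pwm(SAMPLE).most_common():
        print(f"fan {fan_id} HardwarePWM {pwm}: {count}")
```

On these three lines the values 49 and 99 account for almost all reports, which narrows down which values to look for when tracing writes to the SMC.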

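fan_service:set_fans_pwm first tries the batch interface. Did that method print anything? Fan identification goes through fan_obj_manager:set_pwm(pwm_percent). Check whether these two places cause the speed change. When taking screenshots, include the places where the logs were added. Notice level is enough; use the non-rate-limited print.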

Where the logs were added



Log output

~ # tail -f /var/log/app.log | grep HTY
2026-04-09 15:10:47.838270 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 2 80 0 false
2026-04-09 15:10:47.840803 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 3 80 1 false
2026-04-09 15:10:47.843209 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 4 80 0 false
2026-04-09 15:10:47.843911 thermal_mgmt NOTICE: fan_service.lua(154): [HTY] task_count 0
2026-04-09 15:10:49.855224 thermal_mgmt NOTICE: fan_service.lua(135): [HTY] pwm_percent 0
2026-04-09 15:10:49.857857 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 1 80 0 false
2026-04-09 15:10:49.861105 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 2 80 0 false
2026-04-09 15:10:49.864412 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 3 80 1 false
2026-04-09 15:10:49.867080 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 4 80 0 false
2026-04-09 15:10:49.867997 thermal_mgmt NOTICE: fan_service.lua(154): [HTY] task_count 0
2026-04-09 15:10:51.871526 thermal_mgmt NOTICE: fan_service.lua(135): [HTY] pwm_percent 0
2026-04-09 15:10:51.874127 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 1 80 0 false
2026-04-09 15:10:51.876897 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 2 80 0 false
2026-04-09 15:10:51.879829 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 3 80 1 false
2026-04-09 15:10:51.882512 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 4 80 0 false
2026-04-09 15:10:51.883137 thermal_mgmt NOTICE: fan_service.lua(154): [HTY] task_count 0
2026-04-09 15:10:53.887401 thermal_mgmt NOTICE: fan_service.lua(135): [HTY] pwm_percent 0
2026-04-09 15:10:53.889374 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 1 80 0 false
2026-04-09 15:10:53.891728 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 2 80 0 false
2026-04-09 15:10:53.894584 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 3 80 1 false
2026-04-09 15:10:53.897191 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 4 80 0 false
2026-04-09 15:10:53.897790 thermal_mgmt NOTICE: fan_service.lua(154): [HTY] task_count 0
2026-04-09 15:10:55.892035 thermal_mgmt NOTICE: fan_service.lua(135): [HTY] pwm_percent 0
2026-04-09 15:10:55.894265 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 1 80 0 false
2026-04-09 15:10:55.897051 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 2 80 0 false
2026-04-09 15:10:55.899145 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 3 80 1 false
2026-04-09 15:10:55.901279 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 4 80 0 false
2026-04-09 15:10:55.901972 thermal_mgmt NOTICE: fan_service.lua(154): [HTY] task_count 0
2026-04-09 15:10:57.906567 thermal_mgmt NOTICE: fan_service.lua(135): [HTY] pwm_percent 0
2026-04-09 15:10:57.908958 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 1 80 0 false
2026-04-09 15:10:57.911608 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 2 80 0 false
2026-04-09 15:10:57.914160 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 3 80 1 false
2026-04-09 15:10:57.918194 thermal_mgmt NOTICE: fan_service.lua(141): [HTY] fan 4 80 0 false
2026-04-09 15:10:57.918812 thermal_mgmt NOTICE: fan_service.lua(154): [HTY] task_count 0

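1 The CSR does not define a fans object, so the batch interface is not used
2 The fan model is not identified, so the per-fan setting is not executed

Could you add logging to the fan interface under adapter as well, to confirm whether the PWM set operations come from the thermal component or whether something else is also setting the speed?

The adapter log was printed once right after the BMC started, and the setting did not actually take effect because of the self:configurable check. There are no ongoing PWM-setting logs at runtime.

Could this be related to the cooling component?
One more odd thing: only the entry log of fan_obj_manager:start was printed; none of the later logs appeared.

The cooling component also sends fan speeds through the thermal component's interface. Wrap the code between lines 712 and 714 of start in pcall and see whether it crashes or prints an error.

pcall caught no exception. After commenting out self.ExpectedPWM = self.HardwarePWM, the rest of the flow continues, but execution then blocks again when fan_obj_manager:set_pwm runs self.Fan.methods.SetFanHardwarePWM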

self:listen('HardwarePWM', function()
    log:debug('Detect fan(%s) HardwarePWM:%s', self.FanId, self.HardwarePWM)
    self:update_pwm_percentage()
end)

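Check whether it is stuck inside this listener callback. Which component version are you using, and have any other change callbacks been added? Also, can you read and write the fan speed normally by manually calling the chip interface configured in the CSR?

Component version: thermal_mgmt/1.71.0@openubmc/stable
The HardwarePWM listener fires normally, and no other listeners were added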

2026-04-10 12:49:08.236572 thermal_mgmt DEBUG: fan_object.lua(929): Detect fan(3) HardwarePWM:203
2026-04-10 12:49:19.073624 thermal_mgmt DEBUG: fan_object.lua(929): Detect fan(3) HardwarePWM:147
2026-04-10 12:49:34.522234 thermal_mgmt DEBUG: fan_object.lua(929): Detect fan(2) HardwarePWM:147
2026-04-10 12:49:37.500254 thermal_mgmt DEBUG: fan_object.lua(929): Detect fan(3) HardwarePWM:115
2026-04-10 12:49:37.582060 thermal_mgmt DEBUG: fan_object.lua(929): Detect fan(3) HardwarePWM:37

Manual reads and writes work normally

~ ~ # busctl --user call bmc.kepler.hwproxy /bmc/kepler/Chip/Smc/Smc_ExpBoardSMC_0101 bmc.kepler.Chip.BlockIO Write a{ss}uay 0 402657281 1 120
~ ~ # busctl --user call bmc.kepler.hwproxy /bmc/kepler/Chip/Smc/Smc_ExpBoardSMC_0101 bmc.kepler.Chip.BlockIO Read a{ss}uu 0 402657537 1
ay 1 120
~ ~ # busctl --user call bmc.kepler.hwproxy /bmc/kepler/Chip/Smc/Smc_ExpBoardSMC_0101 bmc.kepler.Chip.BlockIO Write a{ss}uay 0 402657281 1 102
~ ~ # busctl --user call bmc.kepler.hwproxy /bmc/kepler/Chip/Smc/Smc_ExpBoardSMC_0101 bmc.kepler.Chip.BlockIO Read a{ss}uu 0 402657537 1
ay 1 102
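
The write-then-read check above can be automated by comparing the byte returned by the Read call with the byte that was written. The Python sketch below only parses reply text in the `ay <count> <bytes...>` form shown in this transcript and does not call busctl itself. `parse_busctl_byte_array` and `readback_matches` are hypothetical helper names.

```python
def parse_busctl_byte_array(reply: str):
    """Parse a busctl reply such as 'ay 1 120' into a list of ints."""
    fields = reply.split()
    if len(fields) < 2 or fields[0] != "ay":
        raise ValueError(f"not a byte-array reply: {reply!r}")
    length = int(fields[1])
    values = [int(field) for field in fields[2:]]
    if len(values) != length:
        raise ValueError(f"expected {length} bytes, got {len(values)}")
    return values


def readback_matches(written: int, reply: str) -> bool:
    """True when the single byte read back equals the byte written."""
    return parse_busctl_byte_array(reply) == [written]


if __name__ == "__main__":
    # Replies copied from the transcript above.
    print(readback_matches(120, "ay 1 120"))
    print(readback_matches(102, "ay 1 102"))
```

Both replies in the transcript match the written values, consistent with the report that manual access through the chip interface works.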