Background
Symptom
During OS reboot testing, ipmitool sdr list intermittently fails to read the NIC temperature sensors (FR: 1/200):
PCIe NIC3 Temp | ns | No Reading
PCIe NIC1 Temp | ns | No Reading
Topology
Hisport_12
├─Eeprom_3_9_010101
└─Pca9545_PCA9545_01010109
   ├─Channel_2
   │  └─Chip_RaidChip_0101010901
   └─Channel_3
      ├─Pca9555_IO_01010109
      └─Chip_MCU1_01010109
         └─Channel_0
            └─Eeprom_IEU_01010109
Hisport_18
├─Eeprom_3_3_010101
└─Pca9545_PCA9545_01010103
   ├─Channel_0
   │  └─Chip_TempChip_0101010301
   ├─Channel_1
   │  └─Chip_I2c_0101010302
   ├─Channel_2
   │  └─Chip_TempChip_0101010303
   └─Channel_3
      ├─Pca9555_IO_01010103
      ├─Chip_MCU1_01010103
      │  └─Channel_0
      │     └─Eeprom_IEU_01010103
      └─Lm75_LM75_01010103
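Preliminary analysis
Log analysis
app.log shows severe timeouts on plugin requests to the chips of both RAID cards (Chip_RaidChip_0101010901 and Chip_I2c_0101010302), with used time of roughly 90 to 116 s: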
2026-04-21 21:49:04.342799 storage WARNING: init.lua(767): service[bmc.kepler.storage] request timeout: remote service[bmc.kepler.hwproxy], path[/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010901], interface[bmc.kepler.Chip.BlockIO], method[PluginRequestEx], used time[116s]
2026-04-21 21:49:04.662633 storage WARNING: init.lua(767): service[bmc.kepler.storage] request timeout: remote service[bmc.kepler.hwproxy], path[/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010901], interface[bmc.kepler.Chip.BlockIO], method[PluginRequestEx], used time[116s]
2026-04-21 21:49:04.671878 storage WARNING: init.lua(767): service[bmc.kepler.storage] request timeout: remote service[bmc.kepler.hwproxy], path[/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010901], interface[bmc.kepler.Chip.BlockIO], method[PluginRequestEx], used time[114s]
2026-04-21 21:53:54.556930 storage WARNING: init.lua(767): service[bmc.kepler.storage] request timeout: remote service[bmc.kepler.hwproxy], path[/bmc/kepler/Chip/Complex/Chip_I2c_0101010302], interface[bmc.kepler.Chip.BlockIO], method[PluginRequestEx], used time[91s]
hw_stream.log also shows many cases where the Riser Pca9545 fails to switch channels and where the Scanner scan period is extended:
2026-04-22 10:13:20.671196 storage_plugin ERROR: scan_context.lua(123): bus: Hisport_18, chip: Pca9545_PCA9545_01010103 open channel(0) failed, error: ./opt/bmc/apps/hwproxy/lualib/chip.lua:656: ./opt/bmc/apps/hwproxy/lualib/stream/hisport.lua:94: response error, ioctl(HISPORT_CMD_WRITE) failed: Unknown error 274[times:3]
2026-04-22 10:13:20.690183 storage_plugin ERROR: scan_context.lua(123): bus: Hisport_18, chip: Pca9545_PCA9545_01010103 open channel(2) failed, error: ./opt/bmc/apps/hwproxy/lualib/chip.lua:656: ./opt/bmc/apps/hwproxy/lualib/stream/hisport.lua:94: response error, ioctl(HISPORT_CMD_WRITE) failed: Unknown error 274[times:2]
2026-04-22 10:14:04.253037 storage_plugin ERROR: scan_context.lua(123): bus: Hisport_18, chip: Pca9545_PCA9545_01010103 open channel(1) failed, error: ./opt/bmc/apps/hwproxy/lualib/chip.lua:656: ./opt/bmc/apps/hwproxy/lualib/stream/hisport.lua:94: response error, ioctl(HISPORT_CMD_WRITE) failed: Unknown error 306[times:1]
2026-04-22 09:15:30.105787 storage_plugin INFO: scanner.lua(408): chip: Chip_TempChip_0101010301 change scan period of scanner: Scanner_Temp_0101010301 from 2000 ms to 4000 ms
2026-04-22 09:15:34.116699 storage_plugin INFO: scanner.lua(408): chip: Chip_TempChip_0101010301 change scan period of scanner: Scanner_Temp_0101010301 from 4000 ms to 8000 ms
2026-04-22 09:15:42.194168 storage_plugin INFO: scanner.lua(408): chip: Chip_TempChip_0101010301 change scan period of scanner: Scanner_Temp_0101010301 from 8000 ms to 16000 ms
2026-04-22 09:15:58.209892 storage_plugin INFO: scanner.lua(408): chip: Chip_TempChip_0101010301 change scan period of scanner: Scanner_Temp_0101010301 from 16000 ms to 32000 ms
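The four Scanner lines above show the period doubling each time (2000, 4000, 8000, 16000, 32000 ms), and the gaps between the log timestamps (about 4 s, 8 s, 16 s) match the newly set periods. A minimal Python sketch of that doubling pattern follows. It is only an illustration, not the actual scanner.lua logic: the function names are hypothetical, and the 32000 ms cap is an assumption, since the log only shows the period reaching that value.

```python
from typing import List


def next_scan_period(current_ms: int, max_ms: int) -> int:
    """Double the scan period after a failed read, never exceeding max_ms.

    max_ms is an assumed upper bound; the real limit is not visible in the log.
    """
    return min(current_ms * 2, max_ms)


def backoff_sequence(start_ms: int, max_ms: int, failures: int) -> List[int]:
    """Return the periods seen after each of `failures` consecutive failed reads."""
    periods = [start_ms]
    for _ in range(failures):
        periods.append(next_scan_period(periods[-1], max_ms))
    return periods


if __name__ == "__main__":
    # Four consecutive failures starting from the 2000 ms period in the log.
    print(backoff_sequence(2000, 32000, 4))
```

With a backoff like this, a sensor behind a congested channel is polled less and less often while failures continue.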
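Overall, the Riser's I2C link appears to be severely congested.
Help requested
The approach I have in mind: configure a timeout for PluginRequestEx so that tasks that have badly exceeded it are dropped and release their hold on the link promptly.
Is there currently an interface that supports configuring this parameter?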
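To make the proposed approach concrete, here is a minimal Python sketch of deadline-based dropping on a serialized bus queue. It does not use any hwproxy or PluginRequestEx API: BusTask, drain, and the time values are hypothetical, and the bus is simulated by advancing a clock.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class BusTask:
    name: str        # hypothetical label for the request
    deadline: float  # absolute time after which the caller no longer needs the result
    cost: float      # simulated time the transfer occupies the bus


def drain(queue: Deque[BusTask], now: float) -> Tuple[List[str], List[str]]:
    """Process queued tasks in order, skipping any whose deadline has passed.

    Expired tasks are dropped before they touch the bus, so they add no
    further delay to the tasks queued behind them.
    """
    done: List[str] = []
    dropped: List[str] = []
    while queue:
        task = queue.popleft()
        if now >= task.deadline:
            dropped.append(task.name)  # too late: do not occupy the link
            continue
        now += task.cost               # simulate the bus being held
        done.append(task.name)
    return done, dropped


if __name__ == "__main__":
    q = deque([
        BusTask("raid_read", deadline=5.0, cost=4.0),
        BusTask("stale_raid_read", deadline=3.0, cost=4.0),
        BusTask("nic_temp", deadline=10.0, cost=1.0),
    ])
    print(drain(q, now=0.0))
```

In this sketch the stale request is discarded without being executed, so the short temperature read behind it runs at t = 4 instead of t = 8. Whether hwproxy can apply such a deadline to PluginRequestEx is exactly the open question above.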
