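Query the PCIe card information with the npu-smi tool:
There are two NPU cards in total: the first has NPU chip ID 536 and the second has NPU chip ID 544.
Chip 544 is the 300V Pro and chip 536 is the 300I Pro.
Four-tuple information queried for the 300V Pro:
Four-tuple information queried for the 300I Pro:
The four-tuples (VendorID, DeviceID, SubVendorID, SubDeviceID) of the two cards are identical. Is this a mistake on my side or is it by design? If the four-tuples are identical, how can the BMC side tell the two cards apart?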
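Distinguish them by BoardID in the CSR; you can follow the way the v2 XML does it.
So you mean the two PCIe cards share the same SR file, and a BoardID expression decides whether it is the V Pro or the I Pro?
Yes, isn't that exactly how v2 does it?
In the open-source v3, where in the code is this distinction implemented?
@zybwh could you take a look and suggest an approach?
I took a look, and there isn't much that can be done. It can only be implemented with BoardID plus an expression.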
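Implementing it with BoardID plus an expression currently causes the following problem: Atlas 300V Pro NPU card information displayed incompletely (post #5 by YMQMKK). The BoardID is only obtained, synchronously, after the NPU card's chip has fully loaded, so the first load is very slow.
You mean the web page loads slowly? Is it because it is configured as a reference, which causes one hardware read?
What about trying synchronization? The first load might get faster, but the value may be wrong.
Yes, on the web UI page Other > PCIe Cards, the first load keeps spinning and takes a long time to succeed. If the reference is changed to synchronization, Description, FruName and the other fields always display incorrectly.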
"Name": "<=/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",
"BoardName": "<=/FruData_NPUCard.BoardProductName;<=/FruData_NPUCard.BoardProductName |> string.cmp($1, '') |> expr($1 ? 'IT21PDDA' : $2)",
"Description": "<=/PCIeCard_1.Name |> string.format('%s PCI-E 1*16x (HHHL)', $1)",
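To make the intent of the Name and Description expressions above concrete, here is a minimal Python model of what they compute. It is only an illustration: the function names are hypothetical, and it assumes the CSR `expr(... ? ... : ...)` ternary behaves like a Python conditional and `string.format` like Python's `%` formatting.

```python
# Hypothetical Python model of the PCIeCard_1 Name/Description expressions.
# Assumption: expr(a ? b : c) ~ Python conditional, string.format ~ % formatting.


def card_name(board_id: int) -> str:
    """Mirror of: BoardID |> expr($1==182 ? ... : $1==175 ? ... : ...)."""
    if board_id == 182:
        return "Atlas 300V Video Analysis Card"
    if board_id == 175:
        return "Atlas 300V Pro Video Analysis Card"
    # Every other value, including the static default 171, falls through here.
    return "Atlas 300I Pro Inference Card"


def card_description(name: str) -> str:
    """Mirror of: Name |> string.format('%s PCI-E 1*16x (HHHL)', $1)."""
    return "%s PCI-E 1*16x (HHHL)" % name


for bid in (171, 175, 182):
    name = card_name(bid)
    print(bid, name, "|", card_description(name))
```

With the static default `BoardID: 171` from the CSR, a card is reported as the 300I Pro until the real BoardID arrives (175 for a 300V Pro, per this thread), which is why a 300V Pro can briefly show the wrong name.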
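Is BoardID configured via an accessor? Try switching it to a scanner.
Isn't BoardID read through the interface described in the Atlas Central Inference Card Out-of-Band Management Interface Description? Can it be read through an accessor or a scanner at all?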
"NPUCard_1":{
"Name": "<=/PCIeCard_1.Name",
"CardDescription": "<=/PCIeCard_1.Description",
"DeviceName": "<=/PCIeDevice_1.DeviceName",
"RefChip":"#/Chip_Dmini",
"RefEeprom":"#/Chip_Dmini_Elabel",
"RefFrudata": "#/FruData_NPUCard",
"Model": "Atlas_300I_Pro",
"SlotNumber": "${Slot}",
"PcbVersion": ".A",
"BoardID": 171,
"FirmwareVersion": "N/A",
"CardPartNumber": "03028DFH",
"SerialNumber":"<=/FruData_NPUCard.BoardSerialNumber"
},
For a 300V Pro card, the BoardID here is automatically updated to 175 after loading for a while.
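The configuration still looks somewhat wrong. In theory, even if BoardID is slow to read, it should not block northbound queries; at worst the queried data would be wrong.
Only properties bound directly to an Accessor force a hardware access on every northbound query. Values obtained through a component are read from the cache and do not trigger a hardware read.
A few issues need fixing. First, point all expressions at the same data source, so that when that source updates, every other value updates as well. If values depend on each other layer by layer, propagation suffers.
Synchronization doesn't work because BoardID itself doesn't emit a signal, so configuring synchronization has no effect; all of them have to be changed to references.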
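The difference between reference and synchronization described above can be illustrated with a toy Python model. This is not the openUBMC implementation: the `Source`, `RefProperty` and `SyncProperty` classes are hypothetical, and the model assumes a reference re-evaluates from the source's current cached value on each read, while a synchronized property only refreshes when the source emits a change signal.

```python
from typing import Callable, List


class Source:
    """A property whose value may change; optionally emits change signals."""

    def __init__(self, value: int, emits_signal: bool) -> None:
        self.value = value
        self.emits_signal = emits_signal
        self._listeners: List[Callable[[int], None]] = []

    def subscribe(self, cb: Callable[[int], None]) -> None:
        self._listeners.append(cb)

    def set(self, value: int) -> None:
        self.value = value
        # Only a signalling source notifies its subscribers.
        if self.emits_signal:
            for cb in self._listeners:
                cb(value)


class RefProperty:
    """Reference: re-derives from the source's current value on every read."""

    def __init__(self, src: Source, fn: Callable[[int], str]) -> None:
        self.src, self.fn = src, fn

    def get(self) -> str:
        return self.fn(self.src.value)


class SyncProperty:
    """Synchronization: caches a derived value, refreshed only on a signal."""

    def __init__(self, src: Source, fn: Callable[[int], str]) -> None:
        self.fn = fn
        self.value = fn(src.value)  # evaluated once at creation time
        src.subscribe(lambda v: setattr(self, "value", self.fn(v)))

    def get(self) -> str:
        return self.value


def name_of(board_id: int) -> str:
    return "300V Pro" if board_id == 175 else "300I Pro"


board_id = Source(171, emits_signal=False)  # static default, no signal
ref = RefProperty(board_id, name_of)
sync = SyncProperty(board_id, name_of)
board_id.set(175)  # the real BoardID arrives later
print(ref.get(), "|", sync.get())
```

In this model the reference picks up 175 while the synchronized copy stays at the stale default, matching the symptom described above. Pointing every derived property directly at `board_id`, rather than chaining one derived property off another, keeps each one a single hop from the value that actually changes.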
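Yes, the first read spins for a long time; on later visits it no longer spins and the 300V Pro and 300I Pro are displayed immediately.
The spinning is what I don't understand. We probably need to find out which interface is stretching the overall time.
When it happens: flash the BMC package → power on the OS → the PCIe card status starts at (0/20) → after the BIOS loads successfully it becomes (2/20); opening the PCIe card web page then spins for several minutes.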
2025-08-25 09:18:07.260773 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010106's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010106
2025-08-25 09:18:07.263409 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:08.634888 power_strategy ERROR: power_strategy_utils.lua(89): Unhealthy power monitor status
2025-08-25 09:18:12.262747 pcie_device NOTICE: biz_topo.lua(570): [BizTopo] configuration list length: 1, uid: 00000055040139250005 [repeated 4 times in 0s from 2025-08-25 09:12:52.059892 to 2025-08-25 09:12:52.404066][flush]
2025-08-25 09:18:13.204525 network_adapter NOTICE: device_manager.lua(434): get network adapter model for b=124 d=0 f=0. model=CPU Integration
2025-08-25 09:18:21.997324 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010104's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010104
2025-08-25 09:18:21.998796 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010104], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:22.048557 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010106's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010106
2025-08-25 09:18:22.050126 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:26.375258 thermal_mgmt WARNING: init.lua(574): service[unknown] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010104], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:29.688357 network_adapter NOTICE: device_manager.lua(434): get network adapter model for b=124 d=0 f=0. model=CPU Integration
2025-08-25 09:18:48.469877 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010104], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:48.530441 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010106's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010106
2025-08-25 09:18:48.532361 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:20:19.569438 event NOTICE: object_manage.lua(236): add objects completely, path: /bmc/kepler/ObjectGroup/0101010106, position: 0101010106, life cycle id: 1, took 0ms [repeated 35 times in 185s from 2025-08-25 09:13:53.320080 to 2025-08-25 09:16:58.492166][flush]
2025-08-25 09:20:22.408348 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
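To see which remote calls dominate the delay, the log lines above can be tallied per object path. A minimal Python sketch, assuming the log format shown above (one entry per line, with `path[...]` and `used time[...s]` in the timeout warnings and `Unknown object path ...` in the pcie_device errors); `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches the web_backend / thermal_mgmt "request timeout" warnings.
TIMEOUT_RE = re.compile(
    r"request timeout: remote service\[(?P<service>[^\]]+)\], "
    r"path\[(?P<path>[^\]]+)\].*used time\[(?P<secs>\d+)s\]"
)
# Matches the pcie_device "Unknown object path" errors.
UNKNOWN_RE = re.compile(r"Unknown object path (?P<path>\S+)")


def summarize(lines: Iterable[str]) -> Tuple[Dict[str, List[int]], Dict[str, int]]:
    """Return ({timed-out path: [seconds, ...]}, {missing object path: count})."""
    timeouts: Dict[str, List[int]] = defaultdict(list)
    missing: Dict[str, int] = defaultdict(int)
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group("path")].append(int(m.group("secs")))
            continue
        m = UNKNOWN_RE.search(line)
        if m:
            missing[m.group("path")] += 1
    return dict(timeouts), dict(missing)


def print_report(timeouts: Dict[str, List[int]], missing: Dict[str, int]) -> None:
    for path, secs in sorted(timeouts.items()):
        print(f"{path}: {len(secs)} timeouts, {sum(secs)}s total")
    for path, n in sorted(missing.items()):
        print(f"missing {path}: {n}x")
```

Usage, with a hypothetical log file name: `with open("app.log") as f: print_report(*summarize(f))`. On the excerpt above it would show every 60 s timeout on `PCIeCard_1_0101010104`/`..._0101010106` paired with a missing `NPUCard_1_...` object, which points at the cross-object reference rather than a hardware read.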
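Did you add this Lua yourself? It looks like pcie_device is stuck. Did you add some synchronous operation? Also, the two NPUCard objects cannot be found.
I didn't add this Lua. Currently the reference and synchronization syntax is only used in the VPD. The CSR is as follows: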
14140130_19e5d500_02000100.sr:
"PCIeDevice_1": {
"Segment": 1,
"DeviceName": "PCIe Card $ (NPU)",
"DiagnosticFault": 0,
"PredictiveFault": 0,
"FunctionClass": 9,
"LinkSpeedReduced": 0,
"CorrectableError": 0,
"UncorrectableError": 0,
"FatalError": 0,
"Position": "",
"Container": "${Container}",
"GroupPosition": "PCIeDevice_${GroupPosition}",
"DeviceType": 8,
"PCIeDeviceType": "SingleFunction",
"SlotType": "FullLength",
"FunctionProtocol": "PCIe",
"FunctionType": "Physical"
},
"FruData_NPUCard": {
"FruId": 1,
"StorageType": "MCU",
"FruDev": "#/Chip_Dmini_Elabel",
"FruName":"<=/PCIeCard_1.Name"
},
"Fru_NPUCard": {
"PcbVersion": ".A",
"FruId": 1,
"PowerState": 1,
"FruName": "<=/PCIeCard_1.Name",
"Health": 0,
"EepStatus": 1,
"Type": 8,
"FruDataId": "#/FruData_NPUCard"
},
"NPUCard_1":{
"Name": "<=/PCIeCard_1.Name",
"CardDescription": "<=/PCIeCard_1.Description",
"DeviceName": "<=/PCIeDevice_1.DeviceName",
"RefChip":"#/Chip_Dmini",
"RefEeprom":"#/Chip_Dmini_Elabel",
"RefFrudata": "#/FruData_NPUCard",
"Model": "Atlas_300I_Pro",
"SlotNumber": "${Slot}",
"PcbVersion": ".A",
"BoardID": 171,
"FirmwareVersion": "N/A",
"CardPartNumber": "03028DFH",
"SerialNumber":"<=/FruData_NPUCard.BoardSerialNumber"
},
"PCIeCard_1": {
"DeviceName": "<=/PCIeDevice_1.DeviceName",
"SlotID": "<=/PCIeDevice_1.SlotID",
"NodeID": "<=/PCIeDevice_1.SlotID |> string.format('PCIeCard%s',$1)",
"Health": "<=/Component_PCIeCard.Health",
"Name": "#/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",
"BoardName": "<=/FruData_NPUCard.BoardProductName;<=/FruData_NPUCard.BoardProductName |> string.cmp($1, '') |> expr($1 ? 'IT21PDDA' : $2)",
"Description": "#/PCIeCard_1.Name |> string.format('%s PCI-E 1*16x (HHHL)', $1)",
"FunctionClass": 9,
"VendorID": 6629,
"DeviceID": 54528,
"SubVendorID": 512,
"SubDeviceID": 256,
"Position": "<=/PCIeDevice_1.Position",
"LaneOwner": "<=/PCIeDevice_1.SocketID",
"FirmwareVersion": "#/NPUCard_1.FirmwareVersion",
"Manufacturer": "Huawei",
"PartNumber": "03028DFH",
"MaxFrameLen": 64,
"LinkSpeed": "N/A",
"LinkSpeedCapability": "N/A",
"PcbVersion": "#/NPUCard_1.PcbVersion",
"BoardID": "#/NPUCard_1.BoardID",
"DevBus": "<=/PCIeDevice_1.DevBus",
"DevDevice": "<=/PCIeDevice_1.DevDevice",
"SerialNumber": "<=/FruData_NPUCard.BoardSerialNumber",
"DevFunction": "<=/PCIeDevice_1.DevFunction"
},
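One way to check the advice about pointing expressions at a single data source is to list, for each property, which object properties its value string depends on and how many hops separate it from `NPUCard_1.BoardID`. A minimal Python sketch, assuming (as the two variants in this thread suggest) that `#/Obj.Prop` marks a reference and `<=/Obj.Prop` a synchronization, both meaning "depends on `Obj.Prop`"; only a few properties from the CSR above are copied in, and `deps`/`hops_to` are hypothetical helpers.

```python
import re
from collections import deque
from typing import Dict, List, Optional

# "#/Obj.Prop" (reference) or "<=/Obj.Prop" (synchronization) inside a value string.
DEP_RE = re.compile(r"(?:#|<=)/(\w+)\.(\w+)")

# A subset of the CSR above, flattened to "Object.Property": value.
CSR: Dict[str, object] = {
    "NPUCard_1.BoardID": 171,
    "NPUCard_1.Name": "<=/PCIeCard_1.Name",
    "PCIeCard_1.Name": "#/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",
    "PCIeCard_1.Description": "#/PCIeCard_1.Name |> string.format('%s PCI-E 1*16x (HHHL)', $1)",
    "FruData_NPUCard.FruName": "<=/PCIeCard_1.Name",
    "Fru_NPUCard.FruName": "<=/PCIeCard_1.Name",
}


def deps(value: object) -> List[str]:
    """Return the 'Object.Property' names a CSR value string depends on."""
    if not isinstance(value, str):
        return []  # literals such as BoardID: 171 have no dependencies
    return [f"{obj}.{prop}" for obj, prop in DEP_RE.findall(value)]


def hops_to(target: str, start: str, csr: Dict[str, object]) -> Optional[int]:
    """Breadth-first search: fewest dependency hops from start to target."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        prop, dist = queue.popleft()
        if prop == target:
            return dist
        for dep in deps(csr.get(prop)):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, dist + 1))
    return None  # target not reachable from start


for prop in sorted(CSR):
    print(prop, "->", deps(CSR[prop]),
          "hops to BoardID:", hops_to("NPUCard_1.BoardID", prop, CSR))
```

Properties two hops away (Description, both FruName fields, `NPUCard_1.Name`) can only update after `PCIeCard_1.Name` has updated first; having each of them read `NPUCard_1.BoardID` directly would make every one a single hop, which is what the single-data-source suggestion amounts to.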
If you delete this expression, does it still spin?
"Name": "#/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",
Changed to:
"Name": "Atlas 300I Pro Inference Card",
It no longer spins, but the displayed content is wrong.