The 300I Pro and 300V Pro report identical four-tuple information under the OS; how should the BMC tell them apart?

Querying the PCIe card information with the npu-smi tool:


There are two NPU cards in total. The first has NPU chip ID 536 and the second has NPU chip ID 544.

544 is the 300V Pro and 536 is the 300I Pro.

Four-tuple information queried for the 300V Pro:

Four-tuple information queried for the 300I Pro:


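The two cards report identical four-tuple information. Is this a mistake in how I queried them, or is it by design? If the four-tuples really are identical, how can the two cards be distinguished on the BMC side?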

Distinguish them by boardid in the CSR. You can refer to how the XML does it in v2.

Do you mean that the two PCIe cards share a single sr file, and a boardid expression then decides whether the card is a V Pro or an I Pro?

Yes, isn't that exactly how v2 does it?

In the open-source v3, where in the code can this distinction be implemented?

@zybwh could you take a look and suggest an approach?

I took a look and there does not seem to be much that can be done. It can only be implemented with boardid plus an expression.

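Implementing it with boardid plus an expression currently leads to the situation described in "Atlas 300V pro NPU card information is displayed incompletely" (#5, from YMQMKK): the boardid is only obtained after the chip on the NPU card has fully loaded, so the first load is very slow.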

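Do you mean the web page is slow to load? Is that because the property is configured as a reference, so each read goes to the hardware?
Have you tried sync instead? The first load might be faster, but the value would be wrong.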

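Yes. The first time the Web > Others > PCIe Card page is opened, it keeps spinning and takes a long time to load. If the reference is changed to sync, Description, fruname and so on are always displayed incorrectly: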

            "Name": "<=/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",
            "BoardName": "<=/FruData_NPUCard.BoardProductName;<=/FruData_NPUCard.BoardProductName |> string.cmp($1, '') |> expr($1 ? 'IT21PDDA' : $2)",
            "Description": "<=/PCIeCard_1.Name |> string.format('%s PCI-E 1*16x (HHHL)', $1)",

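For readers who find the nested conditional hard to follow, here is a minimal Python sketch of the same mapping. The function names `card_name` and `card_description` are made up for illustration; the board IDs and strings are copied from the expressions above. This only mirrors the logic and is not how the CSR engine evaluates expressions.

```python
def card_name(board_id: int) -> str:
    """Mirror the Name expression: 182 and 175 are special-cased,
    any other board ID falls through to the 300I Pro name."""
    if board_id == 182:
        return "Atlas 300V Video Analysis Card"
    if board_id == 175:
        return "Atlas 300V Pro Video Analysis Card"
    return "Atlas 300I Pro Inference Card"


def card_description(name: str) -> str:
    """Mirror the Description expression, which formats the resolved name."""
    return "%s PCI-E 1*16x (HHHL)" % name
```

Any board ID other than 182 or 175 resolves to the 300I Pro name, so a value that has not yet been updated to 175 makes a 300V Pro card show up as a 300I Pro.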
Is boardid configured as an accessor? How about changing it to a scanner?

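The board id is read through the interface described in "Atlas central inference card out-of-band management interface description", right? Can it be read through an accessor or a scanner?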

        "NPUCard_1":{
            "Name": "<=/PCIeCard_1.Name",
            "CardDescription": "<=/PCIeCard_1.Description",
            "DeviceName": "<=/PCIeDevice_1.DeviceName",
            "RefChip":"#/Chip_Dmini",
            "RefEeprom":"#/Chip_Dmini_Elabel",
            "RefFrudata": "#/FruData_NPUCard",
            "Model": "Atlas_300I_Pro",
            "SlotNumber": "${Slot}",
            "PcbVersion": ".A",
            "BoardID": 171,
            "FirmwareVersion": "N/A",
            "CardPartNumber": "03028DFH",
            "SerialNumber":"<=/FruData_NPUCard.BoardSerialNumber"
        },

If the card is a 300V Pro, the boardid here is automatically updated to 175 some time after loading.

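The configuration still seems to have a problem. In theory, even if boardid is slow to read, that should not affect northbound queries; at worst the queried data would be wrong.
Only a property bound directly to an Accessor forces a hardware access on every northbound query. If the value is obtained through a component, only the cached value is read and no hardware read is triggered.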
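The difference between the two read paths can be pictured with a small Python model. This is purely conceptual: `SlowHardware`, `DirectProperty` and `CachedProperty` are hypothetical names, not openUBMC classes, and the real Accessor and component mechanisms may work differently in detail.

```python
import time


class SlowHardware:
    """Stand-in for a device value that is slow to read (hypothetical)."""

    def __init__(self, value: int, delay_s: float = 0.05):
        self.value = value
        self.delay_s = delay_s
        self.reads = 0  # how many times the hardware was actually accessed

    def read(self) -> int:
        self.reads += 1
        time.sleep(self.delay_s)
        return self.value


class DirectProperty:
    """Every query goes to the hardware, so every query pays the read delay."""

    def __init__(self, hw: SlowHardware):
        self.hw = hw

    def get(self) -> int:
        return self.hw.read()


class CachedProperty:
    """Queries return the last stored value; only refresh() touches the hardware."""

    def __init__(self, hw: SlowHardware, default: int):
        self.hw = hw
        self.cached = default

    def refresh(self) -> None:
        self.cached = self.hw.read()

    def get(self) -> int:
        return self.cached
```

In this model a query on `CachedProperty` returns immediately but may return a stale default until `refresh()` has run, which is the "fast but possibly wrong" behaviour described above.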

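A few issues need to be sorted out. First, all the expressions should point to the same data source, so that when that source updates, every value derived from it is updated as well. If the values depend on each other layer by layer, propagation of the update is affected.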

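Sync does not work because boardid itself does not emit a signal, so configuring sync has no effect. Everything has to be changed to a reference.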
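A rough Python model of why a sync stays stale when the source never signals a change. All class and method names here (`Source`, `SyncedCopy`, `Reference`, `set_silently`, `set_and_notify`) are invented for illustration and only approximate the behaviour discussed in this thread, not the actual framework code.

```python
from typing import Callable, List


class Source:
    """Holds a value; only set_and_notify() tells subscribers about a change."""

    def __init__(self, value: int):
        self.value = value
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        self._subscribers.append(callback)

    def set_silently(self, value: int) -> None:
        # The value changes, but no change signal is emitted.
        self.value = value

    def set_and_notify(self, value: int) -> None:
        self.value = value
        for callback in self._subscribers:
            callback(value)


class SyncedCopy:
    """Copies the value once, then updates only when a change signal arrives."""

    def __init__(self, source: Source):
        self.value = source.value
        source.subscribe(self._on_change)

    def _on_change(self, value: int) -> None:
        self.value = value


class Reference:
    """Reads through to the source on every access."""

    def __init__(self, source: Source):
        self._source = source

    @property
    def value(self) -> int:
        return self._source.value
```

If the source goes from 171 to 175 via `set_silently`, the `SyncedCopy` keeps 171 while the `Reference` returns 175, which matches the symptom that sync-based properties kept showing the wrong card.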

Yes, the first read spins for a long time. Later visits no longer spin and show the 300V Pro and 300I Pro right away.

It is the spinning that I do not understand. We probably need to find out which interface makes the overall request take so long.

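Scenario: flash the BMC package -> power on the OS -> the PCIe card status is initially (0/20) -> after the BIOS finishes loading it becomes (2/20). Clicking the web page shown below then spins for several minutes.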



Below are the logs that look relevant, captured with tail -f /var/log/app.log while the page was spinning:

2025-08-25 09:18:07.260773 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010106's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010106
2025-08-25 09:18:07.263409 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:08.634888 power_strategy ERROR: power_strategy_utils.lua(89): Unhealthy power monitor status
2025-08-25 09:18:12.262747 pcie_device NOTICE: biz_topo.lua(570): [BizTopo] configuration list length: 1, uid: 00000055040139250005 [repeated 4 times in 0s from 2025-08-25 09:12:52.059892 to 2025-08-25 09:12:52.404066][flush]
2025-08-25 09:18:13.204525 network_adapter NOTICE: device_manager.lua(434): get network adapter model for b=124 d=0 f=0. model=CPU Integration
2025-08-25 09:18:21.997324 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010104's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010104
2025-08-25 09:18:21.998796 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010104], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:22.048557 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010106's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010106
2025-08-25 09:18:22.050126 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:26.375258 thermal_mgmt WARNING: init.lua(574): service[unknown] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010104], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:29.688357 network_adapter NOTICE: device_manager.lua(434): get network adapter model for b=124 d=0 f=0. model=CPU Integration
2025-08-25 09:18:48.469877 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010104], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:18:48.530441 pcie_device ERROR: remote_rp.lua(174): Get object PCIeCard_1_0101010106's reference property Name failed, err: org.freedesktop.DBus.Error.UnknownObject: Unknown object path /bmc/kepler/Systems/1/PCIeDevices/PCIeCards/NPUCards/NPUCard_1_0101010106
2025-08-25 09:18:48.532361 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
2025-08-25 09:20:19.569438 event NOTICE: object_manage.lua(236): add objects completely, path: /bmc/kepler/ObjectGroup/0101010106, position: 0101010106, life cycle id: 1, took 0ms [repeated 35 times in 185s from 2025-08-25 09:13:53.320080 to 2025-08-25 09:16:58.492166][flush]
2025-08-25 09:20:22.408348 web_backend WARNING: init.lua(574): service[bmc.kepler.web_backend] request timeout: remote service[bmc.kepler.pcie_device], path[/bmc/kepler/Systems/1/PCIeDevices/PCIeCards/PCIeCard_1_0101010106], interface[bmc.kepler.Object.Properties], method[GetWithContext], used time[60s]
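To see at a glance which objects the timeouts cluster on, a small Python helper like the following could summarize the log. The regular expression is written against the line format shown in this excerpt only; `count_timeouts` and `TIMEOUT_RE` are hypothetical names, and other log formats are not handled.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   ... service[caller] request timeout: remote service[svc], path[/obj/path], ...
TIMEOUT_RE = re.compile(
    r"service\[(?P<caller>[^\]]+)\] request timeout: "
    r"remote service\[(?P<remote>[^\]]+)\], path\[(?P<path>[^\]]+)\]"
)


def count_timeouts(lines: Iterable[str]) -> Counter:
    """Count timeout warnings per (remote service, object path)."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match["remote"], match["path"])] += 1
    return counts
```

Fed the lines of /var/log/app.log, it returns a `Counter` keyed by remote service and object path. In the excerpt above all seven timeouts target bmc.kepler.pcie_device: four on PCIeCard_1_0101010106 and three on PCIeCard_1_0101010104.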

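Is this Lua something you added yourself? It looks like pcie_device is stuck. Is it doing some synchronous operation? It also cannot find the two NPUCard objects.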

I did not add this Lua. So far the reference and sync syntax is only used in the vpd. The csr is as follows:
14140130_19e5d500_02000100.sr:

        "PCIeDevice_1": {
            "Segment": 1,
            "DeviceName": "PCIe Card $ (NPU)",
            "DiagnosticFault": 0,
            "PredictiveFault": 0,
            "FunctionClass": 9,
            "LinkSpeedReduced": 0,
            "CorrectableError": 0,
            "UncorrectableError": 0,
            "FatalError": 0,
            "Position": "",
            "Container": "${Container}",
            "GroupPosition": "PCIeDevice_${GroupPosition}",
            "DeviceType": 8,
            "PCIeDeviceType": "SingleFunction",
            "SlotType": "FullLength",
            "FunctionProtocol": "PCIe",
            "FunctionType": "Physical"
        },
        "FruData_NPUCard": {
            "FruId": 1,
            "StorageType": "MCU",
            "FruDev": "#/Chip_Dmini_Elabel",
            "FruName":"<=/PCIeCard_1.Name"
        },
        "Fru_NPUCard": {
            "PcbVersion": ".A",
            "FruId": 1,
            "PowerState": 1,
            "FruName": "<=/PCIeCard_1.Name",
            "Health": 0,
            "EepStatus": 1,
            "Type": 8,
            "FruDataId": "#/FruData_NPUCard"
        },
        "NPUCard_1":{
            "Name": "<=/PCIeCard_1.Name",
            "CardDescription": "<=/PCIeCard_1.Description",
            "DeviceName": "<=/PCIeDevice_1.DeviceName",
            "RefChip":"#/Chip_Dmini",
            "RefEeprom":"#/Chip_Dmini_Elabel",
            "RefFrudata": "#/FruData_NPUCard",
            "Model": "Atlas_300I_Pro",
            "SlotNumber": "${Slot}",
            "PcbVersion": ".A",
            "BoardID": 171,
            "FirmwareVersion": "N/A",
            "CardPartNumber": "03028DFH",
            "SerialNumber":"<=/FruData_NPUCard.BoardSerialNumber"
        },
        "PCIeCard_1": {
            "DeviceName": "<=/PCIeDevice_1.DeviceName",
            "SlotID": "<=/PCIeDevice_1.SlotID",
            "NodeID": "<=/PCIeDevice_1.SlotID |> string.format('PCIeCard%s',$1)",
            "Health": "<=/Component_PCIeCard.Health",
            "Name": "#/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",
            "BoardName": "<=/FruData_NPUCard.BoardProductName;<=/FruData_NPUCard.BoardProductName |> string.cmp($1, '') |> expr($1 ? 'IT21PDDA' : $2)",
            "Description": "#/PCIeCard_1.Name |> string.format('%s PCI-E 1*16x (HHHL)', $1)",
            "FunctionClass": 9,
            "VendorID": 6629,
            "DeviceID": 54528,
            "SubVendorID": 512,
            "SubDeviceID": 256,
            "Position": "<=/PCIeDevice_1.Position",
            "LaneOwner": "<=/PCIeDevice_1.SocketID",
            "FirmwareVersion": "#/NPUCard_1.FirmwareVersion",
            "Manufacturer": "Huawei",
            "PartNumber": "03028DFH",
            "MaxFrameLen": 64,
            "LinkSpeed": "N/A",
            "LinkSpeedCapability": "N/A",
            "PcbVersion": "#/NPUCard_1.PcbVersion",
            "BoardID": "#/NPUCard_1.BoardID",
            "DevBus": "<=/PCIeDevice_1.DevBus",
            "DevDevice": "<=/PCIeDevice_1.DevDevice",
            "SerialNumber": "<=/FruData_NPUCard.BoardSerialNumber",
            "DevFunction": "<=/PCIeDevice_1.DevFunction"
        },

If you remove this expression, does the page still spin?

        "Name": "#/NPUCard_1.BoardID |> expr( $1==182 ? 'Atlas 300V Video Analysis Card' : $1==175 ? 'Atlas 300V Pro Video Analysis Card' : 'Atlas 300I Pro Inference Card')",

Changed to

        "Name": "Atlas 300I Pro Inference Card",

It no longer spins, but the displayed content is wrong.