How should the CSR be configured for an NPU card that does not go through a Riser card?

Problem description

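I need to adapt two NPU cards, the Atlas 300I Duo and the Atlas 300I A2. Neither goes through a Riser card; both are plugged directly into PCIe slots.
The official PCIe configuration document has no instructions for a direct-attach setup: https://www.openubmc.cn/docs/zh/development/develop_guide/feature_development/pcie_configuration.html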

How should this be configured? My current configuration is below.

1. Added two Connectors under Hisport_5 on the BCU

"Hisport_5": {
            "Chips":[
            ],
            "Connectors": [
                "Connector_PCIe_1",
                "Connector_PCIe_2"
            ]
        },

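2. Since there is no Riser card, I skipped the Riser, UBCDD and IEU configuration and directly configured BusinessConnector_1 and BusinessConnector_2 to associate the SerDes lanes with the specific PCIe cards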

        "BusinessConnector_1": {
            "Name": "Down_1",
            "Direction": "Downstream",
            "Slot": 1,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_1_8","ID": 8,"Offset": 0,"Width": 8}
            ],
            "RefMgmtConnector": "#/Connector_PCIE_1",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_1"
        },
        "BusinessConnector_2": {
            "Name": "Down_2",
            "Direction": "Downstream",
            "Slot": 1,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_1_7","ID": 7,"Offset": 0,"Width": 4}, 
                {"Name": "SerDes_1_10","ID": 10,"Offset": 0,"Width": 4}
            ],
            "RefMgmtConnector": "#/Connector_PCIE_2",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_2"
        },
        "PcieAddrInfo_1": {
            "Location": 1,
            "ComponentType": 8,
            "ContainerSlot": "${Slot}",
            "ContainerUID": "00000001040302074260",
            "ContainerUnitType": "IEU",
            "GroupPosition": "PcieAddrInfo_1_${GroupPosition}"
        },
        "PcieAddrInfo_2": {
            "Location": 2,
            "ComponentType": 8,
            "ContainerSlot": "${Slot}",
            "ContainerUID": "00000001040302074260",
            "ContainerUnitType": "IEU",
            "GroupPosition": "PcieAddrInfo_2_${GroupPosition}"
        },
       "Connector_PCIe_1": {
            "Bom": "14140130", 
            "Id": "19e5d500", 
            "AuxId": "02000110",
            "Slot": 1, 
            "Position": 2,
            "Presence": 1,
            "Buses": [
                "Hisport_5"
            ],
            "SystemId": 1,
            "SilkText": "{Slot}",
            "IdentifyMode": 2,
            "Container": "Component_RiserCard",
            "Type": "PCIe" 
        },
        "Connector_PCIe_2": {
            "Bom": "14140130", 
            "Id": "19e5d802", 
            "AuxId": "19e54000",
            "Slot": 2, 
            "Position": 3,
            "Presence": 1,
            "Buses": [
                "Hisport_5"
            ],
            "SystemId": 1,
            "SilkText": "{Slot}",
            "IdentifyMode": 2,
            "Container": "Component_RiserCard",
            "Type": "PCIe" 
        },

3. Declared the CSR files of the two cards in the profile

4. Checked with mdbctl; both NPU cards were loaded successfully

% lsobj Chip
Chip_Dmini_01010102
Chip_Dmini_01010103
Chip_Dmini_Elabel_01010102
Chip_Dmini_Elabel_01010103

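I have a few questions:
1. What is the detailed loading flow for a PCIe card plugged directly into a PCIe slot?
2. Is there anything wrong with the configuration above, which skips Riser, UBCDD and IEU? For example, are any fields in BusinessConnector or PcieAddrInfo wrong?
3. The development board does not have the NPU cards installed at the moment. As I understand it, with this configuration the cards should still show up, just without detailed information, but nothing appears on the web UI.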

Any help would be appreciated, thanks.

Environment

OpenUBMC2503
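
One thing in the configuration above can be checked mechanically: each `RefMgmtConnector` points at `#/Connector_PCIE_n`, while the objects are defined as `Connector_PCIe_n`. JSON keys are case-sensitive, so unless the loader normalizes case (not verified here), those references may not resolve. Below is a minimal Python sketch, assuming the CSR objects are available as a plain dict keyed by object name and that a `#/Name` string refers to such a key. `find_dangling_refs`, `closest_name` and `duplicate_slots` are illustrative helpers, not part of openUBMC.

```python
def find_dangling_refs(objects):
    """Return (owner, field, target) for each "#/Name" string whose target is not defined."""
    dangling = []

    def walk(owner, field, value):
        if isinstance(value, str) and value.startswith("#/"):
            target = value[2:]
            if target not in objects:
                dangling.append((owner, field, target))
        elif isinstance(value, dict):
            for key, item in value.items():
                walk(owner, key, item)
        elif isinstance(value, list):
            for item in value:
                walk(owner, field, item)

    for name, body in objects.items():
        walk(name, None, body)
    return dangling


def closest_name(target, objects):
    """Return the defined object name that matches `target` ignoring case, if any."""
    lowered = {name.lower(): name for name in objects}
    return lowered.get(target.lower())


def duplicate_slots(objects, prefix="BusinessConnector_"):
    """Group objects with the given name prefix by Slot and keep only shared slots."""
    seen = {}
    for name, body in objects.items():
        if name.startswith(prefix) and "Slot" in body:
            seen.setdefault(body["Slot"], []).append(name)
    return {slot: names for slot, names in seen.items() if len(names) > 1}


# Trimmed copy of the objects posted above (only the fields used by the checks).
objects = {
    "BusinessConnector_1": {"Slot": 1, "RefMgmtConnector": "#/Connector_PCIE_1",
                            "RefPCIeAddrInfo": "#/PcieAddrInfo_1"},
    "BusinessConnector_2": {"Slot": 1, "RefMgmtConnector": "#/Connector_PCIE_2",
                            "RefPCIeAddrInfo": "#/PcieAddrInfo_2"},
    "PcieAddrInfo_1": {"Location": 1},
    "PcieAddrInfo_2": {"Location": 2},
    "Connector_PCIe_1": {"Slot": 1},
    "Connector_PCIe_2": {"Slot": 2},
}

for owner, field, target in find_dangling_refs(objects):
    print(owner, field, target, "closest defined name:", closest_name(target, objects))
print("shared slots:", duplicate_slots(objects))
```

On this trimmed data the check flags both `RefMgmtConnector` targets and shows that both BusinessConnectors use Slot 1, while the two Connectors use Slot 1 and Slot 2. Whether a shared Slot is actually a problem depends on how the component uses that field.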

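Connectors on the BCU are not established through topology, so the PortID, SlotID, Bus, Device and Function must be preconfigured in the PcieAddrInfo. The bios component generates the silkscreen from these values. Once the in-band side has the silkscreen, it reports the NPU's BDF during boot, and pcie_device loads the device based on that report.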

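As I understand it now, a direct-attach NPU configuration needs the following:
1. Configure the Connector so that it points to the CSR of the corresponding NPU card
2. Configure the PcieAddrInfo for the NPU card, in particular ComponentType, PortID, SlotID, Bus, Device and Function
3. Configure the BusinessConnector to bind the Connector (which is associated with the CSR) to the PcieAddrInfo
4. The bios component generates the silkscreen file silkconfig via get_pcie_silk_config and prepare_silk_config, and pcie_device then loads the device according to silkconfig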

Question 1: Is this flow correct? Please correct anything that is wrong.

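Question 2: My PcieAddrInfo configuration does not take effect. I configured three PcieAddrInfo objects; neither of the two NPU objects loaded, but the SAS PcieAddrInfo did. The configuration is below. What is wrong with the NPU PcieAddrInfo objects?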

        "BusinessConnector_1": {
            "Name": "Down_1",
            "Direction": "Downstream",
            "Slot": 1,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_1_8","ID": 8,"Offset": 0,"Width": 8}
            ],
            "RefMgmtConnector": "#/Connector_PCIE_1",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_NPU_1"
        },
        "BusinessConnector_2": {
            "Name": "Down_2",
            "Direction": "Downstream",
            "Slot": 2,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_1_7","ID": 7,"Offset": 0,"Width": 4}, 
                {"Name": "SerDes_1_10","ID": 10,"Offset": 0,"Width": 4}
            ],
            "RefMgmtConnector": "#/Connector_PCIE_2",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_NPU_2"
        },
        "PcieAddrInfo_NPU_1": {
            "Location": 1,
            "ComponentType": 8,
            "ContainerSlot": 1,
            "ContainerUnitType": "IEU",
            "GroupPosition": "PcieAddrInfo_NPU_1_${GroupPosition}",
            "SocketID": 0,
            "SlotID": 1,
            "PortID":1,
            "Bus":0,
            "DeviceID":3,
            "Function":0
        },
        "PcieAddrInfo_NPU_2": {
            "Location": 2,
            "ComponentType": 8,
            "ContainerSlot": 2,
            "ContainerUnitType": "IEU",
            "GroupPosition": "PcieAddrInfo_NPU_2_${GroupPosition}",
            "SocketID": 0,
            "SlotID": 2,
            "PortID":2,
            "Bus":0,
            "DeviceID":3,
            "Function":0
        },
        "PcieAddrInfo_SAS_1": {
            "Location": "HddBackplane${Slot}",
            "ComponentType": 71,
            "ControllerIndex": 0,
            "ControllerType": 2,
            "ContainerSlot": "${Slot}",
            "GroupPosition": "PcieAddrInfo_SAS_1_${GroupPosition}",
            "ContainerUID": "00000001030302023934",
            "ContainerUnitType": "SEU SAS",
            "SocketID": 0,
            "SlotID": 0,
            "PortID": 2,
            "Bus": 50,
            "Device": 5,
            "Function": 0
        },
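  1. The flow looks fine.
  2. The objects may not be distributed because some field types or definitions are wrong. For example, Location is a string. I suggest checking the fields.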
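
A quick way to follow up on the field-type hint is to diff an object that fails to load against one that is known to load. The sketch below compares `PcieAddrInfo_NPU_1` with `PcieAddrInfo_SAS_1` as posted above. It is not a schema check: the SAS object is only used as a working reference, and `compare_fields` is an illustrative helper.

```python
def compare_fields(reference, candidate):
    """List keys that differ between a working object and a failing one."""
    shared = set(reference) & set(candidate)
    return {
        "missing": sorted(set(reference) - set(candidate)),
        "extra": sorted(set(candidate) - set(reference)),
        "type_mismatch": sorted(
            key for key in shared
            if type(reference[key]) is not type(candidate[key])
        ),
    }


# Copied from the configuration above.
sas = {
    "Location": "HddBackplane${Slot}", "ComponentType": 71,
    "ControllerIndex": 0, "ControllerType": 2,
    "ContainerSlot": "${Slot}",
    "GroupPosition": "PcieAddrInfo_SAS_1_${GroupPosition}",
    "ContainerUID": "00000001030302023934", "ContainerUnitType": "SEU SAS",
    "SocketID": 0, "SlotID": 0, "PortID": 2,
    "Bus": 50, "Device": 5, "Function": 0,
}
npu = {
    "Location": 1, "ComponentType": 8, "ContainerSlot": 1,
    "ContainerUnitType": "IEU",
    "GroupPosition": "PcieAddrInfo_NPU_1_${GroupPosition}",
    "SocketID": 0, "SlotID": 1, "PortID": 1,
    "Bus": 0, "DeviceID": 3, "Function": 0,
}

report = compare_fields(sas, npu)
for kind, keys in report.items():
    print(kind, keys)
```

The diff shows `Location` as an integer instead of a string, which matches the reply, and also that the NPU object uses `DeviceID` where the working SAS object uses `Device`. `ControllerIndex` and `ControllerType` are probably SAS-specific, and the `ContainerSlot` difference may be harmless if `${Slot}` is substituted with a number at load time; both of those are guesses.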

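lsobj now shows the configured PcieAddrInfo_NPU objects, but the silkscreen file does not contain the corresponding NPU entries. The complete silkscreen file is below. Is configuring the PcieAddrInfo objects enough for the silkscreen file to be generated, or does something else need to be configured?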

cat /data/opt/bmc/conf/bios/silkconfig.json

{
    "Properties": {
        "Type": "BIOS SILK CFG",
        "Version": 1
    },
    "MemSilk": [
        {
            "SocketId": 0,
            "PhysicalChannelId": 0,
            "LogicalChannelId": 3,
            "DimmId": 0,
            "Silk": "DIMM000"
        },
        {
            "SocketId": 0,
            "PhysicalChannelId": 1,
            "LogicalChannelId": 6,
            "DimmId": 0,
            "Silk": "DIMM010"
        },
        {
            "SocketId": 0,
            "PhysicalChannelId": 2,
            "LogicalChannelId": 1,
            "DimmId": 0,
            "Silk": "DIMM020"
        },
        {
            "SocketId": 0,
            "PhysicalChannelId": 3,
            "LogicalChannelId": 4,
            "DimmId": 0,
            "Silk": "DIMM030"
        }
    ],
    "PCIeSilk": [
        {
            "Segment": 0,
            "SocketId": 1,
            "RootPortDeviceId": 16,
            "SlotId": 2,
            "DeviceType": "OCP",
            "Silk": ""
        }
    ],
    "NICSilk": [
        {
            "SocketId": 0,
            "SlotId": 1,
            "Silk": ""
        },
        {
            "SocketId": 1,
            "SlotId": 2,
            "Silk": ""
        },
        {
            "SocketId": 2,
            "SlotId": 5,
            "Silk": ""
        }
    ],
    "DiskSilk": []
}

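In theory, if a PcieAddrInfo exists, there will be a corresponding entry in the silkscreen. You can add a print near cache_pcie_silk_info in the bios component to see whether the PcieAddrInfo is picked up when the silkscreen is generated.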
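
Before adding prints to the bios component, the generated file itself can be checked for the expected entries. The sketch below assumes that `SocketId` and `SlotId` in `PCIeSilk` correspond to `SocketID` and `SlotID` in the PcieAddrInfo objects; the later outputs in this thread are consistent with that, but it has not been confirmed against the bios component's code. `missing_pcie_silk` is an illustrative helper.

```python
import json


def missing_pcie_silk(silk_config, expected):
    """Return the (SocketId, SlotId) pairs from `expected` that have no PCIeSilk entry."""
    present = {
        (entry.get("SocketId"), entry.get("SlotId"))
        for entry in silk_config.get("PCIeSilk", [])
    }
    return [pair for pair in expected if pair not in present]


# PCIeSilk section of the silkconfig.json posted above.
sample = json.loads("""
{
    "PCIeSilk": [
        {"Segment": 0, "SocketId": 1, "RootPortDeviceId": 16,
         "SlotId": 2, "DeviceType": "OCP", "Silk": ""}
    ]
}
""")

# (SocketID, SlotID) of PcieAddrInfo_NPU_1 and PcieAddrInfo_NPU_2.
print(missing_pcie_silk(sample, [(0, 1), (0, 2)]))
```

On the BMC, the same function can be fed the result of `json.load` on `/data/opt/bmc/conf/bios/silkconfig.json`.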

Hi, the silkscreen is generated now
cat /data/opt/bmc/conf/bios/silkconfig.json

"PCIeSilk":[{"Silk":"","RootPortDeviceId":1,"SocketId":1,"Segment":0,"DeviceType":"PCIe","SlotId":1},{"Silk":"","RootPortDeviceId":2,"SocketId":1,"Segment":0,"DeviceType":"PCIe","SlotId":2}]

300I Duo information under the OS; the BDF is 04:00.0

The BDF configured for the NPU object is also 04:00.0

        "PcieAddrInfo_NPU_1": {
            "Location": "HddBackplane${Slot}",
            "ComponentType": 8,
            "ContainerSlot": "${Slot}",
            "ContainerUnitType": "IEU",
            "GroupPosition": "PcieAddrInfo_NPU_1_${GroupPosition}",
            "Segment": 0,
            "GroupID": 1,
            "SocketID": 1,
            "SlotID": 1,
            "PortID":1,
            "Bus":4,
            "DeviceID":0,
            "Function":0
        },

But I could not find any message from the BIOS back to the BMC in the serial output
The in-band side also did not report the card's BDF to the BMC

What could the problem be? The BDF already matches the one seen under the OS.

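PcieAddrInfo should hold the root BDF. The key point here, though, is PortID. I suggest asking the hardware team, or checking how PortID is configured for other cards with the same root BDF.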

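Hi, one more question.
Is the PortID in PcieAddrInfo determined by the HiLink that the card is actually plugged into?
This card is currently on an x8 link formed by hilink7 and hilink10, and I have set PortID to 12, the lowest device of serdes_0_7.
I also used BusinessConnector_2 (the upstream/downstream connector) to link the Connector_PCIE_1 and PcieAddrInfo_NPU_1 objects. Connector_PCIE_1 has only the Bom ID filled in, with Id and AuxId left empty, in the hope that the root BDF formed from the reported SocketID and PortID would trigger automatic loading of the 300I Duo CSR. That has not worked so far: no Chip_Dmini object for the 300I Duo appears under lsobj Chip.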

"BusinessConnector_2": {
            "Name": "Down_2",
            "Direction": "Downstream",
            "Slot": 2,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_0_7","ID": 7,"Offset": 0,"Width": 4}, 
                {"Name": "SerDes_0_10","ID": 10,"Offset": 0,"Width": 4}
            ],
            "RefMgmtConnector": "#/Connector_PCIE_1",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_NPU_1"
        },
        "PcieAddrInfo_NPU_1": {
            "Location": "HddBackplane${Slot}",
            "ComponentType": 8,
            "ContainerSlot": "${Slot}",
            "ContainerUnitType": "IEU",
            "GroupPosition": "PcieAddrInfo_NPU_1_${GroupPosition}",
            "Segment": 0,
            "GroupID": 1,
            "SocketID": 0,
            "SlotID": 1,
            "PortID":12,
            "Bus":4,
            "DeviceID":0,
            "Function":0
        },
        "Connector_PCIe_1": {
            "Bom": "14140130",
            "Id": "",
            "AuxId": "",
            "Slot": 1,
            "Position": 2,
            "Presence": 1,
            "Buses": [
                "Hisport_5"
            ],
            "SystemId": 1,
            "SilkText": "{Slot}",
            "IdentifyMode": 2,
            "Container": "Component_RiserCard",
            "Type": "PCIe"
        },

Check whether the BIOS has reported the device's BDF. If not, look at the silkscreen and confirm the expected PortId with the hardware team.

How do I check the device BDF reported by the BIOS? I am not clear on this part.

In-band, use lspci in the OS. Out-of-band, the BIOS-reported value is usually in PcieCardBDF of the Bios_1_010101 object.
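
When comparing the lspci view with the configuration, keep in mind that lspci prints the address as `[domain:]bus:device.function` in hexadecimal, while the Bus, Device and Function values in the CSR snippets in this thread are decimal integers. The small converter below only translates the notation; it does not decide whether the card's own BDF or the root port's BDF belongs in PcieAddrInfo. `parse_bdf` is an illustrative helper, and the key names in its result are chosen to resemble the config fields.

```python
def parse_bdf(text):
    """Parse "[domain:]bus:device.function" with hexadecimal fields into integers."""
    head, _, function = text.strip().rpartition(".")
    parts = head.split(":")
    if not function or len(parts) not in (2, 3):
        raise ValueError(f"not a BDF: {text!r}")
    domain = int(parts[0], 16) if len(parts) == 3 else 0
    return {
        "Segment": domain,
        "Bus": int(parts[-2], 16),
        "Device": int(parts[-1], 16),
        "Function": int(function, 16),
    }


print(parse_bdf("04:00.0"))
print(parse_bdf("0000:32:05.0"))
```

The second call illustrates the hexadecimal conversion: bus `32` in lspci notation is 50 in decimal.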

The card is currently recognized

The silkscreen reported by the BMC is as follows

"PCIeSilk":[{"Silk":"","DeviceType":"PCIe","Segment":0,"RootPortDeviceId":12,"SocketId":0,"SlotId":1},{"Silk":"","DeviceType":"PCIe","Segment":0,"RootPortDeviceId":8,"SocketId":0,"SlotId":2}]

How exactly should PortID be determined? Could you explain in detail?

The in-band BIOS has also reported the 300I Duo card

dmidecode under the OS now shows the slot information as well, and the Bus Address matches the card's in-band BDF

But the web page still does not show any information for the accelerator card

Here is the log package from one-click collection. Could you take a look and see where the problem is?

null_19700101-0837.tar.gz (6.2 MB)

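According to the logs, the report has been received, the PCIe loading flow is fine, and Presence is set correctly, but the object was not distributed.
I suggest checking whether this is because the configured Container cannot be found: "Container": "Component_RiserCard"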
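
The Container check can also be done offline on the board's CSR. The sketch below assumes that `Container` names another object defined in the same file, which is inferred from this thread rather than from the openUBMC documentation. `unresolved_containers` is an illustrative helper and `Component_ExampleBoard` is a made-up object name.

```python
def unresolved_containers(objects):
    """Map each object name to its Container value when that value is not a defined object."""
    return {
        name: body["Container"]
        for name, body in objects.items()
        if isinstance(body, dict)
        and "Container" in body
        and body["Container"] not in objects
    }


objects = {
    # Trimmed from the Connector posted earlier in this thread.
    "Connector_PCIe_1": {"Container": "Component_RiserCard", "Type": "PCIe"},
    # Hypothetical Component object of the current board.
    "Component_ExampleBoard": {},
}

print(unresolved_containers(objects))
```

An empty result means every Container value points at an object that exists in the file.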

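The presence state is currently set to 1 manually. The earlier failure to distribute the object was most likely because I had not filled in Id and AuxId.
They are filled in now.
1. The card now appears on the web UI, but its information is incomplete. What causes the missing information?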

  "Connector_PCIe_1": {
            "Bom": "14140130",
            "Id": "19e5d500",
            "AuxId": "02000110",
            "Slot": 2,
            "Position": 2,
            "Presence": 1,
            "Buses": [
                "Hisport_5"
            ],
            "SystemId": 1,
            "SilkText": "{Slot}",
            "IdentifyMode": 2,
            "Type": "PCIe"
        },

2. Is the PortID in PcieAddrInfo wrong? If the PortID is misconfigured, why does dmidecode show the correct device BDF? The accelerator card BDF reported by the BIOS is also 04:00.0.

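3. In a direct-attach setup, how can PCIe cards be loaded dynamically, so that a card in slot 1 or in slot 2 is displayed correctly without hard-coding AuxId and Id in the Connector?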

Current logs:
null_19700101-0635.tar.gz (5.7 MB)

The location information depends on the "Container" setting, which must be set to the Component object of the current board.

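Does direct-attach PCIe also need upstream and downstream connectors? I have configured only two downstream connectors, one bound to each PCIe slot. Is the failure to load the sr automatically from the quadruple reported by the BIOS related to the upstream/downstream connectors?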

        "BusinessConnector_1": {
            "Name": "Down_1",
            "Direction": "Downstream",
            "Slot": 1,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_0_8","ID": 8,"Offset": 0,"Width": 8}
            ],
            "RefMgmtConnector": "#/Connector_PCIe_1",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_NPU_1"
        },
        "BusinessConnector_2": {
            "Name": "Down_2",
            "Direction": "Downstream",
            "Slot": 2,
            "LinkWidth": "X8",
            "MaxLinkRate": "PCIe 4.0",
            "ConnectorType": "PCIe CEM",
            "UpstreamResources": [
                {"Name": "SerDes_0_7","ID": 7,"Offset": 0,"Width": 4}, 
                {"Name": "SerDes_0_10","ID": 10,"Offset": 0,"Width": 4}
            ],
            "RefMgmtConnector": "#/Connector_PCIe_2",
            "RefPCIeAddrInfo": "#/PcieAddrInfo_NPU_2"
        },

This is explained in the FAQ of docs/docs/zh/development/develop_guide/feature_development/pcie_device_topo_create.md (docs repository on AtomGit | GitCode).

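I want to load the card through BDF reporting. Following the FAQ, I configured the downstream connectors, PcieAddrInfo and Connector. Per the topology document, I also removed the fields that the code fills in automatically (PortID, SocketID and the root BDF), which I had previously filled in by hand. The result is that the PCIeSilk property in silkconfig is empty, so the BMC clearly did not report the silkscreen successfully. I am not sure which of the downstream connector, PcieAddrInfo or Connector configuration caused the failure; my guess is that the code was unable to fill in PortID, SocketID and the root BDF. The community only has examples with a Riser card, and I do not know what has to change for direct attach. Could you give me an email address so I can send the logs for analysis? I have not worked in this area before and am not familiar with the code or hardware involved, so I do not know how to analyze the logs. Thank you.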