[Help request] Port information not retrieved when adapting a Guangruntong NIC

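  1. Environment
    An s920x20 Kunpeng high-performance server with a Guangruntong (光润通) NIC and a Broadcom NIC. Both NICs are being adapted on openUBMC. The Broadcom NIC adaptation works, but the Guangruntong NIC fails to return port information.
  2. Symptom
    On the openUBMC web page, the Broadcom NIC information displays correctly. For the Guangruntong NIC, the port information cannot be retrieved, while the other basic information can.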


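  3. Analysis
  • According to the NIC manual provided by the vendor, port status can be obtained through NC-SI over MCTP.

  • Debug prints added to the code show that, for the Guangruntong NIC, execution blocks at the point where the endpoint object is created: neither the success message nor the failure message for endpoint creation is printed. The only mctpd output related to this NIC is mctp_mdb_mgmt.lua(330): [System1]register mdb transport: ServiceName=network_adapter, TargetEid=0, TargetPhyAddr=150, MessageType=2.

  • According to other community posts, a failure to create the endpoint object may be caused by a wrong BDF or by the NIC not responding during IMU enumeration. The BDF obtained by the BMC matches the NIC's BDF under the OS, so the first cause is ruled out. How can the second case, the NIC not responding during IMU enumeration, be investigated? Are there any debugging methods to locate it?

  • The NIC driver is installed correctly, and the driver version can be read under the OS.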

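  4. Requests
  • How should an endpoint object creation failure be analyzed when adapting a NIC on openUBMC? Are there any methods to locate the cause?
  • If the NIC is not responding during IMU enumeration, how can this be investigated? Are there any debugging methods to locate it?
  • The Guangruntong NIC adaptation requirement has passed review for merging into community version 2606. We currently have no leads on this problem and would appreciate your help.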

The following answer was generated by AI and is for reference only:
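:white_check_mark: Summary and recommendations

| Issue | Conclusion | Recommendation |
| --- | --- | --- |
| create_mctp_endpoint blocks | mctp_lib.get_endpoint_and_transport does not return; most likely the underlying MCTP over PCIe communication fails | Capture packets with tracemctp and analyze the exchange |
| No Endpoint object is generated | MCTP auto-discovery failed or IMU enumeration did not complete | Check the IMU, the Routing table, and the PCIe enumeration state |
| Is the NIC unresponsive during IMU enumeration? | Highly likely | Use tracemctp to confirm whether a Discovery request is sent and whether a response arrives |
| Is phy_addr correct? | To be verified | Add logging to confirm the output of bdf_to_phy_addr |
| Firmware/MCTP support | The X550 supports MCTP, but it may need to be enabled | Ask Guangruntong how MCTP is enabled |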

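Guangruntong NIC adaptation: analysis and troubleshooting of the MCTP Endpoint creation failure

According to your description, the Broadcom NIC has been adapted successfully, while the Guangruntong NIC (F1002T, Intel X550 chip) blocks in create_mctp_endpoint, so port information cannot be retrieved. The analysis and suggestions below are based on the logs and context you provided.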


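:magnifying_glass_tilted_left: Locating the core problem

Observed behavior

  • network_adapter.lua blocks when it calls mctp_lib.get_endpoint_and_transport.
  • The code enters neither the success branch nor the failure branch (no raise or notice('register ...') log appears), which suggests that the pcall is suspended for a long time or stuck in a loop inside the closed-source library.
  • mctpd logs only the transport registration:
    mctp_mdb_mgmt.lua(330): register mdb transport: ServiceName=network_adapter, TargetEid=0, TargetPhyAddr=150, MessageType=2
    
    There is no Endpoint creation log (such as create endpoint, phy_addr=...), which indicates that the MCTP communication never reached the Set EID negotiation stage.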

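:puzzle_piece: Probable root cause

Based on the knowledge base, search results, and log behavior, the most likely cause is that the NIC does not respond correctly during the "IMU enumeration" or "route discovery" stage of MCTP over PCIe link initialization, so mctp_lib.get_endpoint_and_transport waits indefinitely.

:warning: mctp_lib is a closed-source SDK, and how get_endpoint_and_transport handles timeouts internally is not visible. If the underlying PCIe MCTP communication gets no response or the state machine enters an abnormal state, the function may never return.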

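:white_check_mark: Ruled out

  1. Wrong BDF?
    No. lspci under the OS shows the BDF as 96:00.0, which matches the 0000:96:00.0 recorded by the BMC.
  2. NIC does not support NC-SI/MCTP?
    No. The Intel X550 manual explicitly lists support for the NC-SI over MCTP (Without Pass-Through) mode, although some functions are limited.
  3. Missing SR/CSR configuration?
    No. network_adapter calls init_protocol successfully, which indicates that basic identification works.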

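:red_exclamation_mark: Key suspects

  • MCTP over PCIe protocol compatibility
    • The BCM957416A4160C supports MCTP over SMBus rather than PCIe (according to the knowledge graph).
    • The Guangruntong F1002T (X550) supports MCTP over PCIe, but the following must hold:
      • The MessageType is the same on the BMC side and the NIC side (it should be 2, that is, NC-SI).
      • The MCTP link has completed auto-discovery, or has been configured statically through the Endpoint resource tree.
      • The TargetEid and phy_addr mapping is correct; a wrong mapping causes the handshake to fail.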

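:magnifying_glass_tilted_right: In-depth analysis and troubleshooting suggestions

:one: Confirm whether the MCTP communication link is established (key step)

On openUBMC, the state of the MCTP resource tree can be checked with the following commands:

# List all MCTP Endpoint objects
lsobj /bmc/kepler/Systems/1/Mctp/Endpoint

# If the Endpoint is not on the resource tree, check the routing information
lsobj /bmc/kepler/Systems/1/Mctp/Routing
lsprop <RoutingEntry_Path>  # Check whether in_bound and out_bound are correct

:white_check_mark: Expected result: a PCIeEndpoint identified by phy_addr=150 / bdf=96:00.0 exists.

:cross_mark: If there is no Endpoint, mctpd has not completed the auto-discovery flow. Possible causes:

  • The PCIe MCTP Capability was not parsed correctly.
  • The NIC did not respond to the MCTP discovery requests (for example, Prepare for Endpoint Discovery or Endpoint Discovery).
  • IMU enumeration failed or the device is not online.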

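:two: Enable MCTP packet tracing (core debugging method)

Use the tracemctp command to capture MCTP packets live and determine whether nothing is sent or requests are sent but not answered.

:white_check_mark: Steps:

# Enable global MCTP packet tracing
tracemctp start

# Trigger NIC detection again (restart network_adapter or the whole machine)
systemctl restart network_adapter

# Check whether app.log contains MCTP packets
grep -i 'tracemctp' /var/log/app.log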

:magnifying_glass_tilted_left: What to look for

  • Whether an MCTP Discovery or Set EID request is sent.
  • Whether a response (ACK/NACK) is received.
  • Whether the request's target address (phy_addr/EID) is correct.
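
The distinction between "nothing sent" and "sent but no reply" can be checked mechanically once the capture is saved as text. The following is a minimal Python sketch, not part of openUBMC. It assumes the frame separator looks like the `============= PCIe capture [SEND] =============` lines in the capture posted in this thread, and the function names `count_frames` and `classify` are hypothetical.

```python
import re

# ASSUMPTION: separator format copied from the capture in this thread;
# other tracemctp builds may print something different.
FRAME_MARK = re.compile(r"=+ PCIe capture \[(SEND|RECV)\] =+")


def count_frames(capture_text):
    """Count SEND and RECV frame separators in a tracemctp text capture."""
    counts = {"SEND": 0, "RECV": 0}
    for line in capture_text.splitlines():
        match = FRAME_MARK.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def classify(counts):
    """Map frame counts to the cases described in the text."""
    if counts["SEND"] == 0:
        return "nothing sent"
    if counts["RECV"] == 0:
        return "sent but no reply"
    return "traffic in both directions"


# Shortened sample: two SEND frames and no RECV frame.
sample = """% ============= PCIe capture [SEND] =============
7F 30 00 03 06 10 00 72
============= PCIe capture [SEND] =============
7F 30 00 03 07 10 00 72
"""

print(classify(count_frames(sample)))
```

The sketch counts every frame in the capture. On a system with more than one MCTP device, the frames still have to be matched to the target address before drawing a conclusion about a specific NIC.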

:books: Reference: [Reviewed] Endpoint adds packet printing for debugging (【已评审】Endpoint新增调试时打印报文功能)


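:three: Verify whether IMU enumeration succeeded

The IMU (Intelligent Management Unit) is a key component in PCIe MCTP enumeration.

# Check whether the IMU device is recognized
lsobj /bmc/kepler/IMU

# Check the IMU scan logs
grep -i imu /var/log/app.log

:magnifying_glass_tilted_left: If the IMU has not discovered the device, PCIe enumeration failed. Possible causes:

  • PCIe link training did not succeed (check the pcie_device logs).
  • The device does not implement the MCTP PCIe Extensions correctly.
  • The bdf_to_phy_addr mapping is wrong (even if the BDF is correct, the phy_addr calculation may be off).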

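:four: Check the phy_addr calculation logic

Although bdf_to_phy_addr usually returns the bus number directly, some platforms have special mapping rules.

Print the value actually passed in network_adapter.lua: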

local phy_addr = mctp_lib.bdf_to_phy_addr(self.DevBus, self.DevDevice, self.DevFunction)
log:notice("OYG calculated phy_addr=%d from bdf=%d:%d.%d", phy_addr, self.DevBus, self.DevDevice, self.DevFunction)

Compare it with TargetPhyAddr=150 in the mctpd log and make sure the two are consistent.
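
The comparison can also be done offline from the log line and the lspci BDF. The Python sketch below is only an illustration. It assumes, as stated above, that phy_addr equals the PCIe bus number on this platform, which has not been verified against the closed-source mctp_lib. The helper names `parse_bdf` and `target_phy_addr` are hypothetical.

```python
import re


def parse_bdf(bdf):
    """Parse 'BB:DD.F' or 'DDDD:BB:DD.F' (hex) into (bus, device, function)."""
    match = re.fullmatch(
        r"(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])", bdf
    )
    if match is None:
        raise ValueError(f"not a BDF: {bdf!r}")
    return tuple(int(part, 16) for part in match.groups())


def target_phy_addr(log_line):
    """Extract the decimal TargetPhyAddr value from an mctpd log line."""
    match = re.search(r"TargetPhyAddr=(\d+)", log_line)
    if match is None:
        raise ValueError("no TargetPhyAddr field in log line")
    return int(match.group(1))


# Log line and BDF as quoted in this thread.
log_line = (
    "mctp_mdb_mgmt.lua(330): [System1]register mdb transport: "
    "ServiceName=network_adapter, TargetEid=0, TargetPhyAddr=150, MessageType=2"
)
bus, device, function = parse_bdf("0000:96:00.0")
phy_addr = target_phy_addr(log_line)

# ASSUMPTION: phy_addr is expected to equal the bus number.
print(f"bus=0x{bus:02X} ({bus}), TargetPhyAddr={phy_addr}, match={bus == phy_addr}")
```

Bus 0x96 is 150 in decimal, so under this assumption the TargetPhyAddr in the log is consistent with the BDF 96:00.0.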


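:five: Try injecting an Endpoint manually (temporary verification)

If auto-discovery fails, try creating the Endpoint manually to test whether communication works.

:white_check_mark: Procedure (based on community experience):

# Use inject to run a Lua script that creates the Endpoint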
inject '
  local mctp_lib = require "mctp_lib"
  local bus = require "dbus_proxy".get_bus()
  local ok, ep, tr = pcall(mctp_lib.get_pcie_endpoint, bus, 150, 2)
  if ok then
    print("✅ Endpoint created: " .. tostring(ep))
  else
    print("❌ Failed: " .. tostring(ep))
  end
'

If manual injection also fails, the underlying communication link has a problem, and a tracemctp capture is needed for further analysis.


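:six: Check the NIC firmware and MCTP mode configuration

  • Confirm whether MCTP is enabled in the Guangruntong NIC firmware.
  • Some Intel X550 NICs need MCTP-capable firmware flashed with a vendor tool (such as ixgbe_flash).
  • Ask Guangruntong whether it provides a dedicated command or configuration method to enable MCTP.

:pushpin: See the QLogic NIC question "How to enable MCTP on a QLE NIC" (QLE网卡如何打开其的MCTP使能); a similar enable switch may be needed here.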


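:hammer_and_wrench: Recommended troubleshooting order

  1. Do this first: run tracemctp start, reproduce the problem, and capture the MCTP packets.
  2. Check the resource tree: run lsobj /bmc/kepler/Systems/1/Mctp/Endpoint and see whether the Endpoint is generated.
  3. Check IMU enumeration: confirm whether the IMU has discovered the PCIe device.
  4. Contact the vendor: obtain the MCTP enable method and recommended firmware version for the Guangruntong NIC.
  5. Try injecting an Endpoint manually: verify whether communication works.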

If you can provide the tracemctp capture or the IMU enumeration logs, the problem can be located more precisely.

The tracemctp capture is shown below. Please help analyze the MCTP packets.
% tracemctp start
%
% ============= PCIe capture [SEND] =============
7F 30 00 03 06 10 00 72 CB 0A 09 01 B4 1A 00 AB
02 00 01 00 44 0A 00 00 00 00 00 00 00 00 00 00
00 FF FF F5 BB 00 00 00
============= PCIe capture [SEND] =============
7F 30 00 03 07 10 00 72 CC 0A 09 01 B4 1A 00 AB
02 00 01 00 45 2B 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 FF FF D4 B6 00 00 00
============= PCIe capture [SEND] =============
7F 30 00 03 07 10 00 72 CD 0A 09 01 B4 1A 00 AB
02 00 01 00 46 2B 00 00 04 00 00 00 00 00 00 00
00 00 00 00 00 FF FF D4 B5 00 00 00
============= PCIe capture [RECV] =============
0A 00 00 12 00 30 00 AB B4 1A 00 03 C3 09 0A 01
02 00 01 00 44 8A 00 00 10 00 00 00 00 00 00 00
00 00 00 00 00 00 40 00 71 00 00 00 01 00 00 00
00 00 00 00 00 00 00 00 00 00 02 06 01 14 23 F2
20 7D 70 06 F1 14 23 F2 20 7D 70 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
============= PCIe capture [RECV] =============
10 00 00 12 00 00 00 AB B4 1A 00 03 84 09 0A 01
02 00 01 00 45 AB 00 00 2A 00 00 00 00 00 00 00
00 00 00 00 00 01 01 00 63 00 00 00 00 16 D8 14
E4 16 02 14 E4 00 AB 00 00 00 02 06 01 14 23 F2
20 7D 70 06 F1 14 23 F2 20 7D 70 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
============= PCIe capture [RECV] =============
01 00 00 12 00 30 00 AB B4 1A 00 03 54 09 0A 01
00 00 00 00 E7 8A 01 00 10 00 00 00 00 00 00 00
00 00 00 00 00 00 40 00 71 00 00 00 01 00 00 00
00 00 00 00 00 00 00 00 01 00 02 06 01 14 23 F2
20 7D 71 06 F1 14 23 F2 20 7D 71 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
============= PCIe capture [RECV] =============
10 00 00 12 00 00 00 AB B4 1A 00 03 85 09 0A 01
02 00 01 00 46 AB 00 00 2A 00 00 00 00 00 00 00
00 00 00 00 00 01 01 00 63 00 00 00 00 16 D8 14
E4 16 02 14 E4 00 AB 00 00 00 02 06 01 14 23 F2
20 7D 70 06 F1 14 23 F2 20 7D 70 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
============= PCIe capture [RECV] =============
01 00 00 12 00 30 00 AB B4 1A 00 03 55 09 0A 01
00 00 00 00 18 AB 01 00 2A 00 00 00 00 00 00 00
00 00 00 00 00 01 01 00 63 00 00 00 00 16 D8 14
E4 16 02 14 E4 00 AB 00 01 00 02 06 01 14 23 F2
20 7D 71 06 F1 14 23 F2 20 7D 71 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
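
As a starting point for reading the capture above, the header of a [SEND] frame can be decoded with a short Python sketch. It relies on an assumption inferred from the dump and not confirmed by tracemctp documentation: each 8-byte group of the 16-byte PCIe VDM header is printed in reverse byte order. The assumption is suggested by the PCIe message code 0x7F and the DMTF vendor ID 0x1AB4 landing in their expected positions once the groups are reversed. The [RECV] frames appear to use a different layout and are not handled here. The function name `decode_send_frame` is hypothetical.

```python
def decode_send_frame(frame_hex):
    """Decode the leading fields of one [SEND] frame from the dump."""
    raw = bytes.fromhex("".join(frame_hex.split()))
    if len(raw) < 23:
        raise ValueError("frame too short to hold the headers")
    # ASSUMPTION: each 8-byte group of the 16-byte header is dumped reversed.
    hdr = raw[7::-1] + raw[15:7:-1]
    return {
        "fmt_type": hdr[0],                             # PCIe Fmt/Type byte
        "length_dw": ((hdr[2] & 0x03) << 8) | hdr[3],   # payload length in dwords
        "requester_id": (hdr[4] << 8) | hdr[5],
        "message_code": hdr[7],                         # 0x7F = vendor-defined type 1
        "target_bus": hdr[8],
        "target_devfn": hdr[9],
        "vendor_id": (hdr[10] << 8) | hdr[11],          # 0x1AB4 = DMTF
        "dest_eid": hdr[13],
        "src_eid": hdr[14],
        "mctp_msg_type": raw[16] & 0x7F,                # 0x02 = NC-SI
        "ncsi_iid": raw[20],
        "ncsi_packet_type": raw[21],
    }


# First [SEND] frame, copied from the capture above.
FIRST_SEND = """
7F 30 00 03 06 10 00 72 CB 0A 09 01 B4 1A 00 AB
02 00 01 00 44 0A 00 00 00 00 00 00 00 00 00 00
00 FF FF F5 BB 00 00 00
"""

fields = decode_send_frame(FIRST_SEND)
for name, value in fields.items():
    print(f"{name:18s} 0x{value:X}")
```

For the first [SEND] frame this yields vendor_id 0x1AB4, message_code 0x7F, a length of 6 dwords (24 bytes, which matches the 24 bytes that follow the 16-byte header), mctp_msg_type 0x02, NC-SI packet type 0x0A, and target_bus 0xAB. If the byte-order assumption holds, the frame is addressed to bus 0xAB rather than to bus 0x96 (TargetPhyAddr=150). It is therefore worth checking whether any captured frame targets the Guangruntong NIC at all, or whether the capture only shows traffic to another device.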