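Background
Real-time monitoring of AI server resource utilization across several large-model training runs shows that, during the long steady-state training phase, the NPUs stay under high load while the CPUs run at low load. The BMC therefore needs to provide northbound interfaces such as Redfish so that an upper-layer network management system (NMS) or the customer can set and query the energy-saving mode and energy-saving parameters. Based on that mode and those parameters, the BMC passes frequency-scaling parameters to BMA/NPM, which enables energy saving during large-model training.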
Related issue
[Requirement]: Support xpu frequency and voltage scaling in training scenarios - openUBMC/mdb_interface - GitCode
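Review points
1. Add resource collaboration interface properties for whether scenario-based energy saving is supported and for the energy-saving mode
2. Add a resource collaboration interface method for getting the scenario-based energy-saving enablement status
3. Add a Redfish resource for querying the scenario-based energy-saving mode
4. Add a Redfish resource for setting the scenario-based energy-saving mode
5. Add a Redfish resource for querying the scenario-based energy-saving enablement status
6. Add an IPMI command for querying OS energy-efficiency information
7. Add an IPMI command for setting OS energy-efficiency information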
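Detailed description
1. Add resource collaboration interface properties IsPowerModeSupported and PowerMode
Resource path: /bmc/kepler/Chassis/:ChassisId/EnergySavingScene (existing)
Resource interface: bmc.kepler.Chassis.EnergySavingScene (existing)
Change type: new properties
Use case: an upper-layer NMS or the customer sets or queries the energy-saving mode
Details: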
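| Property | Change type | Signature | Access & privilege | Persistence | Change notification | Property source | Description | Constraints |
|---|---|---|---|---|---|---|---|---|
| IsPowerModeSupported | New property | b | Read: ReadOnly | Not persisted | false | CSR | Whether the system currently supports scenario-based energy saving. false: not supported (default); true: supported | None |
| PowerMode | New property | s | Write: PowerMgmt; Read: ReadOnly | Persisted across power loss | false | User setting | The system's current energy-saving mode. Allowed values: BalancedPerformance (default), OSControlled, EfficiencyFavorPerformance, EfficiencyFavorPower, MaximumPerformance, PowerSaving, Static, OEM | None |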
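As an illustration of the PowerMode value range above, the following is a minimal Python sketch of how a caller might validate a mode string before writing the property. The class name `PowerMode`, the constant `DEFAULT_POWER_MODE`, and the function `validate_power_mode` are hypothetical names chosen for this example; they are not part of the interface definition.

```python
from enum import Enum


class PowerMode(str, Enum):
    """Allowed PowerMode values, copied from the property table."""
    BALANCED_PERFORMANCE = "BalancedPerformance"
    OS_CONTROLLED = "OSControlled"
    EFFICIENCY_FAVOR_PERFORMANCE = "EfficiencyFavorPerformance"
    EFFICIENCY_FAVOR_POWER = "EfficiencyFavorPower"
    MAXIMUM_PERFORMANCE = "MaximumPerformance"
    POWER_SAVING = "PowerSaving"
    STATIC = "Static"
    OEM = "OEM"


# The table names BalancedPerformance as the default mode.
DEFAULT_POWER_MODE = PowerMode.BALANCED_PERFORMANCE


def validate_power_mode(value: str) -> PowerMode:
    """Return the matching enum member, or raise ValueError for an unknown string."""
    try:
        return PowerMode(value)
    except ValueError:
        raise ValueError(f"unsupported PowerMode: {value!r}") from None
```

Rejecting unknown strings before the write is a design choice of this sketch; how the BMC itself reports an invalid value is not described here.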
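2. Add resource collaboration interface method GetEnergySavingStatus
Resource path: /bmc/kepler/Chassis/:ChassisId/EnergySavingScene (existing)
Resource interface: bmc.kepler.Chassis.EnergySavingScene (existing)
Change type: new method
Use case: an upper-layer NMS or the customer queries the scenario-based energy-saving enablement status
Details: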
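| Method | Change type | Request signature | Request parameters | Response signature | Response parameters | Access privilege | Description | Constraints |
|---|---|---|---|---|---|---|---|---|
| GetEnergySavingStatus | New method | None | None | s | EnergySavingStatus | ReadOnly | Gets the scenario-based energy-saving enablement status. EnergySavingStatus values: Activated (in effect), Inactivated (not in effect), Unknown (unknown) | None |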
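3. Add a Redfish resource for querying the scenario-based energy-saving mode (standard resource)
URI: https://device_ip/redfish/v1/Systems/{system_id}
Change type: new property
Operation type: GET
Use case: an upper-layer NMS or the customer queries the energy-saving mode
Details: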
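| Property | Value type | Description | Allowed values | Default | Privilege | Changes frequently and needs change-event suppression | Constraints |
|---|---|---|---|---|---|---|---|
| PowerMode | string, null | The system's current energy-saving mode | BalancedPerformance (default), OSControlled, EfficiencyFavorPerformance, EfficiencyFavorPower, MaximumPerformance, PowerSaving, Static, OEM | BalancedPerformance when scenario-based energy saving is supported; null when it is not supported | ReadOnly | No | None |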
Schema notes
Standard resource; the schema must be upgraded to v1_22_0
redfish.dmtf.org/schemas/v1/ComputerSystem.v1_22_0.json
"PowerMode": {
"anyOf": [
{
"$ref": "#/definitions/PowerMode"
},
{
"type": "null"
}
],
"description": "The power mode setting of the computer system.",
"longDescription": "This property shall contain the computer system power mode setting.",
"readonly": false,
"versionAdded": "v1_15_0"
}
"PowerMode": {
"enum": [
"MaximumPerformance",
"BalancedPerformance",
"PowerSaving",
"Static",
"OSControlled",
"OEM",
"EfficiencyFavorPower",
"EfficiencyFavorPerformance"
],
"enumDescriptions": {
"BalancedPerformance": "The system performs at the highest speeds while utilization is high and performs at reduced speeds when the utilization is low.",
"EfficiencyFavorPerformance": "The system performs at reduced speeds at all utilizations to save power while attempting to maintain performance. This mode differs from `EfficiencyFavorPower` in that more performance is retained but less power is saved.",
"EfficiencyFavorPower": "The system performs at reduced speeds at all utilizations to save power at the cost of performance. This mode differs from `PowerSaving` in that more performance is retained and less power is saved. This mode differs from `EfficiencyFavorPerformance` in that less performance is retained but more power is saved.",
"MaximumPerformance": "The system performs at the highest speeds possible.",
"OEM": "The system power mode is OEM-defined.",
"OSControlled": "The system power mode is controlled by the operating system.",
"PowerSaving": "The system performs at reduced speeds to save power.",
"Static": "The system power mode is static."
},
"enumLongDescriptions": {
"BalancedPerformance": "This value shall indicate the system performs at the highest speeds possible when the utilization is high and performs at reduced speeds when the utilization is low to save power. This mode is a compromise between `MaximumPerformance` and `PowerSaving`.",
"EfficiencyFavorPerformance": "This value shall indicate the system performs at reduced speeds at all utilizations to save power while attempting to maintain performance. This mode differs from `EfficiencyFavorPower` in that more performance is retained but less power is saved. This mode differs from 'MaximumPerformance' in that power is saved at the cost of some performance. This mode differs from 'BalancedPerformance' in that power saving occurs at all utilizations.",
"EfficiencyFavorPower": "This value shall indicate the system performs at reduced speeds at all utilizations to save power at the cost of performance. This mode differs from `PowerSaving` in that more performance is retained and less power is saved. This mode differs from `EfficiencyFavorPerformance` in that less performance is retained but more power is saved. This mode differs from 'BalancedPerformance' in that power saving occurs at all utilizations.",
"MaximumPerformance": "This value shall indicate the system performs at the highest speeds possible. This mode should be used when performance is the top priority.",
"OEM": "This value shall indicate the system performs at an OEM-defined power mode.",
"OSControlled": "This value shall indicate the system performs at an operating system-controlled power mode.",
"PowerSaving": "This value shall indicate the system performs at reduced speeds to save power. This mode should be used when power saving is the top priority.",
"Static": "This value shall indicate the system performs at a static base speed."
},
"enumVersionAdded": {
"EfficiencyFavorPerformance": "v1_22_0",
"EfficiencyFavorPower": "v1_22_0"
},
"type": "string"
}
Example
Request header: X-Auth-Token: auth_value
Request body: none
Response example:
{
"@odata.context": "/redfish/v1/$metadata#ComputerSystem.ComputerSystem",
"@odata.id":"/redfish/v1/Systems/1",
"@odata.type":"#ComputerSystem.v1_22_0.ComputerSystem",
"Id":"1",
"Name": "ComputerSystem",
...
"PowerMode": "BalancedPerformance",
...
}
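The query above can be sketched in Python as follows. This is a minimal illustration using only the standard library: the function names `build_power_mode_query` and `extract_power_mode` are hypothetical, and the request is only constructed, not sent, so TLS and session handling are left out.

```python
import json
import urllib.request
from typing import Optional


def build_power_mode_query(device_ip: str, system_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the ComputerSystem resource."""
    url = f"https://{device_ip}/redfish/v1/Systems/{system_id}"
    return urllib.request.Request(url, headers={"X-Auth-Token": token}, method="GET")


def extract_power_mode(response_body: str) -> Optional[str]:
    """Return the PowerMode string from a response body.

    A JSON null (scenario-based energy saving not supported) becomes None;
    a missing PowerMode key also yields None.
    """
    system = json.loads(response_body)
    return system.get("PowerMode")
```

A caller would pass the request to `urllib.request.urlopen` (with a suitable SSL context) and feed the decoded body to `extract_power_mode`.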
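4. Add a Redfish resource for setting the scenario-based energy-saving mode (standard resource)
URI: https://device_ip/redfish/v1/Systems/{system_id}
Change type: new property
Operation type: PATCH
Use case: an upper-layer NMS or the customer sets the energy-saving mode
Details: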
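| Property | Value type | Description | Allowed values | Default | Privilege | Changes frequently and needs change-event suppression | Constraints |
|---|---|---|---|---|---|---|---|
| PowerMode | string, null | The system's current energy-saving mode | BalancedPerformance (default), OSControlled, EfficiencyFavorPerformance, EfficiencyFavorPower, MaximumPerformance, PowerSaving, Static, OEM | BalancedPerformance when scenario-based energy saving is supported; null when it is not supported | PowerMgmt | No | None |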
Schema notes
Standard resource; the schema must be upgraded to v1_22_0
redfish.dmtf.org/schemas/v1/ComputerSystem.v1_22_0.json
"PowerMode": {
"anyOf": [
{
"$ref": "#/definitions/PowerMode"
},
{
"type": "null"
}
],
"description": "The power mode setting of the computer system.",
"longDescription": "This property shall contain the computer system power mode setting.",
"readonly": false,
"versionAdded": "v1_15_0"
}
"PowerMode": {
"enum": [
"MaximumPerformance",
"BalancedPerformance",
"PowerSaving",
"Static",
"OSControlled",
"OEM",
"EfficiencyFavorPower",
"EfficiencyFavorPerformance"
],
"enumDescriptions": {
"BalancedPerformance": "The system performs at the highest speeds while utilization is high and performs at reduced speeds when the utilization is low.",
"EfficiencyFavorPerformance": "The system performs at reduced speeds at all utilizations to save power while attempting to maintain performance. This mode differs from `EfficiencyFavorPower` in that more performance is retained but less power is saved.",
"EfficiencyFavorPower": "The system performs at reduced speeds at all utilizations to save power at the cost of performance. This mode differs from `PowerSaving` in that more performance is retained and less power is saved. This mode differs from `EfficiencyFavorPerformance` in that less performance is retained but more power is saved.",
"MaximumPerformance": "The system performs at the highest speeds possible.",
"OEM": "The system power mode is OEM-defined.",
"OSControlled": "The system power mode is controlled by the operating system.",
"PowerSaving": "The system performs at reduced speeds to save power.",
"Static": "The system power mode is static."
},
"enumLongDescriptions": {
"BalancedPerformance": "This value shall indicate the system performs at the highest speeds possible when the utilization is high and performs at reduced speeds when the utilization is low to save power. This mode is a compromise between `MaximumPerformance` and `PowerSaving`.",
"EfficiencyFavorPerformance": "This value shall indicate the system performs at reduced speeds at all utilizations to save power while attempting to maintain performance. This mode differs from `EfficiencyFavorPower` in that more performance is retained but less power is saved. This mode differs from 'MaximumPerformance' in that power is saved at the cost of some performance. This mode differs from 'BalancedPerformance' in that power saving occurs at all utilizations.",
"EfficiencyFavorPower": "This value shall indicate the system performs at reduced speeds at all utilizations to save power at the cost of performance. This mode differs from `PowerSaving` in that more performance is retained and less power is saved. This mode differs from `EfficiencyFavorPerformance` in that less performance is retained but more power is saved. This mode differs from 'BalancedPerformance' in that power saving occurs at all utilizations.",
"MaximumPerformance": "This value shall indicate the system performs at the highest speeds possible. This mode should be used when performance is the top priority.",
"OEM": "This value shall indicate the system performs at an OEM-defined power mode.",
"OSControlled": "This value shall indicate the system performs at an operating system-controlled power mode.",
"PowerSaving": "This value shall indicate the system performs at reduced speeds to save power. This mode should be used when power saving is the top priority.",
"Static": "This value shall indicate the system performs at a static base speed."
},
"enumVersionAdded": {
"EfficiencyFavorPerformance": "v1_22_0",
"EfficiencyFavorPower": "v1_22_0"
},
"type": "string"
}
Example
Request headers:
X-Auth-Token: auth_value
Content-Type: header_type
If-Match: ifmatch_value
Request body:
{
"PowerMode": "BalancedPerformance"
}
Response example:
{
"@odata.context": "/redfish/v1/$metadata#ComputerSystem.ComputerSystem",
"@odata.id":"/redfish/v1/Systems/1",
"@odata.type":"#ComputerSystem.v1_22_0.ComputerSystem",
"Id":"1",
"Name": "ComputerSystem",
...
"PowerMode": "BalancedPerformance",
...
}
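A PATCH request of this shape might be assembled as in the sketch below. The function name `build_power_mode_patch` is hypothetical, and `application/json` as the Content-Type is an assumption, since the example above shows only the placeholder `header_type`. The request is built but not sent.

```python
import json
import urllib.request

# Allowed values, copied from the PowerMode table.
POWER_MODES = {
    "BalancedPerformance", "OSControlled", "EfficiencyFavorPerformance",
    "EfficiencyFavorPower", "MaximumPerformance", "PowerSaving", "Static", "OEM",
}


def build_power_mode_patch(device_ip: str, system_id: str, token: str,
                           etag: str, mode: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets PowerMode."""
    if mode not in POWER_MODES:
        raise ValueError(f"unsupported PowerMode: {mode!r}")
    body = json.dumps({"PowerMode": mode}).encode("utf-8")
    headers = {
        "X-Auth-Token": token,
        "Content-Type": "application/json",  # assumption: JSON payload
        "If-Match": etag,                    # ETag from a previous GET
    }
    url = f"https://{device_ip}/redfish/v1/Systems/{system_id}"
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")
```

Validating the mode on the client side avoids a round trip for values that are outside the listed range.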
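5. Add a Redfish resource for querying the scenario-based energy-saving enablement status (OEM resource)
URI: https://device_ip/redfish/v1/Managers/{manager_id}/EnergySavingService
Change type: new property
Operation type: GET
Use case: an upper-layer NMS or the customer queries the scenario-based energy-saving enablement status
Details: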
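| Property | Value type | Description | Allowed values | Default | Privilege | Changes frequently and needs change-event suppression | Constraints |
|---|---|---|---|---|---|---|---|
| EnergySavingStatus | string, null | The system's current scenario-based energy-saving enablement status | Activated (in effect), Inactivated (not in effect), Unknown (unknown) | When scenario-based energy saving is supported, the enablement status obtained in real time; null when it is not supported | ReadOnly | No | None |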
Schema notes
"EnergySavingStatus": {
"type": [
"string",
"null"
],
"readonly": true,
"description": "The energy saving status of the computer system.",
"longDescription": "This property shall contain the computer system energy saving status.",
"enum": [
"Activated",
"Inactivated",
"Unknown"
],
"enumDescriptions": {
"Activated": "The system energy saving status is activated.",
"Inactivated": "The system energy saving status is inactivated.",
"Unknown": "The system energy saving status is unknown."
},
"enumLongDescriptions": {
"Activated": "This value indicates that the energy-saving system is present and activated.",
"Inactivated": "This value indicates that the energy-saving system is present but not activated.",
"Unknown": "This value indicates that an energy-saving system is present, but its activation status cannot be obtained at the moment."
}
}
Example
Request header: X-Auth-Token: auth_value
Request body: none
Response example:
{
"@odata.context": "/redfish/v1/$metadata#EnergySavingService.EnergySavingService",
"@odata.id": "/redfish/v1/Managers/1/EnergySavingService",
"@odata.type": "#EnergySavingService.v1_0_0.EnergySavingService",
"Description": "EnergySavingService Settings",
"Id": "EnergySavingService",
"Name": "EnergySavingService",
"DynamicEnergySavingScene": "Default",
"EnergySavingStatus": "Unknown",
"Actions": {
"#EnergySavingService.SetScene": {
"target": "/redfish/v1/Managers/1/EnergySavingService/Actions/EnergySavingService.SetScene",
"@Redfish.ActionInfo": "/redfish/v1/Managers/1/EnergySavingService/SetSceneActionInfo"
}
}
}
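A client could interpret the EnergySavingStatus field of such a response roughly as follows. This is a sketch only: the function name `describe_energy_saving_status` and the returned wording are hypothetical, while the value set and the meaning of null come from the table above.

```python
import json

# Meanings taken from the EnergySavingStatus value table.
STATUS_MEANINGS = {
    "Activated": "in effect",
    "Inactivated": "not in effect",
    "Unknown": "unknown",
}


def describe_energy_saving_status(response_body: str) -> str:
    """Map the EnergySavingStatus in an EnergySavingService response to a short description."""
    service = json.loads(response_body)
    status = service.get("EnergySavingStatus")
    if status is None:
        # null means scenario-based energy saving is not supported
        return "not supported"
    if status not in STATUS_MEANINGS:
        raise ValueError(f"unexpected EnergySavingStatus: {status!r}")
    return STATUS_MEANINGS[status]
```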
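6. Add an IPMI command for querying OS energy-efficiency information (called by an OS process)
IPMI command: netfn 30h, cmd 92h
Change type: new parameter value
Use case: an OS process queries energy-efficiency information
Operation type: GET
Privilege: ReadOnly
Details:
Request parameters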
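| Byte | Name | Value |
|---|---|---|
| 1:3 | Manufacturer ID | 0x0007db, least significant byte first |
| 4 | Sub Command | Subcommand, 68h |
| 5 | Requester Identifier | 01h - Node Power Management; 02h - System Management Software |
| 6 | Parameter Selector | 01h - OS energy-efficiency configuration; 02h - OS energy-saving status |
| 7:8 | Offset | Offset of the data to read, relative to the start position, least significant byte first |
| 9 | Length | Length of the data to read |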
Response parameters
| Byte | Name | Value |
|---|---|---|
| 1 | Completion Code | Completion code: 00h Command Completed Normally; D5h Cannot execute command |
| 2:4 | Manufacturer ID | 0x0007db, least significant byte first |
| 5 | Flag | Completion flag: 00h - not finished; 01h - finished |
| 6 | Data Format | Data format: 01h - json; 02h - xml; 03h - tlv |
| 7 | Length | Length of the returned data |
| 8:N | Data | Returned data |
Example
Request
ipmitool raw 0x30 0x92 0xdb 0x07 0x00 0x68 0x01 0x01 0x00 0x00 0x17
Response
00 db 07 00 01 01 17 7b 22 43 50 55 46 72 65 71 75 65 6e 63 79 4c 65 76 65 6c 22 3a 34 7d
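The byte layout of the query command can be made concrete with a short Python sketch that builds the request data bytes (everything after netfn and cmd) and decodes a response according to the tables above. The function names are hypothetical, and treating the payload as UTF-8 JSON when Data Format is 01h is an assumption based on the example response.

```python
import json

MANUFACTURER_ID = 0x0007DB
SUB_COMMAND_QUERY = 0x68
DATA_FORMATS = {0x01: "json", 0x02: "xml", 0x03: "tlv"}


def build_query_request(requester: int, selector: int, offset: int, length: int) -> bytes:
    """Request data bytes: manufacturer ID (LSB first), subcommand, requester, selector, offset (LSB first), length."""
    return (MANUFACTURER_ID.to_bytes(3, "little")
            + bytes([SUB_COMMAND_QUERY, requester, selector])
            + offset.to_bytes(2, "little")
            + bytes([length]))


def parse_query_response(raw: bytes) -> dict:
    """Split a raw response into the fields listed in the response table."""
    if raw[0] != 0x00:
        raise RuntimeError(f"command failed, completion code {raw[0]:02x}h")
    if int.from_bytes(raw[1:4], "little") != MANUFACTURER_ID:
        raise ValueError("unexpected manufacturer ID")
    length = raw[6]
    return {
        "finished": raw[4] == 0x01,
        "data_format": DATA_FORMATS.get(raw[5], "unknown"),
        "data": raw[7:7 + length],
    }


def decode_json_payload(response: dict):
    """Decode the payload when the data format is json (assumed UTF-8)."""
    if response["data_format"] != "json":
        raise ValueError("payload is not json")
    return json.loads(response["data"].decode("utf-8"))
```

Applied to the example above, the response payload decodes to a JSON object with the single key `CPUFrequencyLevel`.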
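7. Add an IPMI command for setting OS energy-efficiency information (called by an OS process)
IPMI command: netfn 30h, cmd 92h
Change type: new parameter value
Use case: an OS process sets energy-efficiency information
Operation type: SET
Privilege: PowerMgmt
Details:
Request parameters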
| Byte | Name | Value |
|---|---|---|
| 1:3 | Manufacturer ID | 0x0007db, least significant byte first |
| 4 | Sub Command | Subcommand, 69h |
| 5 | Requester Identifier | 01h - Node Power Management; 02h - System Management Software |
| 6 | Parameter Selector | 01h - OS energy-efficiency configuration; 02h - OS energy-saving status |
| 7 | Data Format | Data format: 01h - json; 02h - xml; 03h - tlv |
| 8 | Operation | 00h - Write Prepare; 01h - Write Data; 03h - Write Finish |
| (9) | Data Checksum | When Operation is Write Data, the cumulative sum of the data being written. When Operation is Write Finish, the cumulative sum of the entire file. Optional; required only when Operation is Write Data or Write Finish. |
| (10:11) | Offset | When Operation is Write Prepare: file size, LSB first. When Operation is Write Data: offset to write, LSB first, that is, the offset of the data segment from the start of the file. Optional; required only when Operation is Write Prepare or Write Data. |
| (12:n) | Data | Data to write. Optional; required only when Operation is Write Data. |
Response parameters
| Byte | Name | Value |
|---|---|---|
| 1 | Completion Code | Completion code: 00h Command Completed Normally; 80h Checksum Failed; D5h Cannot execute command |
| 2:4 | Manufacturer ID | 0x0007db, least significant byte first |
Example
Request
ipmitool raw 0x30 0x92 0xdb 0x07 0x00 0x69 0x01 0x02 0x01 0x01 0x01 0x01 0x00 0x01
Response
00 db 07 00
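For the Write Data operation, the request data bytes (everything after netfn and cmd) could be assembled as in the sketch below. The function names are hypothetical. The checksum is assumed to be the byte-wise sum of the data truncated to one byte (modulo 256); the table only says "cumulative sum", so this detail should be confirmed against the implementation. Write Prepare and Write Finish are omitted because their exact byte positions are not spelled out above.

```python
MANUFACTURER_ID = 0x0007DB
SUB_COMMAND_SET = 0x69
OP_WRITE_DATA = 0x01


def data_checksum(data: bytes) -> int:
    """Cumulative sum of the data bytes, truncated to one byte (assumption: modulo 256)."""
    return sum(data) & 0xFF


def build_write_data_request(requester: int, selector: int, data_format: int,
                             offset: int, data: bytes) -> bytes:
    """Request data bytes for Operation = Write Data, following the request table."""
    return (MANUFACTURER_ID.to_bytes(3, "little")
            + bytes([SUB_COMMAND_SET, requester, selector, data_format, OP_WRITE_DATA])
            + bytes([data_checksum(data)])
            + offset.to_bytes(2, "little")   # offset of this segment, LSB first
            + data)
```

With requester 01h, selector 02h, format 01h, offset 1, and the single data byte 01h, this produces the same byte sequence as the example request above.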
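Review conclusions
Decision 1: Agreed to add resource collaboration interface properties that indicate whether scenario-based energy saving is supported and what the current energy-saving mode is, as follows:
path: /bmc/kepler/Chassis/:ChassisId/EnergySavingScene
interface: bmc.kepler.Chassis.EnergySavingScene
Change type: new properties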
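| Property | Signature | Access & privilege | Persistence | Change notification | Property source |
|---|---|---|---|---|---|
| IsPowerModeSupported | b | Read-only; Read: ReadOnly | Not persisted | false | CSR |
| PowerMode | s | Read-write; Write: PowerMgmt; Read: ReadOnly | Persisted across power loss | false | User setting |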
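Decision 2: Agreed to add a resource collaboration interface method that gets the system's current scenario-based energy-saving enablement status, as follows:
path: /bmc/kepler/Chassis/:ChassisId/EnergySavingScene
interface: bmc.kepler.Chassis.EnergySavingScene
Change type: new method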
| Method | Request signature | Response signature | Access privilege | Description |
|---|---|---|---|---|
| GetEnergySavingStatus | NA | s | ReadOnly | Gets the scenario-based energy-saving enablement status |
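Decisions 3 & 4: Agreed to add a Redfish property for querying and setting the scenario-based energy-saving mode, as follows:
URI: /redfish/v1/Systems/{SystemId}
Operation type: GET, PATCH
Change type: new property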
| Property | Value type | Allowed values | Privilege | Changes frequently and needs change-event suppression | Constraints |
|---|---|---|---|---|---|
| PowerMode | string, null | BalancedPerformance (default), OSControlled, EfficiencyFavorPerformance, EfficiencyFavorPower, MaximumPerformance, PowerSaving, Static, OEM | GET: ReadOnly; PATCH: PowerMgmt | No | null when scenario-based energy saving is not supported |
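Decision 5: Agreed to add a Redfish OEM property for querying the scenario-based energy-saving enablement status, as follows:
URI: /redfish/v1/Managers/{ManagerId}/EnergySavingService
Operation type: GET
Change type: new property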
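| Property | Value type | Allowed values | Privilege | Changes frequently and needs change-event suppression | Constraints |
|---|---|---|---|---|---|
| EnergySavingStatus | string, null | Activated (in effect), Inactivated (not in effect), Unknown (unknown) | ReadOnly | No | None |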
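Decision 6: Agreed to add the IPMI command for querying OS energy-efficiency information (in-band channel only; see the review point description for the detailed definition)
| netfn | cmd | sub command | Operation type | Privilege |
|---|---|---|---|---|
| 30h | 92h | 68h | GET | ReadOnly |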
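Decision 7: Agreed to add the IPMI command for setting OS energy-efficiency information (in-band channel only; see the review point description for the detailed definition)
| netfn | cmd | sub command | Operation type | Privilege |
|---|---|---|---|---|
| 30h | 92h | 69h | SET | PowerMgmt |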