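Background
To make the memory failure prediction rules configurable, a resource collaboration interface and a Redfish interface are needed for querying and configuring these rules. The configurable items are the time threshold (observation period) and the count threshold.
Related issue
https://gitcode.com/openUBMC/rackmount/issues/795
Overall design
Add the collection resource /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies. The collection supports GET; each member resource supports GET and PATCH to query and configure a memory failure prediction rule.
Add a resource-tree collaboration interface, with its methods and properties, for getting and setting the memory failure prediction rules.
Review points
Review point 1:
URI: under the existing /redfish/v1/Managers/{ManagerId}/FDMService, add a link to the MemHardFailureDetectionPolicies resource
Review point 2:
Add the collection resource MemHardFailureDetectionPolicies
URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Review point 3:
Add the member resource of the collection
/redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}
Review point 4:
New path: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id}
New resource-tree collaboration interface: bmc.kepler.Managers.FDMService.Memory.Diagnosis
New property on the interface: MemType
New methods on the interface: GetMemHardFailureDetectionPolicyIds, GetMemHardFailureDetectionPolicy, SetMemHardFailureDetectionPolicy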
Detailed description
Review point 1: add the MemHardFailureDetectionPolicies resource link
Resource URI: /redfish/v1/Managers/{ManagerId}/FDMService
Resource version: HwFDMService.v1_1_0
Property list:
| Property | Type | Example / default / constraints | readonly | Volatile | PATCH implemented | Privilege | Description |
|---|---|---|---|---|---|---|---|
| MemHardFailureDetectionPolicies | object | { "@odata.id": "/redfish/v1/Managers/manager_id/FDMService/MemHardFailureDetectionPolicies" } | / | / | / | ReadOnly | Contains the link to the MemHardFailureDetectionPolicies resource collection |
Schema definition:
Schema file name: hwfdmservice.v1_1_0.json
"MemHardFailureDetectionPolicies": {
"additionalProperties": false,
"description": "The link to the collection of memory hard failure detection policies used to query and configure memory hard failure detection policies.",
"longDescription": "The link to the collection of memory hard failure detection policies used to query and configure memory hard failure detection policies.",
"patternProperties": {
"^([a-zA-Z_][a-zA-Z0-9_]*)?@(odata|Redfish|Message|Privileges)\\.[a-zA-Z_][a-zA-Z0-9_.]+$": {
"description": "This property shall specify a valid odata or Redfish property",
"type": [
"array",
"boolean",
"integer",
"number",
"null",
"object",
"string"
]
}
},
"properties": {
"@odata.id": {
"$ref": "http://redfish.dmtf.org/schemas/odata-v4.json#/definitions/id"
}
},
"required": [
"@odata.id"
],
"type": "object"
}
Request example
Operation: GET
https://device_ip/redfish/v1/Managers/1/FDMService
Request body: none
Response example
{
"@odata.context": "/redfish/v1/$metadata#Managers/1/FDMService/$entity",
"@odata.id": "/redfish/v1/Managers/1/FDMService",
"@odata.type": "#HwFDMService.v1_1_0.HwFDMService",
"Id": "FDMService",
"Name": "Fault Diagnostic Management Service",
"MemPoorContactAlarmEnabled": true,
"DiagnoseFailurePolicy": "NoAction",
"DiagnoseSuccessPolicy": "NoAction",
"MemHardFailureDetectionMode": "ExpertRule",
"MemHardFailureDetectionPolicies": {
"@odata.id":"/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies"
},
"MemFaultIsolationEnabled": false,
"MemFaultIsolationMode": "SelfDecision",
"MemFaultIsolationSubFunctionSwitch": {
"MemPageOfflineEnabled": false,
"MemADDDCEnabled": false,
"MemSoftPPREnabled": false,
"MemHardPPREnabled": false,
"MemACLSEnabled": false,
"MemRowSparingEnabled": true
},
"MaxMemPageOfflineCount": 10000,
…
}
Review point 2: add the collection resource /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Resource URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Resource version: openUBMCMemoryHardFailureDetectionPolicyCollection
Property list:
- Note 1: The table below omits the common base properties mandated by the Redfish specification.
- Note 2: All properties in the table support GET by default.
| Property | Type | Example / default / constraints | readonly | Volatile | PATCH implemented | Privilege | Description |
|---|---|---|---|---|---|---|---|
| Members@odata.count | integer | 3 | / | / | / | ReadOnly | Number of members in the MemHardFailureDetectionPolicies collection |
| Members | array | [{"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1"},{"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/2"},{"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/3"}] | / | / | / | ReadOnly | Contains the links to each resource in the MemHardFailureDetectionPolicies collection |
Schema definition:
New schema file: openubmcmemoryhardfailuredetectionpolicycollection.json
"@odata.context": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/context"
},
"@odata.etag": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/etag"
},
"@odata.id": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/id"
},
"@odata.type": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/type"
},
"Description": {
"anyOf": [
{
"$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Description"
},
{
"type": "null"
}
],
"readonly": true
},
"Members": {
"description": "The members of this collection.",
"items": {
"$ref": "http://redfish.dmtf.org/schemas/v1/openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.json#/definitions/openUBMCMemoryHardFailureDetectionPolicy"
},
"longDescription": "This property shall contain an array of links to the members of this collection.",
"readonly": true,
"type": "array"
},
"Members@odata.count": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/count"
},
"Members@odata.nextLink": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/nextLink"
},
"Name": {
"$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Name",
"readonly": true
},
"Oem": {
"$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Oem",
"description": "The OEM extension property.",
"longDescription": "This property shall contain the OEM extensions. All values for properties contained in this object shall conform to the Redfish Specification-described requirements."
}
},
"required": [
"Members",
"Members@odata.count",
"@odata.id",
"@odata.type",
"Name"
],
"type": "object"
}
Request example
Operation: GET
https://ip/redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Request body: none
Response example
{
"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies",
"@odata.type": "#openUBMCMemoryHardFailureDetectionPolicyCollection.openUBMCMemoryHardFailureDetectionPolicyCollection",
"@odata.context": "/redfish/v1/$metadata#openUBMCMemoryHardFailureDetectionPolicyCollection.openUBMCMemoryHardFailureDetectionPolicyCollection",
"Name": "Memory Hard Failure Detection Policy Collection",
"Members@odata.count": 3,
"Members": [
{
"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1"
},
{
"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/2"
},
{
"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/3"
}
]
}
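A client would typically read the collection first and then follow each `@odata.id` link. The following is a minimal sketch of that traversal, not part of the reviewed design: `list_policies` and `make_stub_fetch` are hypothetical names, and the stub stands in for an authenticated HTTP GET so the example runs without a device.

```python
from typing import Callable, Dict, List

COLLECTION_URI = "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies"


def list_policies(fetch: Callable[[str], dict],
                  collection_uri: str = COLLECTION_URI) -> List[dict]:
    """Return every policy resource linked from the collection."""
    collection = fetch(collection_uri)
    members = collection["Members"]
    # The count annotation should agree with the number of links returned.
    if collection["Members@odata.count"] != len(members):
        raise ValueError("Members@odata.count does not match Members length")
    return [fetch(member["@odata.id"]) for member in members]


def make_stub_fetch() -> Callable[[str], dict]:
    """Hypothetical in-memory stand-in for an authenticated HTTP GET."""
    resources: Dict[str, dict] = {
        COLLECTION_URI: {
            "Members@odata.count": 3,
            "Members": [{"@odata.id": f"{COLLECTION_URI}/{i}"} for i in (1, 2, 3)],
        }
    }
    for i in (1, 2, 3):
        uri = f"{COLLECTION_URI}/{i}"
        resources[uri] = {"@odata.id": uri, "Id": str(i)}
    return lambda uri: resources[uri]


policies = list_policies(make_stub_fetch())
print([policy["Id"] for policy in policies])
```

Passing the fetch function in as a parameter keeps the traversal logic independent of the HTTP client and session handling, which this document does not specify.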
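Review point 3: add the resource /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}
Resource URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}
Resource version: openUBMCMemoryHardFailureDetectionPolicy.v1_0_0
Property list:
- Note 1: The table below omits the common base properties mandated by the Redfish specification (such as @odata.id, @odata.type, Id, and Name; for collection resources also Members and Members@odata.count). All of them must be implemented by default.
- Note 2: All properties in the table support GET by default.
| Property | Type | Example / default / constraints | readonly | Volatile | PATCH implemented | Privilege | Description |
|---|---|---|---|---|---|---|---|
| FailureType | string(enum) | Failure type. Allowed values: CellFailure (cell failure); RowFailure (row failure) | true | No | No | ReadOnly | Failure type that this resource applies to |
| RuleName | string(enum) | Prediction rule name. Allowed values: CEInSameAddress (count of correctable errors at the same address; applies to cell failures only); CEInSameRow (count of correctable errors in the same row; applies to row failures only); MultiBurstError (a single CE showing multi-burst errors; applies to row failures only) | true | No | No | ReadOnly | Prediction rule that this resource applies to |
| PeriodSeconds | integer | Observation period in seconds; range (0, 2592000] | false | No | Yes | GET: ReadOnly; PATCH: DiagnoseMgmt | Observation period of the prediction rule |
| Threshold | integer | Threshold within the observation period; range [2, 9] | false | No | Yes | GET: ReadOnly; PATCH: DiagnoseMgmt | Threshold within the observation period |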
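The constraints in the table can be checked before a PATCH body is accepted. The sketch below is one possible way to express them and is not taken from the implementation: `validate_patch_body` and the error strings are hypothetical, and only the property names, read-only flags, and value ranges come from the table.

```python
# Inclusive integer bounds derived from the property table.
WRITABLE_RANGES = {
    "PeriodSeconds": (1, 2592000),  # (0, 2592000] expressed for integers
    "Threshold": (2, 9),            # [2, 9]
}
READ_ONLY = {"FailureType", "RuleName"}


def validate_patch_body(body: dict) -> list:
    """Return a list of human-readable errors; an empty list means the body is acceptable."""
    errors = []
    for name, value in body.items():
        if name in READ_ONLY:
            errors.append(f"{name} is read-only")
        elif name not in WRITABLE_RANGES:
            errors.append(f"{name} is not a known property")
        elif isinstance(value, bool) or not isinstance(value, int):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            errors.append(f"{name} must be an integer")
        else:
            low, high = WRITABLE_RANGES[name]
            if not low <= value <= high:
                errors.append(f"{name} must be between {low} and {high}")
    return errors


print(validate_patch_body({"PeriodSeconds": 3600, "Threshold": 2}))
print(validate_patch_body({"Threshold": 10, "RuleName": "CEInSameRow"}))
```

Collecting every violation instead of stopping at the first one lets a service report all offending properties in a single response, if it chooses to.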
Schema definition:
New schema file: openubmcmemoryhardfailuredetectionpolicy.v1_0_0.json
{
"$schema": "http://redfish.dmtf.org/schemas/v1/redfish-schema-v1.json",
"$id": "http://redfish.dmtf.org/schemas/v1/openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.json",
"title": "#openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.openUBMCMemoryHardFailureDetectionPolicy",
"$ref": "#/definitions/openUBMCMemoryHardFailureDetectionPolicy",
"copyright": "Copyright 2014-2026 DMTF. For the full DMTF copyright policy, see http://www.dmtf.org/about/policies/copyright",
"owningEntity": "openUBMC",
"definitions": {
"openUBMCMemoryHardFailureDetectionPolicy": {
"type": "object",
"patternProperties": {
"^([a-zA-Z_][a-zA-Z0-9_]*)?@(odata|Redfish|Message|Privileges)\\.[a-zA-Z_][a-zA-Z0-9_.]+$": {
"type": [
"array",
"boolean",
"integer",
"number",
"null",
"object",
"string"
],
"description": "This property shall specify a valid odata or Redfish property."
}
},
"additionalProperties": false,
"properties": {
"@odata.context": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata.4.0.0.json#/definitions/context"
},
"@odata.id": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata.4.0.0.json#/definitions/id"
},
"@odata.type": {
"$ref": "http://redfish.dmtf.org/schemas/v1/odata.4.0.0.json#/definitions/type"
},
"Id": {
"$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Id",
"readonly": true
},
"Name": {
"$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Name",
"readonly": true
},
"FailureType": {
"type": "string",
"readonly": true,
"description": "The memory hard failure type.",
"longDescription": "The memory hard failure type."
},
"RuleName": {
"type": "string",
"readonly": true,
"description": "The memory hard failure detection rule name.",
"longDescription": "The memory hard failure detection rule name."
},
"PeriodSeconds": {
"type": [
"integer",
"null"
],
"readonly": false,
"description": "Detection period by seconds.",
"longDescription": "Detection period by seconds."
},
"Threshold": {
"type": [
"integer",
"null"
],
"readonly": false,
"description": "The count threshold for the rule.",
"longDescription": "The count threshold for the rule."
}
},
"required": [
"@odata.context",
"@odata.id",
"@odata.type",
"Id",
"Name",
"FailureType",
"RuleName",
"PeriodSeconds",
"Threshold"
],
"description": "The MemoryHardFailureDetectionPolicy resource.",
"longDescription": "The MemoryHardFailureDetectionPolicy resource."
}
}
}
GET request example
Operation: GET
https://ip/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1
Request body: none
GET response example
{
"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1",
"@odata.context": "/redfish/v1/$metadata#openUBMCMemoryHardFailureDetectionPolicy.openUBMCMemoryHardFailureDetectionPolicy",
"@odata.type": "#openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.openUBMCMemoryHardFailureDetectionPolicy",
"Id": "1",
"Name": "Memory Hard Failure Detection Policy",
"FailureType": "CellFailure",
"RuleName": "CEInSameAddress",
"PeriodSeconds": 3600,
"Threshold": 3
}
PATCH request example
Operation: PATCH
https://ip/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1
Privilege: DiagnoseMgmt
Request body:
{
"PeriodSeconds": 3600,
"Threshold": 2
}
PATCH response example
{
"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1",
"@odata.context": "/redfish/v1/$metadata#openUBMCMemoryHardFailureDetectionPolicy.openUBMCMemoryHardFailureDetectionPolicy",
"@odata.type": "#openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.openUBMCMemoryHardFailureDetectionPolicy",
"Id": "1",
"Name": "Memory Hard Failure Detection Policy",
"FailureType": "CellFailure",
"RuleName": "CEInSameAddress",
"PeriodSeconds": 3600,
"Threshold": 2
}
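For illustration, the PATCH request above could be assembled with the Python standard library as follows. This is a sketch under assumptions: `build_patch_request` is a hypothetical helper, the `X-Auth-Token` header assumes Redfish session authentication is in use, and a given BMC may additionally require an `If-Match` header carrying the resource ETag. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_patch_request(host: str, policy_id: int, period_seconds: int,
                        threshold: int, session_token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for one detection policy."""
    url = (f"https://{host}/redfish/v1/Managers/1/FDMService/"
           f"MemHardFailureDetectionPolicies/{policy_id}")
    body = json.dumps({"PeriodSeconds": period_seconds,
                       "Threshold": threshold}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the caller already holds a session token for a user
        # with the DiagnoseMgmt privilege.
        "X-Auth-Token": session_token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


request = build_patch_request("bmc.example", 1, 3600, 2, "example-token")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(request)` with whatever TLS context the deployment requires.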
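Review point 4: add the resource collaboration interface with its properties and methods
path: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id} (new)
interface: bmc.kepler.Managers.FDMService.Memory.Diagnosis (new)
Change type: new interface with its properties and methods
Use case: querying and configuring the memory failure prediction rules
New properties
| Property | Signature | Read-only | Change notification | Description | Access privilege | Source | Persistence type | Volatile |
|---|---|---|---|---|---|---|---|---|
| MemType | s | true | false | Media type, for example "DDR"; default: "" | Read: ReadOnly | csr | false | false |
New methods
| Method | Request signature | Request parameters | Response signature | Response parameters | Description | Access privilege |
|---|---|---|---|---|---|---|
| GetMemHardFailureDetectionPolicyIds | / | / | ay | Ids of all rules, for example [1,2,3] | Gets the Ids of all rules | ReadOnly |
| GetMemHardFailureDetectionPolicy | y | Id that identifies the rule | a{ss} | Observation period of the rule and the threshold within that period, for example: [{"PeriodSeconds": "3600"}, {"Threshold": "3"}] | Gets the observation period of the specified prediction rule and the threshold within that period | ReadOnly |
| SetMemHardFailureDetectionPolicy | ya{ss} | The first parameter (y) is the Id that identifies the rule, range [1,3]. The second parameter (a{ss}) gives the threshold types to set and their values, for example: [{"PeriodSeconds": "3600"}, {"Threshold": "3"}] | / | / | Sets the thresholds of the specified rule | DiagnoseMgmt |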
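Because the `a{ss}` signature carries both keys and values as strings, a caller has to convert the numeric fields itself. The sketch below models the three methods in memory to show that round trip; it is an illustration only. `MemoryDiagnosisStub` and `decode_policy` are hypothetical, the initial values are copied from the examples in the table, and the real interface is reached over the resource tree rather than through a Python class.

```python
from typing import Dict, List


class MemoryDiagnosisStub:
    """Hypothetical in-memory model of bmc.kepler.Managers.FDMService.Memory.Diagnosis."""

    def __init__(self) -> None:
        # Illustrative initial values; real defaults are not stated in this document.
        self._policies: Dict[int, Dict[str, str]] = {
            policy_id: {"PeriodSeconds": "3600", "Threshold": "3"}
            for policy_id in (1, 2, 3)
        }

    def GetMemHardFailureDetectionPolicyIds(self) -> List[int]:
        return sorted(self._policies)

    def GetMemHardFailureDetectionPolicy(self, policy_id: int) -> Dict[str, str]:
        return dict(self._policies[policy_id])

    def SetMemHardFailureDetectionPolicy(self, policy_id: int,
                                         values: Dict[str, str]) -> None:
        if policy_id not in self._policies:
            raise KeyError(f"unknown policy Id {policy_id}")
        unknown = set(values) - {"PeriodSeconds", "Threshold"}
        if unknown:
            raise ValueError(f"unsupported keys: {sorted(unknown)}")
        for value in values.values():
            int(value)  # raises ValueError when the string is not a decimal integer
        self._policies[policy_id].update(values)


def decode_policy(raw: Dict[str, str]) -> Dict[str, int]:
    """Convert the string-valued a{ss} payload into integers."""
    return {key: int(value) for key, value in raw.items()}


stub = MemoryDiagnosisStub()
stub.SetMemHardFailureDetectionPolicy(1, {"Threshold": "2"})
print(stub.GetMemHardFailureDetectionPolicyIds())
print(decode_policy(stub.GetMemHardFailureDetectionPolicy(1)))
```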
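Ready for AI pre-review
Yes
Review conclusion
Approved. The specific conclusions are as follows:
1. Agreed to add, in the FDMService resource, a link to the subordinate MemHardFailureDetectionPolicies resource
URI: /redfish/v1/Managers/{ManagerId}/FDMService
Operation: GET
Response: adds the MemHardFailureDetectionPolicies link; see review point 1 for details
2. Agreed to add the collection resource MemHardFailureDetectionPolicies
URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Operation: GET
Privilege: ReadOnly
Response: see review point 2 for details
3. Agreed to add the MemHardFailureDetectionPolicy resource
URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}
The resource contains the following properties:
- FailureType: the failure type that the resource applies to; type string(enum); read-only; supports GET only; privilege ReadOnly
- RuleName: the prediction rule that the resource applies to; type string(enum); read-only; supports GET only; privilege ReadOnly
- PeriodSeconds: the observation period of the prediction rule; type integer; supports GET and PATCH; privilege GET: ReadOnly, PATCH: DiagnoseMgmt
- Threshold: the threshold within the observation period; type integer; supports GET and PATCH; privilege GET: ReadOnly, PATCH: DiagnoseMgmt
4. Agreed to add the following resource collaboration object path and interface
path: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id}
interface: bmc.kepler.Managers.FDMService.Memory.Diagnosis
Properties:
- MemType: the media type; read-only; privilege ReadOnly
Methods:
- GetMemHardFailureDetectionPolicyIds: gets the Ids of all rules; privilege ReadOnly; no request parameters; response signature ay, the Ids of all rules
- GetMemHardFailureDetectionPolicy: gets the observation period of the specified prediction rule and the threshold within that period; privilege ReadOnly; request parameter y, the Id of the rule; response signature a{ss}, the observation period and the threshold within that period
- SetMemHardFailureDetectionPolicy: sets the thresholds of the specified rule; privilege DiagnoseMgmt; request parameters ya{ss}, where y is the Id of the rule and a{ss} gives the threshold types to set and their values; no response parameters
Open issues
1. The resource collaboration object FDMConfig is a "God class"; consider whether it can be split into layers
Conclusion: adopt a layered design. The path of the new resource collaboration object is changed to: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id}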