[Reviewed] Add a resource collaboration interface and a Redfish interface for querying and configuring memory failure prediction rules

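Background

To make memory failure prediction rules configurable, a resource collaboration interface and a Redfish interface are needed for querying and configuring these rules. The configurable items are the time threshold and the count threshold.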

Related issue

https://gitcode.com/openUBMC/rackmount/issues/795

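Overall solution

Add the collection resource /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies. Memory failure prediction rules are queried with GET and configured with PATCH on the member resources of this collection.

Add a resource tree collaboration interface, with its methods and properties, for getting and setting memory failure prediction rules.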

Review points

Review point 1:

Add a MemHardFailureDetectionPolicies resource link under the existing URI /redfish/v1/Managers/{ManagerId}/FDMService.

Review point 2:

Add the collection resource MemHardFailureDetectionPolicies.

URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies

Review point 3:

Add the member resource of the collection:

/redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}

Review point 4:

Add path: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id}

Add the resource tree collaboration interface bmc.kepler.Managers.FDMService.Memory.Diagnosis.
New property on the interface: MemType
New methods on the interface: GetMemHardFailureDetectionPolicyIds, GetMemHardFailureDetectionPolicy, SetMemHardFailureDetectionPolicy

Detailed description

Review point 1: add the MemHardFailureDetectionPolicies resource link

Resource URI: /redfish/v1/Managers/{ManagerId}/FDMService
Resource version: HwFDMService.v1_1_0

Property list

Property | Type | Example / default / constraints | readonly | Volatile | PATCH implemented | Privilege | Description
MemHardFailureDetectionPolicies | object | {"@odata.id": "/redfish/v1/Managers/manager_id/FDMService/MemHardFailureDetectionPolicies"} | / | / | / | ReadOnly | Link to the MemHardFailureDetectionPolicies resource collection

Schema definition
Schema file name: hwfdmservice.v1_1_0.json

"MemHardFailureDetectionPolicies": {
                    "additionalProperties": false,
                    "description": "The link to the collection of memory hard failure detection policies used to query and configure memory hard failure detection policies.",
                    "longDescription": "The link to the collection of memory hard failure detection policies used to query and configure memory hard failure detection policies.",
                    "patternProperties": {
                        "^([a-zA-Z_][a-zA-Z0-9_]*)?@(odata|Redfish|Message|Privileges)\\.[a-zA-Z_][a-zA-Z0-9_.]+$": {
                            "description": "This property shall specify a valid odata or Redfish property",
                            "type": [
                                "array",
                                "boolean",
                                "integer",
                                "number",
                                "null",
                                "object",
                                "string"
                            ]
                        }
                    },
                    "properties": {
                        "@odata.id": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/id"
                        }
                    },
                    "required": [
                        "@odata.id"
                    ],
                    "type": "object"
                }

Request sample

Operation: GET
https://device_ip/redfish/v1/Managers/1/FDMService
Request body: none

Response sample

{
  "@odata.context": "/redfish/v1/$metadata#Managers/1/FDMService/$entity",
  "@odata.id": "/redfish/v1/Managers/1/FDMService",
  "@odata.type": "#HwFDMService.v1_1_0.HwFDMService",
  "Id": "FDMService",
  "Name": "Fault Diagnostic Management Service",
  "MemPoorContactAlarmEnabled": true,
  "DiagnoseFailurePolicy": "NoAction",
  "DiagnoseSuccessPolicy": "NoAction",
  "MemHardFailureDetectionMode": "ExpertRule",
  "MemHardFailureDetectionPolicies": {
     "@odata.id":"/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies"
  },
  "MemFaultIsolationEnabled": false, 
  "MemFaultIsolationMode": "SelfDecision",
  "MemFaultIsolationSubFunctionSwitch": {
    "MemPageOfflineEnabled": false,
    "MemADDDCEnabled": false,
    "MemSoftPPREnabled": false,
    "MemHardPPREnabled": false,
    "MemACLSEnabled": false,
    "MemRowSparingEnabled": true
  },
  "MaxMemPageOfflineCount": 10000,
  …
}

Review point 2: add the collection resource /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies

Resource URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies

Resource version: openUBMCMemoryHardFailureDetectionPolicyCollection

Property list

  • Note 1: The table below omits the common base properties mandated by the Redfish specification.
  • Note 2: All properties in the table support GET by default.
Property | Type | Example / default / constraints | readonly | Volatile | PATCH implemented | Privilege | Description
Members@odata.count | integer | 3 | / | / | / | ReadOnly | Number of members in the MemHardFailureDetectionPolicies collection
Members | array | [{"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1"}, {"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/2"}, {"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/3"}] | / | / | / | ReadOnly | Links to the individual resources in the MemHardFailureDetectionPolicies collection

Schema definition
New schema file: openubmcmemoryhardfailuredetectionpolicycollection.json

"@odata.context": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/context"
                        },
                        "@odata.etag": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/etag"
                        },
                        "@odata.id": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/id"
                        },
                        "@odata.type": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/type"
                        },
                        "Description": {
                            "anyOf": [
                                {
                                    "$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Description"
                                },
                                {
                                    "type": "null"
                                }
                            ],
                            "readonly": true
                        },
                        "Members": {
                            "description": "The members of this collection.",
                            "items": {
                                "$ref": "http://redfish.dmtf.org/schemas/v1/openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.json#/definitions/openUBMCMemoryHardFailureDetectionPolicy"
                            },
                            "longDescription": "This property shall contain an array of links to the members of this collection.",
                            "readonly": true,
                            "type": "array"
                        },
                        "Members@odata.count": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/count"
                        },
                        "Members@odata.nextLink": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/odata-v4.json#/definitions/nextLink"
                        },
                        "Name": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Name",
                            "readonly": true
                        },
                        "Oem": {
                            "$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Oem",
                            "description": "The OEM extension property.",
                            "longDescription": "This property shall contain the OEM extensions.  All values for properties contained in this object shall conform to the Redfish Specification-described requirements."
                        }
                    },
                    "required": [
                        "Members",
                        "Members@odata.count",
                        "@odata.id",
                        "@odata.type",
                        "Name"
                    ],
                    "type": "object"
                }

Request sample

Operation: GET
https://device_ip/redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Request body: none

Response sample

{
    "@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies",
    "@odata.type": "#openUBMCMemoryHardFailureDetectionPolicyCollection.openUBMCMemoryHardFailureDetectionPolicyCollection",
    "@odata.context": "/redfish/v1/$metadata#openUBMCMemoryHardFailureDetectionPolicyCollection.openUBMCMemoryHardFailureDetectionPolicyCollection",
    "Name": "Memory Hard Failure Detection Policy Collection",
    "Members@odata.count": 3,
    "Members": [
        {
            "@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1"
        },
        {
            "@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/2"
        },
        {
            "@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/3"
        }
    ]
}
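
A client usually needs the policy IDs from this collection before it can address a single policy. The following is a minimal sketch of that step, assuming the response body has the shape shown above. The helper names member_uris and policy_id are hypothetical and are not part of the reviewed interface.

```python
import json

# Body trimmed from the response sample above; only the fields used here are kept.
COLLECTION_BODY = """
{
    "Members@odata.count": 3,
    "Members": [
        {"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1"},
        {"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/2"},
        {"@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/3"}
    ]
}
"""


def member_uris(collection):
    """Return the @odata.id of every member and check the declared count."""
    uris = [member["@odata.id"] for member in collection.get("Members", [])]
    declared = collection.get("Members@odata.count")
    if declared is not None and declared != len(uris):
        raise ValueError(f"Members@odata.count is {declared}, found {len(uris)} members")
    return uris


def policy_id(uri):
    """Take the last path segment of a member URI as the policy ID (hypothetical helper)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


uris = member_uris(json.loads(COLLECTION_BODY))
ids = [policy_id(uri) for uri in uris]
print(ids)
```

The IDs are kept as strings because the member resource reports "Id" as a string.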

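Review point 3: add the resource /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}

Resource URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}
Resource version: openUBMCMemoryHardFailureDetectionPolicy.v1_0_0
Property list

  • Note 1: The table below omits the common base properties mandated by the Redfish specification (such as @odata.id, @odata.type, Id, and Name; for collection resources also Members and Members@odata.count). These must all be implemented by default.
  • Note 2: All properties in the table support GET by default.
Property | Type | Example / default / constraints | readonly | Volatile | PATCH implemented | Privilege | Description
FailureType | string (enum) | Failure type. Values: CellFailure (cell failure); RowFailure (row failure) | true | | | ReadOnly | The failure type this resource applies to
RuleName | string (enum) | Prediction rule name. Values: CEInSameAddress (count of correctable errors at the same address, applies to cell failures only); CEInSameRow (count of correctable errors in the same row, applies to row failures only); MultiBurstError (a single CE with multi-burst errors, applies to row failures only) | true | | | ReadOnly | The prediction rule this resource applies to
PeriodSeconds | integer | Observation period in seconds, range (0, 2592000] | false | | | GET: ReadOnly; PATCH: DiagnoseMgmt | Observation period of the prediction rule
Threshold | integer | Threshold within the observation period, range [2, 9] | false | | | GET: ReadOnly; PATCH: DiagnoseMgmt | Threshold within the observation period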
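
The value ranges for PeriodSeconds and Threshold can be checked before a PATCH is sent. The sketch below shows one way to do this, assuming only PeriodSeconds and Threshold are writable, as the property list states. The function name validate_patch_body and the error message wording are hypothetical, and the service's actual error responses are not specified in this post.

```python
MAX_PERIOD_SECONDS = 2592000  # upper bound from the property list (30 days in seconds)
THRESHOLD_RANGE = (2, 9)      # inclusive bounds from the property list


def _is_int(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_patch_body(body):
    """Return a list of problems found in a PATCH body; an empty list means it is acceptable."""
    errors = []
    for key in body:
        if key not in ("PeriodSeconds", "Threshold"):
            errors.append(f"{key}: not a writable property")
    if "PeriodSeconds" in body:
        period = body["PeriodSeconds"]
        # Range is (0, 2592000]: zero is rejected, the upper bound is accepted.
        if not _is_int(period) or not 0 < period <= MAX_PERIOD_SECONDS:
            errors.append("PeriodSeconds: must be an integer in (0, 2592000]")
    if "Threshold" in body:
        threshold = body["Threshold"]
        low, high = THRESHOLD_RANGE
        if not _is_int(threshold) or not low <= threshold <= high:
            errors.append("Threshold: must be an integer in [2, 9]")
    return errors


print(validate_patch_body({"PeriodSeconds": 3600, "Threshold": 2}))
print(validate_patch_body({"PeriodSeconds": 0, "Threshold": 10}))
```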

Schema definition
New schema file: openubmcmemoryhardfailuredetectionpolicy.v1_0_0.json

{
    "$schema": "http://redfish.dmtf.org/schemas/v1/redfish-schema-v1.json",
    "$id": "http://redfish.dmtf.org/schemas/v1/openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.json",
    "title": "#openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.openUBMCMemoryHardFailureDetectionPolicy",
    "$ref": "#/definitions/openUBMCMemoryHardFailureDetectionPolicy",
    "copyright": "Copyright 2014-2026 DMTF. For the full DMTF copyright policy, see http://www.dmtf.org/about/policies/copyright",
    "owningEntity": "openUBMC",
    "definitions": {
        "openUBMCMemoryHardFailureDetectionPolicy": {
            "type": "object",
            "patternProperties": {
                "^([a-zA-Z_][a-zA-Z0-9_]*)?@(odata|Redfish|Message|Privileges)\\.[a-zA-Z_][a-zA-Z0-9_.]+$": {
                    "type": [
                        "array",
                        "boolean",
                        "number",
                        "null",
                        "object",
                        "string"
                    ],
                    "description": "This property shall specify a valid odata or Redfish property."
                }
            },
            "additionalProperties": false,
            "properties": {
                "@odata.context": {
                    "$ref": "http://redfish.dmtf.org/schemas/v1/odata.4.0.0.json#/definitions/context"
                },
                "@odata.id": {
                    "$ref": "http://redfish.dmtf.org/schemas/v1/odata.4.0.0.json#/definitions/id"
                },
                "@odata.type": {
                    "$ref": "http://redfish.dmtf.org/schemas/v1/odata.4.0.0.json#/definitions/type"
                },
                "Id": {
                    "$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Id",
                    "readonly": true
                },
                "Name": {
                    "$ref": "http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/Name",
                    "readonly": true
                },
                "FailureType": {
                    "type": "string",
                    "readonly": true,
                    "description": "The memory hard failure type.",
                    "longDescription": "The memory hard failure type."
                },
                "RuleName": {
                    "type": "string",
                    "readonly": true,
                    "description": "The memory hard failure detection rule name.",
                    "longDescription": "The memory hard failure detection rule name."
                },
                "PeriodSeconds": {
                    "type": [
                        "integer",
                        "null"
                    ],
                    "readonly": false,
                    "description": "Detection period by seconds.",
                    "longDescription": "Detection period by seconds."
                },
                "Threshold": {
                    "type": [
                        "integer",
                        "null"
                    ],
                    "readonly": false,
                    "description": "The count threshold for the rule.",
                    "longDescription": "The count threshold for the rule."
                }
            },
            "required": [
                "@odata.context",
                "@odata.id",
                "@odata.type",
                "Id",
                "Name",
                "FailureType",
                "RuleName",
                "PeriodSeconds",
                "Threshold"
            ],
            "description": "The MemoryHardFailureDetectionPolicy resource.",
            "longDescription": "The MemoryHardFailureDetectionPolicy resource."
        }
    }
}

GET request sample

Operation: GET
https://device_ip/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1
Request body: none

GET response sample

{
    "@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1",
    "@odata.context": "/redfish/v1/$metadata#openUBMCMemoryHardFailureDetectionPolicy.openUBMCMemoryHardFailureDetectionPolicy",
    "@odata.type": "#openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.openUBMCMemoryHardFailureDetectionPolicy",
    "Id": "1",
    "Name": "Memory Hard Failure Detection Policy",
    "FailureType": "CellFailure",
    "RuleName": "CEInSameAddress",
    "PeriodSeconds": 3600,
    "Threshold": 3
}

PATCH request sample

Operation: PATCH
https://device_ip/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1
Privilege: DiagnoseMgmt
Request body:
{
     "PeriodSeconds": 3600,
     "Threshold": 2
}

PATCH response sample

{
    "@odata.id": "/redfish/v1/Managers/1/FDMService/MemHardFailureDetectionPolicies/1",
    "@odata.context": "/redfish/v1/$metadata#openUBMCMemoryHardFailureDetectionPolicy.openUBMCMemoryHardFailureDetectionPolicy",
    "@odata.type": "#openUBMCMemoryHardFailureDetectionPolicy.v1_0_0.openUBMCMemoryHardFailureDetectionPolicy",
    "Id": "1",
    "Name": "Memory Hard Failure Detection Policy",
    "FailureType": "CellFailure",
    "RuleName": "CEInSameAddress",
    "PeriodSeconds": 3600,
    "Threshold": 2
}
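
The PATCH exchange above can be prepared from a script. The following sketch builds the request with Python's standard urllib.request without sending it. The host name bmc.example.com and the helper name build_patch_request are assumptions for illustration, and authentication is left out because this post does not describe it.

```python
import json
import urllib.request


def build_patch_request(host, manager_id, policy_id, period_seconds=None, threshold=None):
    """Build (but do not send) a PATCH request for one detection policy."""
    body = {}
    if period_seconds is not None:
        body["PeriodSeconds"] = period_seconds
    if threshold is not None:
        body["Threshold"] = threshold
    if not body:
        raise ValueError("at least one of period_seconds or threshold is required")
    url = (
        f"https://{host}/redfish/v1/Managers/{manager_id}"
        f"/FDMService/MemHardFailureDetectionPolicies/{policy_id}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# bmc.example.com is a placeholder host, not a real device address.
request = build_patch_request("bmc.example.com", "1", "1", period_seconds=3600, threshold=2)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it. A real deployment would also need credentials and TLS certificate handling, which are outside the scope of this sketch.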

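Review point 4: add the resource collaboration interface with its properties and methods

path: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id} (new)
interface: bmc.kepler.Managers.FDMService.Memory.Diagnosis (new)
Change type: new interface with new methods
Use case: querying and configuring memory failure prediction rules

New properties

Property name | Signature | Read-only | Change notification | Description | Access privilege | Source | Persistence type | Volatile
MemType | s | true | false | Medium type, for example "DDR"; default: "" | Read: ReadOnly | csr | false | false

New methods

Method name | Request signature | Request parameters | Response signature | Response parameters | Description | Access privilege
GetMemHardFailureDetectionPolicyIds | / | / | ay | IDs of all rules, for example [1,2,3] | Gets the IDs of all rules | ReadOnly
GetMemHardFailureDetectionPolicy | y | ID that identifies the rule | a{ss} | Observation period and in-period threshold of the rule, for example [{"PeriodSeconds": "3600"}, {"Threshold": "3"}] | Gets the observation period and in-period threshold of the specified prediction rule | ReadOnly
SetMemHardFailureDetectionPolicy | ya{ss} | First parameter (y): ID that identifies the rule, range [1,3]. Second parameter (a{ss}): the threshold types to set and their values, for example [{"PeriodSeconds": "3600"}, {"Threshold": "3"}] | / | / | Sets the thresholds of the specified rule | DiagnoseMgmt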
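
To make the three method signatures concrete, the sketch below models them with an in-memory Python class. It assumes the signatures are D-Bus style type codes, where y is a byte, ay is an array of bytes, and a{ss} is a string-to-string dictionary. The class MemoryDiagnosisModel, its method names, and the stored values are hypothetical stand-ins and do not represent the actual openUBMC implementation.

```python
class MemoryDiagnosisModel:
    """Hypothetical in-memory stand-in for the three methods listed above."""

    WRITABLE_KEYS = ("PeriodSeconds", "Threshold")

    def __init__(self, policies):
        # policies maps a rule ID to its a{ss}-style values (all values are strings).
        self._policies = {int(rule_id): dict(values) for rule_id, values in policies.items()}

    def get_policy_ids(self):
        """Models GetMemHardFailureDetectionPolicyIds (response: ay)."""
        return sorted(self._policies)

    def get_policy(self, rule_id):
        """Models GetMemHardFailureDetectionPolicy (request: y, response: a{ss})."""
        return dict(self._policies[rule_id])

    def set_policy(self, rule_id, values):
        """Models SetMemHardFailureDetectionPolicy (request: ya{ss}, no response)."""
        if rule_id not in self._policies:
            raise KeyError(f"unknown rule ID {rule_id}")
        # Validate every entry first so a bad entry leaves the stored rule unchanged.
        for key, text in values.items():
            if key not in self.WRITABLE_KEYS:
                raise ValueError(f"unsupported key {key!r}")
            int(text)  # values travel as decimal strings; raises ValueError otherwise
        self._policies[rule_id].update(values)


# The stored values below are illustrative only.
model = MemoryDiagnosisModel({1: {"PeriodSeconds": "3600", "Threshold": "3"}})
model.set_policy(1, {"Threshold": "2"})
print(model.get_policy_ids(), model.get_policy(1))
```

Because a{ss} carries every value as a string, a caller has to convert numbers to decimal strings before the call and back after it.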

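Ready for AI pre-review

Review conclusion

Approved. The detailed conclusions are as follows:

1. Approved: add a link to the subordinate MemHardFailureDetectionPolicies resource in the FDMService resource.
URI: /redfish/v1/Managers/{ManagerId}/FDMService
Operation: GET
Response: adds the MemHardFailureDetectionPolicies link; see review point 1 for details.

2. Approved: add the collection resource MemHardFailureDetectionPolicies.
URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies
Operation: GET
Privilege: ReadOnly
Response: see review point 2 for details.

3. Approved: add the MemHardFailureDetectionPolicy resource.
URI: /redfish/v1/Managers/{ManagerId}/FDMService/MemHardFailureDetectionPolicies/{PolicyId}
The resource has the following properties:

  • FailureType: the failure type this resource applies to. Type string (enum), read-only, GET only, privilege ReadOnly.
  • RuleName: the prediction rule this resource applies to. Type string (enum), read-only, GET only, privilege ReadOnly.
  • PeriodSeconds: the observation period of the prediction rule. Type integer, supports GET and PATCH, privilege GET: ReadOnly, PATCH: DiagnoseMgmt.
  • Threshold: the threshold within the observation period. Type integer, supports GET and PATCH, privilege GET: ReadOnly, PATCH: DiagnoseMgmt.

4. Approved: add the following resource collaboration object path and interface.
path: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id}
interface: bmc.kepler.Managers.FDMService.Memory.Diagnosis

Properties:

  • MemType: the medium type. Read-only, privilege ReadOnly.

Methods:

  • GetMemHardFailureDetectionPolicyIds: gets the IDs of all rules. Privilege ReadOnly. No request parameters. Response signature ay, the IDs of all rules.
  • GetMemHardFailureDetectionPolicy: gets the observation period and in-period threshold of the specified prediction rule. Privilege ReadOnly. Request parameter y, the rule ID. Response signature a{ss}, the observation period and in-period threshold of the rule.
  • SetMemHardFailureDetectionPolicy: sets the thresholds of the specified rule. Privilege DiagnoseMgmt. Request parameters ya{ss}, where y is the rule ID and a{ss} gives the threshold types to set and their values. No response parameters.

Open issues

1. The resource collaboration object FDMConfig is a "god class"; consider whether it can be split into layers.
Conclusion: use a layered design. The new resource collaboration object path is changed to: /bmc/kepler/Managers/{ManagerId}/FDMService/MemoryConfig/{Id}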

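The pre-review answer below is AI-generated and for reference only:

Post review conclusion: [Pending review] Add a resource collaboration interface and a Redfish interface for querying and configuring memory failure prediction rules

Post relevance: Redfish-related (the post content contains the keyword 'redfish')

  • Total review points: 6
  • Redfish-related review points: 6
  • Non-Redfish review points: 0

Overall result: failed (4 of 6 review points passed)

Review point 1:

Result: passed

  • URI example generation: passed
  • Schema validation: error (unknown error)
  • Rule compliance check: passed (all 15 checks passed)

Review point 2:

Result: passed

  • URI example generation: passed
  • Schema validation: error (unknown error)
  • Rule compliance check: passed (all 15 checks passed)

Review point 3:

Result: failed

  • URI example generation: passed
  • Schema validation: error (unknown error)
  • Rule compliance check: failed (1 of 15 checks failed)
    • Failed rule compliance items (mandatory)
      • [RULE-015] Resource operation privilege specification (rule compliance check): the review point provides only a JSON response example and contains no property table or operation privilege (ReadOnly/ReadWrite) column, so it does not meet the requirement to state operability explicitly.
        • Suggestion: add a property definition table to the review point with an "operation privilege" or "ReadOnly" column, and mark each property as ReadOnly or ReadWrite (for example: PolicyEnabled: ReadWrite, FailureThreshold: ReadWrite, Status: ReadOnly).

Review point 4: add the MemHardFailureDetectionPolicies resource link

Result: passed

  • URI example generation: passed
  • Schema validation: passed (0 items passed)
  • Rule compliance check: passed (all 15 checks passed)

Review point 5: add the collection resource /redfish/v1/Managers/manager_id/FDMService/MemHardFailureDetectionPolicies

Result: passed

  • URI example generation: passed
  • Schema validation: error (unknown error)
  • Rule compliance check: passed (all 15 checks passed)

Review point 6: add the resource /redfish/v1/Managers/manager_id/FDMService/MemHardFailureDetectionPolicies/X

Result: failed

  • URI example generation: passed
  • Schema validation: error (unknown error)
  • Rule compliance check: failed (1 of 15 checks failed)
    • Failed rule compliance items (mandatory)
      • [RULE-010] Data type specification (rule compliance check): the example value of the Id property is the integer 1, which does not meet the Redfish schema requirement that Id be of type string.
        • Suggestion: change "Id": 1 to "Id": "1" so that the Id value is a string, in line with the Redfish definition of the standard Id field.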