Question about northbound event logs

We have found that some event logs are not implemented consistently across Redfish, Web, and SNMP.
Take PCIeCardOMOverTemp as an example.
Its definition in vpd\vendor\event_def.json is:
    {
        "EventCode": "0x08000063",
        "ReportChannel": 65535,
        "OldEventCode": "",
        "EventType": 0,
        "LifeCycleId": 0,
        "DeassertFlag": 1,
        "EventKeyId": "PcieCard.PCIeCardOMOverTemp",
        "SeverityId": 1,
        "ActionId": 0,
        "EventName": "PCIeCardOMOverTemp"
    },
A matching definition for this event can be found in rackmount\interface_config\redfish\static_resource\redfish\v1\registrystore\messages\en\ibmcevents.v3_0_0.json:

    "PCIeCardOMOverTemp": {
        "Description": null,
        "Message": "The %1 %2 optical module %3 temperature (%4 degrees C) exceeds the overtemperature threshold (%5 degrees C).",
        "Severity": "Warning",
        "NumberOfArgs": 5,
        "ParamTypes": [
            "string",
            "string",
            "string",
            "string",
            "string"
        ],
        "Resolution": "1. Check for fan alarms. 2. Check the equipment room temperature. 3. Check for air inlet or outlet blockage. 4. Replace the optical module.",
        "Oem": {
            "{{OemIdentifier}}": {
                "@odata.type": "#HwBMCEvent.v1_0_0.HwBMCEvent",
                "EventId": "0x08000063",
                "EventName": "PCIeCardOMOverTemp",
                "EventEffect": null,
                "EventCause": null
            }
A matching definition can also be found in rackmount\interface_config\snmp\mib\HUAWEI-SERVER-iBMC-MIB.mib:

hwPCIeCardOMOverTemp NOTIFICATION-TYPE
OBJECTS { hwTrapSeq, hwTrapSensorName, hwTrapEvent, hwTrapSeverity, hwTrapEventCode, hwTrapEventData2, hwTrapEventData3, hwTrapServerIdentity, hwTrapLocation, hwTrapTime }
STATUS current
DESCRIPTION
"PCIe card optical module overheating minor alarm. (Generated)"
::= { hwPCIeCardEvent 99 }
hwPCIeCardOMOverTempDeassert NOTIFICATION-TYPE
OBJECTS { hwTrapSeq, hwTrapSensorName, hwTrapEvent, hwTrapSeverity, hwTrapEventCode, hwTrapEventData2, hwTrapEventData3, hwTrapServerIdentity, hwTrapLocation, hwTrapTime }
STATUS current
DESCRIPTION
"PCIe card optical module overheating minor alarm. (Cleared)"
::= { hwPCIeCardEvent 100 }
Does this mean the event is implemented on all three interfaces: Redfish, Web, and SNMP?

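However, an event such as PCIeCardTempFail is defined only in vpd\vendor\event_def.json, and we could not find it in either the Redfish or the SNMP files. Does that mean this event alarm is not implemented on the Redfish and SNMP interfaces? There are many similar cases, for example DiskLinkSpeedReduced, DiskMaxTempMajorInfo, VPDReadFail, Event_FirmwareFailure, and Event_InspectFail.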

    {
        "EventCode": "0x08000005",
        "ReportChannel": 65535,
        "OldEventCode": "0x2800FFFF",
        "EventType": 0,
        "LifeCycleId": 1,
        "DeassertFlag": 1,
        "EventKeyId": "PcieCard.PCIeCardTempFail",
        "SeverityId": 1,
        "ActionId": 0,
        "EventName": "PCIeCardTempFail"
    },

In summary, we would like an explanation of the questions above.
Also, what steps should we normally follow when adding a new event?

About the inconsistency

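  1. ibmcevents.v3_0_0.json is generated dynamically from event_def.json at build time, so what the Redfish interface returns is consistent with event_def. The file under the static_resource directory can be treated as a stale copy and does not need to be maintained.
  2. SNMP traps and alarm codes always map one to one (barring mismatched or missing configuration); they are just defined differently. They cannot be looked up by the keyword you searched for; the match is made through entries such as hwPCIeCardEvent 100. For the details of the SNMP configuration logic, please ask the rackmount owner.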
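
If you want to double-check the two points above on your side, a rough Python sketch like the one below may help. It is only an illustration, not part of the build tooling. It assumes event_def.json boils down to a list of objects shaped like the snippets in the question, and that each registry message carries its event code under Oem -> (vendor key) -> EventId, as in the ibmcevents snippet. The function names missing_from_registry and list_traps are made up for this example. The first helper compares by event code rather than by name. The second only lists the trap names and their "::= { parent N }" assignments from MIB text; it does not know how those numbers relate to alarm codes, which is still a question for the rackmount owner.

```python
import re

def missing_from_registry(event_defs, messages):
    """Return EventNames whose EventCode has no matching EventId in the registry messages."""
    registry_ids = set()
    for entry in messages.values():
        # "Oem" holds one object per vendor key; collect every EventId found there.
        for oem in (entry.get("Oem") or {}).values():
            if oem.get("EventId"):
                registry_ids.add(oem["EventId"].lower())
    return [d["EventName"] for d in event_defs
            if d["EventCode"].lower() not in registry_ids]

# Matches "<name> NOTIFICATION-TYPE ... ::= { <parent> <number> }".
TRAP_RE = re.compile(
    r"(\w+)\s+NOTIFICATION-TYPE.*?::=\s*\{\s*(\w+)\s+(\d+)\s*\}", re.DOTALL)

def list_traps(mib_text):
    """Return (trap name, parent node, sub-identifier) for each NOTIFICATION-TYPE."""
    return [(name, parent, int(num))
            for name, parent, num in TRAP_RE.findall(mib_text)]

# Sample data trimmed from the snippets in the question (assumed structure).
event_defs = [
    {"EventCode": "0x08000063", "EventName": "PCIeCardOMOverTemp"},
    {"EventCode": "0x08000005", "EventName": "PCIeCardTempFail"},
]
messages = {
    "PCIeCardOMOverTemp": {
        "Oem": {"{{OemIdentifier}}": {"EventId": "0x08000063",
                                      "EventName": "PCIeCardOMOverTemp"}}
    }
}
mib_text = '''
hwPCIeCardOMOverTemp NOTIFICATION-TYPE
OBJECTS { hwTrapSeq, hwTrapSensorName }
STATUS current
DESCRIPTION
"PCIe card optical module overheating minor alarm. (Generated)"
::= { hwPCIeCardEvent 99 }
hwPCIeCardOMOverTempDeassert NOTIFICATION-TYPE
OBJECTS { hwTrapSeq, hwTrapSensorName }
STATUS current
DESCRIPTION
"PCIe card optical module overheating minor alarm. (Cleared)"
::= { hwPCIeCardEvent 100 }
'''

print(missing_from_registry(event_defs, messages))
print(list_traps(mib_text))
```

Run the first helper against the registry file produced by the build rather than the copy under static_resource; the sample data here mimics the stale copy, which is why PCIeCardTempFail shows up as missing.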

For adding a new event, refer to the event customization DOC.

If you have further questions, feel free to ask, and I will add the answers to the documentation.