We have found that some event logs are not implemented consistently across the Redfish, Web, and SNMP interfaces.
For example, PCIeCardOMOverTemp
is defined in vpd\vendor\event_def.json as:
{
"EventCode": "0x08000063",
"ReportChannel": 65535,
"OldEventCode": "",
"EventType": 0,
"LifeCycleId": 0,
"DeassertFlag": 1,
"EventKeyId": "PcieCard.PCIeCardOMOverTemp",
"SeverityId": 1,
"ActionId": 0,
"EventName": "PCIeCardOMOverTemp"
},
A matching definition for this event can be found in rackmount\interface_config\redfish\static_resource\redfish\v1\registrystore\messages\en\ibmcevents.v3_0_0.json:
"PCIeCardOMOverTemp": {
"Description": null,
"Message": "The %1 %2 optical module %3 temperature (%4 degrees C) exceeds the overtemperature threshold (%5 degrees C).",
"Severity": "Warning",
"NumberOfArgs": 5,
"ParamTypes": [
"string",
"string",
"string",
"string",
"string"
],
"Resolution": "1. Check for fan alarms. 2. Check the equipment room temperature. 3. Check for air inlet or outlet blockage. 4. Replace the optical module.",
"Oem": {
"{{OemIdentifier}}": {
"@odata.type": "#HwBMCEvent.v1_0_0.HwBMCEvent",
"EventId": "0x08000063",
"EventName": "PCIeCardOMOverTemp",
"EventEffect": null,
"EventCause": null
}
}
}
A matching definition can also be found in rackmount\interface_config\snmp\mib\HUAWEI-SERVER-iBMC-MIB.mib:
hwPCIeCardOMOverTemp NOTIFICATION-TYPE
OBJECTS { hwTrapSeq, hwTrapSensorName, hwTrapEvent, hwTrapSeverity, hwTrapEventCode, hwTrapEventData2, hwTrapEventData3, hwTrapServerIdentity, hwTrapLocation, hwTrapTime }
STATUS current
DESCRIPTION
"PCIe card optical module overheating minor alarm. (Generated)"
::= { hwPCIeCardEvent 99 }
hwPCIeCardOMOverTempDeassert NOTIFICATION-TYPE
OBJECTS { hwTrapSeq, hwTrapSensorName, hwTrapEvent, hwTrapSeverity, hwTrapEventCode, hwTrapEventData2, hwTrapEventData3, hwTrapServerIdentity, hwTrapLocation, hwTrapTime }
STATUS current
DESCRIPTION
"PCIe card optical module overheating minor alarm. (Cleared)"
::= { hwPCIeCardEvent 100 }
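Does this mean the event is implemented on all three interfaces (Redfish, Web, and SNMP)?
By contrast, PCIeCardTempFail is defined only in vpd\vendor\event_def.json; we could not find it in the Redfish registry or in the SNMP MIB. Does that mean this event alarm is not implemented on the Redfish and SNMP interfaces? There are many similar cases, for example DiskLinkSpeedReduced, DiskMaxTempMajorInfo, VPDReadFail, Event_FirmwareFailure, and Event_InspectFail. The event_def.json entry for PCIeCardTempFail is: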
{
"EventCode": "0x08000005",
"ReportChannel": 65535,
"OldEventCode": "0x2800FFFF",
"EventType": 0,
"LifeCycleId": 1,
"DeassertFlag": 1,
"EventKeyId": "PcieCard.PCIeCardTempFail",
"SeverityId": 1,
"ActionId": 0,
"EventName": "PCIeCardTempFail"
},
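To list every event in this situation rather than checking names by hand, the comparison above can be sketched as a small script. This is only a rough sketch under several assumptions that are not confirmed by the files themselves: that every event in both JSON files carries an "EventName" string somewhere in its entry (as in the excerpts above), that MIB trap names are the event name with an "hw" prefix and an optional "Deassert" suffix (inferred from the single PCIeCardOMOverTemp example), and that the three files are passed as command-line arguments. The function names are made up for illustration. A missing definition found this way only shows that the name is absent from the file; it does not by itself prove whether the interface reports the event.

```python
import json
import re
import sys
from pathlib import Path


def collect_event_names(node):
    """Recursively collect every string stored under an "EventName" key."""
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "EventName" and isinstance(value, str):
                names.add(value)
            else:
                names |= collect_event_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= collect_event_names(item)
    return names


# Matches lines such as "hwPCIeCardOMOverTemp NOTIFICATION-TYPE".
NOTIFICATION_RE = re.compile(r"^\s*hw(\w+)\s+NOTIFICATION-TYPE", re.MULTILINE)


def collect_mib_names(mib_text):
    """Return trap names with the assumed "hw" prefix and "Deassert" suffix removed."""
    names = set()
    for name in NOTIFICATION_RE.findall(mib_text):
        if name.endswith("Deassert"):
            name = name[: -len("Deassert")]
        names.add(name)
    return names


def coverage_report(event_def, registry, mib_text):
    """Map each event in event_def to whether the other two files also define it."""
    redfish = collect_event_names(registry)
    snmp = collect_mib_names(mib_text)
    return {
        name: {"redfish": name in redfish, "snmp": name in snmp}
        for name in sorted(collect_event_names(event_def))
    }


def load_json(path):
    """Read a JSON file as UTF-8."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Usage: python check_events.py event_def.json ibmcevents.v3_0_0.json HUAWEI-SERVER-iBMC-MIB.mib
    event_def_path, registry_path, mib_path = sys.argv[1:4]
    report = coverage_report(
        load_json(event_def_path),
        load_json(registry_path),
        Path(mib_path).read_text(encoding="utf-8", errors="replace"),
    )
    for name, found in report.items():
        missing = [channel for channel, ok in found.items() if not ok]
        if missing:
            print(f"{name}: no definition found in {', '.join(missing)}")
```

Run against the three files quoted above, this should print one line per event whose name appears in event_def.json but not in the Redfish registry or the MIB, assuming the naming conventions hold.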
In summary, we would like an explanation of the questions above.
Also, what steps should we normally follow when adding a new event?