Zabbix (10): Monitoring IPMI with Zabbix

Introduction to IPMI

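IPMI (Intelligent Platform Management Interface) is a new-generation, general-purpose interface standard that makes hardware management "intelligent". With IPMI, users can monitor a server's physical characteristics, such as temperature, voltage, fan status, power supply status, and chassis intrusion. Its biggest advantage is that it is independent of the CPU, BIOS, and OS, so a server can be monitored whether it is switched on or off, as long as it is connected to power. IPMI is a specification; its most important physical component is the BMC (Baseboard Management Controller, see Figure 1), an embedded management microcontroller that acts as the "brain" of platform management. Through the BMC, IPMI reads the data from each sensor and records events in a log.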

Configuring IPMI on the server

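Log in using the server's default IPMI address, open the IPMI management interface, and change the address to the IP you have planned for it. An Inspur server is used as the example here.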

Modify the Zabbix server configuration file

sed -i '/# StartIPMIPollers=0/aStartIPMIPollers=3' /etc/zabbix/zabbix_server.conf

/etc/init.d/zabbix-server restart
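
The `sed` one-liner above appends another `StartIPMIPollers=3` line each time it is run. As a hedged alternative, the following sketch sets the value idempotently. The function name `set_conf_value` and the scratch file are illustrative assumptions, not part of Zabbix, and the demo deliberately works on a temporary file rather than on `/etc/zabbix/zabbix_server.conf`.

```shell
# Set or update an active "KEY=VALUE" line in a Zabbix-style config file.
# Usage: set_conf_value FILE KEY VALUE   (hypothetical helper)
set_conf_value() {
    conf_file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$conf_file"; then
        # The key is already active: rewrite that line via a temporary copy.
        tmp_file=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$conf_file" > "$tmp_file"
        cat "$tmp_file" > "$conf_file"
        rm -f "$tmp_file"
    else
        # The key is missing or only present as a comment: append an active line.
        printf '%s=%s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Demo on a scratch file that mimics the commented-out default.
demo_conf=$(mktemp)
printf '### Option: StartIPMIPollers\n# StartIPMIPollers=0\n' > "$demo_conf"
set_conf_value "$demo_conf" StartIPMIPollers 3
set_conf_value "$demo_conf" StartIPMIPollers 3   # running it again adds nothing
cat "$demo_conf"
```

Pointing the function at the real configuration file and restarting the Zabbix server afterwards would have the same effect as the commands above.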

Install the IPMI-related packages

yum -y install OpenIPMI OpenIPMI-devel ipmitool freeipmi

Retrieve the IPMI sensor readings

ipmitool -I lanplus -H 10.206.2.52 -U admin -P passwd sensor

CPU0_Temp | 55.000 | degrees C | ok | na | na | na | 101.000 | 103.000 | na
CPU1_Temp | 52.000 | degrees C | ok | na | na | na | 101.000 | 103.000 | na
PCH_Temp | 50.000 | degrees C | ok | na | na | na | 95.000 | 100.000 | na
DIMMG0_Temp | 45.000 | degrees C | ok | na | na | na | 83.000 | 93.000 | na
DIMMG1_Temp | 43.000 | degrees C | ok | na | na | na | 83.000 | 93.000 | na
Inlet_Temp | 25.000 | degrees C | ok | na | na | na | 40.000 | 42.000 | na
Outlet_Temp | 48.000 | degrees C | ok | na | na | na | na | na | na
SYS0_Temp | 42.000 | degrees C | ok | na | na | na | na | na | na
SYS1_Temp | 35.000 | degrees C | ok | na | na | na | na | na | na
RAID_Temp | na | degrees C | na | na | na | na | 105.000 | 115.000 | na
GPU0_Temp | na | degrees C | na | na | na | na | 82.000 | 92.000 | na
GPU1_Temp | na | degrees C | na | na | na | na | 82.000 | 92.000 | na
MIC0_Temp | na | degrees C | na | na | na | na | 104.000 | 114.000 | na
MIC1_Temp | na | degrees C | na | na | na | na | 104.000 | 114.000 | na
SYS_VCCIO | 0.950 | Volts | ok | 0.690 | 0.770 | 0.850 | 1.170 | 1.250 | 1.330
SYS_12V | 12.126 | Volts | ok | 9.024 | 9.776 | 10.528 | 13.536 | 14.288 | 15.040
SYS_3.3V | 3.360 | Volts | ok | 2.660 | 2.800 | 2.940 | 3.658 | 3.798 | 3.938
SYS_5V | 5.220 | Volts | ok | 3.888 | 4.176 | 4.464 | 5.544 | 5.832 | 6.120
PCH_P1V05 | 1.050 | Volts | ok | 0.770 | 0.850 | 0.930 | 1.170 | 1.250 | 1.330
PCH_P1V5 | 1.540 | Volts | ok | 1.180 | 1.260 | 1.340 | 1.670 | 1.750 | 1.830
CPU0_VCORE | 1.770 | Volts | ok | 1.040 | 1.120 | 1.200 | 2.300 | 2.380 | 2.460
CPU1_VCORE | 1.780 | Volts | ok | 1.040 | 1.120 | 1.200 | 2.300 | 2.380 | 2.460
CPU0_DDR_VDDQAB | 1.210 | Volts | ok | 0.910 | 0.990 | 1.070 | 1.330 | 1.410 | 1.490
CPU0_DDR_VDDQCD | 1.230 | Volts | ok | 0.910 | 0.990 | 1.070 | 1.330 | 1.410 | 1.490
CPU1_DDR_VDDQEF | 1.230 | Volts | ok | 0.910 | 0.990 | 1.070 | 1.330 | 1.410 | 1.490
CPU1_DDR_VDDQGH | 1.220 | Volts | ok | 0.910 | 0.990 | 1.070 | 1.330 | 1.410 | 1.490
FAN_0 | 5760.000 | RPM | ok | na | 0.000 | na | na | na | na
FAN_1 | na | RPM | na | na | 0.000 | na | na | na | na
FAN_2 | 5760.000 | RPM | ok | na | 0.000 | na | na | na | na
FAN_3 | na | RPM | na | na | 0.000 | na | na | na | na
FAN_4 | 5952.000 | RPM | ok | na | 0.000 | na | na | na | na
FAN_5 | na | RPM | na | na | 0.000 | na | na | na | na
FAN_6 | 5952.000 | RPM | ok | na | 0.000 | na | na | na | na
FAN_7 | na | RPM | na | na | 0.000 | na | na | na | na
CPU0_Status | 0x0 | discrete | 0x8080| na | na | na | na | na | na
CPU1_Status | 0x0 | discrete | 0x8080| na | na | na | na | na | na
MEM_CHA0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHA1_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHA2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHB0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHB1_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHB2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHC0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHC1_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHC2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHD0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHD1_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHD2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHE0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHE1_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHE2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHF0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHF1_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHF2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHG0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHG1_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHG2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHH0_Status | 0x0 | discrete | 0x4080| na | na | na | na | na | na
MEM_CHH1_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
MEM_CHH2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD0_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD1_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD2_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD3_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD4_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD5_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD6_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD7_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD8_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD9_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD10_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD11_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD12_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD13_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD14_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD15_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD16_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD17_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD18_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD19_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD20_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD21_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD22_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD23_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD24_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD0_Rear_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD1_Rear_Status | 0x0 | discrete | 0x0180| na | na | na | na | na | na
HDD2_Rear_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
HDD3_Rear_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
Total_Power | 335.000 | Watts | ok | na | na | na | na | na | na
PSU0_Supply | 0x0 | discrete | 0x0180| na | na | na | na | na | na
PSU1_Supply | 0x0 | discrete | 0x0180| na | na | na | na | na | na
PSU0_Unit | 0x0 | discrete | 0x0080| na | na | na | na | na | na
PSU1_Unit | 0x0 | discrete | 0x0080| na | na | na | na | na | na
EventLog | 0x0 | discrete | 0x0080| na | na | na | na | na | na
ME_FW_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
BMC_Boot_Up | 0x0 | discrete | 0x0280| na | na | na | na | na | na
IPMI_Watchdog | 0x0 | discrete | 0x0080| na | na | na | na | na | na
RISER0_Temp | 48.000 | degrees C | ok | na | na | na | na | na | na
RISER1_Temp | na | degrees C | na | na | na | na | na | na | na
HDD_REAR0_Temp | 36.000 | degrees C | ok | na | na | na | 60.000 | 70.000 | na
HDD_REAR1_Temp | na | degrees C | na | na | na | na | 60.000 | 70.000 | na
NVME_0_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_1_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_2_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_3_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_4_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_5_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_6_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
NVME_7_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
All_Unit_Status | 0x0 | discrete | 0x0080| na | na | na | na | na | na
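
Each row of the `ipmitool sensor` output is pipe-separated: sensor name, reading, unit, status, then the three lower thresholds and the three upper thresholds (non-critical, critical, non-recoverable). A minimal sketch for pulling one column of one sensor out of that output is shown below; the helper name `sensor_field` is an assumption made for illustration, and the sample rows are copied from the listing above.

```shell
# Print one column of one sensor from `ipmitool sensor` output read on stdin.
# Usage: sensor_field NAME COLUMN   (hypothetical helper; columns start at 1)
sensor_field() {
    awk -F'|' -v name="$1" -v col="$2" '
        {
            # Trim leading and trailing blanks from every field.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        }
        $1 == name { print $col; exit }
    '
}

# Two rows taken from the output above.
sample_rows='CPU0_Temp | 55.000 | degrees C | ok | na | na | na | 101.000 | 103.000 | na
FAN_1 | na | RPM | na | na | 0.000 | na | na | na | na'

printf '%s\n' "$sample_rows" | sensor_field CPU0_Temp 2   # current reading
printf '%s\n' "$sample_rows" | sensor_field CPU0_Temp 9   # upper critical threshold
```

Against a live BMC, the same function could read from `ipmitool ... sensor` through a pipe instead of from the sample variable.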

Add the host in the Zabbix web interface

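Modify the template and add items

Many items in the built-in template return no data, so here I query the sensors manually and add the ones that actually return data to the template myself.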

ipmitool -I lanplus -H 10.206.2.52 -U admin -P admin sensor list

ipmitool -I lanplus -H 10.206.2.52 -U admin -P admin sensor get "CPU0_Temp"
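
To decide which sensors are worth adding as items, it can help to filter the sensor list down to those whose reading is not `na`. The sketch below does that with `awk`; the function name `sensors_with_data` is an illustrative assumption, and the sample rows are taken from the sensor listing earlier in this article.

```shell
# Print the names of sensors whose reading column is not "na".
# Reads `ipmitool sensor list` style output on stdin (hypothetical helper).
sensors_with_data() {
    awk -F'|' '
        {
            name = $1; reading = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", reading)
            if (name != "" && reading != "na") print name
        }
    '
}

# Sample rows from the earlier listing.
sample_sensors='CPU0_Temp | 55.000 | degrees C | ok | na | na | na | 101.000 | 103.000 | na
RAID_Temp | na | degrees C | na | na | na | na | 105.000 | 115.000 | na
FAN_0 | 5760.000 | RPM | ok | na | 0.000 | na | na | na | na
FAN_1 | na | RPM | na | na | 0.000 | na | na | na | na
CPU0_Status | 0x0 | discrete | 0x8080| na | na | na | na | na | na'

printf '%s\n' "$sample_sensors" | sensors_with_data
```

Each name printed this way is a candidate for the sensor name of an IPMI item in the template. Discrete sensors such as `CPU0_Status` are kept because they report a value, even though it is not a numeric measurement.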

Additional reference

Network interface commands

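ipmitool lan print 1  # show the network configuration of channel 1
ipmitool lan set 1 ipsrc static  # use a static IP address
ipmitool lan set 1 ipaddr 172.16.10.205  # set the IP address
ipmitool lan set 1 netmask 255.255.0.0  # set the subnet mask
ipmitool lan set 1 access on  # enable LAN access on the channel
ipmitool lan set 1 defgw ipaddr 172.16.10.1  # set the default gateway

ipmitool lan set 2 defgw macaddr <x:x:x:x:x:x>  # set the MAC address of the default gateway on channel 2
ipmitool lan set 2 ipsrc dhcp  # have channel 2 obtain its IP address via DHCP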
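
The static-address commands for a channel are usually run together, so they can be wrapped in a small function. The sketch below is a dry run by default: it only prints the commands and executes them only when the variable `IPMI_APPLY` is set to `1`. The names `run`, `configure_lan`, and `IPMI_APPLY` are assumptions introduced for this example.

```shell
# Print a command, or execute it when IPMI_APPLY=1 (hypothetical switch).
run() {
    if [ "${IPMI_APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Give a LAN channel a static address.
# Usage: configure_lan CHANNEL IP NETMASK GATEWAY
configure_lan() {
    run ipmitool lan set "$1" ipsrc static
    run ipmitool lan set "$1" ipaddr "$2"
    run ipmitool lan set "$1" netmask "$3"
    run ipmitool lan set "$1" defgw ipaddr "$4"
    run ipmitool lan print "$1"
}

# Dry run with the values used in the commands above.
configure_lan 1 172.16.10.205 255.255.0.0 172.16.10.1
```

Reviewing the printed commands first is a cheap safeguard, since a wrong address or netmask can make the BMC unreachable over the network.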

User management commands

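ipmitool user list chan-id  # list all users on the given channel
ipmitool user list 1  # list the users on channel 1

ipmitool user set password <user id> [<password>]  # change a user's password
ipmitool user set password 2 "111.com"  # 2 is the ID of the user whose password is being set; the new password is 111.com

ipmitool user disable <user id>  # disable a user
ipmitool user enable <user id>  # enable a user
ipmitool user priv <user id> <privilege level> [<channel number>]  # change a user's privilege level on a channel
ipmitool user test <user id> <16|20> [<password>]  # test a user's password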

Remote power control

ipmitool -I lanplus -H 10.206.2.52 -U username -P Password chassis power off
ipmitool -I lanplus -H 10.206.2.52 -U username -P Password chassis power on
ipmitool -I lanplus -H 10.206.2.52 -U username -P Password chassis power reset
ipmitool -I lanplus -H 10.206.2.52 -U username -P Password chassis power cycle

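Note: the difference between power cycle and power reset is that the former powers the system off and waits about 1 second before powering it back on, whereas the latter performs a hard reset and the system comes back up almost immediately.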
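
Because a mistyped power action can take a production server down, a thin wrapper that validates the action before building the command may be useful. The sketch below only prints the command it would run. The function name `power_cmd` is an assumption, and it uses ipmitool's `-E` option, which reads the password from the `IPMI_PASSWORD` environment variable so that it does not appear on the command line.

```shell
# Build an ipmitool power command after validating the requested action.
# Usage: power_cmd HOST USER ACTION   (hypothetical helper)
# The password is expected in the IPMI_PASSWORD environment variable (-E).
power_cmd() {
    case "$3" in
        status|on|off|cycle|reset|soft) ;;
        *)
            echo "unsupported power action: $3" >&2
            return 1
            ;;
    esac
    printf 'ipmitool -I lanplus -H %s -U %s -E chassis power %s\n' "$1" "$2" "$3"
}

# Print the command for a harmless status query.
power_cmd 10.206.2.52 admin status
```

The printed line can be inspected and then run by hand, or the `printf` can be replaced by a direct `ipmitool` call once the wrapper is trusted.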

ipmitool -I lanplus -H 10.206.2.52 -U admin -P admin mc reset warm  # restart the BMC (warm reset)

Boot settings

ipmitool chassis bootdev bios  # stop in the BIOS setup menu after the next reboot
ipmitool chassis bootdev pxe  # boot from PXE after the next reboot

System commands

ipmitool mc info  # show BMC version information
ipmitool bmc reset cold  # cold-reset the BMC
ipmitool bmc reset warm  # warm-reset the BMC

Reading system status

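ipmitool sensor list  # list all sensors in the system
ipmitool fru list  # list all field-replaceable units (FRUs) in the system
ipmitool sdr list  # list the sensor data records in the SDR Repository
ipmitool pef list  # list the platform event filtering (PEF) entries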

System event log

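ipmitool sel elist  # show all system event log (SEL) entries
ipmitool sel clear  # delete all system event log entries
ipmitool sel delete ID  # delete the SEL entry with the given ID
ipmitool sel time get  # show the BMC's current time
ipmitool sel time set XXX  # set the BMC's time; XXX is a time string of the form "MM/DD/YYYY HH:MM:SS"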
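
The time string passed to `ipmitool sel time set` has the form `MM/DD/YYYY HH:MM:SS`. As a small sketch, the local system time can be formatted that way with `date`; the function name `sel_time_now` is an assumption for illustration, and the command is printed rather than executed so that the string can be checked first.

```shell
# Format the current local time as "MM/DD/YYYY HH:MM:SS" (hypothetical helper).
sel_time_now() {
    date '+%m/%d/%Y %H:%M:%S'
}

# Print the command instead of running it.
printf 'ipmitool sel time set "%s"\n' "$(sel_time_now)"
```

Whether the BMC clock should follow local time or UTC depends on the environment, so the `date` invocation may need a `-u` flag.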

Channel commands

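ipmitool channel info  # show information about the system's default channel
ipmitool channel authcap channel-number privilege  # show the channel's authentication capabilities at the given privilege level
ipmitool channel getaccess channel-number user-id  # show a user's access on the channel
ipmitool channel setaccess channel-number user-id callin=on ipmi=on link=on privilege=5  # set a user's access on the channel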

Watchdog commands

ipmitool mc watchdog get  # show the current watchdog settings
ipmitool mc watchdog off  # turn the watchdog off
ipmitool mc watchdog reset  # restart the watchdog from the most recently configured countdown

References:

https://www.cnblogs.com/kaishirenshi/p/9703127.html

https://zhuanlan.zhihu.com/p/477139556