How to avoid alarm flapping in Zabbix 3.2
One of the most common problems we face when we first start monitoring services is the so-called "alarm flapping".
When you contract Muutech's Zabbix-based cloud monitoring service, the installation of a "muubox" in your data center is included. This is a Zabbix proxy fitted with several temperature and humidity sensors, one of which measures the air inlet temperature of the rack.
ASHRAE recommends not letting the temperature drop below 18ºC in this type of room (until 2005 the recommended minimum was 20ºC), so our system warns our clients whenever this happens. Besides the risk of corrosion of the server boards, there is also a saving to be made: a low inlet temperature tells us that we can raise the thermostat of our air-conditioning system by 1 or 2 degrees.
However, if we simply set up a Zabbix rule that raises an alarm every time the temperature drops below 18ºC, the temperature may at some point oscillate above and below that threshold, which translates into a stream of alternating problem and recovery emails in our inbox.
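To see why a single threshold causes this, here is a minimal Python sketch of a naive alarm. It is only an illustration, not how Zabbix works internally: the readings are hypothetical values hovering around 18ºC, the function name is made up for this example, and each crossing of the threshold produces either a PROBLEM or an OK notification.

```python
from typing import Iterable, List, Tuple

# Hypothetical inlet temperature readings (ºC) hovering around 18ºC.
READINGS = [18.3, 17.9, 18.1, 17.8, 18.2, 17.9, 18.4]


def naive_alerts(readings: Iterable[float],
                 threshold: float = 18.0) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for a single-threshold alarm.

    The same threshold is used both to raise and to clear the alarm,
    so every crossing produces a new notification.
    """
    notifications: List[Tuple[int, str]] = []
    in_problem = False
    for i, value in enumerate(readings):
        below = value < threshold
        if below and not in_problem:
            notifications.append((i, "PROBLEM"))
            in_problem = True
        elif not below and in_problem:
            notifications.append((i, "OK"))
            in_problem = False
    return notifications


if __name__ == "__main__":
    for index, event in naive_alerts(READINGS):
        print(index, event)
```

With these seven readings the sketch sends six notifications (three PROBLEM and three OK), even though the temperature only varies between 17.8ºC and 18.4ºC.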
When this happens, the alarms become very hard to use for anything: recipients may end up dismissing them as false alarms, or their email client may start marking them as spam. Our work at Muutech is to help you tune these alarms so that they are genuinely useful to you, using techniques such as the one we describe here: hysteresis.
Hysteresis essentially consists of setting a different threshold or condition for the alarm to clear. For example, we warn when the temperature drops below 18ºC, but we do not consider the alarm resolved until it rises above 18.5ºC. Configuring and visualizing this in Zabbix 3.2 is very simple, since a recovery expression can be entered directly in the trigger.
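The two-threshold rule can be sketched in Python as a small state machine: the alarm is raised below 18ºC and only cleared above 18.5ºC. In Zabbix 3.2 terms this would roughly correspond to a problem expression such as "{muubox:temp.inlet.last()}<18" together with a recovery expression such as "{muubox:temp.inlet.last()}>18.5"; the host name and item key there are hypothetical, and the function below illustrates the technique rather than reproducing Zabbix's own code.

```python
from typing import Iterable, List, Tuple

# Hypothetical readings: an oscillation around 18ºC, then a clear recovery.
READINGS = [18.3, 17.9, 18.1, 17.8, 18.2, 17.9, 18.4, 18.7]


def hysteresis_alerts(readings: Iterable[float],
                      problem_below: float = 18.0,
                      recover_above: float = 18.5) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for an alarm with hysteresis.

    The alarm is raised when a value drops below `problem_below`, but it
    is only cleared once a value rises above `recover_above`. Values in
    between leave the current state unchanged.
    """
    if recover_above < problem_below:
        raise ValueError("recover_above must not be lower than problem_below")
    notifications: List[Tuple[int, str]] = []
    in_problem = False
    for i, value in enumerate(readings):
        if not in_problem and value < problem_below:
            notifications.append((i, "PROBLEM"))
            in_problem = True
        elif in_problem and value > recover_above:
            notifications.append((i, "OK"))
            in_problem = False
    return notifications


if __name__ == "__main__":
    print(hysteresis_alerts(READINGS))
```

With these readings only two notifications are sent: a PROBLEM at the first dip below 18ºC (index 1) and a single OK when the temperature finally exceeds 18.5ºC (index 7). The gap between the two thresholds is what absorbs the small oscillations.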
There are other techniques to make these alarms more accurate that can be combined with hysteresis: for example, using the maximum, minimum or average over a time period or a number of samples instead of only the last value. In our example, we use "max(300)" to check that the temperature has stayed below 18 degrees for the whole of the last 5 minutes.
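The effect of "max(300)" can also be sketched in Python. The function below takes the highest value among the samples from the last 300 seconds and reports a problem only if even that maximum is below 18ºC, so a single cold reading is not enough to raise the alarm. The sample data, the function names and the choice to raise no alarm when the window is empty are assumptions made for this illustration; Zabbix's own handling of window boundaries and missing data may differ.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, temperature in ºC); hypothetical samples every 60 s.
Sample = Tuple[float, float]

SAMPLES: List[Sample] = [
    (0, 18.2), (60, 17.9), (120, 17.8), (180, 17.7),
    (240, 17.9), (300, 17.8), (360, 17.6),
]


def window_max(samples: List[Sample], now: float,
               seconds: float = 300) -> Optional[float]:
    """Highest value among samples taken in the last `seconds` seconds."""
    values = [value for ts, value in samples if now - seconds < ts <= now]
    return max(values) if values else None


def sustained_below(samples: List[Sample], now: float,
                    threshold: float = 18.0, seconds: float = 300) -> bool:
    """True only if every sample in the window is below `threshold`.

    Assumption for this sketch: an empty window does not raise the alarm.
    """
    peak = window_max(samples, now, seconds)
    return peak is not None and peak < threshold


if __name__ == "__main__":
    for now in (60, 240, 300, 360):
        print(now, sustained_below(SAMPLES, now))
```

With these samples, a last-value check would already fire at t=60 s, whereas the windowed check stays quiet until t=300 s, when the 18.2ºC reading taken at t=0 has left the window and all remaining samples are below 18ºC.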
You can find more about some of these techniques, as well as how to configure hysteresis if your Zabbix version is older than 3.2, here.
If you have any questions or need help, do not hesitate to contact us:
info@muutech.com