Demo
Since my employer has an interest in Sensu and I have been doing my work within their network, I had the opportunity to explain what I have worked on so far and how the environment is configured, and to discuss next steps and future planning. This project will definitely continue after my course finishes next week, and it has been a valuable learning experience. Here is a summary of what was covered in the demo:
What is Sensu: Sensu is a system monitoring solution designed for the cloud. It scales easily and provides a simple web UI to visually 'connect the dots' of what is going on within the monitored environments. One big advantage is that, since it was designed for the cloud, clients can automatically check themselves in and be added to your monitoring. The Sensu project has come a long way, and compared to other monitoring solutions like Nagios it does some things better but may be lacking in a few aspects. (This may just be due to a lack of knowledge on my part, or I may need to explore additional docs for what Sensu is capable of.)
How is Sensu set up: The Sensu environment I have set up is an 8-machine environment that includes 3 Sensu servers (which also run the Sensu API and the Uchiwa dashboard), 2 RabbitMQ servers, and 3 Redis servers.
Currently Sensu is set up in a centralized fashion with directly connected clients. What this means is that the clients run checks locally (standalone checks) and publish the results to RabbitMQ, where they are picked up by the Sensu server, which handles notification routing. The check data is also stored in Redis, and the dashboard (Uchiwa) is updated through the Sensu API. Setting up Sensu in this fashion prevents arbitrary and potentially malicious checks from being run as a result of a system breach, and it also decreases infrastructure complexity.
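To make the standalone check idea more concrete, here is a minimal sketch in Python that builds the kind of JSON check definition a client could carry locally. The check name `check_disk`, the command string, and the 60 second interval are placeholders I made up for illustration; the `mailer` handler is one of the handlers mentioned later in this post. The `standalone` flag is what tells the client to schedule the check itself rather than wait for the server to request it.

```python
import json


def standalone_check(name, command, interval=60, handlers=("default",)):
    """Build a check definition that the client schedules on its own."""
    return {
        "checks": {
            name: {
                "command": command,         # run locally on the client
                "standalone": True,         # client schedules it, not the server
                "interval": interval,       # seconds between runs
                "handlers": list(handlers), # who gets notified about events
            }
        }
    }


# Hypothetical check: the name, command, and thresholds are placeholders.
config = standalone_check(
    "check_disk",
    "check-disk.rb -w 80 -c 90",
    handlers=["mailer"],
)

# The JSON printed here is what would be dropped into the client's config directory.
print(json.dumps(config, indent=2))
```

Because the definition lives on the client, the server never has to push commands down to it, which is where the security benefit described above comes from.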
High Availability: High availability is achieved by:
- Sensu: master election is handled internally by Sensu with no additional configuration needed from the user. The Sensu servers are aware of each other through their connections to RabbitMQ and Redis. Master election and failover are relatively seamless from the user's point of view.
- RabbitMQ: RabbitMQ has cluster failover built in, and Sensu supports it. Unfortunately the Sensu Puppet module does not, and a fairly significant rework would be needed to get it functioning properly. To work around this pitfall, load balancing the 2 RabbitMQ instances will provide HA.
- Redis: To achieve a clustering-like result, we are using Sentinel to provide master election and failover. Sentinel is built in to newer versions of Redis and uses a quorum-based election to choose a new master in the event of a master failure.
- NOTE: all three of these components will eventually be load balanced to provide additional protection during machine or service failure.
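The quorum idea behind Sentinel can be sketched in a few lines of Python. This is only a simplified model, not Sentinel's actual code: it assumes one Sentinel process per Redis machine (3 total) and a quorum of 2, neither of which is stated above. The rule it models is that enough Sentinels (the quorum) must agree the master is down, and the Sentinel leading the failover must then collect votes from a majority of all Sentinels, or from the quorum if that is higher.

```python
def master_is_down(down_reports, quorum):
    """The master counts as down once `quorum` Sentinels report it unreachable."""
    return down_reports >= quorum


def failover_authorized(total_sentinels, quorum, down_reports, leader_votes):
    """A failover proceeds only if the master is agreed to be down AND the
    would-be leader holds enough votes (majority, or quorum if larger)."""
    majority = total_sentinels // 2 + 1
    votes_needed = max(majority, quorum)
    return master_is_down(down_reports, quorum) and leader_votes >= votes_needed


# Assumed layout: 3 Sentinels, quorum of 2.
print(failover_authorized(3, 2, down_reports=2, leader_votes=2))  # enough agreement
print(failover_authorized(3, 2, down_reports=1, leader_votes=1))  # one lone Sentinel
```

With three Sentinels this means a single isolated Sentinel can never promote a new master on its own, which is why an odd number of them is a comfortable fit for this environment.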
- Overall Look/Feel: The look and feel of Uchiwa is very slick, modern, and simple. It provides useful information while taking a minimalist approach.
Going from a Nagios UI to Uchiwa's
- Stashes: Stashes are used to schedule downtime and can silence alert notifications for a check or a machine. While a check/machine is silenced (stashed) the event data will still show, but all alert handling is turned off to reduce noise during things such as maintenance.
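As a rough illustration of what a silence stash looks like, the Python sketch below builds the JSON body that would be sent to the Sensu API's stashes endpoint. The client name `web01`, the check name `check_disk`, and the `reason` text are made-up examples, and the `content` field is free-form, so the keys inside it are my own choice. The code only builds the payload and does not talk to a real API.

```python
import json


def silence_stash(client, check=None, reason="", expire=None):
    """Build a stash that silences a whole client, or one check on it."""
    path = "silence/" + client
    if check:
        path += "/" + check           # narrow the silence to a single check
    stash = {"path": path, "content": {"reason": reason}}
    if expire is not None:
        stash["expire"] = expire      # seconds until the stash removes itself
    return stash


# Hypothetical maintenance window: silence one check on one machine for an hour.
payload = silence_stash("web01", "check_disk", reason="disk swap", expire=3600)
print(json.dumps(payload, indent=2))
```

Setting an expiry is a handy guard against forgetting to un-silence a machine once maintenance is over.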
Moving Forward: We discussed that if we choose to move forward with a larger implementation of Sensu, it will take a pilot or test run to ensure Sensu can do what we need it to do. This test run would include standing up a production-like instance of Sensu and hammering on it with realistic events and event handling to put it through its paces. We would also like to replace the auth provided by Uchiwa, which only supports a single user, by putting Apache with LDAP in front of Uchiwa for authentication and then turning off Uchiwa's own. Since I have been able to get a couple of different handlers working (mailer, pagerduty) and have a good handle on checks, my supervisors/coworkers would like me to look at how to send event data to Graphite through Sensu to compare with our current approach. They would also like me to look into how Sensu checks handle dependencies.