Multicloud monitoring: an example with AWS and Azure
We read every day about the advantages of taking our services and systems to the cloud. Terms such as private cloud, public cloud and hybrid cloud are already part of any business manager's vocabulary, and it seems that if we are not in the cloud, we or our company are not "cool" or "innovative".
Beyond the fashion, the advantages are clear, although they vary depending on the application. We can highlight several:
Ease of scaling and growth when needed. In fact, this is one of the key characteristics of the cloud: the ability to increase and reduce resources quickly and easily, programmatically and even automatically. This is what really differentiates cloud providers from those hosting companies that have simply renamed their VPS offering to "cloud".
Consuming services without worrying about the infrastructure that supports them: databases, serverless computing, etc. If you browse the catalog of Amazon Web Services or Google Cloud, you will see more types of services and names than you can imagine, many of them aimed at specific applications, such as machine learning, running on optimized infrastructure.
Some advantages we already had when outsourcing our infrastructure to a hosting provider: security, high availability, not having to maintain a datacenter on our own, etc.
In general, and thanks to technologies such as Docker, today it is easy to migrate from one cloud to another, and even to work with several at once, depending on what suits us. Likewise, being able to offer a service in several geographical regions at the click of a button is remarkable.
Pay-per-use. Want to try something? Spin it up, test it, and then throw it away, paying only for those minutes or hours. Have a traffic peak and need more resources? Same thing.
But they also have certain disadvantages:
Pay-per-use and scalability usually mean that companies have no idea what they are going to pay at the end of the month. It is not even easy to establish a variable cost along the lines of "if my users increase by X percent, my cloud bill will automatically grow by Y percent": there are so many variables involved (bandwidth consumption, spot-instance pricing, time of day, etc.) that the bill becomes almost random, and usually expensive. In the writer's opinion, pay-per-use is the great advantage of the cloud, but the difficulty of controlling costs is at the same time the great barrier to its adoption. Fortunately, monitoring can lend a hand here.
The learning curve for the different services available, beyond computing and storage, is steep: knowing them well, understanding their advantages and knowing when to use each one is complicated.
Comparing prices and performance between providers is practically impossible, since they do not charge for the same things, and often you do not know what is behind a "vCPU" or an "SSD disk".
Working with different clouds poses a problem equivalent to working with several manufacturers in your datacenter, except that there we had already settled on common protocols (SNMP, IPMI, etc.); here we have different APIs, different accessible data, different ways to access them, and so on. For monitoring, for example, Amazon has CloudWatch and Microsoft Azure has Azure Monitor.
Monitoring allows us to overcome several of these difficulties, so that:
We can see, analyze, compare and aggregate the cost of our clouds in real time. We can set up alarms to help us control costs, and relate those costs to other data, such as the number of users on the platform, to help us predict them. We can even automate the shutdown of a machine if its cost spikes because of a bug, or because someone is using our servers to mine bitcoins.
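As a rough illustration of the cost-control idea, the decision behind such an alarm could be sketched as a simple threshold check of the kind a monitoring trigger or action would evaluate. The function name, thresholds and figures below are invented for the example; real costs would come from each provider's billing API.

```python
# Hypothetical sketch of a cost-threshold check for a monitoring trigger.
# All names and numbers are illustrative, not the platform's actual logic.

def evaluate_cost(hourly_cost_usd, expected_hourly_usd, spike_factor=5.0):
    """Decide what the monitoring platform should do about current spend.

    Returns one of:
      "ok"       - cost is within the expected range
      "alert"    - cost is above expected, but below the spike threshold
      "shutdown" - cost has spiked hard (e.g. a bug, or a cryptominer)
    """
    if hourly_cost_usd <= expected_hourly_usd:
        return "ok"
    if hourly_cost_usd < expected_hourly_usd * spike_factor:
        return "alert"
    return "shutdown"

print(evaluate_cost(0.08, 0.10))  # ok
print(evaluate_cost(0.25, 0.10))  # alert
print(evaluate_cost(1.00, 0.10))  # shutdown
```

In a real deployment the "shutdown" outcome would map to an action such as stopping the instance through the provider's API, while "alert" would simply notify the operator.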
We can compare performance between the "same" machine in different clouds or geographical regions.
But, above all, we get a centralized tool that standardizes the monitored parameters, so we avoid having to learn how to reach them through CloudWatch, Azure Monitor or whatever else. And, of course, we can include the machines in our own datacenter.
To show the cloud capabilities of our Zabbix-based tool, Minerva, we created an example with several machines in Amazon Web Services (AWS) and Microsoft Azure. If you are already using Zabbix, you can replicate it without any problem.
One advantage of using a tool that works with agents is that we get many metrics for free. Amazon, for example, charges for CloudWatch metrics at intervals shorter than five minutes; they even charge you for asking how much they are charging you. With an agent, many metrics (CPU, memory, etc.) can be obtained at no cost (beyond the data transfer, of course, since everything is billed here, but in any case at a much lower cost). For others, such as costs, firewall configurations or starting a machine, we have no choice but to go through the API, but being able to combine both approaches easily is very versatile.
In the example, our agents register themselves automatically on the platform when they are created, and we can see a simple map showing the aggregated data for each provider (memory, storage, etc.), so that at a glance we see the overall percentage of use on each platform.
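The per-provider figures on that overview map boil down to aggregating the metrics reported by each host under its provider. A minimal sketch of that aggregation, with invented host data, could look like this:

```python
# Illustrative sketch: rolling per-host metrics up into one usage figure
# per provider, as the overview map does.  The host list is made up.

hosts = [
    {"provider": "AWS",   "cpu_pct": 40.0, "mem_pct": 55.0},
    {"provider": "AWS",   "cpu_pct": 80.0, "mem_pct": 65.0},
    {"provider": "Azure", "cpu_pct": 20.0, "mem_pct": 30.0},
]

def provider_usage(hosts):
    """Average CPU and memory use per provider."""
    totals = {}
    for h in hosts:
        t = totals.setdefault(h["provider"], {"cpu": 0.0, "mem": 0.0, "n": 0})
        t["cpu"] += h["cpu_pct"]
        t["mem"] += h["mem_pct"]
        t["n"] += 1
    return {p: {"cpu_pct": t["cpu"] / t["n"], "mem_pct": t["mem"] / t["n"]}
            for p, t in totals.items()}

print(provider_usage(hosts))
```

In Zabbix itself this role is played by aggregate/calculated items over host groups; the snippet only shows the shape of the computation.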
If you click on any of them, you will access a self-generated map with the different machines, also indicating their type and geographical location. As a detail, the platform normalizes the location data, showing "EUROPE", for example, where AWS says "eu-central-1" and Azure says "FranceCentral". In the image, we can see that there is a problem in one of the servers, which is not serving our website.
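That location normalization is essentially a lookup from each provider's region names onto a common label. A sketch of the idea, with a hypothetical subset of regions (not the platform's actual mapping table), could be:

```python
# Sketch of provider-independent location normalization.
# The mapping below is an invented subset for illustration only.

REGION_MAP = {
    # AWS region names
    "eu-central-1":   "EUROPE",
    "eu-west-1":      "EUROPE",
    "ap-northeast-1": "ASIA",
    # Azure region names
    "FranceCentral":  "EUROPE",
    "WestEurope":     "EUROPE",
    "JapanEast":      "ASIA",
}

def normalize_region(region):
    """Map a provider-specific region name to a common location label."""
    return REGION_MAP.get(region, "UNKNOWN")

print(normalize_region("eu-central-1"))   # EUROPE
print(normalize_region("FranceCentral"))  # EUROPE
```

With both providers reduced to the same labels, machines can be grouped and compared by location regardless of where they run.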
Since Zabbix maps are actionable, and the objective of our Minerva platform is to solve problems as quickly as possible with as few clicks as possible, clicking on the problematic server presents several options: restart the machine, stop it, see its latest data and so on, or even jump to the Amazon console.
As we suspect a firewall problem, we click on the option to see the state of the ports, and it confirms that port 80 is indeed not open.
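Under the hood, a check like that amounts to attempting a TCP connection and reporting whether the port answers. A minimal, self-contained sketch (the host and port are just examples, not what the platform actually runs):

```python
# Minimal sketch of a TCP port check like the one in the example.
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A local port with nothing listening should report as closed,
# just as the map reported port 80 down on the failing server.
print(port_open("127.0.0.1", 9))  # usually False
```

Zabbix ships an equivalent built-in check (`net.tcp.service`); the snippet only shows what it boils down to.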
With this simple example, we see the potential and effectiveness Zabbix gives us in multicloud and hybrid environments, where we can also keep control over our inventory automatically, even when it is ephemeral:
Last but not least, there is the visualization in a comfortable, practical Grafana dashboard integrated into our tool. Here we have built an example in which we can select machines by zone and see the breakdown of CPU use, monthly cost, network, disk and memory, as well as the response time of our website (high in this case, because Japan is rather far from our servers in Spain).
If you have any questions about this, if you are already working with Zabbix or Grafana in multicloud environments and want to know more, or if you want to replicate this example, write to us without any commitment and we will try to help you.