Monitoring Istio on AKS with Prometheus and Grafana
How to monitor Istio deployed on AKS with Managed Prometheus and Grafana
As more teams deploy Istio on AKS, I want to demonstrate how to leverage the Managed Prometheus and Grafana services in Azure to monitor the service mesh and associated services sitting behind it.
One of the benefits of leveraging Istio in your stack on AKS is that you can get metrics about calls being made to your backend service without needing to manually instrument your application. This comes from the injected envoy sidecars and the ingress gateway Istio deploys. With the Managed Prometheus and Grafana services in Azure, we have an easy path towards a fully managed solution to store and monitor those exposed metrics.
A few prerequisites I expect you to have in place:
- AKS
- Istio on AKS
- Azure Monitor Workspace linked to Grafana - if you need guidance on this one, review my post on integrating Prometheus with AKS
Step 1 - Setup Scraping for Managed Prometheus
You may have already completed this as part of the prerequisites, but to ensure it's done properly I'll review it here.
When you set up your AKS cluster with Managed Prometheus in Azure, the agent only captures the default AKS metrics. It does not, however, scrape your custom pods. Therefore, we need to apply a custom config map to the cluster that tells the metrics agent to scrape your pods (there are a few options on how to do this noted in the docs - we are taking the recommended approach here, but review the other options if your scenario differs). The config map below is configured to scrape all pods in the cluster that carry the common Prometheus pod annotations telling the agent to scrape for metrics - these annotations are configured on the proxies that Istio injects as well as on the ingress gateway:
Create a file named prometheus-config to hold the scrape config.
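A minimal version of that annotation-based scrape job looks like the following - the job name and 30 second interval are my choices, so adjust to your needs:
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only keep pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Honor a custom metrics path from prometheus.io/path, if set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Honor a custom port from prometheus.io/port, if set
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__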
Apply the config map to the kube-system namespace with the following command:
kubectl create configmap ama-metrics-prometheus-config --from-file=prometheus-config -n kube-system
As confirmation that all is well after you apply the config map, you should see the AMA agents in the kube-system namespace restart within about 2-3 minutes:
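A quick way to check for the restart (the ama-metrics name prefix is what I see on my cluster):
kubectl get pods -n kube-system | grep ama-metrics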
If you run a kubectl logs command on that pod, you will find the following log message showing the scrape job was configured:
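For reference, that command looks something like this - the pod name is a placeholder, and prometheus-collector is the metrics container in the pod:
kubectl logs <ama-metrics-pod-name> -n kube-system -c prometheus-collector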
Step 2 - Deploy Istio and Associated Service
You may not need to follow this step if you already have Istio deployed. I'll take the path of deploying the default profile to get an ingress gateway and the core components deployed:
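With istioctl, that is a one-liner:
#INSTALL THE DEFAULT PROFILE (ISTIOD AND INGRESS GATEWAY)
istioctl install --set profile=default -y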
From there, I'll deploy the sample httpbin service so there is something to interact with. I'll first label the namespace for injection and then run the deployment:
#CREATE HTTPBIN NAMESPACE
kubectl create ns httpbin
#LABEL NAMESPACE FOR INJECTION
kubectl label namespace httpbin istio-injection=enabled
#DEPLOY HTTPBIN SERVICE IN HTTPBIN NAMESPACE
kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/httpbin/httpbin.yaml -n httpbin
#DEPLOY GATEWAY AND VIRTUALSERVICE IN HTTPBIN NAMESPACE
kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/httpbin/httpbin-gateway.yaml -n httpbin
As a check, you can get your ingress gateway IP and run a curl. The httpbin pod has a /uuid path that will return a randomly generated UUID:
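Assuming the default istio-ingressgateway service in the istio-system namespace, the check looks like this:
#GET THE INGRESS GATEWAY PUBLIC IP
export SERVICE_IP=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#CALL THE HTTPBIN /uuid PATH THROUGH THE GATEWAY
curl -s http://$SERVICE_IP/uuid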
To generate some data, I'm going to run a curl in a loop that we can then use in our next few steps. Be sure to replace the SERVICE_IP:
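A minimal version of that loop:
#FIRE A REQUEST ROUGHLY EVERY SECOND - REPLACE SERVICE_IP WITH YOUR GATEWAY IP
while true; do curl -s http://SERVICE_IP/uuid; sleep 1; done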
Step 3 - Import Istio Dashboards to Grafana
Now that we have Istio deployed and we've set up our Prometheus scraping, we should be able to see the metrics in our linked Grafana instance.
Navigate to the Explore tab from the home page and you should see a number of Envoy and Istio metrics:
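For example, a quick sanity-check query against one of the standard Istio metrics should return data once the curl loop has been running for a minute or two:
sum(rate(istio_requests_total[5m])) by (destination_service)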
Now from here you can do anything you normally would with Grafana and Prometheus. To me, however, the key value-add of leveraging open-source tools like Grafana is that someone has likely already built dashboards using these common metrics.
Istio has already published a number of dashboards, so we can import these into our Grafana instance. I deployed Istio version 1.20.0, so I'll want to import the dashboard versions that correspond.
I'll start with the Istio Service Dashboard. We can simply copy the ID to our clipboard and import it into our Grafana instance. Alternatively, to match our version, we can select Revisions and download the JSON that corresponds to our Istio version. I'll take the second path:
Since I have the curl script running in the background, I am able to see data populated. Notice that the "client" in this dashboard is the ingress gateway and the "service" is the httpbin backend service. The metrics look the same from both perspectives since the gateway forwards everything to the httpbin service in our scenario. The metrics also look accurate since the curl script is generating a request roughly every second:
You can continue following the import process to get the main Istio dashboards into Grafana.
Callout About Default Dashboards
A key point to consider is that default dashboards can make assumptions that you need to evaluate, especially at scale if you have multiple AKS clusters linked to one Azure Monitor Workspace and Grafana instance. Depending on your setup and use case, it may be necessary to extend the dashboard.
If we consider our example, what would happen if I deployed the exact same httpbin service to another AKS cluster and linked that cluster to the same Azure Monitor Workspace? The results would effectively double, since the names of the gateway and backend service would match exactly and there is no filter in place to distinguish "cluster 1" from "cluster 2".
There are many ways to address this. You could simply configure each cluster to use a different Azure Monitor Workspace, but that isn't really necessary. Another way is to add a filter that further scopes the query.
In my small example, I will simply use the cluster name label, but you could use custom labels on the pods themselves or other variables to do the filtering.
Every metric processed by the AMA Agent in your AKS cluster will have a cluster label that is the name of the cluster:
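For example, in Explore you can scope any metric to a single cluster with that label (the cluster name below is a placeholder for your own):
istio_requests_total{cluster="my-aks-cluster"}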
Therefore, I could use this label to filter the metrics by each cluster if desired. I would need to add a variable to the dashboard and then use it in the queries. Here's what that could look like, and again you can customize or do it differently as needed.
First navigate to the dashboard settings:
From there, let's create a new variable. This can look different on your end depending on the type of variable you want to create. In this scenario, we simply want the name of the cluster available to us, so our config can look like the following:
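As a sketch, a query-type variable named cluster backed by the Prometheus data source can pull its values from any metric that carries the label:
label_values(istio_requests_total, cluster)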
Once that is applied, you will see the variable populate at the top of the dashboard. However, we are not done yet, since we still need to include it in the queries of the visuals themselves:
Select one of the visuals we want to update:
From there, we can edit the query and test to confirm it works:
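As a simplified sketch (the expressions in the shipped dashboard are more involved), adding the cluster filter to a request-rate query looks like this, where $cluster is the variable we just created:
#BEFORE
sum(rate(istio_requests_total{destination_service=~"$service"}[5m]))
#AFTER - SCOPED TO THE SELECTED CLUSTER
sum(rate(istio_requests_total{cluster="$cluster", destination_service=~"$service"}[5m]))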
From there you can save the dashboard and view the update. To test it, I typed in an incorrect cluster name; notice how the visual presents N/A, representing no data:
Keep in mind we only did this for one visual, so you will need to make the corresponding updates to the other visuals.
Summary
The goal of this post is to show you the power of using Grafana and Prometheus with AKS, especially when using OSS tools like Istio. Dashboards often already exist and can accelerate your monitoring.