How to Audit and Secure AKS Access

How to audit actions in your AKS Cluster


The What

In this post I want to improve your AKS access control implementation by demonstrating the following:

  1. What the Local Account is in AKS and why we want it disabled
  2. What integrating with Azure Active Directory means and how that enables us to apply RBAC within the cluster
  3. How we can use the AKS Diagnostic Settings to log all the events generated in the cluster and who generated them (meaning we can identify who is creating, updating, deleting resources in the cluster).

The Why

One of the guiding principles of applying a Zero Trust security strategy is to implement least privilege access - meaning users and applications should only be granted access to data and operations that are required for them to perform their jobs. Additionally, it is valuable to audit what users and applications are doing to determine whether there is an identity that is overprivileged.

The How

Local Account

Do you know what the Local Account is in Azure Kubernetes Service? It is a shared account, and by default it has cluster admin privileges.

Let's start from a default cluster that you would deploy from the portal with no additional configurations, just something quick where we can focus on the identity portion. This is going to set us up to talk about what exactly the Local Account is in AKS:

Basic AKS cluster with default Local Accounts enabled for access

When you run az aks get-credentials --resource-group <RESOURCE_GROUP> --name <AKS_CLUSTER_NAME>, it downloads a kubeconfig file that gives you access to the cluster as a cluster admin, using a certificate signed by the internal Cluster CA. That means you can do anything within the cluster and call all APIs. Furthermore, every user that runs this command operates as the same identity, so there's no way of knowing who is actually doing what in the cluster.
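
As a quick sanity check, you can inspect what gets merged into your kubeconfig. A minimal sketch, assuming the default cluster above (the exact entry names vary per cluster):

# Merge the shared clusterAdmin credential into ~/.kube/config
az aks get-credentials --resource-group <RESOURCE_GROUP> --name <AKS_CLUSTER_NAME>

# Show only the current context; note the user entry is an embedded
# client certificate rather than any per-user identity
kubectl config view --minify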

We can see this by using a kubectl plugin called rbac-tool. Once we get the kubeconfig file and connect to the cluster, we can see who we are authenticated as and the associated Kubernetes role we've been granted. Notice how we have the cluster-admin role granted for this identity:

The rbac-tool does a lookup on the current user context and shows we are cluster-admin
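
If you want to reproduce this, here's a sketch of the commands involved, assuming the plugin is installed through krew:

# Install the rbac-tool plugin (assumes krew is already set up)
kubectl krew install rbac-tool

# Show the subject for the current kubeconfig context
kubectl rbac-tool whoami

# Look up role bindings (optionally filter by subject name)
kubectl rbac-tool lookup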

Take note that at this stage, if you wanted to give users access to AKS, the two Azure RBAC roles "Azure Kubernetes Service Cluster Admin Role" and "Azure Kubernetes Service Cluster User Role" are effectively the same - both give you permission to call az aks get-credentials and merge the access credential into your local kubeconfig file. Here's a note from the doc reference regarding this:

Limit access to kubeconfig in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn

Clearly this isn't ideal and doesn't align with the Zero Trust principle of least privilege access. We need a way to limit access to different resources and namespaces in AKS according to who is accessing the cluster - that is, by user identity or AAD group membership. Integrating AKS with AAD lets us go down that path.

Integrating with Azure Active Directory

Let's continue with our default cluster and upgrade it to leverage AAD for authentication. To prove out our work, I have one admin user (hshahin) and another user (test-user). hshahin will be part of the admin group we provide when upgrading AKS to use AAD. test-user starts with no role assignments; later on we'll grant it only the "Azure Kubernetes Service Cluster User Role".

I'll run the following to upgrade the cluster to use AAD (this will continue using Kubernetes RBAC):

az aks update -g MyResourceGroup -n myManagedCluster --enable-aad --aad-admin-group-object-ids <id-1>,<id-2> [--aad-tenant-id <id>]
AKS-managed Azure Active Directory integration - Azure Kubernetes Service | Microsoft Learn

Once that's complete, you should see in the portal that the cluster is now integrated with AAD and using the default Kubernetes RBAC. Take note that "Kubernetes local accounts" is still checked; we'll explore that in a bit.

Enabling AAD Integration with Kubernetes RBAC. The Local Account is still enabled
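
You can also verify this from the CLI. A small sketch; the aadProfile property should now be populated:

# Inspect the cluster's AAD integration settings
az aks show -g MyResourceGroup -n myManagedCluster --query aadProfile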

Let's try to understand what this means for how we access AKS. In the first terminal shown, I'm logged in as hshahin, who is part of the admin group. Notice that when I run the usual az aks get-credentials nothing looks different, but when I then run kubectl get pods I'm prompted to authenticate with AAD, as if I were signing into the Azure CLI:

Logging in with AAD Integration

Once I log in, everything works as expected. Let's use the rbac-tool plugin to learn more about my identity. You can see that I'm logged in to AKS as the following user with the following group permissions. Notice how all the group IDs that I'm a member of within AAD appear as well:

Using the rbac-tool to determine my current user context

Let's focus on the 66992c59-... group, since this is the object ID of the group I told AKS to make the admin group. If I run a kubectl rbac-tool lookup command with that group ID, I get the following, showing that it is assigned the cluster-admin role:

Reviewing my permissions with the rbac-tool
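
For reference, that lookup is along these lines, with the admin group's object ID substituted in (placeholder mine):

# Find the roles bound to the AAD admin group by its object ID
kubectl rbac-tool lookup <ADMIN_GROUP_OBJECT_ID>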

Now, let's try with the test user. Right now, this user has no permissions on AKS at all, whereas hshahin was part of the admin group:

Permission denied for the test-user

I can't run the az aks get-credentials command, because I first need to be assigned at least the Azure Kubernetes Service Cluster User Role to pull down the kubeconfig file:

Adding a role assignment for the test-user
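
That assignment can also be done from the CLI. A minimal sketch, assuming a user principal name for test-user (placeholders mine):

# Scope the assignment to the cluster itself
AKS_ID=$(az aks show -g MyResourceGroup -n myManagedCluster --query id -o tsv)

# Grant test-user just enough to download the kubeconfig
az role assignment create \
  --assignee test-user@<TENANT_DOMAIN> \
  --role "Azure Kubernetes Service Cluster User Role" \
  --scope $AKS_ID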

From there, I can pull down the kubeconfig. However, kubectl get pods is denied, which makes sense because I haven't been assigned any roles within the cluster itself through Kubernetes RBAC:

The test-user is still denied in the cluster for actions, but can pull down the kubeconfig to authenticate

And from here, it's a matter of working with Kubernetes RBAC or Azure RBAC to grant this user the roles needed to execute operations within the cluster. I won't dive into the "how" of RBAC in this post, but you can see we're now set up to apply least privilege based on the users or groups accessing AKS, whereas before we had no concept of identity or group because we were using a shared credential.

Here are good references on implementing Kubernetes RBAC with AAD and Azure RBAC for AKS.
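
To give a flavor without diving deep, here's a minimal sketch of one Kubernetes RBAC grant for an AAD user: binding the built-in view ClusterRole to test-user in a single namespace (the UPN and namespace are hypothetical placeholders):

# Give test-user read-only access, scoped to the dev namespace only
kubectl create rolebinding test-user-view \
  --clusterrole=view \
  --user=test-user@<TENANT_DOMAIN> \
  --namespace=dev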

Disabling the Local Account

Now let's return to the local account still being available. Notice how, when I switch back to the hshahin user, I can run the az aks get-credentials command with the --admin flag:

Showing what happens if the local account is left enabled
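
Concretely, that command looks like this:

# Pulls the shared clusterAdmin credential, bypassing AAD entirely
az aks get-credentials -g MyResourceGroup -n myManagedCluster --admin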

When I run that, I get back the local clusterAdmin account. Effectively, the --admin flag is a backdoor that makes me a cluster admin, bypassing the AAD authentication process and the associated roles implemented through RBAC.

Once you disable the local account (either through the portal or the CLI), you will find that you can no longer use the --admin flag:

Showing the result of disabling the local account
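
For the CLI route, a sketch of the update:

# Turn off the shared local admin account on an existing cluster
az aks update -g MyResourceGroup -n myManagedCluster --disable-local-accounts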

Audit Log for AKS

So far we've seen why the local account is risky and how integrating with AAD lets us apply permissions to specific users and groups. Finally, we want a way to audit the operations different users and groups are performing, so we can periodically verify that we are in line with our intended RBAC implementation.

Let's navigate to the Diagnostic Settings for the cluster, where we can configure a rule to store platform logs from the AKS control plane. The full reference of what each log category provides is here. We really care about seeing which identities are creating, updating, and deleting resources in the cluster. While get/list operations may be necessary to capture, take note that they significantly increase the volume of logs. AKS conveniently provides the kube-audit-admin category, which excludes those reads, so we can focus on the operations we care about:

The kube-audit-admin category captures the important events on the API Server and removes get and list
Turn on the Diagnostic Settings and enable the Audit Admin logs to enable capturing actions by users and groups
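
If you prefer to script this, here's a sketch using the monitor CLI, assuming an existing Log Analytics workspace (the setting name and workspace ID are placeholders):

# Send kube-audit-admin events to a Log Analytics workspace
az monitor diagnostic-settings create \
  --name aks-audit \
  --resource $(az aks show -g MyResourceGroup -n myManagedCluster --query id -o tsv) \
  --workspace <LOG_ANALYTICS_WORKSPACE_ID> \
  --logs '[{"category": "kube-audit-admin", "enabled": true}]'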

Once this is turned on, we can review the logs and develop a Kusto query to review overall operations on the cluster and the identity who performed that operation.

Here is a sample query that provides a good starting point for understanding the operations being executed against the cluster. We will use this as part of our test to see what the hshahin user has been doing in the cluster:

AzureDiagnostics
// Control plane audit events, with get/list reads excluded
| where Category == "kube-audit-admin"
// The raw audit event arrives as a JSON string in log_s
| extend event = parse_json(log_s)
// event.verb is the Kubernetes verb (create, update, patch, delete, ...)
| extend HttpMethod = tostring(event.verb)
| extend ResponseCode = tostring(event.responseStatus.code)
// The API server's authorization decision for the request
| extend Authorized = tostring(event.annotations["authorization.k8s.io/decision"])
| extend User = tostring(event.user.username)
| extend Groups = tostring(event.user.groups)
| extend Apiserver = pod_s
| extend SourceIP = tostring(event.sourceIPs[0])
| project TimeGenerated, Category, HttpMethod, ResponseCode, Authorized, User, Groups, SourceIP, Apiserver, event
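
To narrow the results to a single identity and to write operations only, you can append filters like these to the query above (substitute the full username you see in the User column):

| where User == "<USER_PRINCIPAL_NAME>"
| where HttpMethod in ("create", "update", "patch", "delete")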

To test this, let's create a pod using a kubectl run command:

Creating a pod as the hshahin user
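
For reference, the command is along the lines of:

# Create a simple nginx pod as the hshahin user
kubectl run nginx --image=nginx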

From there, let's head over to the logs to run the query and see what we find:

Query showing the creation of the nginx pod by the hshahin user

We can see the activity generated by the hshahin user. The query projects helpful columns and extracts additional information from the event JSON object.

Now let's see how this looks for an unauthorized user - we'll use the test-user account. Right now the test-user has no permissions to create a pod:

The test-user cannot deploy pods

If we view what this looks like in the logs, we should expect the "Authorized" column to show that the request was forbidden. That column is extracted from a Kubernetes annotation applied to the event, and we project it as a column to make the query easier to read:
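
If you want to surface only the denied requests, one more filter on that column does it (a sketch appended to the earlier query):

| where Authorized != "allow"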

The test-user is unauthorized to create pods. We can see that in the audit logs

Summary

I hope this helps improve your security posture with AKS. These are a few simple configurations that get you much closer to applying least privilege and ensuring your cluster is accessed in a controlled manner. There are still other concepts to explore around this topic, including the creation of RBAC roles, as well as securing Service Accounts and traditional Service Principals used in CI/CD and automation workflows that interact with the cluster.