Recently, a new vulnerability in OpenSSH was identified, and the first question that popped into my mind was: how do I make sure my nodes are not affected by this vulnerability?
In this blog post, I will go over what the vulnerability is and how it can be exploited, explain how you can check whether your Azure Kubernetes Service (AKS) clusters are vulnerable to CVE-2024-6387, and cover what you can do about it, including the different options for upgrading the VMSS node image and how to choose between them.
Understand the vulnerability#
CVE-2024-6387#
CVE-2024-6387 is an unauthenticated RCE-as-root vulnerability that was identified in the OpenSSH server, `sshd`, on glibc-based Linux systems. If exploited, it grants full root access, affects the default configuration, and does not require user interaction, which is why it is classified as High severity.
This was identified on the 1st of July 2024.
The researchers who discovered it also noted that OpenSSH faced a similar vulnerability back in 2006, known as CVE-2006-5051. While the 2006 bug was patched, later code changes reintroduced it. This is why the latest vulnerability, CVE-2024-6387, is dubbed the "regreSSHion" bug: it is a regression, the reintroduction of an issue that had already been fixed.
The CVE-2024-6387 vulnerability impacts the following OpenSSH server versions:

- OpenSSH versions from `8.5p1` up to, but not including, `9.8p1`
- OpenSSH versions earlier than `4.4p1`, if they have not been backport-patched against CVE-2006-5051 or patched against CVE-2008-4109
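If you prefer to script this check, below is a minimal sketch that tests whether a given OpenSSH version string falls in the main vulnerable range `[8.5p1, 9.8p1)`. Using `sort -V` for the version comparison is an assumption of this sketch, and it deliberately ignores the pre-`4.4p1` case and any distribution backports, which also matter per the CVE details.

```shell
# Hedged sketch: flag OpenSSH versions in [8.5p1, 9.8p1). It knowingly ignores
# the pre-4.4p1 case and distribution backports, which also matter per the CVE.
is_vulnerable() {
  v="$1"
  lowest=$(printf '%s\n%s\n' "8.5p1" "$v" | sort -V | head -n1)
  highest=$(printf '%s\n%s\n' "9.8p1" "$v" | sort -V | tail -n1)
  # v is in range if 8.5p1 sorts lowest, 9.8p1 sorts highest, and v != 9.8p1
  [ "$lowest" = "8.5p1" ] && [ "$highest" = "9.8p1" ] && [ "$v" != "9.8p1" ]
}

is_vulnerable "9.6p1" && echo "9.6p1: in vulnerable range"
is_vulnerable "9.8p1" || echo "9.8p1: patched"
```

You could feed this the version reported by `ssh -V`, after stripping the `OpenSSH_` prefix from its output.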
CVE-2024-6409#
As of the 9th of July, another vulnerability was disclosed: CVE-2024-6409.
This is a distinct vulnerability from the regreSSHion bug. The vulnerability allows an attacker to execute code within the privsep child process. This child process is a part of OpenSSH that runs with restricted privileges to limit the damage that can be done if it is compromised.
The vulnerability is caused by a race condition related to how signals are handled. This means that the privsep child process can be exploited because the timing of signal handling operations can be manipulated, leading to unintended behavior that allows code execution.
Impact: OpenSSH versions 8.7p1 and 8.8p1 shipped with Red Hat Enterprise Linux 9.
Warning
Machines patched for CVE-2024-6387 will also be patched for CVE-2024-6409.
Suggested actions against the vulnerability#
To protect against this vulnerability, the main suggestion is to upgrade the package version using a command like (or similar to) `apt upgrade openssh-sftp-server`. If you cannot do this and need a quick workaround, an option is to set the `LoginGraceTime` SSH configuration parameter to 0, as recommended by Ubuntu.
Let's look into both recommendations to understand them a bit better, starting with the workaround:
Set LoginGraceTime to 0#
OpenSSH allows remote connections to server machines. The `LoginGraceTime` SSH server configuration parameter specifies the time allowed for a successful authentication to the server; if the client has not authenticated within that window, `sshd` disconnects it.
This means that a longer grace period allows more unauthenticated connections to stay open, while a shorter grace period can protect against brute-force attacks in certain cases.
In the context of the identified vulnerability, this is important because the vulnerable code is called only when the `LoginGraceTime` timer fires. The reasoning is that by setting it to 0, which means no timeout, the timer never fires, the vulnerable code is never called, and thus the vulnerability is eliminated.
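As a rough sketch of how this workaround could be applied (the file path and service name are Debian/Ubuntu assumptions, and the `sed` pattern is mine, so review your `sshd_config` manually before relying on it):

```shell
# Hedged sketch: force LoginGraceTime to 0 in sshd_config, validate, reload.
# Path and service name assume a Debian/Ubuntu layout.
sudo sed -i 's/^#\?LoginGraceTime.*/LoginGraceTime 0/' /etc/ssh/sshd_config
sudo sshd -t                 # -t validates the configuration without applying it
sudo systemctl reload ssh    # on some distributions the unit is named "sshd"
```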
But there is a caveat here.
Warning
While you eliminate the risk of calling the vulnerable code, and you are protected against brute-force attacks, setting this to 0 makes `sshd` vulnerable to denial-of-service attacks. So it's good to consider your options and this trade-off carefully when configuring these settings.
Denial of Service through MaxStartups Exhaustion Explained
`MaxStartups` is another `sshd` configuration parameter; it limits the number of concurrent unauthenticated connections.
If `LoginGraceTime` is set to 0, attackers can open numerous connections without ever being timed out. Since these connections won't be closed due to a timeout, they remain open indefinitely.
This can exhaust the number of connections allowed by `MaxStartups`, preventing legitimate users from accessing the SSH service.
Essentially, the server becomes overwhelmed with open unauthenticated connections, leading to a denial of service for legitimate users.
This is why the main recommendation is to upgrade to a patched version of `sshd` where the underlying vulnerability has been addressed. This ensures that `LoginGraceTime` can be set to a reasonable value, and the server can handle connection attempts appropriately without being vulnerable to a DoS attack via `MaxStartups` exhaustion.
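For reference, `MaxStartups` takes a `start:rate:full` triple: `sshd` starts randomly dropping new unauthenticated connections once `start` are open, and refuses all of them at `full`. The sketch below unpacks the OpenSSH default value; the shell parsing itself is purely illustrative.

```shell
# MaxStartups "start:rate:full": sshd begins randomly dropping new
# unauthenticated connections at "start" (with probability "rate"%) and
# refuses all of them at "full". 10:30:100 is the OpenSSH default.
max_startups="10:30:100"
start=${max_startups%%:*}     # everything before the first ":" -> 10
full=${max_startups##*:}      # everything after the last ":"   -> 100
rest=${max_startups#*:}
rate=${rest%%:*}              # middle field                    -> 30
echo "drop begins at $start, drop probability ${rate}%, hard cap $full"
```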
Upgrade to a patched version of sshd#
Now onto the main fix and what this means for your virtual machine scale sets (VMSS) in the AKS context. When running AKS, modifying the VMSS yourself is generally not recommended due to the following reasons:
Managed Service: AKS is a managed Kubernetes service, meaning Microsoft handles most of the underlying infrastructure management for you. Directly modifying VMSS configurations can interfere with the automated management and updates provided by AKS.
Configuration Consistency: AKS maintains certain configurations to ensure the cluster operates correctly. Manual modifications to the VMSS could lead to a configuration drift, where the manually set configurations diverge from the managed state AKS expects and maintains.
Stability and Reliability: Direct modifications can lead to instability or unexpected behavior within your cluster. This includes potential issues during upgrades, scaling operations, or applying patches.
Because of these reasons handling the fix for the vulnerability means waiting for the Azure release team to provide us with a patched image.
Check the AKS version#
When you upgrade Kubernetes, the node images are upgraded as well, so a good place to start is identifying the version of Kubernetes your AKS clusters are running. You can do this through the Azure portal, CLI, or API.
Azure Portal:
Navigate to your AKS cluster resource and check the version information in the Overview section.
Azure CLI:
```shell
az aks show --resource-group <ResourceGroupName> --name <AKSClusterName> --query kubernetesVersion
```
Note: Replace `<ResourceGroupName>` and `<AKSClusterName>` with your actual resource group and AKS cluster names.
Then, by making use of the `kubectl` command line, you can retrieve the exact version of the node images you are using:
```shell
kubectl get nodes -o wide
```
Running these commands tells you your Kubernetes version and the OS image version your nodes are running on. You can now compare your node image version against the versions listed as vulnerable in the CVE details to know whether your nodes run an image with a vulnerable version of `sshd`.
Check and upgrade the AKS VMSS node image#
Identify the patched image version#
Azure Kubernetes Service regularly provides new node images, so it’s good to upgrade your node images frequently to take advantage of the latest AKS features. Linux node images are updated weekly, and Windows node images are updated monthly.
For Azure, and AKS more specifically, you should perform the following checks:

- Check for the node image with a patched `sshd` version on the GitHub Azure AKS Releases page
- Check the rollout schedule of the patched node image in your region on the AKS Release Status page
Tip
It is also good practice, in general, to check the release page for announcements on upcoming releases and the fixes they include, and to keep your node images up to date to protect against the latest vulnerabilities.
At the time of writing this article, we are looking out for the rollout of the image with version 202407.08.0.
Generally, when you upgrade the Kubernetes version the images will be upgraded as well, but when you have a security patch you might want to upgrade only the image and not the Kubernetes version.
Caution
Please consider carefully before upgrading a node image version because it’s not possible to downgrade it afterward!
Verify the patched image version availability#
To check for available node image upgrades for the nodes in your node pool, simply run the following command:
```shell
az aks nodepool get-upgrades --nodepool-name mynodepool --cluster-name myAKSCluster --resource-group myResourceGroup
```
In the JSON output, check the `latestNodeImageVersion` parameter, which indicates the latest image version the nodes can be upgraded to.
Then, check the actual node image you are running on (this can be done via the Azure portal or CLI). If you're using the CLI for this as well, run:
```shell
az aks nodepool show --resource-group myResourceGroup --cluster-name myAKSCluster --name mynodepool --query nodeImageVersion
```
Simply compare the two image versions. If they differ, an upgrade is available for your nodes. If not, you are already running the latest image and should watch the releases for the rollout of the image you are interested in upgrading to.
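To make the comparison concrete, here is a minimal sketch; the version strings below are illustrative placeholders in the AKS node image naming format, not output from a real cluster.

```shell
# Hedged sketch: compare the node image you are on against the latest
# available one. Both values are illustrative placeholders; in practice they
# come from the two az commands shown above.
current="AKSUbuntu-2204gen2containerd-202406.07.0"   # from: az aks nodepool show
latest="AKSUbuntu-2204gen2containerd-202407.08.0"    # from: az aks nodepool get-upgrades
if [ "$current" = "$latest" ]; then
  echo "node pool is already on the latest image"
else
  echo "upgrade available: $current -> $latest"
fi
```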
Once the image version is available in your region, the next step is performing the actual node image upgrade. There are several ways to handle this depending on your scenario, which I detail below.
Upgrade all node images in all node pools#
TL;DR
CLI Command: `az aks upgrade --node-image-only`
Scope: This command applies the upgrade to all node pools in the specified AKS cluster.
Use Case: Use this when you want to ensure that all nodes in your entire cluster are updated to the latest node image version.
How To
Use the `az aks upgrade` command with the `--node-image-only` flag to upgrade the node images across all node pools in the AKS cluster. This command ensures that only the node image is upgraded, without altering the Kubernetes version.

```shell
az aks upgrade \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-image-only
```
After initiating the upgrade, you can verify the status of the node images using the `kubectl get nodes` command with a specific JSONPath query to output the node names and their image versions.

```shell
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```
Once the upgrade is complete, you can retrieve the updated details of the node pools, including the current node image version, using the `az aks show` command.

```shell
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster
```
Upgrade a specific node pool#
TL;DR
CLI Command: `az aks nodepool upgrade --node-image-only`
Scope: This command targets a specific node pool within the AKS cluster, identified by the `--name` parameter.
Use Case: Use this when you need to upgrade the node image for only one particular node pool, perhaps for testing or staggered rollout purposes.
How To
If you want to upgrade the node image of a specific node pool without affecting the entire cluster, use the `az aks nodepool upgrade` command with the `--node-image-only` flag.

```shell
az aks nodepool upgrade \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool \
  --node-image-only
```
Similar to the cluster-wide upgrade, check the status of the node images with the `kubectl get nodes` command.

```shell
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```
Use the `az aks nodepool show` command to get the details of the updated node pool.

```shell
az aks nodepool show \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool
```
Use Node Surge to Speed Up Upgrades#
TL;DR
CLI Command: `az aks nodepool update --max-surge`
Scope: This command also targets a specific node pool but includes the `--max-surge` parameter to control the number of extra nodes that can be created to expedite the upgrade.
Use Case: Use this when you want to perform a faster upgrade of a node pool by temporarily increasing the number of nodes during the upgrade process, thereby reducing downtime or upgrade duration.
How To
To speed up the node image upgrade process, you can use the `az aks nodepool update` command with the `--max-surge` flag, which specifies the number of extra nodes used during the upgrade process. This allows more nodes to be upgraded simultaneously.

```shell
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool \
  --max-surge 33% \
  --no-wait
```
Check the node image status as previously described.

```shell
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```
Retrieve the updated node pool details using the `az aks nodepool show` command.

```shell
az aks nodepool show \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool
```
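As a side note on how `--max-surge 33%` plays out: a percentage is interpreted relative to the node pool size and, as I understand it, rounded up to a whole number of surge nodes. A quick sketch of that arithmetic (the rounding-up behavior is my assumption here, so check the AKS documentation for your version):

```shell
# Hedged sketch: ceiling division to estimate surge nodes from a percentage.
nodes=10
surge_percent=33
extra=$(( (nodes * surge_percent + 99) / 100 ))   # ceil(10 * 0.33) = 4
echo "$extra extra node(s) created during the upgrade"
```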
Conclusion#
The choice between the three approaches depends on your strategy and what you want to focus on:
- If you have a new security patch or critical update and want every node in your cluster to be updated as quickly as possible without specifying individual node pools, upgrade the entire cluster.
- If you are running different workloads on separate node pools and want to update the node image for only one specific pool, to test compatibility or performance, use the targeted node pool upgrade.
- If you need a faster upgrade for a specific node pool and can afford to temporarily add more nodes to handle the upgrade process, use node surge.
I hope this article gives you an idea of this particular security vulnerability, how you can mitigate it, and how you can approach security patches in the future in the context of AKS VMSS. Thank you for reading!
Image generated using Bing AI