Hack Series: Is your Ansible Package Configuration Secure?

In our client assessment work hacking software and cloud systems of all types, we’re often asked to look into configuration management tools such as Ansible. In this post we’ll deep dive into what package management vulnerabilities in the world of Ansible look like. First we’ll recap what Ansible is, provide some tips for security pros to debug it at a lower level, and explore both a CVE in the dnf module and an interesting gotcha in the apt module.

To ensure we’re always looking out for DevSecOps and aiding defenders, our next post in this series will touch on the strengths and weaknesses of tools like Semgrep for catching vulnerabilities in Ansible configurations.

Ansible

Ansible is an open source, Python-based, configuration management tool developed by Red Hat. It enables DevOps and other system maintainers to easily write automation playbooks, composed of a series of tasks in YAML format, and then run those playbooks against targeted hosts.

A key feature of Ansible is that it is agentless: the targeted hosts don’t need to have Ansible installed, just Python and SSH. The machine running the playbook (“control node” in Ansible speak) copies the Python code required to run the tasks to the targeted hosts (“managed nodes”) over SSH, and then executes that code remotely. Managed nodes are organized into groups in an “inventory” for easy targeting by playbooks.

Credit: codingpackets.com

In 2019 Ansible was the most popular cloud configuration management tool. While the paradigm of “immutable infrastructure” has led to more enthusiasm for choosing Terraform and Docker for performing several tasks that previously might have been done by Ansible, it is still an immensely popular tool for provisioning resources, services, and applications.

Ansible provides a large number of built-in modules, which are essentially high-level interfaces for calling common system commands like apt, yum, or sysctl. The modules are Python files that do the work of translating the specified YAML tasks into the commands that actually get executed on the managed nodes. For example, the following playbook contains a single Ansible task which uses the apt module to install NGINX on a Debian-based system. Normally an Ansible playbook would be run against a remote host, but in our examples we are targeting localhost for illustrative purposes:

- name: Sample Apt Module Playbook
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx
        state: present

To better understand what this playbook is doing under the hood, let’s use a debugging technique that will come in useful when we look at vulnerabilities later. Since Ansible doesn’t natively provide a way to see the exact commands being run, we can use a handy strace invocation. strace lets us follow the system calls this playbook triggers when run normally under ansible-playbook, even as Ansible forks off multiple child processes (the “-f” flag), so we can view the command that ultimately gets executed:

$ sudo strace -f -e trace=execve ansible-playbook playbook.yml 2>&1 | grep apt
[pid 11377] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "install", "nginx"], 0x195b3e0 /* 33 vars */) = 0

Using both strace command line options (“-e trace=execve”) and grep as filters, we make sure that irrelevant system calls are not output to the terminal. This avoids the noise of all the setup code that both Ansible and the apt module need to run before finally fulfilling the task. Ultimately we can see that the playbook runs the command apt-get install nginx, with a few extra command line flags to automate accepting confirmation prompts and interactive dialogues.
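If you use this strace trick often, it helps to turn matching execve lines back into argument lists. The following is a minimal Python sketch; `parse_execve` is our own hypothetical helper, and it assumes strace’s default quoting of simple ASCII arguments. Note that strace truncates strings longer than 32 characters by default, so pass a larger `-s` value: this parser does not handle truncated `"..."...` entries.

```python
import ast
import re

# Matches the path and argv list of an execve() line as printed by
# `strace -f -e trace=execve`, with or without a leading "[pid N]" prefix.
EXECVE_RE = re.compile(r'execve\("(?P<path>[^"]+)", (?P<argv>\[.*?\]), ')


def parse_execve(line):
    """Return (path, argv) for an strace execve line, or None if it isn't one."""
    match = EXECVE_RE.search(line)
    if match is None:
        return None
    # strace prints argv as a bracketed list of double-quoted strings, which
    # is also a valid Python literal for simple, untruncated arguments.
    argv = ast.literal_eval(match.group("argv"))
    return match.group("path"), argv


line = ('[pid 11377] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", '
        '"-o", "Dpkg::Options::=--force-confdef", "-o", '
        '"Dpkg::Options::=--force-confold", "install", "nginx"], '
        '0x195b3e0 /* 33 vars */) = 0')
path, argv = parse_execve(line)
print(" ".join(argv))
```

Piping the strace output through a script like this makes it easy to diff the commands produced by two versions of a playbook.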

If you are following along and don’t see the apt-get install command in the strace output, make sure NGINX is uninstalled first. To improve performance and prevent unwanted side effects, Ansible first checks whether a task’s desired state has already been achieved, and returns early with an “ok” status if it thinks NGINX is already installed.
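That early return is a check-then-act pattern. Here is a minimal, hypothetical sketch of it in Python. It is not the apt module’s real code, which queries the package cache; a plain set stands in for the host’s installed packages, and `ensure_present` is an illustrative name.

```python
def ensure_present(name, installed_packages, install):
    """Hypothetical sketch of an idempotent "state: present" task.

    installed_packages: set of package names already on the host.
    install: callable that performs the real installation.
    Returns a dict shaped loosely like an Ansible task result.
    """
    if name in installed_packages:
        # Desired state already reached: report "ok" and execute nothing.
        return {"changed": False, "status": "ok"}
    install(name)
    installed_packages.add(name)
    return {"changed": True, "status": "changed"}


calls = []
host = {"openssh-server"}
print(ensure_present("nginx", host, calls.append))  # first run installs
print(ensure_present("nginx", host, calls.append))  # second run is a no-op
print(calls)
```

This is also why the strace output can come back empty: on the second run, no install command is ever executed.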

Top 10 Tips for Ansible Security Audits

As shown, Ansible transforms tasks declared in simple YAML format into system commands, often run as root on the managed nodes. This layer of abstraction can easily create a mismatch between what a task appears to do and what actually happens under the hood. We will explore where such mismatches in Ansible’s built-in modules make it possible to introduce configuration vulnerabilities across all managed nodes.

But first, let’s take a step back and contextualize this by running through general tips if you are auditing an Ansible-managed infrastructure. From an infrastructure security perspective, Ansible does not expose as much attack surface as some other configuration management tools. SSH is the default transport used to connect from the control node to the managed nodes, so Ansible traffic takes advantage of the sane defaults, cryptography, and integration with Linux servers that the OpenSSH server offers. However, Ansible can be deployed in many ways, and best practices may be missed when writing roles and playbooks. Here are IncludeSec’s top 10 Ansible security checks to remember when reviewing a configuration:

  1. Is an old version of Ansible being used which is vulnerable to known CVEs?
  2. Are hardcoded secrets checked into YAML files?
  3. Are managed nodes in different environments (production, development, staging) not appropriately separated into inventories?
  4. Are the control nodes which Ansible is running from completely locked down with host/OS based security controls?
  5. Are unsafe lookups which facilitate template injection enabled?
  6. Are SSHD config files using insecure settings like permitting root login or enabling remote port forwarding?
  7. Are alternative connection methods being used (such as ansible-pull) and are they being appropriately secured?
  8. Are the outputs of playbook runs being logged or audited by default?
  9. Is the confidential output of privileged tasks being logged?
  10. Are high-impact roles/tasks (e.g. those that manage authentication or install packages) actually doing what they appear to do?
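Some of these checks are easy to automate. As one example, tip 6 can be approximated with a short Python pass over an sshd_config file. This is a simplified sketch under stated assumptions: `audit_sshd_config` is a hypothetical helper, it only reads the global section (it stops at the first Match block), it follows sshd’s rule that the first value obtained for a keyword wins, and it does not flag a missing AllowTcpForwarding line even though OpenSSH defaults that setting to yes.

```python
# Settings from tip 6 that this sketch treats as risky, mapped to flagged values.
RISKY = {
    "permitrootlogin": {"yes"},
    "allowtcpforwarding": {"yes", "all", "remote"},
}


def audit_sshd_config(text):
    """Return (keyword, value) pairs from the global section that match RISKY.

    Only whitespace-separated "Keyword value" lines are parsed; the
    "Keyword=value" form and Match blocks are out of scope for this sketch.
    """
    seen = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip().lower()
        if keyword == "match":
            break  # everything after this applies conditionally
        seen.setdefault(keyword, value)  # like sshd, the first value wins
    return [(k, v) for k, v in seen.items() if v in RISKY.get(k, set())]


sample = """
PermitRootLogin yes   # legacy setting
AllowTcpForwarding no
Match User backup
    AllowTcpForwarding yes
"""
print(audit_sshd_config(sample))
```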

Whether those tips apply will obviously vary depending on whether the organization is managing Ansible behind a tool like Ansible Tower, or if it’s a startup where all developers have SSH access to production. However, one thing that remains constant is that Ansible is typically used to install packages to set up managed nodes, so configuration vulnerabilities in package management tasks are of particular interest. We will focus on cases where declaring common package management operations in Ansible YAML format can have unintended security consequences.

CVE-2020-14365: Package Signature Ignored in dnf Module

The most obvious type of mismatch between YAML abstraction and reality in an Ansible module would be an outright bug. A recent example of this is CVE-2020-14365. The dnf module installs packages using the dnf package manager, the successor of yum and the default on Fedora Linux. The bug was that the module didn’t perform signature verification on packages it downloaded. Here is an example of a task that was vulnerable when run on Ansible versions prior to 2.8.15 and 2.9.13:

- name: The task in this playbook was vulnerable to CVE-2020-14365
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      dnf:
        name: nginx
        state: present
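Tip 1 from our checklist applies directly here. A small, hypothetical helper can tell whether a reported Ansible version falls in the ranges listed above. It reads “<2.8.15” literally (so anything older than the 2.9 branch counts if it is below 2.8.15) and knows nothing about other branches or distro backports, so treat a False result as “not in the listed ranges” rather than “safe”.

```python
import re


def in_reported_range(version):
    """True if `version` is below 2.9.13 (2.9 branch) or below 2.8.15 (older).

    Only encodes the ranges quoted in this post for CVE-2020-14365; it does
    not consult vendor advisories or account for backported fixes.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.groups())
    if parts[:2] == (2, 9):
        return parts < (2, 9, 13)
    if parts[:2] < (2, 9):
        return parts < (2, 8, 15)
    return False  # outside the ranges listed here, not necessarily patched


for v in ("2.8.14", "2.8.15", "2.9.12", "2.9.13"):
    print(v, in_reported_range(v))
```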

The vulnerability is severe when targeted by advanced attackers, as it opens the door to a supply-chain attack. The lack of signature verification makes it possible for both the package mirror and man-in-the-middle (MITM) attackers on the network in between to supply their own packages, which execute arbitrary commands as root on the host during installation.

For more details about how to perform such an attack, this guide walks through injecting backdoored apt packages from a MITM perspective. The scenario was presented a few years ago on a HackTheBox machine.

The issue is exacerbated by the fact that on most Linux distros, GPG package signatures are the only thing providing authenticity and integrity for downloaded packages. Package mirrors, including those used by dnf, don’t widely use HTTPS (see Why APT does not use HTTPS for the justification). With HTTPS transport between mirror and host, the CVE is still exploitable by a malicious mirror, but MITM attacks become a lot harder to pull off. We ran a quick test and found that, despite Fedora using more HTTPS mirrors than Debian, some default mirrors selected due to geographical proximity were HTTP-only.
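To repeat a check like ours, one approach is to collect the mirror URLs a host actually uses (how you gather them depends on the distro and its repository configuration) and flag any that don’t use HTTPS. A minimal sketch follows; `non_https_mirrors` is a hypothetical helper and the example.* URLs are placeholders, not real mirrors.

```python
from urllib.parse import urlsplit


def non_https_mirrors(urls):
    """Return the mirror URLs that are not fetched over HTTPS.

    For these (plain http, ftp, ...), signature checks are the only thing
    protecting package integrity between the mirror and the host.
    """
    return [url for url in urls if urlsplit(url).scheme.lower() != "https"]


# Placeholder mirror URLs for illustration only.
mirrors = [
    "http://mirror.example.org/fedora/linux/",
    "https://mirror.example.net/fedora/linux/",
    "ftp://ftp.example.com/pub/fedora/linux/",
]
for url in non_https_mirrors(mirrors):
    print("plaintext transport:", url)
```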

The root cause of the CVE was that the Ansible dnf module imported a Python module as an interface for handling dnf operations, but did not call a crucial _sig_check_pkg() function. Presumably, this check was either forgotten or assumed to be performed automatically in the imported module.
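The fix boils down to a simple ordering rule: download, verify, and only then install. The sketch below illustrates that rule generically. It does not reproduce the dnf Python API or Ansible’s actual patch, and `install_verified` and the callables passed to it are hypothetical stand-ins.

```python
class SignatureError(Exception):
    """Raised when a downloaded package fails signature verification."""


def install_verified(packages, download, verify_signature, install):
    """Download all packages, verify every signature, then install.

    download(name) -> local path, verify_signature(path) -> (ok, message)
    and install(paths) are hypothetical callables standing in for a real
    package manager API.
    """
    paths = [download(name) for name in packages]
    for path in paths:
        ok, message = verify_signature(path)
        if not ok:
            # Refuse the whole transaction before any package scriptlets
            # (which run as root) get a chance to execute.
            raise SignatureError(f"{path}: {message}")
    install(paths)


def fake_download(name):
    return f"/tmp/{name}.rpm"


def accept_all(path):
    return True, "signature ok"


def reject_all(path):
    return False, "package is not signed"


installed = []
install_verified(["nginx"], fake_download, accept_all, installed.extend)
try:
    install_verified(["nginx"], fake_download, reject_all, installed.extend)
except SignatureError as err:
    print("refused:", err)
print(installed)
```

Forgetting the verification loop in a function like this would still “work” for every legitimate package, which is exactly why such a bug can go unnoticed.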

Package Signature Checks Can be Bypassed When Downgrading Package Versions

The dnf example was clearly a bug, now patched, so let’s move on to a more subtle type of mismatch where the YAML interface doesn’t map cleanly to the desired low-level behavior. This time it is in the apt package manager module and is a mistake we have seen in several production Ansible playbooks.

In a large infrastructure, it is common to install packages from multiple sources, from a mixture of official distro repositories, third-party repositories, and in-house repositories. Sometimes the latest version of a package will cause dependency problems or remove features which are relied upon. The solution which busy teams often choose is to downgrade the package to the last version that was working. While downgrades should never be a long-term solution, they can be necessary when the latest version is actively breaking production or a package update contains a bug.

When run interactively from the command line, apt install (and apt-get install; they are identical for our purposes) lets you specify an older version to downgrade to, and it will do the job. But when confirmation prompts are accepted automatically (in “-y” mode, which Ansible uses), apt errors out unless the --allow-downgrades argument is explicitly specified. This further confirmation is required because a downgrade may break other packages. However, the Ansible apt module doesn’t offer an equivalent of --allow-downgrades, so there’s no clear way to make a downgrade work using Ansible.
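Whether a requested version counts as a downgrade is decided by Debian’s version ordering, which is not plain string ordering (1.10 is newer than 1.9, and 1.0~rc1 is older than 1.0). As a hedged illustration, here is a Python port of the comparison algorithm that Debian Policy describes for the Version field. On a real host, `dpkg --compare-versions` remains the authoritative check, and the installed version used below is hypothetical.

```python
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters)


def _order(ch):
    """Weight of one character in a non-digit run ("" means end of string)."""
    if ch == "" or ch in DIGITS:
        return 0
    if ch in LETTERS:
        return ord(ch)
    if ch == "~":
        return -1          # "~" sorts before everything, even end of string
    return ord(ch) + 256   # other symbols sort after all letters


def _compare_part(a, b):
    """Compare two upstream versions (or two revisions), dpkg style."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one by one.
        while ((i < len(a) and a[i] not in DIGITS)
               or (j < len(b) and b[j] not in DIGITS)):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically (a missing run counts as 0).
        m, n = i, j
        while m < len(a) and a[m] in DIGITS:
            m += 1
        while n < len(b) and b[n] in DIGITS:
            n += 1
        an, bn = int(a[i:m] or "0"), int(b[j:n] or "0")
        if an != bn:
            return -1 if an < bn else 1
        i, j = m, n
    return 0


def _split(version):
    """Split [epoch:]upstream[-revision] into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Negative, zero or positive as a is older than, equal to or newer than b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


def is_downgrade(installed, requested):
    return compare_versions(requested, installed) < 0


print(is_downgrade("1.18.0-0ubuntu1", "1.14.0-0ubuntu1.2"))
```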

The first Stack Overflow answer that comes up when searching for “ansible downgrade package” recommends using force: true (or force: yes, which is equivalent in YAML):

- name: Downgrade NGINX in a way that is vulnerable
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx=1.14.0-0ubuntu1.2
        force: true
        state: present

This works fine, and without follow-up, this pattern can become a fixture of the configuration which an organization runs regularly across hosts. Unfortunately, it creates a vulnerability similar to the dnf CVE, disabling signature verification.

To look into what is going on, let’s reuse the strace invocation from earlier to see the full command:

$ sudo strace -f -e trace=execve ansible-playbook apt_force_true.yml 2>&1 | grep apt
[pid 479683] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "--force-yes", "install", "nginx=1.14.0-0ubuntu1.2"], 0x1209b40 /* 33 vars */) = 0

The force: true option has added the --force-yes parameter (as stated in the apt module docs). --force-yes is a blunt hammer that will ignore any problems with the installation, including a bad signature on the downloaded package. If this same apt-get install command is run manually from the command line, it will warn: --force-yes is deprecated, use one of the options starting with --allow instead. And to Ansible’s credit, it also warns in the docs that force “is a destructive operation with the potential to destroy your system, and it should almost never be used.”
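Until such tasks are rewritten, a reviewer can at least hunt for them. Below is a minimal Python sketch that walks a playbook already loaded from YAML; to keep the example self-contained, it uses a literal structure mirroring the playbook above instead of parsing a file. The function name is hypothetical, the set of true spellings only approximates Ansible’s boolean parsing, and `force_apt_get` (a different apt module option) is deliberately not flagged.

```python
# Approximately the spellings Ansible treats as boolean true.
TRUE_STRINGS = {"y", "yes", "on", "1", "true", "t"}
APT_KEYS = ("apt", "ansible.builtin.apt")


def _is_true(value):
    if isinstance(value, str):
        return value.strip().lower() in TRUE_STRINGS
    return value == 1  # covers True, 1 and 1.0


def find_forced_apt_tasks(playbook):
    """Yield (play name, task name) for apt tasks that set force to true.

    `playbook` is an already-parsed playbook (a list of play dicts). Only each
    play's top-level `tasks` list is inspected: blocks, handlers, roles and
    includes are out of scope for this sketch.
    """
    for play in playbook:
        for task in play.get("tasks") or []:
            for key in APT_KEYS:
                args = task.get(key)
                if isinstance(args, dict) and _is_true(args.get("force")):
                    yield play.get("name"), task.get("name")


playbook = [{
    "name": "Downgrade NGINX in a way that is vulnerable",
    "hosts": "localhost",
    "tasks": [
        {"name": "ensure nginx is installed",
         "apt": {"name": "nginx=1.14.0-0ubuntu1.2", "force": True,
                 "state": "present"}},
        {"name": "unrelated task",
         "apt": {"name": "curl", "force_apt_get": True}},
    ],
}]
for play_name, task_name in find_forced_apt_tasks(playbook):
    print(f"{play_name}: {task_name} uses force: true")
```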

So why is use of force: true so prevalent across the Ansible deployments we have seen? Because there’s no easy alternative for this common downgrade use case. The only other options are unpleasant workarounds: running the full apt install command line through the command or shell modules, and only then applying either Apt Pinning or dpkg holding (the native methods in Debian-derived distros for keeping a package at a previous version).
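For reference, the Apt Pinning half of that workaround is a small preferences stanza. This Python sketch renders one; the helper name is hypothetical, and where the output is written (for example a file under /etc/apt/preferences.d/) is left to you. The apt_preferences(5) manual documents that a Pin-Priority above 1000 causes apt to install the pinned version even if that constitutes a downgrade.

```python
def render_apt_pin(package, version, priority=1001):
    """Render an apt preferences stanza pinning `package` to `version`.

    A priority above 1000 lets apt select the pinned version even when it is
    older than the installed one (see apt_preferences(5)).
    """
    if not version or any(c.isspace() for c in version):
        raise ValueError("version must be a single non-empty token")
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )


print(render_apt_pin("nginx", "1.14.0-0ubuntu1.2"), end="")
```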

On the Ansible issue tracker, people have been asking for years for an allow_downgrade option for the apt module, but two separate pull requests have been stuck in limbo because they do not meet the needs of the project. Ansible requires integration tests for every feature, and they are difficult to provide for this functionality since Debian-derived distros don’t normally host older versions of packages in their default repositories to downgrade to. The yum and dnf modules have had an allow_downgrade option since 2018.

Fixing the Problem

At IncludeSec we like to contribute to open source where we can, so we’ve opened a pull request to resolve this shortcoming of the apt module. This time, the change has integration tests and will hopefully meet the requirements of the project and get merged!

(Update: Our PR was accepted, and the option is usable as of Ansible Core version 2.12.)
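To make the difference concrete, the sketch below models the apt-get command line for a downgrade task. It is a hypothetical helper, not the apt module’s code: force=True reproduces the --force-yes argv we captured with strace, while allow_downgrade=True is assumed to map to apt-get’s narrower --allow-downgrades flag (the exact flag position in the real module may differ). In a playbook, the task would set allow_downgrade: true instead of force: true.

```python
BASE_ARGV = [
    "/usr/bin/apt-get", "-y",
    "-o", "Dpkg::Options::=--force-confdef",
    "-o", "Dpkg::Options::=--force-confold",
]


def apt_install_argv(package, force=False, allow_downgrade=False):
    """Hypothetical model of the apt-get invocation for an apt install task."""
    argv = list(BASE_ARGV)
    if force:
        # Ignores *all* problems, including bad package signatures.
        argv.append("--force-yes")
    if allow_downgrade:
        # Only permits the downgrade; other safety checks stay in place.
        argv.append("--allow-downgrades")
    argv += ["install", package]
    return argv


print(" ".join(apt_install_argv("nginx=1.14.0-0ubuntu1.2", force=True)))
print(" ".join(apt_install_argv("nginx=1.14.0-0ubuntu1.2", allow_downgrade=True)))
```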

The next part of this series will explore using Semgrep to identify this vulnerability and others in Ansible playbooks. We’ll revisit the top 10 Ansible security audit checks presented above and see how much of the hard work can be automated through static analysis. We’ve got a lot more to say about this, so stay tuned for our next post on the topic!
