Wednesday, August 13, 2014

Reversing the Dropcam Part 3: Digging into complied Lua functionality

Contribs from Nico Rodriguez, Kris Brosch, and Erik Cabetas

In Part 1 & Part 2 of this RE blog series you saw how we reverse engineered the Dropcam and got access to the file system. In this final post of the series we'll examine some of the binaries found on the file system and play a bit with Lua code we found there. As usual we'll talk about some of the lessons learned from some failures in the analysis process as well as successes. We'll conclude with a release of a small tool that can aid reversers who are looking at Lua disassembly.

The Lua code we found on the system is packed inside the Dropcam's /usr/connect binary which was obtained from the rooted Dropcam as described in our previous blog post (Part 2.) We unpacked the connect binary; it's compressed/packed with upx but that is trivial to undo. Once unpacked we loaded the binary in our trusty IDA and looked around a little bit. We noticed it was writing a file named /tmp/connect.bin and then running this command via a call to system():

rm -rf /tmp/connect && mkdir /tmp/connect && tar zx -f /tmp/connect.bin -C /tmp/connect && rm /tmp/connect.bin

So it looks like /usr/bin/connect is decompressing a tar.gz file hidden inside the connect binary itself. The IDA screenshot below shows the function that writes the file and then calls the shell command. This function is called with the arguments 0x8393c (the address of the connect.bin data in memory) and 0x29203 (the length of the file):

We extracted the file using dd:

dd if=./connect.decompressed of=connect.tar.gz bs=1 skip=473404 count=168451

And then, we unpacked the .tar.gz file and took a look at what was there:
$ ls -la

total 808
drwxrwxrwx 1 nico staff 4096 Feb 21 15:20 .
drwxrwxrwx 1 nico staff 4096 Nov 11 20:35 ..
-rwxrwxrwx 1 nico staff 1504 Apr 23 2013 containers.bin
-rwxrwxrwx 1 nico staff 5879 Apr 23 2013 decoder.bin
-rwxrwxrwx 1 nico staff 1038 Apr 23 2013 descriptor.bin
-rwxrwxrwx 1 nico staff 10376 Apr 23 2013 dispatch.bin
-rwxrwxrwx 1 nico staff 54727 Apr 23 2013 droptalk_pb.bin
-rwxrwxrwx 1 nico staff 9360 Apr 23 2013 encoder.bin
-rwxrwxrwx 1 nico staff 1243 Apr 23 2013 hello.bin
-rwxrwxrwx 1 nico staff 545 Apr 23 2013 hwver.bin
-rwxrwxrwx 1 nico staff 4279 Apr 23 2013 ir.bin
-rwxrwxrwx 1 nico staff 879 Apr 23 2013 list.bin
-rwxrwxrwx 1 nico staff 615 Apr 23 2013 listener.bin
-rwxrwxrwx 1 nico staff 650 Apr 23 2013 main.bin
-rwxrwxrwx 1 nico staff 2363 Apr 23 2013 monitor.bin
-rwxrwxrwx 1 nico staff 708 Apr 23 2013 motion.bin
-rwxrwxrwx 1 nico staff 2010 Oct 29 19:48 net.bin
-rwxrwxrwx 1 nico staff 2607 Apr 23 2013 oldiags.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_01_3D_hwrev_1.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_01_3D_hwrev_2.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_02_3D_hwrev_1.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_02_3D_hwrev_2.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_03_3D_hwrev_1.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_03_3D_hwrev_2.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_04_3D_hwrev_1.bin
-rwxrwxrwx 1 nico staff 17536 Apr 18 2013 ov9715_04_3D_hwrev_2.bin
-rwxrwxrwx 1 nico staff 3280 Apr 23 2013 persistence.bin
-rwxrwxrwx 1 nico staff 329 Apr 23 2013 platform.bin
-rwxrwxrwx 1 nico staff 3365 Apr 23 2013 platform_a5s.bin
-rwxrwxrwx 1 nico staff 551 Apr 23 2013 platform_local.bin
-rwxrwxrwx 1 nico staff 20750 Apr 23 2013 protobuf.bin
-rwxrwxrwx 1 nico staff 191 Apr 23 2013 rtp.bin
-rwxrwxrwx 1 nico staff 643 Apr 23 2013 settings.bin
-rwxrwxrwx 1 nico staff 9931 Apr 23 2013 states.bin
-rwxrwxrwx 1 nico staff 912 Apr 23 2013 status.bin
-rwxrwxrwx 1 nico staff 3822 Apr 23 2013 streams.bin
-rwxrwxrwx 1 nico staff 1505 Apr 23 2013 text_format.bin
-rwxrwxrwx 1 nico staff 1525 Apr 23 2013 type_checkers.bin
-rwxrwxrwx 1 nico staff 3047 Apr 23 2013 update.bin
-rwxrwxrwx 1 nico staff 601 Apr 23 2013 usb.bin
-rwxrwxrwx 1 nico staff 2602 Apr 23 2013 util.bin
-rwxrwxrwx 1 nico staff 1468 Apr 23 2013 watchdog.bin
-rwxrwxrwx 1 nico staff 3620 Apr 23 2013 wire_format.bin

Inspecting the first .bin file we see these are Lua byte-code files. The first five bytes were those of a Lua Bytecode Header:
+------------------------+
| 1B | 4C | 75 | 61 | 52 | => Lua 0x52
+------------------------+

These files contain compiled Lua bytecode that supplements the logic in the connect binary. From the initial examination, we saw the bytecode was Lua 5.2 bytecode. The structure of a Lua bytecode file is extensively documented; we'll just cover the necessary information in this post (for a quick overview take a look at this link).

Of course we'd like to know what functionality is hidden in these files so we tried every decompiler we could get our hands on. Unfortunately they all complained about the byte-code version or died trying to interpret the bytes on the files. This is because the decompilers weren't up-to-date for Lua 5.2. This version of Lua adds a couple of instructions to the VM but the semantics and the byte-code format seems to be the same.

Here were some of the decompilers we tried (among others):
Considering this, we tried to hack up the files to trick the decompilers into working with our target files but alas, nothing seemed to be working, the decompilers just died with errors stating that the chunk of code did not correspond to valid Lua code. Note: Pay careful attention to endianness when hacking up byte code files. We even considered patching a tool like unluac to support Lua 5.2 bytecode as this looked like the most mature out of the ones we tried, but this wouldn't be a trivial task and would require major surgery. Unluac and others weren't going anywhere without a major patch and we didn't have much time so we went lower-level to a bytecode disassembler.

Enter: LuaAssemblyTools(LAT) - https://github.com/mlnlover11/LuaAssemblyTools.
This Lua library allowed us to parse and disassemble the byte-code regardless of version and/or endianness. We were able to decompile the Lua 5.2 byte-code used in the connect binary into LASM (a LAT representation of Lua VM's instructions).

Now we have disassembly, but it's ugly -- like DNSSec level of ugly. So our next challenge was what to do with the dissasembled code. The way tables and constants are handled in Lua's VM is great for machine consumption but human readable it is not! How many levels of indirection can one really keep track of in their head at the same time?

Using LAT's LASM Decompiler we disassembled descriptor.bin into this:

; Decompiled to lasm by LASM Decompiler v1.0 ; Decompiler Copyright (C) 2012 LoDC ; Main code .name "" .options 0 0 1 2 ; Above contains: Upvalue count, Argument count, Vararg flag, Max Stack Size ; Constants .const "module" .const "descriptor" .const "FieldDescriptor" .const "TYPE_DOUBLE" .const 1 .const "TYPE_FLOAT" .const 2 .const "TYPE_INT64" .const 3 .const "TYPE_UINT64" .const 4 .const "TYPE_INT32" .const 5 .const "TYPE_FIXED64" .const 6 .const "TYPE_FIXED32" .const 7 .const "TYPE_BOOL" .const 8 .const "TYPE_STRING" .const 9 .const "TYPE_GROUP" .const 10 .const "TYPE_MESSAGE" .const 11 .const "TYPE_BYTES" .const 12 .const "TYPE_UINT32" .const 13 .const "TYPE_ENUM" .const 14 .const "TYPE_SFIXED32" .const 15 .const "TYPE_SFIXED64" .const 16 .const "TYPE_SINT32" .const 17 .const "TYPE_SINT64" .const 18 .const "MAX_TYPE" .const "CPPTYPE_INT32" .const "CPPTYPE_INT64" .const "CPPTYPE_UINT32" .const "CPPTYPE_UINT64" .const "CPPTYPE_DOUBLE" .const "CPPTYPE_FLOAT" .const "CPPTYPE_BOOL" .const "CPPTYPE_ENUM" .const "CPPTYPE_STRING" .const "CPPTYPE_MESSAGE" .const "MAX_CPPTYPE" .const "LABEL_OPTIONAL" .const "LABEL_REQUIRED" .const "LABEL_REPEATED" .const "MAX_LABEL" ; Upvalues .upval '' 1 0 ; Instructions gettabup 0 0 256 loadk 1 1 call 0 2 1 newtable 0 0 25 settable 0 259 260 settable 0 261 262 settable 0 263 264 settable 0 265 266 settable 0 267 268 settable 0 269 270 settable 0 271 272 settable 0 273 274 settable 0 275 276 settable 0 277 278 settable 0 279 280 settable 0 281 282 settable 0 283 284 settable 0 285 286 settable 0 287 288 settable 0 289 290 settable 0 291 292 settable 0 293 294 settable 0 295 294 settable 0 296 260 settable 0 297 262 settable 0 298 264 settable 0 299 266 settable 0 300 268 settable 0 301 270 settable 0 302 272 settable 0 303 274 settable 0 304 276 settable 0 305 278 settable 0 306 278 settable 0 307 260 settable 0 308 262 settable 0 309 264 settable 0 310 264 settabup 0 258 0 return 0 1 0 

To understand this as quick as possible we need something to make LASM a bit more sane, time to write some code to do it ourselves! Lua is a register-based virtual machine so that makes our life a little easier.

We made an easy script that rewrites the LASM code into something more human readable. It organizes the disassembly to a much more readable code display form, so consider the output of this tool somewhere in the middle of the spectrum between a straight disassembler and a decompiler (restructured disassembly?)

If you're interested to learn more, here are a few presentations showing the internals of a Lua VM that came in handy for this task (http://luaforge.net/docman/83/98/ANoFrillsIntroToLua51VMInstructions.pdf and http://www.dcc.ufrj.br/~fabiom/lua/ were a huge help).

The resulting code from our tool can't be compiled (so it's not a true decompiler) but it was so much easier to follow than a straight disassembly. You can find the tool published on our Github here.

Here we can see the description.bin output after using our LasmRewriter.py script:

function main(...) module(descriptor) regs[0] = [] regs[0][TYPE_DOUBLE] = 1 regs[0][TYPE_FLOAT] = 2 regs[0][TYPE_INT64] = 3 regs[0][TYPE_UINT64] = 4 regs[0][TYPE_INT32] = 5 regs[0][TYPE_FIXED64] = 6 regs[0][TYPE_FIXED32] = 7 regs[0][TYPE_BOOL] = 8 regs[0][TYPE_STRING] = 9 regs[0][TYPE_GROUP] = 10 regs[0][TYPE_MESSAGE] = 11 regs[0][TYPE_BYTES] = 12 regs[0][TYPE_UINT32] = 13 regs[0][TYPE_ENUM] = 14 regs[0][TYPE_SFIXED32] = 15 regs[0][TYPE_SFIXED64] = 16 regs[0][TYPE_SINT32] = 17 regs[0][TYPE_SINT64] = 18 regs[0][MAX_TYPE] = 18 regs[0][CPPTYPE_INT32] = 1 regs[0][CPPTYPE_INwhT64] = 2 regs[0][CPPTYPE_UINT32] = 3 regs[0][CPPTYPE_UINT64] = 4 regs[0][CPPTYPE_DOUBLE] = 5 regs[0][CPPTYPE_FLOAT] = 6 regs[0][CPPTYPE_BOOL] = 7 regs[0][CPPTYPE_ENUM] = 8 regs[0][CPPTYPE_STRING] = 9 regs[0][CPPTYPE_MESSAGE] = 10 regs[0][MAX_CPPTYPE] = 10 regs[0][LABEL_OPTIONAL] = 1 regs[0][LABEL_REQUIRED] = 2 regs[0][LABEL_REPEATED] = 3 regs[0][MAX_LABEL] = 3 return regs[0] end

This gets the disassembly to the point where we can easily understand it, compared to what we had before which was just horrible. Now that we can disassemble the files we see that they control the logic of the device, but the hardware access is done at a lower level. More so, the System-on-a-Chip has some interesting features like setting up the parameters of your video input and output and the image post-processing is done by the hardware which is much more efficient.

Lua on an embedded devices such as Dropcam is compact and safer to write than C, so that's a good idea from the security front. The Linux kernel and it's device drivers running on the device take care of everything real-time related and they expose this functionality to Lua the Unix way i.e. everything is a file. You can open a /dev/ file to access the stream of video and manipulate camera functionality. Everything for image conversion, filtering, etc. is taken care of in the low-level drivers. (Note: a bit more detail on this topic can be be found in SynAck's recent presentation which was published after the research you're reading in this blog-post was conducted.)

This way of using Lua on embedded devices is a little different than project like eLua (http://www.eluaproject.net/) which takes the Lua VM and make it run on small embedded devices (to check the supported CPUs click http://www.eluaproject.net/overview/status). We've seen that used on other embedded devices we hack on.

Well that's the conclusion of this blog post series, we hope you got a bit of insight into reversing embedded devices. We didn't publish any 0day vulns in these posts, 0days are a given in every product if you look hard enough, this blog series was meant to give the beginner/intermediate IoT reverser some guidance.

Reminder: You can find the Lua disassembly rewriter tool on our Github here.

Thursday, July 17, 2014

Hacking your hacking tools: When you absolutely must decode ProtoBuf

Earlier this year we did a web application assessment where our client made extensive use of protobufs sent over HTTP. For those who haven't come across it, Protobuf is a library developed by Google for serializing messages to a compact binary format. Protobufs are often used for developing different types of network protocols, and sometimes they are used to serialize data that will be sent over HTTP, a situation where encoding data in a human-readable format like JSON or XML is more common.

We like to use Burp Suite when auditing anything that works over HTTP, and when applications serialize data in a human-readable format, it's easy to use Burp to modify that data. With a binary format like protobufs, however, modifying an encoded message by hand is tedious and error-prone, so we decided to try the (Burp Protobuf Decoder plugin by Marcin Wielgoszewski.) This post details our experience working with the Burp Protobuf Decoder plugin, the problems we had getting Burp set up to test this particular web app, and how we solved those problems.

As we started testing, our Burp session filled with binary data in our proxy history. When we loaded the plugin into that session, it didn't add any “protobuf” tabs or decode anything. We quickly realized that this was because the plugin was looking for messages with a content-type header of "application/x-protobuf", while the application was using a slightly different content-type. Changing the plugin code to look for the modified content-type header let us see the contents of the protobufs more easily, but we still couldn't edit them.

We wanted to edit the contents of the messages, but to see why we couldn't, and what we would have to do to be able to edit them, let's back up and look at how protobufs are defined. Protobuf message formats are defined in the protobuf language and stored as .proto files. The .proto files are then compiled into source code for the language where you want to use them. The Burp Protobuf Decoder plugin allows you to modify protobufs once you've loaded the message definition .proto files; without them, it falls back on using the protoc tool to decode messages.

The protoc tool can decode binary messages without access to the original .proto definition files, but it doesn't support re-encoding messages. This is because some information is lost when encoding the messages, making encoding messages without the message type definition difficult. When you only have the information in the binary messages to go by, the message field types are ambiguous, and it also isn't always clear whether some fields are optional or can be repeated. Of course, the names of fields and enumerated values are not included in binary messages either.

We were lucky because we were doing a greybox assessment, meaning we had access to the .proto files (as well as the rest of the application source code). At the same time we were unlucky - when we tried to load the .proto files into the Burp plugin, some of them would refuse to load, instead causing Java exceptions to be thrown with the message "Method code too large!"

The Protobuf Decoder plugin loads message definitions by first compiling the .proto files into python code using the standard protoc command and then importing the python files on the fly. Burp extensions written in python are run using the Jython python implementation, and it turns out that Java doesn't support methods larger than 64k. This is the reason we were getting the "Method code too large!" exception - Jython was trying to load the python code generated by protoc into Java methods, but they were too big for Java.

For most developers, the solution to the "Method code too large!" exception is to break up their python code into smaller files and methods. In this situation however, our python code was generated by protoc, and it wasn't very clear how to split it up. Instead, we decided to try splitting up the problematic .proto files into multiple smaller .proto files so that each generated python file would be smaller. This solution eventually worked.

The problem with this solution is that it's not necessarily easy to split up .proto files because of dependencies between type definitions. Protobuf messages can have fields that contain other message types. A message definition can reference another message definition in the same .proto file, or in a .proto file that it imports, but protoc can't handle circular dependencies between .proto files.

For example, let's say you're trying to split a.proto into a1.proto and a2.proto. If you have a2.proto import a1.proto, you can't have a1.proto import a2.proto. That means that you have to split the file so that none of the message definitions in a1.proto depend on those in a2.proto.

Say this is a.proto:
message Foo { required Bar bar = 1; } message Bar { optional Qux qux = 1; } message Baz { repeated Foo foo = 1; } message Qux { required int32 q = 1; }
To safely split it into two, you have to carefully arrange your message definitions. Here is a1.proto:
message Bar { optional Qux qux = 1; } message Qux { required int32 q = 1; }
And here's a2.proto:
import "a1.proto"; message Foo { required Bar bar = 1; } message Baz { repeated Foo foo = 1; }
Doing this programatically would require code to parse and re-write .proto files. Luckily, there were only a few .proto files that were giving us trouble, and we were able to split them up by hand relatively easily. We split each of them into two .proto files, which compiled to make python files small enough for Jython to load. We loaded the smaller .proto files into the Burp plugin, allowing us to view and edit messages in Burp and finally do the tests that we wanted to try.

In this case we were unlucky that the .proto files we were given were big enough to cause trouble, but we were able to use Wielgoszewski's plugin and some .proto file hacking to get our hacking done. We hope sharing this experience will save you or another web app hacker some headaches when trying to work with protobufs in Burp!



Tuesday, June 3, 2014

Exploiting CVE-2014-0196 a walk-through of the Linux pty race condition PoC

By Samuel Groß

Introduction

Recently a severe vulnerability in the Linux kernel was publicly disclosed and patched. In this post we'll analyze what this particular security vulnerability looks like in the Linux kernel code and walk you through the publicly published proof-of-concept exploit code by Matthew Daley released May 12th 2014.

The original post by the SUSE security team to oss-security announced that the vuln was found accidentally by a customer in production! You can find the patch at this link.

The core issue is located in the pty subsystem of the kernel and has been there for about five years. There was about one year in the middle where the vuln was not present, we'll talk about that a bit later in this post.

Background on the pty/tty subsystem

In order to fully understand the vuln we'll have to dive into the pty/tty subsystem of the linux kernel so lets start there.

A tty is "..an electromechanical typewriter paired with a communication channel." Back in the day a tty was made up of a keyboard for the input, a screen or similar display for the output and an OS process that was attached to this concept of tty. The process would then receive the input and it's output would be redirected to the screen. Those days are long gone but command line applications are not (thankfully!) and today we mostly use pseudo terminals. The main difference here is that instead of a keyboard and screen another process sits at the master side of the pty (for example a terminal emulator). Think of a pty as a bidirectional pipe or socket with some additional hooks in place (for example if you type a ctrl-c on the master side the kernel will interpret it instead of sending it to the slave. In this case the kernel will send a SIGINT signal to the slave process which will often cause it to terminate execution).

It's the pty subsystem's job to take input from either side of the pty, look for specific bytes in the byte stream (e.g. a ctrl-c), process them and deliver everything else to the other side. There is additional logic involved here which is not present in other IPC concepts such as pipes or sockets. This logic takes care to ensure things like echoing characters you type at the master end are also written back to it, pressing the backspace key to remove previously typed characters actually works on display, or sending signals like SIGINT when ctrl-c is sent. This logic is called line discipline (ldisc in short). Upon receiving data from either side the kernel will store the data in a temporary buffer (struct tty_buffer) and queue a work item to process the incoming data (flush it to the line discipline) at a later point and deliver them to the client side (I assume this is mainly done for "real" terminals whose input arrives in interrupt context (i.e. keyboard press, USB packet, ...) and should thus be handled as fast as possible). In this vuln we'll be racing one of these worker processes while it processes data to find the exploitable condition.


You can learn more about the pty subsystem here: http://www.linusakesson.net/programming/tty/

The vulnerability

For background we'll first need to introduce some important structures from include/linux/tty.h. (all source code excerpts were taken from Linux 3.2.58 except if stated otherwise):

struct tty_buffer { struct tty_buffer *next; char *char_buf_ptr; unsigned char *flag_buf_ptr; int used; int size; int commit; int read; /* Data points here */ unsigned long data[0]; };
As seen above a tty_buffer data structure temporarily holds a fixed number (well under normal circumstances) of bytes that have arrived at one end of the tty and still need to be processed.
tty_buffer is a dynamically sized object, so the char_buf_ptr will always point at the first byte right after the struct and flag_buf_ptr will point to that address plus 'size'. tty_buffer.size (which is only the size of the char buffer) can be any of the following: 256, 512, 768, 1024, 1280, 1536 and 1792 (TTY_BUFFER_PAGE).

The actual size of the object is then calculated as follows: 2 x size (for characters + flags) + sizeof(tty_buffer) (for the header), causing the tty_buffer to live in one of the following three kernel heap slabs: kmalloc-1024, kmalloc-2048 or kmalloc-4096.

struct tty_bufhead { struct work_struct work; spinlock_t lock; struct tty_buffer *head; /* Queue head */ struct tty_buffer *tail; /* Active buffer */ struct tty_buffer *free; /* Free queue head */ int memory_used; /* Buffer space used excluding free queue */ };
A tty_bufhead is, as the name implies, is the head (or first) data structure for tty_buffers. It keeps a list active buffers (head) while also storing a direct pointer to the last buffer (the currently active one) to improve performance. You will often see references to bufhead->tail in the kernel source code, meaning the currently active buffer is requested. It also keeps it's own freelist for buffers smaller than 512 bytes (see drivers/tty/tty_buffer.c:tty_buffer_free()).

struct tty_struct { int magic; struct kref kref; struct device *dev; struct tty_driver *driver; const struct tty_operations *ops; /* ... */ struct tty_bufhead buf; /* Locked internally */ /* ... */ };
The tty_struct data structure represents a tty/pty in kernel space. For the sake of this post all you need to know is that it stores the tty_bufhead and thus the buffers.


Alright, let's start with the function mentioned in the commit message, tty_insert_flip_string_fixed_flag() in drivers/tty/tty_buffer.c.
It is responsible for storing the given bytes in a tty_buffer of the tty device, allocating a new one if required:

The call chain leading up to this function roughly looks like this: write(pty_fd) in userspace -> sys_write() in kernelspace -> tty_write() -> pty_write() -> tty_insert_flip_string_fixed_flag()
int tty_insert_flip_string_fixed_flag(struct tty_struct *tty, const unsigned char *chars, char flag, size_t size) { int copied = 0; do { int goal = min_t(size_t, size - copied, TTY_BUFFER_PAGE); int space = tty_buffer_request_room(tty, goal); /* -1- */ struct tty_buffer *tb = tty->buf.tail; /* If there is no space then tb may be NULL */ if (unlikely(space == 0)) break; memcpy(tb->char_buf_ptr + tb->used, chars, space); /* -2- */ memset(tb->flag_buf_ptr + tb->used, flag, space); tb->used += space; /* -3- */ copied += space; chars += space; /* There is a small chance that we need to split the data over several buffers. If this is the case we must loop */ } while (unlikely(size > copied)); return copied; }
This function is fairly straightforward: At -1- tty_buffer_request_room ensures that enough space is available in the currently active buffer (tty_bufhead->tail), allocating a new one if required. At -2- the incoming data is written to the active buffer and at -3- the 'used' member is incremented. Note that tb->used is used as an index into the buffer.

The commit message mentions that two separate processes (a kernel worker process echoing data previously written to the master end and the process at the slave end writing to the pty directly) can enter this function at the same time due to a missing lock, thus causing a race condition.
So what could happen here? The commit message provides us with the following scenario:

            A                                       B 
__tty_buffer_request_room               
                                        __tty_buffer_request_room 
memcpy(buf(tb->used), ...) 
tb->used += space; 
                                        memcpy(buf(tb->used), ...) ->BOOM

In here we see two processes (A and B) writing to the pty at the same time. Since the first process updates tb->used first the memcpy() of the second process will write past the end of the buffer (assuming the first write already filled the buffer) and thus causes the memory corruption.
Now this looks reasonable at first but is actually only part of the story.
Here are some observations that don't quite fit with this scenario:
- When running a simple PoC the kernel seems to crash very fast (on older kernels at least), while the scenario above seems relatively hard to achieve
- Looking at the debugger shows that often multiple pages of kernel data have been overwritten upon crashing. This can hardly be the case when only sending e.g. 2 x 4096 bytes at once

Also take a look at the following (slightly shortened) stacktrace, produced by setting a breakpoint at tty_insert_flip_string_fixed_flag()

 
#0  tty_insert_flip_string_fixed_flag (tty=tty@entry=0xffff880107a82800, 
    chars=0x0, flag=flag@entry=0 '\000', size=1)                      /* -1. */
#1  tty_insert_flip_string (size=<optimized out>, 
    chars=<optimized out>, tty=0xffff880107a82800)
#2  pty_write (tty=0xffff880117cd3800, buf=<optimized out>, c=<optimized out>)
#3  tty_put_char (tty=tty@entry=0xffff880117cd3800, ch=66 'B')        /* -2- */
#4  process_echoes (tty=0xffff880117cd3800)
#6  n_tty_receive_char (c=<optimized out>, tty=0xffff880117cd3800)
#7  n_tty_receive_buf (tty=0xffff880117cd3800, 
    cp=0xffff880117a78828 'B' ..., fp=0xffff880117a78a2d "", count=512)
#8  flush_to_ldisc (work=0xffff880117cd3910)
#9  process_one_work (worker=worker@entry=0xffff880118f507c0, 
    work=0xffff880117cd3910)
#10 worker_thread (__worker=__worker@entry=0xffff880118f507c0)
#11 kthread (_create=0xffff880118ed9d80)
#12 kernel_thread_helper ()

This is the code path a worker process takes when performing a flush to the line discipline. As can be seen at -1- and -2- the echoing is actually done byte by byte.
Clearly we can't cause much harm by only overwriting a buffer with a single byte when the chunk still has unused space left (as will be the case for tty_buffer objects).

In the following we will now assume that the race went something like this: Process A wrote 256 bytes, process B (performing an echo) entered tty_buffer_request_room() before A updated tb->used, causing it to not allocate a fresh buffer. Afterwards B wrote another byte to the same buffer and incremented tb->used further.

To understand what is really causing the memory corruption take a look at the tty_buffer_request_room() function called by tty_insert_flip_string_fixed_flag().

int tty_buffer_request_room(struct tty_struct *tty, size_t size) { struct tty_buffer *b, *n; int left; /* -1- */ unsigned long flags; spin_lock_irqsave(&tty->buf.lock, flags); /* -2- */ /* OPTIMISATION: We could keep a per tty "zero" sized buffer to remove this conditional if its worth it. This would be invisible to the callers */ if ((b = tty->buf.tail) != NULL) left = b->size - b->used; /* -3- */ else left = 0; if (left < size) { /* -4- */ /* This is the slow path - looking for new buffers to use */ if ((n = tty_buffer_find(tty, size)) != NULL) { if (b != NULL) { b->next = n; b->commit = b->used; } else tty->buf.head = n; tty->buf.tail = n; } else size = left; } spin_unlock_irqrestore(&tty->buf.lock, flags); return size; }
Now things start to get interesting, note how at -1- 'left' has type int while 'size' is of type size_t (aka unsigned long). Assuming we previously won the race and have written 257 bytes while the buffer was only 256 bytes large then we now have the following situation:
b->size is 256
b->used is 257

Looking at the code above, at -3- 'left' will now equal -1 and at -4- will be casted to an unsigned value, resulting in 18446744073709551615 (assuming 64 bit long) which is definitely larger then the given size. The following block will be skipped and no new buffer will be allocated for the current request even though the current buffer is more than full.
At this point sending more data to the pty will result in the data being put into the same buffer, overflowing it further (remember 'used' is used as an index into the buffer). Since b->used will still be incremented for each byte we can now overflow as much data as we want.
Also note that this function is locked internally (at -2-), thus serializing access to it.

Now we are ready to draw an updated scenario that leads to an overflow:
        A (Slave)                          B (Echo)

tty_buffer_request_room                 
        |                     // waiting for A to release the lock
                              tty_buffer_request_room 
                              // tb->used < tb->size,
                              // no new buffer is allocated
memcpy(.., 256);
                                        
                              memcpy(.., 1);

tb->used += space; 
                              tb->used += space;    
                              // tb->used is now larger than tb->size


Note that we will win the race as soon as the echoing process enters tty_buffer_request_room and calculates 'left' before the first process gets to update tb->used. Since the whole memcpy() operation is in between, that time frame is relatively large.

So as far as race condition scenarios go, the single case mentioned in the commit message is only one possible way that can result in memory corruption (and only if A fills the buffer completely).
In general any sequence that results in tb->used being larger than tb->size will result in a memory corruption later on. For that to happen the first process must send data to completely fill a buffer (i.e. sending tb->size bytes in total) while the echoing process must enter tty_buffer_request_room() before the first process updates tb->used (this leads to tty_buffer_request_room() not allocating a fresh buffer). The corruption is then caused by sending more data to the pty which will continue to overflow the same buffer.

At this point the vuln turns into a standard kernel heap overflow.

And we'll conclude this section with fun fact: The race in this vuln can actually be won by using just one process. This stems from the fact that we are racing a kernel worker process and not a second user-land process.

Getting to root - The exploit

Here we want to quickly analyze the published exploit code which will hopefully be easy to understand now that the details of the vuln are known.

Going step-by-step with PoC's console output we see...

[+] Resolving symbols

Yep, that's what it's doing. Note that some modern distributions (notably Ubuntu) set /proc/sys/kernel/kptr_restrict to 1, thus disabling /proc/kallsyms. For repository kernels this is merely an inconvenience though since the kernel image (and System.map) can be downloaded locally and the addresses taken from there.

[+] Doing once-off allocations

Stabilizing the heap. We need to make sure existing holes are filled to maximize the chances of getting objects laid out linear in the address space. We want our target buffer to be followed by one of our target objects (struct tty_struct).

[+] Attempting to overflow into a tty_struct... 

Now we are racing.

This is fairly straightforward, open a pty, spawn a new thread and write to both ends at the same time. Afterwards the child thread will send the data needed to overflow into the adjacent chunk. Assuming the race has been won at the start then there is no time pressure on these operations as discussed above.
Also note that only one byte is sent to the master end, this is done so the number of bytes that has yet to be sent can be calculated.

The exploit targets tty_struct structures which end up in the kmalloc-1024 slab cache. The buffer we will overflow will thus have to be in that cache as well (so tb->size = 256 which is also the minimum size). Before writing to the slave end the first time (to allocate a fresh buffer) the exploit creates a bunch of new pty's, thus allocating tty_structs in kernel space. It will then close one of them in hopes that the newly allocated buffer will end up in the freed chunk. If this works out we will have a bunch of tty_structs, followed by the buffer followed by more tty_structs in the kernel address space.

Let's take a quick look at the function executed by the new thread to overflow into the following chunk:
void *overwrite_thread_fn(void *p) { write(slave_fd, buf, 511); write(slave_fd, buf, 1024 - 32 - (1 + 511 + 1)); write(slave_fd, &overwrite, sizeof(overwrite)); }
The first write here will fill the previously allocated buffer (right after closing one of the pty's we allocated a new buffer by writing one byte to the slave fd). Note that the author assumes the buffer to hold 512 bytes while it's size is 256 (MIN_TTYB_SIZE). The reason for that is that on newer releases the kernel can use the flag buffer for data as well (if it knows the flags won't be needed), so the usable size of the buffer is doubled.

The next write fills the memory chunk of the buffer completely. The chunk is 1024 bytes large and so far we have written 32 bytes (sizeof(struct tty_buffer)) + 511 + 1 (the first write to the slave fd) + 1 (the echoed byte from the master fd).

The final write overwrites into the next heap chunk with a fake tty_structure previously created.

Now remember that tty_struct has a member 'ops' that is a pointer to a tty_operations struct? Well those ops members in the linux kernel are always pointers to structures holding function pointers themselves (if you're familiar with C++ this is similar to the vtable pointer of C++ objects). These function pointers correspond to actions performed on the device, there's one for open(), one for close() one for ioctl() and so on. Now assuming we have overwritten the object then 'ops' will now be under our control, pointing into user space. There we have prepared an array of function pointers pointing to our kernel payload.

Now as soon as we perform an ioctl on the tty device we will hijack the kernel control flow and redirect it into the payload. There we'll execute the standard prepare_kernel_cred(0) followed by commit_creds(), elevating our privileges to root:

[+] Got it :)

# id
 uid=0(root) gid=0(root) groups=0(root)


Note that SMEP/SMAP will prevent this exploit (as well as the grsecurity system) as they prevent the kernel from accessing user-land data (SMAP) and code (SMEP).

Limitations

Unlike most other race conditions, in the case of this vuln the attacker is only able to control one of the two processes. Kernel worker processes will check for new work items regularly but can't really be affected by user space. This seems to make a huge difference for different kernel versions, on 3.2 it usually only takes a couple seconds to win the race while on 3.14 it can take multiple minutes.

As mentioned in the PoC code another thing that limits the reliability is the slab cache size in use. As previously discussed the buffer can only be in one of the following slabs: kmalloc-1024, kmalloc-2048 and kmalloc-4096. At sizes this big the chance of hitting the last chunk in the last page of a slab becomes more likely, further limiting the reliability. When that happens the code will overflow into uncontrolled data. This might have no consequences (no important data has been overwritten), lead to a crash later on (some object has been overwritten that is referenced at some point in the future) or even lead to an immediate panic/Oops (for example when the next page is mapped read only).

As also mentioned in the PoC exploit the flags cause some trouble on older kernels (before the commit acc0f67f307f52f7aec1cffdc40a786c15dd21d9) as b->size bytes following the overwritten part will always be cleared to zero. Thus when overwriting a controlled object either the whole objects needs to be restored (and the zeros written into unused space before the end of the chunk) or an object needs to be found where parts of it can safely be overwritten with zeros.

For the last part it might be possible to target tty_buffer objects when exploiting the vuln on pre 3.14 kernels. Here the header can be overwritten, yielding an arbitrary write (overwrite char_buf_ptr and afterwards send data to the pty) while the zeroes can safely be written into the buffer space and won't cause any trouble.

Is Android vulnerable?

As stated in the advisory the vulnerability dates back to 2.6.x kernels, roughly 5 years old. That would imply that pretty much every android device out there is vulnerable to this issue. Running a quick PoC on a newer device (for example the Nexus 5, HTC One or Galaxy S4) it seems the race can never be won there though. Let's again take a look at some kernel source code, this time from the HTC One (m7) Cyanogenmod kernel source.

int tty_insert_flip_string_fixed_flag(struct tty_struct *tty, const unsigned char *chars, char flag, size_t size) { int copied = 0; do { int goal = min_t(size_t, size - copied, TTY_BUFFER_PAGE); int space; unsigned long flags; struct tty_buffer *tb; spin_lock_irqsave(&tty->buf.lock, flags); /* -1- */ space = __tty_buffer_request_room(tty, goal); tb = tty->buf.tail; if (unlikely(space == 0)) { spin_unlock_irqrestore(&tty->buf.lock, flags); break; } memcpy(tb->char_buf_ptr + tb->used, chars, space); memset(tb->flag_buf_ptr + tb->used, flag, space); tb->used += space; spin_unlock_irqrestore(&tty->buf.lock, flags); copied += space; chars += space; } while (unlikely(size > copied)); return copied; }
The Interesting difference is that at -1- we see that the function here is actually locked internally. Now as stated above to win the race the second process needs to enter __tty_buffer_request_room() before the first process updated tb->used. This is not possible if the function is locked like this.

Taking a look at the git history of the Linux kernel it turns out that all kernels between c56a00a165712fd73081f40044b1e64407bb1875 (march 2012) and 64325a3be08d364a62ee8f84b2cf86934bc2544a (january 2013) are not affected by this vuln as tty_insert_flip_string_fixed_flag() was internally locked there.

For Android that means quite a few of the newer devices are not vulnerable to this issue, most of the older ones are though and there are some newer ones integrated the 64325a3be08d364a62ee8f84b2cf86934bc2544a Linux kernel patch, making them vulnerable again.

Conclusion

Kernel exploits are hard, getting them reliable is even harder! This concludes our analysis of CVE-2014-0196, we hope you have gained some deeper understanding of this vuln and kernel level security in general. For more details on linux kernel exploitation you can take a look at our last post: How to exploit the x32 recvmmsg() kernel vulnerability CVE 2014-0038

If you have feedback or have worked on something similar let us know, you can email us at: info/at\includesecurity.com

Wednesday, May 21, 2014

Mobile App Data Privacy - the Outlook.com Example

By Paolo Soto (contribs. by Erik Cabetas)

In November of 2013 our research team spent some time reverse engineering popular mobile applications to get some practice reversing interesting apps. After reviewing these types of apps we noticed a trend that some messaging apps did not take any steps to ensure confidentiality of their locally stored messages. In light of similar issues having recently been deemed a concern on other platforms we thought we'd publish one of our examples to increase user awareness of such behaviors.

The application we're discussing here is Outlook.com free email service's mobile client offered by Microsoft. This app is described as being created by Seven Networks in conjunction or in association with Microsoft (i.e. looks like it was outsourced.) The app allows users to access their Outlook.com email on Android devices. In the course of our research we found that the on-device email storage doesn't really make any effort to ensure confidentiality of messages and attachments within the phone filesystem itself. After notifying Microsoft (vendor notification timeline is found at the end of this post) they disagreed that our concern was a direct responsibility of their software, in light of similar problems with iOS being deemed a concern by privacy advocates we thought it'd be a good idea to share what we see with the Outlook.com app.

Root Cause: A Common Problem with the Privacy of Mobile Messaging Messaging Apps

We feel a key security and privacy attribute of any mobile messaging application is the ability to maintain the confidentiality of data stored on the device the app runs on. If a device is stolen or compromised, a 3rd party may try to obtain access to locally cached messages  (in this case emails and attachments). We've found that many messaging applications (stored email or IM/chat apps) store their messages in a way that makes it easy for rogue apps or 3rd parties with physical access to the mobile device to obtain access to the messages. This may be counter to a common user expectation that entering a PIN to "protect" their application would also protect the confidentiality of their messages. At the very least app vendors can warn a user and suggest that they encrypt the file system as the application provides no assurance of confidentiality. Or take it the next level and proactively work with the user to encrypto filesystems at installation time.

The Outlook.com Mobile App Behaviors

We've found the following two behaviors of the app:
  • The email attachments are stored in a file system area that is accessible to any application or to 3rd parties who have physical access to the phone. 
  • The emails themselves are stored on the app-specific filesystem, and the "Pincode" feature of the Outlook.com app only protects the Graphical User Interface, it does nothing to ensure the confidentiality of messages on the filesystem of the mobile device. 
We feel users should be aware of cases like this as they often expect that their phone's emails are "protected" when using mobile messaging applications.

Recommendations to Users

We recommend the setting Settings => Developer Options => USB debugging be turned OFF. We further recommend using Full Disk Encryption for Android and SDcard file systems. This would prevent a 3rd party from getting access to any data in plain-text, from a messaging app or other apps that may choose to store private data on the SDCard.

Users may change the email attachments download directory, via Settings->general->Attachments Settings->Attachment Folder. It is advised not to set the download directory for attachments to be /sdcard/external_sd, as this will place email attachments on the removable SDCard (if one is in place).

For the tech and security folks reading this post, we'll now dive into how we investigated these software behaviors......

Behavior #1: Attachments are placed in a possibly world-readable folder.

Outlook.com for Android downloads email attachments to the SDcard partition by default. For almost all versions of Android this places the attachments in a world-readable folder. This would place downloaded email attachments in a storage area accessible to any user or application which can access the SDcard (e.g any app granted READ_EXTERNAL_STORAGE permission) - even if the phone was not rooted. A 3rd party would simply use ADB shell in order to find the attachments, which are located in /sdcard/attachments:

/sdcard/attachments and files inside are word readable on a device running Android 4.0.4.
The attachments can then be pulled from the device using ADB.
Bas Bosschert in his post shows how files from the SDcard may be uploaded to a server.  Hence using a similar technique a rogue application needs only the READ_EXTERNAL_STORAGE and INTERNET permissions to exfiltrate data from the SDcard to the Internet, these permissions are some of the most common permissions granted by users to applications upon installation.

Users of the latest Android 4.4 or later devices would not see this behavior as having security/privacy ramifications since the SDcard partition is not world readable on Android 4.4 and above. However note that Android 4.4 was released on October 31, 2013 and at the time of this writing a large market share of devices are not running this latest version of Android OS.

Behavior #2: Pincode does not  protect/encrypt downloaded emails or attachments.

Outlook.com provides a Pincode feature. When activated, users have to enter a code in order to interact with the application (launch it, resume it, etc). This feature is not enabled by default in the application: the user must manually enable this feature. We've found that the Pincode feature does not encrypt the underlying data, it only protects the Graphical User Interface, and we feel this is a behavior users should be aware of. This is something that a lot of people reading this blog might think is obvious, but we surveyed a couple non-tech users (hi mom!) and found that the expectation of privacy for the Pincode feature was present. Meaning the user expected that the Pincode would "...protect the whole thing, including the emails" -Mom.

A user manually creates the pincode after installation.

After 10 wrong pincode attempts the app will delete the account:

An incorrect Pincode is entered.

The Pincode functionality is located in the com.seven.Z7.app.AppLock class. When a Pincode is created it is passed to AppLock.createHashedPassword(). This creates a custom Base64 encoded SHA1 of the passcode which is stored in the preferences cache (in AppLock.saveHashPassword()). Whenever a Pincode is entered to unlock the app, the same custom Base64 encoded SHA1 is applied to the Pincode and then compared to the stored value for the unlock to succeed (method call: AppLock.testMatchExistingPassword(), called from AppLockPasswordEntry.validatePassword()).

The Pincode is sufficient to stop a party who only will try to access the outlook client via the phone screen interface. It will not prevent a party who  has access to the filesystem on the device via USB (e.g. ADB). If USB Debugging is enabled and the device is rooted a 3rd party would be able to access the cached emails database. The 3rd party would simply have to run ADB shell to navigate to the working directory of the Outlook.com application (which is /data/data/com.outlook.Z7) in order to find the databases folder:

Cached email is stored in an SQLite database located at /data/data/com.outlook.Z7/databases/email.db. 

A 3rd party could retrieve the email database file from a rooted phone via standard use of the adb utility. Or the backup trick outlined by Sergei Shvetsov would allow access to the app specific filesystem on a non-rooted mobile device by using an adb trick. First the email.db would have to be removed from the phone via adb and then the relevant data could be accessed by a utility such as sqlite3 (this whole thing can be automated to execute instantly)

Email bodies are stored in two tables, a plaintext Preview table containing a short snippet of the email, and a html_body table containing the full email including html markup.

Extraction of sensitive data is simpler if the sensitive data is in a short email or at the beginning of a long email, since the first few lines of the email will be placed in plain-text in the Preview column. In this example we have pulled out the body of an email from the Preview table in the database. In the example below we read a specific email (email _id #20) instead of dumping out all emails previews.

Example of reading a short email containing confidential data on the database via sqlite3.

If the email is longer than the Preview will store, this is not a problem, we just have to pull out the email and then read the html with something sane:


Email #18 was crafted to contain a large bit of text from wikipedia with credentials added to the end.

Using a web browser to read the emails stored in the html_body table. 

To read out the entire email spools just remove the WHERE clause from above; the WHERE clause was merely added for brevity.

Recommendations for Mobile App Developers

A good defensive measure would be to check the mobile device to see if encryption is enabled by calling getSystemServer() to obtain a DevicePolicyManager object, and then query that object with getStorageEncryptionStatus() to check if the device is encrypted with native filesystem encryption.

If the device is not encrypted, the application might show a prompt to ask the user if they'd like to apply Full Disk Encryption to both the device and the SDcard or accept the risk of not having encrypted filesystem before storing data in the Outlook.com for Android application.

Alternatively; the Outlook.com for Android app could use 3rd party addons (such as SQLcipher) to encrypt the SQLite database in tandem with transmitting the attachments as opaque binary blobs to ensure that the attachments can only be read by the Outlook.com app (perhaps using the JOBB tool). These methods would be useful for older devices (such as devices that run Android 4.0 and earlier) that do not support full disk encryption.

Digital Forensics Notes

Digital Forensics technicians interested in obtaining the account information for investigations would take note of the following information.
  • The application working directory on the android device is located at: /data/data/com.outlook.Z7
  • The subdirectories contain the following information:
    • cache/ - contains the various webcaches for content being pulled into webviews.
    • databases/ - contains the database files where the content of messages, emails, and contact lists are kept
    • files/ - log files for the client and the engine
    • lib/ - empty
    • shared_preferences/ - contain xml files which reflect the state of the user options on the client
  • The account email username is kept in the file: /data/data/com.outlook.Z7/shared_prefs/com.outlook.Z7_preferences.xml 
  • The username can be found by searching for the string "email_addresses" in that file. 
  • The actual human name of the account is stored in the log files located at: /data/data/com.outlook.Z7/files Looking for the string "connector" within the files in that directory will show the name and account information. 
  • Technicians should be able to retrieve the emails by rooting the phone and retrieving the file at: /data/data/com.outlook.Z7/databases/email.db
  • Once rooted, attachments can be found in the folder: /sdcard/Attachments/
  • The default email address may also be found by looking for the string "email_default_from_address" in the file: /data/data/com.outlook.Z7/shared_prefs/com.outlook.Z7_preferences.xml
Also note that contacts and messages are also stored in the email.db database which may contain additional proof of a communication between two parties.

Message content and contact information are stored in the email.db.

Lost Feature: Encryption?

We discovered the following which may indicate a possible future role for encryption in the Outlook.com app.

The DecryptingSQLiteCursor implementation could be used to decrypt items in a database column (such as email contents).
In order to test the extent of the "in place" encryption infrastructure, we decided to force-enable encryption by changing the value of Z7MetaData.ClientConfig.UseEncryption in the AndroidManifest.xml to true, and then recompiling and reinstalling the Outlook apk. 

This did trigger encryption in the subject, and html_body columns (the body column is not used) but did not encrypt the preview column in the database:
Again, please note this was just an experiment. The encryption isnot enabled in the app, and the encryption feature set may be incomplete. We hope the encryption mechanism scaffolding we see here can be modified and included in a future release.

Vendor Coordination

Microsoft Security Response Center was notified via encrypted email of these observed behaviors on December 3rd, 2013. The key message in the response received that same day was "...users should not assume data is encrypted by default in any application or operating system unless an explicit promise to that effect has been made." On May 15th 2014 we contacted Microsoft asking for reconsideration of our report and mentioning our plans to publish this research. They re-stated their position: users of the app should not expect encryption of transmitted or stored messages.

Version Information

Our original research was conducted on the following app version below.

  • Application Label: Outlook.com
  • Process Name: com.outlook.Z7:client
  • Version: 7.8.2.12.49.2176
  • APK(s):
  • com.outlook.Z7-1.apk (SHA1- 14b76363ebe96954151965676cfc15761650ef7e)
  • com.outlook.Z7-2.apk (SHA1- 41339b21ba5aac7030a9553ee7f341ff6f0a6cf2)

We also confirmed that the relevant classes have not changed by doing a hash comparison of the classes in latest app version which was released May 6th 2014 (7.8.2.12.49.5701)

  • Version: 7.8.2.12
  • Build Number: 28.49.5701.3
  • Build Date: 2014-05-04
  • com.outlook.Z7-1.apk (SHA1 4ee3dc145f199357540a14e0f2ea7b8eb346401e)