Hacking your hacking tools: When you absolutely must decode ProtoBuf

Earlier this year we did a web application assessment where our client made extensive use of protobufs sent over HTTP. For those who haven’t come across it, Protobuf is a library developed by Google for serializing messages to a compact binary format. Protobufs are often used for developing different types of network protocols, and sometimes they are used to serialize data that will be sent over HTTP, a situation where encoding data in a human-readable format like JSON or XML is more common.

We like to use Burp Suite when auditing anything that works over HTTP, and when applications serialize data in a human-readable format, it’s easy to use Burp to modify that data. With a binary format like protobufs, however, modifying an encoded message by hand is tedious and error-prone, so we decided to try the (Burp Protobuf Decoder plugin by Marcin Wielgoszewski.) This post details our experience working with the Burp Protobuf Decoder plugin, the problems we had getting Burp set up to test this particular web app, and how we solved those problems.

As we started testing, our Burp session filled with binary data in our proxy history. When we loaded the plugin into that session, it didn’t add any “protobuf” tabs or decode anything. We quickly realized that this was because the plugin was looking for messages with a content-type header of “application/x-protobuf”, while the application was using a slightly different content-type. Changing the plugin code to look for the modified content-type header let us see the contents of the protobufs more easily, but we still couldn’t edit them.

We wanted to edit the contents of the messages, but to see why we couldn’t, and what we would have to do to be able to edit them, let’s back up and look at how protobufs are defined. Protobuf message formats are defined in the protobuf language and stored as .proto files. The .proto files are then compiled into source code for the language where you want to use them. The Burp Protobuf Decoder plugin allows you to modify protobufs once you’ve loaded the message definition .proto files; without them, it falls back on using the protoc tool to decode messages.

The protoc tool can decode binary messages without access to the original .proto definition files, but it doesn’t support re-encoding messages. This is because some information is lost when encoding the messages, making encoding messages without the message type definition difficult. When you only have the information in the binary messages to go by, the message field types are ambiguous, and it also isn’t always clear whether some fields are optional or can be repeated. Of course, the names of fields and enumerated values are not included in binary messages either.

We were lucky because we were doing a greybox assessment, meaning we had access to the .proto files (as well as the rest of the application source code). At the same time we were unlucky – when we tried to load the .proto files into the Burp plugin, some of them would refuse to load, instead causing Java exceptions to be thrown with the message “Method code too large!”

The Protobuf Decoder plugin loads message definitions by first compiling the .proto files into python code using the standard protoc command and then importing the python files on the fly. Burp extensions written in python are run using the Jython python implementation, and it turns out that Java doesn’t support methods larger than 64k. This is the reason we were getting the “Method code too large!” exception – Jython was trying to load the python code generated by protoc into Java methods, but they were too big for Java.

For most developers, the solution to the “Method code too large!” exception is to break up their python code into smaller files and methods. In this situation however, our python code was generated by protoc, and it wasn’t very clear how to split it up. Instead, we decided to try splitting up the problematic .proto files into multiple smaller .proto files so that each generated python file would be smaller. This solution eventually worked.

The problem with this solution is that it’s not necessarily easy to split up .proto files because of dependencies between type definitions. Protobuf messages can have fields that contain other message types. A message definition can reference another message definition in the same .proto file, or in a .proto file that it imports, but protoc can’t handle circular dependencies between .proto files.

For example, let’s say you’re trying to split a.proto into a1.proto and a2.proto. If you have a2.proto import a1.proto, you can’t have a1.proto import a2.proto. That means that you have to split the file so that none of the message definitions in a1.proto depend on those in a2.proto.

Say this is a.proto:

message Foo {
  required Bar bar = 1;
}

message Bar {
  optional Qux qux = 1;
}

message Baz {
  repeated Foo foo = 1;
}

message Qux {
  required int32 q = 1;
}

To safely split it into two, you have to carefully arrange your message definitions. Here is a1.proto:

message Bar {
  optional Qux qux = 1;
}

message Qux {
  required int32 q = 1;
}

And here’s a2.proto:

import "a1.proto";

message Foo {
  required Bar bar = 1;
}

message Baz {
  repeated Foo foo = 1;
}

Doing this programatically would require code to parse and re-write .proto files. Luckily, there were only a few .proto files that were giving us trouble, and we were able to split them up by hand relatively easily. We split each of them into two .proto files, which compiled to make python files small enough for Jython to load. We loaded the smaller .proto files into the Burp plugin, allowing us to view and edit messages in Burp and finally do the tests that we wanted to try.

In this case we were unlucky that the .proto files we were given were big enough to cause trouble, but we were able to use Wielgoszewski’s plugin and some .proto file hacking to get our hacking done. We hope sharing this experience will save you or another web app hacker some headaches when trying to work with protobufs in Burp!

Leave a Reply

Discover more from Include Security Research Blog

Subscribe now to keep reading and get access to the full archive.

Continue reading