Thursday, March 26, 2009

LittleCMS exploit

Now that new packages are out for lcms and OpenJDK, I'll publish my LittleCMS exploit. It's harmless in that if it actually works on your machine, all it does it put your CPU into a spin -- watch out for 100% CPU usage. It's also relatively harmless in that it doesn't work on many systems out of the box. I targeted my 32-bit Ubuntu 8.10 laptop which happens to have an executable heap, executable stack, no stack cookies but does have ASLR.

Here's the sample JPG file with embedded evil ICC profile: http://cevans-app.appspot.com/static/CVE-2009-0733.jpg

I'm only bothering to write about this because the story behind the exploit contains a few interesting twists which illustrate the iterative constraint solving used in modern exploits:
  • The underlying code flaw is a stack-based buffer overflow. But the data going past the bounds is not arbitrarily user-controlled. (If it were, a traditional stack smashing exploit would work, but the ASLR could affect the reliability of the exploit). Here's the faulty code in cmsio1.c, where nCurves can end up greater than MAXCHANNELS:

    static
    LCMSBOOL ReadSetOfCurves(LPLCMSICCPROFILE Icc, size_t Offset, LPLUT NewLUT, int nLocation)
    {
    LPGAMMATABLE Curves[MAXCHANNELS];
    ...
    for (i=0; i < nCurves; i++) {

    Curves[i] = ReadCurve(Icc);
    ...

  • The data going past bounds is actually pointers to heap chunks (returned by ReadCurve). This is nice because it takes ASLR out of the equation. We'll automatically overwrite %eip with a pointer to a valid heap address. But what is in that heap chunk? If it were arbitrary user controlled data, we'd have game over already, but unfortunately it is not. We're looking at pointers to this structure:

    struct GAMMATABLE {
    unsigned int Crc32;
    int Type;
    double Params[10];
    int nEntries;
    WORD GammaTable[1];
    }

  • There are two types of constructs in the input ICC profile used as a basis to populate this structure: curv and para. curv is of little use to us because it mostly leaves Crc32 set to 0 (or set based only on 16 bits of input entropy). Trying to execute the code 0x00 0x00 is a crash because it dereferences the %eax register: add %al,(%eax), and the value of this register is left as 0 or 1 to denote success of failure when the ReadSetOfCurves function exits.

  • This leaves us looking at a para curve, which calculates Crc32 based very indirectly on some input variables under our control. Although we can't reverse it, we can brute force it with a little program:

    #include "lcms.h"

    static
    void AdjustEndianess32(LPBYTE pByte)
    {
    BYTE temp1;
    BYTE temp2;

    temp1 = *pByte++;
    temp2 = *pByte++;
    *(pByte-1) = *pByte;
    *pByte++ = temp2;
    *(pByte-3) = *pByte;
    *pByte = temp1;
    }

    static
    double Convert15Fixed16(icS15Fixed16Number fix32)
    {
    double floater, sign, mid, hack;
    int Whole, FracPart;

    AdjustEndianess32((LPBYTE) &fix32);

    sign = (fix32 < 0 ? -1 : 1);
    fix32 = abs(fix32);

    Whole = LOWORD(fix32 >> 16);
    FracPart = LOWORD(fix32 & 0x0000ffffL);

    hack = 65536.0;
    mid = (double) FracPart / hack;
    floater = (double) Whole + mid;

    return sign * floater;
    }

    int
    main(int argc, const char* argv[]) {
    unsigned int crc;
    unsigned char* p_crc;
    double params[10];
    int type = 0;
    unsigned int i;
    unsigned int start = atoi(argv[1]);

    for (i = start; i <= 0xffffffff; ++i) {
    if ((i % 10000) == 0) {
    printf("progress: %u\n", i);
    }
    params[0] = Convert15Fixed16(i);
    LPGAMMATABLE table = cmsBuildParametricGamma(4096, type + 1, params);
    crc = table->Seed.Crc32;
    p_crc = &crc;
    if ((p_crc[0] == 0x93 || p_crc[0] == 0x95 || p_crc[0] == 0x97) &&
    p_crc[1] == 0xff &&
    p_crc[2] == 0xe6) {
    printf("got it!!!!!!! %u %u\n", i, p_crc[0]);
    return 0;
    }
    free(table);
    }
    return 1;
    }

  • What does this program do? Let's see:

    chris@chris-desktop:~/progs$ ./a.out 3221970952
    got it!!!!!!! 3221970952 151

    This is telling us that a para curve of type 0 whose 4 input bytes are 3221970952 == 0xC00B6008 will result in 0x97 0xff 0xe6 0x?? being written to Crc32. We don't care about the last byte. This assembles to xchg %eax,%edi jmp %esi which will execute because %eip jumps to this para heap chunk, which starts with the CRC. It is urgent to do something in 4 bytes or less because we have terrible control over the rest of the content of this heap chunk. What these 3 bytes do is to overwrite %eax with %edi then jump to %esi. The significance here is that both registers we used were under our control because they were also restored from the stack we trashed with pointers to valid heap chunks.

  • So execution continues at the curve heap chunk pointed to by %esi. We arrange for this to be a curv type chunk. Earlier we dismissed them as useless because the 0 Crc32 will result in a dereference of %eax. But now, thanks to our para chunk, we've repaired %eax to point to a valid heap chunk! Unlike para chunks, curv chunks do contain arbitrary data we can supply, after a mostly-zero header. We've essentially used the control at the beginning of a para chunk to repair %eax and use the vast control at the end of a curv chunk. Before our arbitrary code executes, a bunch of now harmless 0x00 0x00 will execute, writing some junk at the start of one of our unused heap chunks. Finally, just before our arbitrary code, the value of nEntries in the header will be executed. This value is 0x02 0x00 0x00 0x00 which is add (%eax),%al add %al,(%eax). This trashes %eax a little bit before dereferencing it again, but only up to 256 bytes, so we're good and we could always use a different number of entries in our curv chunk. Certainly, a real payload would need more than 2 words.

  • The actual arbitrary code that executes is 0xeb 0xfe which is equivalent to for (;;); in C. Look out for an endian reversed instance of those two bytes, as well as an endian reversed 0xC00B6008 in the exploit file.

  • There's one further twist in the exploit relating to stack alignment. Some compilation optimization leaves no space between the end of the Curves array and the start of the saved registers. Other cases have a 4-byte gap. The exploit caters for both these stack layouts by careful layout of curv vs. para chunks. Here's a simple illustrative table:


    Curve in input fileHit if 0 gapHit if 4 bytes gap
    17: blank curvebp4-byte gap
    18: curv payloadesiebp
    19: curv payloadediesi
    20: blank curvebxedi
    21: para redirect payloadeipebx
    22: para redirect payloadeip + 4eip
    As can be seen, %eip always gets the para payload and %esi always gets the real payload.
With thanks to Julien Tinnes and Tavis Ormandy for inspiration.

1 comment: