Wednesday, March 9, 2011

Multi-browser heap address leak in XSLT

It's not often that I find a bug that affects multiple different codebases in the same way, but here is an interesting info-leak bug that is currently unpatched in Firefox, Internet Explorer and Safari.

I'm releasing it now for a few reasons:
  1. The bug was already publicly noted here.

  2. This bug cannot damage anyone in and of itself; it's a low severity info-leak that does not corrupt anything. It needs to be paired with other bugs, perhaps as an exploit aid against ASLR.

  3. This is a rare and unique opportunity to directly compare vendor responses and response times for a near-identical bug. It's nice that this is a lower-severity issue as all vendors tend to treat critical issues with at least some urgency; lower severity issues serve as a better differentiator.

The bug
The bug is in the generate-id() XPath function, which is sometimes used in XSL transforms. Here's a web page that simply calls generate-id() and renders the result:

Let's see how this renders in different browsers:

Firefox (64-bit Linux)

There is no attempt to obfuscate the fact that this is a raw heap address. Since Firefox is open source, we can look at the source code and confirm that the string is indeed generated from a pointer (txXPathNodeUtils::getXSLTId):
const char gPrintfFmt[] = "id0x%016p";
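
To make the point concrete, here is a minimal sketch of how a consumer of the leak might turn such an id back into a number. It assumes the rendered id is the literal prefix "id0x" followed by hexadecimal digits, as the format string suggests; the exact output of %p is implementation-defined, so the parser also tolerates a second "0x". The helper name parse_firefox_id and the sample id in the usage note are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse an id of the assumed form "id0x<hex digits>"
 * and return the numeric value, or 0 if the string does not match. */
static uint64_t parse_firefox_id(const char *id)
{
    static const char prefix[] = "id0x";
    const size_t n = sizeof(prefix) - 1;

    assert(id != NULL);
    if (strncmp(id, prefix, n) != 0)
        return 0;
    /* Require a hex digit first so strtoull cannot skip whitespace or a sign. */
    if (!isxdigit((unsigned char)id[n]))
        return 0;

    char *end = NULL;
    /* Base 16 also accepts an optional leading "0x", in case %p emitted one. */
    unsigned long long value = strtoull(id + n, &end, 16);
    if (end == id + n || *end != '\0')
        return 0;
    return (uint64_t)value;
}
```

For a made-up id such as "id0x00007f3a5c2d1e40", this returns 0x00007f3a5c2d1e40, i.e. the id is the pointer value with a fixed prefix.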

Internet Explorer 8 (Windows 7)

Doesn't look like a heap address, does it? If, however, you strip off the "ID" prefix and treat the string as a [A-Z0-5] base32 encoded "little endian" string, it resolves to a nice heap address. At that address is a pointer into msxml.dll, possibly the address of a vtable for some internal XML node class.
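
A rough sketch of that decoding follows. It rests on two assumptions that the description above does not spell out: that A-Z map to 0..25 and 0-5 map to 26..31, and that "little endian" means the first character after the "ID" prefix carries the least significant 5 bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed alphabet: 'A'..'Z' -> 0..25, '0'..'5' -> 26..31. Returns -1 if invalid. */
static int b32_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= '0' && c <= '5') return 26 + (c - '0');
    return -1;
}

/* Hypothetical decoder: strips "ID" and accumulates 5 bits per symbol,
 * least significant symbol first. Returns 0 on success, -1 on bad input. */
static int decode_ie_id(const char *id, uint64_t *out)
{
    assert(id != NULL && out != NULL);
    if (strncmp(id, "ID", 2) != 0)
        return -1;

    uint64_t value = 0;
    unsigned shift = 0;
    for (const char *p = id + 2; *p != '\0'; p++, shift += 5) {
        int v = b32_value(*p);
        if (v < 0 || shift >= 60)   /* invalid symbol, or too long for 64 bits */
            return -1;
        value |= (uint64_t)v << shift;
    }
    if (shift == 0)                 /* nothing after the prefix */
        return -1;

    *out = value;
    return 0;
}
```

Under this reading, an id such as IDAW0MLB decodes to 0x02B66AC0, which has the shape of a 16-byte-aligned 32-bit heap address.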

Safari 5 (Mac OS X)

Also does not immediately look like a heap address, but libxslt is doing a simple transform on a heap address:

val = (unsigned long)((char *)cur - (char *)0);
val /= sizeof(xmlNode);
sprintf((char *)str, "id%ld", val);
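
Undoing this transform is simple arithmetic, sketched below. Because of the integer division, the id only pins the node address down to a window of sizeof(xmlNode) bytes. The helper name is hypothetical, and the node size is passed in as a parameter because it depends on the target build; the 120 used in the usage note is an assumption for a 64-bit build, not something stated above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given an id of the form "id<decimal>" and the assumed
 * sizeof(xmlNode) of the target build, compute the inclusive address range
 * [*lo, *hi] that the node pointer must fall in.
 * Returns 0 on success, -1 if the id does not parse. */
static int libxslt_id_to_range(const char *id, uint64_t node_size,
                               uint64_t *lo, uint64_t *hi)
{
    assert(id != NULL && lo != NULL && hi != NULL && node_size > 0);
    if (strncmp(id, "id", 2) != 0 || !isdigit((unsigned char)id[2]))
        return -1;

    char *end = NULL;
    unsigned long long val = strtoull(id + 2, &end, 10);
    if (*end != '\0')
        return -1;

    /* val == addr / node_size (integer division), so
     * val * node_size <= addr <= val * node_size + node_size - 1. */
    *lo = (uint64_t)val * node_size;
    *hi = *lo + node_size - 1;
    return 0;
}
```

For example, with an assumed node size of 120 bytes, the id "id1000" corresponds to an address between 120000 and 120119 inclusive.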

In Opera, the object ids bounce around all over the place. I don't know what is going on, so I'm not claiming that Opera is affected.

Latest stable Chrome (Chrome 10) is not affected. It has been removed from the "time to fix" competition in order to keep things fair.

It's on!! Who will fix it first and who will be the security laggard? Updates to be provided via Twitter: @scarybeasts


Anonymous said...

I think the code is doing exactly what the programmers intended, so it's not technically a bug.

Anonymous said...

re: Anonymous

This IS a security mistake and should be fixed. No object id should leak information about its low-level C/C++ implementation, as this could be used to infer the memory layout when exploiting a vulnerability, or even to bypass ASLR outright if the leaked address is inside some loaded module, for example.

From a user perspective, there should always be a layer of abstraction between the back-end implementation and the front-end language (JS/XSLT/VB/whatever).


Anonymous said...

How exactly do you translate IDAW0MLB into a heap address?

Jonas Sicking said...

Thanks for finding this!

Unfortunately, it appears that the fix for libxslt contains a couple of bugs. The patch does indeed fix the leak of heap addresses, but in the process it breaks the functionality of the generate-id function.

The point of the generate-id function is that it's supposed to generate unique strings for each node. This string needs to remain unique for a given transformation.

However, the patch can generate the same id for two different nodes in two ways:

First off, the string it returns appears to be the difference between a node and its owner document. However since multiple documents can be in an XSLT transformation, this can generate the same value for two different nodes with different owner documents.

Second, it appears that in an effort to avoid dealing with negative values, the code uses the absolute value of this difference. Meaning that a node that is located 100 bytes after its owner document and a node that is located 100 bytes before its owner document, will get the same generated id.

Would be lovely to see that fixed, so as to avoid having transforms break more or less randomly in libxslt-based XSLT implementations.

/ Jonas

Chris Evans said...

@Jonas: the second case is actually OK. The id prefix varies between "idp" and "idm" depending on whether the delta was positive or negative.

I'm less qualified to comment on the first case, but I believe the libxslt maintainer consulted the spec and noted that generate-id() only guarantees the id to be unique within the current document.

Perhaps something to be continued over e-mail? I can do intros.

Jonas Sicking said...

Yes, please let's chat about this over email. I think that is not a correct interpretation.

You can find my email in the bugzilla bug (

Anonymous said...

I can't believe that they would bother to allocate additional memory to generate an ID, when keeping a single pointer which you increment each time it's called would have worked perfectly fine.

Steffen said...

I don't think that all of the holes are closed. This smells like the beginning of something big. Stay on it.