Saturday, July 25, 2015

vsftpd-3.0.3 released... and the horrors of FTP over SSL

I just released vsftpd-3.0.3, as noted on the vsftpd home page. It's actually been almost three years(!) since vsftpd-3.0.2, so things do seem to be getting very stable and calming down.

The exception to things getting very stable and calming down seems to be FTP over SSL, which has been a constant source of, uh, joy, for some time now. Some of the issues fixed relate to security and warrant describing here because I think they are interesting.

Cross-protocol MITM SSL connection rewiring to effect XSS
If this description sounds like a crazy weird vulnerability, you're right. The best public description is probably in this ProFTPd bug. Pretty awesome work by Jann Horn. It's a sufficiently involved issue that it's hard to pin down a root cause, but my primary take away would be: use different SSL certificates for different protocols. It's hard to predict what other cross-protocol confusions might be possible and different certificates for different protocols helps protect against the unknown.
That all said, vsftpd-3.0.3 drops the FTP connection if it sees HTTP command verbs, thus avoiding one known trouble for anyone who has an unfortunate certificate and server setup.
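To make the HTTP verb check concrete, here is a minimal sketch of the idea in C. It is not vsftpd's actual code: the function name looks_like_http() and the verb list are assumptions made for illustration, since the exact set of verbs that vsftpd-3.0.3 rejects isn't listed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical verb list, chosen for illustration only. */
static const char *const kHttpVerbs[] = {
  "GET", "POST", "HEAD", "PUT", "OPTIONS", "CONNECT"
};

/* Returns 1 if the first word of `line` is one of the HTTP verbs above,
   compared case-insensitively, else 0. */
int looks_like_http(const char *line) {
  assert(line != NULL);
  /* Length of the first word: everything up to a space or line ending. */
  size_t word_len = strcspn(line, " \r\n");
  for (size_t i = 0; i < sizeof(kHttpVerbs) / sizeof(kHttpVerbs[0]); i++) {
    if (word_len == strlen(kHttpVerbs[i]) &&
        strncasecmp(line, kHttpVerbs[i], word_len) == 0) {
      return 1;
    }
  }
  return 0;
}
```

A server loop would call something like this on each command line received on the control channel and close the connection when it returns 1. Real FTP commands such as USER or OPTS do not match.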

SSL session re-use workaround can be thwarted
Back in 2008, I blogged about a simple yet powerful attack which permitted stealing of in-progress FTP SSL data transfers. In the 2008 post, I seem to blame FTP clients but I don't think that's correct: the FTP protocol itself is broken for SSL transfers. (That was 7 years ago so perhaps I could have driven an RFC to fix the protocol by now; mea culpa.) In the face of a broken protocol, I've been working with Tim Kosse (of FileZilla fame) to try and kludge around the situation for years. The most solid solution is client certificate matching but that is not something that FTP clients do by default, and we wanted better security by default. Accordingly, vsftpd started to authenticate SSL data connections by requiring session re-use on SSL data connections. This did seem to work by default in many FTP clients.
Unfortunately, you can now go and read how Tim Kosse broke this defense. The kludge piled on top of existing kludges is to terminate the full FTP session in the event of an SSL session re-use failure on the data channel. Are we done with kludge stacking? We hope so but it's hard to be sure.

SSL upload data connections now must be shut down correctly at the SSL layer
vsftpd-3.0.3 flipped a default setting so that SSL upload data connections must now be shut down correctly at the SSL layer. Absent this setting, the FTP server can't tell whether the upload connection was closed with a TCP FIN (which any MITM can inject) vs. a proper shutdown over the SSL channel (secure). To put it in plainer terms, without this setting, the FTP server can't tell if a network attacker deliberately truncated the upload or not. I documented this area in 2008 (proposed root cause: the OpenSSL API is bad). At the time, FTP clients and servers were universally rubbish at SSL connection shutdown integrity. Since then, things are better, so I've flipped the default. Again, if you're using FileZilla as the FTP client, it goes to pains to do SSL well.
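The decision being made can be sketched as a tiny model. This is an illustration only, not vsftpd or OpenSSL code: the enum stream_end and the function upload_accepted() are hypothetical names, and `strict` stands in for the flipped default.

```c
#include <assert.h>

/* How the upload data connection ended, as seen by the server. */
enum stream_end {
  END_TLS_SHUTDOWN,  /* peer sent an authenticated shutdown over the SSL channel */
  END_TCP_EOF        /* bare TCP FIN / EOF with no SSL-level shutdown */
};

/* Returns 1 if the upload is treated as complete, 0 if it is rejected
   because it may have been truncated by a network attacker. */
int upload_accepted(enum stream_end how, int strict) {
  assert(how == END_TLS_SHUTDOWN || how == END_TCP_EOF);
  if (how == END_TLS_SHUTDOWN) {
    return 1;  /* integrity-protected end of stream */
  }
  /* A bare EOF is only accepted when the strict check is disabled. */
  return strict ? 0 : 1;
}
```

With strict set, a forged FIN can no longer make a truncated file look like a successful upload. The cost is that clients which skip the SSL-level shutdown will see their uploads fail.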

ECDHE support
Thanks again to Tim Kosse (this guy is on fire!), there's ECDHE support in vsftpd-3.0.3. It turns out that you need to incant a few magic lines of OpenSSL API in order to enable ECDHE. Also, just to make the point, vsftpd's default cipher list now consists of the single cipher ECDHE-RSA-AES256-GCM-SHA384. This is a modern TLSv1.2 cipher that is believed pretty solid. People needing interoperability with older clients may need to change or disable the cipher list.

FTP SSL data security is kind of tricky, and probably isn't 100% dealt with yet, but the combination of the latest FileZilla + latest vsftpd should be a reasonable start, if you really must use FTP over SSL.

Saturday, September 27, 2014

Exile for the BBC Micro; some elegant solutions

[Prelude: sorry, this has nothing to do with security whatsoever. Feel free to bail now if you're not interested in a classic 1980's game, and rest assured that non-security posts to this blog will remain extremely rare.]

The BBC Micro game Exile, released in 1988, has a realistic claim for the best game ever. I lost months of my youth to this game. I also lost a fair few days recently re-playing the game under emulation in 2014!

What the authors were able to do with just 32KB (for everything, including video RAM) was amazing. The art of coding in this way has simply been lost. The game features:

  • An enormous map featuring fairly open-ended exploration.
  • A full physics engine (gravity, momentum, conservation of momentum, buoyancy, friction).
  • Dozens of fiendish puzzles, characters and objects, with many interactions between entities.

There's even a great disassembly online. I was quite surprised to see that the game really is powered by real 6502 opcodes, and not unicorn tears.

The claim for "best game ever" isn't just about packing so much into such a small resource. Completing the game, even if you know what you're doing, is hours of immersive play that alternates between solving very varied puzzles and arcade-like blowing stuff up. Given how free-form the game is, there are also different solutions and orderings to the game, so you can put your own personal spin on things.

If you want to see what all the fuss is about, the best emulator is probably B-em (part of a webring, remember those??) and the Exile game image can be readily found. And do feel free to stop reading to avoid the spoilers that now follow.

There seem to be two solutions published on the web. Unfortunately, both have triggered my OCD. Both have solutions for some of the more interesting problems that rely on abusing the limits and corner cases of the game engine, such as:
  • Using the built-in viewport scrolling to sneak around with the viewport scrolled to the extreme so that an enemy or obstacle does not "see" the player.
  • Abusing the fact that the physics engine "forgets" objects that are offscreen, causing corner-case and clearly unintentional behavior. (Give the poor game a break, it's trying to fit everything into 32KB!)
  • Proposing solutions with low reliability.
More significantly, these problems have such beautifully elegant solutions that once you see them, it's clear that you've worked out the authors' original intent. So without further ado, here is a small collection of videos that illustrates some elegant solutions as well as an easter egg:

Getting the alien weapon

The game features a tricky-to-get alien weapon! In fact you can get it from two different places. Generally, you can feed different types of imps different "gifts" and then they might later throw you a gift in return. In this video, these cyan imps will accept a blue mushroom. Later in the game, dark blue imps accept piranhas. I had found this later exchange, but this earlier exchange was a complete surprise to me -- I only noticed it reading the disassembly referenced above.
It's a real boon to get such a powerful weapon earlier in the game. It never runs out of energy and it has good destructive power for some obstacles that are otherwise annoying. Just watch out you don't burn yourself and that you don't blow the weapon up, it's destructible.

Getting the first coronium rocks out of the alcove

There's no need to try and force the rock past the blowing bush. Speedy throwing and viewport scrolling are not necessary. There's a simple, elegant sequence that will rescue it reliably and without dubiousness.

Blowing open the rune door

The first two-thirds or so of the game are building up towards opening a very important door into the bad guy's lair. This door is blown open with a nuclear explosion between two radioactive rocks. Given the importance of the door, it's not surprising that the final puzzles towards opening it have beautiful solutions.
Both published solutions transport radioactive rocks via a route that is clearly not supposed to be an option, and the route only works on account of abusing game engine quirks. Tut tut! Here's a less hacky way of getting the required rocks, in three parts:

There's a lot going on here:
  • In part #1, the use of the maggot to "wake up" the nest of green slimes is fun. Note that this doesn't always work! Game design bug? This is a very busy area of the map and the game engine often decides there's too much on-screen to spawn creatures from the nest. I lost a day stepping through 6502 assembler to understand this.
  • Then, the green slimes appear attracted to sound. So we made some noise!
  • The use of buoyancy to avoid the sucking bush is the first and only significant usage in a puzzle. Wonderful.
  • In part #2 and #3, the presence of the big fish prevents the (very dangerous) piranhas from coming out the nest and ruining your day.
  • In part #3, the piranha is actually immune to damage from acid drops -- I believe the only creature in the game that has this trait. (You can even check the disassembly :-)

Getting the mushroom immunity pill

Again, this puzzle involves blowing a door open with radioactive rocks. The published solutions suggest all sorts of hacks here, but there's a really neat solution with the "blaster" weapon that has been recently collected at this stage in the game. Previous weapons were projectile based but this one is force based and it can be used variously: at a distance to gently deviate the course of an acid drop, and then at close range to reliably launch a rock past a problem area.

Happy exploring.

Thursday, September 25, 2014

Using ASAN as a protection

AddressSanitizer, or ASAN, is an excellent tool for detecting subtle memory errors at runtime in C / C++ programs. It is now a productionized option in both the clang and gcc compilers, and has assisted in uncovering literally thousands of security bugs.

ASAN works by instrumenting compiled code with careful detections for runtime errors. It is primarily a detection tool. But what if we attempted to use it as a tool for protection?

The case for using ASAN-compiled software as a protection is an interesting one. Some of the most severe vulnerabilities are memory corruptions used to completely compromise a victim's machine. This is particularly the case for a web browser. If an ASAN-compiled build can help defend against these bugs, perhaps it has value to some users? An ASAN build is slow enough that no production software is likely to ship compiled with ASAN. But the slowdown is not so bad that a particularly paranoid user wouldn't be able to easily accept it on a fast machine.

With that trade-off in mind, let's explore: does ASAN actually provide protection? To answer that, let's break memory corruption down into common vulnerability classes:

1. Linear buffer overflow (heap, stack, BSS, etc.)
A linear buffer overflow is one where every byte past the end of a buffer is written in sequence, up to some end point (example). For example, a memcpy() or strcpy() based overflow is linear. Because of the way ASAN works, I believe it will always catch a linear buffer overflow. It uses a default "redzone" of at least 16 bytes, i.e. touching _any_ address within 16 bytes of a valid buffer will halt the program with an error. Under ASAN, a linear buffer overflow condition will always hit the redzone.
This is great news because linear buffer overflows are one of the more common types of security bugs, and they are quite serious, affording the attacker a lot of control in corrupting program state.

2. Non-linear buffer overflow
A non-linear buffer overflow is one where data is written at some specific (but often attacker-controlled) out-of-bounds offset relative to a buffer (example). These bugs can be extremely powerful. Unfortunately, because of their power, they are both favored by attackers and also not stopped by ASAN if the attacker knows they are targeting an ASAN build. Example C program:

#include <stdio.h>
#include <stdlib.h>

int main() {
  char* p = malloc(16);
  char* p2 = malloc(16);
  printf("p, p2: %p, %p\n", p, p2);
  p2[31] = '\0';  /* out-of-bounds write, well past the 16-byte buffer */
}

Compile it with ASAN (clang -fsanitize=address) and run it: no error is detected. The bad dereference "jumps over" the redzone and lands beyond it, so the out-of-bounds write goes unnoticed.
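The difference between the linear and non-linear cases can be seen in a toy model of redzones. This is only a sketch: it is not how ASAN's shadow memory is actually implemented, and the layout, sizes and function names (toy_init, access_ok, linear_write) are all made up for illustration. The point is that the check asks "is this address ok?" and has no idea which buffer the pointer came from.

```c
#include <assert.h>
#include <stddef.h>

/* Toy arena layout: redzone | p (16 bytes) | redzone | p2 (16 bytes) | redzone */
enum { REDZONE = 16, BUF = 16, ARENA = 3 * REDZONE + 2 * BUF };
enum { P_OFF = REDZONE, P2_OFF = 2 * REDZONE + BUF };

/* Toy shadow map: 1 = poisoned (redzone), 0 = addressable. */
static unsigned char poisoned[ARENA];

void toy_init(void) {
  for (size_t i = 0; i < ARENA; i++) poisoned[i] = 1;
  for (size_t i = 0; i < BUF; i++) {
    poisoned[P_OFF + i] = 0;
    poisoned[P2_OFF + i] = 0;
  }
}

/* Address-only check: knows nothing about the pointer's origin. */
int access_ok(size_t addr) {
  assert(addr < ARENA);
  return !poisoned[addr];
}

/* Touches `len` bytes in sequence from `start`. Returns the offset of the
   first byte that trips the check, or `len` if none does. */
size_t linear_write(size_t start, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (!access_ok(start + i)) return i;
  }
  return len;
}
```

After toy_init(), a linear overflow such as linear_write(P_OFF, 40) is stopped at offset 16, the first redzone byte. A single access at P_OFF + 32 passes the check, because that address is the first byte of p2.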

3. Use-after-free / double-free
ASAN does detect use-after-frees very reliably in the conditions that matter for current use cases: normal usage, and under fuzzing. However, if the attacker is specifically targeting an exploit against an ASAN build, they can pull tricks to still attempt the exploit. By churning the memory allocator hard (as is trivially possible with JavaScript), the condition can be hidden. Example C program:

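#include <stdio.h>
#include <stdlib.h>

int main() {
   int n = 257 * 1024 * 1024;
   char* p2;
   char* p = malloc(1024);
   printf("p: %p\n", p);
   free(p);
   /* Churn the allocator with malloc/free pairs. */
   while (n) {
     p2 = malloc(1024);
     if (p2 == p) printf("reused!!\n");
     free(p2);
     n -= 1024;
   }
   /* Allocate without freeing and watch for the freed chunk coming back. */
   n = 30 * 1024 * 1024;
   while (n) {
     p2 = malloc(1024);
     if (p2 == p) printf("reused!!\n");
     n -= 1024;
   }
   p[0] = 'A';  /* use-after-free */
}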

The bad reference is not trapped with default ASAN values. The default values can be changed such that the bad reference is trapped:

ASAN_OPTIONS=quarantine_size=4294967295 ./a.out

It's a shame that setting this value to "unlimited" may not be possible due to a probable integer truncation in parameter parsing. See how this behaves differently:

ASAN_OPTIONS=quarantine_size=4294967296 ./a.out
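If the guess about truncation is right, the effect would look like the sketch below. To be clear, this is not ASAN's parsing code: parse_size_32() is a hypothetical function that simply shows what happens when a decimal value is parsed at full width and then stored in 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: reads a decimal number and keeps only the low
   32 bits, silently discarding anything above them. */
uint32_t parse_size_32(const char *text) {
  assert(text != NULL);
  unsigned long long wide = strtoull(text, NULL, 10);
  return (uint32_t)wide;
}
```

Under this assumption, "4294967295" survives as the largest 32-bit value, whereas "4294967296" wraps to 0, which would amount to asking for no quarantine at all.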

4. Uninitialized value
Uninitialized values are harder to categorize. The impact varies drastically depending on whether the uninitialized value is a pointer or an integer. For example, for an uninitialized pointer, effects similar to "non-linear buffer overflow" might even apply. Or if the uninitialized value is a copy length then perhaps it's more similar to "linear buffer overflow".
Or, if it's an uninitialized raw function pointer, that's a bigger problem. Indirect jumps are not checked. The behavior of the following ASAN-compiled program is instructive (run it in the debugger):

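void subfunc1() {
  unsigned long long blah = 0x0000414141414141ull;  /* left behind on the stack */
}

void subfunc2() {
  int (*funcptr)(void);  /* deliberately uninitialized */
  funcptr();
}

int main() {
  subfunc1();
  subfunc2();
}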

If the uninitialized value is a pointer to a C++ class then similar (indirect) problems apply.

5. Bad cast
The effects of a bad cast are fairly varied! Perhaps the bad cast involves mistakenly using an integer value as a pointer. In this instance, effects similar to "non-linear buffer overflow" might be achievable. Or perhaps if a pointer for a C++ object is expected, but it is mistaken with a pointer to a raw buffer, then a bad vtable gets used, leading to program flow subversion. One final C++ example to illustrate this. Run under ASAN to observe a raw crash trying to read a vtable entry from 0x0000414141414141:

class A {
public:
  long long val;
};

class B {
public:
  virtual void vfunc() {};
};

int main() {
  class A a;
  a.val = 0x0000414141414141ull;
  class B* pb = (class B*) &a;  // bad cast: A has no vtable pointer
  pb->vfunc();
}

Safer ASAN?
There's certainly scope for a safer variant of ASAN, specifically designed to provide safety rather than detection. It would be based on various changes:

  • Change the dereference check from "is this dereference address ok?" to "is this address in bounds for this specific pointer?". This takes care of the nasty "non-linear buffer overflow" as well as some of the worst effects of bad casts. This is not an easy change.
  • Initialize more variables: pointer values on the stack and heap. (This is not as easy as it sounds, particularly for the heap case, where the casting operator may become a point of action.)
  • Make the quarantine size for use-after-free unlimited. This burns a lot of memory, of course, but may be acceptable if fully unused pages are returned to the system with madvise() or even a crazy remap_file_pages() trick.
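The first change in the list above, checking bounds per pointer rather than per address, can be sketched with a "fat pointer". This is a hand-written illustration, not a proposal for how a compiler would implement it: struct checked_ptr and checked_store() are hypothetical names, and a real scheme would need to track bounds automatically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: carries the bounds of the object it points into. */
struct checked_ptr {
  char *base;
  size_t size;
};

/* Per-pointer check: "is this index in bounds for this specific object?"
   Returns 1 and performs the store if so, otherwise returns 0. */
int checked_store(struct checked_ptr p, size_t index, char value) {
  assert(p.base != NULL);
  if (index >= p.size) {
    return 0;  /* out of bounds for this pointer, even if the address is mapped */
  }
  p.base[index] = value;
  return 1;
}
```

With this kind of check, an offset that jumps over a redzone into a neighboring buffer is refused, because the neighbor's validity is irrelevant to the pointer being dereferenced.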

Remaining risks

Of course, even a "safer ASAN" build would not be bullet-proof. Taking the specific case of a safer-ASAN compiled Chromium, there would still be additional attacks possible:

  • Plug-ins. Many plug-ins are closed source and therefore cannot be replaced with ASANified versions. The safer build of Chromium would have plug-ins disabled: --disable-plugins or even at compile time.
  • Native attack surfaces called by the browser. For example, what happens when the browser encounters a web font. It'll probably get passed to a system library which parses this dangerous format using native code. In extreme cases, such as older Chromium on Windows, fonts were parsed in the kernel(!). --disable-remote-fonts, probably other flags.
  • Native attack surfaces triggerable by the browser. Less obviously, there can be operating system mechanisms that kick in simply because a file is downloaded or appears on disk. Anti-virus is notoriously buggy in this regard.
  • The v8 JIT engine. Any logic error in the JIT engine resulting in the emission of bad opcode sequences, or sequences with buggy bounds checks, is pretty toxic.
  • Pure logic vulnerabilities. UXSS vulnerabilities will remain unmitigated. In extremely rare but spectacular cases, unsandboxed code execution has been achieved without the need for memory corruption at all.

That all said, a stock ASAN build -- and even more so a hypothetical safer-ASAN build -- provides significant mitigation potential against memory corruption vulnerabilities. One measure of how strong a mitigation is, is whether it totally closes the door on a subset of bug classes or bugs. Even for the stock ASAN case, it appears that it does (linear buffer overflows for a start).

There is certainly more room for exploration in this space.

Thursday, June 5, 2014

Execute without read

A couple of years ago, during an idle moment, I wondered what we could do if we had the hardware CPU primitive of pages with execute-only permissions (i.e. no read and no write).

It turns out that aarch64 has exactly such support, and support for it is heading into the Linux kernel.

The original idea was to defeat ROP by having all of the instructions randomized a bit on a per-install basis. You know, the usual tricks such as applying equivalence transforms on the opcode stream. Such an approach would have some obvious downsides such as diagnosability and let's face it, implementing this would also feel a bit hacky. Can we do better?

Maybe we can. The original idea focused on the attacker knowing where the binaries are in virtual address space, but not knowing or being able to read or otherwise predict the content. What if we instead keep the binary content stable but try and make sure the attacker cannot discern the location of the binaries? With enough ASLR entropy, this would be an interesting approach.

For the sake of the exercise, imagine the attacker has the most powerful of bugs: an arbitrary read/write primitive relative to an existing heap location. The attacker can follow heap pointers to the stack, the BSS, vtables, etc. At first, this sounds prohibitively hard to deal with. But for every way the attacker might try to leak the address of the binary, there currently seems to be a solution:

  • The heap is riddled with vtable pointers. If the attacker follows a vtable pointer, they get to read function pointers and the location of the binary is revealed. We fix this in one of two ways: either get sneaky and turn vtables into code (jmp 0xblah) instead of data, and reuse our exec-without-read primitive. Or we burn a register (aarch64 has lots) as a storage for a secret ASLR base for the binary.
  • The heap is riddled with raw function pointers. We can redo function pointers as something like single-slot vtables and use the above trick. We don't want to directly store function pointers in writable memory as a relative position to our secret register, because the attacker could then easily jump to an arbitrary point in the binary.
  • The BSS and data sections are typically stacked adjacent to the binary. We need to not do this, so that pointers into the BSS and data sections do not reveal the location of the binary.
  • The stack contains saved return addresses. These return addresses reveal the address of the binary. And for sure, the heap will contain pointers to the stack from time to time. Separating your stack into control flow and data will sort this out -- perhaps burning another register to keep the control flow stack separate and at a secret location.
  • JIT engines are a pain. And your heap is going to contain chains of pointers leading to the JIT pages. Depending on the type of JIT engine, there are various tricks that can be pulled. Enumerating them here is going to make the post too long. Some of the more amusing tricks include having the kernel ban syscalls from a writable page.

Perhaps at this point we decide that the hacks are piling up and add an indirection to all indirect jumps that uses a secret register for the binary location, and an offset into a table of valid jump locations. (I think this may be where @comex was heading in a tweet in a discussion today.)
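The table-of-valid-jump-locations idea can be sketched in plain C. This is only a model: in the scheme described, the table would be reached via the secret register and live in unreadable memory, whereas here it is an ordinary array. The names kTargets and call_indirect() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*target_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/* Table of valid jump targets. Writable memory never holds these addresses. */
static const target_fn kTargets[] = { add_one, twice };
enum { NUM_TARGETS = sizeof(kTargets) / sizeof(kTargets[0]) };

/* Objects store a small index instead of a code address. Returns 1 and
   writes the callee's result to *result if the index names a valid target,
   otherwise returns 0 without jumping anywhere. */
int call_indirect(size_t index, int arg, int *result) {
  assert(result != NULL);
  if (index >= (size_t)NUM_TARGETS) {
    return 0;  /* not a valid jump location */
  }
  *result = kTargets[index](arg);
  return 1;
}
```

An attacker who can overwrite the stored index can still pick any entry in the table, but learns nothing about where the code lives and cannot jump into the middle of a function.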

Such a system isn't going to be invulnerable to memory corruption, but it _is_ going to be a significant pain to attack. The most obvious remaining attack is probably to read a couple of different vtable pointers and interchange them, calling an arbitrary attacker-chosen _existing function_ in the binary. If your binary has function pointers to system() in the heap, you're going to be in trouble. But generally, going after the kernel is going to be hard. Valid functions in your binary are unlikely to have the side effect of calling syscalls with bad parameters.

We also find ourselves wondering if we've sort-of re-implemented something like NaCl, although the performance characteristics and granularity of attacker-chosen code blocks will be different.

Crazy idea? Plausible direction?

[Thanks to Lee Campbell for helping with discussions and this blog post]

Friday, March 21, 2014

Together, we can make a difference

A couple of weeks back, I released a popular spreadsheet which lists many of the Adobe Flash Player 0-days used to harm people in the wild since 2010. I counted 18 and countless kind Twitterers pointed out some I may have missed. It was an interesting exercise, of course with an ulterior motive!

Looking beyond the raw counts, the spreadsheet shouts two items:
  • We should want to make a difference. The harm done from all these 0-days is just a litany of awfulness. We have harm to democracy activists and the human rights organizations that try to help these people. We have harm to American defense interests, aka. espionage. We have harm to corporations, aka. theft and economic damage.
  • We can make a difference! If you look at the data, you'll see 7 memory corruption 0-days in a year, starting mid-2010. After this year, Tavis Ormandy's famous Flash security rampage landed (80+ fixes), with follow-up patches such as 7 fixes here. Almost a year passes between Flash memory corruption 0-days after Tavis' work. You should call him a hero. (You should also call Mateusz Jurczyk, Gynvael Coldwind and Fermin Serna heroes too. They continued Tavis' work, have a look at the CVE count in this Adobe advisory to appreciate their work.)
Whilst it's true that Flash 0-days have seen a resurgence in Dec 2013 - Feb 2014, this does not invalidate the data that the whitehat community made a difference in 2010 - 2011 onwards. If anything, the data suggests that attackers have regrouped and refocused their research efforts to target areas that are still fertile. We can certainly do the same and put down this resurgence.

How you can help make a difference

Join us in the whitehat world. When you entered the greyhat world, they told you you'd be helping catch terrorists, didn't they? Recent and ongoing revelations show that no, in fact the biggest use of your work was enabling mass surveillance, the compromise of foreign nations and even the compromise of foreign corporations. If you want to make an actual difference, see above for where defensive help is needed.

Join us working on Flash and other important software. Many of us are working hard to provide reasonable avenues of reward for those who work on important software in the whitehat community. For example, the Internet Bug Bounty includes Flash as a category. For Flash vulnerabilities where exploitability is near-certain, we're rewarding up to $10,000 -- we have rewarded at this level three times already. We also anticipate $5,000 as a popular reward level for vulnerabilities that are likely exploitable but not proven. I previously blogged about a $10,000 example here.

What are you waiting for? Join us and we'll make a difference. You'll get some good coin as a side-effect.

Thursday, February 20, 2014

Internet Bug Bounty issues its first $10,000 reward

One of my side projects is as an adviser and panelist for the non-profit Internet Bug Bounty (IBB). We recently added Adobe Flash Player as in scope for rewards.

Earlier today, David Rude collected $10,000 for a vulnerability recently fixed in APSB13-28. My thoughts on this are too long to fit into a tweet, so I summarize them here:

  • This shows that the IBB is serious about rewarding research which makes us all safer. $10,000 is a respectable reward by modern bug bounty program standards. It also shows that when we give the reward range as "$2000 - $5000+", we are serious about that little plus character!
  • David Rude is a hero. This vulnerability was found being exploited in the wild. Recent research by Citizen Lab has linked the exploit to a morally dubious company, targeting of journalists and regimes with poor human rights records. Getting this bug fixed is a service to all internet users, democracy and human rights.
  • The IBB culture is to err on the side of paying. Note that David did not discover the vulnerability himself; he discovered someone else using it. IBB culture is to look mainly at whether a given discovery or piece of research helped make us all safer. Our aim is to motivate and incentivize any high-impact work that leads to a safer internet for all.
  • The vulnerability was never in fact reported to IBB! Wait, wut? It's true. The vulnerability went via Adobe's standard channels. IBB does not want or need details of unfixed vulnerabilities -- that would violate strict need-to-know handling. Once a public advisory and fix is issued, researchers or their friends may file IBB bugs to nominate their bugs for reward. Or, for important categories such as Flash or Windows / Linux kernel bugs, panel members keep an eye out for high impact disclosures and nominate on the researchers' behalf. Because we care.
Join us for the common good of a safer internet. You can help by doing your research in the open, targeting high-impact vulnerabilities or even becoming a new corporate sponsor. If we all pull together we can make a difference.

Sunday, December 29, 2013

vtable protections: fast and thorough?

Recently, there's been a reasonable amount of activity in the vtable protection space. Most of it is compiler-based. For example, there's the GCC-based virtual table verification, aka. VTV. There are also multiple experiments based on clang / LLVM and of course MSVC's vtguard. In the non-compiler space, there's Blink's heap partitioning, enabled by PartitionAlloc.

It seems, though, that these various techniques require the user to choose between "fast" or "thorough protection". This isn't ideal. Shortly, I'll document my own idea for how to try and get both fast and thorough. But first, a recap on what we mean by fast and thorough.

Fast vtable protection

Protecting vtables typically involves inserting machine instructions around vtable pointer loads or virtual calls. Going fast is simple: only insert a very small number of fast instructions (i.e. no hard-to-predict branches). This is the approach taken by vtguard. If you look at page 14 in the vtguard PDF linked above, you'll see that there's just a single cmp and a single jne (short, and never taken in normal execution) added to the hot path.

Tangentially, another task commonly undertaken when adding vtable protections to a given program is to remove as many virtual calls as possible, by annotating classes and methods with the "final" keyword and/or applying whole-program optimizations.

Thorough vtable protection

Describing what we want in a thorough vtable protection is a little more involved. We want:

  • Defeating ASLR does not defeat the vtable check. (vtguard lacks this property, whereas the GCC implementation has it.)
  • Only a valid vtable pointer can be used.
  • Furthermore, only a vtable pointer corresponding to the correct hierarchy for the call site can be used. 
  • Ideally, only a vtable pointer corresponding to the correct hierarchy level for the call site can be used.

A fast solution for thorough vtable protection?

How can we get all of the protections above and get them fast? My idea revolves around separating the problem into two pieces:

  1. Work out whether we can trust the vtable pointer or not.
  2. Validate that the class type represented by the vtable pointer is appropriate for the call site.

To trust or not to trust?

Current schemes trust the vtable pointer or not, based either on some secret (vtguard, xor-based LLVM approach), a fixed table of valid values (GCC, some LLVM approaches) or by constraining values that might appear in the vtable position (heap partitioning).

The new scheme would be to reserve a certain portion of the address space for vtables. We know that nothing else can be mapped there, so by suitably masking any proposed vtable pointer, we know it is valid. I haven't fully thought this through for 32-bit, but look at this 64-bit variant:
  • Host vtables in the lower 4GB of address space.
  • Use the dereference of a 32-bit register to load the vtable entry. This provides masking for free and even saves a byte in the instruction sequence. It works because loading 4-bytes into a 64-bit register zero extends the result.
  • Optionally, save memory by having the compiler use 4-byte vtables.

This scheme is approximately free, maybe even performance positive in some situations. Furthermore, one possible implementation is to stop somewhere around here for a very fast protection scheme that is "ok" in thoroughness.

On the downside, you've lost the 64-bit invariant that "nothing is mapped in the bottom 4GB", but the percentage of space used is going to be small. If that bothers us, then we can use the same trick to load a 4-byte vtable pointer and then "or" it with 0x100000000 (use bts if you dare) or some other value.
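The masking trick is easy to model in C, although a compiler would emit it as a 32-bit register load rather than as explicit arithmetic. The function names below are hypothetical, and the second function models the "or" variant with 0x100000000.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating to 32 bits and zero-extending forces any candidate vtable
   pointer into the bottom 4GB, whatever an attacker wrote there. */
uint64_t mask_low_4gb(uint64_t candidate) {
  return (uint64_t)(uint32_t)candidate;
}

/* Variant that keeps the bottom 4GB unmapped: set bit 32 so the vtable
   region becomes [4GB, 8GB) instead. */
uint64_t mask_above_4gb(uint64_t candidate) {
  uint64_t result = mask_low_4gb(candidate) | UINT64_C(0x100000000);
  assert(result >= UINT64_C(0x100000000) && result < UINT64_C(0x200000000));
  return result;
}
```

For example, an attacker-supplied 0x0000414141414141 becomes 0x41414141 in the first variant and 0x141414141 in the second. Either way it can only ever point into the region reserved for vtables.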

Validating class type

Once you know you trust your vtable pointer, validating the class type becomes a lot simpler. Instead of messing with secrets inside the vtable, you can just store a compact representation of the class type inside the vtable, with the aim of satisfying validation needs with a single compare.

The one trick we want to play is to make it easy to validate various different positions in a class hierarchy with minimal work. To do this, we can store class details in a hierarchical format. To take a simple case, imagine that we have the following classes in the system:

A1, A1:B1, A2, A2:B1, A2:B1:C1

We encode these using one byte per hierarchy level, the basemost class being the LSB: 00000001, 00000101, 00000002, 00000102, 00010102. (Note that this will be an approximation. For example, if you have more than 256 basemost classes with virtual functions, you would need to represent the first level with 2 or more bytes.)

Finally, our "is this object of the correct type for the callsite?" check becomes a simple compare. Depending on the position in the hierarchy, we may be able to achieve the compare with no masking and therefore a single instruction.

For example, for a call site expecting an object of type A1, it's just "cmpb $1, (%eax)". That's a 4-byte sequence, which is much shorter than the 10-byte sequence noted in the vtguard PDF. For a call site expecting an object of type A2:B1, it's "cmpw $0x102, (%eax)".
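The same check can be written out in C to see how the encoding behaves. This is a sketch using the five example classes from above; the TYPE_ constants and type_matches() are names invented here, and a real implementation would emit the single compare directly rather than compute a mask at runtime.

```c
#include <assert.h>
#include <stdint.h>

/* One byte per hierarchy level, basemost class in the least significant byte. */
enum {
  TYPE_A1       = 0x00000001,
  TYPE_A1_B1    = 0x00000101,
  TYPE_A2       = 0x00000002,
  TYPE_A2_B1    = 0x00000102,
  TYPE_A2_B1_C1 = 0x00010102
};

/* Returns 1 if `object_type` is `expected` or a class derived from it.
   `levels` is the depth of `expected` in its hierarchy (1 to 4), which
   decides how many low bytes take part in the compare. */
int type_matches(uint32_t object_type, uint32_t expected, unsigned levels) {
  assert(levels >= 1 && levels <= 4);
  uint32_t mask =
      (levels == 4) ? UINT32_MAX : ((UINT32_C(1) << (8 * levels)) - 1);
  return (object_type & mask) == expected;
}
```

A call site expecting A1 uses levels = 1, which corresponds to the byte-wide compare, and accepts both A1 and A1:B1. A call site expecting A2:B1 uses levels = 2 and accepts A2:B1 and A2:B1:C1 but rejects A1:B1.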

Closing notes

Will it work well? Who knows. I haven't had time to implement this, nor am I likely to in the near future. Feel free to take this and run with it.

Note that this idea doesn't cover what to do with raw function pointer calls. If you want to head towards complete control flow integrity, you'll want to look at protecting those, as well as return addresses (the current canary-based stack defenses do nothing against an arbitrary read/write primitive).