Monday, December 14, 2020

The cleverest floppy disc protection ever? Western Security Ltd.

Introduction

I've been on a bit of a floppy disc protection odyssey recently. This will probably be the last floppy disc related post for some time, so how better to end things than describe a stroke of genius I came across during my research of BBC Micro disc protection?

In my previous posts, we've already covered (directly or tangentially) some interesting floppy disc protection schemes:

  • Weak bits. [link: Weak bits floppy disc protection: an alternate origins story on 8-bit]. "Weak bits" protection is where a patch of disc surface is left empty of magnetic flux transitions. Absent a signal, the floppy disc drive itself will turn up the gain on its amplifier and see noise, which manifests as non-deterministic digital reads from the disc.
  • Fuzzy bits. [link: Technical Documentation - Dungeon Master and Chaos Strikes Back - Detailed analysis of Atari ST Floppy Disks]. "Fuzzy bits" protection is where magnetic flux transitions are recorded at intervals that are out-of-spec with regards to timing expectations (e.g. MFM). By using intervals right in the middle of a pair of expected values (e.g. 5us instead of the expected 4us or 6us), it's possible to cause the disc controller to return non-deterministic digital reads.
  • Long / short tracks. [link: Turning a £400 BBC Micro (1981) into a $40,000 disc writer (1987)] [see the "Capabilities Unlocked" section]. Long track protection is where flux transitions are written a little bit faster than they should be. This could be the whole track or just part of a track. In either case, the track will appear to contain more bytes than a track normally should. This works because the floppy disc controller usually has a broad tolerance of bitrate, to cater for disc drives naturally spinning at different speeds.
Now, we're going to cover something else entirely, with some unique properties that I think are clever.

Western who, now?


It's not readily possible to find any "Western Security, Ltd." company associated with the BBC Micro. So how do we know such a thing exists? Two reasons. Firstly, when developing my beebjit [link] emulator, I made numerous mistakes and had tons of bugs. While grappling with accurate floppy disc emulation -- in order to load original protected disc images -- I hit this:

Jolly Jack Tar by Sherston Software

Of course, the original author here missed out an amusing case. Either the disc is faulty (no), it's an illegal copy (not really?).... or a buggy emulator. Here's another error screen. Perhaps an earlier version of the protection scheme? The message uses "disk" instead of "disc" and does not entertain the fact the disc may be at fault:

Phantom Combat by Doctor Soft

(At the risk of digressing heavily, here's another screenshot. This is unrelated to Western Security Ltd., but follows the pattern of a dedicated screen for when the disc protected loader fails. This case is interesting because there's an entire decent image hidden from normal view!)

Disc Duplicator 3, if it thinks it's a copy

The second reason we know the name Western is from disc duplicator markers. A disc duplicator marker is a special track written by some commercial disc duplication solutions. It is typically one track after the end of the normal disc, i.e. the 41st track or 81st track of the disc. Despite being "out of bounds", most disc drives will seek at least one track out of bounds. On the track, you'll typically find a single sector header, and a sector body with a CRC error. This is an unusual construction and it sticks out, for example when my discbeast program [link] looks at the Jolly Jack Tar disc:

Discbeast's view of Jolly Jack Tar

There's a date and a text string or two embedded in the duplicator marker. The Jolly Jack Tar instance is:

86 07 01 (1986, July 1st)
523-037E        WESTERN NM,10/256 PROT DUP 5"-48/40 1S SD SS
32173-2Aw

There's one other less common duplicator marker string we see associated with Western security, for example Tens and Units by Sherston Software. It appears to be an earlier prototype version:

85 10 31 (1985, October 31st)
523-037C        BBC NM,10/256 WESTERN SEC. PROTO1 DUP 5"-48/40 1
70422-00w

A recap of disc protection fundamentals

Before diving in to how the Western Security protection works, let's recap the fundamentals of disc protection. The specifics are nuanced but the general principle is pleasantly simple:

  • The disc is easy to read but hard to write.
The disc has to be relatively easy to read. If the disc couldn't be read by standard disc controllers, you couldn't sell the disc because customers would not be able to load it!

And the correct way to make a disc hard to write is to carefully understand consumer disc controller chips and disc drives and make the disc impossible to write with standard kit.

The BBC Micro disc protection journey did not follow this route -- the earlier discs, and many of the later discs, were exotic but not particularly hard to create with the standard disc controller. But you did need a special program. The software companies had their special mastering program and the customers did not, so in a sense the disc was "hard" to write. But then, general copying programs such as Disc Duplicator 3 appeared and suddenly the discs were "easy" to write, if you or a friend had the advanced disc copying program.

However, a few BBC Micro pioneers did write discs that "can't" be written with the standard disc controller, most notably:
  • Simon Hosler's weak bits scheme, as linked above. (And here: [link].) [1]
  • Many discs, such as The Sentinel by Firebird, hid data between sectors. [2]
[1] One of a disc controller's main jobs is to write precise disc surfaces, i.e. to never write weak bits. However in the modern era I did find a way to trick the Intel 8271 controller into writing weak bits. Best I know, this technique was not discovered or used back in the day.
[2] With a bit of clever programming, I believe data between sectors could be written with standard disc controllers. It would involve writing to a track multiple times and resetting the disc controller at critical timing junctures. I've not seen any historic disc copying program that attempts to copy such discs, though.

Analyzing the Western Security Ltd. protected loader

This version of The Wizard's Revenge by Sherston Software features Western Security Ltd. disc protection.

Here's a gratuitous screenshot of the game.

And here's what discbeast makes of the disc:

There's nothing too unusual here. The red block at the end denotes a track with 1 sector that has a CRC error. It's not part of the disc protection; it's the duplicator marker. The green color represents a standard sector layout with nothing particularly exotic present. The "D" indicates the presence of deleted sectors. This is disc protection, but not exotic or hard to copy. However, loading the disc in my beebjit emulator, using the Western Digital WD1770 floppy disc controller and -log disc:commands, yields something highly unusual:

info:disc:1770: command $E4 tr 1 sr 3 dr 1 cr $29 ptrk 1 hpos 1884

Command $E4 is "read track". It is not used in the normal operation of disc loading, or indeed in any other known protected loaders.
It's time to walk through the loader a little to see what is going on. The goal of this post is not to give a detailed walk through of all the gnarly corners of the loader. Like many protected loaders, it is "encrypted" or more accurately, obfuscated. It de-obfuscates itself using many layers of self-modification. At PC $0657, a disc call is set up to load a bit more of the loader from deleted sectors in track 9. This could easily be confused as being the extent of the protection, but it is not. Later, at PC $3CF7, there is a loop:

[ITRP] 3CF7: JSR $3D16
[ITRP] 3CFA: LDA $75
[ITRP] 3CFC: SEC
[ITRP] 3CFD: SBC $3F4E
[ITRP] 3D00: TAX
[ITRP] 3D01: LDA $86
[ITRP] 3D03: STA $3F71,X
[ITRP] 3D06: INC $75
[ITRP] 3D08: LDA $3F4F
[ITRP] 3D0B: CMP $75
[ITRP] 3D0D: BCS $3CF7

It's not expected to be clear from this fragment alone (without all the subroutines), but what is going on here is a loop across tracks 1-8 inclusive. For each track, it is read fully and the length of the track is stored in a table at $3F71. Later, at PC $3CB6, these lengths are checked against a table located at $3D9B. The check is particularly interesting. For the copy protection to pass, a count is taken of each track length that falls within +-1 of the expectation, and this count must be 7 or 8. Immediately we see that this protection is not a precise technology. This is to be expected: discs are analog storage mediums underneath it all and there will be a bit of noise at the beginning or end of the track. From read-to-read, the same disc, drive and controller can and does see slight variations of track length. Looking at the load in beebjit, here are the encountered vs. expected lengths. (0x4A == 3122 bytes.)

My captured disc
3F71: 4A 4D 4B 4A 4B 4D 4D 4A
Expectations
3D9B: 4A 4C 4B 4A 4B 4D 4C 4A

As can be seen, my disc reads almost exactly to expectations except track 2, which was 3125 bytes instead of the expected 3124. That's within the +-1 range so all 8 tracks pass the check (7 or more are required).

To spell it out, the protected loader expects the track length (in FM encoded bytes) for tracks 1-8 inclusive to be 3122, 3124, 3123, 3122, 3123, 3124, 3123, 3122. That's an incredible level of precision and accuracy of writing for 1985 technology. Each byte covers about 64us on a spinning disc, but disc drives of the era would often specify running RPM variation of "less than +-1.5%". At 300rpm, +-1.5% is +-3ms per rotation! In my experience, drives are much much better than that but still! How did 1985 technology cope??

One further note, on the WD1770 vs. Intel 8271 disc controllers. We've focused our analysis on the protection as seen by the WD1770. This disc also loads fine with an Intel 8271 controller. It uses a completely different code path to indirectly calculate track length in this case. It works because the padding bytes up to the end of the track are all 0x00 instead of the usual 0xFF. An "over-read" past the final sector on the track is used, and track length is detected by the change from reading 0x00 to 0xFF.

The genius twist

And then we realize the genius of this scheme. What if they didn't write the tracks with extreme precision, because the technology wasn't there? What if they just wrote the track any-old-how and then examined the result? I think this what they did:

  • Format the disc. (That's it. Just format the disc normally and you've created a fiendishly protected disc.)
  • Read the lengths of tracks 1-8 of the disc you just wrote.
  • On a per-disc basis, generate, obfuscate and write the required table of expected track lengths into the loader on track 9.
What a trick! These discs are savagely hard to copy, even with dedicated hardware of the era. In fact, this trick is so clever that it requires us to revisit the fundamentals of disc protection. Here's our earlier statement:

  • The disc is easy to read but hard to write.
But the Western Security Ltd. protection is quite easy to write -- you just format a disc. So perhaps we should say:
  • The disc is easy to read but hard to recreate.
The Western Security protection is easy to write, easy to read but very hard to copy. One of the advantages of such a protection is that you could use a home computer to write it cheaply if the economics worked for you. That said, many discs I've seen appear to have duplicator markers typically found with expensive disc duplicating machines.

How can we support our theory? One way is to find a second disc of the same title, The Wizard's Revenge, and see if it differs at all. I did find one, and it does. My disc as per above:

Track 0 sectors 10 length 3124 fixups 1 CRC32 2E6B86E9
Track 1 sectors 10 length 3122 fixups 0 CRC32 37E30EC8
Track 2 sectors 10 length 3125 fixups 1 CRC32 F7BDE89B
Track 3 sectors 10 length 3123 fixups 0 CRC32 BB1E32C3
Track 4 sectors 10 length 3122 fixups 1 CRC32 2EC84AF1
Track 5 sectors 10 length 3123 fixups 0 CRC32 58FD732B
Track 6 sectors 10 length 3125 fixups 0 CRC32 43416F5A
Track 7 sectors 10 length 3125 fixups 1 CRC32 B3D8AB10
Track 8 sectors 10 length 3122 fixups 1 CRC32 8D02AC32
Track 9 sectors 10 length 3125 fixups 1 CRC32 75B1F57B
Track 10 sectors 10 length 3123 fixups 1 CRC32 D2B0A1EF
[...]

Additional captured disc:

Track 0 sectors 10 length 3126 fixups 1 CRC32 2E6B86E9
Track 1 sectors 10 length 3129 fixups 1 CRC32 37E30EC8
Track 2 sectors 10 length 3127 fixups 1 CRC32 F7BDE89B
Track 3 sectors 10 length 3127 fixups 1 CRC32 BB1E32C3
Track 4 sectors 10 length 3127 fixups 1 CRC32 2EC84AF1
Track 5 sectors 10 length 3129 fixups 1 CRC32 58FD732B
Track 6 sectors 10 length 3129 fixups 1 CRC32 43416F5A
Track 7 sectors 10 length 3128 fixups 1 CRC32 B3D8AB10
Track 8 sectors 10 length 3128 fixups 1 CRC32 8D02AC32
Track 9 sectors 10 length 3127 fixups 1 CRC32 1F5ABD44
Track 10 sectors 10 length 3128 fixups 0 CRC32 D2B0A1EF
[...]

As can be seen, the sector data (covered by the CRC32s) of the discs is the same except for track 9, which is where the table of expected track lengths resides. And the track lengths themselves are different. In both cases there's track length wobble and in the case of the latter discs, the tracks are all a little longer. This means that the drive that formatted the disc was running at a slightly slower RPM, enabling more bytes to be written in a single revolution.

And there's a second way to support the theory. I just asked Simon Hosler, who wrote a lot of Sherston Software games back in the 1980s, if he remembered anything. He recalls,

"Western Security –  this wasn’t my system but I think it worked like this... It works because each disk drive runs at slightly different speeds. So when a drive creates a track on the disk – after it has added all of the headers and sectors etc, it has a bit of space to fill up, so adds a few extra ‘fill in’ bits that do nothing. The number of fill in bits will depend on the exact speed of the original formatting drive. So will be different if it was copied. I remember this system became a pain because disk drives were changing (?) all the time. [...] the system was called 'Fingerprinting'"

There remains the mystery of how a commercial duplicator could handle making these discs. One thing I've come across is references to a "Freeform" script [link], used in conjunction with Trace duplicators. I haven't been able to find a manual for this scripting language. There are some strange fragments in the memory of the loader after de-obfuscation; could it be related?

3F80: 54 70 00 41 44 44 20 20 20 20 20 72 00 4D 4F 56  Tp.ADD     r.MOV
3F90: 45 54 4F 20 20 74 00 55 4E 49 54 20 20 20 20 75  ETO  t.UNIT    u
3FA0: 00 54 52 41 43 4B 20 20 20 76 00 53 45 43 54 4F  .TRACK   v.SECTO
3FB0: 52 20 20 77 00 54 4F 50 54 52 41 43 4B 78 00 55  R  w.TOPTRACKx.U

Recreating Western Security style protected discs

I happen to have a couple of very different disc drives, so I thought I'd try them out. I formatted a disc in each drive, then had a look at the track lengths (in bytes) read back by a WD1772 disc controller in a real BBC Micro model B.

Contender #1: my Chinon F-051MD drive. It is older, 40 track and single-sided only. The pinch-to-close style door mechanism later fell out of fashion (was it less reliable?). Another way to date this is the large DIP-style main chip, a NEC D8048 which is an Intel 8048 clone! (There's a ROM here to extract one day!)

Contender #2: not my Mitsuibish MF504C, but pretty similar. A much newer drive, 80 track and double sided. Notice the newer chips, cleaner wiring and simpler clamp mechanism. Not visible, but it also has a smaller (next gen?) stepper motor.

Track lengths from Chinon F-051MD:

[...]
Track 1 sectors 10 length 3137 fixups 1 CRC32 67F0950E
Track 2 sectors 10 length 3138 fixups 1 CRC32 67F0950E
Track 3 sectors 10 length 3138 fixups 1 CRC32 67F0950E
Track 4 sectors 10 length 3139 fixups 1 CRC32 67F0950E
Track 5 sectors 10 length 3138 fixups 1 CRC32 67F0950E
Track 6 sectors 10 length 3140 fixups 1 CRC32 67F0950E
Track 7 sectors 10 length 3140 fixups 1 CRC32 67F0950E
Track 8 sectors 10 length 3139 fixups 1 CRC32 67F0950E
Track 9 sectors 10 length 3140 fixups 1 CRC32 67F0950E
Track 10 sectors 10 length 3140 fixups 1 CRC32 67F0950E
[...]

Track lengths from Mitsubishi MF504C:

[...]
Track 1 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 2 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 3 sectors 10 length 3118 fixups 0 CRC32 67F0950E
Track 4 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 5 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 6 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 7 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 8 sectors 10 length 3119 fixups 1 CRC32 67F0950E
Track 9 sectors 10 length 3118 fixups 1 CRC32 67F0950E
Track 10 sectors 10 length 3119 fixups 1 CRC32 67F0950E
[...]

These two drives definitely have unique fingerprints! The older drive has a bit more variance / wobble in individual track lengths. In fact, it seems to fit more bytes per track on the later sectors -- most of the latter tracks are length 3142. The older drive also generally fits 20 or so more bytes per track. This means it is spinning a little slower. By contrast, the newer drive runs like clockwork with minimal track-to-track variance. This might make it a little less suitable for generating varied per-disc fingerprints, but copying the disc on home computing hardware would still be tricky -- the machine used would need to have a similarly precise drive, and the drive would also need to be running at the exact same speed as the drive that made the original disc.

One last look at track lengths, this time from a Phantom Combat game disc, by Doctor Soft. This disc uses Western Security Ltd. protection:

Nothing quite says "we overcooked the disc protection" like a warning sticker about compatibility (left, middle). The sticker states that only Acorn disc ROMs are supported. Unfortunately, non-Acorn disc ROMs were extremely common.

[...]
Track 1 sectors 10 length 3152 fixups 0 CRC32 67F0950E
Track 2 sectors 10 length 3152 fixups 1 CRC32 67F0950E
Track 3 sectors 10 length 3152 fixups 0 CRC32 67F0950E
Track 4 sectors 10 length 3152 fixups 1 CRC32 67F0950E
Track 5 sectors 10 length 3151 fixups 0 CRC32 67F0950E
Track 6 sectors 10 length 3152 fixups 0 CRC32 67F0950E
Track 7 sectors 10 length 3150 fixups 0 CRC32 67F0950E
Track 8 sectors 10 length 3151 fixups 0 CRC32 67F0950E
[...]

This is disc protection hidden in plain sight. These 8 protected tracks are in fact 100% default format tracks (CRC32 67F0950E). There's the correct number of sectors and there are no deleted sectors. discbeast would show these tracks as plain green, aka. move on, nothing to see here!
There's a bit of track length wobble from track to track, but the tracks are also unusually long (the ideal is 3125 bytes). It is not clear whether the tracks were deliberately laid down a little long, but it definitely helps the disc protection. Any disc drive used to attempt to re-create these tracks is unlikely to be spinning at that speed because it is 0.8% slow and the disc drive motor technology is generally better than that. It's interesting to note that simple long track protection is just another variant in the Western Security scheme.
This disc was probably written from a BBC Micro itself, rather than a fancy duplication machine, as evidenced by the lack of a duplicator marker and by the fact the game data sectors show evidence of being written sector-at-a-time, not track-at-a-time. Who knows, maybe there was a deliberate attempt to find a slow drive; some older drives even have a twiddleable variable resistor to vary motor speed.

Conclusion


We've reviewed the most powerful BBC Micro model B disc protection scheme I found, across an audit of most of the copy protected discs released for the machine. It's clever in that you don't need specialized hardware to create the disc, or read the disc. But you're going to struggle to duplicate the disc.
My suspicion is that even modern hardware -- a Greaseweazle perhaps -- might struggle to hit byte-perfect precision, unless it is paired with a high quality disc drive that rotates very consistently (such as my Mitsubishi MF504C).
Talking again to Simon, as far as he's aware, this was the work of George Keeling, who I'd love to contact if only to tell him I think this work was awesome.

Monday, November 16, 2020

Reverse engineering a forgotten 1970s Intel dual core beast: 8271, a new ISA

"As I recall, those two chips were fairly large. And fairly late -- to the marketplace. We had lots of issues with them. [...] Sometimes the elegant solution isn't the best solution." -- Dave House, digressing to the 8271 during "Oral History Panel on the Development and Promotion of the Intel 8080 Microprocessor" [link], April 26th 2007, Computer History Museum, Mountain View, California.

Introduction

Around 1977, Intel released a floppy disc controller (FDC) chip called the 8271. This controller isn't particularly well known. It was mainly used in business computers and storage solutions, but its one breakthrough into the consumer space was with the BBC Micro, a UK-centric computer released in 1981.

Intel 8271 (left) in an issue 3 BBC Micro model B

There are very few easily discovered details about this chip online, aside from the useful datasheet. This, combined with increasing observations of strange behavior, make the chip a bit of an enigma. My interest in the chip was piqued when I accidentally triggered a wild test mode that managed to corrupt one of my floppy discs even though the write protect tab was present! You can read about that here:

A wild bug: 1970s Intel 8271 disc chip ate my data!

Can we reverse engineer a detailed understanding of how it works? What wonders will we find?

Credits

The work described here represents the efforts of a virtual team that came together in an impromptu way to investigate the chip. There are many critical players, and special thanks go to:

  • Nigel Barnes. Bridged the MAME community and BBC Micro community, locating someone with chip decapping skills.
  • BeebMaster. Provided a couple of sacrificial 8271 chips for decapping.
  • Sean Riddle. Decapped the chips and provided beautiful hi-resolution images.
  • ZXGuesser, Diminished. Hardware level reverse engineering, including accurate extraction of ROM bits.
  • Ken Shirriff. Provided notes on silicon macro structures, and located historic documents of great utility.
  • Rich Talbot-Watkins, Chris Evans (me). Calculating the ISA, disassembling the ROM.

The beauty of the beast: recap and decap

So to recap: we had a few indicators that this could be a very interesting chip. These range from the crazy test mode we found above, to the fact the data sheet hints at a large number of internal registers, with only a few documented.

And, to decap:

(See references at the end for very high resolution shots)

This a complicated chip for an FDC! There are a large number of large structures present, densely packed. To illustrate the point, we can compare this chip with the heart of the BBC Micro -- a venerable, legendary 6502, as used in iconic 80s machines and consoles such as the Apple ][, Commodore 64 and NES:

8271 left, 6502 right

Not only is the 8271 larger than the main CPU by a significant amount, it also cost more by all accounts:

"In Acorn's wisdom, they had chosen the Intel 8271 disk controller for the BBC Micro - this controller was probably obsolete even before the BBC Micro was launched. The Acorn disk upgrade comprised the 8271, a handful of standard TTL ICs, an Acorn DFS ROM, and the disk manual. The ICs plugged into unpopulated sockets on the motherboard.

A few hardy folks tried sourcing the parts separately - which was fine, except that an 8271 tended to cost about £109 on its own, if you could find one..." -- [source link]

By contrast, the 6502 allegedly cost around $7.45 in 1981, which is a huge difference factor.

Getting the ROM bits

From a quick eyeball of the 8271 die, there are plenty of PLA-like structures, but perhaps most promisingly, a large rectangle of ROM in the lower left. Right away, we're thinking that this could indeed be a general purpose microcontroller CPU if there's a ROM program. Initial focus fell immediately to extracting the ROM.

ZXGuesser and Diminished put in a Herculean effort, transcribing the ROM bits first by hand and later with some tooling assist and cross-checking. In case you're wondering what transcribing ROM bits by hand looks like, it looks a little like this:

Look closely; those cyan annotations are 0s and 1s!

The ROM matrix is 64x108 bit cells, for a ROM size of 864 bytes.

Once you've gotten the ROM bits, you still have the challenge of assembling them together into a correctly sequenced byte stream. This isn't always as easy as you might think, and is particularly rough when you don't have a reliable way of telling if the extracted bytes are ok, as is the case here. During my research, I found:

Fortunately, the ROM reversers also looked at the row and column circuitry connecting the ROM, and gave a fairly robust opinion that the ROM bits / bytes are:

  • Left-to-right, top-to-bottom.
  • Bits inverted relative to the initial decode in the image above.
  • Bits MSB first.
  • Bytes build from 1 bit per linear 8 bit group.

This gives the first bytes of the ROM as the following:


This happens to be a correct decode, although we couldn't be sure for some time.

Architecture hints

As Rich and myself engaged in efforts to try and disassemble the ROM, without any prior knowledge of the Instruction Set Architecture (ISA), I was fortunate enough to have a conversation with Ken Shirriff (righto.com) about this interesting chip. Against all odds, he found a detailed conference presentation abstract from 1977 [link to copy of full abstract].

This image from the abstract, will look familiar (imagine +90 degrees rotated).


The rest of the abstract is a gold mine. The presentation was titled "A Dual Processor Serial Data Controller Chip" and begins:

"A DUAL PROCESSOR microprogrammable chip that implements a specialized architecture for high-speed serial data controllers will be described. The chip measures 218 mils by 244 mils and contains 22,000 transistors..."

22,000 transistors!! We knew this was a bit of a chonker, but for reference, the 6502 has 3,218 transistors; the 8080 6,000 transistors and the 8086 29,000 transistors. Yes, the 8271 is not that far off an 8086 in terms of transistor count.

This instructive table and diagram are also from the abstract:

This confirms our suspicions that we are dealing with a general purpose CPU, coupled with some acceleration for I/O. The general purpose CPU has the features you'd expect, including a PC, stack, ALU, accumulator, registers, and access to a bus. Further detail in the abstract includes "32 eight-bit registers" and "four-level stack".

The presence of a a bit processor (a co-processor if you like) with 250ns cycle time resolves the elephant in the room with the "general CPU" theory. A general CPU of the era probably wouldn't be fast enough to calculate CRC16 at the disc data rate. However, a specialized PLA-driven bit engine bolted on to the side will do just fine.

Finally on this abstract, we see that it states "To date, two distinct controllers have been microprogrammed: a floppy disk controller and a synchronous data link controller (SDLC)". Well, we've found the FDC, the 8271. Intel's SDLC from the era was the 8273. Enter Sean Riddle once more -- what a star -- and another sacrificed chip or two later, we have our result:

8271 left, 8273 right

Ken also found a patent relating to this dual core design! It's US4152761. [link to a more complete version]. There's a lot of useful architectural detail in the patent, including this interesting summary of key components.

From US4,152,761

As can be seen, there's additional complexity described here that falls outside what we'd expect from a traditional CPU. There's dispatcher requests, priority resolution, a "case" and an "address" register in addition to the program counter and instruction register. This relates to some form of scheduling, which we'll encounter later.

Finally, here's a diagram (courtesy of Ken). It's an early estimation of how the architecture we've seen in the the documents so far might map to the silicon. As a dual processor behemoth, note how there's two ALUs, two sets of registers, lots of PLAs (including a couple to the left and right of the "control" label), and plenty of patches of silicon with unknown function.


From bits to bytes to an Instruction Set Architecture (ISA)

With an overview of the architecture, we have a better idea of what sort of opcodes we might find in the instruction set. The more pieces of the puzzle we have in our heads, the better chance we have of making abstract connections in our attempt to solve a problem with a lot of moving parts.

Initial attempts to disassemble the ROM centered around the theory that the CPU core might be based on the Intel MCS-48 microcontroller series. This series includes the well-known 8048 and has several variants such as the slightly cut-down 8020 (less I/O lines). And why wouldn't this core be based on a further cut-down 8020? It just fits too well: 1K ROM, 64 registers, 13 I/O lines. The timing works well too: the MCS-48 series first launched in 1976, making the core available before the release of the 8271. So the theory went, you'd be crazy as an 1970s Intel employee to not just walk down the corridor and raid the parts bin of your colleagues in order to get a headstart.

The MCS-48 theory turned out to fit extremely well.... but be false. No amount of ROM bit / byte wrangling led to sensible MCS-48 disassemblies.

Back to the drawing board, we looked again at our "probably correct" ROM decode and analyzed it for patterns we'd expect to find in an 8271 ROM, based on our understanding of how the controller works, and how disc recording works in general. Here's a section we found enlightening:

Most significant is the appearance of the constants $FE + $C7 (middle box), then $E5 + $FF (lower box) in a the same context. These constants are the data byte + clocks byte we'd expect to see in the format routine for FM (single density) formatted discs. $FE + $C7 is the sector header marker, and $E5 + $FF is the default fill byte for freshly formatted sectors. As a bonus, we speculated that $E9 (appearing four times in a row) could be a shift left or right opcode, with $FD perhaps being RET. All in all, some strong circumstantial evidence for a correct decode.

After the MCS-48 opcode list failed to match our ROM, we spent a long time trying to derive an alternate instruction set.

But it's hard; it's a bit like one of those jigsaw puzzles where every piece is the exact same color. It's very hard to find a start as there are so many different possibilities to try. Research is too often presented as a neatly packaged result, with no mention of the struggle. I think this contributes to discouraging newcomers. So make no mistake: this was a struggle; it went on for some time; there was swearing; and the characteristics leading to success were perseverance and grit as opposed to any particular technical ability.

The breakthrough was catalyzed when the hardware investigations got a good read on the wiring and content of one of the instruction PLAs associated with the byte processor. The PLA in question is actually the darker rectangular block to the left of the "control" label in the image above. It is populated like this:

Thanks to Diminished!

With the instruction PLAs decoded, it would be possible to trace the effects of each 8-bit opcode by following activation lines to the registers, stack, ALU, etc. However, we were able to hit our breakthrough with only opcode ranges, provided by this PLA. Of specific interest is these ranges from Rich:

1111 1100   FC
1111 1101   FD
[...]
0010 xxxx   20-2F
[...]
0011 xxxx   30-3F
[...]
001x xxxx   20-3F

The specialized $FC and $FD match our theory of CALL and RET, but more pivotal is the range of $20-$3F, split into 2x 16 wide blocks. Immediately, I speculated this could be "move register to accumulator" and "move accumulator to register". This would turn out to be correct. Furthermore, I noted that opcode $0? was common before $2? or $3?, e.g.

05D:   04 22 03 35

This led to the theory that these are register bank selection opcodes. Similar to the MCS-48, the theory is that there aren't enough opcodes for all operations to be able to reference all 32 registers, so there must be a bank select. This also would turn out to be correct. In fact, with these pieces, the jigsaw started to fall in place, and fall into place faster and faster.

Some ROM code examples

For these code examples, bear in mind that the Instruction Set Architecture (ISA) presented here hasn't seen public light of day as far as we know. We've had to invent our own assembly mnemonics, although they're designed to be familiar to anyone familiar with assembly languages in general. Some specific notes:

  • SEL RB is SELect Register Bank, and provides the base index for register access (multiply by 8 to get actual index).
  • Note that all register references are by index, not register number. A register number can be calculated with the index and register bank.

1) READ SPECIAL REGISTER

.command_READ_SPECIAL_REGISTER
32E   00 SEL RB 0
32F   27 MOV A, I7 ; R7 ($07) (param 1)
330   30 MOV I0, A ; R0 ($00) = register index
331   F8 MOV A, [I0] ; read special register value to A
332   02 SEL RB 2
333   36 MOV I6, A ; R22 ($16) (ext RESULT) = A
334   0E SEL RB 14
335   EE SYS 2, RB ; JMP (2,E,0) => $288, .post_command_tidy_up

The READ SPECIAL REGISTER command is one of the simplest. As suspected, "special register" externally is just "index into the 32 registers" internally. We can also note plenty of interesting things:

  • The command setup code, which we'll see in a bit, writes parameters to R7 downwards.
  • Indirect reads and writes are done via special opcodes that indirect through I0 (typically but not always R0), as per the MCS-48 architecture.
  • Some of the internal registers interact with the external bus registers used to interact with the host CPU. R22 is where you write values for them to appear in "ext RESULT", which is read with BBC Micro's 6502 at memory location $FE81.
  • The "register bank" concept appears to be re-used to provide a lookup index to the SYS 2, RB opcode. You might consider this a minor kludge.
  • In order to jump from page 3 ($3??) to page 2 ($2??) at the end, a special SYS opcode is required. Normal JMP / CALL instructions only have an 8-bit operand and can only jump to the current page. SYS opcodes look up a 10-bit PC from a hard coded table.
  • The command exits by jumping to a common exit routine, .post_command_tidy_up. This routine disables most of the chip, including the bit processor, and events. This will be important later.

2) SPECIFY

.command_SPECIFY
066   00 SEL RB 0
067   27 MOV A, I7 ; R7 ($07) (param 1)
068   30 MOV I0, A ; R0 ($00) = destination index
069   F1 03 MOV I1, #$03 ; R1 ($01) = count of 3 extra parameters
06B   14 YIELDTO 4 ; seems to set PARAM callback to
; .wakeup_PARAM_4_SPECIFY, then YIELD
; ** entry point (3, segment 9, routine 4 5 6 7 C D E F)
.wakeup_PARAM_4_SPECIFY
06C   03 SEL RB 3
06D   2E MOV A, IE ; R30 ($1E) (ext PARAM)
06E   00 SEL RB 0
06F   E8 MOV [I0], A
070   00 SEL RB 0
071   80 INC I0
072   A1 DEC I1 ; R1 ($01), decrement count
073   8A 8C BZ $08C                 ; branch if done
; lands at SEL RB 14, JMP RB
; which jumps to .post_command_tidy_up
075   FF YIELD

SPECIFY is also a simple command. It's not strictly a necessary command because it has the same effect as 3 WRITE SPECIAL REGISTER commands to sequential register numbers. It's unclear why it exists, given the space crunch in the ROM and on the silicon, but the datasheet does describe initializing the 8271 using a few of calls to this. Other items of note:
  • SPECIFY accepts and applies the register writes immediately, YIELDing (sleeping / idling) between writes and waiting for the host CPU to supply the next register. This means the SPECIFY command might be partially completed for some time, and never fully completed, depending on how we program the chip. This will be significant later.
  • SPECIFY internally uses R0 as a destination index to write register values, and R1 as a count to complete. This makes it an ideal candidate to do some black box testing to confirm the chip behaves the same as our source code disassembly. Specifically, I tried and confirmed:
    • Pass the first parameter to SPECIFY as 0, indicating we are writing register values starting at R0. Note that as per the code, R0 is used internally by SPECIFY.
    • Pass the second parameter as $22. This will get written to R0 (because R0 currently points to R0) and incremented, leaving $23 in R0. This has "corrupted" R0.
    • Pass the third parameter as $48. This will get written to MMIO register R35 ($23), and should turn on the drive motor.
    • Do not pass a fourth parameter. Despite not “completing” the full four parameters of the command, we expect the writing of the third parameter to have the effect as just described. It does on real hardware.

3) External command register handling

.wakeup_COMMAND
014   02 SEL RB 2
015   F6 00 MOV I6, #$00 ; R22 ($16) = $00 (ext RESULT)
017   CF BF AND I7, #$BF ; R23 ($17) (ext STATUS) !CMD_FULL
019   00 SEL RB 0
01A   F5 01 MOV I5, #$01 ; R5 ($05) = $01 (param 3 default $01)
; That’s 1x 128 byte sector in many cases.
01C   F4 01 MOV I4, #$01 ; R4 ($04) = $01 (param 4 default $01)
01E   03 SEL RB 3
01F   98 05 MOV A, #$05 ; 5 parameters expected
021   6F 18 27 TBZ IF, #$18, $027 ; R31 ($1F) (ext CMD), jump if 5 param                                                command
; matches SCAN, FORMAT
024   2F MOV A, IF ; R31 ($1F) (ext CMD)
025   9C 03 AND A, #$03 ; (CMD & 3) parameters expected
027   00 SEL RB 0
028   31 MOV I1, A ; R1 ($01) = A = parameters expected
029   8A 3D BZ $03D     ; if no parameters, start command
02B   F0 07 MOV I0, #$07 ; R0 ($00) = $07 (put parameters at $07 down)
02D   BC 01 TASK 4, 1 ; select .wakeup_PARAM_1_accept
; 3 9 (1, 3, 9, B) = $035
02F   FF YIELD

This is where to start reading if you want to trace the main entry point into the ROM. The 8271 byte processor wakes up here where the external host CPU writes to the external command register ($FE80 on the BBC Micro). And again some interesting things to note:
  • For commands other than SCAN and FORMAT, the number of parameters expected is actually encoding in to the low order bits of the command byte. Undoubtedly, this saves space in the ROM. This is another case where a simple test can confirm the behavior claimed by the ROM code:
    • Take a command that takes 1 parameter, e.g. READ SPECIAL REGISTER.
    • Increment the command byte and supply that to the 8271 instead.
    • We'd expect the 8271 to require 2 parameters to start the command, but then ignore the second parameter and behave as if the 1 parameter version was used. This is indeed observed on real hardware.
  • Some default parameter values are deployed. These are used in the commands the datasheet calls "128 Byte Single Record Format". Presumably it is again a win for ROM space savings.

4) Full ROM disassembly

The full ROM disassembly, as of time of writing, may be found here: [link]. This copy will remain static. It's mostly complete, and all of the "main" paths are fully traced, including SEEK, READ DATA, WRITE DATA and FORMAT TRACK.

Javascript on a (1970s!) chip


It is time to look at some of the unusual sounding instructions in the ISA:
  • YIELD. $FF. This instruction tells the processor to switch to running the highest priority dispatcher request, or (more likely) go idle if there's no dispatcher request active.
  • TASK t, r. $B8-$BF, 2 bytes. This instruction changes the callback routine for the specified task. Callback routines are specified as an integer that form part of a lookup key into a table of ROM addresses.
  • YIELDTO r. $10-$1F. Change the callback routine for the currently executing task and yield.
  • SYS 0, RB, A. $EC. Jump to the ROM address at key (0, RB, A) in the address PLA. RB is the current register bank from the SEL RB instruction, and A is the accumulator value.
  • SYS 1, RB, R. $ED. Jump to the ROM address at key (1, RB, R) in the address PLA. RB is the current register bank from the SEL RB instruction, and R is current routine value.
  • SYS 2, RB. $EE. Jump to the ROM address at key (2, RB, 0) in the address PLA. RB is the current register bank.
There are quite a few concepts introduced here. If you find them somewhat jumbled, you are not alone. A lot of things about the non-traditional parts of the byte processor CPU appear very ad-hoc to me. Let's look at a concrete example: the handler for the SPECIFY command. This is the .command_SPECIFY code in example chunk 2) above.

To get .command_SPECIFY to run, the host CPU provides the command $35 to the external command register, then 1 initial expected parameter to the external parameter register. The byte processor CPU wakes up at .wakeup_PARAM_1_accept. The context here is:
  • PC == $035
  • TASK == 4
  • SEGMENT == 9, ROUTINE == 1

The fact that writes to the external parameter register wake up task 4 is hardcoded. The PC executed is determined by the routine selected for this task, which was 1 at the time. This is used to create the key (3,9,1), which looks up the address $035 in the address PLA. The address PLA is hardcoded. The fact that task 4 keys lookups as (3,9,x) is hardcoded.

.command_SPECIFY ends with this, because it is the one command that gets executed and then decides it wants to yield to wait for 3 more parameters:

06B   14 YIELDTO 4 ; seems to set PARAM callback to
; .wakeup_PARAM_4_SPECIFY, then YIELD

This unusual instruction sets the callback routine for the current task to be 4. This means that the callback code for the next external parameter register write will be keyed (3,9,4) in the address PLA. That's PC $06C, aka. .wakeup_PARAM_4_SPECIFY.

A good mental model for this is Javascript. All execution is event based; the handler for a given event can be changed (by the handler itself, or an unrelated handler); there is no pre-emption until an explicit yield.

The known events wired in to the byte processor are:
  • 0: Not reversed at all. Appears to be related to the SCAN command, which isn't enabled on the BBC Micro due to lack of DMA.
  • 1: Bit processor event, e.g. found sync, lost sync, CRC error.
  • 2: Bit processor, read byte ready.
  • 3: Bit processor, write byte needed.
  • 4: External parmeter register written.
  • 5: appears unused. (Could be permanently connected to external command register?)
  • 6: Disc drive index pulse.
  • 7: appears unused.

On top of the "normal" Javascript model, there's also the concept of task priorities. These are not visible in the ISA and are presumably hardcoded. One instance this might come in handy in a floppy disc controller is when a disc drive index pulse (once per disc revolution) fires at the same time the bit processor needs a write byte. (It's not common to write across the index, but it could happen.) In this instance, providing the bit processor a data byte is a much more real-time task than handling an index pulse, so it should be handled first.

Yes, this is quite some complexity in addition to the complexity typical in a general purpose CPU. In particular, it provides many additional ways to slice and dice control flow handling beyond the standard JMP / CALL / RET. In fact -- and somewhat painfully -- the different control flow possibilities are mixed together! To briefly get a taste of the horrors, here's the command handler for many of the sector read operations:

.command_READ_DATA
.command_READ_DATA_AND_DEL_DATA
.command_VERIFY_DATA_AND_DEL_DATA
0A7   FC C9 CALL $0C9       ; .do_common_path_from_seek
        ; very gnarly because this CALL does a YIELD
        ; context on RET is from
                                ; .wakeup_BITPROC_EVENT_1_check_header_crc
                ; if we get a matching sector header,
                                ; the RET at $255 fires
0A9   BA 0B TASK 2, 11 ; select .wakeup_BITPROC_READ_10_11_count_GAP2
        ; (3,5,(10,11)) = $1B8
0AB   12 YIELDTO 2 ; select
                                ; .wakeup_BITPROC_EVENT_2_check_for_data_marker
; (3,4,2) = $2D0
; and YIELD

As can be seen, a CALL subroutine is doing a YIELD without unwinding the stack. This should probably be considered an anti-pattern? The stack is a shared resource across all tasks, so you'd better hope that two different tasks can't trip over each by doing this at the same time. There's also the complexity that when a CALL returns, various aspects of the execution environment (task, routine, RB, etc.) may well have changed.

All this makes the ROM very hard to follow for the read and write paths, even with a fully disassembled and commented ROM!

Unusual features of the ISA


1) Lack of symmetry

These are the opcodes we found relating to incrementing, decrementing, adding and subtracting:

$80-$83: INC Ix
$84-$87: ADC A, Ix
[...]
$A0-$A7: DEC Ix
[...]
$90-$93: ADC Ix does INC Ix if carry
$94-$97: SBB A, Ix

This is interesting because there is an asymmetry between which registers can be incremented vs. decremented. Some registers cannot be incremented at all, even with register banking, because banking operates in multiples of 8 and there are only 4 increment opcodes.

One theory is that the chip designers were short on silicon space, and trying to get away without add and subtract support in the ALU. There's only circumstantial evidence to support this, such as the hacked-out ranges in the opcode space; the code generally being add / subtract free except for the code implementing disc drive seek; no compare instruction found (yet -- there are unknowns); the use of XOR to do equality checks at $240 and $24F; and the presence of only direct equality checking or mask based checking for the most common branch opcodes (see item 2) directly below). But it's fun to speculate wildly isn't it?

2) 3-byte opcodes

An entire quarter(!) of the opcode space is devoted to four conditional branch instructions:

$40-$4F:        BEQ Ix, #imm, abs branch if Ix == imm
$50-$5F:        BNE Ix, #imm, abs branch if Ix != imm
$60-$6F:        TBZ Ix, #imm, abs test and branch if zero
$70-$7F:        TBNZ Ix, #imm, abs test and branch if not zero

These opcodes are all three bytes, which is a departure from Intel's other microcontrollers of the era. It's a lot of opcode space, but they can do a lot with a little, for example this piece of code from .do_seek.

13E   62 01 3E TBZ I2, #$01, $13E   ; MMIO R34 ($22) (drive in), wait until CNT/OPI
141   72 01 41 TBNZ I2, #$01, $141  ; MMIO R34 ($22) (drive in), wait until !CNT/OPI

For a certain (less common) seek mode, these 6 bytes are sufficient to busy loop while waiting for the drive to start the seek, then acknowledge finishing it.

These three byte opcodes were found by the previously mentioned "jigsaw" analogy. Once the command entry function was fully disassembled apart from just three bytes at $021, there was only one clean possibility that fitted the required behavior.

3) No timers, port I/O or IRQs

Presumably to save silicon, the byte processor CPU has done away with timers, dedicated port I/O instructions and IRQs. The MCS-48 has all of those.

They're not needed, though. Delays, such as the millisecond range delays for seek steps, are simply timed via busy loops.

4) Decide your own adventure

At time of writing, our opcode list is here: [link]. It is not complete. The one part of the 8271 ROM we have not disassembled, SCAN handling, uses at least opcodes $9A, $9B and $8C. (We've ignored SCAN because it needs DMA wired up, which is not the case in the BBC Micro application.) Furthermore, the PLAs suggest that other opcode ranges not seen in the 8271 ROM might do something. This includes $A8-$AF and $B0-$B7. Feel free to go and have a look!

Interface to the bit processor


Now that we've got the byte processor understood and disassembled, it's time to turn our attention to interface to the bit processor and its behavior. After all, it's the bit processor that is wired to the disc drive control and data lines!

Our initial assumption was that the byte processor CPU would, like the MCS-48, have some form of port I/O instructions. This turned out to be false. The reality is simpler: it uses MMIO (Memory Mapped Input/Output). This means that access to certain register indexes change registers in the bit processor instead of the byte processor. It's quite simple: 0-31 references the byte processor registers. And the range 32-39 references bit processor registers. For simplicity of decoding, the bit processor's 8 register references are mirrored 4 times across the range 32-63. The entire 0-63 range is then mirrored 4 times in the entire addressable range of 0-255.

Registers 0-63, read on a real BBC Micro, illustrating mirroring of the bit processor registers

Furthermore, the context of bit processor references in the byte processor code makes it clear that the bit processor interface is very simple. It is so simple we didn't feel the need to reverse engineer the bit processor further. The bit processor register assignments are as follows:
  • 0: control register, 4 bits
    • Bit 0 (0x01) => gather CRC (?)
    • Bit 1 (0x02) => finish CRC (?)
    • Bit 2 (0x04) => 1 for read, 0 for write
    • Bit 3 (0x08) => idle state
  • 1: status register, indicates sync data byte type, CRC error, etc.
  • 2: drive input, read for drive status
  • 3: drive output, controls step / write / etc. lines
  • 4: clocks output byte
  • 5: data output byte
  • 6: data input byte
  • 7: unused? (returns 0xFF, and byte processor ROM relies on this!)
The astute reader might ask: does this mean the bit processor can be programmed directly with the WRITE SPECIAL REGISTER command, since the bit processor registers are MMIO? And the answer is yes! There are severe caveats however:
  • The generic command entry code corrupts bit processor state on entry.
  • The generic command exit code resets the bit processor and associated callbacks on exit (but strangely and usefully not for WRITE SPECIAL REGISTER).
  • The latency of WRITE SPECIAL REGISTER is terrible.

That last bullet, the poor latency, is unfortunate. It's about 211us, which means there's zero chance for tricks like writing data to the disc a clocks + data byte at a time. The latency is large because the byte processor executes a large number of instructions on the command entry path, doing things like checking and caching drive status, as well as reading parameters one at a time, checking if the selected drive changed, etc.

Writing the drive output register directly is useful, though. I did this for my trick to write "weak bits" directly using the 8271! See my blog post about weak bits for more details. [link]

Writing the unwritable


Now that we know how this monster works, it is of course time to turn our attention to mischief. Can we make the chip do things it is "not supposed" to be able to do? Of course we can, and as usual, it involves disobeying the datasheet:


We are specifically going to disobey the sentence that states "Issuing a command while another command is in progress is illegal." We are now equipped to see exactly what happens if we do this, by reading and reasoning about the code. The callback called when the external command register is written gets on with its job without regard for whether a command is in progress, so side effects will be:
  • The internal command register itself is corrupted, i.e. mismatched with the currently executing command. It is referenced from time to time so this may be useful to us.
  • The illegal command will change internal register values, which may impact the execution of the current command.
  • The illegal command may change or disable callbacks, or reset or reconfigure the bit processor.
Taking these things into consideration, we are going to try and write an arbitrary FM bit stream. Achieving this will enable us to recreate copy protected disc surfaces that are not supposed to be writeable with the 8271. Remember, kids:


We're going to attempt this by separating out how we write the data bytes and how we write the special sector mark clock bytes. Writing a full track of data bytes is easy, but useless on its own. We can do this by:
  • Formatting a track with a single sector header.
  • Issue a WRITE DATA command for that sector, of size 8192 bytes. A track is only 3125 bytes (with a perfectly calibrated drive), so:
    • At the first wrap around, start writing the track of bytes we want.
    • 3125 or so bytes later, at the second wrap around, abort the command by resetting the controller.
This track of data bytes alone will be useless. If you try to read a sector from it, you will get error $18, aka. "sector not found". The special clock byte markers required to identify sector headers and sector data will be missing.

The way FM encoding works is very simple: it alternates clock bit, data bit, clock bit, data bit, ... every 4us. Normally all the clock bits are 1, to maintain timing and keep the drive electronics happy. But for a sector header or data marker, a few clock bits are left out so that the floppy disc controller can locate things in a bitstream where it is not sure where it is.

So, while the WRITE DATA command is running, we're going to sneak in some parallel SPECIFY commands to "corrupt" registers for the running WRITE DATA command, without disturbing it. Specifically, we're only going to corrupt the clocks byte register (MMIO R36) at the precise times necessary to write clocks byte values other than 0xFF. The overall operation looks like this:


By some miracle, this scheme does work, and it easily replicates discs that no BBC Micro copier back in the day could come close to. One example of a tough disc, for some reason, is The Sentinel [link]. The disc has a pretty label and packaging so of course, time for a gratuitous image:


There are plenty of quirks and caveats to get this working. To briefly note them for completeness:
  • The SPECIFY command was used instead of WRITE SPECIAL REGISTER to help with latency concerns. The SPECIFY command can be "primed" by passing the command and the first parameter, such that the second parameter (first register value to write) executes the write and executes with low latency at just the right moment.
  • The WRITE SPECIAL REGISTER command exits without resetting the bit processor or clearing all the I/O callbacks. Unfortunately, the same cannot be said for SPECIFY. Fortunately, we don't have to exit the SPECIFY command! It starts having the useful side effect of writing internal registers before it is complete. And then you can later restart it again and again, never completing it.
  • The act of command dispatch corrupts the MMIO clocks byte register! This is very unexpected but it uses MMIO R36 as a temporary storage location while it is calculating which disc drive (0 or 1) is selected, and whether it changed. It is worth looking at:
.parameters_complete_launch_command
03D   FC 5D CALL $05D ; .read_drive_status
03F   BC 00 TASK 4, 0 ; select .wakeup_PARAM_0_no_action
; (3,9,(0,2,8,A)) => $030
041   03 SEL RB 3
042   2F MOV A, IF ; R31 ($1F) (ext CMD)
043   CF 3C AND I7, #$3C ; R31 ($1F) (select bits + param count masked out)
045   9C C0 AND A, #$C0 ; A now contains drive select bits
047   04 SEL RB 4
048   34 MOV I4, A ; MMIO R36 ($24) (???) (temp storage?)
049   E3 XOR A, I3 ; MMIO R35 ($23) (drive out)
04A   9C C0 AND A, #$C0
04C   8A 53 BZ $053     ; only update drive out if select bits changed
; .command_dispatch
04E   98 20 MOV A, #$20 ; bit for side select (drive 0 vs. 2)
050   C3 AND A, I3 ; MMIO R35 ($23) (drive out)
051   D4 OR A, I4 ; MMIO R36 ($24) merge back select bits?
052   33 MOV I3, A ; MMIO R35 ($23) (drive out)
; only drive select bits and side select kept
; clears write enable, head load, and others
; matches data sheet

The line in orange is the one in question. I annotated it as iffy with ??? when I first disassembled it as it looked wacky. But, it's correct and a real machine exhibits the corrupted clocks precisely as described by the code. The corrupt clocks value, $40 if writing to drive 0, is shown in orange in the above diagram. Once you know it's there, and why, the easiest way to navigate around it is to arrange for the clocks corruption to land where it doesn't case problems. When paired with a $FF data byte, it doesn't create weak bits on the disc surface, and the controller actually seems to skip over it on read.
  • The write I/O path, by some stroke of good fortune, resets the clocks byte to $FF on every write I/O callback. This saves us a lot of trouble:
.wakeup_BITPROC_WRITE_3_set_clocks_and_count_host_bytes
318   F4 FF MOV I4, #$FF ; MMIO R36 ($24), standard clocks
[...]
  • Careful timing is needed. There's a pipeline of bytes from the external data register to the internal bit processor data byte register to the actual output pulse machinery. This needs to be accounted for.
Combined, these quirks convince me that I'd have never gotten this going, or gotten close, without a thoroughly reverse engineered and disassembled ROM.

Other successes and failures


Other things enabled or demonstrated based on careful reading of the ROM code:
  • beebjit now has a much more accurate 8271 driver.
  • Unexplained weirdness trying to read a sector with logical track id $FF has been explained as an integer overflow in the bad tracks handling.
  • Sectors on a non-zero physical track, but with a zero logical track id, were believed "impossible" to read. I'm able to read them by using the "command within command" trick. Once a READ DATA command is safely underway, including having processed the seek request and gone idle, it's possible to use WRITE SPECIAL REGISTER to change R7 (command param 1, which is the requested track) to zero, and have the sector read fine. The issue is that references to logical track 0 in the seek code are always treated as a mandate to find physical track 0. You need to bypass that.
Things not achieved:
  • Reading or writing MFM (double density) -- looks fundamentally impossible. Does not appear to be a capability in the hard-wired bit processor.
  • Executing arbitrary code on the chip. This is a shame, as the byte processor is a capable CPU! Who knows what we could use a little co-processor for? Things making this hard include:
    • Separation of code and data in separate address spaces.
    • No references to the stack possible outside of CALL and RET.
    • Indirect jump targets stored in a read-only PLA. (Microsoft CFG? :)
    • ... and yes, curiously, these accidental defenses all sound similar to defensive technologies investigated or deployed since year 2000. So, the 1970s called and...

Summary

The 8271 has exceeded our expectations! Where to start? It's a massive chip, encompassing dual cores and a Javascript like execution model. Remember, this was the mid-1970s. Its general purpose CPU runs an Intel instruction set architecture that I don't believe has been publicly documented until now. It's not every day we get the treat of a new Intel ISA.

We never got to the bottom of the crazy test mode that started this whole investigation. There's no trace of it in the byte processor ROM, so it must be handled by some other component on the silicon. Something to investigate for another day perhaps.

Having seen the complexity of the chip, I must confess to a feeling of surprise every time my BBC Micro successfully loads a disc.

Epilogue

Given the 8271 issues with cost, heat, supply chain, complexity, and lack of MFM, it wouldn't be surprising if Intel had had enough with the architecture behind the 8271 and 8273. Intel staggered bravely forward with the 8272, which introduced MFM support and... hang on, let's have a look at a decapitated one of those...

Left to right: Intel 8271, NEC D765, Intel 8272

This is very cheeky! The 8272 die may say "8272 (c) Intel 1979" but it is the same die as a NEC D765, stamped "NEC D765B". It looks like Intel may have licensed the NEC design. The NEC doesn't appear much smaller in terms of die size, but the layout looks much simpler and less busy. Bizarrely, Intel appears to have fabbed the 8272 much larger than the NEC. 

Extra references

  • StarDot forum thread where the investigation unfolded: [link]
  • Sean Riddle's decap page: [link]
  • beebjit's 8271 driver: [link]
  • 8271 tests and tools (warning: rough) for the BBC Micro: [link]
  • Live document for 8271 disassembly: [link]
  • Very high resolution die shots of the 8271 and 8273 (beware, will hang browsers!): [link]