Friday, July 10, 2020

Turning a £400 BBC Micro (1981) into a $40,000 disc writer (1987)


One of the most iconic floppy disc protection stories is Dungeon Master. Released in December 1987, Dungeon Master combined an advanced physical disc format (fuzzy bits) with sneaky protection checks embedded into the gameplay itself.

I strongly recommend this article which gives an excellent overview of floppy discs before launching into a very thorough overview of the fuzzy bits protection on the Atari ST Dungeon Master disc. There is also this excellent article which goes more into the stories surrounding the Dungeon Master protection. It includes a quote from one of the Dungeon Master authors:

"We had the advantage of owning the patent on a floppy-disk copy protection scheme that required a $40,000 specialized hardware device to write the disks. It was impossible to create a disk image without this hardware, and the hardware itself was out of production."

The reason for the hefty price tag is likely the timing precision required to create fuzzy bits reliably. The required precision is measured in nanoseconds at a time when much of the world was still chugging along in microseconds.

The BBC Micro had a 2MHz 6502 CPU and its simplest instructions took 2 cycles, which is 1 microsecond. Is there any hope of writing fuzzy bits with such a constraint? We will see how far we can get. This work is called project Oiled Otter.

To get us in the mood, and close out the introduction, here's a picture of a 3.5" floppy disc duplicator machine. I'm amused by how much it looks like a photocopier, except the hopper takes discs instead of paper! It looks like Advanced World Products might even still sell you one.

The BBC Micro user port

The BBC Micro was known for excellent expandability, including the so-called user port. This port is driven by a 6522 Versatile Interface Adapter running at 1MHz. The port itself offers 8 data pins and 2 control pins. There's a lot of control offered over these pins. The data pins can individually be configured as inputs or outputs and the output logic levels can be set as high or low as needed.

Why are we looking at the user port? Well, we're going to attempt to drive a disc drive directly from the user port. By removing the floppy disc controller from the equation, we hope to get it out of the way and achieve more direct control of the disc drive and the data streams to and from it.

Cable from the user port to the disc drive

The above image is my lovely cable connecting the user port to a disc drive. The connectors are standard and the wires between them are just jumper wires. In a genuine attempt to create something that could have been done simply "back in day", I'm not allowing any extra electronics.

The cable wiring is as follows:

The main take away I'd encourage from this wiring setup is that the disc drive interface is probably simpler than you thought. We can drive the disc drive and query its important state with just 8 pins. It's very simple. Say you want to spin up the drive, you set the logic level to low on PB0 and PB1. Say you want to wait for the disc to rotate to the start of the track, you query the logic level on PB6 until you see a high->low transition. Stepping is just setting a logic level for "step in" vs. "step out" and then pulsing low the "step" pin.

So far so good. We have basic control of the drive but have yet to write anything.

Electrical headaches

A brief digression into electrical headaches is warranted, because I hit some. Wiring together random pairs of components might sometimes work and sometimes it may need finessing. Here's a scope view of voltages initially seen on the W/DATA pin, at the drive:

Trying to write pulses to the drive at the FM rate of 250kHz

The logic 1 voltage is about 3.4v and the logic 0 voltage is about 1.5v. This is a significant problem! Acceptable TTL voltage levels are well defined:

"A TTL input signal is defined as 'low' when between 0 V and 0.8 V with respect to the ground terminal, and 'high' when between 2 V and VCC (5 V), and if a voltage signal ranging between 0.8 V and 2.0 V is sent into the input of a TTL gate, there is no certain response from the gate and therefore it is considered 'uncertain'"

A logic 0 voltage of 1.5v is considered "uncertain" and won't do at all. Indeed, the drive I'm using didn't write anything with the above signal.

The problem was resolved by removing the floppy cable termination resistor from the drive. Here's a picture of my drive and the socketed termination resistor array is ringed in red:

This resolves the voltage levels perfectly and everything then works. It appears than many of the BBC Micro's ports other than the disc port don't have the oomph to drive a terminated cable. But hang on -- presumably the resistor was there for a reason in the first place? Yes. Removing it has two caveats:
  • Watch your cable length. Longer cables are prone to signal degradation when they are unterminated.
  • Watch the voltage levels on unconnected wires. I was seeing a voltage level of 1.32v on the S/SEL (side select) pin at the drive. This is not ok as it is again in the TTL uncertain range. Where will the drive write data? Maybe the upper side, maybe the lower side. Or maybe neither or both! This was resolved by connecting every significant wire and driving it high or low as desired.

The quest for more bandwidth

The elephant in the room is: how do we provide a signal to the W/DATA pin? This is the "hard" pin. It is high bandwidth and has precise timing requirements. Let's stop dreaming about nanoseconds-level precision fuzzy bits for a moment and try and write basic FM pulses to the drive.

Most BBC Micro discs are FM (aka. DFM aka. single denisty) encoded at 250kHz. To write a FM track is actually pretty simple. Make sure drive is spinning and the write gate is open. Now, every 4 microseconds, either pulse W/DATA low then back to high (a 1 bit) or do not (a 0 bit). Most of the time, every other bit must be a 1 (a clock bit to maintain timing and synchronization).

Driving W/DATA using the CPU is hopeless. 4 microseconds is 8 CPU cycles -- certainly not enough to perhaps load a byte, shift it, and write a 0 then a 1 to the user port logic levels. A simple loop is likely to be 12+ microseconds which is way out of range. Instead, to hope to drive W/DATA fast enough, we must look to facilities of the 6522 VIA chip.

6522 VIA shift register

The most obvious candidate for our task is the shift register. The shift register is an 8-bit register. When in the appropriate mode, loading the shift register will cause the chip to sequentially emit the 8 bits over one of the pins of the user port. This is great -- the bits are being dealt with in parallel with the main CPU's execution, so the CPU is free to spend its time working out the next set of bits to start shifting.

Unfortunately, I was unable to get it to work. The only shift mode that has the potential to be fast enough is "shift out at system clock rate". The Western Design Center 6522 datasheet has a good diagram:

The VIA system clock is 1MHz so the shift clock will be a 500kHz signal and the resolution of the bits we can output is 250kHz. That's just enough. However, I did not work out how to get the shift clock running continuously and smoothly. Even with attempts at precise timing for shift register reloading, the scope output for the shift clock pin always looked like this:

It would appear that in the only shift mode fast enough to have a shot, reloading the shift register incurs a delay before shifting resumes. This is not suitable.

6522 VIA pulse output mode

A little-known feature of the 6522 is its "pulse output mode". Not every 6522 variant datasheet covers it but here is the tiny entry in the MOS Technology datasheet:

For once, the datasheet appears to accurately describe behavior. This mode is very interesting to us because one write to the VIA promises two distinct effects: an output pin will go to logic low, and then raise to logic high again 1 cycle (1 microsecond) later with no further effort from us. Because of that, it is tractable to use this to drive a 250kHz output signal. CPU is very tight; a loop is way out of the question but a linear 6502 code block can do it, e.g.:

        \ &70 points to &FE60, aka. user 6522 VIA ORB register.
        STA (&70),Y        \ 8 cycles, pulse output
        STA (&70),Y        \ 8 cycles, pulse output
        STA (&70),Y        \ 8 cycles, pulse output
        LDA (&70),Y        \ 8 cycles, do not pulse output
        STA (&70),Y        \ 8 cycles, pulse output

This works. There is just enough CPU horsepower to do it. 8 cycles is 4 microseconds, which is the shortest time between disc pulses.

Unfortunately it is extremely memory intensive. 2 bytes of linear 6502 code are required per FM encoded bit. One useful data bit is 2 FM bits because every other bit is a clock bit. A track is 3125 bytes so this requires 3125 * 8 * 2 * 2 == 100kB of linear code. The BBC Micro has 32kB of RAM so we are out of luck here. It's possible to write single (smaller) sectors including some powerful novel disc protection mechanisms. But we can't write large (1024 byte) sectors or full tracks. Both of these things are required to correctly write many discs. Furthermore, our timing resolution is 1 microsecond, which is not good enough to write many advanced protections and disc surfaces.

We can be happy that we got something working at all, given the constraints, but it isn't a fully satisfying solution.

Help from an unlikely output port

It's fortunate I hang out with clever people such as the Bitshifters Collective. (Please go and watch the latest demo, Evil Influences, immediately!) In a Slack conversation, Tom Seddon (author of the b2 emulator) suggested... using the output of the RGB port??

A video-to-disc cable... you don't see those on Amazon every day.

It's an idea I may have chuckled at initially but the more I reflected on it, the more it seemed that it might be possible. The BBC Micro uses the 6845 video chip for timing. Like the 6522, it's a quirky beast but at least the quirks are well understood on account of Bitshifters' demos which abuse the 6845. I also did some reversing work to make the jsbeeb emulator correctly emulate the Hitachi 6845. Let's see Oiled Otter working in this video and then describe what we saw:

[Inline video not showing? Direct link:]

It works by configuring the 6845 chip unusually. The 6845 is run at 1MHz and frame timing is set to be a single 32 microsecond / 32 byte scanline per "frame". While every frame is being output, the 6845's video memory registers are rewritten to fetch the next 32 bytes from a potentially different location. So every 32 microseconds, a different output pattern is selected from a table of output patterns. We have configured the RGB pins to emit 8 pixels per microsecond, which is 256 pixels per output pattern. This gives a huge number of different possible output patterns. But since we're writing a 32 microsecond chunk of disc FM encodings, only a few patterns make sense. In 32 microseconds, we can fit 8 FM pulses / bits. 4 bits will be clock bits, which are usually all 1. 4 bits will be data bits, of which there are just 16 combinations.

For example, if we're writing the data nibble 0x5, the 32 microsecond output needs to look like this:

The video data bytes would be 00FFFFFFFFFFFFFF00FFFFFF00FFFFFF00FFFFFFFFFFFFFF00FFFFFF00FFFFFF. The 1st, 2nd, 4th and 5th 00s are the clock bits. In between the clock bits is the data bit pattern 0101, or 0x5.

The CPU and memory constraints balance out nicely. In the end, it's a similar setup to as if the VIA shift register had worked: some little co-processor (the video chip) is busy emitting a bunch of FM bits while the CPU is freed up to load and provide the next pattern. The memory requirements are very reasonable. The table of necessary 32 microsecond output chunks fits comfortably within 1024 bytes, thanks to the use of a special linear addressing mode. The list of look up indexes for an entire track is around 12kB, so everything is fitting nicely in the BBC Micro's 32kB of RAM.

BBC Micro / 6845 quirks

6845 last character / column

Of course, getting this working didn't come without tripping over some "fun" quirks. The first of these is the 6845's quirk whereby it outputs black for the last character of every scanline. It is the bane of demo writers as well as disc pioneers, it seems. Here's a slide from a recent talk I gave showing the issue:

On the left, there's a demo effect plagued by vertical black stripes caused by the "last character / column is black" issue. Multiple 6845 scanlines were placed inside a single raster beam scanline and unfortunately, the black stripes are unwanted. When used to drive a disc, the effect is arguably worse: the black stripes represent unwanted pulses written to the disc.

On the right, there's an image of the solution: the waveform submitted to the disc is simply inverted. It is now ok for the last column to be black (shown in orange outline) because a zero value is always required there. Technically, this is a violation of some disc drives' timing requirements for pulsing W/DATA low. Here's a timing diagram for a drive from the era, a Mitsubishi M4852/M4853:

According to this diagram, logic 0 should be held for up to 2100ns. With an inverted waveform, 3000ns+ is to be expected. However, the drives I have only care about negative-going data pulses and not the hold time. This is not surprising. It would be possible to jiggle a few things around to avoid the 6845 quirk and also submit in-spec hold timings but this has not proven necessary so it has not been undertaken.

DRAM decay

DRAM decay is a horror. It is what happens when you fail to refresh DRAM. From the Wikipedia article on memory refresh:

"This process is conducted automatically in the background by the memory circuitry and is transparent to the user."

On a modern system yes, but on a BBC Micro not so much. On the BBC Micro, DRAM refresh is a side effect of the video subsystem. It relies on the fact that standard screen modes iterate across all DRAM rows within a short period of time. Perhaps you can see where this is going.... our special video mode used to output 32 microsecond frames is very much not a standard screen mode. It does not guarantee to visit all DRAM rows so DRAM decay will occur! DRAM decay is no joke. I lost various programs and disc contents due to unplanned DRAM decay. For a laugh, here's a BASIC program that inflicted DRAM decay on itself for a fraction of a second:

The bad news with DRAM decay is that if you are bitten by it, you can easily lose data.

The good news with DRAM decay is that if you are expecting it, you can usually easily work around it. In the case of the Oiled Otter, there are various critical loops with the 6845 video chip in an unusual state. For each of these, a manual incrementing memory fetch is worked in to maintain DRAM refresh.

Capabilities unlocked

Now that we have a working disc writing system, bypassing the floppy disc controller, what can we do with it? We've already demonstrated the ability to write arbitrary FM encoded discs in the video above. 

But we have gotten very lucky along this journey. Thanks to VIA shift register's deficiencies forcing us to find the video output pin solution, we have access to much finer grained timing resolution on the W/DATA pin. We're using the BBC Micro MODE4 which uses an 8MHz pixel clock. This means we can output black or white pixels every 125ns, triggering a write pulse with 125ns granularity. If we wanted to spend a little extra memory (which we do have) on larger tables, we could use MODE0 which uses a 16MHz pixel clock, affording 62.5ns resolution. 125ns has proved just fine for all tested disc protections, but it's a nice feeling to have a little bit left in the tank.

Long track protection

My favorite disc protection is long track protection. It was popular in the Amiga days. I do not believe it was ever used on the BBC Micro. I like long track protection because it is very fundamental: the floppy disc controller has a broad tolerance for differing read speeds (because disc drives rotate at differing speeds) but it only writes at the one correct speed.

An advanced long track protection is to write two sectors on the same track with one recorded at a faster rate. The copy protection check is to compare the time taken to read the two sectors. The sector recorded at the faster rate should read measurably and significantly faster.

Can Oiled Otter write such a track? Yes, fairly easily. Given the 125ns output resolution, it's easy to create a few output table entries that are like the normal ones, but with 125ns shaved off per 1 microsecond. Here is a video of creating long track protection and checking how it reads back:

[Inline video not showing? Direct link:]

Fuzzy bits protection

We are probably overdue to return to where we started: fuzzy bits protection. Can Oiled Otter create fuzzy bits on 1981 tech? Let's give it a go. Here's an image of the results of reading a sector a couple of times after it was written with Oiled Otter's FUZZ command.

The FUZZ command writes the 0x8 data nibble, with the data bit progressively being delayed by 125ns increments. This is similar to the description of how the Dungeon Master fuzzy bits are written. As can be seen in the screenshot, the 0x88 data bytes soon start reading back incorrect and non-deterministically. But the variance isn't 100% random like weak bits -- the variance is whether the 0x8 bit is late enough to have a chance of being missed. If missed, you can still eyeball that there are patterns and themes to the madness.

The above results are actually the application of fuzzy bit principles to FM encoded data. In FM encoding, every data bit is interleaved with a clock bit. This results in the bleeding of clock bits in to the data stream on occasion (see the 0xFF bytes in the first run above -- they are likely clock bits). The Dungeon Master protection uses fuzzy bits in conjunction with MFM. This leads to a calmer situation where the fuzzy bit drifts between two valid data bit encodings and does not mess up the clock! Of course, Oiled Otter can write MFM, GCR or any encoding you can dream up. It's all just different protocols for our fundamental primitive of being able to send a pulse to the drive at any time, with good resolution.

For good measure, here's an oscilloscope view of the fuzzy bits on the disc. The peaks are somewhat irregular, and when two pulses are very close together (1 microsecond or so, far too close for any encoding standard), the strength of the flux reversal detected by the drive even starts to get weak.

Mission accomplished. We gained the ability to write disc pulses with 125ns granularity. This is perfectly sufficient to create advanced disc protections including long tracks, weak bits and fuzzy bits. Not bad for 1981 hardware with a 1 microsecond fastest instruction!


Rasz_pl said...

>I did not work out how to get the shift clock running continuously and smoothly. Even with attempts at precise timing for shift register reloading... reloading the shift register incurs a delay before shifting resumes.

This is a hardware bug , rather famously the reason for C64 having cripplingly slow (300 byte/s) floppy drive communication implemented thru bitbanging in software despite using fixed 6526 CIA. Previous VIC-20 used 6522 and floppy drives had to stay compatible

Dropping this compatibility and using 6526 CIA hardware shift register results in C128+1571 combo reaching ~5KB/s (as you would expect from 6526 clocked at 2MHz)

>video-to-disc cable

similar to video-to-antena concept :)

Unknown said...

We use the 1571 drive’s 6526 clocked at 2 MHz in the ZoomFloppy to transfer raw GCR bits. Rate is 500 Kbit/second.

Anonymous said...

Wonderful job on using the Beeb to the last drop. Thanks for the effort.