Monday, May 17, 2021

Recovering "lost" treasure-filled floppy discs with an oscilloscope

There are many good, modern solutions for reading data off old floppy discs and drives. Perhaps the best is the Greaseweazle: it's capable, open source, open hardware, inexpensive and has a vibrant and friendly community behind it. It connects directly to a floppy drive, replacing the floppy disc controller, and reads the disc in great detail. It can handle regular discs or any known copy protection without really breaking a sweat.

But what happens when the Greaseweazle reports data that is heavily corrupted and unreadable? Are we out of luck? What if the unreadable disc contains some historic treasure, such as source code for an iconic game? Do we have to shed a tear and move on?

I recently found myself, along with Phil Pemberton, in this situation. Given the unique nature of source code discs, and the potential for historical interest, we were determined to succeed...

A Greaseweazle F7 Plus. A very capable disc reading device, but it is limited by the information the disc drive sends to it.


How do discs and drives work anyway?

Back in the 1980s, your electronic devices often had outstanding user manuals and maintenance manuals, and this definitely applied to floppy disc drives. Aside from enabling you to service the devices you owned, you often got great information on how the device worked. Here is an extremely informative page from the TEC FB-50x technical manual:

This one page concisely describes the chain of electronic transforms from the disc drive magnetic read head, through to the READ DATA output pin that the disc drive offers to the Greaseweazle. As can be seen, the READ DATA pulses (digital pulses, but with analog timing between them) are a best guess approximation at where the analog waveform peaks exist in the voltages coming off the read head.

So conceptually, a disc drive's read operation is fairly simple. Magnetic flux reversals on the disc surface generate peaks in an analog voltage waveform. The drive outputs pulses corresponding to these peaks. It's up to the floppy disc controller to make sense of the timings between the pulses and turn these into data bits. There are several different encoding schemes used for turning pulse timings into data (and vice versa), such as MFM (used in PCs, Amiga, BBC Micro ADFS) and GCR (used in the Apple II and Commodore 64). The BBC Micro DFS discs we're dealing with have the simplest encoding of them all, FM.

The drive doesn't care at all about the encoding, as long as the time between the pulses is calibrated such that the magnetic reversals can be stored reliably on the physical disc surface.


The Greaseweazle view of some "unreadable" discs

Phil and I had the honor of trying to recover the source code to the iconic BBC Micro game Repton 3. We also received the source code discs for other games by the same author (Matthew Atkinson), including Tempest, The Living Daylights and U.I.M.

The recovery was not entirely smooth. Some of the discs had patches of moderate damage and a few of the discs had patches of heavy damage. Let's have a look at two of those, as seen by the Greaseweazle and rendered by HxC Floppy Emulator Software:

Track 22 of one of the Repton 3 source code discs.

Looking at the first disc above, we've got problems in the middle of one of the sectors on track 22 of the disc. The vertical green bars are the sectors. Unsurprisingly, the red one has the problem. Red means that there's a CRC error: the on-disc CRC doesn't match the calculated expected CRC.

Have a look at the two horizontal black bands. These are actually "dot clouds" representing timings between pulses coming back from the disc drive. The tighter these bands are, the clearer the signal and the better condition the disc surface is in. The black bands above are ok -- not too ragged -- until they degenerate to almost random noise at the point of the error. This is a serious read failure because the faulty timings are all over the place. We can re-read this disc all we want, but the Greaseweazle will only see noise. Any data in that faulty region is lost to the Greaseweazle.
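To put a rough number on how "tight" a band is, here is a minimal Python sketch. It assumes the pulse-to-pulse timings are already available as a plain list of microsecond values and that the nominal FM spacings are 4us and 8us; the function name and the sample values are made up for illustration and are not part of the Greaseweazle or HxC tooling.

```python
from statistics import mean, pstdev

def band_spread(intervals_us, nominal_us=(4.0, 8.0)):
    """Group pulse intervals by nearest nominal spacing; return (mean, std dev) per band."""
    bands = {n: [] for n in nominal_us}
    for dt in intervals_us:
        # Assign each interval to the closest nominal spacing.
        nearest = min(nominal_us, key=lambda n: abs(dt - n))
        bands[nearest].append(dt)
    return {n: (mean(v), pstdev(v)) for n, v in bands.items() if v}

# Hypothetical timings: a healthy region versus a region that has collapsed into noise.
clean = [4.0, 4.1, 3.9, 8.0, 8.1, 7.9]
noisy = [2.5, 5.6, 4.9, 6.3, 9.8, 7.0]

clean_stats = band_spread(clean)
noisy_stats = band_spread(noisy)
print(clean_stats)
print(noisy_stats)
```

A small standard deviation corresponds to a thin band in the picture; once the spread approaches the 4us gap between the bands, the two can no longer be told apart from timings alone.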

Out of curiosity, what is wrong with this disc? Well, a quick inspection of the disc itself reveals the culprit: a dent!

This dent affects around 10 tracks, some severely.

A disc's surface is made of pretty robust and pliable plastic, so we did carefully flatten the dent out without scraping off any disc surface. Unfortunately, this did not improve the signal quality. One theory is that the disc was written after the dent was formed.

Track 0 of one of The Living Daylights source code discs.

Looking at the second disc above, we see this poor disc has suffered a terrible corruption! The first two physical sectors on track 0 consist largely of noise. This is one of the worst sector read attempts you are likely to see. An unformatted track looks worse, but not by a whole lot.

What is particularly curious is that the sector headers leading up to the sector data are perfectly intact. Before the grey noise clouds on the white background, you'll see a thin vertical strip of green. Those are the sector headers and they're 100% fine. So, we're not looking at a contiguous patch of disc damage. Instead, the write of the sector bodies appears to have written noise. The question of "what happened here?" is definitely interesting so we'll revisit it later.


The analog hookup

So, we're dealing with discs that are sufficiently faulty as to be unreadable to a Greaseweazle. Is the data lost forever? To find out, we need to look at what's actually on the disc surface (as opposed to what the drive sends to the Greaseweazle). This image shows the setup we used to investigate:

My TEC FB-502 drive connected to a Greaseweazle and also connected to my Siglent SDS 1104X-E scope.

In this setup, we're still using the Greaseweazle, but only for control and not data. We use it to seek and to spin up the drive motor. The red, yellow and blue wires are connected to three of the drive's test points. The blue wire is the index signal, which pulses once per revolution of the disc. We use this to trigger the oscilloscope capture. The red and yellow wires correspond to the TP3 and TP4 test points referenced above in the diagram from the TEC technical manual. These are the post-amplified voltages coming off the disc drive read head and are the inverse of one another. The trick is to subtract one from the other to eliminate noise common to both signals.
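The subtraction trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two test point captures are plain lists of voltages sampled at the same instants, and the sample values and the `differential` helper are invented for the example.

```python
def differential(tp3, tp4):
    """Subtract one test point from the other, sample by sample."""
    return [a - b for a, b in zip(tp3, tp4)]

# Hypothetical data: the wanted signal appears inverted on the second test point,
# while interference (hum) appears identically on both.
signal = [0.0, 0.1, 0.2, 0.1, 0.0, -0.1, -0.2, -0.1]
hum = [0.05, -0.03, 0.04, 0.02, -0.05, 0.01, 0.03, -0.02]
tp3 = [s + h for s, h in zip(signal, hum)]
tp4 = [-s + h for s, h in zip(signal, hum)]

diff = differential(tp3, tp4)
print(diff)
```

Because the hum is common to both inputs it cancels, while the wanted signal, being inverted on one input, doubles in amplitude.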

The oscilloscope is a Siglent SDS 1104X-E. It's considered entry level but fortunately, modern tech is powerful compared to 1980s tech. It can sample an entire disc track (200ms of time) to its internal memory at a sample rate of 25Msamples/s, whereas the on-disc bit rate is 250kbit/s. It has voltage sensitivity down to 500uV/div, which is far finer than the typical signal strength of 400mV peak-to-peak.
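For a sense of scale, the capture figures quoted above work out as follows. This is plain arithmetic on the numbers in the text; the variable names are just for illustration.

```python
SAMPLE_RATE_HZ = 25_000_000   # scope sample rate quoted above
TRACK_TIME_MS = 200           # one disc revolution
BIT_RATE_HZ = 250_000         # on-disc bit rate quoted above

# Samples needed to hold one full revolution.
samples_per_track = SAMPLE_RATE_HZ * TRACK_TIME_MS // 1000
# Scope samples available for each on-disc bit period (4us at 250kbit/s).
samples_per_bit = SAMPLE_RATE_HZ // BIT_RATE_HZ
# Bit periods in one revolution.
bits_per_track = BIT_RATE_HZ * TRACK_TIME_MS // 1000

print(samples_per_track, samples_per_bit, bits_per_track)
```

So each 4us bit period is covered by 100 scope samples, which leaves plenty of room for filtering and peak hunting in software.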

Looking on the scope screen, the top half represents an analog capture of an entire track. There's not much to see other than a couple of cyan blips -- those are the index pulses denoting the start of the track revolution. The bottom half is a zoomed-in view and shows the analog view of actual stored bits on the disc surface. To the left, there is 4us between each peak, and on the right, 8us between each peak. We'll get into the encoding later, but a couple of 4us peaks typically represents a "1" data bit and an 8us peak a "0" data bit.

[For another journey on analog-level disc reading, we recommend this FloppyControlApp blog post]


Different drives, different results

One thing we found interesting is that different disc drives have different behaviors at the analog level. These differences won't affect discs in good condition, but definitely lead to different behavior when trying to read faulty discs.

I captured the analog signal from various drives' test points, for a couple of faulty discs. First up, here's another disc with a dent, my Animated Numbers educational title. The manifestation of a dent generally seems to be a significant loss of signal amplitude. Let's have a look at the analog signal captured around the region of the dent:

Mitsubishi MF503 (full res: click here)

At the low amplitudes in the middle of the dent, my MF503 drive provides a noisy, spiky signal. After filtering the noise out in modern software, the signal still isn't clean and differentiating the 0 bits from the 1 bits isn't trivial.

Mitsubishi MF504C (full res: click here)

This slightly less ancient Mitsubishi drive offers a much cleaner analog signal, although quite weak at the worst of the dent: about 5mV peak-to-peak in places. The signal cleans up further with filtering in modern software. The peaks are far too faint for the drive electronics to resolve anything, but as a human, you can discern 4us peaks from 8us peaks. From a quick eyeball, the data appears to be intact.

TEAC FD-55FV (full res: click here)

The TEAC drive signal appears "thick" in the above image. That's the presence of some high frequency noise. It is probably there because some TEAC technical reference manuals show the analog signal test points as being wired before the low pass filter. Applying a low pass filter in software reveals a reasonable looking signal that is slightly stronger than the Mitsubishi drives.

TEC FB-502 (full res: click here)

This TEC drive does very well! The "worst" peaks are clear, clean and reasonably strong at 50mV peak-to-peak. No further software filtering is needed to get clearly resolvable peaks. Also, this drive has a generally stronger signal than the other drives (at 700mV peak-to-peak) for the undamaged sections of the disc.

Let's move on to another dodgy disc to test our ability to recover data at the analog level. This time, we're looking at "Old Macdonald's Farm" by EDIT, an unarchived disc I'm borrowing that is damaged in an identical manner to the "The Living Daylights" source code disc noted above. Here's the analog view of track 0 of that disc.

Track 0 of Old Macdonald's Farm.

On top we have a full-track view. It repeats about 75% of the screen width across. The first two sector bodies have a completely collapsed signal, denoted by a magenta flat-line. The strong blip in the middle of the two sector bodies is the sector header, which is fully intact. Only the bodies are toast. On the bottom we have a zoomed-in view to the point where the signal collapses. At first glance, it's a total signal collapse that looks like analog noise.

Let's see what the different drives make of the collapsed area, if we turn up the oscilloscope sensitivity to 10mV/div and zoom in. We'll specifically look at what the first byte of the sector data looks like on various different drives:

In each case, the oscilloscope is capturing at a resolution of 10mV/div, so these are weak signals. The TEC drive has the strongest signal at around 25mV peak-to-peak, with weaker peaks at 10mV peak-to-peak. The other drives are downhill from there. The TEC drive also produces the clearest, least noisy signal. It is the only drive with any reasonable hope of recovering bits from "Old Macdonald's Farm", because the signal gets more chaotic later into the ruined sectors.

For each signal sample, you'll see the bits of the first sector data byte annotated underneath in red (always the same, obviously, and it happens to be the ASCII character '1', the first character of the disc title which is "1187V1.0"). The BBC Micro discs we are dealing with here are single density discs, which use a very simple FM encoding. Inside a sector body, this encoding uses one "clock" bit to keep track of timing for every data bit. The clock bit is always 1, representing a flux transition and therefore a voltage peak every 8us. To store a "0" data bit, there will only be one peak in an 8us window; to store a "1" data bit there will be two peaks.
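The FM rule described above can be written down as a short Python sketch. It assumes bits are stored most significant bit first and that we are inside a sector body where every clock bit is present; the function names and the 6us decision threshold are illustrative choices, not taken from any existing tool.

```python
def fm_intervals(data: bytes):
    """Peak-to-peak intervals (in us) for FM-encoded bytes, MSB first (assumed)."""
    out = []
    for byte in data:
        for i in range(7, -1, -1):
            # "1": clock peak then data peak, 4us apart each. "0": clock peaks only, 8us apart.
            out.extend([4, 4] if (byte >> i) & 1 else [8])
    return out

def fm_decode(intervals_us, threshold_us=6.0):
    """Turn peak-to-peak intervals back into bytes. Assumes clean, complete cells."""
    bits = []
    i = 0
    while i < len(intervals_us):
        if intervals_us[i] >= threshold_us:
            bits.append(0)
            i += 1
        else:
            bits.append(1)
            i += 2  # a "1" consumes a pair of short intervals
    out = bytearray()
    for j in range(0, len(bits) - 7, 8):
        byte = 0
        for b in bits[j:j + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

print(fm_intervals(b"1"))  # first byte of the disc title
print(fm_decode(fm_intervals(b"1187V1.0")))
```

On a healthy signal this kind of interval classification is all that is needed. The damaged sectors are hard precisely because the measured intervals stop falling cleanly on either side of any threshold.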

The signals above are not only faint but also messed up. If you're having trouble discerning why the "1"s and "0"s are labeled the way they are, that's not surprising. Here's what the analog signal looks like when it's normal:

This is "00101100".

The signal is very clear. You'll notice that the peaks all have the same amplitude and shape as well as regular timing. This is from a floppy disc that is well over 30 years old. As long as floppy discs have been stored well, it's staggering how good the data integrity still is!

In terms of disc drives, one of my drives performs head and shoulders above the others: my TEC FB-502. I don't know why, and it's also one of my older drives. It could be that the disc read head is more sensitive, or able to hover closer to the disc surface. Or it could be the first stage amplifier has a far superior gain and signal-to-noise ratio. Or perhaps a mix of the two. The amplifier is a Hitachi HA16331P, which none of my other drives have. I look forward to finding another drive with this chip and comparing. It's also possible that the components (capacitors, perhaps?) have degraded on some of my drives and not the others. There are a lot of possible variables at play here.

The advice when recovering a signal that barely exists is to try a few drives and see which is best. My TEC FB-502 is clearly a bit of a beast! The "Repton 3" and "The Living Daylights" discs were captured by Phil on his Mitsubishi drive, which is different again from my pair, with different read heads and electronics, and which was giving good results.

As a fun bonus, Phil's drive also happened to have a variable resistor to control the rotation speed. This enabled us to crank the speed from the standard 300rpm to 400rpm. Why? Because of physics! The voltages induced in the drive read head are supposed to be proportional to the speed of magnetic flux change, and it did help a little with the signal strength.
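As a back-of-envelope sketch of what the speed-up does, assuming the ideal case where head voltage scales linearly with rotation speed (in practice it only "helped a little"), the peak spacing shrinks and the amplitude grows by the same factor. The 30mV starting amplitude below is a made-up example value.

```python
def scale_for_speed(nominal_rpm, actual_rpm, interval_us, amplitude_mv):
    """Ideal-case scaling of peak spacing and amplitude with rotation speed."""
    k = actual_rpm / nominal_rpm
    return interval_us / k, amplitude_mv * k

# 300rpm -> 400rpm, for the short and long FM peak spacings.
short_us, boosted_mv = scale_for_speed(300, 400, 4.0, 30.0)
long_us, _ = scale_for_speed(300, 400, 8.0, 30.0)
print(short_us, long_us, boosted_mv)
```

Any decoding software has to be told about the new timings: at 400rpm the nominal 4us and 8us spacings become 3us and 6us.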


A quick audio diversion

The astute observer will notice that we're using "Audacity" for viewing the analog disc waveforms -- which is a well-known audio analysis and editing tool. This may seem strange at first, but it's a perfect fit. It enables rapid zooming and exploration of the waveforms, as well as having versatile low-pass filters and the ability to draw directly onto the canvas to fix iffy peaks. It also has a facility to import CSV files, which is one of the formats you might get from an oscilloscope.

And since we're in an audio tool, we can have some fun!

Have you ever heard what a floppy disc recording sounds like? If not, check out this WAV: click me. It's slowed down so that it's in the range of human hearing. A disc revolution is normally 0.2s, as opposed to the 20s or so sample here. Some people say it evokes memories of their old modems connecting. This sample is actually the dented disc and perhaps my favorite part is the dent, where it sounds like a bit of a wub! wub! Could be a good bass sample for a tune :)
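For anyone wanting to try this, here is a minimal sketch of turning a list of captured voltages into a slowed-down WAV using only the Python standard library. The sample values, the 1Msample/s capture rate and the function name are assumptions for illustration; the only figure taken from the text is the roughly 100x slowdown (0.2s stretched to about 20s).

```python
import io
import struct
import wave

def samples_to_wav(samples, playback_rate_hz, out):
    """Write voltages as a mono 16-bit WAV, normalised to full scale."""
    peak = max(abs(s) for s in samples) or 1.0
    frames = b"".join(struct.pack("<h", int(32767 * s / peak)) for s in samples)
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)              # 16-bit samples
        w.setframerate(playback_rate_hz)
        w.writeframes(frames)

# Slowing down is just a matter of declaring a lower sample rate than the capture used.
CAPTURE_RATE_HZ = 1_000_000            # hypothetical (e.g. a decimated scope capture)
SLOWDOWN = 100                         # 0.2s revolution -> about 20s of audio
playback_rate = CAPTURE_RATE_HZ // SLOWDOWN

buf = io.BytesIO()                     # a real run would pass a filename instead
samples_to_wav([0.0, 0.5, -1.0, 0.25], playback_rate, buf)
```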


Peak recovery

Having wired up our oscilloscopes to our best drives for capturing analog data, it's time to interpret the resulting waveforms. Of course, we ideally want a program to turn the analog waveform into a series of sector data bytes.

One of my attempts simply filtered the data aggressively with a low-pass filter and then looked at the timings between the peaks. Surprisingly (at least to me), this worked terribly. Upon investigation, there were various contributing factors:

  • The signal is very degraded, so there's a baseline of jitter in where the peaks appear vs. where they should be.
  • Applying aggressive filtering that is sufficient to eliminate "false" peaks also shifts the peaks' positions, exaggerating the jitter to the point that some peak timing deltas become indeterminate between 8us and 4us.
  • In the weak signals, the amplitude of the 4us peaks has degraded more than the 8us peaks. Sometimes, the 4us peaks are almost completely gone and filtering doesn't help with the clarity.
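For reference, the filter-then-measure approach looks roughly like the Python sketch below. A centred moving average stands in for whatever low-pass filter is actually used, and the function names are invented for the example. It behaves well on the clean synthetic tone shown, which is exactly why its poor performance on degraded captures was a surprise.

```python
import math

def moving_average(x, width):
    """Centred moving average, a very simple low-pass filter."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def extrema(x):
    """Indices where the slope changes sign (positive and negative peaks alike)."""
    return [i for i in range(1, len(x) - 1)
            if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]

def peak_intervals_us(x, sample_period_us, width=5):
    """Low-pass filter, find peaks, and return the time between consecutive peaks."""
    peaks = extrema(moving_average(x, width))
    return [(b - a) * sample_period_us for a, b in zip(peaks, peaks[1:])]

# Clean synthetic tone: one peak every 4us, sampled every 0.25us.
tone = [math.cos(2 * math.pi * (i * 0.25) / 8.0) for i in range(200)]
tone_intervals = peak_intervals_us(tone, 0.25)
print(tone_intervals)
```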
The attempt that bore some fruit was to instead locate the start of the sector data, and continually look at the next 8us of the analog stream. If the voltage appears to generally drift in one direction, then it's a "0" bit, or if the voltage peaks one way and then the other, it's a "1" bit. After the decision, a re-sync is performed to the nearest peak. Visually, it looks like this:

A slide from a recent presentation.

As can be seen, the algorithm isn't fazed by false peaks. The 9th bit, a "0" bit, has a very prominent false peak. But the algorithm summarizes that the voltage trend in the 8us chunk is a big downward drift, which has to be a "0" bit.
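The idea can be sketched in Python as follows. This is a simplified illustration rather than the actual recovery code: it assumes the samples are a plain list, that the start index sits on a clock peak, that one 8us cell is a known number of samples, and it uses an invented "drift versus bounce" comparison as the decision rule. The `synth` helper only exists to build a clean test waveform.

```python
import math

def level(x, i, w=2):
    """Mean of a few samples around index i, to blunt narrow false peaks."""
    window = x[max(0, i - w): i + w + 1]
    return sum(window) / len(window)

def decode_cells(x, start, cell, n_bits):
    """Decode n_bits FM data bits, one 8us cell at a time, re-syncing after each."""
    search = cell // 8                      # how far to look for the re-sync peak
    polarity = 1 if x[start] >= 0 else -1   # sign of the peak we are sitting on
    pos, bits = start, []
    for _ in range(n_bits):
        v0 = level(x, pos)
        vm = level(x, pos + cell // 2)
        v1 = level(x, pos + cell)
        drift = abs(v1 - v0)                # large when the voltage heads one way: "0"
        bounce = abs(vm - (v0 + v1) / 2)    # large when it goes out and comes back: "1"
        bit = 1 if bounce > drift else 0
        bits.append(bit)
        if bit == 0:
            polarity = -polarity            # a "0" cell ends on a peak of opposite sign
        centre = pos + cell
        # Re-sync: jump to the strongest peak of the expected sign near the cell boundary.
        pos = max(range(centre - search, centre + search + 1),
                  key=lambda i: polarity * x[i])
    return bits

def synth(bits, cell=32, extra_cells=2):
    """Idealised waveform: alternating peaks joined by half-cosines (test data only)."""
    peaks = [0]
    for b in list(bits) + [0] * extra_cells:
        steps = [cell // 2, cell // 2] if b else [cell]
        for s in steps:
            peaks.append(peaks[-1] + s)
    x, sign = [], 1
    for a, b in zip(peaks, peaks[1:]):
        x.extend(sign * math.cos(math.pi * i / (b - a)) for i in range(b - a))
        sign = -sign
    x.append(sign)
    return x

pattern = [0, 0, 1, 0, 1, 1, 0, 0]
wave_clean = synth(pattern)
wave_spiky = list(wave_clean)
wave_spiky[16] += 0.8                       # inject a false peak mid-way through a "0" cell
print(decode_cells(wave_clean, 0, 32, 8))
print(decode_cells(wave_spiky, 0, 32, 8))
```

The decision looks at the overall trend across the cell rather than counting peaks, so a narrow spurious peak in the middle of a "0" cell does not flip the result. On real captures the thresholds would need tuning, since the 4us peaks are often much weaker than the 8us ones.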

Despite the chaotic nature of this waveform, the algorithm recovered all of the bytes in these "lost" sectors. To crib another slide from the recent presentation:

The recovered sector data from "The Living Daylights".

As noted in the slide, we compare the recovered sector data against the recovered on-disc CRC16. The sector data is 256 bytes followed by a 2 byte CRC16. We got a match.
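A check of this kind can be sketched in Python as below. The CRC parameters are an assumption based on the standard IBM-style floppy format (polynomial 0x1021, initial value 0xFFFF, with the data mark byte 0xFB included in the calculation and the CRC stored high byte first); the text above does not spell them out, and the helper names are illustrative.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, processing bits MSB first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def sector_ok(body: bytes, stored_crc: bytes, mark: int = 0xFB) -> bool:
    """Compare the CRC of (mark byte + sector body) with the 2 bytes read from disc."""
    return crc16_ccitt(bytes([mark]) + body) == int.from_bytes(stored_crc, "big")

# Round trip on a made-up 256 byte sector.
body = bytes(range(256))
good_crc = crc16_ccitt(b"\xfb" + body).to_bytes(2, "big")
print(sector_ok(body, good_crc))
```

A match across 256 recovered bytes is strong evidence that every bit was read correctly, which is what makes the CRC so valuable when the bits themselves come from a barely legible waveform.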

In case you were wondering, one thing we glossed over is "locate the start of the sector data". The start of a sector body (or a sector header) contains a pulse sequence that cannot occur in a well formed sector body or header. Essentially, it contains missing clock bits.
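To make "missing clock bits" concrete, here is a small Python sketch. It assumes the standard IBM-style FM marks, where a data mark is the byte 0xFB and a sector ID mark is the byte 0xFE, both written with the clock pattern 0xC7 instead of the usual 0xFF; these values are not given in the text above, and the helper name is invented.

```python
def interleave(clock: int, data: int) -> int:
    """Interleave 8 clock bits and 8 data bits into a 16-bit FM cell pattern."""
    out = 0
    for i in range(7, -1, -1):
        out = (out << 2) | (((clock >> i) & 1) << 1) | ((data >> i) & 1)
    return out

DATA_MARK = interleave(0xC7, 0xFB)   # start of a sector body (assumed standard value)
ID_MARK = interleave(0xC7, 0xFE)     # start of a sector header (assumed standard value)
CLOCK_MASK = 0xAAAA                  # positions of the clock bits in the 16-bit pattern

print(hex(DATA_MARK), hex(ID_MARK))

# Ordinary bytes always have every clock bit set, so they can never look like a mark.
ordinary = [interleave(0xFF, d) for d in range(256)]
```

Scanning the decoded pulse stream for one of these 16-bit patterns gives the position of the first data byte, which is where the bit-by-bit recovery starts.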

As a recap, recovering this data was looking hopeless not that long ago. The Greaseweazle returned mostly noise, but now we've got all the bytes off successfully! This is incredibly encouraging: it suggests that many discs that appear "lost" at first glance are in fact recoverable. Faint traces of data often remain and these can be scooped up with an analog read.

The dented "Repton 3" source code disc was a bit more of a headache because of how incredibly weak the signal becomes in the region of the dent, as seen by Phil's drive. The peak detection algorithm wasn't able to resolve it, although in retrospect, it may have done if we had further filtered the signal. In the end, we resorted to what seems to work best for the worst signals: human intervention! By drawing in repaired peaks on the most degraded parts, the peak detection algorithm was able to resolve the dented region:

Original signal above, hand-drawn fixes applied below.

In the end, with a combination of the above techniques and tools, we were able to recover 100% of the file data from all of the source code discs.

One final note on peak recovery: I know very little about the field of signal processing, so I can't help but suspect there's a "proper" way to recover faint digital signals from analog captures. Advice most welcome.

[This section wouldn't be complete without a link to this very interesting FloppyControlApp Waveform editor. I haven't tried it as we only just found it as we were finishing this write-up. I wish I had found it sooner! That said, the pulses I've been fixing have often degraded in different ways to the notes and pictures in the video. Perhaps it's a difference due to the different physical spacings or materials on 5.25" vs. 3.5".]


Caring for discs

No discussion of recovering data from old floppy discs would be complete without some notes on how to best handle the discs. So far, we've looked at hardware, tools and techniques to read data from discs but haven't considered the condition of the discs themselves, or risks to the discs.

To be clear why it matters: spinning some random disc in some random drive can damage the disc, perhaps even irreparably! Possible problems with the discs include:

  • The physical disc surface may have degraded to the point where it is either brittle, or goopy.
  • The disc sleeve lining may have become brittle and scratchy.
  • The disc surface may have become covered in mold.
  • The disc sleeve lining and/or disc may be dented.

And possible problems with drives include:

  • The drive read heads are dirty or scratched.
  • Internal mechanical components bent or out of place.
  • Excessive friction placed on disc surface.

If dealing with rare or unique discs, a minimum level of care is needed:

  • Use a well-tested drive that doesn't mark discs.
  • Stop and ask for advice if the disc or disc surface is visibly damaged, moldy or dirty. There's a good discussion on cleaning discs (albeit 3.5") in this video.
  • Inspect and clean the drive heads before inserting each different disc.
  • (Strongly recommended) - use a "sealed / bubble head" drive, such as the FB-502, which places less friction on the disc.

A more thorough document on the preservation of BBC Micro discs may be found here: click me.

And a cautionary note on what can go wrong when trying to read old discs:

Somewhat horrifyingly, part of the disc surface is now transparent. At some point in the past, the information carrying oxide particles were stripped off the plastic disc. Perhaps this could have been avoided by using a cleaner or more gentle disc drive.


What on earth happened to that The Living Daylights disc?

We've now encountered two discs that are corrupted in exactly the same and very curious way: one of "The Living Daylights" source code discs, and the "Old Macdonald's Farm" educational title. In both cases, the re-write of sector bodies has left them totally trashed.

Our initial theory was that the writing drive could have become misaligned. This is a common ailment and it's where the drive read/write head gets knocked slightly off where it should be. One effect would be that every write would be off-center of the track, possibly leading to read difficulties in a correctly aligned drive. This theory was eliminated by taking the "Old Macdonald's Farm" disc and manually misaligning a drive in both directions to look for a better data signal; none was obtained. Another flaw with the alignment theory is that a misaligned drive might have trouble reading the sector headers, which would prevent the write from occurring at all.

Our current theory, and one that is a reasonable match for the evidence, is that the drive, cabling or floppy disc controller chip failed in a way that prevented write pulses taking effect. This means that any write would energize the drive head but never reverse magnetic polarity. In effect, this runs a constant current through the write head and is a type of erase across the disc sector. So, how thoroughly does this erase remove traces of magnetic flux reversals? I tested this by performing this type of erase on top of an FM bit pattern of "1111000011110000....":

The erased bits have left a trace behind.

Normal signal strength for this disc and drive is 650mV peak-to-peak. The signal strength of the erased bits is about 50mV peak-to-peak for "0" bits, but the higher frequency "1" bits are down to about 20mV peak-to-peak, and are hard to discern. Also, a much higher frequency oscillation has crept in, possibly around the frequency of the low-pass filter's cutoff?

Obviously, the result here is going to vary wildly depending upon drive electronics, drive head, and disc material. But the point is made: erasing over floppy disc data leaves a signal behind on some drives.

We think that the sector bodies were erased by a write fault 30+ years ago, leaving a faint signal that then further decayed and became chaotic over the decades. But the data is still there.


Closing remarks

We're very pleased to have recovered some old BBC Micro source code and games discs that were not recoverable by standard means. In the case of the recovered source code, it is now with the original author of the games, who is checking over the materials and deciding whether to publicly release them.

In the case of "Old Macdonald's Farm", we pieced a working disc back together from a combination of bit recovery plus some cross-referencing with a related title that uses the same game engine and a similar disc layout. The results were posted here: click me. It's hard to overstate how faint and corrupted the "Old Macdonald's Farm" disc signal is. It's much worse than the "The Living Daylights" source code disc. We didn't previously look at what Greaseweazle thinks of it, so here it is:

The signal on the trashed sectors is just desolation as far as the Greaseweazle sees. It's a miracle that the bits are still there and recoverable.


We wish to warmly extend the general offer: for unique discs, rare discs, or discs of historical interest, we're happy to receive them and do our best to recover them using the care and procedures outlined above. The offer extends beyond BBC discs to any 5.25" discs: Apple, Commodore, PC, etc.

We don't have any tooling to release at this time, because none of it is close to production quality. This is an area of ongoing effort as and when we encounter new discs to recover. It's also non-trivial to put together an overall packaging of this work, because disc drives and oscilloscopes are all different. Reach out if you'd like to wade in to this area and don't mind dealing with messiness.

There's plenty of opportunity to further research and improve the hardware and software story here:

  • On the hardware side, the general theme is to replace old 1980s tech with modern tech, i.e. the filtering and peak detection has been moved to modern software. Continuing along this theme, it'd be interesting to replace the 1980s amplifier with a high quality modern amplifier with excellent gain and signal to noise ratios. At that point, we could directly compare the quality of different drive read heads.
  • On the software side, as previously covered, I'm pretty convinced my peak recovery efforts could be a lot better!

Thanks for reading!

Chris Evans and Phil Pemberton...

... and Mr. Macdonald



22 comments:

Anonymous said...

Thanks for doing cool work!

Tim West said...

Very interesting! :-)

Anonymous said...

Have you heard about Kasettilamerit aka tape lamers?
https://kasettilamerit.fi/in-english/

We archive any kinds of old media regardless of the platform. Over the years we have archived data from tapes and floppies from such platforms as Commodore 64, VIC-20, Amiga, Atari ST, IBM Compatibles, CP/M, Spectrum, Spectravideo, Apple II and Amstrad CPC. It doesn’t matter whether the media is professionally duplicated or written at home, we can create an archival quality image with the copy protection intact.

We have helped the Software Preservation team archive thousands of original
titles, not to mention all the homebrew Commodore 64 turbo tapes, which were the original motivation for founding our group.

The images we create are delivered back to the persons who loaned the media, and if so agreed, to other archival projects / groups. We negotiate the terms of disclosure with each media owner individually and always respect the privacy and confidentiality of the material.

Tetracorp said...

I've heard a rumour that Graeme Ing, formerly of Gremlin Graphics, may still have the floppy disks with the source code of my favourite Amiga game, K240. I'd be really excited if someone managed to get hold of those and archive them (and potentially other data like spritesheets, development tools, prototype builds, design documents, and other Amiga games where such data might exist). There's a current trend of releasing in-house data for other 90s platforms and it'd be really interesting to see such data for surviving Amiga artifacts.

Anonymous said...

How safe is it to send floppy disks through international mail? I recall warnings in the 90s about x-rays from security scanners potentially damaging magnetic media.

IzzyIsles said...

Thanks

Chris J Evans said...

Amazing work!
From the other Chris Evans (of CJE/4D)

Marc Ruef said...

Great work, thanks for sharing!

Bill Froog said...

Amazing work and fascinating write-up. Many thanks. Will check on my old stash of BBC floppies when I get a chance!

chuzzlewit said...

Well done. Preserving them is a real service to computing history! Very interesting to read too.

ASDBigmac said...

Brilliant!

Chris Evans said...

"How safe is it to send floppy disks through international mail? I recall warnings in the 90s about x-rays from security scanners potentially damaging magnetic media."

I believe it's safe. I've had hundreds of discs shipped from the UK to the US (including rare / unique ones) and they've all arrived fine.

Seth said...

I believe the FM used for this data consists of just a few harmonics of the fundamental frequency. That could mean that a comb filter or FIR filter would allow just those frequencies through and block the vast majority of the noise.

Could you make some of your raw data available? I can attempt to filter it that way.

Dave Oldcorn said...

Great work.

If the problem is a weak signal / low S/N ratio, did you consider doing many samples of the disk surface and combining them?

There's good synchronisation in the data either side of the problem area, which should enable things to be mapped up pretty well from pass to pass. Any random noise (e.g. from the drive mechanism, amplifiers etc.) will cancel out on the multiple passes, although noise on the actual disk surface itself will be seen as signal and amplified accordingly.

Combining multiple (resynchronised) captures from different drive mechanisms might be of some interest too.

Would also be interesting to see the statistics on the noise distribution (how much is on the media vs. how much is on the drive). That probably changes based on the disk itself...?

Unknown said...

Great project! I recognize a lot of considerations and thoughts that went through my mind when working on FloppyControl project. Some of the disks had the magnetic material come loose from the surface, which not only made the signal weak, there were no magnetic particles left to carry the signal.
I've only managed to recover some data by using the oscilloscope, most of it I could recover using the normal (digital) read signal and processing it in different ways.
There are a few algorithms in FloppyControl to filter and extract sector data, have a look at the source code on github, if you're interested. I've also experimented with error correction by comparing the crc checksum with likely candidates of bit patterns. I've found that up to 4-6 flux reversals could be recovered. Any more and the false positives went up a lot.
The differences in level could be due to the level of the signal being available from the pre amp, at different stages.
I was considering building a board with 3 ADCs running at 6MHz each, producing about 18MB/s of data which would be enough to capture the differential flux signal and the digital read data at the same time. I didn't go through with it as the returns were minimal at best for my purposes.
Floppy controllers are pretty amazing as far as handling low level signals, at least for 3.5" drives. There were not many cases I could read the flux reversals while the drive couldn't. Either the signal was gone or the signal was just strong enough.
I tried using some processing to adaptively boost the weaker signals. It did help in some cases, made it worse in others.
In the end, when you've got data spread across multiple disks, with duplicates etc, it's often more time efficient to piece them together than to hunt for the data in weak signals. Still, if it's really the only copy, that's all you can do.

One avenue of possibility I haven't walked is to use AI and learning networks to do the detective work for you. It should be possible to train a network to recognize the flux reversals and make them guess what a weaker signal should be.

As for error correction, I've found that MS-DOS sectors have a start and end pattern that you can use to re-sync the flux reversals after a glitch. That way, if a sector only has a few reversals that are wrong, it's possible to get the data beyond the glitch too. Controllers often give up because there's a CRC error and you don't even get a part of the data. I can imagine such a strategy could be really useful for source code. Missing 200 bytes can be a big deal compared to missing just a few bytes. On the Amiga disks this didn't work as there are no padding/sync bytes between the sectors.
On rare occasions the MS-DOS header was damaged but the sector data was still intact, which could then be recovered by looking at the sector number of a previous or next sector.
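
The CRC-guided error correction described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not FloppyControl's actual code: it works on decoded data bits rather than flux reversals, and the names `crc16_ccitt`, `repair_with_crc` and the `suspect_bits` list (bit positions judged unreliable, for example because the signal was weak there) are hypothetical. It assumes the CRC-16/CCITT parameters (polynomial 0x1021, initial value 0xFFFF) used by IBM-style FM/MFM sector formats, and that the caller passes exactly the bytes that the format's CRC covers.

```python
from itertools import combinations


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT: polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def repair_with_crc(data: bytes, stored_crc: int, suspect_bits, max_flips: int = 2):
    """Flip up to max_flips of the suspect bits and keep every candidate
    whose CRC equals stored_crc.

    Bit 0 is the most significant bit of data[0] (hypothetical convention).
    All matches are returned, because more than one candidate can satisfy
    a 16-bit CRC once enough bits are allowed to change.
    """
    suspect_bits = list(suspect_bits)
    matches = []
    for n in range(max_flips + 1):
        for combo in combinations(suspect_bits, n):
            candidate = bytearray(data)
            for bit in combo:
                candidate[bit // 8] ^= 0x80 >> (bit % 8)
            if crc16_ccitt(bytes(candidate)) == stored_crc:
                matches.append(bytes(candidate))
    return matches
```

Returning every match rather than the first one makes the ambiguity visible: the number of candidates grows quickly with `max_flips` and with the size of `suspect_bits`, which fits the observation above that false positives rise sharply beyond a handful of corrected reversals.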

It was a lot of fun working on the data, collecting, building filters and tools to get the most from the captures.

Anonymous said...

Very interesting article, thank you

Anonymous said...

Could the unreadable sectors be intentional for anti-copying?

Julien Oster said...

Hello Chris,

Outstanding work, reading your articles is always a joy.

I have a question about the filtering. You state:

Applying aggressive filtering that is sufficient to eliminate "false" peaks also shifts the peaks' positions, exaggerating the jitter to the point that some peak timing deltas become indeterminate between 8us and 4us.

Was this even with a linear phase filter? I am trying to determine if this is because "false peaks" are essentially contributing to the formation of a "new" peak with real peaks, or whether it was simply due to an unsuited low pass filter shifting some frequencies non-linearly.

Chris Evans said...

Hi Julien,

I've re-checked while explicitly using a linear phase filter. It looks a bit better (easier to categorize peak distances in software, and less human intervention). However, it's apparent that many of the peaks in the original capture are already significantly shifted from their ideal positions.

Thanks for mentioning linear phase filters. Definitely an improvement.
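
For readers unfamiliar with the term, one common way to get a linear phase low-pass filter is a FIR kernel with symmetric taps: every frequency is delayed by the same number of samples, so a peak is smoothed without being moved relative to its neighbours. The sketch below is a plain Python illustration only, not the filter used for the captures in this article. The function names, the Hamming-windowed sinc design, the tap count and the cutoff are all assumptions chosen for the example.

```python
import math


def lowpass_taps(num_taps: int, cutoff: float):
    """Hamming-windowed sinc low-pass kernel.

    cutoff is a fraction of the sample rate (0 < cutoff < 0.5).
    The taps are symmetric about the centre, which gives linear phase.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def fir_filter(samples, taps):
    """Apply the kernel centred on each sample, so the constant delay of
    (len(taps) - 1) / 2 samples is already compensated in the output."""
    half = len(taps) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # treat samples outside the capture as zero
                acc += samples[j] * t
        out.append(acc)
    return out
```

Filtering a clean, symmetric pulse with these taps leaves its peak at the same sample index. That matches the reply above: a linear phase filter avoids adding peak shift of its own, but it cannot undo shifts that are already present in the captured waveform.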

Anonymous said...

Amazing article.

About 30-35 years ago I worked in a company repairing and maintaining floppy drives: 8" (the amazing Persci 299 and 277), then 5 1/4", and just into the 3" and 3 1/2" drives, but those were too cheap to do anything to. It was interesting watching the move from high quality aluminium castings with precision machining to the pressed steel frames used by the later 5 1/4" drives.

I don't remember much detail but some thoughts/memories have been triggered by your floppy articles:

* The scopes we had back then were not sensitive enough to show any signal direct from the heads of disks or tape drives - the head amps have a LOT of gain! But I like your idea for replacing one with a more modern one, hopefully quieter but variable gain for your data recovery efforts.

* You mention the peak shifting of 0 and 1 bits - this also was an effect of the head/media physics - some drives offered pre-compensation - writing the bits early/late so that they read back in the right place. (I assume later controllers got better data recovery designs so it was no longer needed)

* I'm astonished that you had a drive write to a read only disk in the 8271 - in the drives that we worked on the write current to the heads was gated with the write protect at the lowest level. Obviously something odd about that drive!
It was a saviour for many bench techs who accidentally tried a write test on a (VERY Expensive) Cat's Eye alignment disk.
I think the Persci drives even only enabled write (and tunnel erase through a delay) when the head position servo was in track hold mode so avoiding splatting over multiple tracks if something went wrong.

* Yes, the 8271 was a very difficult chip to get hold of at the time; even Acorn had trouble getting supplies. One company I worked for, which had managed to buy a few tens of them, even asked Acorn to sell us upgrade kits without the 8271 that they couldn't get. They wouldn't.

* You mention voltage level issues when driving an FDD from the BBC User Port and needing to remove the resistor pack. The FDD (named SA400 after the first Shugart drives) interface is designed to have one, and only one drive with the pull up (to 5V and I can't remember the resistor value) pack (at the far end of the cable) and the interface used open collector drivers - typically 7406/7407 TTL. These may still be easily available and will get you more reliable operation.

I could ramble for a while longer - especially on the 'fun' of aligning heads - once assembled they generally only need radial alignment which is easy, but there's a lot of work in getting the two heads in the assembly properly aligned if you have to change the floating one.

Thanks for all the fascinating Blog posts!
Ian

Anonymous said...

Question from a software-only developer, so no deep hardware/electronic background

Why is there no remake of a floppy read head, or of a complete floppy drive, only new FPGA controllers? Wouldn't it be possible to get much better input signals that way? Or is a read head too complex to re-create, or so simple that it can't be technically improved?


Anonymous said...

While you're dipping into the analog domain, how about microstepping the seek motor to hedge a little to either side of the track? 256x microstep drivers are cheap now, and multiple passes at multiple offsets might yield interesting data.

I'm picturing a follow-on to the GW that has all the analog amplifier and high-rate ADCs previously discussed, plus a microstepping motor driver, and simply drops in place of the PCBA on a couple common floppy mechs.

Actually this sounds like it might be fun to design...