Author Topic: MFM decode (Read 3185 times)

orange · « **on:** December 17, 2017, 05:36:04 PM »

does 'rawread' dump MFM encoded data? if so, how to decode it with given sync, track size, etc..? (not a standard 880K diskette)

guest11527 · « **Reply #1 on:** December 17, 2017, 05:56:15 PM »

Quote from: orange;834180

does 'rawread' dump MFM encoded data?

Yes.

Quote from: orange;834180

if so, how to decode it with given sync, track size, etc..? (not a standard 880K diskette)

That, of course, depends on the format. MFM is a 1:2 encoding: every data bit is encoded as two bits. In particular, the filler bit between two data bits is 1 if and only if both data bits are zero. Otherwise, the filler bit is 0. How the data is laid out, how the data bits are spread, and what the track and sector headers look like are entirely decisions of the format.

The format of the trackdisk.device is described in the RKRM Hardware; you will find all the information there. In this format, the payload data of a sector is separated into two 256-byte groups (the odd bits and the even bits), which is a rather atypical layout, but it allows fast decoding with the blitter. In most other formats, each filler bit is simply interleaved with the data bit that follows it.
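
To make the filler-bit rule concrete, here is a minimal sketch of the plain interleaved variant (clock, data, clock, data, ...). The function name `mfm_encode_byte` and the `prev_bit` parameter are made up for illustration; this is not code from trackdisk.device.

```c
#include <stdint.h>

/* Encode one data byte as 16 interleaved MFM bits (clock, data, clock, ...).
 * prev_bit is the last data bit that preceded this byte in the bit stream.
 * Hypothetical helper, for illustration only. */
uint16_t mfm_encode_byte(uint8_t data, unsigned prev_bit)
{
    uint16_t out = 0;
    int i;

    for (i = 7; i >= 0; i--) {
        unsigned bit = (data >> i) & 1u;
        /* The filler (clock) bit is 1 only between two zero data bits. */
        unsigned clock = (prev_bit == 0 && bit == 0) ? 1u : 0u;

        out = (uint16_t)((out << 2) | (clock << 1) | bit);
        prev_bit = bit;
    }
    return out;
}
```

With a preceding 0 bit, `mfm_encode_byte(0x00, 0)` gives 0xAAAA and `mfm_encode_byte(0xA1, 0)` gives 0x44A9. The 0x4489 sync word is that second value with one clock bit deliberately left out, which is why it can never appear in regularly encoded data.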

orange · « **Reply #2 on:** December 17, 2017, 07:00:51 PM »

thanks Thomas.

olsen · « **Reply #3 on:** December 18, 2017, 12:31:07 PM »

Quote from: orange;834180

does 'rawread' dump MFM encoded data? if so, how to decode it with given sync, track size, etc..? (not a standard 880K diskette)

Documented MFM decoding program code is pretty scarce (I've seen my share, and I still can't believe that the people who wrote it trusted their own code!). You might want to dip into TrackSalve, which is somewhat old, though. "TrackSalve" is a patch for trackdisk.device by Dirk Reisig, last updated in 1990, fixing Kickstart 1.x-specific bugs which by that time had already been fixed for Kickstart 2.0. "TrackSalve" covers just about everything. Bonus: it uses the blitter for encoding/decoding.

olsen · « **Reply #4 on:** December 18, 2017, 01:43:23 PM »

Quote from: Thomas Richter;834181

Yes.

That, of course, depends on the format. MFM is a 1:2 encoding: every data bit is encoded as two bits. In particular, the filler bit between two data bits is 1 if and only if both data bits are zero. Otherwise, the filler bit is 0. How the data is laid out, how the data bits are spread, and what the track and sector headers look like are entirely decisions of the format.

The format of the trackdisk.device is described in the RKRM Hardware; you will find all the information there. In this format, the payload data of a sector is separated into two 256-byte groups (the odd bits and the even bits), which is a rather atypical layout, but it allows fast decoding with the blitter. In most other formats, each filler bit is simply interleaved with the data bit that follows it.

I just double-checked: the disk format documentation ended up in "Appendix C" of the 3rd edition "Devices" ROM Kernel Reference Manual. It seems that it was originally part of the 1st edition "Libraries & Devices" ROM Kernel Reference Manual, in "Appendix L", but you can't find that version online.

Anyway, here's what I found, from way back (1985):

Code:

COMMODORE-AMIGA DISK FORMAT

The following are details about how the bits on the Commodore-Amiga disk
are actually written.

Gross Data Organization:

	3 1/2 inch disk
	double-sided
	80 cylinders/160 tracks


Per-track Organization:

	Nulls written as a gap, then 11 sectors of data.
	No gaps written between sectors.


Per-sector Organization:

	All data is MFM encoded.  This is the pre-encoded contents
	of each sector:

		two bytes of 00 data    (MFM = AAAA each)
		two bytes of A1*	( "standard sync byte" -- MFM
					encoded A1 without a clock pulse )
					(MFM = 4489 each)
		one byte  of format-byte
					(Amiga 1.0 format = FF)
		one byte  of track number
		one byte  of sector number
		one byte  of sectors until end of write (NOTE 1)

			[above 4 bytes treated as one longword
			  for purposes of MFM encoding]
	
		16  bytes of OS recovery info (NOTE 2)
			[treated as a block of 16 bytes for encoding]
		four bytes of header checksum
			[treated as a longword for encoding]
		four bytes of data-area checksum
			[treated as a longword for encoding]
		512 bytes of data
			[treated as a block of 512 bytes for encoding]

NOTES:

   NOTE	1.  
	    The track number and sector number are constant for each
	    particular sector.  However, the sector offset byte changes
	    each time we rewrite the track.

	    The Amiga does a full track read starting at a random
	    position on the track and going for slightly more
	    than a full track read to assure that all data gets into the
	    buffer.  The data buffer is examined to determine where the
	    first sector of data begins as compared to the start of the 
	    buffer.  The track data is block moved to the beginning of
	    the buffer so as to align some sector with the first location
	    in the buffer.

	    Because we start reading at a random spot, the read data may
	    be divided into three chunks: a series of sectors, the track
	    gap, and another series of sectors.  The sector offset
	    value tells the disk software how many more
	    sectors remain before the gap.  From this the software can
	    figure out the buffer memory location of the last byte
	    of legal data in the buffer.  It can then search past the gap
	    for the next sync byte and, having found it, can block move
	    the rest of the disk data so that all 11 sectors of data are
	    contiguous.    

	    Example:

		first-ever write of the track from a buffer like this:

		<GAP> |sector0|sector1|sector2|.....|sector10|      

		sector offset values:

			 11     10	  9	....    1

		   (if I find this one at the start of my read buffer,
		     then I know there are this many more sectors
		     with no intervening gaps before I hit a gap).

	
		sample read of this track:

		<junk>|sector9|sector10|<gap>|sector0|...|sector8|<junk>

		value of 'sectors till end of write':

		         2	  1	....    11           3

		result of track realigning:

		<GAP>|sector9|sector10|sector0|...|sector8|

		new sectors till end of write:

			11      10        9    ...    1

		so that when the track is rewritten, the sector offsets
		are adjusted to match the way the data was written.


	NOTE 2.	This is operating systems dependent data and relates
		to how AmigaDos assigns sectors to files. 

		Reserved for future use.



	GENERAL:

		When data is MFM encoded, the encoding is performed on
		the basis of a data block-size.  In the sector encoding
		described above, there are bytes individually encoded;
		three segments of 4 bytes of data each, treated as
		longwords; one segment of 16 bytes treated as a block; two
		segments of longwords for the header and data checksums;
		and the data area of 512 bytes treated as a block.

		When the data is encoded, the odd bits are encoded first,
		then the even bits of the block.  

		(Make a block of bytes formed from all odd bits of the block,
	 	 encode as MFM.
		 
		 Make a block of bytes formed from all even bits of the block,
	 	 encode as MFM.   Even bits are shifted left one bit position
		 before being encoded.)



SOURCE CODE FOR DATA ENCODE/DECODE

decodeBlock( mfmbuffer, userbuffer, numwords )
WORD *mfmbuffer;	/* the encoded data */
WORD *userbuffer;	/* where to put the decoded data */
int numwords;		/* the number of WORDS of data (not bytes) */
{
    WORD *oddptr, *evenptr, oddbits, evenbits;

    oddptr = mfmbuffer;

    /* the even region starts right after the odd one */
    evenptr = &mfmbuffer[numwords];

    while( numwords-- > 0 ) {
	/* mask off the mfm clock bits, and shift the word */
	oddbits = ((*oddptr++ << 1) & 0xAAAA);

	/* even bits are already in the right place.  Just mask off clock */
	evenbits = ((*evenptr++) & 0x5555);

	/* recombine the two sections */
	*userbuffer++ = oddbits | evenbits;
    }
}

encodeBlock( mfmbuffer, userbuffer, numwords )
WORD *mfmbuffer;	/* where to put the encoded data */
WORD *userbuffer;	/* the user data, before encoding */
int numwords;		/* the number of WORDS of data (not bytes) */
{
    WORD *oddptr, *evenptr;
    WORD *ubuf;
    int i;

    oddptr = mfmbuffer;

    /* the even region starts right after the odd one */
    evenptr = &mfmbuffer[numwords];

    /* mfmencode takes one word of mfm data and correctly sets
     * the clock bits
     */

    /* encode the odd bits */
    for( ubuf = userbuffer, i = numwords; i > 0; i-- ) {
	*oddptr++ = mfmencode( (*ubuf++ >> 1) & 0x5555 );
    }

    /* encode the even bits */
    for( ubuf = userbuffer, i = numwords; i > 0; i-- ) {
	*evenptr++ = mfmencode( *ubuf++ & 0x5555 );
    }
    }
}
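
The listing calls mfmencode() without showing it. A minimal sketch of what such a helper might look like follows. The name `mfmencode_sketch` and the extra `prev_bit` parameter (the last data bit of the previously written word) are assumptions made for illustration; the original documentation does not say how the real routine handles the word boundary.

```c
#include <stdint.h>

/* Fill in the MFM clock bits of one 16-bit word.
 * data_bits: payload bits already sitting in the 0x5555 positions.
 * prev_bit:  last data bit of the word written just before this one.
 * Hypothetical helper, not taken from the 1985 listing. */
uint16_t mfmencode_sketch(uint16_t data_bits, unsigned prev_bit)
{
    uint32_t d = data_bits & 0x5555u;
    /* Each clock position sees its left and right data neighbours. */
    uint32_t neighbours = (d << 1) | (d >> 1) |
                          ((uint32_t)(prev_bit & 1u) << 15);
    /* A clock bit is set only where both neighbours are zero. */
    uint32_t clock = ~neighbours & 0xAAAAu;

    return (uint16_t)(d | clock);
}
```

An all-zero word comes out as 0xAAAA after a 0 bit and as 0x2AAA after a 1 bit, which is the boundary ambiguity that comes up when comparing encoded data by eye.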

Documentation on how the sector header and data area checksums are calculated remains elusive, I'm afraid.

guest11527 · « **Reply #5 on:** December 18, 2017, 02:13:07 PM »

Quote from: olsen;834189

Documentation on how the sector header and data area checksums are calculated remains elusive, I'm afraid.

For German readers, there is the Data Becker "Floppybuch" for the Amiga, which contains this information. I'm generally pretty careful with secondary sources, especially Data Becker (you find a lot of nonsense in these books), but this one is pretty complete (though it also contains nonsense you had better filter out).

olsen · « **Reply #6 on:** December 18, 2017, 03:04:28 PM »

Quote from: Thomas Richter;834190

For German readers, there is the Data Becker "Floppybuch" for the Amiga, which contains this information. I'm generally pretty careful with secondary sources, especially Data Becker (you find a lot of nonsense in these books), but this one is pretty complete (though it also contains nonsense you had better filter out).

According to the "TrackSalve" source code, the checksums are calculated over the MFM-encoded header data and the MFM-encoded sector data, respectively.

I think that the checksum algorithm works as follows:

Code:

ULONG
checksum(const ULONG * encoded_words,int num_words)
{
	const ULONG mask = 0x55555555;
	ULONG sum;
	
	sum = 0;

	while(num_words-- > 0)
		sum ^= (*encoded_words++);

	sum = ((sum >> 1) & mask) ^ (sum & mask);

	return(sum);
}

The XOR operation is quite handy here, I suppose, since it works regardless of whether the MFM fill bits are present or not. This is not the case for the IBM PC floppy disk format, which uses CRC values.
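
One way to read the final fold step in the function above: folding the odd bits onto the even bits of a sum over decoded longwords gives the same value as masking a sum over the MFM longwords with 0x55555555, because every MFM longword carries its data bits in exactly those positions. Which of the two inputs the real trackdisk.device code works on is not settled here; the sketch below only shows that the two forms agree. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MFM_DATA_MASK 0x55555555u

/* XOR all MFM longwords, then keep only the data-bit positions. */
uint32_t checksum_encoded(const uint32_t *mfm, size_t num_longs)
{
    uint32_t sum = 0;

    while (num_longs-- > 0)
        sum ^= *mfm++;
    return sum & MFM_DATA_MASK;
}

/* XOR all decoded longwords, then fold the odd bits onto the even bits. */
uint32_t checksum_decoded(const uint32_t *data, size_t num_longs)
{
    uint32_t sum = 0;

    while (num_longs-- > 0)
        sum ^= *data++;
    return ((sum >> 1) & MFM_DATA_MASK) ^ (sum & MFM_DATA_MASK);
}
```

For a block of n decoded longwords, the encoded form has 2n longwords (odd half, then even half), and the clock bits drop out of `checksum_encoded` through the mask, so their values do not matter.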

It might be worth looking up the old Amiga 68k NetBSD/Linux kernel floppy driver code for reference.

olsen · « **Reply #7 on:** December 18, 2017, 04:03:24 PM »

Quote from: orange;834180

does 'rawread' dump MFM encoded data? if so, how to decode it with given sync, track size, etc..? (not a standard 880K diskette)

If I understand this correctly, you can tell trackdisk.device in TD_RAWREAD mode to start reading as soon as it finds the sync pattern of your choice. This should save you the trouble of finding the beginning of the sector, which can be shifted by 1..15 bits.

You do need to know the sector size that is going to be used, though. In the standard Amiga format you'll encounter 32 bytes of header data in addition to the 512 bytes of sector data, including the sync pattern (four bytes total) which introduces the header. In this format you'll need (32+512) * 2 = 1088 bytes worth of memory to read the MFM-encoded data.
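
The byte counts above can be checked against the 1985 layout with a small sketch. The struct and constant names are invented for illustration, and the struct describes the decoded (pre-MFM) contents only; the odd/even split of the encoded form is not modelled here.

```c
#include <stdint.h>

/* Decoded (pre-MFM) layout of one standard sector, following the 1985
 * description quoted earlier in the thread. Field names are hypothetical. */
struct amiga_sector {
    uint8_t preamble[2];        /* two bytes of 00 (MFM AAAA each) */
    uint8_t sync[2];            /* two A1 sync bytes (MFM 4489 each) */
    uint8_t format;             /* FF for the Amiga 1.0 format */
    uint8_t track;
    uint8_t sector;
    uint8_t sectors_to_gap;     /* "sectors until end of write" */
    uint8_t os_recovery[16];
    uint8_t header_checksum[4];
    uint8_t data_checksum[4];
    uint8_t data[512];
};

enum {
    SECTOR_DECODED_BYTES = sizeof(struct amiga_sector), /* 32 + 512 */
    SECTOR_MFM_BYTES     = 2 * SECTOR_DECODED_BYTES,    /* 1:2 encoding */
    TRACK_MFM_BYTES      = 11 * SECTOR_MFM_BYTES        /* track gap excluded */
};
```

Since every member is a byte array or a single byte, the compiler has no reason to insert padding, so the struct size should match the 544 bytes (1088 bytes once MFM-encoded) worked out above.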

orange · « **Reply #8 on:** December 18, 2017, 08:42:07 PM »

what does ' length 12656/4' mean in output?
how long is the header, what is its format?
thanks.

edit: is it like this:

OFFSET Count TYPE Description
0000h 8 byte 'UAE-1ADF'
0008h 4 byte trackcount
000Ch 4 byte 0=amigados 1=raw mfm
0010h 4 byte tracklength
0014h 4 byte tracklength in bits
0018h 4 byte 0=amigados 1=raw mfm
...

I just cant find '0xAAAA AAAA 4489 4489'

( http://lclevy.free.fr/adflib/adf_info.html#p23 )

olsen · « **Reply #9 on:** December 19, 2017, 08:44:14 AM »

Quote from: orange;834196

what does ' length 12656/4' mean in output?
how long is the header, what is its format?
thanks.

edit: is it like this:

OFFSET Count TYPE Description
0000h 8 byte 'UAE-1ADF'
0008h 4 byte trackcount
000Ch 4 byte 0=amigados 1=raw mfm
0010h 4 byte tracklength
0014h 4 byte tracklength in bits
0018h 4 byte 0=amigados 1=raw mfm
...

Shrug... this does not look like anything I would expect to find on a standard Amiga formatted floppy disk. Are you sure you are looking for MFM data? If this is the data structure layout, I would expect it to be a container format, not the contents.

Quote

I just cant find '0xAAAA AAAA 4489 4489'

( http://lclevy.free.fr/adflib/adf_info.html#p23 )

You may not be able to see this pattern in the encoded MFM data at all. The thing is, this is a bit pattern, not a byte pattern. It can start in the MFM bit stream at virtually any position in the track buffer, but usually it's somewhere near the beginning of the buffer.

So, how do you find the bit position where it starts? The key is the 0xAAAA pattern, which either shows up as 0xAAAA in the MFM bit stream (if the header starts at an even bit position), or as 0x5555 (if it starts at an odd bit position).

The first step to decoding is to find out where the 0xAAAA bit pattern shows up. Because it covers 32 bits, you should be able to find it by looking for any two consecutive bytes which either read as 0xAA or as 0x55.

Code:

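UWORD * mfm_buffer;      /* raw MFM track data, supplied by the caller */
int mfm_buffer_size, i;  /* mfm_buffer_size: buffer length in bytes, supplied by the caller */
int num_words = mfm_buffer_size / sizeof(*mfm_buffer);
UWORD pattern;
int word_position = -1;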

for(i = 0 ; i < num_words ; i++)
{
   if (mfm_buffer[i] == 0xAAAA)
   {
      pattern = 0xAAAA;
      word_position = i;
      break;
   }
   else if (mfm_buffer[i] == 0x5555)
   {
      pattern = 0x5555;
      word_position = i;
      break;
   }
}

/* Skip the pattern if it shows up again, which happens
 * if it started at the very first bit of the byte.
 */
if(word_position != -1 && word_position + 1 < num_words && mfm_buffer[word_position+1] == pattern)
  word_position++;

If these two bytes are part of a sector header, then they should be followed by two 0x4489 bit patterns in the next 0..14 bits. You need to figure out which bit position they show up at.

Code:

if(word_position != -1 && word_position + 1 < num_words)
{
   int bit_position = -1;
   ULONG match;

   match = (((ULONG)mfm_buffer[word_position]) << 16) | mfm_buffer[word_position+1];

   for(i = 0 ; i < 15 ; i++)
   {
      if(((match << i) & 0xFFFF0000) == 0x44890000)
      {
          bit_position = i;
          break;
      }
   }
}

At this point you should be able to tell if you found the byte and bit positions of the first 0x4489 sync bit pattern. The next step would be to check if the first 0x4489 pattern you found is followed by another one. If that's the case, you can begin to read the individual words, shift them as needed and reconstruct both the sector header and sector data in their MFM-encoded forms.
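
The "shift them as needed" step can be sketched as a helper that reads 16 bits from an arbitrary bit position, plus a brute-force scan for the double sync built on top of it. Both function names are made up for illustration, and the sketch assumes each buffer word holds the earliest bit of the stream in its most significant bit (on a little-endian host the words would first have to be assembled from the dump bytes in big-endian order).

```c
#include <stddef.h>
#include <stdint.h>

/* Return the 16 bits starting bit_pos bits into the buffer, where bit 0 is
 * the most significant bit of words[0]. The caller must make sure that
 * bit_pos + 16 does not exceed the number of bits in the buffer. */
uint16_t mfm_word_at(const uint16_t *words, size_t bit_pos)
{
    size_t index = bit_pos / 16;
    unsigned shift = (unsigned)(bit_pos % 16);
    uint32_t pair = (uint32_t)words[index] << 16;

    if (shift != 0)
        pair |= words[index + 1];   /* only needed when straddling words */
    return (uint16_t)(pair >> (16 - shift));
}

/* Find the first bit position at which two 0x4489 sync words follow each
 * other. Returns -1 if the buffer contains no such pair. */
long find_double_sync(const uint16_t *words, size_t num_words)
{
    size_t total_bits = num_words * 16;
    size_t pos;

    for (pos = 0; pos + 32 <= total_bits; pos++) {
        if (mfm_word_at(words, pos) == 0x4489 &&
            mfm_word_at(words, pos + 16) == 0x4489)
            return (long)pos;
    }
    return -1;
}
```

Once the bit position is known, `mfm_word_at(words, pos + 32 + 16 * n)` yields the n-th MFM word after the sync pair, already shifted into place. This trades speed for simplicity; it is not the table-driven approach used in production code.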

Please note that in production code the task of finding the sync words is usually table-driven and does not run in a loop which shifts bits around.

orange · « **Reply #10 on:** December 19, 2017, 08:56:34 AM »

Quote from: olsen;834214

Shrug... this does not look like anything I would expect to find on a standard Amiga formatted floppy disk. Are you sure you are looking for MFM data? If this is the data structure layout, I would expect it to be a container format, not the contents.

that is an output of rawread command, it writes 'extended' ADF format, at least when raw tracks are present.

Quote

You may not be able to see this pattern in the encoded MFM data at all. The thing is, this is a bit pattern, not a byte pattern. It can start in the MFM bit stream at virtually any position in the track buffer, but usually it's somewhere near the beginning of the buffer.

thanks. I was searching at bit-level, but will try again. perhaps rawread removes the sync?

Quote

So, how do you find the bit position where it starts? The key is the 0xAAAA pattern, which either shows up as 0xAAAA in the MFM bit stream (if the header starts at an even bit position), or as 0x5555 (if it starts at an odd bit position).

The first step to decoding is to find out where the 0xAAAA bit pattern shows up. Because it covers 32 bits, you should be able to find it by looking for any two consecutive bytes which either read as 0xAA or as 0x55.
...

thanks. will try.

I've tried encoding 'DOS' to MFM data (after splitting to odd and even bits?bytes?), but can't find the bit pattern in input.

olsen · « **Reply #11 on:** December 19, 2017, 10:46:15 AM »

Quote from: orange;834215

that is an output of rawread command, it writes 'extended' ADF format, at least when raw tracks are present.

OK, so this is a container format after all.

Quote

thanks. I was searching at bit-level, but will try again. perhaps rawread removes the sync?

I don't know how the "rawread" command works (any pointers to the source code?), but if it uses the standard Amiga MFM encoded format, then it could drop the sync patterns because they are redundant in this container. Mind you, the sector header and sector data would still have to be preserved in properly-shifted form.

Quote

I've tried encoding 'DOS' to MFM data (after splitting to odd and even bits?bytes?), but can't find the bit pattern in input.

The odd and the even bits are encoded separately and stored separately (512 bytes apart). The encoding of the first bit of "DOS" may vary, depending upon the bit which preceded it. "D" = binary 01000100, which comes out as odd=0000 and even=1010 prior to encoding. That could be encoded either as odd=10101010 or odd=00101010 depending upon the preceding bit (sector data checksum) and even=01000100. So there's already a bit of ambiguity here.
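
The "D" example can be reproduced with a few lines of C. This is only a byte-sized illustration with made-up function names; the real format splits the whole 512-byte block into an odd half and an even half rather than working byte by byte.

```c
#include <stdint.h>

/* Collect bits 7, 5, 3, 1 of a byte into a 4-bit value, MSB first. */
uint8_t odd_bits(uint8_t byte)
{
    return (uint8_t)((((byte >> 7) & 1) << 3) | (((byte >> 5) & 1) << 2) |
                     (((byte >> 3) & 1) << 1) | ((byte >> 1) & 1));
}

/* Collect bits 6, 4, 2, 0 of a byte into a 4-bit value, MSB first. */
uint8_t even_bits(uint8_t byte)
{
    return odd_bits((uint8_t)(byte << 1));
}

/* MFM-encode a 4-bit value into 8 bits (clock, data, clock, data, ...).
 * prev_bit is the data bit that preceded the nibble in the stream. */
uint8_t mfm_encode_nibble(uint8_t nibble, unsigned prev_bit)
{
    uint8_t out = 0;
    int i;

    for (i = 3; i >= 0; i--) {
        unsigned bit = (nibble >> i) & 1u;
        unsigned clock = (prev_bit == 0 && bit == 0) ? 1u : 0u;

        out = (uint8_t)((out << 2) | (clock << 1) | bit);
        prev_bit = bit;
    }
    return out;
}
```

For 'D' (0x44) this gives odd = 0x0 and even = 0xA. The odd half encodes to 0xAA after a 0 bit or 0x2A after a 1 bit, while the even half encodes to 0x44 either way, matching the values given above.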

orange · « **Reply #12 on:** December 19, 2017, 01:32:03 PM »

ok, thanks.
finally found the problem.
I was using:
$bitdata = unpack "b*",$data;
instead of
$bitdata = unpack "B*",$data;

in perl :/
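
For anyone hitting the same problem: Perl's "b" template unpacks each byte in ascending bit order (least significant bit first), while "B" uses descending order (most significant bit first), and the MFM patterns discussed here are written most significant bit first. A small C sketch of the difference, with a made-up function name:

```c
#include <stdint.h>

/* Write the 8 bits of a byte as '0'/'1' characters into out[0..7], followed
 * by a terminating NUL. msb_first != 0 mimics Perl's "B" ordering,
 * msb_first == 0 mimics "b". */
void byte_to_bits(uint8_t byte, int msb_first, char out[9])
{
    int i;

    for (i = 0; i < 8; i++) {
        int bit_index = msb_first ? 7 - i : i;

        out[i] = (char)('0' + ((byte >> bit_index) & 1));
    }
    out[8] = '\0';
}
```

The byte 0x44 comes out as "01000100" MSB first but as "00100010" LSB first, so a search for the sync bit pattern in an LSB-first dump will not find anything.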

kolla · « **Reply #13 on:** December 19, 2017, 03:08:21 PM »

Quote from: orange;834223

finally found the problem.
...
perl :/

Indeed

olsen · « **Reply #14 on:** December 19, 2017, 03:29:05 PM »

Quote from: orange;834223

ok, thanks.
finally found the problem.
I was using:
$bitdata = unpack "b*",$data;
instead of
$bitdata = unpack "B*",$data;

in perl :/

Oh well... give 'C' a try, please. Only a fraction of the expressiveness that leads Perl users to their doom, but the same degree of catastrophic errors easily triggered by a mere single wrong character that is abstrusely difficult to spot.
