Guidance on implementing FAT file system?

Jim Bancroft · Feb 15, 2006

Hi everybody,

Our operating systems class is working on a team project this semester-- building our own OS. My group is in charge of implementing the FAT file system. Trouble is, we have no idea where to begin on implementation.

We've been reading specs and general whitepapers on how FAT works, but we're lost as to how this might be implemented, even on a simplistic level, which is what we're aiming for. Boot sectors, FAT structures, etc....it's all new to us.

If anyone knows of some decent code or tutorials we can look over to get ideas on how to approach this I'd really appreciate it. Thanks!

Matthias99 · Feb 15, 2006

Well, this Wikipedia article describes the FAT system pretty well.

It's pretty damn simple as far as filesystems go -- if you have 'no idea' how to implement this, I'm not sure where to start. If this is a college-level course, you should be able to take a description like that Wiki article gives and break it down into functional blocks, etc. and figure out how it all works together. If you can't do that... you need more help than someone in a forum like this can give you.

phisrow · Feb 15, 2006

A look at the appropriate portion of the kernel source for Linux or one of the BSDs would give you some idea of what a real world implementation looks like.

Jim Bancroft · Feb 16, 2006

Originally posted by: Matthias99
Well, this Wikipedia article describes the FAT system pretty well.

It's pretty damn simple as far as filesystems go -- if you have 'no idea' how to implement this, I'm not sure where to start. If this is a college-level course, you should be able to take a description like that Wiki article gives and break it down into functional blocks, etc. and figure out how it all works together. If you can't do that... you need more help than someone in a forum like this can give you.

Hi,

Thanks for replying. When I said I have no idea how to implement FAT I should have been more clear-- I understand the ideas and data structures behind FAT; linked lists and all that.

What I'm after is how you actually put a file system onto disk-- how would a format command work, for instance? When someone calls fopen(), what would an OS do to pull the appropriate info from the FAT table and data clusters in question? Those kinds of things are what I'm not sure about.

Peter · Feb 16, 2006

No, no, no.

Step 1: Obtain license from Microsoft. FAT is patented.

CTho9305 · Feb 16, 2006

Originally posted by: Peter
No, no, no.

Step 1: Obtain license from Microsoft. FAT is patented.

Stick to short filenames... the 3 patents all relate to storing both long filenames and short filenames. Does Linux need a license or are there exceptions for software producers? The MS page only talks about hardware manufacturers.

SamurAchzar · Feb 16, 2006

You are a lucky man. I've implemented a full, standard FAT12/16/32 (with LFN support) for using in a commercial project of mine. It was meant to be used on an embedded platform. It's very well documented and has absolutely no dependencies on anything.

It's even thread safe and features file/directory locking (to provide concurrent access).
It might not be fully bug-free, but it's getting there (I'm using it ATM).
I also did not test it on anything other than FAT12 volumes, but it's designed to support FAT16/32.

You could use that as a reference if you like.

Additionally, look for a document named fatgen103.doc / .pdf from Microsoft, which seems like a half-arsed attempt of one of their engineers to document that (seems like he did at unwillingly at a gunpoint ).
There's plenty of information other than that. For volume information, you got the BPB (Boot Parameter Block) - look that up.

You can ask me whatever question you like, I'll do my best to help out.

My suggestion for you would be - start with the volume handling - mount (check partition type, get information, etc.), unmount (commit all cached data to the volume), read next cluster number off table, write next cluster to table.
Another suggestion would be implementing it over a generic Block Device implementation, makes debugging and porting really easy.
A block device essentially has read-from(), write-to(), close(), open() properties.

For example, initially I wrote it to work over a pseudo-Block Device which just used standard stdio (under Visual Studio) to access an image I made from a floppy drive.
Later I just substituted that with a block device which writes to a Flash chip, and also does caching.

It's a fun project. It's quite trivial unless you do that for a multiple access environment (multiple processes accessing the volume), then it's a pain, but I did that anyway...

sao123 · Feb 16, 2006

hardware interfacing = teh hard.
perhaps you should start reading up on bios / os interfacing.

cheesehead · Feb 16, 2006

I'm rather clueless when it comes to programming, but it seems to me as though it would be easier to implement a Linux filesystem. After all, everything's spelled out, and the code's all open-source.

SamurAchzar · Feb 16, 2006

Not necessarily. The FAT is ugly, but simple, and the source code for the FAT driver of Linux is just as open as EXT3 (Linux's FS) is. And you can't beat FAT for interoperabillity (that's why I chose it for my device).

Peter · Feb 16, 2006

Originally posted by: CTho9305

Originally posted by: Peter
No, no, no.

Step 1: Obtain license from Microsoft. FAT is patented.

Click to expand...

Stick to short filenames... the 3 patents all relate to storing both long filenames and short filenames. Does Linux need a license or are there exceptions for software producers? The MS page only talks about hardware manufacturers.

AFAIR, the patent applies to anything that physically ships with a FAT file system on it.

Jim Bancroft · Feb 16, 2006

Originally posted by: SamurAchzar
You are a lucky man. I've implemented a full, standard FAT12/16/32 (with LFN support) for using in a commercial project of mine. It was meant to be used on an embedded platform. It's very well documented and has absolutely no dependencies on anything.

It's even thread safe and features file/directory locking (to provide concurrent access).
It might not be fully bug-free, but it's getting there (I'm using it ATM).
I also did not test it on anything other than FAT12 volumes, but it's designed to support FAT16/32.

You could use that as a reference if you like.

That would be great, if you had a few sections I could look at to get started with. I've read over that Microsoft whitepaper you mentioned before, and you're right-- it's not the best document going.

Ok, so if I can ask...there's the boot sector. First part of it is the BPB, correct? I think I understand that part well enough. Fill in a few bytes' of info about the sector count, etc. I believe at the end of the boot sector I need to have a value of 0xAA55, as a checksum?

It's after the BPB that I get confused. What comes next-- the FAT table itself? Two copies of it, one for backup? What does it look like structurally? And after the FAT, does the data itself follow?

I'm not sure how to go about that stuff, so if you had some code, or even just a general outline with reccomended structures that you've built, I'd really be grateful. Maybe if you pointed me to documents that you relied on to learn this stuff it would help too.

thanks a ton.

sao123 · Feb 16, 2006

Originally posted by: Jim Bancroft

Originally posted by: SamurAchzar
You are a lucky man. I've implemented a full, standard FAT12/16/32 (with LFN support) for using in a commercial project of mine. It was meant to be used on an embedded platform. It's very well documented and has absolutely no dependencies on anything.

It's even thread safe and features file/directory locking (to provide concurrent access).
It might not be fully bug-free, but it's getting there (I'm using it ATM).
I also did not test it on anything other than FAT12 volumes, but it's designed to support FAT16/32.

You could use that as a reference if you like.

Click to expand...

That would be great, if you had a few sections I could look at to get started with. I've read over that Microsoft whitepaper you mentioned before, and you're right-- it's not the best document going.

Ok, so if I can ask...there's the boot sector. First part of it is the BPB, correct? I think I understand that part well enough. Fill in a few bytes' of info about the sector count, etc. I believe at the end of the boot sector I need to have a value of 0xAA55, as a checksum?

It's after the BPB that I get confused. What comes next-- the FAT table itself? Two copies of it, one for backup? What does it look like structurally? And after the FAT, does the data itself follow?

I'm not sure how to go about that stuff, so if you had some code, or even just a general outline with reccomended structures that you've built, I'd really be grateful. Maybe if you pointed me to documents that you relied on to learn this stuff it would help too.

thanks a ton.

I believe the topic to search for is DOS bootstrap.

SamurAchzar · Feb 16, 2006

Gimme an email, I'll send you the entire code. Buy me a beer if I ever get in your way

Basically, you have the BPB, which comes first. It contains the partition table and partition types. The beginning of it IIRC is the machine bootup assembly code - probably used in older PCs.

The actual FATs follow this sector (FATs - as there are two of them - one for backup).
You can determine the location of every partition (and its FAT) by looking at the table - the sizes are listed there.

The FAT types (12/16/32) are determined according to a Microsoft supplied "table" - there are three strict ranges, found in that document.

0x55AA ends the BPB, yea.

My structure is (from lower to higher):

0) Block device - not really a part of the FAT itself but important anyway

1) Volume management - BPB, FAT table, some parametric calculations, etc.
A component called the VRM (Volume resource manager) is contained within - but it's not relevant for you unless you want to provide access locks.

2) Directory management - That's the tricky part. Resloving LFNs, etc.

3) FOPS support - implementing nearly standard stdio stuff

I got a very comprehensive bookmark list on this very subject in my other computer. It should be up soon (can't... resist... fragging...) so I'll post it here too.

Aside from my FAT, there's an implementation in a software named FreeDOS, IIRC. I checked it initially before writing my own, I found that one lacking - not multiprocess safe, mainly.

Jim Bancroft · Feb 16, 2006

Originally posted by: SamurAchzar
Gimme an email, I'll send you the entire code. Buy me a beer if I ever get in your way

You got it. Email is howboutafresca at hotmail dot com. If you're ever in San Francisco/Oakland, drop me a line and I'm buying. :beer:

I'll look at FreeDOS too....any section(s) in particular I should check out there? Thanks again. And again.

SamurAchzar · Feb 16, 2006

Sent.

Wanna share some background info on your O/S, for the meanwhile? I'm curious.

Have fun

EDIT: Oh, that FreeDOS - http://cvs.sourceforge.net/viewcvs.py/freedos/kernel/kernel/
Look for anything with 'fat' prefix.

Jim Bancroft · Feb 17, 2006

Originally posted by: SamurAchzar
Sent.

Wanna share some background info on your O/S, for the meanwhile? I'm curious.

Have fun

It's for a class in operating systems-- we've been broken up into teams of four. Some are working on hard disk/floppy device drivers, others are doing windowing code, some are working on networking. I'm on the file system logical layer team.

The ultimate goal is to have something we can put on a floppy and boot off. In the meanwhile we're hoping to test in a virtual machine, like Bochs.

I've just looked over your code (hotmail marked it as a junk mail; probably saw the "gmail" in your address )....wow. Nice comments. Fat_dir is 100KB! That must have taken a while. How did you accumulate all this knowledge?

Thanks for everything-- I'll email you milestone updates if you like, just to keep you informed.

SamurAchzar · Feb 17, 2006

Well I just read the spec. It might be annoying but it's the only way there is to do it.

Your OS seems like a fun project. Is that high school or college?
Is it running in protected mode or in real mode? Does it feature a scheduler?

I'd like to hear how it progresses - sure other would to - why don't you just update this thread? I'm sure many people around could help in other subjects... besides, I have something going for OS's...

And of course, you can contact me for any issue.

Good luck!

Matthias99 · Feb 17, 2006

Originally posted by: Jim Bancroft
Hi,

Thanks for replying. When I said I have no idea how to implement FAT I should have been more clear-- I understand the ideas and data structures behind FAT; linked lists and all that.

What I'm after is how you actually put a file system onto disk-- how would a format command work, for instance? When someone calls fopen(), what would an OS do to pull the appropriate info from the FAT table and data clusters in question? Those kinds of things are what I'm not sure about.

You've been provided with a bunch of information already -- but basically, the only things you can do to a hard disk are to read and write data to it (usually in 512-byte 'blocks'). For physical hard disks, this is actually implemented through calls to the BIOS, but you should be able to write a wrapper function to handle whatever platform- and device-specific mechanism is used to actually read/write the drive.

Basically, the filesystem maintains the relationship between logical 'files' and physical blocks on disk -- but that metadata is also stored on the disk, usually in some relatively fixed location (in this case, the file allocation table) so that the OS knows where to find it.

To 'format' a disk, you set up the basic data structures that will eventually hold your files and directories, and write those structures onto the drive, usually at the beginning of the partition. Depending on the filesystem in question, this may involve quite a bit of work (FAT is pretty easy in this regard). Optionally you can also zero out all the data blocks in the partition (this is usually the difference between a 'quick' and 'full' format).

When a program calls fopen() in a FAT filesystem, you have to read in entries from the allocation table on disk to find the file they are requesting, and if it exists, the entry for that file should tell you which blocks on disk contain its data (if it doesn't exist, either you create a new file or the operation fails, depending on the options you were given). Then you can build some data structure in memory that maintains that relationship as long as the file is open, and return a pointer or index to that to the caller (which they will use later to call read() or write()).

Jim Bancroft · Feb 17, 2006

Originally posted by: SamurAchzar
Well I just read the spec. It might be annoying but it's the only way there is to do it.

Your OS seems like a fun project. Is that high school or college?
Is it running in protected mode or in real mode? Does it feature a scheduler?

It's a college project. We'll almost certainly be using protected mode, since we plan on implementing some form of virtual memory (among other things), but it's all in the planning stage now. If there are big revelations I'll post 'em here. I've done high-level C++ coding previously, using friendly APIs, so this low-level OS stuff is new to me. My head is swimming reading about it all. My bitshifting and unsigned char skills are rusty.

Thanks once more for the links and code...and in keeping with the sensei/grasshopper dynamic we've established, can I ask what the "_e" and "_t" suffixes on your structures represent? I've noticed them throughout. Also, do you have a reccomended path for me to follow in learning your code-- files to start with, and other tips?

SamurAchzar · Feb 17, 2006

Eh _e means enum, _s means struct, _t is for "type" which either become after a "typedef". I reckon there's no real need for the _e and _s as I always "typedef", but old habits...

Start with "fat_fops.h". If you look at the header files, you'll see that every header is divided to "interfaces" and "internal", much like "public" and "private" in C++.
So always start at the interfaces (but that's true with any program), as they represent the functionallity of the module.

You have some kind of a main.c file in there, it was used for development and testing - you can see how I use the code there. Other than the mount()/unmount() functions which are obviously unevident in regular stdio, the rest is pretty straightforward.

Nothing else from the top off my head - I've made it as easy as possible to understand. I always do, with all of my software - there's nothing more annoying than having to maintain your old code which you already forgot, unless it's tidy.

Jim Bancroft · Mar 1, 2006

Hey Samur,

We're starting to cut code now. Love what you've done, but I do have a couple of questions. If you had some general guidance, or thoughts on where to look in your code, I'd be grateful.

1. How exactly do I put a copy of the FAT onto a floppy? Should we treat it as a raw device of some sort?

2. I notice your file/directory struct keeps track of the clusters the file is in. How do you update that stuff when reading a file's contents and you have to go to the next cluster, in case your data's on a boundary? Do you query the FAT to find the next cluster , then continue reading?

3. How do you add to/remove FAT entries? For all the stuff I've read on the web, I can't find any physical layout of a FAT entry and how you append to the FAT when adding something, or removing a FAT entry when deleting. Do you create a fixed-size FAT table, then scan through to find the next available entry when adding something?

That's it for now....feel free to answer any or all of the above, and thanks once more for the tips!

SamurAchzar · Mar 1, 2006

Hey Jim, good to hear from you!

1. You don't put it directly on a floppy (unless you have a low-level floppy device driver, that is).
For debugging, you should create a block device which writes to a simple file.
You can later download this file to a floppy as a raw binary image.
Actually, the better strategy would be formatting a blank floppy, creating an image file from it, and then using it.
Obviously, your target should be writing directly to the floppy, for this you'll have to write a block device which interacts with some kind of a floppy driver.

2. Exactly. As a sidenote I'd say that you could probably make a more efficient implementation by caching the FAT table or something like that - but this library was meant to be used on a device with 64K of internal RAM and that's it. Speed was a secondary priority, so it works "harder" - reads the FAT for every cluster.

3. There is no "FAT entry", so to speak. The FAT is just a lookup table. Your index is the current cluster, and value is the next cluster - like, for knowing the next cluster relative to cluster number 10, you'll read the value at offset 10 in the FAT table - that would yield the next cluster.

The real "entries" are the ones contained in the directories - they define the entity properties and first cluster, from there on you just walk on the FAT table.
So each of these directory entries leads to a separate cluster chain.

Glad to be helping someone

Jim Bancroft · Mar 2, 2006

Some good talk in your post, Samur. I'll take these suggestions to heart and get back to my teammates....and we'll keep you updated along the way. Of course, since we're newbies at this what we're implementing will be FAT "light" and with plenty of shortcuts. Our device will be a floppy disk, as it turns out. I like your idea of putting a floppy image on disk, then experimenting with it in the interim.

Do you have a link or two handy for writing a block device to interact with a floppy? There's another team that has to write the floppy driver, and I could pass your info along to them. They could use some expert opinion too I'm sure

We'll keep you posted. Thanks again.

Jim Bancroft · Mar 3, 2006

Hey Samur, do you mind if I ask a couple of questions regarding the FAT table and the directory entries? Looks like we're going with a stripped-down version of FAT 16 (no file locking, no LFN, no bells and whistles) and I was a little confused about what things go where.

So, what I know is there's the FAT table, which is (pls correct me if I'm wrong) a series of 16 byte values in our case. For example, if part of a file is located in cluster 564 on the disk, FAT table entry #564 "points" to the next cluster used by that file. Is that how it works? FAT entry #1 corresponds to cluster #1, entry #2 corresponds to cluster #2...etc?

Or are there some hidden entries at the start of the FAT table that I need to be aware of?

Then there's the FAT directory information. Here's where I'm a little more confused. I understand it holds file names and their attributes, as well as their starting cluster numbers. But I'm not sure where it goes on disk in relation to the FAT table itself. There's also the FAT "root" directory, which I understand is where the whole business starts. Again, I'm not sure where it sits on the disk in relation to the FAT table and other FAT directory entries.

Finally, I'm a little lost as to where the FAT should begin. Sector #2, after the boot sector? Or somewhere else? And what's the size of the FAT in bytes, assuming we're using FAT16? I don't mean the max file size you can have using FAT16, but the size of the FAT structure itself.

Thanks again!

Guidance on implementing FAT file system?

Senior member

Diamond Member

Golden Member

Senior member

Elite Member

Elite Member

Platinum Member

Lifer

Lifer

Platinum Member

Elite Member

Senior member

Lifer

Platinum Member

Senior member

Platinum Member

Senior member

Platinum Member

Diamond Member

Senior member

Platinum Member

Senior member

Platinum Member

Senior member

Senior member