A Fast File System for UNIX*
Review by James Ezick and Ken Hopkinson, Feb 1998
Goals
- Speculate about the nature of file system use and extrapolate possible improvements.
- Present performance and efficiency enhancements to the existing UNIX file system.
- Compare and contrast performance and efficiency benchmarks after implementation of
enhancements.
Modifications
- Increasing the block size (from 512 bytes to >= 4096 bytes per block)
- Using principles of locality to position data blocks on the disk, thereby reducing
seek and rotational latency.
- Long filenames
- File Locking (replacing the need for conventional "lock files")
- Symbolic Links (both relative and absolute path names)
- Increase the reliability of rename (without copying)
- Quotas
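The tension between larger blocks and internal fragmentation can be seen with a back-of-envelope calculation. This sketch (the file sizes are hypothetical, not data from the paper) counts the bytes wasted in the unused tail of each file's last block:

```python
def wasted_bytes(file_sizes, block_size):
    """Bytes lost to internal fragmentation: the unused tail of each
    file's last allocated block."""
    waste = 0
    for size in file_sizes:
        remainder = size % block_size
        if remainder:
            waste += block_size - remainder
    return waste

files = [100, 512, 3000, 4096, 10000]  # hypothetical file sizes in bytes
print(wasted_bytes(files, 512))   # 724 bytes wasted with 512-byte blocks
print(wasted_bytes(files, 4096))  # 10964 bytes wasted with 4096-byte blocks
```

With mostly small files, moving from 512-byte to 4096-byte blocks inflates the waste by more than an order of magnitude, which is why the paper pairs large blocks with sub-block fragments.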
Implementation
- Blocks of 4096 bytes or larger, divisible into 1024-byte segments (fragments).
- Scattered redundant copies of the superblock across the disk so no single failure destroys it.
- Superblock contained a vector of lists called rotational layout tables.
- Bitmap array to track "free" segments.
- A three-step allocation hierarchy for writing files that have increased in size.
- Cylinder groups comprised of one or more consecutive cylinders on a disk, used to
organize files within a directory.
- Global allocator calls the local allocator, which uses a four-level allocation strategy
for data placement.
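The fragment scheme above can be sketched as a free-fragment bitmap search. This toy model (not the 4.2BSD code) divides each 4096-byte block into four 1024-byte fragments and looks for a contiguous run of free fragments that does not cross a block boundary, since a small file's tail must live within a single block:

```python
FRAGS_PER_BLOCK = 4  # a 4096-byte block split into 1024-byte fragments

def find_fragment_run(bitmap, nfrags):
    """Return the index of the first run of nfrags free fragments that
    lies entirely within one block, or -1 if none exists.
    bitmap[i] is True when fragment i is free."""
    for block_start in range(0, len(bitmap), FRAGS_PER_BLOCK):
        block = bitmap[block_start:block_start + FRAGS_PER_BLOCK]
        for off in range(FRAGS_PER_BLOCK - nfrags + 1):
            if all(block[off:off + nfrags]):
                return block_start + off
    return -1

# one fully used block, then a block whose last two fragments are free
bitmap = [False] * 4 + [False, False, True, True]
print(find_fragment_run(bitmap, 2))  # 6
```

Confining a run to one block is what lets a file's partial tail later be promoted to a full block without scattering its data.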
Achievements
- Intelligent organization of directories and files to take advantage of locality
resulting in increased throughput.
- Developed a system to derive the benefits of large blocks while still coping with the
resulting waste of internal fragmentation.
- Made the existing file system more robust through the addition of useful features.
Drawbacks
- High CPU usage is required to implement solutions to problems resulting from the larger
block size (impractical for commercial use).
- The number of small files that tend to exist on a file system prevents larger blocks from
being feasible without some way of dealing with internal fragmentation (an intolerable
amount of waste).
- Quotas do not deal with files created by a user, hard linked to by another user, and
deleted by the creator.
- No built in facility for eliminating "dangling" symbolic links.
- File system relies on a machine dependent parameter that can cause a performance drop if
the file system is moved between physical machines.
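Since the file system itself never reclaims dangling symbolic links, cleanup falls to userland. A simple sweep (a sketch, not part of FFS or 4.2BSD) can locate them by checking each link's target:

```python
import os

def dangling_symlinks(root):
    """Yield paths under root that are symlinks whose targets no
    longer exist."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it,
            # so a dangling link is a link whose target does not exist.
            if os.path.islink(path) and not os.path.exists(path):
                yield path
```

Note that os.walk does not follow directory symlinks by default, so the sweep cannot loop through a cyclic link.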
Questions
- Are the assumptions on which these improvements are based still valid today?
(I.e., Do most file systems still contain many short files? Do larger blocks still
increase performance?)
- Given the power of today's CPUs, would the percentage of cycles needed to implement the
proposed improvements be about the same? Is the cost still prohibitive?
- Which optimizations still make sense given today's disks?
- Do you know the parameters necessary to optimize the disk drive in your workstation?
- What do you think of the "enhancements"? Symlinks, locks, long file names,
rename, quotas.
* UNIX was a trademark of Bell Laboratories at the time...