Unix filesystems
EXT and More
What is a filesystem?
- Filesystems abstract the physical view of storage devices from users, and virtualize storage area on a disk using the concept of files and directories.
- Files serve as containers for user data, and directories act as containers for groups of files.
- A file system specifies how files are laid out, organized, indexed, and how metadata is associated with them. (Each filesystem does this a little differently than others)
Blocks and sectors
See this page.
What is a ‘BLOCK’?
- ext4 allocates storage space in units of “blocks”.
- A block is a group of sectors between 1KiB and 64KiB, and the number of sectors must be an integral power of 2.
- Blocks are in turn grouped into larger units called block groups.
- Block size is specified at mkfs time and typically is 4KiB.
- By default a filesystem can contain 2^32 blocks; if the ‘64bit’ feature is enabled, then a filesystem can have 2^64 blocks.
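The block limit above translates directly into a maximum filesystem size. A minimal sketch of that arithmetic, assuming the typical 4KiB block size and no ‘64bit’ feature (the variable names are only illustrative):

```shell
# Maximum filesystem size = maximum block count x block size.
block_size=4096                      # typical block size in bytes (4KiB)
max_blocks=$(( 1 << 32 ))            # 2^32 blocks without the 64bit feature
max_bytes=$(( max_blocks * block_size ))
max_tib=$(( max_bytes >> 40 ))       # 1 TiB = 2^40 bytes
echo "max size: ${max_bytes} bytes (${max_tib} TiB)"
```

With a smaller block size the limit shrinks proportionally, which is one reason block size matters at mkfs time.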
What is a ‘BLOCK’?
- A block is a uniformly sized unit of data storage for a filesystem.
- Block size can be an important consideration when setting up a system that is designed for maximum performance.
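- Commonly used block sizes are 1024, 2048, and 4096 bytes (the kernel generally cannot mount a block size larger than the page size, which is 4KiB on x86)
- If the block size is omitted, mkfs will choose one based on the filesystem size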
Bad Blocks ?
- Cannot reliably be used for storing data
- If the system sees that it cannot reliably read or write data to a particular block, it is marked as bad so that we don’t use it anymore.
Metadata
- Pieces of information that your filesystem maintains in order to track your files and directories.
- These are maintained in a structure called an inode (index node).
- size
- dev id
- UID, GID
- Other stuff (permissions, timestamps, link count)
- DOES NOT STORE FILENAME.
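To see these fields on a real file, the following sketch creates a scratch file and prints a few pieces of its inode metadata. It assumes GNU stat (the -c format option); the temporary directory and the file name f1.txt are only illustrative.

```shell
# Create a scratch file and print metadata that lives in its inode.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/f1.txt"

# GNU stat format: %i inode number, %s size in bytes, %u/%g owner UID/GID,
# %h hard-link count, %a permission bits in octal.
stat -c 'inode=%i size=%s uid=%u gid=%g links=%h mode=%a' "$workdir/f1.txt"

size=$(stat -c %s "$workdir/f1.txt")
links=$(stat -c %h "$workdir/f1.txt")
rm -rf "$workdir"
```

Note that none of the format fields above is the filename; stat only knows the name because we passed it on the command line.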
Files and Inodes
Files :
- Are seen by the user as a group of related information.
- transparently stored (you don’t know where all the blocks are)
- Do not contain information about themselves (i.e. size, who created them, when they were created, etc.) unless someone adds that data
- This information is referred to as metadata
- When created are:
- given a name
- and a unique integer (inode number)
- name and inode are stored IN THE DIRECTORY
Files and Inodes
- In essence, then, a directory is nothing more than a name->inode map.
If I were to access a file called f1.txt, the operating system would look up the corresponding inode (which could in turn look up related inodes that are part of the file)
Inode tables
The operating system maintains a table of inodes.
- Can see parts of this table with various commands described subsequently.
- A few disk blocks are reserved for storing this inode stuff. The remaining blocks are considered ‘data’ blocks for storing data.
Viewing inodes
- df -i will show us how many inodes we have and how many are available to us.
- Can we have more files than inodes?
- ls doesn’t access inode metadata, but ls -l does.
- ls -i will show us each file’s inode number.
- stat f1.txt will show us the file’s metadata.
More testing with inodes
- Examine inodes when:
- we create a symbolic link?
- we create a hard link?
- hard link is an entry in a directory that points directly to an inode bearing the file’s metadata.
- multiple filenames point to same inode number
- So, files can have multiple names.
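One way to run the experiment above is sketched here. It assumes GNU stat and uses illustrative file names (f1.txt, hard.txt, soft.txt) in a temporary directory.

```shell
workdir=$(mktemp -d)
echo "data" > "$workdir/f1.txt"
ln "$workdir/f1.txt" "$workdir/hard.txt"   # hard link: a second name for the same inode
ln -s f1.txt "$workdir/soft.txt"           # symbolic link: its own inode, storing a path

ls -li "$workdir"                          # first column is the inode number

orig_ino=$(stat -c %i "$workdir/f1.txt")
hard_ino=$(stat -c %i "$workdir/hard.txt")
soft_ino=$(stat -c %i "$workdir/soft.txt") # stat does not follow symlinks unless -L is given
link_count=$(stat -c %h "$workdir/f1.txt")
echo "f1=$orig_ino hard=$hard_ino soft=$soft_ino links=$link_count"
rm -rf "$workdir"
```

The hard link shares the original inode number and raises the link count, while the symbolic link gets an inode of its own.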
Other inode implications
- An inode may have no links
- A file’s inode number stays the same when it is moved to another directory on the same device.
- The root directory generally has inode number 2 on ext filesystems (check with ls -lai /)
- When a damaged filesystem is repaired by fsck, recovered files and fragments are placed in the lost+found directory of the filesystem in which they once existed.
One more piece of information
When you create a hard link, it just creates a new (name, inode) entry in a directory, without moving the file. When you move a file (or rename it), you don’t copy the data. That would be slow. You just create the (name, inode) entry in the new directory and delete the old entry from the old directory. In other words, moving a gigabyte file takes very little time. In the same way, you can move or rename directories very easily. That’s why “mv /usr /Old_usr” is so fast, even though “/usr” may contain (for example) 57981 files.
But what if I move across partitions?
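The same-filesystem case can be checked with a short sketch like the one below (GNU stat assumed, names illustrative). Across partitions the rename shortcut is not available: mv has to copy the data and delete the original, so the file ends up with a new inode on the destination filesystem.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/old" "$workdir/new"
echo "payload" > "$workdir/old/big.dat"

before=$(stat -c %i "$workdir/old/big.dat")
mv "$workdir/old/big.dat" "$workdir/new/renamed.dat"   # same filesystem: only directory entries change
after=$(stat -c %i "$workdir/new/renamed.dat")

echo "inode before: $before, after: $after"
rm -rf "$workdir"
```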
Inode Examples
- Create a dir and cd into it
- What does the output of ls -id . show?
- cd back to the parent directory and run ls -id testdir; it should show the same inode number
- Now do an ls -la on that directory. What is the field after the permissions? This is the count of hard links to the file.
- We should see that there are 2 hard links, one for the directory’s name and one for its . entry.
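These steps can be scripted as follows. This is a sketch that assumes GNU stat and a scratch directory; the link count it prints is usually 2 on ext4, but some filesystems (btrfs, for example) always report 1 for directories.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/testdir"

# Inode number as seen from inside the directory (its "." entry) ...
inside=$( cd "$workdir/testdir" && ls -id . | awk '{print $1}' )
# ... and as seen from the parent directory (its name entry).
outside=$( cd "$workdir" && ls -id testdir | awk '{print $1}' )

echo "inside: $inside  outside: $outside"
stat -c 'links=%h' "$workdir/testdir"    # hard-link count of the new, empty directory
rm -rf "$workdir"
```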
Inode Examples
Look at this output:
drwxrwxr-x 6 joe joe 4096 Oct 6 10:15 it1100
drwxrwxr-x 41 joe joe 4096 Oct 22 11:22 lab6
What does the 41 mean on line 2?
Create a dir (look at that column), create subdir (look), create sub,subdir(look)
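The exercise can be sketched as a script. On ext4, each immediate subdirectory contributes one link to its parent through its own .. entry, so a directory's link count is normally 2 plus the number of immediate subdirectories; under that assumption a count of 41 would mean 39 subdirectories. Other filesystems may count differently (btrfs reports 1), so the printed values depend on where the temporary directory lives.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/lab"
echo "empty dir:     $(stat -c %h "$workdir/lab") links"

mkdir "$workdir/lab/sub1"
echo "one subdir:    $(stat -c %h "$workdir/lab") links"

mkdir "$workdir/lab/sub1/subsub"      # nested: adds a link to sub1, not to lab
mkdir "$workdir/lab/sub2"
echo "two subdirs:   $(stat -c %h "$workdir/lab") links"

# Count immediate subdirectories and derive the link count expected on ext4.
subdirs=$(( $(find "$workdir/lab" -mindepth 1 -maxdepth 1 -type d | wc -l) ))
expected=$(( subdirs + 2 ))
echo "immediate subdirs: $subdirs, expected link count on ext4: $expected"
rm -rf "$workdir"
```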
Finding with inodes.
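One common way to do this is the -inum test of find, which matches files by inode number. The sketch below (GNU stat assumed, file names illustrative) creates a file with a second hard link and then locates every name that refers to the same inode.

```shell
workdir=$(mktemp -d)
echo "data" > "$workdir/f1.txt"
mkdir "$workdir/sub"
ln "$workdir/f1.txt" "$workdir/sub/alias.txt"   # second name for the same inode

ino=$(stat -c %i "$workdir/f1.txt")
find "$workdir" -inum "$ino"                     # lists both names

matches=$(( $(find "$workdir" -inum "$ino" | wc -l) ))
rm -rf "$workdir"
```

Inode numbers are only unique within one filesystem, so a search like this is best confined to a single mount point.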
Inodes viewing
- Inodes, then, store all the metadata
- View inode metadata with the stat command (the OS uses this too when file access is requested)
Special Inode note
- Space for inodes must be set aside when an operating system (or a new filesystem) is installed and that system does its initial structuring of the filesystem. Within any filesystem, the maximum number of inodes, and hence the maximum number of files, is set when the filesystem is created. (mkfs commands)
- We can run out of space by using up all the inodes or using up all disk blocks.
- How do we decide how many inodes we should format with?
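One way to reason about this is in terms of a bytes-per-inode ratio: one inode is created for every N bytes of filesystem space. mkfs.ext4 accepts such a ratio with -i, or an explicit inode count with -N. The sketch below assumes a ratio of 4096, chosen only because it reproduces the 25600 inodes seen for the 100M partition in the computation example; the inode size of 128 bytes is also taken from these slides.

```shell
fs_bytes=$(( 100 * 1024 * 1024 ))    # 100 MiB partition
bytes_per_inode=4096                 # assumed ratio (what -i would set)
inode_size=128                       # bytes per inode, as on the slides

inodes=$(( fs_bytes / bytes_per_inode ))
table_bytes=$(( inodes * inode_size ))
echo "inodes: $inodes, inode table space: $table_bytes bytes"
```

Many small files call for a lower ratio (more inodes); a few large files can use a higher ratio and give the space to data blocks instead.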
Data block map
- Each inode should record the locations of data blocks in which corresponding file data is stored
Data block map
See the page here
Directories
- A directory is just a special file (hence the “everything in Linux is just a file”)
- A directory maintains the file name and the inode number to get to that file.
Unix filesystems
- So, most of these filesystems have the following:
- superblock?
- Inodes - previously mentioned
- data blocks - hold the data
Superblock
Contains information about the filesystem as a whole such as:
- filesystem size.
- block size
- empty and filled blocks
- block count
- size and location of inode tables
- disk block map and usage information
- size of block groups
- state of the volume: consistent or dirty
Superblock Viewage
sudo dumpe2fs -h /dev/sda1
- only applies to ext formatted partitions
Superblock Importance
- All file requests must first access the filesystem’s superblock.
- If the superblock cannot be found or is corrupted, file access won’t work
- Because of the importance of it, the superblock is stored in RAM as well as other locations within the filesystem.
- If you look at the output of sudo dumpe2fs /dev/sda15 you can see where ‘Backup superblocks’ are stored.
Last word on Superblocks (for today)
- Loosely similar in importance to the FAT (File Allocation Table) of Windows
Operations
- Mount: reads the on-disk superblock and metadata into memory for the filesystem’s use.
- Unmount:
- all metadata and caches are synchronized with disk blocks (finish writing everything)
- marks the filesystem’s superblock as ‘consistent’. (If this entry isn’t written, the fs stays marked as ‘dirty’)
The EXT filesystem tools
ext4 is the most common Linux filesystem in use.
- mkfs -t ext4 or mkfs.ext4
- creates the superblock and inode table
- man mkfs.ext4
- tune2fs
- tune different filesystem parameters
How to use mount command
mount /dev/sda3 /mnt
- the partition is /dev/sda3
- it will show up in my tree as /mnt
tune2fs
- Reserved block percentage: (-m flag)
- Set the percentage of the filesystem which may only be allocated by privileged processes. Reserving some number of filesystem blocks for use by privileged processes is done to avoid filesystem fragmentation, and to allow system daemons, such as syslogd(8), to continue to function correctly after non-privileged processes are prevented from writing to the filesystem. Normally, the default percentage of reserved blocks is 5%.
File system size terminology
- MB vs MiB , GB vs GiB , etc…
- MB = 1000*1000 bytes, MiB = 1024*1024 bytes
- See this
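The difference between the decimal and binary units is easy to see with a little arithmetic. A short sketch, using 100 as an arbitrary example size:

```shell
mb=$(( 1000 * 1000 ))        # 1 MB  (decimal megabyte)
mib=$(( 1024 * 1024 ))       # 1 MiB (binary mebibyte)

size_100mb=$(( 100 * mb ))
size_100mib=$(( 100 * mib ))
diff_bytes=$(( size_100mib - size_100mb ))

echo "100 MB  = $size_100mb bytes"
echo "100 MiB = $size_100mib bytes"
echo "difference = $diff_bytes bytes"
```

The gap grows with the prefix, which is one reason a drive can appear smaller in tools that report binary units.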
Computation Example
- Suppose a 100M partition with a 2048-byte block size
- mkfs.ext4 -b 2048 /dev/sdaX
- results in 51200 2k blocks and 25600 inodes
- Bytes? (100 * 1024 * 1024 = 104857600)
- Reserved? (5%) (104857600 * .05 = 5242880)
- Total blocks? (104857600 / 2048 = 51200)
- Verify with tune2fs -l /dev/sda7 | grep -i block
- Similarly compute reserved block count
- fdisk -l will show the sector count; with our default sector size of 512B, 104857600/512 = 204800 sectors.
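If no spare partition is available, the same computation can be tried on an ordinary file used as a filesystem image. This is a sketch under a few assumptions: GNU truncate and stat are available, the e2fsprogs tools (mkfs.ext4, tune2fs) may or may not be in PATH, and the image path comes from mktemp. The exact counts reported can differ between e2fsprogs versions and configurations.

```shell
img=$(mktemp)
truncate -s 100M "$img"                  # sparse file of 100 * 1024 * 1024 bytes
img_bytes=$(stat -c %s "$img")
echo "image size: $img_bytes bytes"

if command -v mkfs.ext4 >/dev/null 2>&1 && command -v tune2fs >/dev/null 2>&1; then
    # -q quiet, -F proceed although the target is not a block device, -b block size
    mkfs.ext4 -q -F -b 2048 "$img"
    tune2fs -l "$img" | grep -i -E 'block count|block size|inode count'
else
    echo "mkfs.ext4/tune2fs not found in PATH; skipping filesystem creation"
fi
rm -f "$img"
```

No root access is needed because nothing is mounted; the tools only read and write the image file.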
df vs partition size
Why am I getting different results?
- look at block count and block size
- df -B 2048 /dev/sda7 gives 45490 (this is different from the 51200 of the previous slide)
- I have 2560 (5% reserved)
- 51200 total blocks - 2560 reserved - 45490 df = 3150 blocks left
- What about our inode space? (my inode size is 128 and inode count is 25600: 128 * 25600 = 3276800 bytes, or 1600 blocks of 2048 bytes). Remaining = 3150 - 1600 = 1550 blocks?
- Where are those remaining 1550 blocks being used?
- Journaling?? Other stuff?
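The bookkeeping on this slide can be written out as a small calculation. This sketch simply replays the slide's numbers (the df figure of 45490 is the value reported there, not something computed); it follows the slide's accounting and does not claim to be an exact model of how df counts blocks.

```shell
block_size=2048
total_blocks=$(( 100 * 1024 * 1024 / block_size ))    # 51200
reserved=$(( total_blocks * 5 / 100 ))                # 5% reserved = 2560
df_blocks=45490                                       # as reported by df -B 2048 on the slide
inode_blocks=$(( 25600 * 128 / block_size ))          # inode table = 1600 blocks

unaccounted=$(( total_blocks - reserved - df_blocks - inode_blocks ))
echo "total=$total_blocks reserved=$reserved df=$df_blocks inodes=$inode_blocks"
echo "unaccounted blocks: $unaccounted"
```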
What is a journal
- A special file that can help repair inconsistencies in the filesystem.
- Changes are written to the journal and flushed to disk before each operation returns; if the system crashes, the journal is replayed or rolled back to restore consistency.
- faster than scanning the entire HDD (can reboot quicker)
Try this one more time
sudo dumpe2fs -h /dev/sda1