Unix filesystems

EXT and More

What is a filesystem?

Filesystems hide the physical layout of storage devices from users and present the storage area on a disk as files and directories.
Files serve as containers for user data, and directories act as containers for groups of files.
A filesystem specifies how files are laid out, organized, and indexed, and how metadata is associated with them. (Each filesystem does this a little differently.)

Blocks and sectors

See this page.

What is a ‘BLOCK’?

ext4 allocates storage space in units of “blocks”.
A block is a group of sectors between 1KiB and 64KiB, and the number of sectors must be an integral power of 2.
Blocks are in turn grouped into larger units called block groups.
Block size is specified at mkfs time and typically is 4KiB.
By default a filesystem can contain 2^32 blocks; if the ‘64bit’ feature is enabled, then a filesystem can have 2^64 blocks.
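
To get a feel for what the 2^32 block limit means in practice, the arithmetic can be sketched in shell. This is only an illustration: it assumes the typical 4KiB block size and a shell with 64-bit arithmetic, and the variable names are made up for the example.

```shell
# Largest filesystem addressable with 32-bit block numbers,
# assuming the typical 4 KiB block size.
block_size=4096                           # bytes per block
max_blocks=$(( 1 << 32 ))                 # 2^32 blocks (no '64bit' feature)
max_bytes=$(( max_blocks * block_size ))  # total addressable bytes
max_tib=$(( max_bytes / (1 << 40) ))      # 1 TiB = 2^40 bytes
echo "max size: ${max_tib} TiB"
```

With a smaller block size the limit shrinks proportionally, which is one reason 4KiB is the usual choice.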

What is a ‘BLOCK’?

A block is a uniformly sized unit of data storage for a filesystem.
Block size can be an important consideration when setting up a system that is designed for maximum performance.
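Valid block sizes are powers of two starting at 1024 bytes; 1024, 2048, and 4096 are the usual choices, because the kernel can generally only mount a filesystem whose block size does not exceed the system page size (4096 on x86)
- If omitted, mkfs picks a block size based on the size of the filesystem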

Bad Blocks?

A bad block cannot reliably be used for storing data.
If the system sees that it cannot reliably read or write data to a particular block, the block is marked as bad so that it is not used anymore.

Metadata

Pieces of information that your filesystem maintains in order to track your files and directories.
These are maintained in a structure called an inode (index node).
- size
- dev id
- UID, GID
- Other stuff (permissions, timestamps, link count, data block locations)
- DOES NOT STORE FILENAME.

Files and Inodes

Files :

Are seen by the user as a group of related information.
Are stored transparently (you don’t know where all the blocks are)
Do not contain information about themselves (e.g. size, who created them, when they were created) unless someone adds that data to the contents
- This information is referred to as metadata
When created are:
- given a name
- and a unique integer (inode number)
- name and inode are stored IN THE DIRECTORY

Files and Inodes

In essence then, a directory is nothing more than a name->inode map.

If I were to access a file called f1.txt, the operating system would look up the corresponding inode (which could in turn look up related inodes that are part of the file).

Inode tables

Each filesystem keeps a table of inodes on disk, which the operating system reads and updates.

Parts of this table can be seen with the commands described below.
A few disk blocks are reserved for storing the inode table. The remaining blocks are considered ‘data’ blocks for storing file contents.

Viewing inodes

df -i shows how many inodes we have and how many are still available.
Can we have more files than inodes?
ls doesn’t access inode metadata, but ls -l does.
- ls -i shows the inode number of each file
stat f1.txt shows the metadata stored in the file’s inode.
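
A quick way to try these commands without touching real files is a scratch directory. The sketch below assumes GNU coreutils (the `stat -c` format option); the directory comes from `mktemp -d` and the file name simply mirrors the f1.txt used above.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
echo "hello" > "$dir/f1.txt"

ls -i "$dir/f1.txt"     # inode number, then the name
stat "$dir/f1.txt"      # full metadata read from the inode

# GNU stat format strings: %i inode, %s size in bytes, %h hard-link count
ino=$(stat -c %i "$dir/f1.txt")
size=$(stat -c %s "$dir/f1.txt")
links=$(stat -c %h "$dir/f1.txt")
echo "inode=$ino size=$size links=$links"
```

Remove the scratch directory with rm -r "$dir" when finished.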

More testing with inodes

Examine inodes when:
- we create a symbolic link?
- we create a hard link?
- a hard link is a directory entry that points directly to the inode bearing the file’s metadata.
- multiple filenames can point to the same inode number
So, files can have multiple names.
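
The two kinds of link can be compared side by side in a scratch directory. This is a minimal sketch assuming GNU coreutils; the names hard.txt and soft.txt are invented for the example.

```shell
dir=$(mktemp -d)
echo "data" > "$dir/f1.txt"
ln    "$dir/f1.txt" "$dir/hard.txt"   # hard link: a second name for the same inode
ln -s "$dir/f1.txt" "$dir/soft.txt"   # symbolic link: its own inode, holding a path

ls -li "$dir"                         # first column is the inode number

orig_ino=$(stat -c %i "$dir/f1.txt")
hard_ino=$(stat -c %i "$dir/hard.txt")
soft_ino=$(stat -c %i "$dir/soft.txt")   # GNU stat does not follow symlinks unless given -L
link_count=$(stat -c %h "$dir/f1.txt")   # number of names pointing at the inode
echo "orig=$orig_ino hard=$hard_ino soft=$soft_ino links=$link_count"
```

The hard link shares the original inode number and raises its link count, while the symbolic link gets an inode of its own.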

Other inode implications

An inode may have no links
A file’s inode number stays the same when it is moved to another directory on the same device.
The root directory generally has an inode of 2 on ext filesystems (ls -lai /)
When a damaged filesystem is repaired, fsck places the files it recovers but can no longer name in the lost+found directory at the top of the filesystem in which they once existed.

One more piece of information

When you create a hard link, you just create a new name in the directory’s table, pointing at the same inode, without moving the file. When you move a file (or rename it), you don’t copy the data. That would be slow. You just create the (name, inode) entry in the new directory and delete the old entry from the old directory. In other words, moving a gigabyte file takes very little time. In the same way, you can move or rename directories very easily. That’s why “mv /usr /Old_usr” is so fast, even though “/usr” may contain (for example) 57981 files.
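
The claim that a move only rewrites directory entries can be checked by comparing the inode number before and after. This sketch assumes GNU coreutils and that both directories live on the same filesystem (they do here, since both are inside one mktemp directory); the file names are invented.

```shell
dir=$(mktemp -d)
mkdir "$dir/a" "$dir/b"
echo "payload" > "$dir/a/big.dat"

before=$(stat -c %i "$dir/a/big.dat")
mv "$dir/a/big.dat" "$dir/b/renamed.dat"   # same filesystem: only directory entries change
after=$(stat -c %i "$dir/b/renamed.dat")
echo "before=$before after=$after"
```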

But what if I move across partitions?

Inode Examples

Create a dir and cd into it
- What does the output of ls -id . show?
- cd back to the parent directory and run ls -id testdir; it should show the same inode number
Now do an ls -la on that directory. What is the field after the permissions? This is the count of hard links to the file.
- We should see that there are (2) hard links, one for the directory’s name in its parent and one for its own . entry
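
The exercise can be scripted as follows. This is a sketch assuming GNU coreutils and a scratch parent directory from mktemp. On ext4 the link count of a fresh directory is expected to start at 2 and grow by one for each subdirectory; some other filesystems report directory link counts differently, so those two values are printed rather than assumed.

```shell
old_pwd=$PWD
parent=$(mktemp -d)
cd "$parent"
mkdir testdir

cd testdir
inside=$(ls -id . | awk '{print $1}')         # inode seen from inside the directory
cd ..
outside=$(ls -id testdir | awk '{print $1}')  # inode seen from the parent
echo "inside=$inside outside=$outside"

# Hard-link count of the directory before and after adding a subdirectory
stat -c %h testdir
mkdir testdir/sub
stat -c %h testdir

cd "$old_pwd"
```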

Inode Examples

Look at this output:

drwxrwxr-x  6 joe  joe        4096 Oct  6 10:15 it1100
drwxrwxr-x 41 joe  joe        4096 Oct 22 11:22 lab6

What does the 41 mean on line 2?
Create a dir (look at that column), create a subdir (look), create a sub-subdir (look)

Finding with inodes.

find / -inum 147649
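
Searching from / can take a while and inode numbers are only unique within one filesystem, so a smaller experiment may be easier to follow. The sketch below assumes GNU coreutils and findutils; it creates a file with one extra hard link in a scratch directory and asks find for every name attached to that inode.

```shell
dir=$(mktemp -d)
echo "x" > "$dir/f1.txt"
ln "$dir/f1.txt" "$dir/alias.txt"      # second name for the same inode

ino=$(stat -c %i "$dir/f1.txt")
find "$dir" -inum "$ino"               # prints both names
matches=$(find "$dir" -inum "$ino" | wc -l)
echo "names found: $matches"
```

When starting from /, adding -xdev keeps find on a single filesystem and avoids unrelated files that happen to have the same inode number elsewhere.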

Inodes viewing

Inodes, then, store all the metadata
View inode metadata with the stat command (the OS reads the same metadata when file access is requested)
- stat bar.txt

Special Inode note

Space for inodes must be set aside when an operating system (or a new filesystem) is installed and that system does its initial structuring of the filesystem. Within any filesystem, the maximum number of inodes, and hence the maximum number of files, is set when the filesystem is created. (mkfs commands)
We can run out of space by using up all the inodes or using up all disk blocks.
How do we decide how many inodes we should format with?
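
One way to reason about this is in terms of bytes of disk space per inode, which mkfs.ext4 accepts through its -i option (-N sets the inode count directly). The sketch below is plain arithmetic for a 100 MiB filesystem; the two ratios are example values, and a real mkfs run may round the result to fit its block groups.

```shell
fs_bytes=$(( 100 * 1024 * 1024 ))      # 100 MiB filesystem

# One inode per 4096 bytes suits many small files;
# one per 16384 bytes suits fewer, larger files.
for ratio in 4096 16384; do
  echo "bytes-per-inode=$ratio -> inodes=$(( fs_bytes / ratio ))"
done

inodes_small_files=$(( fs_bytes / 4096 ))
inodes_large_files=$(( fs_bytes / 16384 ))
```

A lower ratio spends more blocks on the inode table but allows more files; a higher ratio does the opposite.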

Data block map

Each inode records the locations of the data blocks in which the corresponding file data is stored

Data block map

See the page here

Directories

A directory is just a special file (hence “everything in Linux is just a file”)
A directory maintains the file name and the inode number needed to get to each file.

Unix filesystems

So, most of these filesystems have the following:
- superblock?
- Inodes - previously mentioned
- data blocks - hold the data

Superblock

Contains information about the filesystem as a whole such as:

filesystem size
block size
empty and filled blocks
block count
size and location of inode tables
disk block map and usage information
size of block groups
state of the volume: consistent or dirty

Superblock Viewage

sudo dumpe2fs -h /dev/sda1
only applies to ext formatted partitions

Superblock Importance

All file requests depend on the filesystem’s superblock.
If the superblock cannot be found or is corrupted, file access won’t work.
Because of its importance, the superblock is kept in RAM while the filesystem is mounted, and backup copies are stored at other locations within the filesystem.
- If you look at the output of sudo dumpe2fs /dev/sda15 you can see where ‘Backup superblocks’ are stored.

Last word on Superblocks (for today)

Loosely comparable to the boot sector of a FAT (File Allocation Table) filesystem on Windows, which likewise holds volume-wide parameters

Operations

Mount reads the on-disk superblock and metadata into memory for the filesystem’s use.
Unmount:
- all metadata and caches are synchronized with disk blocks (finish writing everything)
- records a ‘consistent’ (clean) state in that filesystem’s superblock. (If this entry isn’t written, the filesystem is treated as ‘dirty’ at the next mount)

The EXT filesystem tools

ext4 is one of the most common Linux filesystems in use.

mkfs -t ext4 or mkfs.ext4
- creates the superblock and inode table
- man mkfs.ext4
tune2fs
- tune different filesystem parameters

How to use mount command

mount /dev/sda3 /mnt
- partition is /dev/sda3
- it will show up in my tree as /mnt

tune2fs

Reserved block percentage: (-m flag)
- Set the percentage of the filesystem which may only be allocated by privileged processes. Reserving some number of filesystem blocks for use by privileged processes is done to avoid filesystem fragmentation, and to allow system daemons, such as syslogd(8), to continue to function correctly after non-privileged processes are prevented from writing to the filesystem. Normally, the default percentage of reserved blocks is 5%.
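
These tools can be tried without a spare partition by pointing them at an ordinary file. The sketch below is an assumption-laden illustration: it needs e2fsprogs (mkfs.ext4, tune2fs) on the PATH and GNU coreutils, uses a temporary file as a stand-in for /dev/sdaX, and simply skips the ext4 steps if the tools are missing.

```shell
img=$(mktemp)
truncate -s 100M "$img"                  # sparse 100 MiB file standing in for a partition
img_bytes=$(stat -c %s "$img")

if command -v mkfs.ext4 >/dev/null 2>&1 && command -v tune2fs >/dev/null 2>&1; then
  mkfs.ext4 -q -F -b 2048 "$img"         # -F: proceed even though the target is not a block device
  tune2fs -l "$img" | grep -i 'block count'
  tune2fs -m 1 "$img"                    # lower the reserved percentage from the default to 1%
  tune2fs -l "$img" | grep -i 'reserved block count'
else
  echo "e2fsprogs not found; skipping the ext4 steps"
fi
rm -f "$img"
```

Comparing the two ‘Reserved block count’ lines shows the effect of the -m flag.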

File system size terminology

MB vs MiB, GB vs GiB, etc…
MB = 1000*1000 bytes, MiB = 1024*1024 bytes
See this
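
The difference between the decimal and binary units is easy to see with a little shell arithmetic. The variable names below are just labels for this sketch.

```shell
MB=$((  1000 * 1000 ))     # megabyte: decimal (SI) prefix
MiB=$(( 1024 * 1024 ))     # mebibyte: binary (IEC) prefix
GB=$((  1000 * MB ))
GiB=$(( 1024 * MiB ))

echo "1 MB  = $MB bytes"
echo "1 MiB = $MiB bytes"
echo "1 GB  = $GB bytes"
echo "1 GiB = $GiB bytes"

gap=$(( GiB - GB ))        # how much larger a GiB is than a GB
echo "1 GiB is $gap bytes larger than 1 GB"
```

The gap grows with each prefix, which is why a disk sold in GB looks smaller when a tool reports it in GiB.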

Computation Example

Suppose a 100 MiB partition and a 2048-byte block size
- mkfs.ext4 -b 2048 /dev/sdaX
- results in 51200 2k blocks and 25600 inodes
- Bytes? (100 * 1024 * 1024 = 104857600)
- Reserved? (5%) (104857600 * .05 = 5242880)
- Total blocks? (104857600 / 2048 = 51200)
- Verify with tune2fs -l /dev/sda7 | grep -i block
- Similarly compute reserved block count
- fdisk -l will show the sector count; with our default sector size of 512B, 104857600 / 512 = 204800 sectors.
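
The numbers in this example can be reproduced with shell arithmetic. This is only a sketch of the calculation, using the sizes quoted above; it does not query a real device.

```shell
size_mib=100          # partition size in MiB
block_size=2048       # bytes per filesystem block
sector_size=512       # bytes per disk sector
reserved_pct=5        # default reserved percentage

bytes=$(( size_mib * 1024 * 1024 ))
blocks=$(( bytes / block_size ))
reserved_bytes=$(( bytes * reserved_pct / 100 ))
reserved_blocks=$(( blocks * reserved_pct / 100 ))
sectors=$(( bytes / sector_size ))

echo "bytes=$bytes blocks=$blocks sectors=$sectors"
echo "reserved: $reserved_bytes bytes, $reserved_blocks blocks"
```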

df vs partition size

Why am I getting different results?

Look at the block count and block size
df -B 2048 /dev/sda7 gives 45490 (this is different from the 51200 of the previous slide)
I have 2560 reserved blocks (5%)
51200 total blocks - 2560 reserved - 45490 df = 3150 blocks left
What about our inode space? (My inode size is 128 bytes and the inode count is 25600, so 128 * 25600 = 3276800 bytes, or 1600 blocks of 2048 bytes.) Remaining = 3150 - 1600 = 1550 blocks?
Where are those remaining 1550 blocks being used?
Journaling?? Other stuff?
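
The accounting on this slide can be re-run as shell arithmetic. The figures are the ones quoted above for this particular partition, so treat them as sample inputs rather than values to expect on another system.

```shell
total_blocks=51200     # from the block count
reserved_blocks=2560   # 5% reserved
df_blocks=45490        # reported by df -B 2048
inode_size=128         # bytes per inode
inode_count=25600
block_size=2048

unaccounted=$(( total_blocks - reserved_blocks - df_blocks ))
inode_table_blocks=$(( inode_size * inode_count / block_size ))
still_missing=$(( unaccounted - inode_table_blocks ))

echo "unaccounted=$unaccounted inode_table=$inode_table_blocks still_missing=$still_missing"
```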

What is a journal

here

A special file that can help repair inconsistencies in the filesystem.
Changes are written into the journal and flushed to disk before they are applied to the filesystem itself. If the system crashes, the journal is replayed at the next mount: completed changes are written out and incomplete ones are discarded, which maintains consistency.
Faster than scanning the entire disk (can reboot quicker)

Try this one more time

sudo dumpe2fs -h /dev/sda1