Why do we want to analyze a file system?
To access overt content
To access deleted content
To access hidden content
What is the default file system for many distributions of Linux?
The ExtX file system family.
Overview of main ExtX data structures.
File contents are stored inside blocks (e.g., 4KB)
The blocks allocated to a file are tracked by a record called an inode
Directory entries associate the file name with the file’s inode
Inode block pointers
Each inode is the root of an unbalanced tree of blocks that
belong to a given file.
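The size of that tree bounds the maximum file size. A small sketch, assuming the classic ext2 layout (12 direct pointers plus one single-, one double-, and one triple-indirect pointer, with 4-byte block pointers; these counts are the usual defaults, not stated in the notes above):

```python
# Maximum number of blocks reachable through the ext2 inode pointer tree.
# Assumes: 12 direct pointers, 1 single-, 1 double-, 1 triple-indirect
# pointer, and 4-byte block pointers (classic ext2 defaults).

def max_file_blocks(block_size: int, ptr_size: int = 4) -> int:
    ptrs = block_size // ptr_size   # pointers held by one indirect block
    direct = 12
    single = ptrs                   # one level of indirection
    double = ptrs ** 2              # two levels
    triple = ptrs ** 3              # three levels
    return direct + single + double + triple

# With 4 KB blocks, each indirect block holds 1024 pointers:
blocks = max_file_blocks(4096)
size_bytes = blocks * 4096
```

So with 4 KB blocks the tree can address roughly 4 TB of file data, dominated by the triple-indirect level.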
How do we keep track of inode and block allocation?
Using bitmaps.
Using bitmaps: bit arrays, each bit indicates allocation status
- Inode bitmap: tells which inodes are allocated to files
- Block bitmap: tells which data blocks are allocated to files
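A minimal sketch of reading one of these bitmaps, assuming the usual ExtX bit order (bit 0 of byte 0 is the first entry; within a group, bit i corresponds to inode i+1, since inode numbers are 1-based):

```python
def is_allocated(bitmap: bytes, number: int) -> bool:
    """Return True if entry `number` (0-based within the group) is
    marked allocated. Bit 0 of byte 0 is entry 0; for the inode
    bitmap, bit i maps to inode i+1 of the group (inodes are 1-based)."""
    byte, bit = divmod(number, 8)
    return bool((bitmap[byte] >> bit) & 1)

# One bitmap byte with entries 0 and 2 allocated:
bm = bytes([0b00000101])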
What happens in the creation of a new file in EXT2?
Create a new file
- Allocate a new inode (inode bitmap is updated)
- Allocate required blocks (block bitmap is updated)
- Allocate entry in directory (entry points to inode)
- Update data blocks, inode, and directory entry
What happens in the deletion of a file in EXT2?
Delete an existing file
- Update the inode and block bitmaps, unallocate directory entry
- Most contents of inode, data blocks, and directory entry remain intact.
Deleting a file involves updating the record length (rec_len) of the previous entry in the directory.
In the directory, the file name does not need to be physically erased; instead:
The rec_len of the previous entry is updated to “swallow” the deleted file’s entry.
As a result, the next directory read skips the deleted entry, because the previous entry now covers its space.
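A toy model of that rec_len absorption (not real on-disk bytes; each entry is simplified to an (inode, rec_len, name) tuple to show that deletion only grows the previous record):

```python
# Toy model of Ext2 directory-entry deletion via rec_len absorption.
# Each entry: (inode, rec_len, name). Deleting entry i just adds its
# rec_len to entry i-1; the deleted record itself is left untouched.

def delete_entry(entries, i):
    _, victim_len, _ = entries[i]
    prev_inode, prev_len, prev_name = entries[i - 1]
    entries[i - 1] = (prev_inode, prev_len + victim_len, prev_name)
    # The victim record is NOT wiped: a reader that hops by rec_len
    # simply never lands on it anymore -- good news for forensics.
    return entries

d = [(12, 16, "a.txt"), (13, 16, "b.txt"), (14, 4064, "c.txt")]
delete_entry(d, 1)   # "delete" b.txt
```

After the call, a.txt's record spans 32 bytes, yet b.txt's name and inode number are still sitting in the block.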
What are directories?
Directories are special files that contain file name entries.
In Ext2, a directory has no separate on-disk structure of its own:
It is simply stored in one or more filesystem blocks, just like any other file.
The content of those blocks is a sequence of directory entries.
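A sketch of walking one directory block, assuming the common ext2 entry layout with the file_type field (u32 inode, u16 rec_len, u8 name_len, u8 file_type, then the name, all little-endian; the sample block built here is hand-forged for illustration):

```python
import struct

def walk_dirents(block: bytes):
    """Yield (inode, name) for each live entry in one directory block.
    Assumed layout per entry: u32 inode, u16 rec_len, u8 name_len,
    u8 file_type, then name_len bytes of name (little-endian fields)."""
    off = 0
    while off < len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, off)
        if rec_len == 0:          # corrupt block; bail out to avoid looping
            break
        if inode != 0:            # inode 0 marks an unused record
            name = block[off + 8: off + 8 + name_len].decode("latin-1")
            yield inode, name
        off += rec_len            # hop to the next record

# Forge a minimal 64-byte directory block with "." and "hello.txt";
# the last rec_len stretches to the end of the block, as Ext2 requires.
blk = bytearray(64)
struct.pack_into("<IHBB", blk, 0, 2, 12, 1, 2);  blk[8:9] = b"."
struct.pack_into("<IHBB", blk, 12, 30, 52, 9, 1); blk[20:29] = b"hello.txt"
entries = list(walk_dirents(bytes(blk)))
```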
What does a journal do?
Journal records transactions of FS operations.
A journal is a special area of the disk (or sometimes a file) that the file system uses to keep a record (“log”) of recent operations before they are fully written to the main disk structures.
Think of it as a diary of pending updates to the file system.
If the system crashes or loses power in the middle of those updates, the disk can become inconsistent - e.g., a file is marked in the directory but its inode was never updated.
The journal prevents that by letting the system replay or roll back incomplete changes after a crash.
When the system wants to modify something, it first writes the intended changes to the journal (a log).
Once the journal entry is safely written to disk, the actual data structures are updated.
When that’s done, the journal entry is marked as “committed.”
If the computer crashes during an update:
On reboot, the system checks the journal.
If it finds uncommitted changes, it replays them (finishes them) or discards them safely.
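The steps above can be sketched as a toy write-ahead log (dicts standing in for the journal area and the main disk structures; this is an illustration of the commit/replay idea, not ext3's actual JBD format):

```python
journal = []   # toy on-disk log area
disk = {}      # toy "real" file-system structures

def begin(writes):
    """Step 1: log the intended changes. Step 2: write the commit record."""
    txn = {"writes": dict(writes), "committed": False}
    journal.append(txn)         # intent reaches the journal first
    txn["committed"] = True     # commit record is now safely on disk
    return txn

def checkpoint(txn):
    disk.update(txn["writes"])  # step 3: apply to the main structures

def recover():
    """Crash recovery: replay committed txns, discard the rest."""
    for txn in journal:
        if txn["committed"]:
            disk.update(txn["writes"])

t1 = begin({"inode 12": "allocated"})
checkpoint(t1)
# Crash AFTER committing a second txn but before checkpointing it:
begin({"inode 13": "allocated"})
# Crash DURING a third txn, before its commit record was written:
journal.append({"writes": {"inode 14": "allocated"}, "committed": False})
recover()
```

After recovery the committed-but-unapplied change to inode 13 is replayed, while the half-written inode 14 transaction is safely discarded.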
Why are journals good for forensics?
Journals as circular buffers
The journal isn’t infinite — it’s usually a fixed-size region on disk (say a few MBs).
It’s implemented as a circular buffer, meaning:
New transactions are appended at the “front”.
When it gets full, the system wraps around and overwrites the oldest entries.
So the journal constantly cycles through old and new file system updates.
Append-only behavior
Journals are append-only — you only add new records.
When updates (transactions) are no longer needed for crash recovery, the OS simply moves a pointer that marks them as “freed”.
Nothing is physically deleted.
The old data remains on disk until it’s overwritten by new transactions.
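A toy circular journal showing both properties at once, a fixed slot array with a head pointer for appends and a tail marker for "freed" records (freeing moves the marker, erases nothing):

```python
# Toy circular journal: SLOTS fixed slots, appends go at head % SLOTS.
# "Freeing" old transactions only advances a tail marker; the record
# bytes stay readable until the head wraps around and overwrites them.

SLOTS = 4
buf = [None] * SLOTS
head = 0                      # where the next record goes

def append(record):
    global head
    buf[head % SLOTS] = record   # wrap around when full
    head += 1

for i in range(3):
    append(f"txn{i}")
tail = 3   # txn0..txn2 no longer needed for crash recovery ("freed")

# Nothing was erased: the freed records are still there...
leftovers = [r for r in buf if r is not None]

# ...until new appends wrap around and physically overwrite them:
append("txn3")
append("txn4")   # lands on slot 0, destroying txn0
```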
Why this is useful for forensics
Because old journal entries linger on disk (sometimes for minutes or hours before being overwritten), forensic analysts can:
Recover traces of recent file activity, even after files were deleted or modified.
See metadata about files that once existed — their names, inode numbers, timestamps, etc.
Potentially reconstruct who did what recently, even if the current filesystem no longer shows it.
In which inode does Ext3 typically maintain a journal?
In inode 8.
On-disk organization of an ExtX file system.
Organized as sequence of logical blocks.
Blocks are grouped into larger units called block groups.
The first data block, a.k.a. the boot block, is not used by the FS.
What is the superblock?
Block group internals: The superblock
Superblock: contains fundamental info about the file system
E.g., block size, total number of blocks, # of blocks per group…
- The size of the superblock is 1024 bytes
The superblock is replicated in all block groups
- 1st superblock is 1024 bytes past the beginning of the FS
- Copies of the superblock are in the first block of each block group
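A sketch of parsing a few superblock fields, assuming the standard ext2 field offsets (s_inodes_count at 0, s_blocks_count at 4, s_log_block_size at 24, s_blocks_per_group at 32, s_inodes_per_group at 40, s_magic 0xEF53 at 56, all little-endian); the image here is forged just to exercise the parser:

```python
import struct

EXT2_MAGIC = 0xEF53

def parse_superblock(image: bytes):
    """Parse a few superblock fields. The superblock starts 1024 bytes
    into the image; offsets within it (assumed standard ext2 layout):
    inodes=0, blocks=4, log_block_size=24, blocks_per_group=32,
    inodes_per_group=40, magic=56. All fields little-endian."""
    sb = image[1024:1024 + 1024]
    magic = struct.unpack_from("<H", sb, 56)[0]
    assert magic == EXT2_MAGIC, "not an ExtX file system"
    log_bs = struct.unpack_from("<I", sb, 24)[0]
    return {
        "inodes": struct.unpack_from("<I", sb, 0)[0],
        "blocks": struct.unpack_from("<I", sb, 4)[0],
        "block_size": 1024 << log_bs,      # 0 -> 1K, 1 -> 2K, 2 -> 4K
        "blocks_per_group": struct.unpack_from("<I", sb, 32)[0],
        "inodes_per_group": struct.unpack_from("<I", sb, 40)[0],
    }

# Forge a tiny image with just enough superblock to parse:
img = bytearray(2048)
struct.pack_into("<I", img, 1024 + 0, 1280)     # inode count
struct.pack_into("<I", img, 1024 + 4, 5120)     # block count
struct.pack_into("<I", img, 1024 + 24, 2)       # log block size -> 4096
struct.pack_into("<I", img, 1024 + 32, 32768)   # blocks per group
struct.pack_into("<I", img, 1024 + 40, 1280)    # inodes per group
struct.pack_into("<H", img, 1024 + 56, EXT2_MAGIC)
sb = parse_superblock(bytes(img))
```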
What is the group descriptor table?
Block group internals: Group descriptor table
Group descriptor table: array of descriptors for all block groups
Each copy of the descriptor table contains info about all block groups
Provides the location of the inode bitmap and inode table, block bitmap, number of free blocks and inodes, etc.
The size of the descriptor table depends on how many groups are defined
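That dependence is just a ceiling division: one descriptor per group, and the group count follows from the superblock's totals. A one-function sketch:

```python
import math

def group_count(blocks_count: int, blocks_per_group: int) -> int:
    # One descriptor per block group, so this also sizes the table
    # (multiply by the per-descriptor size, 32 bytes in classic ext2).
    return math.ceil(blocks_count / blocks_per_group)

n = group_count(blocks_count=262144, blocks_per_group=32768)
```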
Block group internals: Block bitmap & Inode bitmap
Block bitmap: monitors the state of each data block
Inode bitmap: monitors the state of each inode
Block group internals: Inode table & data blocks
Inode table: contains table of inode data structures.
Each inode contains the information about a single file on the system
Data blocks: contain chunks of data that belong to files or directories
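Given the superblock's inodes_per_group value, locating an inode in these tables is simple arithmetic. A sketch, assuming the classic 128-byte ext2 inode record:

```python
def locate_inode(inode_no: int, inodes_per_group: int, inode_size: int = 128):
    """Map a 1-based inode number to (block group, byte offset inside
    that group's inode table). Assumes the classic 128-byte inode."""
    index = inode_no - 1                # inode numbering starts at 1
    group = index // inodes_per_group
    local = index % inodes_per_group
    return group, local * inode_size

# E.g., the Ext3 journal inode (inode 8) with 1280 inodes per group:
g, off = locate_inode(inode_no=8, inodes_per_group=1280)
```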
What is TSK?
TSK stands for The Sleuth Kit — it’s one of the most widely used digital forensics toolkits in the world.
TSK is an open-source collection of command-line forensic tools that let investigators:
Analyze disk images (like .dd, .img, .E01, etc.)
Examine file systems (FAT, NTFS, Ext2/3/4, etc.)
Recover deleted files
Inspect metadata, inodes, partitions, and journals
It was created by Brian Carrier, and it’s the backend used by Autopsy, the popular graphical forensic interface.
In ExtX, evidence in this content category includes …?
In ExtX, evidence in this category includes: blocks and block bitmap.
What does the command dcat do, and where is it from?
dcat dumps the content of a given block (e.g., block 1) from an Ext3 FS image.
It’s from TSK.
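(In newer TSK releases the same tool appears under the name blkcat.) What it does is essentially a seek-and-read, since blocks are addressed linearly; a sketch against a fake in-memory image:

```python
def read_block(image: bytes, block_no: int, block_size: int = 4096) -> bytes:
    """The essence of dcat/blkcat: blocks are addressed linearly, so
    block N lives at byte offset N * block_size in the image."""
    off = block_no * block_size
    return image[off: off + block_size]

# 16 KB fake image = 4 blocks of 4 KB:
img = bytes(range(256)) * 64
blk1 = read_block(img, 1)
```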
What is Data Unit viewing?
In digital forensics, data unit viewing refers to the ability to examine the raw contents of a storage device at the smallest addressable unit level — such as sectors, clusters, or blocks — without any interpretation by the file system.
So, instead of seeing “files” or “folders”, you’re looking directly at the binary (or hexadecimal) data that lives in those low-level units on disk.
“Data unit” = smallest chunk of readable data
Depending on the context:
On a hard disk → a sector (usually 512 bytes or 4096 bytes)
On a file system → a block or cluster (can be 1 KB, 4 KB, etc.)
On an SSD or flash → a page
So, a “data unit” is basically the smallest chunk the system can read or write at once.
Logical file system-level searching.
We know what to look for, but not the place: a logical file
system search looks in each DU for specific values
- E.g., search for “forensics” or a specific file header value
In logical file system-level searching, if the value is located in two non-consecutive DUs of a fragmented file, will search find it?
No: the search examines each DU independently, so a value that straddles two non-consecutive DUs of a fragmented file will not be matched.
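A small demonstration of that failure mode, using a toy 16-byte data unit and hand-built "images" (contiguous vs. fragmented placement of the keyword):

```python
# Why a per-DU logical search misses values that straddle two
# non-consecutive data units of a fragmented file.

DU = 16  # toy data-unit size

def logical_search(image: bytes, needle: bytes):
    """Search each data unit independently (no file reassembly)."""
    hits = []
    for i in range(0, len(image), DU):
        if needle in image[i:i + DU]:
            hits.append(i // DU)
    return hits

# "forensics" stored contiguously inside DU 0 is found:
contiguous = b"..forensics....." + bytes(16)
# The same word split as "foren" in DU 0 and "sics" in DU 2
# (a fragmented file) is invisible to the per-DU search:
fragmented = b"...........foren" + bytes(16) + b"sics" + bytes(12)

found = logical_search(contiguous, b"forensics")
missed = logical_search(fragmented, b"forensics")
```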
Unallocated data unit searching.
If we do not know the location of evidence, but we know that it
is unallocated, we can focus our attention there.
Some tools can extract all unallocated DUs to a separate file;
others restrict analysis to only the unallocated areas.
The definition of unallocated space may vary: you need to
know what your analysis tool considers unallocated data.
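A sketch of the simplest possible definition, "unallocated" = bit clear in the block bitmap (as the note above says, real tools may count reserved or slack areas differently):

```python
def unallocated_blocks(block_bitmap: bytes, total_blocks: int):
    """Return block numbers whose bitmap bit is 0. NB: this is one
    narrow definition of 'unallocated'; real tools may differ."""
    free = []
    for n in range(total_blocks):
        byte, bit = divmod(n, 8)
        if not (block_bitmap[byte] >> bit) & 1:
            free.append(n)
    return free

# One bitmap byte: blocks 1 and 3 are unallocated.
free = unallocated_blocks(bytes([0b11110101]), 8)
```

Extracting those DUs to a separate file is then just concatenating `read_block`-style reads of each number in the list.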
Consistency checking.
Consistency checks allow us to determine if the file system is in a suspicious state.
Example: orphans and double metadata entries
Orphaned inodes (orphans)
An orphan is an inode that’s marked as allocated (in use),
but no directory entry points to it.
In other words, the data still exists on disk, but the file has no name anymore.
(This often happens when a file is deleted but not fully cleaned up, or after a crash.)
Double metadata entries
A data block or inode is referenced by more than one file, even though the file system doesn’t allow that (except for hard links).
This means two files claim to own the same physical blocks — a serious inconsistency.
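Both checks reduce to cross-referencing structures, a toy sketch over plain dicts (an inode bitmap, a name-to-inode map, and per-file block claims; real checkers like fsck walk the actual on-disk structures):

```python
# Toy consistency checks:
#   inode_bitmap: inode number -> allocated?
#   dirents:      file name    -> inode it points to
#   block_refs:   file name    -> set of blocks it claims

def find_orphans(inode_bitmap, dirents):
    """Allocated inodes that no directory entry references."""
    referenced = set(dirents.values())
    return sorted(i for i, used in inode_bitmap.items()
                  if used and i not in referenced)

def find_double_refs(block_refs):
    """Blocks claimed by more than one file (illegal outside hard
    links, which share one inode rather than one block)."""
    owners = {}
    for f, blocks in block_refs.items():
        for b in blocks:
            owners.setdefault(b, []).append(f)
    return {b: fs for b, fs in owners.items() if len(fs) > 1}

orphans = find_orphans({11: True, 12: True, 13: False},
                       {"a.txt": 11})          # inode 12 has no name
doubles = find_double_refs({"a.txt": {100, 101},
                            "b.txt": {101, 102}})  # both claim block 101
```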
Another check examines the data units that are listed as damaged but are filled with non-zero data.
“Listed as damaged”
Some blocks are marked in the block bitmap (or in a separate list) as damaged (bad blocks), meaning the operating system considers them off-limits, typically because of physical errors on the disk.
“Filled with non-zero data”
This is the interesting part:
Even if a block is marked as “damaged”, it may still contain valid data.
A consistency check can read these blocks to see whether:
They are entirely zero (as expected for blocks that are genuinely inactive or empty), or
They hold non-zero data, which may indicate that:
The block was wrongly marked as damaged; or
Someone deliberately hid data there (e.g., data hiding in “bad block” areas); or
The disk is corrupted and the system never cleaned it up.
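That check is a simple scan: read every block on the bad-block list and flag any that are not all zeros. A sketch over an in-memory image with one byte "hidden" in a supposedly bad block:

```python
def suspicious_bad_blocks(image: bytes, bad_blocks, block_size: int = 4096):
    """Flag blocks that are listed as damaged but hold non-zero data:
    possible mislabeling, deliberate data hiding, or corruption."""
    flagged = []
    for n in bad_blocks:
        data = image[n * block_size:(n + 1) * block_size]
        if any(data):                 # any non-zero byte is suspicious
            flagged.append(n)
    return flagged

img = bytearray(3 * 4096)             # 3 blocks, all zeroed
img[2 * 4096] = 0x41                  # hide one byte in "bad" block 2
flagged = suspicious_bad_blocks(bytes(img), bad_blocks=[1, 2])
```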