Composefs achieves whole-filesystem integrity verification through image sealing: a single cryptographic digest authenticates an entire filesystem, covering both file contents and metadata (directory structure, permissions, ownership, symlinks, and xattrs).

Background: the problem composefs solves

fs-verity can verify the contents of individual files, but it cannot verify filesystem metadata: directory structure, permissions, ownership, symlink targets, or xattrs. dm-verity can verify entire block devices, but requires a fixed partition layout and prevents content deduplication across images.

Composefs bridges this gap by combining three Linux kernel subsystems:

  • EROFS stores the filesystem metadata (directory tree, inodes, xattrs) as a compact read-only image.
  • overlayfs composes the EROFS metadata with a content-addressed object store that holds actual file data.
  • fs-verity provides per-file integrity verification on both the EROFS image itself and every content file.

The result is dm-verity-like whole-filesystem integrity with the flexibility of a content-addressed store (deduplication, sharing objects across images and stores).

The mount chain

At runtime a composefs mount is a three-layer stack:

User-visible filesystem (overlayfs)
    |
    |-- Lower layer (metadata): EROFS mount of the .cfs image
    |     Contains: directory structure, permissions, xattrs,
    |     trusted.overlay.redirect and trusted.overlay.metacopy
    |
    '-- Data-only lower layer (content): objects/ directory
          Contains: content-addressed files named by fs-verity digest
          Each file has fs-verity enabled

The EROFS image provides the directory tree. Every regular file in the EROFS image is a metacopy entry. It carries metadata (mode, owner, timestamps) and two overlayfs xattrs:

  • trusted.overlay.redirect: the path to the backing file in the objects directory (e.g., /ca/fe7b3a...)
  • trusted.overlay.metacopy: marks the entry as metadata-only and contains the expected fs-verity digest of the backing file

The objects directory is mounted as an overlayfs data-only lower layer (the :: double-colon separator in lowerdir, available since kernel 6.5). Data-only layers are invisible in directory listings; only the EROFS layer’s directory structure appears in the merged view.

How overlayfs works

overlayfs is a union filesystem that merges multiple directory trees into a single view. In a typical container setup, overlayfs stacks a writable upper layer on top of one or more read-only lower layers. When a process reads a file, overlayfs looks in the upper layer first, then falls through to lower layers.

Composefs uses overlayfs in a specialized configuration with no upper layer, so the operating system mount is entirely read-only. The EROFS image serves as the metadata lower layer and the objects directory serves as a data-only lower layer.

overlayfs mount options

The composefs mount configures overlayfs with:

Option         Value     Purpose
metacopy       on        EROFS provides metadata only; data comes from the data layer
redirect_dir   on        Enables trusted.overlay.redirect xattrs for path indirection
verity         require   Every metacopy file must have a fs-verity digest that matches the backing file (kernel 6.6+)

metacopy=on

Normally overlayfs expects each file in a lower layer to contain actual data. With metacopy=on, a file can be metadata-only: it carries permissions, ownership, timestamps, and xattrs, but has zero data bytes. The actual data comes from a different layer.

In composefs, every regular file in the EROFS image is a metacopy file. The EROFS image stores the file’s metadata and a trusted.overlay.metacopy xattr. This xattr has two roles: it marks the file as metadata-only, and it contains the expected fs-verity digest of the backing file (used when verity=on or verity=require).

redirect_dir=on

Normally overlayfs resolves files by matching paths between layers. A file at /usr/bin/ls in the metadata layer would look for /usr/bin/ls in the data layer. With redirect_dir=on, overlayfs reads a trusted.overlay.redirect xattr to find the actual path of the backing file in the data layer.

In composefs, the redirect points to the content-addressed path. For example, a file at /usr/bin/ls in the EROFS image might have:

trusted.overlay.redirect = 5f/a77873a21bd49d3c7e78bc6a350e959...

This tells overlayfs to fetch the data from objects/5f/a77873a21bd49d3c7e78bc6a350e959... instead of looking for usr/bin/ls in the objects directory. This indirection is what makes content addressing possible: the same object file can back multiple filesystem paths (deduplication), and files with identical content across images share a single copy on disk.
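As a minimal sketch of the mapping described above, the following splits a hex digest into the two-character directory and the remaining characters, matching the objects/<xx>/<remaining> layout. The function name object_path and the example digest are hypothetical and exist only for illustration; they are not part of any composefs API.

```python
HEX_DIGITS = set("0123456789abcdef")


def object_path(digest_hex: str) -> str:
    """Return the object's path relative to objects/: '<xx>/<remaining>'."""
    digest_hex = digest_hex.lower()
    # 64 hex chars for SHA-256, 128 for SHA-512.
    if len(digest_hex) not in (64, 128) or not set(digest_hex) <= HEX_DIGITS:
        raise ValueError("expected a SHA-256 or SHA-512 hex digest")
    return f"{digest_hex[:2]}/{digest_hex[2:]}"


# Hypothetical digest, used only to show the shape of the result.
example = "5f" + "0" * 62

# Two paths whose metacopy entries carry the same digest share one object.
tree = {"/usr/bin/ls": example, "/usr/bin/dir": example}
backing = {path: object_path(digest) for path, digest in tree.items()}
```

Because the object name depends only on the digest, both entries in backing resolve to the same file, which is the deduplication property described above.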

verity=require

This option controls how overlayfs handles fs-verity digests stored in the trusted.overlay.metacopy xattr. The three possible values are:

  • verity=off (default): No digest checking. Metacopy xattrs are used only for the metadata-only marker; any embedded digest is ignored.
  • verity=on: If a metacopy xattr contains a digest, verify it against the backing file’s actual fs-verity measurement. Files without digests in their xattr are allowed through.
  • verity=require: All metacopy files must have a digest in their xattr, and the digest must match the backing file’s fs-verity measurement. Any mismatch or missing digest produces EIO.

Composefs uses verity=require in secure mode. This ensures that every file access is cryptographically verified: the EROFS image (trusted via its own fs-verity) specifies what digest each file must have, and the kernel enforces it.
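The three modes can be summarized as a small decision function. This is a userspace model of the rules listed above, not the kernel's code; check_open and its parameters are hypothetical names chosen for illustration.

```python
import errno
from typing import Optional


def check_open(mode: str, expected: Optional[bytes], measured: Optional[bytes]) -> None:
    """Model the verity= policy for one metacopy file.

    expected: digest stored in trusted.overlay.metacopy, or None if absent.
    measured: fs-verity digest of the backing file, or None if the backing
              file does not have fs-verity enabled.
    Raises OSError(EIO) where the rules above say the open must fail.
    """
    if mode not in ("off", "on", "require"):
        raise ValueError(f"unknown verity mode: {mode}")
    if mode == "off":
        return  # any embedded digest is ignored
    if expected is None:
        if mode == "require":
            raise OSError(errno.EIO, "metacopy xattr carries no digest")
        return  # verity=on lets files without a digest through
    if measured is None or measured != expected:
        raise OSError(errno.EIO, "fs-verity digest mismatch")
```

For example, check_open("on", None, None) returns normally, while the same call with mode "require" raises an OSError whose errno is EIO.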

Data-only lower layers

Traditionally, overlayfs lower layers contribute both directory listings and file content to the merged view. Since kernel 6.5, overlayfs supports data-only lower layers, specified with :: in the lowerdir option. Since kernel 6.8 they can also be added with the datadir+ option through fsconfig (for example with fsconfig_set_fd). A data-only layer:

  • Is invisible in directory listings; its filenames never appear in the merged view.
  • Only provides file content when referenced by a redirect xattr from a metadata layer above it.
  • Cannot have regular lower layers below it.

In composefs, the objects directory is a data-only layer. Its flat hash-based structure (00/, 01/, …, ff/) is never visible to users. Only the EROFS metadata layer’s directory tree (with familiar paths like /usr/bin/ls) appears in the final mount.

With verity=require, opening any file whose backing object is missing fs-verity or whose digest does not match the xattr produces EIO.
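To make the combination of options concrete, here is a sketch that builds the equivalent comma-separated option string, as it would be passed to mount -t overlay -o. The helper name overlay_options and the paths are assumptions for illustration; composefs itself assembles the mount through the new mount API rather than through an option string.

```python
def overlay_options(erofs_mountpoint: str, objects_dir: str, secure: bool = True) -> str:
    """Build an overlayfs option string with one metadata layer and one data-only layer."""
    for path in (erofs_mountpoint, objects_dir):
        # Colons and commas are separators in this syntax; this sketch
        # rejects them instead of implementing escaping.
        if ":" in path or "," in path:
            raise ValueError(f"unsupported character in path: {path!r}")
    options = [
        # '::' separates the regular lower layer from the data-only layer.
        f"lowerdir={erofs_mountpoint}::{objects_dir}",
        "metacopy=on",
        "redirect_dir=on",
    ]
    if secure:
        options.append("verity=require")
    return ",".join(options)


# Hypothetical paths, for illustration only.
print(overlay_options("/run/erofs", "/sysroot/composefs/objects"))
```

There is no upperdir or workdir entry, which is what makes the resulting mount read-only.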

The chain of trust

Sealing creates a trust chain from a single digest to every byte in the filesystem:

Trusted root (kernel cmdline, UKI, or mount option)
  |
  |  contains: composefs=<EROFS-image-fsverity-digest>
  v
EROFS image file
  |  fs-verity digest matches the trusted root
  |  contains: full directory tree + per-file verity digests
  |            in trusted.overlay.metacopy xattrs
  v
Individual content files in objects/
     fs-verity digest of each file matches the digest
     recorded in the EROFS metadata
     |
     v
  Per-page verification by the kernel's fs-verity subsystem
     Every 4K block is verified against the Merkle tree
     on each page-in

Concretely:

  1. A single fs-verity digest authenticates the EROFS image. Because fs-verity covers the entire file content and commits to the file size, any modification to the EROFS binary changes the digest.

  2. The EROFS image contains the complete filesystem metadata including a fs-verity digest for every regular file (in the trusted.overlay.metacopy xattr). Because the EROFS image is authenticated, these per-file digests are also trusted.

  3. When a file is opened, overlayfs reads the expected digest from the EROFS xattr, obtains the fs-verity digest of the backing object file (the same value that FS_IOC_MEASURE_VERITY reports to userspace), and compares the two. A mismatch produces EIO.

  4. Whenever a data page is subsequently read from disk into the page cache, the kernel’s fs-verity subsystem verifies the data block against the file’s Merkle tree.

This means that neither metadata nor data can be tampered with without producing I/O errors at runtime.
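The chain can be illustrated with a toy model. The sketch below makes loud simplifications: a plain SHA-256 of the bytes stands in for an fs-verity digest, a JSON document stands in for the EROFS image, and a dict stands in for objects/. All names are hypothetical. It shows only how one seal value transitively covers every file.

```python
import errno
import hashlib
import json
from typing import Dict, Tuple


def digest(data: bytes) -> str:
    # Stand-in for an fs-verity digest (the real one hashes a descriptor
    # that commits to a Merkle tree root, not the raw bytes).
    return hashlib.sha256(data).hexdigest()


def build(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, bytes], str]:
    """Return (image, objects, seal) for a path -> content mapping."""
    objects = {digest(content): content for content in files.values()}
    # The "image" records the expected digest of every file.
    image = json.dumps({p: digest(c) for p, c in files.items()}, sort_keys=True).encode()
    return image, objects, digest(image)


def read_file(seal: str, image: bytes, objects: Dict[str, bytes], path: str) -> bytes:
    if digest(image) != seal:  # the trusted root authenticates the image
        raise OSError(errno.EIO, "image does not match the seal")
    expected = json.loads(image)[path]  # the image authenticates each object
    data = objects[expected]
    if digest(data) != expected:
        raise OSError(errno.EIO, "object does not match the recorded digest")
    return data
```

Changing a byte in either the image or an object makes read_file raise an error with errno EIO, mirroring the behaviour described above.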

fs-verity internals

Merkle tree

fs-verity divides a file into blocks (typically 4096 bytes) and builds a Merkle tree:

  • Level 0: hash of each data block (zero-padded for the last block).
  • Level 1: hash of each group of level-0 hashes that fits in one block.
  • Continues until a single root hash remains.

For SHA-256 with 4K blocks, each block holds 128 hashes (4096 / 32), so the tree overhead converges to ~1/127 of the file size.
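The construction above can be sketched in a few lines of Python. This is an illustrative, unsalted model written from the description here and the kernel's fs-verity documentation, not code taken from composefs. The function names are hypothetical, and the special cases (all-zero root for an empty file, root equal to the block hash for a single-block file) follow that documentation as I understand it.

```python
import hashlib
from typing import List


def merkle_root(data: bytes, block_size: int = 4096, alg: str = "sha256") -> bytes:
    """Compute an unsalted fs-verity style Merkle tree root hash."""
    if not data:
        # Empty file: the root hash is defined as all zeros.
        return bytes(hashlib.new(alg).digest_size)

    def hash_blocks(buf: bytes) -> List[bytes]:
        hashes = []
        for offset in range(0, len(buf), block_size):
            # The last block of each level is zero-padded to a full block.
            block = buf[offset:offset + block_size].ljust(block_size, b"\0")
            hashes.append(hashlib.new(alg, block).digest())
        return hashes

    level = hash_blocks(data)  # level 0: one hash per data block
    while len(level) > 1:
        level = hash_blocks(b"".join(level))  # pack hashes into blocks, hash again
    return level[0]


def tree_blocks(file_size: int, block_size: int = 4096, digest_size: int = 32) -> int:
    """Number of Merkle tree blocks needed for a file of the given size."""
    hashes_per_block = block_size // digest_size
    blocks = -(-file_size // block_size)  # ceiling division
    total = 0
    while blocks > 1:
        blocks = -(-blocks // hashes_per_block)
        total += blocks
    return total
```

For a 1 GiB file, tree_blocks gives 2048 + 16 + 1 = 2065 tree blocks for 262144 data blocks, a ratio of about 0.0079, which is close to 1/127.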

File digest

The fs-verity file digest is not just the Merkle root hash. It is the hash of a fsverity_descriptor structure that commits to:

  • Version (1)
  • Hash algorithm (SHA-256 = 1, SHA-512 = 2)
  • log2(block_size) (12 for 4096)
  • Original file size
  • Merkle tree root hash

This prevents substitution attacks where an attacker replaces a file with a different one that happens to have the same root hash under different parameters.
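A sketch of that final step follows, assuming the 256-byte struct fsverity_descriptor layout from the kernel's fs-verity documentation, which also contains salt and reserved fields (left as zeros here, i.e. no salt). The function name fsverity_digest is hypothetical, and the result should be checked against a real tool before being relied on.

```python
import hashlib
import struct

# Algorithm numbers as listed above.
HASH_ALGS = {"sha256": 1, "sha512": 2}


def fsverity_digest(root_hash: bytes, data_size: int,
                    alg: str = "sha256", block_size: int = 4096) -> bytes:
    """Hash a descriptor that commits to the parameters and the Merkle root."""
    log_blocksize = block_size.bit_length() - 1  # 12 for 4096
    descriptor = struct.pack(
        "<BBBBIQ64s32s144s",
        1,                # version
        HASH_ALGS[alg],   # hash algorithm
        log_blocksize,    # log2(block_size)
        0,                # salt size (assumption: unsalted)
        0,                # reserved
        data_size,        # original file size
        root_hash,        # Merkle root, zero-padded to 64 bytes
        b"",              # salt, zero-padded to 32 bytes
        b"",              # reserved, 144 zero bytes
    )
    assert len(descriptor) == 256
    return hashlib.new(alg, descriptor).digest()


# An empty file has an all-zero root hash and size 0.
empty_digest = fsverity_digest(bytes(32), 0)
```

Because the file size and block size are inside the hashed structure, the same root hash yields a different file digest when either parameter changes.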

Ioctls

FS_IOC_ENABLE_VERITY (_IOW('f', 133, ...)): Enables fs-verity on a file. The file must be opened read-only with no other writable descriptors. The kernel builds the full Merkle tree and marks the file as permanently read-only for data. This can be slow on large files.

FS_IOC_MEASURE_VERITY (_IOWR('f', 134, ...)): Returns the pre-computed fs-verity digest. Executes in constant time regardless of file size.
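For reference, the request numbers can be derived from the macros above. The sketch assumes the generic Linux ioctl encoding (used on x86 and ARM, among others; a few architectures differ) and the argument sizes of struct fsverity_enable_arg (128 bytes) and the fixed header of struct fsverity_digest (4 bytes). measure_verity is a hypothetical helper name.

```python
import fcntl
import struct

_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction: int, type_char: str, nr: int, size: int) -> int:
    # Generic encoding: direction in bits 30-31, size in 16-29, type in 8-15, nr in 0-7.
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


FS_IOC_ENABLE_VERITY = _ioc(_IOC_WRITE, "f", 133, 128)
FS_IOC_MEASURE_VERITY = _ioc(_IOC_READ | _IOC_WRITE, "f", 134, 4)


def measure_verity(fd: int, max_digest_size: int = 64) -> bytes:
    """Return the fs-verity digest of an open file; raises OSError if unavailable."""
    # Header: u16 algorithm (filled in by the kernel), u16 buffer size,
    # followed by space for the digest itself.
    buf = bytearray(struct.pack("=HH", 0, max_digest_size) + bytes(max_digest_size))
    fcntl.ioctl(fd, FS_IOC_MEASURE_VERITY, buf)
    _algorithm, size = struct.unpack_from("=HH", buf)
    return bytes(buf[4:4 + size])
```

Calling measure_verity on a file without fs-verity enabled, or on a filesystem that does not support it, raises OSError rather than returning a digest.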

Repository structure

A composefs repository stores all data content-addressed by fs-verity digest:

<repo>/
  meta.json              <- repository metadata (algorithm, format version)
  objects/
    <xx>/<remaining>     <- content files, named by fs-verity digest
  images/
    <hex-digest>         <- symlink to ../objects/... (EROFS images)
    refs/
      <name>             <- human-readable image refs

fs-verity is enabled on meta.json itself to signal that the repository operates in secure mode. In secure mode, every object written to objects/ has fs-verity enabled via FS_IOC_ENABLE_VERITY before being linked into place.

EROFS image creation

When a container image is imported into a composefs repository:

  1. Each tar layer is converted to a splitstream, a binary format that stores tar headers and small files inline (zstd-compressed) while referencing large file content as external objects in the object store.

  2. All layers are replayed in order to build a complete filesystem tree, applying whiteouts (file deletions and opaque directory markers).

  3. The tree is serialized to an EROFS binary image. The EROFS image contains overlayfs xattrs (trusted.overlay.redirect and trusted.overlay.metacopy with the fs-verity digest) on every regular file inode.

  4. The EROFS image is stored as an object in the repository. Because it is written like any other object, fs-verity is enabled on the image file itself in secure mode.

The fs-verity digest of the EROFS image is the seal digest, the single value that authenticates the entire filesystem.
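Step 2 of the import, replaying layers and applying whiteouts, can be sketched as follows. The model is deliberately small: each layer is a dict from relative file path to content, and directories are implicit. It uses the OCI layer conventions, where a .wh.<name> entry deletes <name> and a .wh..wh..opq entry makes its directory opaque. The function names are hypothetical.

```python
import posixpath
from typing import Dict, List

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def remove_under(tree: Dict[str, bytes], directory: str) -> None:
    """Delete every entry below the given directory."""
    prefix = directory.rstrip("/") + "/" if directory else ""
    for path in [p for p in tree if p.startswith(prefix)]:
        del tree[path]


def flatten(layers: List[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Replay layers from bottom to top into one file tree."""
    tree: Dict[str, bytes] = {}
    for layer in layers:
        # Whiteouts only affect lower layers, so apply them before
        # adding this layer's own files.
        for path in layer:
            parent, name = posixpath.split(path)
            if name == OPAQUE_MARKER:
                remove_under(tree, parent)
            elif name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
                tree.pop(target, None)      # the target may be a file...
                remove_under(tree, target)  # ...or a directory
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                tree[path] = content
    return tree
```

The whiteout entries themselves never reach the flattened tree; only their effect on lower layers does.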

Boot image variant

For bootable images, the filesystem is transformed before EROFS generation:

  1. /boot is emptied (to break the circular dependency: the UKI contains the composefs digest, so the digest must be computed without the UKI present).
  2. /sysroot is emptied.
  3. SELinux labels are applied from the image’s policy.
  4. The tree is compacted (orphaned empty directories are removed).

Runtime verification

At mount time, the composefs repository:

  1. Opens the EROFS image file by its hex digest name.
  2. Calls FS_IOC_MEASURE_VERITY on the file and compares the measured digest to the expected one.
  3. If the digests match, assembles the overlayfs mount with verity=require so that every file access is verified.
  4. In insecure mode (the ? prefix on the digest), skips the verity check and mounts without verity=require.

The mount is assembled using the new mount API (fsopen, fsconfig, fsmount, move_mount) and looks like:

1. Mount EROFS image read-only from the image file descriptor

2. Configure overlayfs:
   - source = "composefs:<name>"
   - metacopy = on
   - redirect_dir = on
   - verity = require        (only in secure mode)
   - lower layer = EROFS mount
   - data layer = objects/ directory

3. Attach the composed mount to the filesystem tree

The boot flow

At boot time, the composefs root filesystem is mounted in the initramfs before the system transitions to it.

Bootloader
  |  Loads UKI or kernel+initramfs
  |  Kernel cmdline contains: composefs=<hex-digest>
  v
Initramfs
  |  composefs-setup-root reads /proc/cmdline
  |  Opens repository at /sysroot/composefs
  |  Mounts EROFS image by digest with verity verification
  |  Sets up /etc overlay and /var bind mount
  |  Moves composed root to /sysroot
  v
systemd switches root to /sysroot

Step by step

  1. Read config: Parse /usr/lib/composefs/setup-root-conf.toml for mount configuration (etc, var, root mount types).

  2. Parse kernel cmdline: Read /proc/cmdline and extract the composefs= parameter. The value is a hex digest, optionally prefixed with ? for insecure mode:

    • composefs=<64-hex-chars>: SHA-256, strict mode
    • composefs=<128-hex-chars>: SHA-512, strict mode
    • composefs=?<hex>: insecure mode (no fs-verity enforcement)

  3. Open repository: Open the composefs repository at /sysroot/composefs. In strict mode, verify that meta.json has fs-verity enabled.

  4. Mount image: Open the EROFS image by digest, verify its fs-verity digest matches, and assemble the EROFS + overlayfs mount with verity=require.

  5. Set up state mounts: Create overlay or bind mounts for /etc and /var using the deployment state directory.

  6. Replace sysroot: Move the composed filesystem to /sysroot for systemd to pivot into.
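The cmdline parsing in step 2 can be sketched as below. It is a simplified model that splits on whitespace and ignores kernel cmdline quoting; ComposefsArg and parse_cmdline are hypothetical names, not the actual composefs-setup-root implementation.

```python
from typing import NamedTuple, Optional

HEX_DIGITS = set("0123456789abcdefABCDEF")


class ComposefsArg(NamedTuple):
    digest: str      # lower-case hex digest of the EROFS image
    algorithm: str   # "sha256" or "sha512", inferred from the length
    insecure: bool   # True when the value was prefixed with '?'


def parse_cmdline(cmdline: str) -> Optional[ComposefsArg]:
    """Extract the composefs= parameter, or return None if it is absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "composefs" or not sep:
            continue
        insecure = value.startswith("?")
        digest = value[1:] if insecure else value
        algorithm = {64: "sha256", 128: "sha512"}.get(len(digest))
        if algorithm is None or not set(digest) <= HEX_DIGITS:
            raise ValueError("composefs= value is not a SHA-256 or SHA-512 hex digest")
        return ComposefsArg(digest.lower(), algorithm, insecure)
    return None
```

Inferring the algorithm from the digest length keeps the parameter a single value, at the cost of rejecting any digest that is not exactly 64 or 128 hex characters.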

Integration with bootc

bootc is the primary consumer of composefs for bootable container images.

Image build workflow

bootc uses a multi-stage build process to create sealed images:

Stage 1: Build base image (everything except /boot)
  |
  v
Stage 2: Compute composefs digest
  |  Pull base image into composefs repo
  |  Flatten layers into a filesystem tree
  |  Transform for boot (empty /boot, /sysroot, apply SELinux)
  |  Generate EROFS binary
  |  Compute fs-verity digest = seal digest
  |
  v
Stage 3: Build final image with UKI
  |  Write composefs=<digest> to /etc/kernel/cmdline
  |  kernel-install generates UKI with embedded cmdline
  |  Set label containers.composefs.fsverity=<digest>
  |
  v
Signed, sealed OCI image

The circular dependency is broken by emptying /boot before computing the digest. The UKI (which lives in /boot) embeds the digest, but the digest is computed without the UKI present.

The containers.composefs.fsverity label

This OCI config label records the expected fs-verity digest of the sealed EROFS image. It is set at image build time and can be verified against the actual EROFS image in the repository.

The label is placed on the config (not the manifest) because the config represents container identity. The format is a hex digest string.
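Checking the label against a measured digest might look like the sketch below. It assumes the standard OCI image config layout, where labels live under config.Labels; the helper names are hypothetical, and the measured digest is expected to come from elsewhere, such as an FS_IOC_MEASURE_VERITY call on the stored EROFS image.

```python
import json
from typing import Optional

LABEL = "containers.composefs.fsverity"


def expected_seal(config_json: str) -> Optional[str]:
    """Return the seal digest recorded in an OCI image config, if any."""
    config = json.loads(config_json)
    labels = (config.get("config") or {}).get("Labels") or {}
    return labels.get(LABEL)


def seal_matches(config_json: str, measured_hex: str) -> bool:
    """Compare the label with the digest measured on the EROFS image."""
    expected = expected_seal(config_json)
    return expected is not None and expected.lower() == measured_hex.lower()
```

An image without the label is treated as not matching, so an unsealed image cannot pass the check by omission.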