Gindlesperger

A single drive in my btrfs media library failed

I have a btrfs filesystem spanning five 2TB drives. The drives arrived at different times, with long delays between deliveries, so I did not start the project with all of them.

Because this was being used as a Jellyfin media library, I opted not to use the btrfs RAID-like features. This means I got all 10TB of space, but I also had no redundancy.

btrfs provides a way to store data across many different physical drives, even when they vary in size. Metadata is duplicated across all the drives, and data is allocated to the drives in a way that “balances” the data across all the drives in chunks, usually about 1GB each.

So, what happens when you lose a drive?

The filesystem will still work, but not really in a way that can be used. My original thinking was that I would lose some movies or TV shows, which does seem to be partly the case, but in general the system cannot meaningfully access the filesystem.
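To get a feel for why "only some movies" was optimistic, here is a rough back-of-the-envelope sketch. It assumes each roughly 1GiB data chunk of a file lands independently on a random drive. That is a deliberate simplification, not how the btrfs allocator actually places data, and it ignores metadata damage entirely. The function names are made up for illustration.

```python
import math

CHUNK_GIB = 1  # assumed data chunk size, per the "about 1GB" figure above


def chunks_for(file_gib):
    """Number of chunks a file touches in this simplified model (at least one)."""
    return max(1, math.ceil(file_gib / CHUNK_GIB))


def survival_probability(file_gib, drives=5):
    """Chance that none of a file's chunks sat on the one failed drive.

    Assumes every chunk is placed independently and uniformly at random,
    which is an assumption of this sketch, not real btrfs behaviour.
    """
    k = chunks_for(file_gib)
    return ((drives - 1) / drives) ** k


if __name__ == "__main__":
    for size in (0.5, 4, 20):
        p = survival_probability(size)
        print(f"{size:>5} GiB file: {p:.0%} chance of being fully intact")
```

Under these assumptions, small files mostly survive, while a large video file spread over many chunks is very likely to have at least one piece on the dead drive.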

Mounting the filesystem took a very long time. I simply ran:

mount -a

This took hours until it eventually came online. mount itself did not report any errors, but dmesg was throwing I/O errors related to the failed disk.

Attempting to list files produces many I/O errors, and many files are missing or have strange metadata, such as ? in place of the permissions.

ls -lh /external-drive/somewhere
total 36K
-rw-rw-r--  1 user user 4.0K Apr  7 17:02 README.md
?????????x 13 user user 4.0K Apr  8 15:13 baz
?????????x  7 user user 4.0K Feb 27 20:11 config
-rwxrwxr-x  1 user user 2.1K Apr  7 20:25 deploy.sh
-rw-rw-r--  1 user user 5.2K Mar 21 19:31 docker-compose.yml
?????????x  7 user user 4.0K Mar 28 19:57 bar
?????????x  9 user user 4.0K Apr  8 15:13 frontend
-rw-rw-r--  1 user user  747 Mar  2 23:29 foo.caddyfile
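If you want an inventory of what is affected rather than eyeballing ls output, a small script can walk the tree and record every entry that cannot be stat'd or listed. This is only a sketch: find_unreadable is a name I made up, and on a filesystem in this state the walk may be very slow.

```python
import os


def find_unreadable(root):
    """Return (path, error message) pairs for entries under root
    that cannot be stat'd or whose directory cannot be listed."""
    problems = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            # Listing the directory can itself fail with an I/O error.
            with os.scandir(current) as it:
                entries = list(it)
        except OSError as exc:
            problems.append((current, exc.strerror))
            continue
        for entry in entries:
            try:
                # lstat-style call; this is what fails for the "?????????" rows.
                entry.stat(follow_symlinks=False)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
            except OSError as exc:
                problems.append((entry.path, exc.strerror))
    return problems


if __name__ == "__main__":
    # The path matches the listing above; adjust it for your own mount point.
    for path, reason in find_unreadable("/external-drive/somewhere"):
        print(f"{path}: {reason}")
```

The output is a list of paths with the error the kernel returned for each, which is a reasonable starting point for deciding what needs to be acquired again.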

This does suck, but not too much. As mentioned, I did not keep anything “important” on that filesystem. All of the media can be acquired again; however, some of it is harder to acquire in its original form. As time goes on, TV shows are often re-released with significant changes to the original content.

Without redundancy, I got the full 10TB of space and could mix and match drive sizes, but a single drive failure was enough to destroy the filesystem.

In theory I would use btrfs in RAID5 or RAID6 mode, but it’s deemed experimental and considered fundamentally broken by many. I would prefer not to use RAID1 for a media library as I would not want to waste half of the space on redundancy.
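The space trade-off between these modes can be worked out with simple arithmetic. The sketch below assumes equal-sized drives and gives nominal figures only; it ignores metadata overhead and the way btrfs handles mixed drive sizes. usable_tb is a made-up helper name.

```python
def usable_tb(profile, drives=5, size_tb=2.0):
    """Nominal usable capacity in TB for equal-sized drives.

    Simplified model: no metadata overhead, no mixed drive sizes.
    """
    total = drives * size_tb
    if profile == "single":
        return total                    # no redundancy, all space usable
    if profile == "raid1":
        return total / 2                # every block stored twice
    if profile == "raid5":
        return (drives - 1) * size_tb   # one drive's worth of parity
    if profile == "raid6":
        return (drives - 2) * size_tb   # two drives' worth of parity
    raise ValueError(f"unknown profile: {profile}")


if __name__ == "__main__":
    for profile in ("single", "raid1", "raid5", "raid6"):
        print(f"{profile:>6}: {usable_tb(profile):.0f} TB usable")
```

For five 2TB drives this works out to 10TB with no redundancy, 5TB for RAID1, 8TB for RAID5 and 6TB for RAID6, which is why the parity modes look attractive for a media library on paper.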

ZFS is a better solution for this, but I wanted something built into the kernel rather than a separate module. ZFS is not too difficult to install and configure, but because it lives outside the kernel tree, kernel upgrades may have to be held back to avoid breaking the ZFS module.

I may also consider repeating this experiment with a similar btrfs setup that does not have redundancy, but with the media library isolated on its own btrfs partition. In this setup, Jellyfin (and other services) were configured to store some other data on the same filesystem.

Contact me on BlueSky if you have any questions or need advice on how to lose your data!