A single drive in my btrfs media library failed
I have a btrfs filesystem made up of five 2TB drives. The drives arrived at different times, with long delays in between, so I did not have all of them when I started the project.
Because this was being used as a Jellyfin media library, I opted not to use the btrfs RAID-like features. This means I got all 10TB of space, but I also had no redundancy.
btrfs provides a way to store data across many different physical drives, even when they vary in size. Metadata is normally kept in two copies (the DUP or RAID1 profile), while data is allocated to the drives in chunks, usually about 1GB each, in a way that “balances” the data across all the drives.
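The allocation described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not btrfs's actual allocator: it assumes the single data profile, 1GB chunks, one file written in order into consecutive chunks, and that each new chunk goes to the drive with the most unallocated space (ties go to the lowest-numbered drive). The function names `allocate_chunks` and `drives_touched` are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of single-profile data allocation (an assumption, not the
# real btrfs code): every new 1GB data chunk is placed on the drive with
# the most unallocated space; ties go to the lowest-numbered drive.
# Usage: allocate_chunks N_CHUNKS SIZE_GB_DRIVE0 SIZE_GB_DRIVE1 ...
# Prints the drive index chosen for each chunk, space-separated.
allocate_chunks() {
  local n_chunks=$1; shift
  local -a free=("$@") placement=()
  local c i best
  for ((c = 0; c < n_chunks; c++)); do
    best=0
    for ((i = 1; i < ${#free[@]}; i++)); do
      if (( free[i] > free[best] )); then best=$i; fi
    done
    if (( free[best] < 1 )); then
      echo "filesystem full" >&2
      return 1
    fi
    free[best]=$(( free[best] - 1 ))
    placement+=("$best")
  done
  echo "${placement[*]}"
}

# Hypothetical helper: which drives hold a file that was written, in order,
# into chunks START .. START+LEN-1? (Assumes the file owns those chunks.)
drives_touched() {
  local start=$1 len=$2; shift 2
  local -a placement=("$@")
  printf '%s\n' "${placement[@]:start:len}" | sort -un | paste -sd ' ' -
}

read -ra chunks <<< "$(allocate_chunks 10 2000 2000 2000 2000 2000)"
echo "${chunks[*]}"                   # 0 1 2 3 4 0 1 2 3 4
drives_touched 0 1 "${chunks[@]}"     # a 1GB file:  0
drives_touched 0 10 "${chunks[@]}"    # a 10GB file: 0 1 2 3 4
```

Under this model a small file sits on one drive, but anything spanning five or more chunks ends up on every drive, which would be consistent with a single failure damaging far more than a fifth of a library full of multi-gigabyte files. The model also shows why drives added at different times fill unevenly: `allocate_chunks 4 500 500 2000` prints `2 2 2 2`, because the newest, emptiest drive receives new chunks first.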
So, what happens when you lose a drive?
The filesystem still works, but not in any way that is actually usable. My original thinking was that I would lose some movies or TV shows, which kind of seems to be the case, but in general the system cannot meaningfully access the filesystem.
Mounting the filesystem took a very long time. I simply ran:
mount -a
This took hours before the filesystem finally came online. mount itself did not report any errors, but dmesg was logging I/O errors related to the failed disk.
Attempting to list the files produces many I/O errors, and many files are missing or have strange metadata, such as ? in the permissions.
ls -lh /external-drive/somewhere
total 36K
-rw-rw-r-- 1 user user 4.0K Apr 7 17:02 README.md
?????????x 13 user user 4.0K Apr 8 15:13 baz
?????????x 7 user user 4.0K Feb 27 20:11 config
-rwxrwxr-x 1 user user 2.1K Apr 7 20:25 deploy.sh
-rw-rw-r-- 1 user user 5.2K Mar 21 19:31 docker-compose.yml
?????????x 7 user user 4.0K Mar 28 19:57 bar
?????????x 9 user user 4.0K Apr 8 15:13 frontend
-rw-rw-r-- 1 user user 747 Mar 2 23:29 foo.caddyfile
This does suck, but not too much. I did not keep anything “important” on that filesystem. All of the media can be acquired again; however, some of it is harder to find in its original form, since TV shows are often re-released over time with significant changes to the original content.
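Before re-acquiring anything, it helps to know which files were actually damaged. One approach, sketched here as an assumption rather than something done for this post, is to read every file back in full: with no data redundancy, a read that needs a chunk on the failed drive should fail with an I/O error. `find_unreadable` is a hypothetical helper; it needs bash, and on a degraded filesystem like this one, reading everything may take a very long time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every regular file under DIR that cannot be
# read back in full. Directories that find itself cannot list are reported
# by find on stderr, so their contents are not checked.
find_unreadable() {
  local dir=$1 f
  find "$dir" -xdev -type f -print0 |
    while IFS= read -r -d '' f; do
      # Read the whole file and throw the data away; only the exit status
      # matters (an I/O error, or any other read failure, makes cat fail).
      if ! cat -- "$f" > /dev/null 2>&1; then
        printf '%s\n' "$f"
      fi
    done
}

# Example: save the list of damaged files for re-acquiring later.
# find_unreadable /external-drive/somewhere > lost-files.txt
```

Reading the data, rather than only checking whether `ls` can stat each entry, matters here: a file can have intact metadata while some of its data chunks lived on the missing drive.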
Without redundancy, I was able to get the full 10TB of space and mix and match drive sizes, but a single drive failure was enough to destroy the filesystem.
In theory, I would use btrfs in RAID5 or RAID6 mode, but those modes are deemed experimental and are considered fundamentally broken by many. I would prefer not to use RAID1 for a media library, as I do not want to spend half of the space on redundancy.
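The trade-off can be made concrete with a little arithmetic. This is a rough estimate under stated assumptions, not output from any btrfs tool: sizes are in GB, RAID1 follows the usual rule for two copies on two different drives, and the RAID5/RAID6 figures assume equal-size drives. `usable_gb` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough usable data capacity (GB) for a btrfs data
# profile. Usage: usable_gb PROFILE SIZE_GB...
# Assumption: raid5/raid6 results are only accurate for equal-size drives.
usable_gb() {
  local profile=$1; shift
  local n=$# total=0 largest=0 s half rest
  if (( n == 0 )); then
    echo "no drives given" >&2
    return 1
  fi
  for s in "$@"; do
    total=$(( total + s ))
    if (( s > largest )); then largest=$s; fi
  done
  case $profile in
    single) echo "$total" ;;                  # one copy, no redundancy
    raid1)
      # Two copies on two different drives: half the raw space, unless the
      # largest drive is bigger than all the others combined.
      half=$(( total / 2 ))
      rest=$(( total - largest ))
      echo $(( half < rest ? half : rest )) ;;
    raid5) echo $(( total * (n - 1) / n )) ;; # one drive's worth of parity
    raid6) echo $(( total * (n - 2) / n )) ;; # two drives' worth of parity
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
}

for p in single raid1 raid5 raid6; do
  echo "$p: $(usable_gb "$p" 2000 2000 2000 2000 2000) GB"
done
```

For five 2TB drives this prints 10000 GB for single, 5000 GB for RAID1, 8000 GB for RAID5 and 6000 GB for RAID6, so RAID1 gives up half the space while RAID5 would give up only one drive's worth. With mixed sizes the RAID1 rule bites harder: `usable_gb raid1 4000 1000 1000` gives 2000, because every second copy of data on the big drive needs room on one of the small ones.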
ZFS is a better solution for this, but I was trying to use something built into the kernel rather than a separate module. ZFS is not too difficult to install and configure, but because it is an out-of-tree module, kernel upgrades may have to be held back to avoid breaking it.
I may also consider repeating this experiment with a similar btrfs setup that does not have redundancy, but with the media library isolated on its own btrfs partition. In the current setup, Jellyfin (and other services) were also configured to store other data on the same filesystem.
Contact me on BlueSky if you have any questions or need advice on how to lose your data!