Home Server/RAID

From InstallGentoo Wiki
Warning: RAID is NOT a backup. Not even RAID 1. RAID doesn't protect against accidental file deletion or the complete death of an array. See the Backups page for more on backups.

Companion page to Home_server#RAID

There are several reasons why we don't just connect a bunch of disks and call it good:

Data Protection: Most RAID levels provide redundancy, which means that if one disk fails, the data is still available from the remaining disks. This is crucial for preventing data loss.

Performance: RAID can significantly improve disk speed by spreading data across multiple disks. This allows for multiple disk reads/writes to occur simultaneously, increasing overall system performance.

Scalability: RAID systems can be easily expanded with additional disks, providing a scalable solution for growing storage needs.

In contrast, if you simply connect a bunch of disks separately (disk A for pictures, disk B for videos), you lose these benefits. If one disk fails, you lose all data on that disk. You also can't take advantage of the speed benefits of simultaneous reads/writes across multiple disks.


RAID Levels

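RAID (Redundant Array of Independent Disks) is a method of combining multiple hard disks into a single array, usually so that data is stored in more than one place and survives a drive failure.

There are several different RAID levels, each with its own method of data distribution across the drives. The most common are RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10.

RAID 0 is known as striping. It spreads data across two or more disks for speed and capacity but provides no redundancy, so a single drive failure destroys the whole array.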

RAID 1 is known as mirroring. It duplicates the same data on two or more disks. This means if you have two 1TB drives in a RAID 1 setup, you only get 1TB of storage, not 2TB. This is because the same information is written to both drives for redundancy. So, the storage efficiency is 50%.

RAID 5 uses striping with parity. It requires at least three disks and provides a balance of good performance, good fault tolerance, and high storage efficiency. In RAID 5, data and parity (redundant information computed from the data) are striped across three or more drives. If a single drive fails, the original data can be reconstructed from the remaining data and the parity data. The parity takes up the equivalent of one drive, so an array of n drives gives you the capacity of n - 1. RAID 5 is no longer recommended for critical data storage, since a second drive failure during a rebuild loses the whole array.
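The reconstruction described above can be illustrated with XOR, the operation commonly used for single parity. This is only a minimal sketch: the block contents and the helper name `xor_blocks` are made up for illustration, and a real RAID 5 implementation works on much larger chunks and rotates the parity block across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Two data blocks from one stripe (illustrative contents),
# plus the parity block computed from them.
data_a = b"home"
data_b = b"raid"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b:
# XOR-ing the surviving blocks rebuilds the missing one.
rebuilt_b = xor_blocks([data_a, parity])
```

The same idea extends to more data blocks per stripe: XOR-ing all surviving blocks, parity included, yields the one that is missing.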

RAID 6 is an extension of RAID 5, and it provides more redundancy than RAID 5 by using an additional parity block. This means that RAID 6 can withstand the failure of two drives in the array without any data loss, while RAID 5 can only withstand the failure of one drive. RAID 6 requires a minimum of four drives and the storage efficiency is lower because more space is used for parity data. Also, the performance of RAID 6 can be slower than RAID 5, especially for write operations, because the system has to calculate and write two sets of parity data. If this is a concern, use RAID 10.

RAID 10 is a combination of RAID 1 and RAID 0, and it requires a minimum of four disks. It provides the redundancy of RAID 1 along with the increased performance of RAID 0. You typically get only 50% of the total storage in a RAID 10 array, but the trade-off is increased data protection and increased performance. RAID 10 is the go-to choice for professional applications.
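The capacity trade-offs of the levels above come down to simple arithmetic. The sketch below assumes identical disks and, for RAID 10, two-way mirrors; the function name `usable_capacity_tb` is hypothetical and the figures ignore file system overhead.

```python
def usable_capacity_tb(level, disk_count, disk_size_tb):
    """Approximate usable capacity of an array of identical disks."""
    if level == 1:
        if disk_count < 2:
            raise ValueError("RAID 1 needs at least two disks")
        return disk_size_tb  # every disk holds the same data
    if level == 5:
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disk_count - 1) * disk_size_tb  # one disk's worth of parity
    if level == 6:
        if disk_count < 4:
            raise ValueError("RAID 6 needs at least four disks")
        return (disk_count - 2) * disk_size_tb  # two disks' worth of parity
    if level == 10:
        if disk_count < 4 or disk_count % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least four")
        return (disk_count // 2) * disk_size_tb  # assumes two-way mirrors
    raise ValueError("unsupported RAID level")

# Four 4 TB disks under each level that accepts four disks.
for level in (5, 6, 10):
    print(f"RAID {level}: {usable_capacity_tb(level, 4, 4)} TB usable")
```

With four 4 TB disks, RAID 6 and RAID 10 end up with the same usable space; they differ in which failures they survive and in write performance.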

Don't get greedy and prioritize storage efficiency over redundancy and reliability. RAID systems are designed to protect data from hardware failure, and every bit of redundancy you trade away for extra space raises the risk of catastrophic data loss.

Drives are cheap, your data is priceless.

Resilvering

Resilvering is the process of rebuilding a RAID array after one of its drives has been replaced. The data that belongs on the new disk is copied or reconstructed from the remaining disks, restoring the array to full redundancy.

The array runs with reduced redundancy while it resilvers, so another disk failure at this point can mean data loss. Resilvering is also resource-intensive, since the remaining disks are read heavily for the whole duration.

Two-disk redundancy is ideal because it provides an extra layer of protection against drive failure during resilvering.
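To get a rough feel for how long an array stays in that vulnerable state, the sketch below estimates a best-case resilver time. It assumes the whole replacement disk is written at a constant sustained rate. The disk size and throughput are illustrative assumptions, not measurements, and real rebuilds on a busy array usually take longer.

```python
def resilver_hours(disk_size_tb, throughput_mb_s):
    """Best-case hours to write a whole disk at a constant rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    seconds = (disk_size_tb * 10**12) / (throughput_mb_s * 10**6)
    return seconds / 3600

# Assumed example: an 8 TB disk written at a sustained 150 MB/s.
hours = resilver_hours(8, 150)
print(f"about {hours:.1f} hours")
```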

Software RAID vs. Hardware RAID

Software RAID: implemented at the operating system level. It doesn't require any special hardware to function. The operating system manages the RAID system, and the CPU of the computer handles the processing required for the RAID. Software RAID is usually cheaper as it doesn't require any additional hardware. It also offers more flexibility as it can be easily configured and modified through the operating system.

Hardware RAID: managed by a dedicated hardware controller. It has its own processor and memory to manage the RAID independently of the host system.

The industry has shifted away from hardware RAID and embraced the benefits that software RAID provides.

Software RAID File Systems

ZFS

ZFS (Zettabyte File System) is an advanced file system originally designed by Sun Microsystems. It combines volume management with data integrity protection and works with many types of storage devices. ZFS includes features like storage pooling, snapshots, and dynamic disk striping, and it can automatically repair data corruption. It is one of the most popular software RAID options.

If you're interested in running ZFS, check out the dedicated ZFS page.

mdadm

mdadm (multiple devices admin) is a basic utility for managing software RAID devices in Linux. It supports RAID levels 0, 1, 4, 5, 6, and 10, as well as combinations of these levels.

mdadm does not provide a file system on its own. It is a tool for creating and managing RAID arrays, which are essentially just blocks of raw storage. Once you've created a RAID array with mdadm, you still need to create a file system on it before you can start storing files.

It is generally recommended to use disks of the same size when creating a RAID array with mdadm. This is because the size of the smallest disk in the array will determine the maximum amount of usable space on each disk.
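The effect of mixing disk sizes is easy to work out by hand. The sketch below is plain arithmetic under the rule stated above, not something mdadm itself provides; the function name `mixed_size_usage` is hypothetical.

```python
def mixed_size_usage(disk_sizes_tb):
    """Return (usable space per disk, total space left unused) in TB.

    Every disk contributes only as much as the smallest disk in the array.
    """
    usable_per_disk = min(disk_sizes_tb)
    unused_total = sum(size - usable_per_disk for size in disk_sizes_tb)
    return usable_per_disk, unused_total

# Two 4 TB disks and one 8 TB disk: half of the 8 TB disk goes unused.
usable, unused = mixed_size_usage([4, 4, 8])
print(f"{usable} TB used per disk, {unused} TB unused in total")
```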

The following volume managers and file systems are suitable for use with mdadm:

LVM

Logical Volume Management (LVM) is a method of allocating space on mass-storage devices that is more flexible than conventional partitioning methods. LVM can manage partitions, create and restore snapshots, and perform thin provisioning and dynamic resizing. LVM isn't strictly a "file system", but rather a "volume management tool" that you use to create logical volumes, each with its own file system.

ext4

Default file system for most Linux distros. Simple and does everything a good file system should do.

XFS

Another reliable basic file system.

mergerFS

Warning: mergerFS provides ZERO redundancy!

mergerFS is a union filesystem, which means it provides a way to combine multiple directories (from different hard drives or storage devices) into one single directory. It is often used in systems with multiple hard drives to create a single, unified directory structure.

It is not RAID and is included in this list to dispel any false notions.

mergerFS makes all files and directories from multiple sources appear as if they all reside in a single directory. Imagine you have /dev/sda1 mounted at /mnt/a and /dev/sdb1 mounted at /mnt/b. mergerFS would unify these two file systems into /mnt/merger.
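The idea of a union view can be sketched in a few lines. This is not how mergerFS is implemented (it is a FUSE file system with configurable policies); the function `union_listing` and the temporary directories standing in for /mnt/a and /mnt/b are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

def union_listing(branches):
    """Map each relative file path to the first branch that contains it."""
    merged = {}
    for branch in branches:
        for path in sorted(branch.rglob("*")):
            if path.is_file():
                merged.setdefault(path.relative_to(branch).as_posix(), branch)
    return merged

with tempfile.TemporaryDirectory() as tmp:
    # Two directories standing in for the two mounted disks.
    a = Path(tmp) / "a"
    b = Path(tmp) / "b"
    (a / "pictures").mkdir(parents=True)
    (b / "videos").mkdir(parents=True)
    (a / "pictures" / "cat.jpg").write_text("stored on disk A")
    (b / "videos" / "clip.mkv").write_text("stored on disk B")

    merged = union_listing([a, b])
    merged_names = sorted(merged)
    # Each file still lives on exactly one branch.
    owner = merged["pictures/cat.jpg"].name
```

The merged view lists files from both branches, but every file exists on only one disk. If that disk dies, the file is gone.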

Warning: Countless beginners have lost important data because they assumed that a drive failure would never happen to them!

Forums are full of posts asking for help with a broken mergerFS setup. You have been warned.

Btrfs

Btrfs is similar to ZFS and supports advanced features like snapshots, subvolumes, checksumming, compression, and integrated RAID. The main attraction of Btrfs is that it can mix different sized drives.

Btrfs is still under active development, and its RAID 5/6 modes in particular are widely regarded as not yet stable, so use at your own risk.

SnapRAID

SnapRAID is a backup program for disk arrays. SnapRAID is often compared to RAID, but there are key differences that make SnapRAID more of a backup program than a traditional RAID system. It is mainly targeted for a home media center, where you have a lot of big files that rarely change.

The "snap" in SnapRAID comes from the way it stores data using snapshots. SnapRAID is not a real-time solution. It only creates snapshots of your data when you manually run it or schedule it. This means that any data written after the last snapshot will be lost in the event of a disk failure.

For a production system where data is constantly changing, a real-time solution like traditional RAID is necessary. SnapRAID would be a good choice for storing large amounts of data that are read-only or infrequently changed.

SnapRAID manages redundancy through one or more dedicated disks to store parity information. This parity information is calculated from the data stored on the other disks in the array.

SnapRAID supports disks of different sizes. However, the size of the parity disk must be equal to or larger than the largest data disk.
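The parity sizing rule can be checked before buying or assigning disks. The sketch below simply restates that rule in code; the function name `parity_disk_is_large_enough` is hypothetical and is not part of SnapRAID.

```python
def parity_disk_is_large_enough(parity_size_tb, data_sizes_tb):
    """True if the parity disk is at least as large as the largest data disk."""
    return parity_size_tb >= max(data_sizes_tb)

# Mixed data disks of 4, 8 and 2 TB need a parity disk of at least 8 TB.
data_disks = [4, 8, 2]
print(parity_disk_is_large_enough(8, data_disks))
print(parity_disk_is_large_enough(4, data_disks))
```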

SnapRAID presents its storage to the OS as individual disks. It does not create a "virtual" or "pooled" drive. Each disk in the array remains independent, retaining its own filesystem. This means you can access each disk directly and individually. mergerFS is a suitable choice to unify the numerous individual disks into one coherent file system.