
Home Server/RAID

From InstallGentoo Wiki
Revision as of 23:19, 19 February 2024 by Cyberes
Warning: RAID is NOT a backup. Not even RAID 1. RAID doesn't protect against accidental file deletion (a deleted file is deleted on every disk at once) or the complete death of an array. See the Home_Server/Backups page for more on backups.

Companion page to Home_server#RAID

There are several reasons why we don't just connect a bunch of disks and call it good:

Data Protection: most RAID levels provide redundancy, which means that if one disk fails, its data can still be read from, or reconstructed from, the remaining disks. This is crucial for surviving drive failures without data loss.

Performance: RAID can significantly improve throughput by spreading data across multiple disks. This lets several disks serve reads and writes at the same time, increasing overall system performance.

Scalability: many RAID setups can be expanded with additional disks, providing a scalable solution for growing storage needs.

In contrast, if you simply connect a bunch of disks separately (disk A for pictures, disk B for videos), you lose these benefits. If one disk fails, every file on that disk is gone. You also can't take advantage of simultaneous reads/writes: each file can only be accessed as fast as the single disk it sits on.


RAID Levels

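RAID (Redundant Array of Independent Disks) combines multiple physical disks into a single logical unit to provide redundancy, higher performance, or both. Most RAID levels store redundant data (full copies or parity) on multiple disks so the array keeps working when a drive fails.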

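There are several different RAID levels, each with its own method of distributing data across the drives. The most common are RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10.

RAID 0 is known as striping. It splits data across two or more disks with no redundancy at all: it is fast and uses 100% of the raw capacity, but if any single disk fails, the whole array is lost.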

RAID 1 is known as mirroring. It writes identical data to two or more disks. This means if you have two 1TB drives in a RAID 1 setup, you only get 1TB of storage, not 2TB, because the same information is written to both drives for redundancy. The storage efficiency is therefore 50% for a two-disk mirror (33% for a three-way mirror), and the array survives as long as one copy remains.

RAID 5 uses striping with distributed parity and requires at least three disks. It balances performance, fault tolerance, and capacity. In each stripe, data blocks and one parity block (the XOR of that stripe's data blocks) are spread across the drives, and the parity position rotates from stripe to stripe. If a single drive fails, its contents can be reconstructed from the remaining data and parity. One disk's worth of space goes to parity, so an array of n disks yields (n-1)/n of the raw capacity. RAID 5 is no longer recommended for critical data storage: with today's large drives a rebuild can take many hours, and a second drive failure or an unreadable sector during that window can cause the rebuild to fail.
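To make the parity idea concrete, here is a minimal Python sketch of single-parity reconstruction, assuming parity is a plain byte-wise XOR of the data blocks in one stripe. It models one stripe of a four-disk array as three data blocks plus one parity block; the block contents and the `xor_blocks` helper are hypothetical illustrations, not how any real implementation lays data out on disk.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (hypothetical contents) on three disks.
data = [b"HOME", b"LAB!", b"RAID"]
parity = xor_blocks(data)  # stored on the fourth disk for this stripe

# Simulate losing disk 1: XOR of the surviving data blocks and the parity
# cancels out everything except the missing block.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)  # b'LAB!'
```

The same trick works for whichever single block is missing, which is why RAID 5 can lose any one disk, but never two.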

RAID 6 is an extension of RAID 5 that adds a second, independently calculated parity block to each stripe. This means RAID 6 can withstand the failure of any two drives in the array without data loss, while RAID 5 can only withstand the failure of one. RAID 6 requires a minimum of four drives, and two disks' worth of space goes to parity, so an array of n disks yields (n-2)/n of the raw capacity. Writes can also be slower than on RAID 5, because the system has to calculate and write two sets of parity data. If this is a concern, use RAID 10.
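The second parity block only helps if it is independent of the first. The sketch below assumes the P+Q scheme used by Linux md RAID 6: P is the same XOR as RAID 5, and Q weights each data block by a different power of 2 in the finite field GF(2^8) (reduction polynomial 0x11d). That yields two independent equations, enough to solve for two missing blocks. The helper names are hypothetical, and real implementations use lookup tables and SIMD rather than this slow bit-by-bit arithmetic.

```python
# GF(2^8) arithmetic with reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (shift-and-add, reducing as we go)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Nonzero elements form a group of order 255, so a^254 is a's inverse.
    return gf_pow(a, 254)


def syndromes(stripe: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all data blocks; Q = XOR of g^i * D_i with generator g = 2."""
    p = bytearray(len(stripe[0]))
    q = bytearray(len(stripe[0]))
    for i, block in enumerate(stripe):
        coeff = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(stripe: list, p: bytes, q: bytes) -> list[bytes]:
    """Rebuild exactly two missing data blocks (marked None) from P and Q."""
    x, y = [i for i, blk in enumerate(stripe) if blk is None]
    size = len(p)
    # Syndromes of the survivors, treating the missing blocks as zeros.
    p_part, q_part = syndromes([blk if blk is not None else bytes(size) for blk in stripe])
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for k in range(size):
        a = p[k] ^ p_part[k]  # = Dx ^ Dy
        b = q[k] ^ q_part[k]  # = g^x*Dx ^ g^y*Dy
        dx[k] = gf_mul(b ^ gf_mul(gy, a), inv)  # solve the two equations for Dx
        dy[k] = a ^ dx[k]
    rebuilt = list(stripe)
    rebuilt[x], rebuilt[y] = bytes(dx), bytes(dy)
    return rebuilt


data = [b"HOME", b"LAB!", b"RAID", b"DISK"]
p, q = syndromes(data)
damaged = [data[0], None, data[2], None]  # two data disks lost
print(recover_two(damaged, p, q) == data)  # True
```

The extra Q calculation per byte is also where RAID 6's additional write cost comes from.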

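RAID 10 is a combination of RAID 1 and RAID 0 (a stripe of mirrors), and it requires a minimum of four disks. It provides the redundancy of RAID 1 along with the increased performance of RAID 0. With two-way mirrors you get 50% of the raw capacity; in exchange, writes are fast (there is no parity to calculate) and rebuilds are quick, because a replacement disk is simply copied from its mirror partner. A RAID 10 array always survives one drive failure, and survives more as long as no mirror pair loses both of its disks. RAID 10 is the go-to choice for professional applications where performance matters.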
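The capacity and redundancy rules above can be summarized in a short Python sketch. It assumes identical disks and two-way mirror pairs for RAID 10; the function name and its string level codes are hypothetical, chosen for illustration.

```python
def raid_summary(level: str, disks: int, disk_tb: float) -> tuple[float, int]:
    """Usable capacity (TB) and worst-case number of drive failures survived,
    for an array of identical disks."""
    if level == "0":
        min_disks, data_disks, survives = 2, disks, 0
    elif level == "1":
        min_disks, data_disks, survives = 2, 1, disks - 1
    elif level == "5":
        min_disks, data_disks, survives = 3, disks - 1, 1
    elif level == "6":
        min_disks, data_disks, survives = 4, disks - 2, 2
    elif level == "10":
        if disks % 2:
            raise ValueError("this sketch assumes two-way mirror pairs (even disk count)")
        min_disks, data_disks, survives = 4, disks // 2, 1
    else:
        raise ValueError(f"unsupported level: {level}")
    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    return data_disks * disk_tb, survives


for level, n in [("1", 2), ("5", 4), ("6", 4), ("10", 4)]:
    usable, survives = raid_summary(level, n, 4.0)
    print(f"RAID {level}, {n} x 4 TB: {usable:g} TB usable, survives {survives} failure(s)")
```

For RAID 10 the sketch reports the guaranteed figure: a four-disk RAID 10 can survive two failures if they land in different mirror pairs, but not if both disks of one pair die.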

Don't get greedy and prioritize storage efficiency over redundancy and reliability. RAID exists to protect data from hardware failure; trading redundancy away for extra capacity defeats that purpose and can end in catastrophic data loss.

Resilvering

Resilvering (the term ZFS uses; mdadm and hardware controllers usually call it a rebuild) is the process of restoring a RAID array to full redundancy after a failed drive has been replaced. The array writes onto the new disk whatever belongs there: a straight copy from a mirror partner, or data reconstructed from the remaining disks' data and parity.

While resilvering, the array operates with reduced redundancy, so if another disk fails before the process completes, data can be lost. Resilvering is also resource-intensive: it reads heavily from every remaining disk, often for many hours, which puts extra stress on drives that are frequently the same age as the one that just failed.
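To get a feel for how long that vulnerable window lasts, here is a back-of-the-envelope Python estimate. It assumes the whole replacement disk is rewritten at a constant sustained rate (the 160 MB/s figure is a hypothetical example) and uses decimal units as drive vendors do. Real rebuilds are usually slower under normal workload, while a ZFS resilver may be faster because it only copies allocated data.

```python
def rebuild_hours(disk_tb: float, throughput_mb_s: float) -> float:
    """Best-case time to rewrite one whole disk at a sustained rate.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    seconds = (disk_tb * 10**12) / (throughput_mb_s * 10**6)
    return seconds / 3600


print(f"{rebuild_hours(16, 160):.1f} h")  # 27.8 h
```

Even in this best case, a 16 TB drive leaves the array degraded for more than a day.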

Two-disk redundancy (RAID 6, or RAID-Z2 in ZFS) is ideal because the array can still survive another drive failure during resilvering.

Software RAID vs. Hardware RAID

Software RAID: implemented at the operating system level. It doesn't require any special hardware to function. The operating system manages the RAID system, and the CPU of the computer handles the processing required for the RAID. Software RAID is usually cheaper as it doesn't require any additional hardware. It also offers more flexibility as it can be easily configured and modified through the operating system.

Hardware RAID: managed by a dedicated hardware controller. It has its own processor and memory to manage the RAID independently of the host system.

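Much of the industry has shifted away from hardware RAID toward software RAID. One reason: an array built by a hardware controller is tied to that controller's on-disk format, so if the controller dies you typically need a compatible replacement to read your data, while a software RAID array can be assembled on any machine running the same software.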

Software RAID File Systems

ZFS

ZFS (Zettabyte File System) is an advanced file system originally designed by Sun Microsystems. It combines a file system with volume management and protects data integrity by checksumming data and metadata. ZFS includes features like storage pooling, snapshots, and dynamic striping, and when the pool has redundancy (mirrors or RAID-Z) it can automatically repair corrupted data from a good copy. It is one of the most popular software RAID options. If you're interested in running ZFS, check out the dedicated ZFS page.
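The self-healing behaviour boils down to "detect with a checksum, repair from redundancy". Below is a heavily simplified Python model of that read path, assuming a two-way mirror and SHA-256 checksums; ZFS itself defaults to the fletcher4 checksum and keeps checksums in parent block pointers, neither of which is modeled here. The function names are hypothetical.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def self_healing_read(mirror: list[bytearray], expected: str) -> bytes:
    """Return the first copy whose checksum matches, rewriting any bad copies."""
    good = None
    for copy in mirror:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        # Corruption detected, but no redundancy left to repair from.
        raise OSError("all copies are corrupt: unrecoverable")
    for copy in mirror:
        if checksum(bytes(copy)) != expected:
            copy[:] = good  # repair the damaged copy in place
    return good


block = b"family photos"
expected = checksum(block)  # recorded when the block was written
mirror = [bytearray(block), bytearray(block)]
mirror[0][0] ^= 0xFF  # simulate silent corruption (bit rot) on disk 0
print(self_healing_read(mirror, expected) == block)  # True
```

Without the second copy, the checksum could still detect the damage but not fix it, which is why repair needs a redundant pool.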

mdadm

mdadm (multiple devices admin) is the standard utility for managing software RAID (md) devices in Linux. It supports RAID levels 0, 1, 4, 5, 6, and 10, and arrays can be nested to combine these levels.

mdadm does not provide a file system on its own. It is a tool for creating and managing RAID arrays, which appear as raw block devices (such as /dev/md0). Once you've created a RAID array with mdadm, you still need to create a file system on it before you can start storing files.
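The two steps above (create the array, then put a file system on it) can be sketched as a small Python wrapper around the real `mdadm` and `mkfs.ext4` commands. The device names (`/dev/md0`, `/dev/sdb` through `/dev/sde`) and helper names are hypothetical placeholders. Creating an array wipes its member disks, so the sketch only prints the commands unless `dry_run=False` is passed, and it skips follow-up steps such as saving the array definition to mdadm.conf.

```python
import subprocess


def build_commands(md_device: str, level: int, members: list[str]) -> list[list[str]]:
    """argv lists that create an md array and then put an ext4 file system on it."""
    return [
        ["mdadm", "--create", md_device, f"--level={level}",
         f"--raid-devices={len(members)}", *members],
        ["mkfs.ext4", md_device],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # needs root; destroys data on members


# Hypothetical four-disk RAID 6 array.
cmds = build_commands("/dev/md0", 6, ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"])
run(cmds)  # dry run: only prints the commands
```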

It is generally recommended to use disks of the same size when creating a RAID array with mdadm. This is because the size of the smallest disk in the array will determine the maximum amount of usable space on each disk.
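A quick Python sketch of what mismatched disks cost, assuming every member is trimmed to the smallest disk and covering only RAID 0, 5, and 6 (0, 1, or 2 disks' worth of parity). The function name is hypothetical.

```python
def md_capacity(disk_sizes_tb: list[float], parity_disks: int) -> tuple[float, float]:
    """(usable TB, TB left unused by the array) for mixed-size members.

    parity_disks: 0 for RAID 0, 1 for RAID 5, 2 for RAID 6.
    """
    per_member = min(disk_sizes_tb)  # every member is trimmed to the smallest disk
    usable = per_member * (len(disk_sizes_tb) - parity_disks)
    unused = sum(disk_sizes_tb) - per_member * len(disk_sizes_tb)
    return usable, unused


print(md_capacity([4.0, 4.0, 8.0], parity_disks=1))  # (8.0, 4.0)
```

In this RAID 5 example, half of the 8 TB disk does nothing for the array.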

The following are commonly layered on top of an mdadm array (note that LVM is a volume manager, not a file system):

LVM

Logical Volume Management (LVM) is a method of allocating space on mass-storage devices that is more flexible than conventional partitioning. With LVM you can:

  • Create, resize, and delete logical volumes (LVM's equivalent of partitions)
  • Create and restore snapshots
  • Thin provisioning and dynamic resizing
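To show how the layers stack (md array, then LVM, then a file system), here is a Python sketch that only builds the standard LVM command lines for an existing array. The volume group name `vg_data`, logical volume name `media`, and the 500G size are hypothetical; run the printed commands yourself, as root, only after checking them.

```python
def lvm_on_md(md_device: str, vg: str, lv: str, size: str) -> list[list[str]]:
    """argv lists layering LVM and ext4 on top of an existing md array."""
    return [
        ["pvcreate", md_device],                         # mark the array as an LVM physical volume
        ["vgcreate", vg, md_device],                     # build a volume group on it
        ["lvcreate", "--size", size, "--name", lv, vg],  # carve out a logical volume
        ["mkfs.ext4", f"/dev/{vg}/{lv}"],                # put a file system on that volume
    ]


for argv in lvm_on_md("/dev/md0", "vg_data", "media", "500G"):
    print(" ".join(argv))
```

The remaining space in the volume group stays free for new volumes or for growing this one later.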

ext4

The default file system for most Linux distros. Mature, journaled, and simple; a solid choice on top of an mdadm array or an LVM volume.

mergerFS

Warning: mergerFS provides ZERO redundancy!

mergerFS is a union filesystem: it pools directories from multiple hard drives or other storage devices into a single directory tree, so all of their files and directories appear to reside in one place. It is often used in systems with many hard drives to present one unified directory structure. mergerFS does not copy files between drives; each file still lives on one underlying drive.
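The toy Python model below is not mergerFS's actual code. It assumes each branch (a mount point such as the hypothetical `/mnt/disk_a`) holds its own set of files, that the pooled view is simply their union, and that lookups return the first branch holding the file, similar in spirit to mergerFS's "first found" policy. Removing a branch shows why the pool has no redundancy.

```python
# Toy model: branch (one physical disk) -> set of relative paths stored on it.
Branches = dict[str, set[str]]


def pooled_view(branches: Branches) -> set[str]:
    """Every path visible at the pooled mount point: the union of all branches."""
    view: set[str] = set()
    for files in branches.values():
        view |= files
    return view


def locate(branches: Branches, path: str) -> str:
    """Return the first branch holding `path` ('first found' style lookup)."""
    for disk, files in branches.items():
        if path in files:
            return disk
    raise FileNotFoundError(path)


def fail_disk(branches: Branches, disk: str) -> set[str]:
    """Paths that vanish from the pool when `disk` dies: no copy exists elsewhere."""
    survivors = {d: f for d, f in branches.items() if d != disk}
    return pooled_view(branches) - pooled_view(survivors)


pool = {
    "/mnt/disk_a": {"pictures/cat.jpg", "pictures/dog.jpg"},
    "/mnt/disk_b": {"videos/trip.mkv"},
    "/mnt/disk_c": {"pictures/tree.jpg", "music/song.flac"},
}
print(sorted(pooled_view(pool)))               # one merged tree across three disks
print(locate(pool, "videos/trip.mkv"))         # /mnt/disk_b
print(sorted(fail_disk(pool, "/mnt/disk_a")))  # ['pictures/cat.jpg', 'pictures/dog.jpg']
```

The merged "pictures" directory looks like a single folder, yet losing one disk silently removes part of it.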

Warning: Countless beginners have lost important data because they assumed that a drive failure would never happen to them!

mergerFS on its own is not recommended: it is commonly set up without any separate redundancy layer, so when a drive fails, every file stored on that drive is lost.