Mdadm

mdadm is a Linux utility used to manage software RAID devices.

The name is derived from the md (multiple device) device nodes it administers, and it replaced a previous utility, mdctl. The original name was "Mirror Disk", but was changed as the functionality increased.

It is free software, licensed under version 2 or later of the GNU General Public License, and is maintained and copyrighted by Neil Brown of SUSE.

Types of physical device

mdadm can handle anything which presents to the kernel as a block device. This can encompass whole disks (/dev/sda), partitions (/dev/sda1) and USB flash drives.

RAID Configurations

  • RAID 0 - Block-level striping. MD can handle devices of different lengths; the extra space on the larger device is then not striped.
  • RAID 1 - Mirror.
  • RAID 4 - Like RAID 0, but with an extra device for parity.
  • RAID 5 - Like RAID 4, but with the parity distributed across all devices.
  • RAID 6 - Like RAID 5, but with two parity segments per stripe.
  • RAID 10 - Take a number of RAID 1 mirror sets and stripe across them, RAID 0 style.

Non-RAID Configurations

  • LINEAR - Concatenate a number of devices into a single large MD device.
  • MULTIPATH - Provide multiple paths with failover to a single device.
  • FAULTY - A single device which emulates a number of disk fault scenarios for testing and development.
  • CONTAINER - A group of devices managed as one, in which RAID systems can be built.

Types of MD device

The original (standard) form was /dev/mdn where n is a number between 0 and 99. More recent kernels have supported the use of names such as /dev/md/Home. Under kernel 2.4 and earlier these two were the only options. Both of them are non-partitionable.

From kernel 2.6 a new type of MD device was introduced, a partitionable array. The device names were modified by changing
md to md_d. The partitions were identified by adding pn; thus /dev/md/md_d2p3 for example.

From kernel 2.6.28 non-partitionable arrays can be partitioned, the partitions being referred to in the same way as for partitionable arrays: /dev/md/md1p2.

Booting

Since support for MD is found in the kernel, there is an issue with using it before the kernel is running. Specifically, it will not be present if the boot loader is either (e)LILO or GRUB legacy, and it may not be present for GRUB 2. In order to circumvent this problem a /boot filesystem must be used either without md support, or else with RAID 1. In the latter case the system will boot by treating the RAID 1 device as a normal filesystem, and once the system is running it can be remounted as md and the second disk added to it. This will result in a catch-up resync, but /boot filesystems ought to be small.
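A minimal sketch of the remount-and-add step described above, assuming /boot currently lives on /dev/sda1 and its mirror partner is /dev/sdb1 (device names are illustrative):

umount /boot
mdadm --assemble /dev/md0 /dev/sda1 # assemble the mirror from its one present half
mount /dev/md0 /boot
mdadm --manage /dev/md0 --add /dev/sdb1 # add the second disk; triggers the catch-up resync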

Create an array

mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/sda1 /dev/sdb1

Create a RAID 1 (mirror) array from two partitions. If the partitions differ in size, the array is the size of the smaller partition. You can create a RAID 1 array with more than two devices, giving you multiple copies. Whilst there is little extra safety in this, it makes sense when you are creating a RAID 5 array for most of your disk space and using RAID 1 only for a small /boot partition. Using the same partitioning for all member drives keeps things simple.

mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2

Create a RAID 5 volume from three partitions. If the partitions used in your RAID array are not the same size, mdadm will use the size of the smallest. If you receive an error such as "mdadm: RUN_ARRAY failed: Invalid argument", make sure your kernel supports (either via a module or by being directly compiled in) the RAID mode you are trying to use. Most modern kernels do, but you never know...
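If in doubt, you can check which RAID personalities the running kernel supports and load a missing module by hand (a hedged sketch; the module name raid456 applies to recent kernels):

cat /proc/mdstat # the "Personalities" line lists the supported raid levels
modprobe raid456 # loads RAID 4/5/6 support if it was built as a module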

It is possible to create a degraded mirror, with one half missing, by replacing a drive name with "missing":
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing

The other half of the mirror is then added to the set:
mdadm --manage /dev/md1 --add /dev/sda1

This is useful when you are adding a disk to a computer which currently isn't mirrored. The new drive is...
  • partitioned to match the first (unless you are also repartitioning)
  • turned into a set of "half-mirrors"
  • formatted with an appropriate file system
  • mounted
  • the data is copied over
  • made bootable
  • its grub config and fstab mounts changed

The computer is then booted off the secondary drive (or a rescue disk); the now-idle original disk can be repartitioned if required (no need to format), and then the primary drive's submirrors are added. A sketch of the whole procedure follows below.

Note that the partition types should be changed to 0xFD (Linux RAID autodetect) with fdisk to indicate that they are mirrored devices.
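The steps above might look like the following sketch, assuming the existing disk is /dev/sda, the new disk is /dev/sdb, and ext4 is the file system (all names illustrative):

sfdisk -d /dev/sda | sfdisk /dev/sdb # clone the partition table to the new disk
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing # a half-mirror
mkfs.ext4 /dev/md1
mount /dev/md1 /mnt
rsync -ax / /mnt/ # copy the data over; -x stays on one filesystem

Making the new drive bootable and updating its grub config and fstab are left out here, as they depend on the boot loader and distribution.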

Recording the array

mdadm --detail /dev/md0

View the status of the multi disk array md0.

mdadm -Es | grep md0 >>/etc/mdadm/mdadm.conf

This adds md0 to the configuration file so that it is recognised next time you boot.

You may wish to keep a copy of /proc/mdstat on another machine or as a paper copy. The information will allow you to restart the array
manually if mdadm fails to do so.
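For example, a quick snapshot of the current state (the destination path is arbitrary):

cp /proc/mdstat /root/mdstat.saved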

Growing an array by adding devices

mdadm --add /dev/md1 /dev/sdd1
mdadm --grow /dev/md1 --raid-devices=4

This adds the new device to the array then grows the array to use its space.

In some configurations you may not be able to grow the array until you have removed the internal bitmap. You can add the bitmap back again after the array has been grown.
mdadm --grow /dev/md1 -b none
mdadm --grow /dev/md1 -b internal

Growing an array by upgrading devices

An array may be upgraded by replacing the devices one by one, either as a planned upgrade or ad hoc as a result of replacing failed devices.

mdadm /dev/md1 --fail /dev/sda1
(replace the first drive with the new, larger one, then partition it)
mdadm --add /dev/md1 /dev/sda1

Allow the new drive to resync. If replacing all the devices repeat the above for each device, allowing the array to resync between repetitions. Finally, grow the array to use the maximum space available and then grow the filesystem(s) on the RAID array to use the new space.

mdadm --grow /dev/md1 --size=max
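Growing the filesystem is filesystem-specific; a hedged example assuming ext3/ext4 on the array:

resize2fs /dev/md1 # grow the filesystem to fill the device (use xfs_growfs for XFS)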

Deleting an array

mdadm --stop /dev/md0 # to halt the array
mdadm --remove /dev/md0 # to remove the array
mdadm --zero-superblock /dev/sd[abc]1 # delete the superblock from all drives in the array
(edit /etc/mdadm/mdadm.conf to delete any rows related to the deleted array)

Convert an existing partition to RAID 5

Assume that the existing data is on /dev/sda1:

mdadm --create /dev/md1 --level=5 --raid-devices=3 missing /dev/sdb2 /dev/sdc2
mdadm -Es >>/etc/mdadm/mdadm.conf
update-initramfs -u
dd if=/dev/sda1 of=/dev/md1
(add /dev/md1 to your boot loader's menu)
(reboot into /dev/md1)
mdadm --add /dev/md1 /dev/sda1
(update your boot loader)

Notes:
  1. A partition may be given as missing to act as a placeholder so that it can be added later.
  2. The /boot directory should be elsewhere, possibly on /dev/md0 or its own partition.
  3. If the reboot fails, do NOT add /dev/sda1 into the array until the problem is corrected!
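You can watch the rebuild triggered by the final --add, and afterwards grow the filesystem that dd copied so that it fills the whole array (hedged, assuming ext3/ext4):

watch cat /proc/mdstat # monitor the resync progress
resize2fs /dev/md1 # expand the copied filesystem to the array's full size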

Mdmpd

Mdmpd is a computer program for the GNU/Linux operating system. It is part of the mdadm package, written and copyrighted by Red Hat. The program is used to monitor multi-path (RAID) devices, and is usually started at boot time as a service, afterwards running as a daemon.

mdmpd - daemon to monitor MD multipath devices

Enterprise storage requirements often include the desire to have more than one way to talk to a single disk drive, so that in the event of some failure to talk to a disk drive via one controller, the system can automatically switch to another controller and keep going. This is called multipath disk access. The Linux kernel implements multipath disk access via the software RAID stack known as the md (Multiple Devices) driver.

The kernel portion of the md multipath driver only handles routing I/O requests to the proper device and handling failures on the active path. It does not try to find out whether a path that has previously failed might be working again. That is what this daemon does. Upon startup, the daemon forks and places itself in the background. It then reads the current state of the md raid arrays, saves that state, and waits for the kernel to tell it that something interesting has happened. It then wakes up, checks whether any paths on a multipath device have failed, and if they have, it starts to poll the failed path once every 15 seconds until it starts working again. Once the path is working again, the daemon adds it back into the multipath md device it was originally part of, as a new spare path.

If one is using the /proc filesystem, /proc/mdstat lists all active md devices with information about them. Mdmpd requires this to find the arrays whose paths it should monitor and to get notification of interesting events.
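The general shape of a multipath entry in /proc/mdstat looks roughly like this (an illustrative mock-up, not output captured from a real system):

Personalities : [multipath]
md0 : active multipath sdb1[0] sdc1[1]
      976762584 blocks [2/2] [UU]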

Known problems

A common error when creating RAID devices is that the dmraid driver has taken control of all the devices that are to be used in the new RAID device. Error messages like this will occur:

mdadm: Cannot open /dev/sdb1: Device or resource busy

Typically, the solution to this problem involves adding the "nodmraid" kernel parameter to the boot loader config. Another way this error can present itself is if the device mapper has claimed the drives. Issue 'dmsetup table' to see if the drive in question is listed; 'dmsetup remove <device>' will remove the drive from the device mapper, and the "Device or resource busy" error will go away as well.

RAID already running

First check whether the device is in use in another array:
cat /proc/mdstat

You will probably have to stop the array with:
mdadm --stop /dev/mdX

Check the /etc/mdadm/mdadm.conf file (and restart system if possible):
vi /etc/mdadm/mdadm.conf

Then you should be able to delete the superblock of this device:
mdadm --misc --zero-superblock /dev/sdxN

Now the device shouldn't be busy any more.

Sometimes dmraid "owns" the devices and will not let them go. There is a solution.

Tweaking the kernel

To solve this problem, you need to build a new initrd without the dmraid driver. The following command does this on a system with the "2.6.18-8.1.6.el5" kernel:
mkinitrd --omit-dmraid /boot/NO_DMRAID_initrd-2.6.18-8.1.6.el5.img 2.6.18-8.1.6.el5

After this, the system has to be rebooted with the new initrd. Edit your /boot/grub/grub.conf to achieve this.
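An illustrative grub.conf stanza for GRUB legacy (the kernel path and root device are assumptions; adjust them to your system):

title CentOS (2.6.18-8.1.6.el5, no dmraid)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-8.1.6.el5 ro root=/dev/md1
        initrd /NO_DMRAID_initrd-2.6.18-8.1.6.el5.img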

Alternatively, if you have a self-customized, compiled kernel which doesn't use an initrd (the default option on a distro like Gentoo), check the kernel .config file in /usr/src/linux for the line:
# CONFIG_BLK_DEV_DM is not set

If instead the option is enabled:
CONFIG_BLK_DEV_DM=y

then you might have to disable that option, recompile the kernel, put it in /boot and finally edit the GRUB config file in /boot/grub. PLEASE be careful NOT to disable
CONFIG_BLK_DEV_MD=y

(note the MD instead of DM), which is essential for RAID to work at all!

If both methods have not helped, then booting from a live CD probably will. The example below starts a degraded RAID 1 mirror array, adds a spare disk to it and syncs; creating a new array shouldn't be any more difficult, because the underlying problem was the same 'Device or resource busy' error:
modprobe raid1
mknod /dev/md1 b 9 1
mknod /dev/md3 b 9 3
mdadm --assemble /dev/md1 /dev/hda1
mdadm --assemble /dev/md3 /dev/hda3
mdadm --add /dev/md1 /dev/hdb1
mdadm --add /dev/md3 /dev/hdb3

It might be easier to try to assemble the devices automatically:
mdadm --assemble --scan

Remember to replace the md* and hd* values with the corresponding ones from your system.
You can monitor the sync progress using:
cat /proc/mdstat

When the sync is done you can reboot in your Linux normally.

Zeroing the superblock

Another way to prevent the kernel autostarting the raid is to remove all the previous raid-related information from the disks before proceeding with the creation, for example:
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sd[abcd]1

And now the usual create, for example:
mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=0 /dev/sd[abcd]1

Recovering from a loss of raid superblock

There are superblocks on the drives themselves and on the raid (apparently). If you have a power failure or a hardware failure that does not include the drives themselves, and you cannot get the raid to recover in any other way, and wish to recover the data, proceed as follows:

Record all your raid member parameters:
mdadm --examine /dev/sd[abcde...]1 | egrep 'dev|Update|Role|State|Chunk Size'

Look carefully at the Update Time. If you have raid members attached to the motherboard and others attached to a raid card, and the card fails but leaves enough members to keep the raid alive, you want to make a note of that. Look at the Array State and Update Time. For example:

/dev/sdc1:
Update Time : Wed Jun 15 00:32:35 2011
Array State : AAAA.. ('A' == active, '.' == missing)
/dev/sdd1:
Update Time : Thu Jun 16 21:49:27 2011
Array State : .AAA.. ('A' == active, '.' == missing)
/dev/sde1:
Update Time : Thu Jun 16 21:49:27 2011
Array State : .AAA.. ('A' == active, '.' == missing)
/dev/sdf1:
Update Time : Thu Jun 16 21:49:27 2011
Array State : .AAA.. ('A' == active, '.' == missing)
/dev/sdk1:
Update Time : Tue Jun 14 07:09:34 2011
Array State : ....AA ('A' == active, '.' == missing)
/dev/sdl1:
Update Time : Tue Jun 14 07:09:34 2011
Array State : ....AA ('A' == active, '.' == missing)

Devices sdc1, sdd1, sde1 and sdf1 are the last members in the array and will rebuild correctly. sdk1 and sdl1 left the array (in my case due to a raid card failure).

Also note each device's raid member role (numbering starts with 0); the raid needs to be rebuilt in the same order. The chunk size is also important.

Zero the drive superblocks

mdadm --zero-superblock /dev/sd[cdefkl]1

Reassemble the raid

mdadm --create /dev/md1 --chunk=512 --level=6 --raid-devices=6 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sde1 missing missing

'missing' tells the create command to rebuild the raid in a degraded state; sdk1 and sdl1 can be added later.

Edit /etc/mdadm.conf and add an ARRAY line with a UUID. First get the UUID for your raid:

mdadm -D /dev/md1

then:

nano /etc/mdadm.conf

and add something similar to the file (notice there is no # in front of the active line you are adding):

#ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
#ARRAY /dev/md1 super-minor=1
#ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
ARRAY /dev/md1 UUID=7ee1e93a:1b011f80:04503b8d:c5dd1e23

Save the file and exit the editor.

Last, mark the array as possibly dirty with:
mdadm --assemble /dev/md1 --update=resync

All your data should be recovered!

Increasing RAID ReSync Performance

In order to increase the resync speed, we can use a bitmap, which mdadm will use to mark which areas may be out-of-sync.
Add the bitmap with the grow option, as below:
mdadm -G /dev/md2 --bitmap=internal
Note: mdadm v2.6.9 (10th March 2009) on CentOS 5.5 requires this to be run on a stable, "clean" array. If the array is rebuilding, the following error will be displayed:

mdadm: failed to set internal bitmap.

And the following line is added to the log file:
md: couldn't update array info. -16


Then verify that the bitmap was added to the md2 device using:
cat /proc/mdstat

You can also adjust the Linux kernel limits by editing these files:
/proc/sys/dev/raid/speed_limit_min
and
/proc/sys/dev/raid/speed_limit_max
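For example, to raise both the floor and the ceiling for resync throughput (values are in KiB/s per device and are chosen purely for illustration):

echo 50000 > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max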

You can also set these limits with the sysctl utility:
sysctl -w dev.raid.speed_limit_min=50000

Increasing RAID5 Performance

To help RAID 5 read/write performance, setting the read-ahead and stripe cache size for the array provides noticeable speed improvements.

Note: This tip assumes sufficient RAM is available to the system. Insufficient RAM can lead to data loss or corruption.

echo 16384 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 16384 /dev/md0

Write Performance:
dd if=/dev/zero of=/mnt/family/10gb.16384k.stripe.out bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 94.5423 s, 114 MB/s

Read Performance:
dd if=/mnt/family/10gb.16384k.stripe.out of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 28.5435 s, 376 MB/s

These changes must be reapplied after every reboot (add them to an init script to set them at start-up), as in the sketch below.
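A minimal sketch, assuming your distribution still executes /etc/rc.local at boot (otherwise use its native init mechanism):

# /etc/rc.local
echo 16384 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 16384 /dev/md0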
