Linux Disk Alignment Reloaded


My all-time most viewed post is the one on Linux disk alignment: How to set disk alignment in Linux. In that post I showed an easy method to set and check disk alignment under Linux.

Enough has been written on why disk alignment is important, but here is a brief reminder: by default, the standard DOS partitioning scheme starts the first partition at sector 63 (sectors of 512 bytes). So the first data byte of the partition sits at offset 32256, which is not a multiple of EMC storage disk and cache block sizes (it probably doesn't match many other storage subsystems either). The problem is that a single 8K write (for example, to the first block of the partition) then causes two 8K block updates on the physical disk backend. Consider the layout:

Disk block 0: 00000 - 08192 * holds DOS partition table and boot sector
Disk block 1: 08192 - 16384 * empty
Disk block 2: 16384 - 24576 * empty
Disk block 3: 24576 - 32768 * Holds the first 512 bytes of the first partition (which starts @ 32256 = 63 * 512, so disk offset 32256 is offset 0 of partition 1)
Disk block 4: 32768 - 40960 * Holds 8192 more bytes of the first partition
etc.

An 8K write to offset 0 of the first partition causes:
* Read of the 8K disk block 3, update of its last 512 bytes, rewrite to disk
* Read of the 8K disk block 4, update of its first 7680 bytes (8192 - 512), rewrite to disk
(two block reads and two block writes)
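The read-modify-write arithmetic above is easy to check with a few lines of shell. The sketch below uses a made-up helper name, io_map, and, like the text, ignores RAID and caching. For a write at a given partition offset it lists which 8K backend blocks are touched and whether each one is fully overwritten or needs a read-modify-write:

```shell
#!/bin/bash
# Sketch: map a write onto fixed-size backend blocks (RAID and cache ignored).
# Usage: io_map PART_START_BYTES OFFSET_IN_PARTITION LENGTH [BLOCK_SIZE]
# io_map is a hypothetical helper name; BLOCK_SIZE defaults to 8192 bytes.
io_map() {
    local start=$(( $1 + $2 ))     # absolute disk offset of the first byte written
    local len=$3 bs=${4:-8192}
    local end=$(( start + len ))   # exclusive end offset
    local b=$(( start / bs ))      # first backend block touched
    local lo hi bytes
    while (( b * bs < end )); do
        # portion of block b covered by the write: [lo, hi)
        lo=$(( b * bs > start ? b * bs : start ))
        hi=$(( (b + 1) * bs < end ? (b + 1) * bs : end ))
        bytes=$(( hi - lo ))
        if (( bytes == bs )); then
            echo "block $b: $bytes bytes (full write)"
        else
            echo "block $b: $bytes bytes (read-modify-write)"
        fi
        b=$(( b + 1 ))
    done
}
```

With the DOS default start of 32256 bytes, `io_map 32256 0 8192` reports blocks 3 and 4 as read-modify-write (512 and 7680 bytes). With a 32KB-aligned start, `io_map 32768 0 8192` reports a single full write of block 4.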

If the partition were aligned at, say, 32KB, partition 1 would start exactly at offset 32768 (the start of disk block 4), and the whole 8K write would require no backend reads at all and just one backend write (ignoring RAID and other aspects, of course). Note that for specialized flash arrays (e.g. EMC's XtremIO and the like) this becomes all the more important, both for performance and to avoid excessive wear of the flash cells.

I found that on CentOS 6 and other recent Linux distributions (e.g. Ubuntu, Mint, probably Red Hat 6 and Oracle Enterprise Linux 6) the "sfdisk" command complains when you ask it to create 128-sector aligned partitions. I already noticed this when writing my original post, but as I was mainly using CentOS 5.x I didn't bother, until now that I am messing with Oracle on CentOS 6.4 🙂

Strangely enough, on those distros the boot disk comes nicely aligned at 1MB (2048 sectors), yet both "fdisk" and "cfdisk" still create partitions at a 63-sector offset by default. If you follow fdisk's own advice and switch off DOS-compatible mode ("fdisk -c"), it does create aligned partitions by default. Still, you have to go through a menu manually, and for ease of use when configuring lots of volumes, or when scripting, the old "sfdisk" method was handy.

Here is an example with an empty 1GB volume as /dev/sdb:

# sfdisk -uS -l /dev/sdb

Disk /dev/sdb: 130 cylinders, 255 heads, 63 sectors/track

# echo 128,, | sfdisk -uS /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 130 cylinders, 255 heads, 63 sectors/track
/dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Warning: The partition table looks like it was made
for C/H/S=*/139/8 (instead of 130/255/63).
For this listing I'll assume that geometry.
Units = sectors of 512 bytes, counting from 0

Device Boot    Start       End   #sectors  Id  System
/dev/sdb1           128   2097151    2097024  83  Linux
start: (c,h,s) expected (0,16,1) found (0,2,3)
end: (c,h,s) expected (1023,138,8) found (130,138,8)
/dev/sdb2             0         -          0   0  Empty
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary

sfdisk: I don't like these partitions - nothing changed.
(If you really want this, use the --force option.)

OK, use the force, Luke…

# echo 128,, | sfdisk -uS /dev/sdb --force
Checking that no-one is using this disk right now ...
OK
Disk /dev/sdb: 130 cylinders, 255 heads, 63 sectors/track
/dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Warning: The partition table looks like it was made
for C/H/S=*/139/8 (instead of 130/255/63).
For this listing I'll assume that geometry.
Units = sectors of 512 bytes, counting from 0

Device Boot    Start       End   #sectors  Id  System
/dev/sdb1           128   2097151    2097024  83  Linux
start: (c,h,s) expected (0,16,1) found (0,2,3)
end: (c,h,s) expected (1023,138,8) found (130,138,8)
/dev/sdb2             0         -          0   0  Empty
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

The problem with the --force option is that it is dangerous, because it overwrites existing partitions without any warning:

# echo 256,, | sfdisk -uS /dev/sdb --force
Checking that no-one is using this disk right now ...
OK
Disk /dev/sdb: 130 cylinders, 255 heads, 63 sectors/track
Old situation:
Warning: The partition table looks like it was made
for C/H/S=*/139/8 (instead of 130/255/63).
For this listing I'll assume that geometry.
Units = sectors of 512 bytes, counting from 0

Device Boot    Start       End   #sectors  Id  System
/dev/sdb1           128   2097151    2097024  83  Linux
start: (c,h,s) expected (0,16,1) found (0,2,3)
end: (c,h,s) expected (1023,138,8) found (130,138,8)
/dev/sdb2             0         -          0   0  Empty
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
New situation:
Warning: The partition table looks like it was made
for C/H/S=*/139/8 (instead of 130/255/63).
For this listing I'll assume that geometry.
Units = sectors of 512 bytes, counting from 0

Device Boot    Start       End   #sectors  Id  System
/dev/sdb1           256   2097151    2096896  83  Linux
start: (c,h,s) expected (0,32,1) found (0,4,5)
end: (c,h,s) expected (1023,138,8) found (130,138,8)
/dev/sdb2             0         -          0   0  Empty
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

As you can see, sfdisk happily overwrote the existing partition without any warning. To overcome this, I searched for a better way to do this on the version 6 distros. I found that the "parted" utility provides a much better, safer and cleaner way to set alignment, but it took a while to figure out how to make it create aligned partitions directly from the command line.
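If you do want to keep scripting with sfdisk, one way to reduce the risk is to refuse to run the forced one-liner when the kernel already sees partitions on the disk. A minimal sketch, assuming the standard sysfs layout in which the partitions of /dev/sdb show up as /sys/block/sdb/sdb1 and so on, each with a "partition" attribute. The function names are hypothetical, and the SYSFS variable exists only so the check can be tried against a scratch directory:

```shell
#!/bin/bash
# Hypothetical guard around "echo 128,, | sfdisk -uS /dev/sdX --force".
# Only the kernel's current view is checked: a partition table that was
# written but not yet re-read by the kernel is NOT detected.
has_partitions() {
    local disk=$1 sysfs=${SYSFS:-/sys/block} p   # disk name, e.g. sdb
    for p in "$sysfs/$disk/$disk"*; do
        # each partition directory (sdb1, sdb2, ...) has a "partition" file
        [ -e "$p/partition" ] && return 0
    done
    return 1
}

sfdisk_128_guarded() {
    local disk=$1
    if has_partitions "$disk"; then
        echo "refusing: /dev/$disk already has partitions" >&2
        return 1
    fi
    # same forced one-liner as above, only reached for partition-less disks
    echo '128,,' | sfdisk -uS "/dev/$disk" --force
}
```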

Remove the old partition:

# parted /dev/sdb rm 1
Information: You may need to update /etc/fstab.

# parted /dev/sdb print
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 1074MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start  End  Size  Type  File system  Flags

And the magic:

# parted /dev/sdb mkpart primary 1m 100%
Information: You may need to update /etc/fstab.

# parted /dev/sdb print
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 1074MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
1      1049kB  1074MB  1073MB  primary

We now have a 1MB aligned partition. It shows as 1049kB because parted reports in decimal units by default (1 kB = 1000 bytes, 1 MB = 1000000 bytes), so 1048576 bytes appear as 1049kB. Binary megabytes (mebibytes) don't seem to be supported, so choose either 512-byte sectors or single bytes:

# parted /dev/sdb "unit s print"
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 2097152s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start  End       Size      Type     File system  Flags
1      2048s  2097151s  2095104s  primary

# parted /dev/sdb "unit b print"
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 1073741824B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start     End          Size         Type     File system  Flags
1      1048576B  1073741823B  1072693248B  primary

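What if we try to overwrite an existing partition? Let's say you want a partition with the old 128-sector (64k) alignment but forgot to remove the existing partition… (Note that parted interprets a bare number in its default input unit, megabytes, so 128 below actually means 128 MB; 128 sectors would be written as 128s. Either way the new partition overlaps the existing one.)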

# parted /dev/sdb mkpart primary 128 100%
Error: Can't have overlapping partitions.

Good…

Linux 6, Oracle ASM and alignment

Also note that on Red Hat 6 (and CentOS 6, which I'm using) Oracle ASMLib is no longer supported. ASMLib somehow required ASM volumes to be placed on the first partition of a volume (hence the need to align partitions). Although there are ways to avoid DOS partitioning, they lead to other trouble. Fortunately, if you do NOT use ASMLib but use Linux udev to provide volumes for ASM, you don't need partitioning at all (which makes this whole article redundant).

How to do this is described nicely and simply in, for example, "Using udev on RHEL 6 / OL 6 to change disk permissions for ASM". Make sure you use a method that creates ASM volumes directly on the whole disk (e.g. /dev/sdb) and does not use partitioning (/dev/sdb1).

That method has important benefits:

  • You don't need ASMLib anymore (no Oracle vendor lock-in)
  • Faster reboots and rescans, because ASMLib slows down booting (especially with a large number of volumes), although there is a fix for that
  • ASM volumes stay in place even if the LUN order changes between reboots
  • ASM volumes show up in /dev/oracleasm and disappear from /dev (this keeps trigger-happy Linux administrators from being tempted to create partition tables and/or filesystems on existing ASM volumes)
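The udev approach described above comes down to one rule per disk, matched on the disk's SCSI ID. The following is a minimal sketch of a generator for such rules. It assumes RHEL/CentOS 6 era udev, where NAME= may still rename a disk into /dev/oracleasm (later udev releases no longer allow renaming disks this way) and where /sbin/scsi_id accepts -g -u -d. The helper name, owner "oracle", group "dba" and mode 0660 are placeholders, not taken from the referenced article:

```shell
#!/bin/bash
# Hypothetical helper: print one udev rule that gives a whole disk (no
# partition) a stable name under /dev/oracleasm, matched on its SCSI ID.
# KERNEL=="sd?" matches whole disks sda..sdz only, never partitions like sdb1.
# $name is kept literal (single quotes) so that udev substitutes it.
make_asm_rule() {
    local scsi_id=$1 asm_name=$2
    printf 'KERNEL=="sd?", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$name", RESULT=="%s", NAME="oracleasm/%s", OWNER="oracle", GROUP="dba", MODE="0660"\n' \
        "$scsi_id" "$asm_name"
}
```

For example, `make_asm_rule "$(/sbin/scsi_id -g -u -d /dev/sdb)" asm-disk1 >> /etc/udev/rules.d/99-oracle-asmdevices.rules` appends a rule for /dev/sdb (the rules file name is also just an example), after which udev has to be reloaded, on RHEL/CentOS 6 typically with start_udev.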

Note that if you run on VMware you need to set

disk.EnableUUID = "true"

in the virtual machine's VMX file (or through the VMware GUI) for this to work; otherwise VMware does not report a SCSI serial ID and udev cannot identify the disks.
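Before writing any rules it is worth checking that each disk actually reports an ID. A minimal sketch, assuming the RHEL/CentOS 6 location and options of scsi_id (/sbin/scsi_id -g -u -d). The helper name is hypothetical, and SCSI_ID is overridable only so the function can be tried without real hardware:

```shell
#!/bin/bash
# Hypothetical check: does the disk report a SCSI serial/ID? On VMware the
# answer is typically empty until disk.EnableUUID = "true" is set for the VM.
has_scsi_id() {
    local dev=$1 serial                        # device path, e.g. /dev/sdb
    serial=$("${SCSI_ID:-/sbin/scsi_id}" -g -u -d "$dev" 2>/dev/null)
    if [ -n "$serial" ]; then
        echo "$dev: $serial"
        return 0
    fi
    echo "$dev: no SCSI ID reported (check disk.EnableUUID)" >&2
    return 1
}
```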

Conclusion:

parted /dev/sdXX mkpart primary 1m 100%

is all you need to know. (And who needs ASMLib anyway… 🙂)
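For the "lots of volumes" case mentioned earlier, the one-liner can be wrapped in a loop. A minimal sketch, assuming each disk already carries an msdos label (as /dev/sdb did above) and has no partitions yet. parted's -s (script mode) keeps it from prompting, and, as shown above, parted refuses to create overlapping partitions. The function name and the DRY_RUN switch are hypothetical additions for this sketch:

```shell
#!/bin/bash
# Hypothetical helper: create one 1MB-aligned primary partition spanning
# each listed disk, using the parted one-liner from this post.
# Set DRY_RUN=1 to only print the commands instead of running them.
partition_aligned() {
    local disk
    for disk in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "parted -s $disk mkpart primary 1m 100%"
        else
            # -s (--script): never prompt, so the loop can run unattended
            parted -s "$disk" mkpart primary 1m 100% || return 1
        fi
    done
}
```

Running `DRY_RUN=1 partition_aligned /dev/sdb /dev/sdc` first shows exactly which disks would be touched before anything is written.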


The post Linux Disk Alignment Reloaded appeared first on Dirty Cache.
