These are some performance tests on an Infortrend EonStor RAID system, attached via an LSI22320RB-F SCSI HBA card (also known as the LSI22320-R). The Infortrend RAID is a 24-disk box arranged as two RAID-6 arrays of 12 disks each, each disk 1 TB, so each file-system will be 10 TB. The PowerEdge server's operating system is currently Fedora 11 (64-bit), but the box is dual-bootable to the previous Fedora 10 as well. The RAID has a 1 GB cache; the server has 2 GB RAM and one Intel 5160 dual-core processor. The RAID has a chunk-size of 128 kBytes (the factory default, and optimum on it for sequential access).
Also see the contents list for earlier and later tests.
The tests show that ext4 is a big improvement over ext3 for the speed of
mkfs, fsck, and file deletion. Read performance is excellent when
read-ahead settings are set appropriately. Write performance is very
good, and can be excellent if write-barriers can safely be turned off
in a suitable environment (e.g. with a UPS-protected, battery-backed cache).
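Turning barriers off is a mount option, and the spelling differs per filesystem: `barrier=0` for ext4 (and ext3), `nobarrier` for xfs at this kernel vintage. A hypothetical /etc/fstab sketch (the device and mount point are the ones from the df output below; the option lists are illustrative, not the ones actually used):

```
/dev/sdb   /disk/11a   ext4   defaults,barrier=0    0 0
# or, for xfs:
# /dev/sdb /disk/11a   xfs    defaults,nobarrier    0 0
```

Disabling barriers trades crash-safety of the journal for speed, so it should only be done where power loss to the write cache is effectively impossible.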
# time mkfs -t ext4 -E stride=32,stripe-width=320 -i 65536 devicename
real 4m42.252s
user 0m2.885s
sys 0m30.641s
# For a similar mkfs but with -i 131072, mkfs took real 3m38s, sys 0m18s
# For a similar mkfs but with -t ext3, mkfs took real 27m10s, sys 0m33s
# tune2fs -c 0 -i 0 -r 102400 devicename
# mount ...
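The -E values in the mkfs command above follow from the RAID geometry described in the introduction; a minimal sketch of the arithmetic, assuming a 4 kB ext4 block size:

```shell
chunk_kb=128        # RAID chunk size (factory default on this EonStor)
block_kb=4          # ext4 filesystem block size
data_disks=10       # 12 disks in RAID-6, minus 2 parity disks

stride=$((chunk_kb / block_kb))          # filesystem blocks per RAID chunk
stripe_width=$((stride * data_disks))    # filesystem blocks per full stripe
echo "stride=$stride stripe-width=$stripe_width"   # stride=32 stripe-width=320
```

These are exactly the stride=32,stripe-width=320 values passed to mkfs above.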
# df -T /disk/11a
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/sdb ext4 9726161912 171668 9725580644 1% /disk/11a
# With 2 million large files put on the filesystem ....
# df -TH /disk/11a
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb ext4 10T 5.1T 5.0T 51% /disk/11a
# Pulled out the server power plug, then restarted the system ....
# time fsck devicename
fsck 1.41.3 (12-Oct-2008)
e2fsck 1.41.3 (12-Oct-2008)
11a: recovering journal
11a: clean, 2003101/152578048 files, 1244696169/2441239040 blocks
real 0m1.251s
user 0m0.763s
sys 0m0.401s
# Now force a full fsck consistency check on this file-system
# time fsck -f devicename
fsck 1.41.3 (12-Oct-2008)
e2fsck 1.41.3 (12-Oct-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
11a: 2003101/152578048 files (19.8% non-contiguous), 1244696177/2441239040 blocks
real 4m45.834s
user 2m42.751s
sys 0m2.272s
# On moving to a Fedora 11 64-bit system, do a further normal fsck check.
# Also forced a full fsck consistency check. Both timings follow.
# Note that there are now 6M files in place of 2M, and it's 25% full not 50%.
# time fsck.ext4 devicename
e2fsck 1.41.4 (27-Jan-2009)
11a: clean, 6193839/152578048 files, 629022036/2441239040 blocks
real 0m0.854s
user 0m0.654s
sys 0m0.155s
# fsck.ext4 -f devicename
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
11a: 6193839/152578048 files (0.2% non-contiguous), 629022036/2441239040 blocks
real 12m27s
Create timings:

Data size | Data shape | Create time ext2 | Create time ext3o | Create time ext4o (barrier=1) | Create time ext4o barrier=0 | Create time xfs (barrier) | Create time xfs nobarrier
---|---|---|---|---|---|---|---
10GB | 1MiB file, 1000 files/dir, 10 dirs | real 0m48.904s user 0m2.571s sys 0m13.138s; real 0m43.675s user 0m2.302s sys 0m15.944s | real 0m57.437s user 0m2.730s sys 0m21.569s; real 0m55.972s user 0m2.343s sys 0m27.640s | real 0m59.810s user 0m2.484s sys 0m19.168s | real 0m57.615s user 0m2.491s sys 0m26.440s | real 3m35.545s user 0m2.751s sys 0m13.102s; real 1m56.742s user 0m2.313s sys 0m16.775s | real 0m43.322s user 0m2.582s sys 0m13.094s
100GB | 1MiB file, 1000 files/dir, 100 dirs | real 8m29.126s user 0m26.711s sys 2m4.953s; real 7m38.323s user 0m23.194s sys 2m43.536s | real 12m10.949s user 0m28.111s sys 3m51.506s; real 12m35.604s user 0m24.770s sys 4m48.326s | real 10m33.521s user 0m26.472s sys 4m3.919s | real 7m42.901s user 0m28.684s sys 5m19.282s | real 40m33.414s user 0m28.206s sys 2m16.300s; real 21m33.492s user 0m24.103s sys 2m52.960s | real 7m29.459s user 0m26.882s sys 2m14.342s; real 7m25.802s user 0m24.389s sys 2m54.731s
100GB | 10MiB file, 1000 files/dir, 10 dirs | real 6m53.810s user 0m3.071s sys 1m37.432s | real 8m47.557s user 0m4.284s sys 2m54.904s; real 9m20.885s user 0m3.594s sys 3m53.846s | real 11m16.134s user 0m3.922s sys 2m38.688s | real 7m49.848s user 0m3.989s sys 2m53.951s | real 16m19.207s user 0m4.142s sys 1m50.147s; real 13m56.126s user 0m3.636s sys 2m16.774s | real 7m11.916s user 0m3.646s sys 2m16.462s
100GB | 10GiB file, 10 files/dir, 1 dir | real 7m17.119s user 0m0.377s sys 1m19.629s; real 7m22.659s user 0m0.330s sys 1m50.320s | real 8m36.844s user 0m0.368s sys 3m5.284s; real 11m11.748s user 0m0.354s sys 4m2.122s | real 10m38.975s user 0m0.318s sys 2m27.480s | real 7m19.614s user 0m0.325s sys 2m29.694s | real 8m0.995s user 0m0.423s sys 1m54.819s; real 8m44.645s user 0m0.361s sys 2m21.015s | real 7m17.117s user 0m0.403s sys 1m50.620s; real 7m19.851s user 0m0.339s sys 2m20.550s
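The creation workloads above can be sketched as a simple dd loop. This is an assumed reconstruction (the actual script used is not shown), scaled down to tiny parameters so the sketch runs anywhere; the real runs used e.g. 10 dirs × 1000 files/dir × 1 MiB files, wrapped in `time`:

```shell
# Create ndirs directories, each holding nfiles files of size_kb kBytes.
ndirs=2; nfiles=3; size_kb=4          # tiny illustrative values
base=$(mktemp -d)
for d in $(seq 1 "$ndirs"); do
  mkdir "$base/dir$d"
  for f in $(seq 1 "$nfiles"); do
    dd if=/dev/zero of="$base/dir$d/file$f" bs=1024 count="$size_kb" 2>/dev/null
  done
done
count=$(find "$base" -type f | wc -l)
echo "created $count files"           # 6 with these tiny parameters
rm -rf "$base"
```

For the benchmark shapes, only ndirs, nfiles, and the dd block size/count change.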
Delete timings: it was important to ensure that all buffers were
flushed from the server cache and RAID cache before starting a delete:
otherwise deletes can be spuriously fast. This was done by interleaving
other creation steps of the 480GB data between creation and deletion
for a particular step.
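A more direct (assumed, root-only) way to empty the server-side cache on Linux is the kernel's drop_caches knob; a sketch, not what was done for these runs:

```shell
sync && synced=ok                     # flush dirty pages out to the RAID
# echo 3 > /proc/sys/vm/drop_caches   # root-only: drop page cache, dentries and inodes
# Note: neither step empties the RAID controller's own 1 GB cache.
```

Interleaving other creation work, as done here, has the advantage of also displacing data from the controller cache.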
Data size | Data shape | Delete time ext2 | Delete time ext3o | Delete time ext4o (barrier=1) | Delete time ext4o barrier=0 | Delete time xfs (barrier) | Delete time xfs nobarrier
---|---|---|---|---|---|---|---
10GB | 1MiB file, 1000 files/dir, 10 dirs | real 0m11.577s user 0m0.001s sys 0m0.396s; real 0m56.190s user 0m0.005s sys 0m0.344s | real 0m1.033s user 0m0.005s sys 0m0.891s; real 1m23.686s user 0m0.013s sys 0m1.012s | real 0m0.739s user 0m0.004s sys 0m0.471s | real 0m0.868s user 0m0.004s sys 0m0.490s | real 0m29.862s user 0m0.003s sys 0m1.199s; real 0m5.900s user 0m0.006s sys 0m0.441s | real 0m6.108s user 0m0.007s sys 0m1.304s; real 0m1.020s user 0m0.005s sys 0m0.478s
100GB | 1MiB file, 1000 files/dir, 100 dirs | real 9m27.612s user 0m0.055s sys 0m3.111s | real 17m4.312s user 0m0.074s sys 0m13.102s; real 13m50.109s user 0m0.063s sys 0m9.857s | real 0m15.999s user 0m0.032s sys 0m6.055s | real 0m12.935s user 0m0.037s sys 0m4.995s | real 4m25.052s user 0m0.054s sys 0m8.903s; real 1m0.754s user 0m0.049s sys 0m4.449s | real 1m5.323s user 0m0.062s sys 0m9.784s; real 0m8.438s user 0m0.041s sys 0m4.698s
100GB | 10MiB file, 1000 files/dir, 10 dirs | real 3m24.135s user 0m0.005s sys 0m1.423s; real 2m56.238s user 0m0.010s sys 0m1.302s | real 4m35.323s user 0m0.002s sys 0m5.805s; real 4m57.621s user 0m0.010s sys 0m5.528s | real 0m10.063s user 0m0.004s sys 0m3.218s | real 0m12.270s user 0m0.002s sys 0m3.548s | real 0m26.857s user 0m0.004s sys 0m1.239s; real 0m6.023s user 0m0.007s sys 0m0.432s | real 0m0.896s user 0m0.007s sys 0m0.478s
100GB | 10GiB file, 10 files/dir, 1 dir | real 2m21.418s user 0m0.000s sys 0m1.329s; real 2m39.210s user 0m0.000s sys 0m1.357s | real 1m35.205s user 0m0.000s sys 0m5.204s; real 2m22.824s user 0m0.000s sys 0m5.376s | real 0m3.667s user 0m0.000s sys 0m2.563s | real 0m3.344s user 0m0.000s sys 0m2.203s | real 0m0.660s user 0m0.001s sys 0m0.434s; real 0m0.001s user 0m0.000s sys 0m0.001s | real 0m0.457s user 0m0.000s sys 0m0.436s; real 0m0.001s user 0m0.000s sys 0m0.001s
The read-ahead was set with:

blockdev --setra $rab /dev/$dev

where $rab is the read-ahead buffer size in 512-byte sectors; on 2.6 kernels this was equivalent to doing:

echo $rabkb > /sys/block/$dev/queue/read_ahead_kb

where $rabkb is the read-ahead buffer size in kBytes. The system default is 128 kBytes, which is generally far too small for big files.
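The two settings are related by the 512-byte sector size. A small sketch of the conversion, using the 2 MiB read-ahead from the read tests below (the device name is illustrative):

```shell
rabkb=2048                             # desired read-ahead in kBytes (2 MiB)
rab=$((rabkb * 1024 / 512))            # the same value in 512-byte sectors
echo "blockdev --setra $rab /dev/sdb"  # printed rather than run: setting it needs root
```

With rabkb=2048 this gives rab=4096 sectors, i.e. `blockdev --setra 4096 /dev/sdb`.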
bonnie++ step being performed | Units | ext2 | ext3o | ext4o (barrier) | ext4o barrier=0 | xfs (barrier) | xfs nobarrier
---|---|---|---|---|---|---|---
Block Output 8GB:32kB (8GB data, 32kB blocks) | kB/sec | 244750 | 232150 | 167028 | 234804 | 197991 | 240840
Block Input 8GB:32kB, 2MiB read-ahead | kB/sec | 252522 | 250428 | 255534 | 257082 | 258847 | 259223
Random seeks | /sec | 389 | 391 | 502 | 514 | 275 | 277
Create Sequ,Random | /sec | 29986,30054 | 23894,31727 | 28018,28885 | 28751,29802 | 1701,1688 | 6475,6420
Delete Sequ,Random | /sec | 184274,69779 | 86227,14194 | 68956,6053 | 71655,13578 | 1752,1311 | 19987,7799
The raw bonnie++ CSV output from which those figures were taken:
ext2----,8G:32k,46572,98,244750,21,95767,12,79417,96,252522,17,389.2,1,
256/256,29986,99,+++++,+++,184274,99,30054,97,430370,99,69779,99
ext3obar,8G:32k,46221,97,232856,50,89583,16,80230,97,250051,17,387.1,1,
256/256,23898,71,444656,100,85154,85,31575,93,427814,100,14294,20
ext3onob,8G:32k,47033,99,232150,49,89235,15,78734,96,250428,16,391.2,1,
256/256,23894,71,446827,99,86227,86,31727,93,430843,99,14194,20
ext4obar,8G:32k,49695,96,167028,20,83297,13,79158,97,255534,17,502.1,1,
256/256,28018,92,423243,99,68956,88,28885,93,406118,100,6053,9
ext4onob,8G:32k,50596,99,234804,29,94345,14,82687,99,257082,18,514.4,1,
256/256,28751,93,420693,99,71655,91,29802,96,406985,99,13578,22
xfs--bar,8G:32k,41380,87,197991,23,91445,23,82362,99,258847,18,275.6,0,
256/256,1701,17,510076,99,1752,5,1688,17,415676,99,1311,4
xfs--nob,8G:32k,47390,99,240840,28,90546,22,82266,99,259223,18,277.0,1,
256/256,6475,61,503892,100,19987,60,6420,61,414236,99,7799,26
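These rows have the shape bonnie++ emits in CSV mode. A hypothetical invocation that would produce runs of this shape, with flags reconstructed from the "8G:32k" and "256/256" fields rather than taken from the original test script:

```shell
# -s size:chunk -> 8 GB of data written/read in 32 kB blocks;
# -n 256 -> 256*1024 small files for the create/delete phases.
# Run only against a scratch filesystem.
cmd="bonnie++ -d /disk/11a -s 8g:32k -n 256"
echo "$cmd"    # printed rather than run: bonnie++ may not be installed
```

Each completed run appends one CSV row like those above.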
On Fedora 10, when using ext2, while creating the 1MiB-file, 1000 files/dir, 100 dirs data set, when copying directories using cp -a, and on one occasion when simply creating a single low-level directory on an empty filesystem, the operations failed with an I/O error, and there were corresponding messages in /var/log/messages:

kernel: mptscsih: ioc1: attempting task abort! (sc=c86a4e40)
kernel: sd 1:0:11:0:
kernel: command: Read(10): 28 00 00 64 00 00 00 00 08 00
kernel: mptscsih: ioc1: task abort: SUCCESS (sc=c86a4e40)
grow_buffers: requested out-of-range block 18446744071757758592 for device sdb
grow_buffers: requested out-of-range block 18446744071787708803 for device sdb

This may be a problem with kernel 2.6.27.5-117.fc10.i686 specifically. But it is also not clear that ext2 actually supports file-systems bigger than 8 TB, so that might have been the problem. Nobody would want to use ext2 in production on such a big file-system anyway: only for performance comparison tests such as these!