The behaviour you're describing is quite similar to what I'm experiencing, and there are some similarities in our setups. One major difference is that your HDDs are better than mine. While I'm sorry you're facing these problems, I'm also somewhat relieved to learn that they don't stem from my choice of HDDs.
To tell you the truth, I had decided, even prior to your message, to stop wasting time with Intel's sorry excuse for a RAID and ditch it. By now I'm very determined. These days I'm looking for a good deal on a RAID controller that will also support AF disks. I'd suggest you do the same; it appears that Intel's RAID is, quite simply, a piece of junk.
Phew, so good to know it's not just me. The severe latency problem may still be the drives (I think) - otherwise there would be a lot more folks all bent out of shape about this issue here, no? People seem to complain more about RAID5 write speeds than latencies.
I am thinking of switching my setup to RAID10 by adding another drive, in the hope that it will cure the latency problem. If it doesn't, I'll look for an add-on RAID HBA.
I have to concur with the above poster. After my contact with Intel support and their total refusal to acknowledge that there is something wrong with this, I have also stopped fiddling with it. It just irks me that the Intel Rapid Storage webpage makes all these claims and makes a lot of people waste their time fiddling with this.
I understand that they do not want RST to eat into sales of their professional controllers, but could they not just make it work at least reasonably?
They should clearly state the limitations they have put into their software. So far I have seen the following problems.
Maybe we can pool our setups here or maybe on some shared google spreadsheet and see if we can discover some pattern?
To do this we need some kind of standardized test programs and test suites. Help me design one.
4K sector HDs: Data from these setups should be analyzed separately, since it is impossible to make sure that they are aligned properly (you cannot make sure the RST software puts the volumes in the proper place).
Please make suggestions on how to improve the tests, and on other issues to examine.
Test report proposal:
· General computer configuration
   · CPU, RAM
   · Motherboard w/ BIOS rev.
   · Intel RST software/driver version.
· RAID configuration:
   · Number of hard drives, with brand, model and sector size.
   · Array configuration: number of arrays and which hard drives are part of them.
· Volume configuration
   · Number and size of volumes.
· Partition information
   · Number of partitions and their sizes.
   · Starting sector.
   · Partition alignment (are they aligned according to the criteria in my previous posts).
   · NTFS or GPT. (I am sure nobody uses FAT32 :) )
   · Formatted cluster/file-allocation size.
· Test reports
   · Which issue is being tested and the test configuration.
   · Test programs, with version and settings.
   · Results as defined by the tests below.
   · Your own observations and thoughts.
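To make these easier to collect and compare (e.g. in that shared Google spreadsheet), here is a minimal sketch of what a machine-readable version of the template above could look like. The field names and example values are just my suggestion, not anything the RST software exports:

```python
# Minimal sketch of a machine-readable version of the report template above,
# so results could be dumped to a shared spreadsheet or CSV. Field names and
# example values are placeholders, not anything the RST software exports.

from dataclasses import dataclass, asdict
import json

@dataclass
class RaidTestReport:
    cpu: str
    ram: str
    motherboard_bios: str
    rst_driver_version: str
    drives: str                  # count, brand, model, sector size
    raid_level: str
    stripe_kb: int
    volumes: str                 # number and size of volumes
    partition_offset_bytes: int
    filesystem: str
    cluster_kb: int
    test_program: str            # program, version and settings
    result: str                  # e.g. "seq write 110 MB/s"
    notes: str = ""

report = RaidTestReport(
    cpu="<CPU>", ram="<RAM>", motherboard_bios="<board / BIOS rev>",
    rst_driver_version="<RST version>", drives="4x 2TB, 512B sectors",
    raid_level="RAID5", stripe_kb=64, volumes="1 volume, 6TB",
    partition_offset_bytes=1_048_576, filesystem="NTFS", cluster_kb=32,
    test_program="CrystalDiskMark, seq, 4GB", result="seq write 110 MB/s",
)
print(json.dumps(asdict(report), indent=2))
```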
RAID-level performance
Question 1: What kind of read and write performance are you getting with different RAID levels?
Question 2: Is the performance what should be expected?
Test: All tests with CrystalDiskMark (seq, 512K, 4K and 4K QD32) and ATTO, with different RAID levels (specifically RAID 0, 1, 5, 10).
Report: Configuration and performance of the different tests.
Number of volumes
Question 1: Does having multiple identical volumes impact performance on any of the volumes?
Question 2: Does the filesystem used impact performance with multiple volumes? (GPT vs NTFS)
Test: Sequential read/write tests with CrystalDiskMark and ATTO, with 0.5-8kb transfer size and 4GB total size.
Report: Number and configuration of volumes in the array and the performance relating to each one.
Cluster size vs stripe size
Question 1: How do cluster size (file allocation size) and stripe size affect performance?
Question 2: Does the ratio between them affect performance?
Question 3: Do different cluster sizes affect performance?
Test: Since it is known that stripe size affects performance, I suggest running sequential read/write tests with CrystalDiskMark (state version) and ATTO, with 0.5-8kb transfer size and 4GB total size, on the same identical volume, just altering the cluster size (this can be done with some disk-management programs or by reformatting the volume).
Report: Configuration and performance of the different setups. It is vital that the stripe size tested is reported.
Latency
Question 1: Why are some systems experiencing latency, and what configurations do they have?
Question 2: What operations are affected by latency?
Question 3: Is performance normal once the operation actually starts?
Test: This one is hard, but Performance Monitor can be used to check latency. Maybe copying a large file or opening folders? (chip in here)
Report: Latencies and the conditions under which they occur, with as much detail as possible.
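If we want something more standardized than eyeballing Perfmon, a crude sketch like the following could be used to catch write stalls while copying to the array. The path and sizes are placeholders, and the OS write cache will smooth over small variations, so it mainly flags the big stalls people are reporting:

```python
# Crude latency probe: write a large file to the RAID volume in chunks and
# time each chunk, so multi-second stalls stand out. Path and sizes are
# placeholders; the OS write cache will absorb small variations, so this
# mainly catches long stalls.

import time

TEST_FILE = r"E:\latency_test.bin"   # placeholder: a file on the RAID volume
CHUNK = 4 * 1024 * 1024              # 4 MB per write
TOTAL = 2 * 1024 * 1024 * 1024       # 2 GB total

buf = b"\x00" * CHUNK
worst = 0.0
with open(TEST_FILE, "wb", buffering=0) as f:
    for _ in range(TOTAL // CHUNK):
        t0 = time.perf_counter()
        f.write(buf)
        dt = time.perf_counter() - t0
        worst = max(worst, dt)
        if dt > 1.0:                 # flag anything that stalls for over a second
            print(f"stall: {dt:.2f} s")

print(f"worst single-write latency: {worst:.3f} s")
```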
Covered this same issue months ago. Your partition is misaligned. You need to experiment with different alignments, then rebench each time if you want the best result. Took me >50 benches, and over a week to sort out the problem. My final numbers were actually higher than last reported in that topic; 350MB/s W & 500MB/s R on a 5x 1TB drive raid, on ICH10R /Core i3. Pretty close to theoretical max.
Before you say it, I don't care what "improvements" microsoft made to the default partition alignment, it doesn't actually fix the underlying issue, just turns what is as bad as a gunshot wound into "just" a stab wound. Linux isn't really any better by default, people just love bashing M$. They (M$ & LinX) both have room for improvement over default settings. The ICH raid *DOES* work, but takes more effort. There is actually a formula to building a good RAID 5; it's much more complicated than Intel or any HW vendor makes it seem. There is a lot of info out there for people setting up DBs running into the same issues. Do your homework.
Your choices are as simple as this:
1) Deal with poor performance
2) Pay big $$$ for a HW raid card
3) Manually align your partition, and bench your system repeatedly until you hit a jackpot.
I had more time than money, and more determination than laziness. What about you?
Stripes larger than 64k performed very poorly once the partition was aligned properly; this I believe is a limitation of the chipset overall.
I do regular 2GB to 8GB file writes to the array always at full read speed of the non-RAID drive. I regularly see >100MB/s transfers. Reads are uselessly fast for what I need.
Put your system drive on a drive outside the array.
Haha, and why do you say my partitions are not aligned? I have them aligned according to these criteria:
With a RAID implementation you have to match the following alignment criteria for good performance:
To get the partition offset: Start -> Run -> msinfo32. In msinfo32: Components -> Storage -> Partition Starting Offset. (This is in bytes, so divide it by 1024 to get it in kB.)
For spreadsheets and some online alignment calculators: http://www.techpowerup.com/forums/showthread.php?t=107126
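For the lazy, here is a minimal sketch of that check in code. It only applies the rules of thumb from this thread (the partition offset should be an integer multiple of the stripe size, and ideally of the full RAID5 data stripe), not anything Intel documents:

```python
# Minimal alignment check using the partition offset reported by msinfo32.
# The rules applied here are the rules of thumb from this thread (offset an
# integer multiple of the stripe, and ideally of the full RAID5 data stripe),
# not anything Intel documents.

def check_alignment(offset_bytes: int, stripe_kb: int, n_drives: int) -> None:
    stripe_b = stripe_kb * 1024
    data_stripe_b = stripe_b * (n_drives - 1)     # RAID5: one stripe's worth is parity

    print(f"offset: {offset_bytes} B = {offset_bytes / 1024:g} kB")
    print(f"multiple of {stripe_kb} kB stripe: {offset_bytes % stripe_b == 0}")
    print(f"multiple of {data_stripe_b // 1024} kB data stripe: "
          f"{offset_bytes % data_stripe_b == 0}")

# Example: Windows 7 default 1 MiB offset on a 5-drive RAID5 with 64 kB stripe
check_alignment(1_048_576, 64, 5)
```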
I have done this before; I am used to setting up HW arrays and just wanted to see if this was a viable "cheap" alternative. The reason for continuing with this is that I am stubborn and I want this to be fixed. It annoys me that Intel has something that is so hard to get acceptable performance from.
The big advantage of Intel RST I think is that you can just plug everything into a new motherboard and just get going without reinstalling the system.
I experienced this firsthand when one of my MBs died: I just plugged the drives into a replacement one, set the BIOS to "RAID", and the system booted right up.
If a HW card (or a NAS) dies you have to get an identical replacement card which can be hard if the card is a few years old.
With RST you can plug it into any Intel MB supporting RAID and recover data.
Your comment about stripe size is interesting, since I indeed have a 128kb stripe on the badly performing volume. Will make it a 64kb one and see what happens... (an age to initialize)...
I do get good performance on the first volume (500-600 MB/s read, 250ish MB/s write), but very bad performance from the second one. Initially the first volume was as bad, but after changing the cluster size from 4k to 32k it improved tremendously!
Alignment is indeed one part of the problem; however, there is more to this than that. I think it has to do with how Intel RST schedules writes and how it interacts with the filesystem. However, I think I will switch over to RAID10 and not deal with the parity hassle of RAID5. If Intel could just make the controller intelligently read from both sets.....
You look well informed, but as I suspected your formula did not take into consideration how many drives you are using. I know that sounds counter-intuitive to how the RAID5 performs, but it's not. This single aspect accounted for >10x write performance gain (yes, a 1000% gain) on IRST. After I got that settled, the rest was a balance between stripe/cluster to achieve the desired performance balance. Ultimately I traded some write performance for a boost in read performance, as I didn't have any combination of hardware capable of saturating the array's write ability @ >350MB/s. Remember, nearly all of my bench tests revolved around testing every combination of cluster/stripe allowed by NTFS & IRST. What I'll say is that the best performing cluster/stripe ratio prior to achieving optimal alignment was not nearly the same as the post-alignment choice; it was actually one of the worst performing. Prior to alignment, the cluster/stripe I'm using currently was good for 20MB/s avg W & ~200MB/s R; after, >350MB/s W & ~500MB/s R.
You see, the 128k stripes look good when your overall performance is low to begin with. Once you're firing on all cylinders, you find that 128k stripes perform noticeably worse than most other combinations below 128k.
I'd rebench it for you and post screenies, but my array is pretty well filled now. Not only would the writes look worse due to where the free space is available on the drives, but it's stable, has been for months now (knock on wood) and I don't want to rock the boat.
Edit: For the reasons stated above, even numbered drive counts do poorly with R5. I suggest either moving to 5 drives, or dropping to 3 if performance is really important to you.
Edit #2: When you are benching your settings, you only need to initialize the first few GB of the array. I used 40GB I think, so the testing would go much faster.
Wow, I am impressed with the amount of work you have put into that! What was the magic-bullet combo for you?
I seem to be getting good performance from my 32k cluster /64k stripe volume so will change to that on the second volume too.
Most tests online say a 64k-128k stripe should be pretty good for general performance. I understand your point about multiple drives but I am mostly reading/writing very large files (20MB-400MB) which should utilize all the drives in my array. When reading smaller files of course having a larger stripe size will affect performance since the file is stored in fewer stripes and cannot take advantage of multiple drives.
In my case, files smaller than ~6*128kb = 768 kb should not perform optimally.
How can an even or odd number of drives affect performance?
How RST actually structures the volumes on the hard drives I do not know. I will change to a similar cluster/stripe combo and rebench. They should perform identically then, if they are aligned. (Somehow I doubt this.)
Maybe I will wait for the X79 chipset and its Intel RSTe instead, and just go for full RAID10 with many more drives.
EDIT: The more I think about this, the more it shows that Intel has done a pretty poor job of how the software sets up the array on the drives. You say you get 10x performance from changing the number of drives. I get a similar boost by changing cluster size. Stripe size also ties into this in some unholy way. What I am saying is that the storage software should be able to handle these configurations automatically for us. If it cannot properly support certain configurations, then those configurations should not be selectable in the software?!
I am fine with the software saying "Use an odd number of drives with 32k cluster size / 64k stripe size, and for god's sake align your partitions!"; then I at least know what to expect. Why not go out on a limb and include a tool for creating aligned partitions, taking the RAID settings into account? WD ships an alignment tool for its AF drives.
EDIT2: I wonder if Intel RST implements read-ahead?
Thanks, I did put quite the effort into this. --Do or do not, there is no try.
I never changed the number of drives I used. I built the whole server around a 5-drive RAID 5 setup. I know this, however, due to the research I put into this issue beforehand. For large write blocks, drive counts such as 3, 5, 9 (2^n+1) will work better, because the block size will be an even multiple of the sum across the stripe. When you take parity into account, even-numbered drive setups produce an odd write size that doesn't divide evenly onto all drives. On R5 parity is distributed across all drives, yes, but writes still work on a basic write stripe + parity stripe basis. If you have 4 drives, you'd be dividing stripe writes onto 3 drives + parity. Normal settings don't allow for a cluster/stripe combination that divides evenly onto 3 drives. You're always over-writing (4 stripes on 3 drives) or under-writing (2 stripes on 3 drives) stripes, slowing down performance considerably. If you use odd numbers, your writes will divide cleanly onto 2, 4, 6 etc. drives, with parity rotating onto the odd drive.
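To make that divisibility argument concrete, here is a rough sketch of it in code. It assumes one parity chunk per stripe and power-of-two write sizes, which is my reading of the post above rather than anything Intel documents:

```python
# Rough sketch of the divisibility argument: a RAID5 array with n drives has
# (n - 1) data chunks per stripe. Power-of-two write sizes (clusters, typical
# I/O blocks) only divide evenly across a power-of-two number of data drives,
# i.e. odd total drive counts such as 3, 5, 9 (2^k + 1).

def data_chunks_per_stripe(n_drives: int) -> int:
    return n_drives - 1                     # one chunk per stripe holds parity

def divides_evenly(write_kb: int, chunk_kb: int, n_drives: int) -> bool:
    chunks = write_kb // chunk_kb           # how many chunks the write touches
    return chunks % data_chunks_per_stripe(n_drives) == 0

# e.g. a 1 MiB write with 64 kB chunks
for n in (3, 4, 5, 6):
    print(f"{n} drives: divides evenly = {divides_evenly(1024, 64, n)}")
# -> 3 and 5 drives divide evenly; 4 and 6 drives always leave a partial stripe
```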
I hope this doesn't come across condescending, I really don't mean it to be, but... I have purposefully omitted my combination of settings from my posts since figuring out the issue. The reason why is that it works for my specific setup. Unless you had the same hardware, in the same configuration, it's not the right answer for you. People reading this would give my numbers a try without working out the formula for themselves and making the *right* changes needed, then continue to bad-mouth a chipset that is quite capable for general use when configured properly.
Regardless of which RAID 5 solution you choose, an even number of drives will always perform worse. It's not the stripe/cluster combination.
Ahh, I think I understand what you are getting at.
Actually, I think I misunderstood what stripe size is. According to many online sites (and training courses), stripe size refers to the size of the actual block written to each disk. This is what I have always thought the definition was.
But according to Wiki:
"RAID works by spreading the data over several disks. Two of the terms often used in this context are stripe size and chunk size.
The chunk size is the smallest data block that is written to a single disk of the array. The stripe size is the size of a block of data that will be spread over all disks. That way, with four disks, and a stripe size of 64 kilobytes (kB), 16 kB will be written to each disk."
Is this really the case?
If I understand this correctly then this will cause two alignment problems.
So if I understand this correctly, in order to get fully aligned writes you should obey the following criteria:
So for instance in a 64k, 4-drive RAID5 you get the following picture, with one chunk stored on each drive.
In my case, with a 128kb stripe on 6 drives, that gives 128/6 ≈ 21.33 kb chunk size. I do not know what the storage controller does with this? I assume it somehow rounds it up.
Further complicating this is that the parity block is included in the stripe size as well. Using the above example, one data chunk in each stripe will be reserved for parity. Hence, in the example above, the situation will be something like this.
Hence 48 kB of data in each stripe.
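Just to spell out the arithmetic under that Wiki definition (stripe spread over all disks, one chunk per stripe reserved for parity), a tiny sketch:

```python
# The arithmetic under the Wiki definition quoted above: stripe size is spread
# over all disks, and one chunk per stripe holds parity.

def stripe_math(stripe_kb: float, n_drives: int) -> None:
    chunk_kb = stripe_kb / n_drives              # per-disk chunk
    data_kb = chunk_kb * (n_drives - 1)          # data per stripe, parity excluded
    print(f"{n_drives} drives, {stripe_kb:g} kB stripe: "
          f"chunk = {chunk_kb:g} kB, data per stripe = {data_kb:g} kB")

stripe_math(64, 4)    # -> 16 kB chunks, 48 kB of data per stripe
stripe_math(128, 6)   # -> ~21.33 kB chunks, which doesn't even come out whole
```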
So when your filesystem is accessing the array, it should do so in chunks that fit within multiples of the stripe, to always utilize the drives most efficiently?
This could be expressed as:
Then of course having an odd number of drives would be best, since then you can really view the filesystem as consisting of an even number of data drives and one parity drive (with the parity drive being a different one in every stripe in RAID5).
So you have to solve these equations whilst also satisfying the other criteria I mentioned in my previous post.
Now I understand why people do trial and error.
However I cannot find any good solutions satisfying all of these with integer solutions at the same time (everything in bytes now):
Specifically, the one failing is (3), under the conditions of (5) and (6).
I simplified them a little and came out with:
Stripe unit size / Cluster size = (Nbrofdrives) / (Nbrofdrives-1).
| Nbr of drives | Stripe unit size / Cluster size |
| 3 | 3/2 = 1.5 |
| 4 | 4/3 ≈ 1.33 |
| 5 | 5/4 = 1.25 |
| 6 | 6/5 = 1.2 |
This has no integer solutions above 2.
So which one should be relaxed? I assume you cannot physically have non-integer chunk sizes. I propose that not enforcing (3), Stripe unit size / Cluster size = integer, would probably not hurt as much as the other ones. This way at least your partitions are aligned to the stripes and you do have reasonable chunk sizes.
This tells me that 3, 5 and 6 drives are at least possible, since they have workable ratios of stripe and cluster size.
With three drives you can use a 48kb stripe with a 32k cluster. But I had to multiply the default partition offset of 127MB by 3 (381MB) to satisfy the other rules.
Is this a correct understanding?
But as stated you get different integer solutions for Stripe/Cluster/Partition-offset-combinations depending on the number of drives.
I am writing a small program to find solutions for this problem.
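Something like the following sketch is what I have in mind. The constraints are my reading of the criteria above (whole-kB chunk per disk under the "stripe spread over all drives" definition, partition offset aligned to both the stripe and the cluster, with rule (3) relaxed as suggested), so adjust them to taste:

```python
# Sketch of a brute-force search for workable stripe/cluster/offset combos.
# Constraints encoded here are my reading of the criteria in this thread:
#   - per-disk chunk (stripe / number of drives) must be a whole number of kB
#   - partition offset must be a multiple of the stripe and of the cluster
#   - rule (3), stripe/cluster = integer, is deliberately NOT enforced
# Candidate lists and the example offset are placeholders.

STRIPES_KB  = [16, 32, 48, 64, 96, 128]
CLUSTERS_KB = [4, 8, 16, 32, 64]

def solutions(n_drives: int, offset_kb: int):
    for stripe in STRIPES_KB:
        chunk = stripe / n_drives
        if chunk != int(chunk):
            continue                          # chunk must be a whole number of kB
        if offset_kb % stripe:
            continue                          # partition start aligned to the stripe
        for cluster in CLUSTERS_KB:
            if offset_kb % cluster:
                continue                      # partition start aligned to the cluster
            yield stripe, int(chunk), cluster

# Example: 3 drives with the 381 MB (127 MB * 3) offset mentioned above
for stripe, chunk, cluster in solutions(3, 381 * 1024):
    print(f"stripe {stripe} kB, chunk {chunk} kB, cluster {cluster} kB")
```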
Hmm, I can't get the numbers to work for my theory. The only drive counts that yield good results are 4 or 8 drives.
I guess the wiki is wrong and the stripe is actually the chunk/segment written to each disk? This is the definition I was taught.
Stripe-size is the size of the individual segment and Stripe-width is the number of drives.
Intel RST uses both a write cache and a write coalescer. The problem must be in their implementation.
I read up more on the net, and it seems that when writing larger files (as I am doing) the system should shift to a full-stripe write strategy and should not be impacted by the problems I am seeing. A lot of other tests have been done where cluster size does not impact file performance appreciably. Also, stripe size should not affect performance to such a huge degree as we are seeing here?
IMHO, you are trying too hard, looking too deep. Forget what you are reading all over the net; there is a lot of information that looks important but doesn't pertain to your specific situation. Drop a drive off your array and bench the array keeping all other settings the same, and see if it improves in any area. If the capacity hit is more important than the difference in write speed you get, stick with 4 drives. If you stick with 4 drives, there is no combination of settings that will write to the drives in the array evenly, regardless of what your formulas tell you. I was going to try to explain why (again) late last night, but lost the post I was trying to write and gave up. Try excluding the parity block from your stripe size calculations; don't ask why, just see if that helps. Use something plain for testing like 32KB cluster / 64KB stripe. Play with the partition offset more, I really think your problem lies there. If that still doesn't get what you want, well, you could always buy me a plane ticket to wherever, I'll fix it for ya.
OTOH, If you are done banging your head against this particular wall, do a R10. You still lose capacity, but less headaches than R5, and extra redundancy. I would have done R10, had I an even number of drives. Certainly it was simpler than the route I took.
I know this thread is a bit old but I have just read the lot and have a few questions about my latest PC build. I have just got myself an ASRock Z68 Extreme4 Gen3 motherboard with 4 x 3TB drives. I have a SSD drive as my boot drive, so that is separate but I want to run the 4 x 3TB Western Digital Green drives in RAID5.
Initially I set up the RAID5 from the configuration screen just after the BIOS (64kb stripe), and then in Windows 7 x64 I used the IRST software to initialize the drive. When that took absolutely ages I deleted the volumes and recreated them in IRST with write caching enabled, and that too is taking about 75 minutes per 1%. However, before initializing it in IRST I initialized it in Windows, so I can see the 8.5TB drive in My Computer whilst the initialization through IRST is still going on.
It will take 5 days to initialize the array... is this normal? How do I know what stripe/cluster/offset/alignment etc I need to set and how would I set them?
I am aware of the problem with alignment issues affecting performance and hence use 512-byte sector drives (2TB Hitachi 7K3000). Write cache is enabled etc.
With this setup I am currently getting around 450-490 MB/s read speed which is great, however my write speed is around 30-40 MB/s?!
I had a similar issue with 3 x Hitachi 5K3000 2TB and 2 x Samsung HD203WI 2TB.
Read speed was ~200-300 MB/s and write speed was 25-30 MB/s.
All I had to do was to change stripe size from 128KB to 64KB and magically read speed increased to ~350MB/s and sustained write speed jumped up to ~170MB/s, which is about twice the speed of a single disk and also twice the speed I need, since I will only feed the raid from single disks.
I did some additional tests, and the only way I was able to get really poor performance was by again setting the stripe size to larger than 64KB or by disabling the write cache. The number of disks used and other settings changed made no relevant difference.
My very simple conclusion is that the ICH10R cannot handle a 128KB stripe size without an extreme write penalty.
I thought I'd post the numbers from one of the raid-5 arrays that serve my files at home.
The array consists of four 2 TB Seagate 5900 RPM disks using a stripe size of 128K and a 4K cluster size on the filesystem.
The individual disks will bench (at the beginning of the disk) at about 100 MB/sec for seq reads and about 70 MB/sec for seq writes.
When in the aforementioned RAID config, I am getting 235 MB/sec for my reads and 110 MB/sec for my writes.
Seeing as how my write performance is greater than a single disk, I am thrilled, as I know others have encountered abysmal write scores that didn't even muster the bandwidth of an individual disk.
The theoretical maximum here is 300 MB/sec read and 210 MB/sec write, so writes are doing a lot worse than reads.
Wish the write perf wasn't so sub-par, but I can live with it.
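For anyone wondering where those theoretical numbers come from, this is the back-of-the-envelope version (it just assumes large sequential transfers where one disk's worth of bandwidth goes to parity):

```python
# Back-of-the-envelope RAID5 sequential maxima: with n drives, roughly one
# drive's worth of bandwidth goes to parity, so (n - 1) drives carry data.
# Single-disk speeds are the ones quoted above (100 MB/s read, 70 MB/s write).

def raid5_seq_max(n_drives: int, single_read: float, single_write: float):
    data_drives = n_drives - 1
    return data_drives * single_read, data_drives * single_write

read_max, write_max = raid5_seq_max(4, 100, 70)
print(read_max, write_max)                    # -> 300.0, 210.0 MB/s
print(f"reads:  {235 / read_max:.0%} of theoretical")
print(f"writes: {110 / write_max:.0%} of theoretical")
```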
When my other raid-5 array finishes initializing I'll post its performance numbers as well.
It's almost the same, except that it uses a stripe size of 64K.
Here are the benchmarks for the other RAID with the 64K stripe size.
This is better than the 128K stripe size, as write performance is 43% better.
Read perf = 86% (258/300) of theoretical maximum (per filebench)
Write perf = 74% (156/210) of theoretical maximum (per filebench)
I can live with these numbers.
Are they great? No.
Are they okay? I think so. Good enough for the file services they provide anyways.
Maybe for Xmas I'll buy myself another 2 hard drives (one for each array) and see where the numbers go for a 5 disk RAID-5.