I've had it, this RAID 5 thing is driving me nuts! When I went to training, the Navision trainer (someone who actually was one of THE Navision developers in the US) told us that RAID 5 was the cause of database corruption, and was a performance killer in every case, so it is bad to use this. Since then I have tried and failed to get official documentation on this. It appears that database corruption due to RAID 5 is an urban legend, as far as I can tell it never happened.
Now what I am hearing is that over the years RAID 5 (the hardware kind) has improved to such extent that it has become the preferred RAID level for small NAV implementations (up to 30-40 users). This is according to every hardware guy I ever talked to, and it is apparently the official recommendation by the Microsoft Business Systems Architecture group, which is supposedly THE place to go for Microsoft hardware recommendations.
Since there doesn't seem to be any documentation supporting the database corruption, and there is no official benchmark document either (at least not that MSFT is willing to share), I am most often not able to win the argument not to use RAID 5. And to top it all off, I have seen very impressive performance on this RAID level, so I am about to go to the dark side and accept RAID 5 as an acceptable setup. If both Microsoft and just about every hardware guy is recommending RAID 5 for NAV, then how are we supposed to be credible when we say it is not the right setup?
What I would like to see from MSFT is a white paper dedicated to RAID 5 and NAV specifically. I want to know the definitive answer as to if and to what extent RAID 5 can be used. The Microsoft Business Systems Architecture group is recommending RAID 5 for small setups (they are saying that this is preferred over any other RAID setup for implementations of up to 30-40 users), and I am losing arguments with customers as a result. There has to be a cutoff point that they know about, and they must have done some kind of testing to come to that number, and I would like to read actual documentation.
I would like to read about all the ins and outs, all the advantages and disadvantages, where a break even point or cutover point is for this RAID level. There seems to be a difference between software RAID and hardware RAID, and I want to know which one is OK for NAV.
Feel free to add to this thread, I've been rambling too long now anyway. Maybe if we all ask for this, they will feel some pressure and actually share some information about it.
Comments
I understand that it is impossible to test every possible scenario from MSFT's side, but how are WE supposed to do that? Unfortunately it sometimes looks like "let's get the product out, see how it works in real life, and then we will document it". So WE are the ones who are testing different scenarios and providing MSFT with results. Isn't it supposed to be the other way around?
Oleg
It is a great way to provide reliability to small sites that do not have a big IT infrastructure.
I have seen Navision on occasion do things in SQL Server that would kill the performance of any configuration, but for everything else it works fine.
For larger installations, you need to get into performance tuning. For smaller ones, RAID 5 is fine, with the next step up being mirrored drives for Windows and for log files.
http://mibuso.com/blogs/davidmachanick/
However, they need to update this document. They include no recommendations for SQL 2005 and Nav 4.0 (not surprising given the date this doc was written).
Some more information here:
http://www.sql-server-performance.com/s ... audit3.asp
If you read through this, check out this little tidbit:
And a side rant here. They explain how certain things should be coded optimally for SQL. They rattle off known issues, basically how often-used posting routines don't follow these optimal rules. And yet, instead of actually *FIXING* the code, they expect everyone else to do it, with different results for everyone.
I think that when a DB is small enough, without too many peaks in writing, RAID 5 can be good enough, because the excessive writes are caught by the COMMIT cache of Navision and the log file of SQL. So when the disks have time, they can write it to the DB.
For a DB that does a lot of reading and less writing, it can be advantageous, because the reads are divided between the disks. This ONLY helps if MOST of the DB is actually read from disk. If the same records are read over and over (most of the time it is like that), the records come from the DB cache, and NOT from the disks.
And about writing to disks, remember this for RAID 5: BEFORE a write, RAID 5 ALWAYS has to do a read! If your disks are not busy, you won't see the difference in speed, but if your disks are already stressed, this makes it worse!
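To make the read-before-write point concrete, here is a minimal sketch of how single-parity RAID updates one block, assuming plain XOR parity. The function names and the stripe contents are made up for illustration, and real controllers may avoid the reads when a whole stripe is written at once.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together (this is how the parity is built)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity for a one-block update.

    old_data and old_parity have to be READ from disk before anything
    can be written; that is the read-before-write step.
    """
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical stripe with three data blocks and one parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_blocks(d0, d1, d2)

# Overwrite d1: two reads (old d1, old parity), then two writes (new d1, new parity).
new_d1 = b"\xaa\x55"
new_parity = small_write(d1, parity, new_d1)

# The shortcut gives the same parity as recomputing it from the whole stripe.
print(new_parity == xor_blocks(d0, new_d1, d2))
```

The shortcut saves reading every other disk in the stripe, but it still costs two reads on top of the two writes, which is why it only hurts once the disks are already busy.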
No PM,please use the forum. || May the <SOLVED>-attribute be in your title!
I posted it here...
http://www.mibuso.com/forum/viewtopic.php?p=12809#12809
Here's another good resource on RAID
http://www.acnc.com/04_00.html
Maybe the recommendation was due to the fact that if one of the disks goes down on a RAID 5, it's a bigger headache.
http://www.BiloBeauty.com
http://www.autismspeaks.org
Never had any experience with data corruption problems when the client is on the standard Navision database. I remember reading a document a while ago (before Navision got bought out) that guaranteed that its database never crashes or becomes corrupt (conditions apply of course).
As consultants, we can only recommend what we think is best. Ultimately, we have to work with whatever the client ends up choosing.
AP Commerce, Inc. = where I work
Getting Started with Dynamics NAV 2013 Application Development = my book
Implementing Microsoft Dynamics NAV - 3rd Edition = my 2nd book
Consider a requirement for a system with 100 GB of usable space. With RAID 5 this could be built using four 36 GB drives. With RAID 10 that would take six 36 GB drives. That's a 50% increase in drive cost. Not to mention the more expensive controller, as many cheaper controllers do not support RAID 10.
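The drive count above can be reproduced with a few lines. This is only a sketch under the usual assumptions: RAID 5 gives up one drive's worth of capacity to parity, RAID 10 mirrors every data drive, and the function name `drives_needed` is made up for illustration.

```python
import math


def drives_needed(usable_gb: float, drive_gb: float, level: str) -> int:
    """Number of equal-size drives needed for the requested usable capacity."""
    # Both levels need at least two drives' worth of data capacity.
    data_drives = max(math.ceil(usable_gb / drive_gb), 2)
    if level == "raid5":
        return data_drives + 1   # one extra drive's worth of parity
    if level == "raid10":
        return data_drives * 2   # every data drive is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


raid5 = drives_needed(100, 36, "raid5")
raid10 = drives_needed(100, 36, "raid10")
increase = (raid10 - raid5) / raid5

print(f"RAID 5: {raid5} drives, RAID 10: {raid10} drives, "
      f"extra drive cost: {increase:.0%}")
```

Note how the gap widens with capacity: RAID 5 always adds one drive, RAID 10 always doubles the count.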
Now from a performance standpoint RAID 10 provides better write performance than RAID 5. With the same number of disks their read performance is about the same. With some of the higher-end (more expensive) controllers some of the RAID 5 overhead is off-loaded to the controller's processor, improving performance.
Considering the above, RAID 5 is usually not the best choice for today's transaction processing systems. However, it is useful in systems that do not see a lot of write activity. I have used it to support archive databases for reporting purposes and also as the receiving end for SQL log shipping.
And unless you are buying 20 drives for a SAN enclosure, the extra cost of a few extra drives to go from RAID 5 to RAID 10 is marginal.
I've just ordered a new SQL server to replace the aging one we have. Many different models are offered, and some even allow you to use SATA or SAS drives, all with RAID capabilities. However, in the spec that I designed, I made sure that I used only 15K SCSI drives, and RAID 10.
The customer does not view these as marginal. Adding $500 in cost to a $4000 server will be viewed as a sizable increase by the customer.
I have seen $100,000+ deals killed because the price on a laser printer was $100 higher than the competitor's.
But as it's still applicable, I hope it will be tolerated: I just had this discussion with a client (again).
Is there a new KB on this?
The link provided has expired, but this one, http://support.microsoft.com/kb/872402#top, shows that it is retired and applicable to 3.7.
I continue to lose the argument because it doesn't specifically say NO RAID 5 anywhere current that I can find.
I sometimes try math, but it doesn't always work.
First I talk IOPS
IOPS = the number of read or write requests a disk can service per second; it follows from the time it takes to service a single small request.
Then I talk about disk speeds,
Most 7.2K RPM SATA 6.0Gb/s drives have an average latency of about 4ms
SAS 15K RPM drives have about a 2ms avg latency.
IOPS = 1 / (seconds per I/O)
SATA = 1 / 0.004 = 250 IOPS
SAS = 1 / 0.002 = 500 IOPS
and just to include them, SSD = 32000 IOPS, but they are expensive and if they are in consideration, then we probably aren't having this conversation anyways...
IF you run the disks at 100% then you start to introduce queuing on the disk, which will kill your SQL Server, so you want to try to keep the IOPS down to about 75-80% of the maximum.
So the SATA drive is best suited to about 200 IOPS before it has problems, and the faster SAS drive to about 400 IOPS.
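The arithmetic above, as a small sketch. The latencies are the rough averages quoted in this post, the 80% headroom is the upper end of the 75-80% rule of thumb, and the function names are made up for illustration.

```python
def iops_from_latency(latency_ms: float) -> float:
    """Theoretical single-disk IOPS: one second divided by the time per I/O."""
    return 1000.0 / latency_ms


def usable_iops(latency_ms: float, headroom: float = 0.80) -> float:
    """IOPS to plan for so the disk is not run at 100% and starts queuing."""
    return iops_from_latency(latency_ms) * headroom


# Average latencies quoted above: ~4 ms for 7.2K SATA, ~2 ms for 15K SAS.
for name, latency_ms in (("SATA 7.2K", 4.0), ("SAS 15K", 2.0)):
    print(f"{name}: max {iops_from_latency(latency_ms):.0f} IOPS, "
          f"plan for about {usable_iops(latency_ms):.0f} IOPS")
```

Changing `headroom` to 0.75 gives the more conservative end of the range.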
You can run Perfmon and monitor the Disk Transfers/sec counter to figure out what IOPS you currently need (or make an estimate using an existing client of similar size).
RAID 0 = IOP benefit, because you spread across multiple disk so you get increased IOP.
RAID 1 = No IOP benefit, but you get fault tolerance.
Now you state that the RAID 5 has a penalty associated with it.
RAID 5 = 2x IOP read penalty, 4x IOP write penalty.
For every read, you need 2 IOP, 1 to read original data and 1 to read parity data.
For every write you need 4 IOP: 1 to read the original data and 1 to read the parity data, then time to compare the original data with the data to be written and to calculate the new parity from the result, then 1 IOP to write the data and 1 IOP to write the new parity.
As far as fault tolerance goes, RAID 5 is good, but you are definitely going to take a hit performance-wise if you implement it.
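To show what the penalties do to a real workload, here is a rough sketch that turns application IOPS into the IOPS the disks have to deliver. The RAID 5 penalties are the ones stated in this post (the read penalty is kept as a parameter because other write-ups count a healthy RAID 5 read as a single I/O). The RAID 10 figures (1 per read, 2 per write) are the commonly cited ones and do not come from this thread, and the 300 IOPS / 30% write workload is purely hypothetical.

```python
def backend_iops(front_end_iops: float, write_fraction: float,
                 read_penalty: float, write_penalty: float) -> float:
    """Disk-level IOPS needed to serve a given application-level workload."""
    reads = front_end_iops * (1.0 - write_fraction)
    writes = front_end_iops * write_fraction
    return reads * read_penalty + writes * write_penalty


# Hypothetical workload: 300 IOPS measured in Perfmon, 30% of them writes.
workload_iops = 300
write_fraction = 0.30

# Penalties as stated in this post for RAID 5; RAID 10 values are an assumption.
raid5 = backend_iops(workload_iops, write_fraction, read_penalty=2, write_penalty=4)
raid10 = backend_iops(workload_iops, write_fraction, read_penalty=1, write_penalty=2)

print(f"RAID 5 back end:  {raid5:.0f} IOPS")
print(f"RAID 10 back end: {raid10:.0f} IOPS")
```

Dividing the result by the per-disk IOPS you are willing to plan for gives a first estimate of how many spindles each layout needs.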
Then I of course recommend:
RAID 1+0 for data, logs and tempdb
RAID 1 for the OS and SQL binaries.
RAID 5 for backups.
Finally I ask, "How much does your employee waiting for the database to finish processing cost?"
Sources:
http://www.perftuning.com/images/white_papers/PTC%20Whitepaper_Overview%20of%20IO%20Performance%20and%20RAID.pdf
http://msdn.microsoft.com/en-us/library/ms190764.aspx
http://www.interdynbmi.com
And YES it is definitely still important. And YES it is still not understood by many IT people. And YES it is HIGHLY irritating to have a discussion with an IT guy about RAID 5, and even though you have about a decade of experience with performance issues, they still think RAID 5 is an acceptable alternative. Another one of my favorites is when they don't mention that they have all servers virtualized.
The most important thing here is to explain effectively that what matters is throughput, and not just disk capacity. The thing with RAID 5 is that as you increase the load on the system, there is a sudden dramatic decrease in performance (meaning "all of a sudden your system performance goes down the drain"). At some point, parity calculation will interfere with other system functions, and when it does (note that I did not say 'if') it will completely take your system down.
Make sure to put in writing that if this customer puts in place RAID 5 against your recommendations, that they will never argue billing for addressing performance problems.
RIS Plus, LLC
Well not for long hopefully...
But nothing new from Microsoft? I couldn't find anything saying "NEVER" which would be great, but even an "Advise Against" would be good.
I'm sure it's out there in the Knowledge base, probably the article right after the Cure for Cancer.
http://www.interdynbmi.com
http://blogs.msdn.com/b/nav/archive/201 ... tions.aspx
RIS Plus, LLC
Small implementations: It doesn't matter.
Medium implementations: RAID 10 is faster, but RAID 5 can be made to work.
Large implementations: RAID 10.
VERY large implementations: RAID 5 is faster, but RAID 10 can be made to work.
Since they did not want to re-do the system and the performance with 30 users was bad, they bumped the RAM from 4GB to 12GB and the performance then improved to acceptable.
So a lot of RAM overcame the performance issues. Worked for them with 30 users, but may not work for everybody, especially with a lot more users.
Personally, I think they got lucky.
http://mibuso.com/blogs/davidmachanick/
RIS Plus, LLC
|To-Increase|
Exactly.
In smaller systems, budget is not as significant. The cost between an OK and Ideal system may be 20% more. In very big systems, the cost difference can be more than double. So a RAID 5 SAN might cost $100,000 but the equivalent performance in RAID 10 might cost $200,000.
RIS Plus, LLC
In my experience cost is the single most important factor for a customer. In some cases it also ranks as the second, third and fourth most important factor.
I have never had a customer say "Cost is not a factor". I would probably be out of a job if customers thought that way.
In reality the better solution is to stick with the mini van and focus your efforts on getting the kids into schools that are closer or at least on the way to one another. Spend more effort on the logistics than worrying about having the fastest car.
Can you please go into the RAID 5 / RAID 10 question that I asked? How can you make a RAID 5 work on larger implementations? What are the keys to look for? In MY experience, it's the parity calculations that make RAID 5 impossible at some point. Obviously you have a different experience, so I am asking you to elaborate a little bit, to explain how that can work. At what point do you say RAID 5 comes back into play?
I am not saying you are wrong, I am asking because I am genuinely interested in this. One of my customers did make RAID 5 work but it came at a big price because they had to get a huge SAN to make the system big enough to deal with parity. I can see how in REALLY large implementations there is a break even point, and so I'd like to know more about that.
RIS Plus, LLC
I at no point intended to be patronizing, and no insult was intended and I apologize if you felt that way.
But to avoid further misunderstandings I will drop out of this discussion.
RIS Plus, LLC
RAID 5 < RAID 10 for NAV database. Always.
However, as your RAID volume grows, it will always be 1 disk extra for RAID 5 and 2 for RAID 10.
This is a linear cost issue, which will eventually bend the performance/$ towards RAID 5.
This is always viewed from the same starting point. So you can't compare a huge RAID5 SAN vs a middle sized RAID10 SAN.
Denster, I truly believe you've interpreted David's post wrong. (Else I did and, c'mon, I've never made mistakes.)
However, how would NAV perform on RAID setups other than the aforementioned ones?
RAID 6 should be similar to RAID 5 in performance, but what about RAID 50 (more speed) or RAID 51 (more reliable, but equal performance)?
|To-Increase|
Say you have 6 disks. Say with RAID 5 you have absolutely crap performance, and when you change it to RAID 10 performance is really good. If you are dead set on keeping it a RAID 5 you would have to add a certain number of disks to get the performance up to acceptable levels. I don't know how many you would need (per case calculation I suppose) but I would expect quite a few. I also don't know how many you would need to add once that same disk array starts having performance problems. Because we always advise against RAID 5, this is just something we don't know.
Obviously there is a certain break even point when the system grows really large, so I'm interested in learning what the factors are. Frankly, I would never even entertain the thought of re-introducing RAID 5 as a factor, so to me that's out of the box, which is of course good.
RIS Plus, LLC
But if you have sustained heavy writing to disk, the SAN write cache will fill up after a while and then you get crappy performance (again: it works more or less like the COMMIT cache of the native DB server).
No PM,please use the forum. || May the <SOLVED>-attribute be in your title!
The client has 25 users and an in-place RAID 5 server with plenty of RAM at 16GB, but no SQL on it yet. If (or when) we start having performance issues with them, I will put an update here with the IOPS and a more detailed build of the server. They plan on using the current server configuration...
Daniel, that article has a lot of good information, thanks!
It would be great if anybody comes across a RAID 5 implementation that's performing very poorly, that they provide IOPS, server config and user count so that we can start to get a picture of what limitations it has, and we can avoid sweeping generalizations.
http://www.interdynbmi.com