THE RAID 5 thread

DenSter Member Posts: 8,307
I've had it, this RAID 5 thing is driving me nuts! When I went to training, the Navision trainer (someone who actually was one of THE Navision developers in the US) told us that RAID 5 was the cause of database corruption, and was a performance killer in every case, so it is bad to use this. Since then I have tried and failed to get official documentation on this. It appears that database corruption due to RAID 5 is an urban legend, as far as I can tell it never happened.

Now what I am hearing is that over the years RAID 5 (the hardware kind) has improved to such extent that it has become the preferred RAID level for small NAV implementations (up to 30-40 users). This is according to every hardware guy I ever talked to, and it is apparently the official recommendation by the Microsoft Business Systems Architecture group, which is supposedly THE place to go for Microsoft hardware recommendations.

Since there doesn't seem to be any documentation supporting the database corruption, and there is no official benchmark document either (at least not that MSFT is willing to share), I am most often not able to win the argument not to use RAID 5. And to top it all off, I have seen very impressive performance on this RAID level, so I am about to go to the dark side and accept RAID 5 as an acceptable setup. If both Microsoft and just about every hardware guy is recommending RAID 5 for NAV, then how are we supposed to be credible when we say it is not the right setup?

What I would like to see from MSFT is a white paper dedicated to RAID 5 and NAV specifically. I want to know the definitive answer as to if and to what extent RAID 5 can be used. The Microsoft Business Systems Architecture group is recommending RAID 5 for small setups (they are saying that this is preferred over any other RAID setup for implementations of up to 30-40 users), and I am losing arguments with customers as a result. There has to be a cutoff point that they know about, and they must have done some kind of testing to come to that number, and I would like to read actual documentation.

I would like to read about all the ins and outs, all the advantages and disadvantages, where a break even point or cutover point is for this RAID level. There seems to be a difference between software RAID and hardware RAID, and I want to know which one is OK for NAV.

Feel free to add to this thread, I've been rambling too long now anyway :). Maybe if we would all ask for this, then they may feel some pressure and actually share some information about this.

Comments

  • zeninoleg Member Posts: 236
    I totally agree with DenSter. Navision is a very complicated system and performance is extremely important. Right now there are different ways to set up the system, but who knows what the optimal way is? I think MSFT must spend time testing the system on various configurations and provide us with the results. It is kinda dangerous to play a guessing game with setups. Moreover, it is very time consuming to test every possible setup (to say nothing of the fact that the customer is just thinking about purchasing hardware and does not know which route to go).
    I understand that it is impossible to test every possible scenario from MSFT's side, but how are WE supposed to do that? Unfortunately, sometimes it looks like "let's get the product out, see how it works in real life and we will document it". So WE are the ones who are testing different scenarios and providing MSFT with results. Isn't it supposed to be the other way around?
    Best Regards,
    Oleg
  • davmac1 Member Posts: 1,283
    I have never seen a problem with modern RAID 5.
    It is a great way to provide reliability to small sites who do not have a big IT infrastructure.
    I have seen Navision on occasion do things in SQL Server that would kill the performance of any configuration, but for everything else it works fine.

    For larger installations, you need to get into performance tuning. For smaller ones, RAID 5 is fine, with the next step up being mirrored drives for Windows and for log files.
  • thaug Member Posts: 106
    The Tuning Navision for Better Performance doc doesn't explicitly state that RAID 5 should not be used; however, look at page 4. They explain various RAID scenarios, and they also say RAID 10 is recommended. Under RAID 5 they say "File/Application /Internet Servers - Archive systems (with numerous read transactions, few write transactions)". Certainly not applicable for a database environment.

    However, they need to update this document. They include no recommendations for SQL 2005 and Nav 4.0 (not surprising given the date this doc was written).

    Some more information here:
    http://www.sql-server-performance.com/s ... audit3.asp
    If you read through this, check out this little tidbit:
    Testing at Microsoft has found that RAID 5 can be as much as 50% slower than using RAID 10.

    And a side rant here. They explain how certain things should be coded optimally for SQL. They rattle off issues that are known, basically how often used posting routines basically don't follow these optimal rules. But yet, instead of actually *FIXING* the code, they expect everyone else to do it, with different results for everyone.
    There is no data, only bool!
  • kriki Member, Moderator Posts: 9,116
    DenSter wrote:
    There seems to be a difference between software RAID and hardware RAID, and I want to know which one is OK for NAV.
    What I always heard is that hardware RAID5 is better and faster.
    I think that when a DB is small enough with not too many peaks in writing, RAID5 can be good enough, because the excessive writes are caught by the COMMIT-cache of Navision and the logfile of SQL. So when the disks have time they can write it to the DB.
    For a DB that does a lot of reading and less writing, it can be advantageous, because the reads are divided between the disks. This holds ONLY if MOST of the DB is read and it is not always the same records. If the same records are read (most of the time it is like that), the records come from the DBcache, and NOT from the disks.

    And about writing to disks: remember this for RAID5: BEFORE a write, a RAID5 ALWAYS has to do a read! If your disks are not busy, you won't see the difference in speed, but if your disks are already stressed, this makes it worse!
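
    To illustrate the read-before-write point, here is a minimal Python sketch of a RAID 5 style small write, assuming plain XOR parity over a single stripe. The names `xor_blocks` and `small_write` and the toy two-byte blocks are made up for illustration; real controllers work on much larger blocks and may cache or batch these steps.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe: list, parity: bytes, index: int, new_data: bytes):
    """Overwrite one data block and keep the parity block consistent.

    Returns (new_parity, io_count). The I/O count is fixed at 4 because
    the old data and old parity must be read before anything is written.
    """
    old_data = stripe[index]   # I/O 1: read the old data block
    old_parity = parity        # I/O 2: read the old parity block
    # New parity = old parity with the old data removed and the new data added.
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    stripe[index] = new_data   # I/O 3: write the new data block
    return new_parity, 4       # I/O 4: write the new parity block


# Toy stripe with three data blocks (hypothetical values).
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = reduce(xor_blocks, stripe)
parity, ios = small_write(stripe, parity, 1, b"\xff\x00")

# If one disk is lost, its block is the XOR of the survivors and the parity.
rebuilt = reduce(xor_blocks, [stripe[0], stripe[2], parity])
```

    The same XOR is what lets the array rebuild a failed disk, which is why every write has to keep the parity block up to date.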
    Regards,Alain Krikilion
    No PM,please use the forum. || May the <SOLVED>-attribute be in your title!


  • Savatage Member Posts: 7,142
    There is a TechKnowledge Doc that states "Never use Raid 5"
    I posted it here...
    http://www.mibuso.com/forum/viewtopic.php?p=12809#12809

    Here's another good resource on RAID levels
    http://www.acnc.com/04_00.html

    Maybe the recommendation was due to the fact that if one of the disks goes down on a RAID 5 it's a bigger headache.
  • Alex_Chow Member Posts: 5,063
    Personally, I can attest that RAID 5 suffers from performance problems. We had a client installation switch from RAID 5 to RAID 10 and there was an extreme difference in performance.

    Never had any experience with data corruption problems when the client is on the standard Navision database. I remember reading a document a while ago (before Navision got bought out) that guarantees that its database never crashes or becomes corrupt (conditions apply of course).

    As consultants, we can only recommend what we think is best. Ultimately, we have to work with whatever the client ends up choosing.
  • bbrown Member Posts: 3,268
    The majority of today's hardware sales are made over the internet (or phone). Typically either the person selling or the person buying (usually both) has very little knowledge of the application's requirements, or desire to learn them. This, along with the low profit margins, tends to push a "sell the cheapest" approach. Back in the days when we were selling servers with 40% margins, it was possible to spend a little time working out some options. Today's sales approach is sell and move on. A server with RAID 5 can be cheaper and, to the average end-user, seem to have the same spec.

    Consider a requirement for a system with 100 GB of usable space. With RAID 5 this could be built using four 36 GB drives. With RAID 10 that would take six 36 GB drives. That's a 50% increase in drive cost. Not to mention the more expensive controller, as many cheaper controllers do not support RAID 10.
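
    The drive counts above can be reproduced with a small Python sketch. It assumes a single RAID 5 group that loses one drive's worth of capacity to parity and a RAID 10 set where every drive is mirrored; the function name `drives_needed` is made up for illustration and controller or hot-spare costs are ignored.

```python
import math


def drives_needed(usable_gb: float, drive_gb: float, level: str) -> int:
    """Number of equally sized drives needed for the requested usable space."""
    data_drives = math.ceil(usable_gb / drive_gb)
    if level == "raid5":
        # One drive's worth of capacity goes to parity; minimum of 3 drives.
        return max(3, data_drives + 1)
    if level == "raid10":
        # Every data drive has a mirror; minimum of 4 drives.
        return max(4, 2 * data_drives)
    raise ValueError(f"unsupported RAID level: {level}")


raid5 = drives_needed(100, 36, "raid5")
raid10 = drives_needed(100, 36, "raid10")
extra_cost_pct = 100 * (raid10 - raid5) / raid5
print(f"RAID 5: {raid5} drives, RAID 10: {raid10} drives, "
      f"extra drive cost: {extra_cost_pct:.0f}%")
```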

    Now from a performance standpoint RAID 10 provides better write performance than RAID 5. With the same number of disks their read performance is about the same. With some of the higher-end (more expensive) controllers some of the RAID 5 overhead is off-loaded to the controller's processor, improving performance.

    Considering the above, RAID 5 is usually not the best choice for today's transaction processing systems. However, it is useful in systems that do not see a lot of write activity. I have used it to support archive databases for reporting purposes and also as the receiving end for SQL log shipping.
    There are no bugs - only undocumented features.
  • thaug Member Posts: 106
    Software RAID should never be used, which really shouldn't be a problem at all with any modern equipment. Even my home PC has hardware RAID built in to the motherboard, using SATA drives.

    And unless you are buying 20 drives for a SAN enclosure, the extra cost of a few extra drives to go from RAID 5 to RAID 10 is marginal.

    I've just ordered a new SQL server to replace the aging one we have. Many different models are offered, and some even allow you to use SATA or SAS drives, all with RAID capabilities. However, in the spec that I designed, I made sure that I used only 15K SCSI drives, and RAID 10.
    There is no data, only bool!
  • bbrown Member Posts: 3,268
    And unless you are buying 20 drives for a SAN enclosure, the extra cost of a few extra drives to go from RAID 5 to RAID 10 is marginal.

    The customer does not view these as marginal. Adding $500 in cost to a $4000 server will be viewed as a sizable increase by the customer.

    I have seen $100,000+ deals killed because the price on a laser printer was $100 higher than the competitor's.
    There are no bugs - only undocumented features.
  • Savatage Member Posts: 7,142
    **Please Note: We do NOT recommend using a RAID 5 configuration with NAV. This is because it can cause performance problems on the server. The RAID 5 environment does not allow the NAV DBMS to effectively stripe the data over drives. The drives appear as one large drive, and there is no way to guarantee that the database files will be spread over multiple drives. Also, if a drive in RAID 5 fails and has to be replaced, the database can be corrupted. This is because the system will have to rebuild the files from the parity partitions on the RAID. If the RAID file (drive) is not recreated perfectly, the DBMS may not correctly use it. If the RAID 5 hardware was designed to work with Databases, this may not be the case, but it is something you would need to check out with the Vendor before implementing it.

    For further information on RAID and NAV, check out the NAV Hardware Sizing guide on PartnerSource:

    https://mbs.microsoft.com/partnersource ... page=false

    Best regards,
    Tom Brownell

    Microsoft Online Support Engineer
  • Joe_Mathis Member Posts: 173
    First, :roll: I realize that this is a very old thread.

    But as it's still applicable... I hope it will be tolerated [-o< , I just had this discussion with a client (again)


    Is there a new KB on this?

    The link provided has expired, but this one http://support.microsoft.com/kb/872402#top

    Shows that it is retired and applicable to 3.7.

    I continue to lose the argument because it doesn't specifically say NO RAID 5 anywhere current that I can find.

    I sometimes try math, but it doesn't always work.

    First I talk IOPS
    IOPS = the number of I/O requests a disk can service per second; it is the inverse of the time it takes to service a single (0 byte) read or write request.

    Then I talk about disk speeds,
    Most 7.2K RPM SATA 6.0Gb/s drives have an average latency of about 4ms
    SAS 15K RPM drives have about a 2ms avg latency.

    IOPS = 1 / (seconds per I/O)

    SATA = 1/0.004
    = 250 IOPS

    SAS = 1/0.002
    = 500 IOPS

    and just to include them, SSD = 32000 IOPS, but they are expensive and if they are in consideration, then we probably aren't having this conversation anyways... :D

    If you run the disks at 100% then you start to introduce queuing on the disk, which will kill your SQL server, so you want to try to keep the IOPS down to about 75-80% of the maximum.
    So the SATA drive is best kept to about 200 IOPS before it has problems, and the faster SAS drive to about 400 IOPS.
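
    The arithmetic above can be written as a short Python sketch. It assumes, as the post does, that average latency alone determines the IOPS ceiling and that 80% is the utilization to aim for; the function names are made up for illustration.

```python
def max_iops(latency_ms: float) -> float:
    """Theoretical IOPS ceiling for one disk: 1 / (seconds per I/O)."""
    return 1000.0 / latency_ms


def target_iops(latency_ms: float, utilization: float = 0.8) -> float:
    """IOPS to plan for if the disk should stay below the given utilization."""
    return max_iops(latency_ms) * utilization


# Latency figures taken from the post above.
for name, latency_ms in (("SATA 7.2K", 4.0), ("SAS 15K", 2.0)):
    print(f"{name}: max {max_iops(latency_ms):.0f} IOPS, "
          f"target {target_iops(latency_ms):.0f} IOPS")
```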

    You can run Perfmon and monitor the Disk Transfers/sec counter to figure out what IOPS you currently need (or make an estimate using an existing like-sized client :mrgreen: )

    RAID 0 = IOP benefit, because you spread across multiple disks so you get increased IOPS.

    RAID 1 = No IOP benefit, but you get fault tolerance.

    Now you state that the RAID 5 has a penalty associated with it.
    RAID 5 = 2x IOP read penalty, 4x IOP write penalty.

    For every read, you need 2 IOP, 1 to read original data and 1 to read parity data.
    For every write you need 4 IOP: 1 to read the original data, 1 to read the parity data, then time to compare the original data with the write data and to calculate the new parity from the result, 1 IOP to write the data, and 1 IOP to write the new parity.
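
    A rough Python sketch of what such penalties do to the IOPS an array can deliver to SQL Server. The penalties are parameters: the RAID 5 line uses the 2x read / 4x write figures from this post, while the RAID 10 line uses the commonly quoted 1x read / 2x write. Many sizing guides also count RAID 5 reads on a healthy array at 1x, which can be tried by changing `read_penalty`. The 6-disk array, the 400 IOPS per disk and the 70/30 read/write mix are assumptions for illustration only.

```python
def frontend_iops(disks: int, iops_per_disk: float, read_fraction: float,
                  read_penalty: float, write_penalty: float) -> float:
    """Host-visible IOPS after each request is multiplied by its RAID penalty."""
    raw_iops = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Average number of back-end I/Os caused by one host request.
    cost_per_request = (read_fraction * read_penalty
                        + write_fraction * write_penalty)
    return raw_iops / cost_per_request


# Hypothetical array: 6 SAS disks planned at 400 IOPS each, 70% reads.
raid5 = frontend_iops(6, 400, 0.7, read_penalty=2, write_penalty=4)
raid10 = frontend_iops(6, 400, 0.7, read_penalty=1, write_penalty=2)
print(f"RAID 5: {raid5:.0f} IOPS, RAID 10: {raid10:.0f} IOPS")
```

    With these particular inputs the RAID 10 figure comes out at twice the RAID 5 figure; a more write-heavy mix or a 1x RAID 5 read penalty changes the ratio.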

    As far as fault tolerance goes, RAID 5 is good, but you are definitely going to take a hit performance-wise if you implement it.

    Then I of course recommend:
    RAID 1+0 for data, logs and tempdb
    RAID 1 for the OS and SQL binaries.
    RAID 5 for backups.

    Finally I ask, "How much does your employee waiting for the database to finish processing cost?"

    Sources:
    http://www.perftuning.com/images/white_papers/PTC%20Whitepaper_Overview%20of%20IO%20Performance%20and%20RAID.pdf
    http://msdn.microsoft.com/en-us/library/ms190764.aspx
  • DenSter Member Posts: 8,307
    That's funny, just today there was an excellent article about RAID on SQL Server Central: http://www.sqlservercentral.com/articles/RAID/88945/

    And YES it is definitely still important. And YES it is still not understood by many IT people. And YES it is HIGHLY irritating to have a discussion with an IT guy about RAID 5, and even though you have about a decade of experience with performance issues, they still think RAID 5 is an acceptable alternative. Another one of my favorites is when they don't mention that they have all servers virtualized.

    The most important thing here is to explain effectively that what matters is throughput, and not just disk capacity. The thing with RAID 5 is that as you increase the load on the system, there is a sudden dramatic decrease in performance (meaning "all of a sudden your system performance goes down the drain"). At some point, parity calculation will interfere with other system functions, and when it does (note that I did not say 'if') it will completely take your system down.

    Make sure to put in writing that if this customer puts in place RAID 5 against your recommendations, that they will never argue billing for addressing performance problems.
  • Joe_Mathis Member Posts: 173
    Make sure to put in writing that if this customer puts in place RAID 5 against your recommendations, that they will never argue billing for addressing performance problems.

    Well not for long hopefully... :mrgreen:

    But nothing new from Microsoft? I couldn't find anything saying "NEVER" which would be great, but even an "Advise Against" would be good. [-o<

    I'm sure it's out there in the Knowledge base, probably the article right after the Cure for Cancer. :D
  • DenSter Member Posts: 8,307
    Nothing "official" but on the NAV team blog there's a bunch of good articles about this. One in particular that I always refer people to. It's about 2 years old now too. The only thing I don't agree with is the autostats recommendation, but that's details :)

    http://blogs.msdn.com/b/nav/archive/201 ... tions.aspx
  • David_Singleton Member Posts: 5,479
    IMO in terms of performance per $:

    Small implementations: It doesn't matter.
    Medium implementations: RAID 10 is faster, but RAID 5 can be made to work
    Large implementations: RAID 10.
    VERY large implementations RAID 5 is faster, but RAID 10 can be made to work.
    David Singleton
  • davmac1 Member Posts: 1,283
    From a real world example - I had a customer who had already been sold a RAID5 server with 4GB of RAM running SQL Server 2005 64 bit Standard. (Database was a few GB - definitely not large.)
    Since they did not want to re-do the system and the performance with 30 users was bad, they bumped the RAM from 4GB to 12GB and the performance then improved to acceptable.
    So a lot of RAM overcame the performance issues. Worked for them with 30 users - may not work for everybody - especially a lot more users.

    Personally, I think they got lucky.
  • DenSter Member Posts: 8,307
    VERY large implementations RAID 5 is faster, but RAID 10 can be made to work.
    Really? At some point systems grow so big that RAID 5 outperforms RAID 10? That's so counter intuitive, explain that one please.
  • Sog Member Posts: 1,023
    DenSter wrote:
    VERY large implementations RAID 5 is faster, but RAID 10 can be made to work.
    Really? At some point systems grow so big that RAID 5 outperforms RAID 10? That's so counter intuitive, explain that one please.
    David explicitly says (performance/$), not performance itself.
    |Pressing F1 is so much faster than opening your browser|
    |To-Increase|
  • David_Singleton Member Posts: 5,479
    Sog wrote:
    DenSter wrote:
    VERY large implementations RAID 5 is faster, but RAID 10 can be made to work.
    Really? At some point systems grow so big that RAID 5 outperforms RAID 10? That's so counter intuitive, explain that one please.
    David explicitly says (performance/$), not performance itself.

    Exactly.

    In smaller systems, budget is not as significant. The cost between an OK and Ideal system may be 20% more. In very big systems, the cost difference can be more than double. So a RAID 5 SAN might cost $100,000 but the equivalent performance in RAID 10 might cost $200,000.
    David Singleton
  • DenSter Member Posts: 8,307
    Sog wrote:
    DenSter wrote:
    VERY large implementations RAID 5 is faster, but RAID 10 can be made to work.
    Really? At some point systems grow so big that RAID 5 outperforms RAID 10? That's so counter intuitive, explain that one please.
    David explicitly says (performance/$), not performance itself.
    No he explicitly says "RAID 5 is faster", which has nothing to do with cost. I never think about cost when I look at performance problems, so I am interested to hear how you can make a RAID 5 setup go faster than RAID 10, when the opposite has always been the case in my experience.
  • DenSter Member Posts: 8,307
    In very big systems, the cost difference can be more than double. So a RAID 5 SAN might cost $100,000 but the equivalent performance in RAID 10 might cost $200,000.
    So you'd just have a HUGE raid 5 SAN as opposed to a moderately sized RAID 10 one, but because of the scale you start saving money? Isn't the performance dropoff a big risk when users are added and/or transaction volume grows more?
  • David_Singleton Member Posts: 5,479
    DenSter wrote:
    I never think about cost when I look at performance problems,

    In my experience, cost is the single most important factor for a customer. In some cases it also ranks as the second, third and fourth most important factor. :mrgreen:

    I have never had a customer say "Cost is not a factor". I would probably be out of a job if customers thought that way.
    David Singleton
  • David_Singleton Member Posts: 5,479
    An analogy I often use is to ask "What is faster, a Ferrari or a Dodge Caravan?" The answer is "it depends". If the need is to get 6 children to school in the morning, and they live in various dispersed locations, then the fastest solution is to buy 6 Ferraris and hire 5 drivers to take each child to school as fast as possible. But even then each Ferrari can't go faster than 35 mph, so its only advantage is the acceleration, in the hope that it gets to the next traffic light before it turns red.

    In reality the better solution is to stick with the mini van and focus your efforts on getting the kids into schools that are closer or at least on the way to one another. Spend more effort on the logistics than worrying about having the fastest car.
    David Singleton
  • DenSter Member Posts: 8,307
    edited 2012-05-02
    Yeah and when you get that minivan, you'd hope it doesn't have a 50HP 4 beater that's not strong enough to haul 10 kids to soccer practice. Don't patronize me David, it's not necessary. When you first get involved with a customer that has performance problems, the first question is never about cost and you know it. The question about cost doesn't come into play until you start talking about solutions.

    Can you please go into the RAID 5 / RAID 10 question that I asked? How can you make a RAID 5 work on larger implementations? What are the keys to look for? In MY experience, it's the parity calculations that make RAID 5 impossible at some point. Obviously you have a different experience, so I am asking you to elaborate a little bit, to explain how that can work. At what point do you say RAID 5 comes back into play?

    I am not saying you are wrong, I am asking because I am genuinely interested in this. One of my customers did make RAID 5 work but it came at a big price because they had to get a huge SAN to make the system big enough to deal with parity. I can see how in REALLY large implementations there is a break even point, and so I'd like to know more about that.
  • David_Singleton Member Posts: 5,479
    DenSter wrote:
    Don't patronize me David, it's not necessary.

    I at no point intended to be patronizing, and no insult was intended and I apologize if you felt that way.

    But to avoid further misunderstandings I will drop out of this discussion.
    David Singleton
  • DenSter Member Posts: 8,307
    I at no point intended to be patronizing, and no insult was intended and I apologize if you felt that way.

    But to avoid further misunderstandings I will drop out of this discussion.
    Why drop out? Just spare me the "tell it to me like I am a child" metaphors and explain how to make RAID 5 work in larger implementations. We've all always advised against RAID 5, you included, and you've obviously come into new insights that I would love to know about.
  • Sog Member Posts: 1,023
    If you don't mind me taking another shot at this.
    RAID 5 < RAID 10 for NAV database. Always.
    However, as your RAID volume grows, it will always be 1 disk extra for RAID 5 and 2 for RAID 10.
    This is a linear cost issue, which will eventually bend the performance/$ towards RAID5.
    This is always viewed from the same starting point. So you can't compare a huge RAID5 SAN vs a middle sized RAID10 SAN.
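
    A small Python sketch of the disk-count side of this argument only. It assumes one RAID 5 group with a single disk's worth of parity and a RAID 10 set that mirrors every data disk, and it says nothing about how performance scales, which is the open question in this thread. The function name `total_disks` is made up for illustration; very large arrays are usually split into several RAID 5 groups, which would add more parity disks than shown here.

```python
def total_disks(data_disks: int, level: str) -> int:
    """Physical disks needed to hold `data_disks` worth of usable capacity."""
    if level == "raid5":
        return data_disks + 1   # one extra disk's worth of parity
    if level == "raid10":
        return data_disks * 2   # every data disk is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


for data_disks in (2, 4, 8, 16, 32):
    r5 = total_disks(data_disks, "raid5")
    r10 = total_disks(data_disks, "raid10")
    print(f"{data_disks:>2} data disks: RAID 5 = {r5:>2}, "
          f"RAID 10 = {r10:>2}, ratio = {r10 / r5:.2f}")
```

    The ratio of disk counts climbs towards 2 as the volume grows, which is the cost gap being described.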

    Denster, I truly believe you've interpreted David's post wrong. (Else I did and c'mon, I've never made mistakes :roll: )
    However, how would NAV perform on RAID setups other than the aforementioned ones?
    RAID 6 should be similar to RAID 5 in performance, but what about RAID 50 (more speed) or RAID 51 (more reliable, but equal performance)?
    |Pressing F1 is so much faster than opening your browser|
    |To-Increase|
  • DenSter Member Posts: 8,307
    Sog wrote:
    However, as your RAID volume grows, it will always be 1 disk extra for RAID 5 and 2 for RAID 10.
    This is a linear cost issue, which will eventually bend the performance/$ towards RAID5.
    This is always viewed from the same starting point. So you can't compare a huge RAID5 SAN vs a middle sized RAID10 SAN.
    I get the math. What I am not sure about is whether 1 extra disk in a RAID 5 SAN will give you the same improvement as 2 extra disks in a RAID 10 SAN. I just don't think that the performance change is as linear as the cost.

    Say you have 6 disks. Say with RAID 5 you have absolutely crap performance, and when you change it to RAID 10 performance is really good. If you are dead set on keeping it a RAID 5 you would have to add a certain number of disks to get the performance up to acceptable levels. I don't know how many you would need (per case calculation I suppose) but I would expect quite a few. I also don't know how many you would need to add once that same disk array starts having performance problems. Because we always advise against RAID 5, this is just something we don't know.

    Obviously there is a certain break even point when the system grows really large, so I'm interested in learning what the factors are. Frankly, I would never even entertain the thought of re-introducing RAID 5 as a factor, so to me that's out of the box, which is of course good.
  • kriki Member, Moderator Posts: 9,116
    It all also depends on what is behind the RAID. If you have a SAN with a write cache of some GB, then the SAN write cache takes on the write peaks of the SQL Server (works more or less like the COMMIT-cache of the native DB-server). SQL thinks it is written to disk and if the writing peak is NOT sustained, the SAN has time to write it to disk.
    But if you have sustained heavy writing to disk, the SAN write cache will fill up after a while and then you get crappy performance (again: works more or less like the COMMIT-cache of the native DB-server).
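
    The burst versus sustained behaviour can be shown with a toy Python simulation. Everything in it is an assumption for illustration: a 2048 MB cache, a back end that drains 200 MB per tick, and the two made-up workloads. It is not a model of any particular SAN.

```python
def simulate_cache(incoming_mb, cache_mb, drain_mb_per_tick):
    """Return, per tick, the MB the cache could not absorb.

    Those writes have to wait for the disks, which is where the
    slowdown becomes visible to SQL Server.
    """
    used = 0.0
    stalled = []
    for mb in incoming_mb:
        # The back end flushes part of the cache to disk every tick.
        used = max(0.0, used - drain_mb_per_tick)
        accepted = min(mb, cache_mb - used)
        used += accepted
        stalled.append(mb - accepted)
    return stalled


burst = [500, 500, 0, 0, 0, 0, 0, 0]   # short write peak, then quiet
sustained = [500] * 8                  # heavy writing that never lets up

burst_stalls = simulate_cache(burst, cache_mb=2048, drain_mb_per_tick=200)
sustained_stalls = simulate_cache(sustained, cache_mb=2048, drain_mb_per_tick=200)
print("burst:    ", burst_stalls)
print("sustained:", sustained_stalls)
```

    In this sketch the burst is absorbed completely, while the sustained load fills the cache after a few ticks and every later write is limited by how fast the disks can drain it.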
    Regards,Alain Krikilion
    No PM,please use the forum. || May the <SOLVED>-attribute be in your title!


  • Joe_Mathis Member Posts: 173
    davmac1 wrote:
    So a lot of RAM overcame the performance issues. Worked for them with 30 users - may not work for everybody - especially a lot more users.

    Personally, I think they got lucky.

    The client has 25 users and an in place RAID 5 server, plenty of RAM at 16GB, no SQL on it yet though. If (or when) we start having performance issues with them, I will put an update here with the IOPS and a more detailed build of the server. They plan on using the current server configuration... :?

    Daniel, that article has a lot of good information. Thanks!

    It would be great if anybody comes across a RAID 5 implementation that's performing very poorly, that they provide IOPS, server config and user count so that we can start to get a picture of what limitations it has, and we can avoid sweeping generalizations.