NAV 4.0 SP3 Database is getting too big and slow

cfeghali · 2011-08-18

Hello,

This is a very general question, i am not sure what technical information would be required from me to explain the current situation better. In summary, we have a 200GB+ NAV 4 database running on SQL 2005 (32bit) on a Windows 2003 64Bit.
Posting, Cost Routine, warehouse functions, accessing date has gotten slower with time. What's bothering us the most is that table locks now last a lot longer so people are idle too long.

Any recomendations on how we can speed up the database? we're already running the Database on SSD drives, enabled SQL to use more than 4GB of RAM, run daily maintenance and backup plans etc.

Anyway to archive the item ledger table? it is huge, has hundreds of millions of records.
or maybe separate the SQL files? would dividing the database into several SQL files get us better performance?

thank you,

Cesar

bbrown · 2011-08-18

There's likely lots that can be done. The one thing that sticks out to be is the 32 bit SQL. Definitely get on 64 bit.

cfeghali · 2011-08-19

it's painful to move from 32bit to 64bit. there is no inplace upgrade. and our SQL instance seems to be messed up. When i tried to uninstall it fails, actually anything change to the instance fail, cannot even upgrade to SP4. We will install a brand new server and migrate the database, should i stick to 2005 64bit or can i go to SQL 2008 R2 64bit? I want the best performance. we already have the license.

bbrown · 2011-08-19

The 32 bit SQL just happens to be something that stuck out to me. It's a mistake in the original config. Which raises the question of what else is not right. I would not just blindly build a new server

cfeghali · 2011-08-19

So you don't think i would get a boost in performance switching to SQL 64bit? instead of using AWE? what about spliting the database files then?

bbrown · 2011-08-19

Understand the problem before deciding on the solution.

cfeghali · 2011-08-19

Our problem is that NAV 4.0 runs really slow on our setup. it used to be very fast but it got slower and slower with time. I think this is due to the database size (Item leger has millions of records) and it keep growing every day.

For a side warehouse project, we created a brand new database (blank) on the same server. everything runs a lot faster there. One process that usually takes 8 hours in the main database was done in under 2 hours.

Posting sales takes a long time, costing routine is neverending, last time we ran it, we started friday afternoon and it finished sunday late night. (luckily for us, we almost canceled the process)

and since everything takes longer, any table lock happening today generate a lot more disconfort.

I hope I explained the problem better this time. Also i do not have solutions, or a solution. but i would like to start somewhere.

bbrown · 2011-08-21

While the 32 bit SQL is definitely a potential issue, I don't believe it to be the sole factor in the problems you describe. I'd start with a review of your server configuration. Does it meet NAV needs? If not and you intend to build a new server, you don't want to repeat the issues.

Then there's the whole subject of index and code optimization. On which you will find many threads.

MBerger · 2011-08-21

Even though i am a coder myself, i wonder why everyone is jumping to code and/or technical solutions.....why not look at the standard NAV Date compression options ?

bbrown · 2011-08-21

MBerger wrote:

Even though i am a coder myself, i wonder why everyone is jumping to code and/or technical solutions.....why not look at the standard NAV Date compression options ?

That is something I would avoid like the plaque

Sog · 2011-08-22

bbrown wrote:

MBerger wrote:

Even though i am a coder myself, i wonder why everyone is jumping to code and/or technical solutions.....why not look at the standard NAV Date compression options ?

That is something I would avoid like the plaque

Because?
Just saying this without some form of explination is not really helpfull.

Anyway, I'd suggest to upgrade to a newer version of NAV. Because I believe the main problem with this version and lots of data is the calculation of the flowfields. (aka the SIFT tables).

bbrown · 2011-08-22

Why? Because they have been known to create problems. Besides deleting data does not solve performance issues. it just delays them.

I would agree on the upgrade point. But don't expext that to solve the issues by itself.

KYDutchie · 2011-08-22

Sog,

I've had to upgrade a couple of our customers that had done an Item Ledger Entry Date Compression and none of them were able to upgrade to newer versions without any issues. The data upgrade toolkit just does not like the date compression.

And what will be the gain when you do a date compression? For example you have 10 million ILE records. Your database analysis shows you that the problem exists in the last 3 million. Then you are going to date compress the first 4 million and never fix the index/fragmentation issues of the last 3 million. The performance problem still exists.

I agree with you that the best thing to do with a 4.0SP3 database is to bring them up to at least 5.0SP1. That way you get rid of the SIFT tables, however, you will have to go through the code and optimize the code for 5.0SP1.

I hope this helps.

Regards,

Willy

Sog · 2011-08-22

[offtopic]
I know datacompression is not recommended. But I never knew why (until know). (Thank you Willy)
My point was more that not everything is as obvious as it is to you. I always try to answer so that even an idiot might grasp the concept and everybody else with some knowledge understands it.
2nd Although data deletion cannot solve performance issues, it can avoid them if done periodically. This is indeed not the best solution, but it's the cheapest one. And some companies choose the cheap solution. (Even though I don't recommend)
[/offtopic]

MBerger · 2011-08-22

bbrown wrote:

Why? Because they have been known to create problems. Besides deleting data does not solve performance issues. it just delays them.

I would agree on the upgrade point. But don't expext that to solve the issues by itself.

I know the date compression CAN cause problems, but i meant that it is at least an option that can be considered. I agree that deleting data is not the holy grail of performance issues, but neither are index/code changes. The ideal solution is a combination, most of the time.

bbrown · 2011-08-22

MBerger wrote:

bbrown wrote:

Why? Because they have been known to create problems. Besides deleting data does not solve performance issues. it just delays them.

I would agree on the upgrade point. But don't expext that to solve the issues by itself.

I know the date compression CAN cause problems, but i meant that it is at least an option that can be considered. I agree that deleting data is not the holy grail of performance issues, but neither are index/code changes. The ideal solution is a combination, most of the time.

Performance tuning involves 3 areas (hardware/systems, maintenance, and code/indexes). These are in no particular order and each is as important as the others. Date compression belongs nowhere on this list.

KYDutchie · 2011-08-22

We're getting off topic a little here , but....

I agree that code changes are not always required, but Index maintenance is a must! If you ignore your indexes you set yourself up for performance issues.

The best solution to performance issues that I have experienced so far is to NOT delete data, but ensure that your indexes are optimized to the business processes. So if your business process creates a lot of ILE records and often needs to build queries on the ILE, then you have to make sure that your queries use the best, most selective indexes. And this does sometimes require code changes. The index maintenance should always include Index rebuilds/defrags, fill levels and updated statistics. I've seen that this has way more effect than data deletion.

Marije_Brummel · 2011-08-22

Moving from 32 to 64 bit is definately a good idea, but you should analyse your entire performance.

There might be slow queries executed by the application code that need a new index, or the application has too many indexes.

NAV has a techology called SIFT that can cause serious issues if not configured properly, especially on large systems.

There are tons of information about these issues on this website. I suggest you go and check them out before implementing somthing that is not your biggest cause of slowness.

And if nothing else helps, and if you can find them, hire the...

kriki · 2011-08-30

And SIFT on 40SP3 has a lot of totalling in the SIFT-tables. Creating 1 ILE can trigger 100 updates of the different levels of the SIFT. So also this has to be checked.

BTW: upgrading to a newer version of NAV gets rid of those SIFT-levels and can also make the DB shrink a lot, but this is only a side-effect and will not help the basic performance problem.

There is so much that can be done it might be a good idea to hire professional NAV-performance-tuners.

cfeghali · 2012-01-11

Hello All and happy new year. It took us a while but here we are.

News update:
Upgrade from SQL 2005 32bit to SQL 2005 64bit on the same Server (Windows 2003 R2 64bit lastest SP and updates) successful

we had to reset some permissions in the database and some other minor issue. so besides the downtime it was a fairly clean process.

Unfortunately performance is exactly the same. and we're still having issues of performance degradation over time (we do add a lot of records on a daily basis)

The SIFTs issues seems to make sense specially for us since we do know we have 100s of millions of records and growing daily. I see some of you spoke of deleting data, this is definitely something we've seen helps drastically, but we've never touched the big tables like "item ledger" etc, the only places we're comfortable moving data away from (we don't really delete, think about it like an archive off main database) is in the tables were we run our own processes prior to feed them to navision ERP. Example of this is taking orders from a website, write to an order table in navision, then another process will convert this order into a navision invoice, create the customer, statement and so on before posting.

Another piece of information we got recently seems like our peaks are 2800 iops. our Database is 230GB in size and hosted on a RAID 10, 8 x 128GB SSD. So i think the bottle neck is not iops since these drives claim to handle 50Kiops each. maybe i am wrong.

Also, when we run backups the system slows down significantly, so much that is almost unusable and we've recently had issues with some automated processes that run late at night around the time that the backup is running.

We've also used a tool that analyze the database from a SQL point of view and it always comes back as Severe (lots of issues to resolve) some related to missing indexes, some with out of date statistics and indexes. Examples:
- high PAGEIOLATCH_XX waits
- too many deadlocks
- queue length average of 4.66
- The SET NOCOUNT ON connection option is not being used
- missing indexes
- fragmented indexes
- Significant non-SQL Server network load (maybe because of NAS and Data Director running on the same server)

the list goes on, and it gets more and more fun everyday

Cheers and thank you very much for your advice and recommendations.

Cesar

Alex_Chow · 2012-01-11

You might want to get one of those SQL performance guys to take a look at your database. There are quite of few of them on this forum.

kriki · 2012-01-11

A backup should not impact that much on performance.

Did you do an indexrebuild? 90% fillfactor is a good average.

Are you sure your disks are aligned correctly and have the correct clustersize? With W2003 that is not guaranteed.

How did you configure DB and log files? Are they on the same disks? They should be separated.

cfeghali · 2012-01-11

I agree but it is impacting performance.
the daily job is:
1- check integrity
2- backup
3- rebuild index
4- reorganize index
5- update statistics

not sure what my fillfactor is

I don't think the disks are aligned but the cluster size should be correct, i have:
Bytes per cluster: 65536

Not only both these are on the same drive but logs are restricted to 2GB (database is 230GB).
Recovery is set to: simple

I will look into separating the logs but my understanding was that it didn't matter since they both sit on a very fast array.

David_Singleton · 2012-01-11

230G is big, but not really what you would call Huge.

How many concurrent users do you have?

What are they doing?

cfeghali wrote:

Anyway to archive the item ledger table? it is huge, has hundreds of millions of records.

You mention at one point 100's of millions of Item Ledger entries, that would be a database base much bigger than 230gig. How many are there?

beware of SSDs. Sure they are not likely to slow the system down, but they are not the magic bullet that people think they are. Think of a 10 lane freeway vs a 4 lane. The 10 lane only has an advantage if you can actually get enough cars on and off it. If its just a few cars on the freeway, then even if you have 100 lanes your trip will still be limited to how fast your car can drive.

Also upgrading can help, but it can also cause more problem than it solves. What exact version are you on? By that I mean build number. Different builds use different cursors, so you could be doing clustered seeks becasue of Dynamic cursors. Worse you may be on a Fast forward version and an upgrade to a VSFIT product (5.00sp1+) would then put you on Dynamics cursors and that could just bring the whole system to a grinding halt.

Step one is to do a full analysis of the cause of the performance issues, step 2 is to fix them.

Savatage · 2012-01-11

Out of curiosity.. how many years of data do you have?

Posted Invoices
Posted Credits
Posted Purchase Invoices
Posted Shipments
Posted Receipts

Old doc's like these can be deleted .. the ledger tables hold the vital info.

at the end of every year I make & save a complete backup (incase it's ever needed) & then delete (clean up) the oldest year..in our case all posted docs from 2008 and older are cleaned out.

If your public then you probably need to save more years (Sarbanes–Oxley)

kriki · 2012-01-12

Were do you write your backups?
On the same disks? VERY BAD!!!!!! Slows down a lot and if your disks catch fire, you lose both DB and backups!
Directly on another server using the network? Slows down the network. You can do that if your 2 servers are connected using there own LAN-segment so it does not interfere with normal LAN-traffic.
SAN? Ok.

cfeghali wrote:

I agree but it is impacting performance.
the daily job is:
1- check integrity
2- backup
3- rebuild index
4- reorganize index
5- update statistics

not sure what my fillfactor is

If you do a rebuild index, you don't need a reorganize index nor update statistics anymore because rebuild index is better than reorganize index and it also updates statistics with 100% data sample!
Your update statistics probably does a lesser datasample.

fillfactor : if you use a maintenance plan, you can easily check the fillfactor. 90% to 95% is a good average.

BTW: it is better to change your job like this:
1-differential backup
2-check integrity
3-rebuild index
4-full backup

reasons:
-the diff backup takes a backup of all changes done after the last full backup. (even better would be using transactionlog backups)
-the check integrity is best before the rebuild index (but no need to check indexes because you will rebuild them after this step)
-full backup after rebuild index because a rebuild index changes most of the DB, making differential backup almost = to full backups

cfeghali wrote:

I don't think the disks are aligned but the cluster size should be correct, i have:
Bytes per cluster: 65536

Not only both these are on the same drive but logs are restricted to 2GB (database is 230GB).
Recovery is set to: simple

I will look into separating the logs but my understanding was that it didn't matter since they both sit on a very fast array.

FIRST RULE: you need AT LEAST 3 spindles-groups:
1) system+tempDB (RAID1)
2) DB (RAID10)
3) transaction log (RAID1, each logfile (if you have multiple logfiles per DB, they can be on the same volume) should have its own dedicated RAID1)

and if possible : best extra spindle for temp DB.
In your case: put your NAV-DB logfile(s) on 1 drive. On THAT drive must be ONLY THOSE FILES and nothing else. Logfiles are almost write-only and SEQUENTIAL! This is the reason that RAID1 generally is enough.
Putting the log and DB together, makes the writing random access killing write performance (and performance in general).

Check also fragmentation of the disks and the fragmentation inside the logfile (http://mibuso.com/blogs/kriki/2008/12/23/some-tips-tricks-for-the-transaction-log/).
I generally create a fixed size logfile (to be sure this logfile is STATIC with no external and internal fragmentation) and then an overflow logfile in case the primary logfile overflows.
This way the DB does not come to a crashing halt.

Clustersize is ok. But if your disks are not aligned, you lose 30%-40% of performance because any read of 64KB will actually read 2 x 64KB (=128KB)!!!!

Another topic: do you only have 1 full backup a day?
In that case: if your DB dies 1 minute before the backup finishes, you lose 1 day of work! Best is to implement transactionlog backups.

cfeghali · 2012-01-12

David Singleton wrote:

230G is big, but not really what you would call Huge.

How many concurrent users do you have?

What are they doing?

cfeghali wrote:

Anyway to archive the item ledger table? it is huge, has hundreds of millions of records.

You mention at one point 100's of millions of Item Ledger entries, that would be a database base much bigger than 230gig. How many are there?

beware of SSDs. Sure they are not likely to slow the system down, but they are not the magic bullet that people think they are. Think of a 10 lane freeway vs a 4 lane. The 10 lane only has an advantage if you can actually get enough cars on and off it. If its just a few cars on the freeway, then even if you have 100 lanes your trip will still be limited to how fast your car can drive.

Also upgrading can help, but it can also cause more problem than it solves. What exact version are you on? By that I mean build number. Different builds use different cursors, so you could be doing clustered seeks becasue of Dynamic cursors. Worse you may be on a Fast forward version and an upgrade to a VSFIT product (5.00sp1+) would then put you on Dynamics cursors and that could just bring the whole system to a grinding halt.

Step one is to do a full analysis of the cause of the performance issues, step 2 is to fix them.

I apologize for the wrong information, you are correct, I just checked the table and i have 11million+ records in item ledger and 30million+ record in G/L entry table.

As for the build, not sure where to find it. my navision version is NA 4.00 SP1 (4.0 SP3)

Any recommendation on where to start to determine the cause of the performance issues? and/or who could help me with this?

Savatage · 2012-01-12

Alex Chow wrote:

You might want to get one of those SQL performance guys to take a look at your database. There are quite of few of them on this forum.

:thumbsup:

cfeghali · 2012-01-12

kriki:

SAN 1 RAID 5 8xHDD stores the system files (SQL 2005 engine) and the swap file
SAN 2 RAID 5 20xHDD stores the Backup files
SAN 3 RAID 10 8xSSD stores database main file, Log File, master, temp etc database and log. and nothing more. you will only see mdf,ndf,ldf files on that array.

I have enough "spindel" groups in that sense. 3 different fiber channel SANs
only thing is that logs sit with database and are restricted on growth. but it seems like the SSD drives are handling that ok. at least so we think since it's not capped on iops.

thank you for all your backup recommendations, they are very helpful.

Clustersize is ok. But if your disks are not aligned, you lose 30%-40% of performance because any read of 64KB will actually read 2 x 64KB (=128KB)!!!!

Can you please point me in the right direction on how to align the drives on Windows 2003 R2 64bit.

regards,
Cesar

David_Singleton · 2012-01-12

cfeghali wrote:

As for the build, not sure where to find it. my navision version is NA 4.00 SP1 (4.0 SP3)

Try this page form my wiki.

http://wiki.dynamicsbook.com/index.php?title=Build_No

PS I sent a PM about the second question.

David_Singleton · 2012-01-12

kriki wrote:

FIRST RULE: you need AT LEAST 3 spindles-groups:
1) system+tempDB (RAID1)
2) DB (RAID10)
3) transaction log (RAID1, each logfile (if you have multiple logfiles per DB, they can be on the same volume) should have its own dedicated RAID1)

and if possible : best extra spindle for temp DB.
In your case: put your NAV-DB logfile(s) on 1 drive. On THAT drive must be ONLY THOSE FILES and nothing else. Logfiles are almost write-only and SEQUENTIAL! This is the reason that RAID1 generally is enough.
Putting the log and DB together, makes the writing random access killing write performance (and performance in general).

Check also fragmentation of the disks and the fragmentation inside the logfile (http://mibuso.com/blogs/kriki/2008/12/23/some-tips-tricks-for-the-transaction-log/).
I generally create a fixed size logfile (to be sure this logfile is STATIC with no external and internal fragmentation) and then an overflow logfile in case the primary logfile overflows.
This way the DB does not come to a crashing halt.

Clustersize is ok. But if your disks are not aligned, you lose 30%-40% of performance because any read of 64KB will actually read 2 x 64KB (=128KB)!!!!

Whilst I agree with this in the general sense, most of it (IMHO) is not relevant in this case. Most of these issues relate to the Latency and seek time of hard drives. IN an SSD environment both of these are at least an order of magnitude less than the fastest hard drives, so it really wont make that much difference. In an SSD environment the issue will be the controller and how fast you can get data to and from SQL and the drives.

Even alignment wont matter too much (definitely not 40% performance loss) becasue that will all be absorbed by the fast read speed, but of course it would be nicer to align them.

In this scenario though, it is not likely to gain a significant performance increase from these things.

Of course you need hard numbers and statistics measured in the live environment to be sure, but the only things I could see wrong with the drive configuration is if the SAN itself is not "SSD" aware or maybe an issue with the throughput on the fiber Chanel, both of which are unlikely.

NAV 4.0 SP3 Database is getting too big and slow

Comments

Categories