Why would NAVIGATE cause a block/deadlock

rsaritzky · 2009-09-25

Hi Everyone,

Like many of us, we're starting to have record blocking issues, particularly during posting, e.g. "The G/L Entry table is locked by another user...".

However, my users are now getting this same message when they click "Navigate" on a posted item. e.g. go to particular vendor ledger entry and click "Navigate".

Why would an inquiry process like Navigate put any locks on any records at all?

ara3n · 2009-09-25

Have you done any modification that has FINDSET used in incorrect way?

rsaritzky · 2009-09-25

ara3n wrote:

Have you done any modification that has FINDSET used in incorrect way?

The only change to the Navigate form is to add one more table into the FINDRECORDS function. "NI Posted Invoice Header" is one of our custom tables.

IF xNIH.READPERMISSION THEN BEGIN
xNIH.RESET;
xNIH.SETCURRENTKEY("Invoice No.");
xNIH.SETFILTER("Invoice No.",DocNoFilter);
InsertIntoDocEntry(
DATABASE::"NI Posted Invoice Header",0,'NewInvoice Posted Invoice',xNIH.COUNT);
END;

We're on the 5.0SP1 NAV client using a 3.60 database, SQL 2005.

ara3n · 2009-09-25

I tested 3.6 db with 5 sp1 executables and 3.6 Navigate does not issue a lock and does not block while the GL entry is being locked by another client.

This was tested on exe build no 26084.

Marije_Brummel · 2009-09-27

It might be another block than a index block. You can also have CPU and PAGEIO blocks.

Check if Navigate is causing a disk queue and if paralelism is disabled.

rsaritzky · 2009-09-28

Mark Brummel wrote:

It might be another block than a index block. You can also have CPU and PAGEIO blocks.

Check if Navigate is causing a disk queue and if paralelism is disabled.

Thanks for the idea. 95% of our blocks are coming during posting (we have 6 or 7 people posting payables items all day long). I will look at other performance indicators on the server, but on the surface, that does not seem to be problem. Regular perfmon measures show that the server's CPU utilization is low (average around 25%) and the disk, although a single drive, is a 15KRPM drive.

Has anyone out there run into G/L Entry block issues having a G/L entry table of more than 5 million records?

Ron

rsaritzky · 2009-09-28

Hi guys,

My IT manager just sprung one on me. Turns out this SQL Server "box" is a virtual server. Any new ideas spring from this information?

Ron

ara3n · 2009-09-29

Virtual server have bad performances because of overhead.

rsaritzky · 2009-09-29

ara3n wrote:

Virtual server have bad performances because of overhead.

I agree this can sometimes be true, but NAV performs very well in general and has for a long time. The problem started 2 weeks ago when I added a flowfield and sumindex field to Job Ledger Entry. But I've removed those fields and Optimized both Job and General Ledger Entry to "undo" the effects of that change.

I am however going to check out the hardware environment - perhaps something in the disk channel has changed. Thanks for the opinions - it helps me think.

Ron

Marije_Brummel · 2009-09-29

Virtualisation can work perfectly, we have a lot of very large implementations running on VmWare with perfect performance.

It brings another level of complexity to the table which was complex anyway.

The biggest concern with virtualisation is to have enough physical resources and to avoid sharing physical disks with other virtual machines. This however is always a recomendation on SAN's that host multiple luns for multiple machines. The only exeption is the HP eva that does it's own striping (perfectly).

I am almost sure that your problem is in the HW somewhere, virtual or physical.

rsaritzky · 2009-10-01

More troubleshooting information:

These bad blocking situations seem to last 20 - 40 minutes, then everything is OK for awhile. No appreciable change in user activity when it resolves - everything just starts working OK.

The other virtual server on this VMWare box shows consistant CPU and disk activity - no peaks, relatively low utilization (my NAV box gets most of the hardware resources).

When the blocks start, we are observing a brief (15 seconds) spike in CPU utilization, then 20-40 minutes of very high disk activity - all "reads". Task Manager shows SQLSRVR.EXE at very low CPU utilization - 3-5%. Then the disk activity stops and everything is OK.

Are these reads all caused by the application trying to lock the GL entry table, i.e. a huge volume of re-reads?

Ron

rsaritzky · 2009-10-01

More troubleshooting info.

The problem just happened again. There were 5 people all posting purchase orders and 1 person posting a 30-line sales invoice.

All the Purchase-posting people got the message "The G/L Entry table is locked by another user." The person posting the Sales Invoice did not get the lock message, but the "progress window" that shows the posting progress was not moving - this was for 7-8 minutes or so.

During this time, high disk activity was noticed on the server - all by sqlserver.exe - I'm assuming it was trying to read/lock, read/lock, read/lock, etc.

I had one of the purchasers click "OK" on the lock message. No effect
I had the second of the purchasers click "OK". No effect
When the third purchaser (out of 5) clicked OK, disk activity dropped drastically and the sales invoice began normal progress through the posting process.

All the purchasers were able to then post. Some of the postings took 15 seconds or so, but no one got the locked message.

Marije_Brummel · 2009-10-02

Ron,

The last reported issue has nothing to do with Navigate.

NAV isolates all posting transactions on the last record in the G/L table. This is standard out of the box behaviour which is almost impossible to solve.

Best way to avoid it is to speed up the posting process. A sales invoice with 30 lines should only take milliseconds to post.

The high HW pressure leads me to believe that you have either a virtual or phisical issue on your box that needs to be solved first, but also look at the application since that might be the second place to look. (Usualy the first).

rsaritzky · 2009-10-02

Yes, I know I started with what I thought was some relationship to NAVIGATE, and I wasn't clear that the "real" problem was this blocking/deadlock issue between people who were trying to post at the same time. But with some observation and some digging into the application (as you suggest, usually the FIRST place to look, under the the assumption that the hardware is basically OK), I found what I believe to be the culprit (pending confirmation during production today)...

A previous NAV consultant had inserted 2 flowfields into the "Document Dimension" table, linking each document to the Purchase Invoice and Sales Invoice header tables.

As you know, the posting routines do a lot of copying of document dimension records into "Temporary" copies of this table. I can just imagine what the NAV kernel was doing every time this happened - calculating flowfield values for the 350,000+ records in the Document Dimension table each time a record was inserted during the posting routine. As lots of us know, the "emulation" of SIFT/flowfield technology in SQL is not nearly as elegant - or efficient - as in C/SIDE.

I posted a sales invoice with 36 lines with the flowfields still in the table - 15 minutes.

I then removed the flowfields, optimized the document dimension table and posted a similar-sized invoice in about 3 seconds.

We'll see for sure today during live use, but thank you to everyone who made suggestions - it helped me "navigate" through the troubleshooting process. And I hope to mark this thread "SOLVED" later today....

David_Singleton · 2009-10-05

ara3n wrote:

Virtual server have bad performances because of overhead.

We have a client with a hugely visualized system running on IBM SANs. Mark did the tuning on this, and showed that Navision can fly in a visualized environment if done properly.

No hardware changes were made, just software and configuration changes recommended by Mark.

The key here is getting in an expert (which is what we did) to get the job done properly. In this case the client has MULTIPLE Terra bytes running on a big RAID 5 array (it was not possible to replace the hardware), and without Mark's help there is no way this system would never have worked, irrelevant of how much hardware was thrown at it. It was proven to me that its not what hardware you have, its how you use it.

Marije_Brummel · 2009-10-05

:oops:

Please David, no more feathers....

Thanks.

ara3n · 2009-10-05

David Singleton wrote:

No hardware changes were made, just software and configuration changes recommended by Mark.

What were the software and configuration changes. Could you share them. Thanks.

Marije_Brummel · 2009-10-06

The usual stuff. :P

Maintenance plan, index and SIFT changes, Code review,client upgrade etc.

I don't think it is fair to share customer details on public forums.

David_Singleton · 2009-10-06

ara3n wrote:

David Singleton wrote:

No hardware changes were made, just software and configuration changes recommended by Mark.

What were the software and configuration changes. Could you share them. Thanks.

No sorry; as Mark says that is customer confidential. But if you have a specific customer with visualization issues, I am sure Mark can help you.

ara3n · 2009-10-06

Mark Brummel wrote:

The usual stuff. :P

Maintenance plan, index and SIFT changes, Code review,client upgrade etc.

I don't think it is fair to share customer details on public forums.

Which executable version did you put them on?

Marije_Brummel · 2009-10-07

That information would be obsolete now, since versions are comming and going.

Check Waldo's version page on www.waldo.be for the latest version.

Howdy, Stranger!

Quick Links

Categories

Why would NAVIGATE cause a block/deadlock

Comments