Job Queue Deadlocks

james_cs Member Posts: 39
Hello everyone.

I have an issue that is really confusing me.

Background: I am working on a site using V4 SP3 executables. I have implemented the job queue (without the GETLASTERRORTEXT / CLEARLASTERROR keywords) and have set up separate queues to segregate the work being done. I have two application servers reading the Job Queue Entry table, each looking for its own jobs. One queue inserts work for the other queue to do.

"Your activity was deadlocked with another user modifying the Job Queue Entry table.

Start again."

They are deadlocking each other. In a way this isn't too bad, because the services recover and carry on, but I would rather they did not deadlock each other. The reason I am confused is that I thought you had to have two tables involved to cause a deadlock, and that if I lock the table before reading it, one NAS will wait for the other to finish before starting.

The deadlock happens here (codeunit 448):
GetNextRequest(VAR JobQueueEntry : Record "Job Queue Entry") : Boolean
JobQueueEntry.LOCKTABLE;
JobQueueEntry.SETFILTER("Expiration Date/Time",'>%1|%2',CURRENTDATETIME,CREATEDATETIME(0D,0T));
JobQueueEntry.SETFILTER("Earliest Start Date/Time",'<=%1',CURRENTDATETIME);

IF JobQueueProcess."Queue ID"<>'' THEN  //JCS210612
  JobQueueEntry.SETFILTER("Queue ID",'%1|%2','',JobQueueProcess."Queue ID");

IF JobQueueProcess."Maximum Job Queue Priority" <> 0 THEN
  JobQueueEntry.SETRANGE(Priority,JobQueueProcess."Minimum Job Queue Priority",JobQueueProcess."Maximum Job Queue Priority")
ELSE
  IF JobQueueProcess."Minimum Job Queue Priority" <> 0 THEN
    JobQueueEntry.SETFILTER(Priority,'>=%1',JobQueueProcess."Minimum Job Queue Priority")
  ELSE
    JobQueueEntry.SETRANGE(Priority);
JobQueueEntry.SETRANGE(Status,JobQueueEntry.Status::Ready);
JobQueueEntry.SETCURRENTKEY(Priority);
Found := JobQueueEntry.FINDFIRST;

It is called from here:
HandleRequest()
JobQueueSetup.GET;
IF NOT JobQueueSetup."Job Queue Active" THEN
  EXIT;

NavTimer.Enabled := FALSE;
ThisSessionIsActive := UpdateJobQueueSession(JobQueueEntry,FALSE);
COMMIT;
SELECTLATESTVERSION; //JCS110912
CURRENTTRANSACTIONTYPE(TRANSACTIONTYPE::Update);
LOCKTIMEOUT(FALSE);

IF ThisSessionIsActive THEN
  MoreRequests := GetNextRequest(JobQueueEntry);  // locks table

As you can see, I have attempted to resolve the issue by adding a table lock before the read and removing the lock timeout. I'm still getting deadlocks reported in the application log.

I have thought about having a separate job queue table for the other application server to read but I don't properly understand what is going on. Can anyone shed any light on this for me?

Thanks in advance :D

Comments

  • ara3n Member Posts: 9,256
    Try inserting gaps (records) between the records. Older NAV clients lock not just the record but sibling records as well.
    Ahmed Rashed Amini
    Independent Consultant/Developer
    blog: https://dynamicsuser.net/nav/b/ara3n
  • james_cs Member Posts: 39
    Thanks for replying!

    Nice idea if I could, but the primary key of the Job Queue table is a GUID.

    Would an upgrade to V6 executables help, do you think? This has been on the cards for some time.
  • David_Singleton Member Posts: 5,479
    james_cs wrote:
    Would an upgrade to V6 executables help, do you think? This has been on the cards for some time.

    It helps but doesn't completely resolve the issue.
    David Singleton
  • David_Singleton Member Posts: 5,479
    By the way, I don't have time to look at the code in detail, but you may be able to solve this by creating a token in a new table and tracking that instead of locking.
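    One way to sketch David's token idea in C/AL (everything here is hypothetical: the "Job Queue Token" table, its Code and Owner fields, and this function are not part of the standard application):

    ```
    // Hypothetical one-record table "Job Queue Token":
    //   Code : Code[10] (primary key), Owner : Text[30]
    // Each NAS claims the token before reading the Job Queue Entry table,
    // so the two services never scan the queue at the same time.
    TryClaimToken(SessionName : Text[30]) : Boolean
      Token.LOCKTABLE;               // serialize on the tiny token table only
      IF NOT Token.GET('QUEUE') THEN BEGIN
        Token.INIT;
        Token.Code := 'QUEUE';
        Token.INSERT;
      END;
      IF Token.Owner <> '' THEN
        EXIT(FALSE);                 // the other NAS holds the token; try later
      Token.Owner := SessionName;
      Token.MODIFY;
      COMMIT;                        // releases the lock but keeps the claim
      EXIT(TRUE);
    ```

    The point is that all contention moves to a single-record table with a trivial key, where range locks on the queue's indexes cannot come into play; the claimer clears Owner (and COMMITs) when it finishes.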
  • ara3n Member Posts: 9,256
    You can always insert multiple new records and set up only every other one.
  • james_cs Member Posts: 39
    Hi David and Rashed.

    Thanks very much for your replies. I think before I start inserting extra records into the table I would look at splitting the table per job queue.

    I am still very confused as to how a deadlock can be caused by the first read of a transaction. If my understanding is correct, the transaction starts with the COMMIT in HandleRequest. The program then calls LOCKTABLE to ensure that the read will see only committed data and will be issued WITHLOCK (?), thus locking not the whole table but the record read and a couple of records around it. How can this read cause a deadlock? Why doesn't the program simply wait until the record has been released? I would be quite happy with a "The xxx table is locked by another user" message, but removing the lock timeout should prevent that one anyway.

    I know the deadlock is on that FINDFIRST because I have debugged it.

    I probably won't do anything about this because it isn't causing an operational issue but I would love to understand why it is happening.

    If anyone has the time I would be very grateful for any ideas.

    thanks
  • David_Singleton Member Posts: 5,479
    james_cs wrote:
    ...but removing the locktimeout should prevent that one anyway.
    Really don't do that; it is a global setting and will have effects reaching far beyond this issue.
    james_cs wrote:
    I know the deadlock is on that FINDFIRST because I have debugged it.
    Bad assumption there. It is very difficult to debug blocks, and virtually impossible to debug deadlocks, because the debugger works differently - cache, bulk inserts etc. act differently - so don't rely on the assumption that where the debugger stops is where your issue is.


    Creating multiple tables for each queue sounds very much like a last resort. Step one is to follow Rashed's advice on inserting gaps. If you can't resolve it like that, then work on a semaphore system with a token created in a new table.
  • mdPartnerNL Member Posts: 802
    So you have two NAS running and every NAS logs in with a different windows account?
  • james_cs Member Posts: 39
    Hi David.

    I'm very interested by your reply.

    By "removing the lock timeout" I was not referring to the setting in File - Database - Alter, but to using the LOCKTIMEOUT(FALSE); command. As I understand it, this only applies to the current transaction. Is this correct?

    I'm also interested when you say that the debugger may not be telling me the whole story. I think V4 SP3 does not use bulk inserts - the commands are a little more literal than that. Bulk insert came in in V5 SP1, I think. But I am still very confused as to how the first database read of a transaction can cause a deadlock to be reported, and I have to conclude that there is much I don't understand about SQL and deadlocking. (I know that anyway - it is a known unknown, as Rumsfeld would have it.)

    I think I will leave it and see what effect using V6 executables has. If there is still a problem I will bite the bullet and start learning how to use the SQL Profiler to troubleshoot this.

    To answer the last question - I actually have three application servers running a job queue. The third one runs larger repetitive trawling jobs and causes no problems. They all log in with the same Windows account, and I segregate the work by passing a different startup parameter into CU1, specifying that each should only pick up job queue entries with its "Queue ID" (which I store in the Job Queue Processes table). One of them (COMMS) handles communication to the website and the other (POST) carries out system postings and creation of warehouse shipments. The system is working very nicely apart from all the deadlocks being reported. COMMS and POST run on the same server. The BATCH NAS runs on a different server.

    There are also two listening NASs on this site and an LS Retail scheduler - very busy all in all :lol:
  • rdebath Member Posts: 383
    I've had a look at the V5 code and I think you're getting the deadlock because of read locks (for update) set by the search. Depending on the various filters, the SQL engine can be very erratic about which of the keys it chooses to search down - especially SQL 2005 with its love of the cluster ... index.

    As there's a COMMIT a few moments later I suggest a great big lock ... perhaps ...
    JobQueueEntry.LOCKTABLE;
    JobQueueEntry.RESET;
    JobQueueEntry.FINDFIRST;
    JobQueueEntry.SETFILTER("Expiration Date/Time",'>%1|%2',CURRENTDATETIME,CREATEDATETIME(0D,0T));
    

    OTOH, if I'm right, turning off MaintainSQLIndex for the second index on "Job Queue Entry" should (probably) fix the problem too.
  • james_cs Member Posts: 39
    Hi Robert.

    I have been an avid follower of your posts and am very happy that you have had the time to look at my problem.

    In fact, I have already taken off the second index in an attempt to resolve this. I like the idea of locking the entire table before selecting the entry we want to process. I'll have a go with this in a quiet moment on Tuesday morning and see what happens. I'll let you know.

    Many thanks

    James
  • james_cs Member Posts: 39
    Hi All.

    I'm sorry to say that I have not managed to move this forward.

    Robert - I've put in an IF FIND('-') THEN; after the LOCKTABLE and before setting all the filters. It has made no difference.

    I am still confused by this basic question: if, since a commit, a process has referenced only one table (and that table has only one index - the primary key), how can the database report a deadlock?

    We have not yet upgraded to V6 executables - I don't see this happening before Christmas.

    Any ideas anyone has will be gratefully received.
  • rmv_RU Member Posts: 119
    It's strongly recommended to use the most selective key together with LOCKTABLE.
    Compare:
    GLEntry.locktable;
    GLEntry.setrange("G/L Account No.",'My Account');
    GLEntry.setrange("Posting Date",today);
    GLEntry.find('-');
    
    with
    GLEntry.locktable;
    GLEntry.setcurrentkey("G/L Account No.", "Posting Date");
    GLEntry.setrange("G/L Account No.",'My Account');
    GLEntry.setrange("Posting Date",today);
    GLEntry.find('-');
    
    Use SQL Server Management Studio and sp_lock to trace locked records.


    In this case, you should add JobQueueEntry.setcurrentkey(Priority, "Expiration Date/Time", "Earliest Start Date/Time"); before the FINDFIRST.
    Looking for part-time work.
    Nav, T-SQL.
  • james_cs Member Posts: 39
    I think, if you look carefully, it does
    JobQueueEntry.SETCURRENTKEY(Priority);
    Found := JobQueueEntry.FINDFIRST;
    

    No ideas on how we can have a deadlock on a single record?
  • rmv_RU Member Posts: 119
    james_cs wrote:
    I think, if you look carefully, it does
    JobQueueEntry.SETCURRENTKEY(Priority);
    Found := JobQueueEntry.FINDFIRST;
    
    Yep, but it's probably not the best key in this case.
    james_cs wrote:
    No ideas on how we can have a deadlock on a single record?
    I think you have a lot of locked records.
  • rdebath Member Posts: 383
    rmv_RU wrote:
    I think you have lot of locked records.
    Yup. At this point we're guessing; we need at least a little data. Turning on the deadlock trace could help ...
    -- SQL 2000
    DBCC TRACEON (1204, -1)
    
    -- SQL 2005+
    DBCC TRACEON (1222, -1)
    

    Otherwise you're looking into the DMVs (Dynamic Management Views), and that's never fun.
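    Once the trace flag is on, the deadlock output lands in the SQL Server error log. One way to pull it out (sp_readerrorlog is undocumented but widely used; the parameters shown are log number, log type 1 = SQL error log, and a search string):

    ```sql
    -- Read the current error log (0) and show only lines mentioning deadlocks.
    EXEC sp_readerrorlog 0, 1, 'deadlock';
    ```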
  • james_cs Member Posts: 39
    Thanks Robert.

    I have been trying to avoid this but I think it is time. I'll let you know what I find.
  • Gil Member Posts: 49
    You might want to set your database property to "Always rowlock"; it will reduce your deadlocks in exchange for a bit of performance.
  • james_csjames_cs Member Posts: 39
    Finally managed to look at this with someone who really knows how to use Profiler.

    It seems that we had one record being inserted and two simultaneous reads which the database driver had issued with the UPDLOCK locking hint. In that situation SQL Server decided that the second read had to be kicked out. This seems to be the way the SQL database driver behaves in this version (V4 SP3 / SQL 2005).
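    For anyone who lands here later, the statement shape involved looks roughly like this (the table and column names are illustrative, not the exact SQL that NDBCS generates). Each NAS issues a range read with the UPDLOCK hint; with an insert in flight and two such reads touching the same keys, SQL Server resolves the cycle by killing one session as the deadlock victim:

    ```sql
    -- Roughly what a C/AL LOCKTABLE + FINDFIRST becomes on SQL Server:
    -- a TOP 1 range read that takes update locks on the keys it touches.
    -- Two sessions running this while a third inserts a row can each end
    -- up holding a key lock the other needs - error 1205, a deadlock.
    SELECT TOP 1 *
    FROM [Job Queue Entry] WITH (UPDLOCK)
    WHERE [Status] = 0
      AND [Earliest Start Date_Time] <= GETDATE()
    ORDER BY [Priority];
    ```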