Hello everyone.
I have an issue that is really confusing me.
Background: I am working on a site using V4 SP3 executables. I have implemented the job queue (without the GETLASTERRORTEXT / CLEARLASTERROR keywords) and have implemented separate queues to segregate the work being done. I have two application servers reading the job queue table, each looking for its own jobs. One queue will insert work for the other queue to do.
"Your activity was deadlocked with another user modifying the Job Queue Entry table.
Start again."
They are deadlocking each other. In a way this isn't too bad, because the services recover and carry on, but I would rather they did not deadlock each other. The reason I am confused is that I thought you had to have two tables involved to cause a deadlock, and that if I lock the table before reading it, one NAS will simply wait for the other to finish before starting.
The deadlock happens here (Codeunit 448):
GetNextRequest(VAR JobQueueEntry : Record "Job Queue Entry") : Boolean
  JobQueueEntry.LOCKTABLE;  // subsequent reads in this transaction are issued with locks
  JobQueueEntry.SETFILTER("Expiration Date/Time",'>%1|%2',CURRENTDATETIME,CREATEDATETIME(0D,0T));
  JobQueueEntry.SETFILTER("Earliest Start Date/Time",'<=%1',CURRENTDATETIME);
  IF JobQueueProcess."Queue ID" <> '' THEN //JCS210612
    JobQueueEntry.SETFILTER("Queue ID",'%1|%2','',JobQueueProcess."Queue ID");
  IF JobQueueProcess."Maximum Job Queue Priority" <> 0 THEN
    JobQueueEntry.SETRANGE(Priority,JobQueueProcess."Minimum Job Queue Priority",JobQueueProcess."Maximum Job Queue Priority")
  ELSE
    IF JobQueueProcess."Minimum Job Queue Priority" <> 0 THEN
      JobQueueEntry.SETFILTER(Priority,'>=%1',JobQueueProcess."Minimum Job Queue Priority")  // lower bound only; no maximum set
    ELSE
      JobQueueEntry.SETRANGE(Priority);
  JobQueueEntry.SETRANGE(Status,JobQueueEntry.Status::Ready);
  JobQueueEntry.SETCURRENTKEY(Priority);
  Found := JobQueueEntry.FINDFIRST;  // the deadlock is reported on this read
Called from here:
HandleRequest()
  JobQueueSetup.GET;
  IF NOT JobQueueSetup."Job Queue Active" THEN
    EXIT;
  NavTimer.Enabled := FALSE;
  ThisSessionIsActive := UpdateJobQueueSession(JobQueueEntry,FALSE);
  COMMIT;  // ends the previous transaction; the next database call opens a new one
  SELECTLATESTVERSION; //JCS110912
  CURRENTTRANSACTIONTYPE(TRANSACTIONTYPE::Update);
  LOCKTIMEOUT(FALSE);  // wait indefinitely for locks rather than raising a timeout error
  IF ThisSessionIsActive THEN
    MoreRequests := GetNextRequest(JobQueueEntry); // locks table
As you can see, I have attempted to resolve the issue by adding a table lock before the read and removing the lock timeout, but I'm still getting deadlocks reported in the application log.
I have thought about having a separate job queue table for the other application server to read but I don't properly understand what is going on. Can anyone shed any light on this for me?
Thanks in advance
Comments
The usual trick is to insert gap records between the job queue entries so that the two services are not reading and locking adjacent records.
Independent Consultant/Developer
blog: https://dynamicsuser.net/nav/b/ara3n
Nice if I could, but the primary key of the job queue table is a GUID.
Would an upgrade to V6 executables help, do you think? This has been on the cards for some time.
It helps but doesn't completely resolve the issue.
Independent Consultant/Developer
blog: https://dynamicsuser.net/nav/b/ara3n
Thanks very much for your replies. I think before I start inserting extra records into the table, I would look at splitting the table per job queue.
I am still very confused as to how a deadlock can be caused by the first read of a transaction. If my understanding is correct, the transaction starts with the COMMIT in HandleRequest. The program then calls LOCKTABLE to ensure that the database read will see only committed data and that the read will be WITHLOCK (?), thus locking not the table but the record read and a couple of records around it. How can this read cause a deadlock? Why doesn't the program simply wait until the record has been released? I would be quite happy with a "The xxx table is locked by another user" message, but removing the lock timeout should prevent that one anyway.
I know the deadlock is on that FINDFIRST because I have debugged it.
I probably won't do anything about this because it isn't causing an operational issue but I would love to understand why it is happening.
If anyone has the time I would be very grateful for any ideas.
Thanks
Bad assumption there. It is very difficult to debug blocks, and virtually impossible to debug deadlocks, because the debugger works differently (cache, bulk insert etc. act differently), so don't rely on the assumption that where the debugger stops is where your issue is.
Creating multiple tables, one for each queue, sounds very much like a last resort. Step one is to follow Rashed's advice on inserting gaps. If you can't resolve it like that, then work on a semaphore system with a token created in a new table, something along the lines of the sketch below.
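A rough sketch of the token idea; the table and names are invented for illustration, assuming a new one-record table "Job Queue Semaphore" with a Code field ID as its primary key:

GrabSemaphore()
  JobQueueSemaphore.LOCKTABLE;  // the GET below is now a locked read
  IF NOT JobQueueSemaphore.GET('TOKEN') THEN BEGIN
    JobQueueSemaphore.ID := 'TOKEN';
    JobQueueSemaphore.INSERT;
  END;
  // this session holds the token record until its next COMMIT, so the other
  // NAS blocks on this one tiny table instead of colliding inside Job Queue Entry

Call it at the start of GetNextRequest; because both services must take the token first, only one of them at a time ever scans the Job Queue Entry table.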
I'm very interested by your reply.
by "removing the Lock Time Out I was not referring to the setting in File - Database - alter but using the LOCKTIMEOUT(FALSE); command. As I understand it this will only apply to the current transaction. Is this correct?
I'm also interested in what you say about the debugger possibly not telling me the whole story. I think V4 SP3 does not use bulk inserts; the commands are a little more literal than that. Bulk inserts came in with V5 SP1, I think. But I am still very confused as to how the first database read of a transaction can cause a deadlock to be reported, and I have to conclude that there is much I don't understand about SQL and deadlocking. (I know that anyway; it is a known unknown, as Rumsfeld would have it.)
I think I will leave it and see what effect using V6 executables has. If there is still a problem, I will bite the bullet and start learning how to use the SQL Profiler to troubleshoot this.
To answer the last question: I actually have three application servers running a job queue. The third one runs larger repetitive trawling jobs and causes no problems. They all log in on the same Windows account, and I segregate the work by passing a different startup parameter into CU1 specifying that each should only pick up job queue entries with its own "Queue ID" (which I store in the job queue processes table). One of them (COMMS) handles communication to the website and the other (POST) carries out system postings and creation of warehouse shipments. The system is working very nicely apart from all the deadlocks being reported. COMMS and POST run on the same server; the BATCH NAS runs on a different server.
There are also two listening NASs on this site and an LS Retail scheduler - very busy all in all.
As there's a COMMIT a few moments later, I suggest a great big lock... perhaps something like the sketch below, at the top of GetNextRequest before any filters are set.
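An untested sketch of what I mean:

JobQueueEntry.LOCKTABLE;
IF JobQueueEntry.FIND('-') THEN;  // unfiltered locked read; effectively locks the whole table until the next COMMIT
// ...then apply the filters and FINDFIRST as before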
OTOH, if I'm right, turning off MaintainSQLIndex for the second index on "Job Queue Entry" should (probably) fix the problem too.
TVision Technology Ltd
I have been an avid follower of your posts and am very happy that you have had the time to look at my problem.
In fact, I have already taken off the second index in an attempt to resolve this. I like the idea of locking the entire table before selecting the entry we want to process. I'll have a go with this in a quiet moment on Tuesday morning and see what happens. I'll let you know.
Many thanks
James
I'm sorry to say that I have not managed to move this forward.
Robert - I've put in an IF FIND('-') THEN; after the LOCKTABLE and before setting any of the filters. It has made no difference.
I am still confused on this basic question: if, since a commit, a process has referenced only one table (and that table has only one index, the primary key), how can the database report a deadlock?
We have not yet upgraded to V6 executables - I don't see this happening before Christmas.
Any ideas anyone has will be gratefully received.
Use MSSMQ and sp_lock to trace locked records.
In this case, you should add JobQueueEntry.SETCURRENTKEY(Priority,"Expiration Date/Time","Earliest Start Date/Time"); before the FINDFIRST, as in the sketch below.
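That is, in GetNextRequest (a sketch; this assumes you add a matching key to the table):

JobQueueEntry.SETCURRENTKEY(Priority,"Expiration Date/Time","Earliest Start Date/Time");  // key matches the filters
Found := JobQueueEntry.FINDFIRST;

The idea is that with a key covering the filters, the locked read does not have to walk (and lock) rows that the filters will reject.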
Nav, T-SQL.
No ideas on how we can have a deadlock on a single record?
I think you have a lot of locked records.
Nav, T-SQL.
Otherwise you're looking into the DMVs (Dynamic Management Views), and that's never fun.
TVision Technology Ltd
I have been trying to avoid this, but I think it is time. I'll let you know what I find.
It seems that we had one record being inserted and two simultaneous reads, which the database driver had issued with the locking hint UPDLOCK. SQL Server decided that the second read had to be kicked out. This seems to be the way the SQL database driver behaves in this version (V4 SP3 / SQL 2005).
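To illustrate, here is my understanding of the two sides of the collision, sketched with an invented record variable for the writer (the lock behaviour in the comments is my reading of the trace, not documented fact):

// Session A (a reader) - after LOCKTABLE the scan is issued WITH (UPDLOCK):
JobQueueEntry.LOCKTABLE;
JobQueueEntry.SETCURRENTKEY(Priority);
JobQueueEntry.SETRANGE(Status,JobQueueEntry.Status::Ready);
IF JobQueueEntry.FINDFIRST THEN;  // takes update locks on the keys it touches while scanning

// Session B (the writer) - the other queue inserting work at the same moment:
NewJobQueueEntry.INIT;
NewJobQueueEntry.ID := CREATEGUID;  // lock on the clustered GUID key first...
NewJobQueueEntry.INSERT(TRUE);      // ...then on the secondary keys the readers are scanning

With two readers holding and waiting on update locks around the same keys while the insert holds its exclusive locks, SQL Server sees a cycle and kicks one of the readers out as the deadlock victim.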