SQL Wait ASYNC_NETWORK_IO on NAV Production

sarma · 2017-09-28

Hi All,

While we are looking into NAV SQL Performance issue, we found that one of the top wait is "ASYNC_NETWORK_IO".As per my understading the NAV App server is not consuming the data generated by DB quickly.

Our Prod Setup is:

-2 Terminal services for NAV client, and they are load balanced.
-2 App Servers are load balanced
-1 DB server in AG group with Sync

Can someone help me how to trouble shoot this issue with NAV client? or some one come across this type of issue before.When i asked my Infrastructure team, they told me that every thing is fine, and nothing they suspect from NAV App server or Terminal server.

Is there any setting with in NAV defines Client memory size etc.? or is it down to NAV coding etc.?

Any help would be appreciate.

Regards
Sarma

Slawek_Guzek · 2017-09-28

Hi

"Everything's fine" may simply mean everything works, not necessarily "everything works at fastest possible pace"

Since you have an infrastructure team I guess you have quite decent and expensive hardware around, including all network gear. Yet it still may be worth to check a few things. If you google around for "TCP/IP offloading SQL Server problem" you will find a good few articles about how to not to setup SQL server's network cards to avoid SQL Server performance problems.

Massive ASYNC_NETWORK_IO does not always mean problems. The real meaning of it is that "the NAV App server is not consuming the data generated by DB quickly enough.". That's it. It does not have to consume the data slowly, it just means that the SQL Server is quicker delivering it. Which is good in fact, as long as users are not complaining about the slow NAV.

If they are, then the sad truth is that it is almost always down to NAV coding - investing time in NAV code and table structure/indexing optimisations usually give the best pay-offs.

Memory-wise as far as I know you can only play with the cache size settings on the NST server, I am not aware of any RTC client side settings. They would not make much sense or impact anyway as the data flows between the NST and SQL server, the RTC clients are involved in data processing.

Regards,
Slawek

David_Singleton · 2017-10-11

sarma wrote: »

Hi All,

While we are looking into NAV SQL Performance issue, we found that one of the top wait is "ASYNC_NETWORK_IO".As per my understading the NAV App server is not consuming the data generated by DB quickly.
...

Had the same problem in NAV 2013.

How many CPU cores do you have on your SQL server? And what is you MAXDOP? if you have MAXDOP set to 4 or 8 then setting it to 1 may give you a quick fix whilst you sort out the root cause. But this will most likely slow down the system, but at least it wont lock out new users logins.

Ultimately take a look at your worker threads, as this is the most likely cause. But don't set them too high if you don't have enough CPU cores.

David_Singleton · 2017-10-12

Slawek_Guzek wrote: »

...
Massive ASYNC_NETWORK_IO does not always mean problems. The real meaning of it is that "the NAV App server is not consuming the data generated by DB quickly enough.". That's it. It does not have to consume the data slowly, it just means that the SQL Server is quicker delivering it. Which is good in fact, as long as users are not complaining about the slow NAV. ...

Noit always a problem but if it reaches the limit, then the server will refuse new connections and they will have to restart the service tier, which is never a good thing.

Ideally you want your Middle tier matched to the work load and the SQL server so that everything runs smoothy, and it is best to address these issues before the whole system grinds to a halt.

Slawek_Guzek · 2017-10-12

Are you sure that the ASYNC_NETWORK_IO wait can force the server to refuse connections? Personally I can't see why it would or could cause that. And how dropping the MAXDOP could help.

David_Singleton · 2017-10-12

Slawek_Guzek wrote: »

Are you sure that the ASYNC_NETWORK_IO wait can force the server to refuse connections? Personally I can't see why it would or could cause that. And how dropping the MAXDOP could help.

How many CPU cores do you have? And how many worker threads?

Slawek_Guzek · 2017-10-12

@sarma didn't say but I guess minimum sensible configuration is 4 cores, as you can't bus SQL Server license for less than 4 cores.

Default worker threads for 4 or less CPU core is 512

And?

David_Singleton · 2017-10-13

sarma wrote: »

Any help would be appreciate.

Need more info please.

David_Singleton · 2017-10-13

Slawek_Guzek wrote: »

Default worker threads for 4 or less CPU core is 512

Are you getting the same error as Sarma?

Slawek_Guzek · 2017-10-13

David_Singleton wrote: »

Are you getting the same error as Sarma?

Have you read his post at all?

He is not getting errors. He is complaining about high level of ASYNC_NETWORK_IO waits, that's it.

sarma · 2017-10-19

Hi Guys,

We have server in AWS environment,

The DB server has 1 CPU and 16 cores.
The SQL Memory 100GB
The MAX_WORKING_THREAD is default one i.e. 0(Internally SQL calculating 700+ threads)
The MAXDOP value 1
The Always on group with FULL Sync mode.

We are getting this issue intermittent and one of the NAV session consumes 400+ threads(even after MAXDOP to 1).We are unable to identify root cause.Can some one helps me that would be great.

Regards

Slawek_Guzek · 2017-10-19

You need to define exactly what the issuse is.

High level of ASYNC_NETWORK_IO waits is not an issue on it own. It is just a measure of SQL Server waiting for something.

EDIT

The NST uses connection pooling and MARS so I guess it is possible that those 400 threads on one NST session is caused by NST handling many user requests through that connection

Slawek

David_Singleton · 2017-10-19

sarma wrote: »

Hi Guys,

We have server in AWS environment,

The DB server has 1 CPU and 16 cores.
The SQL Memory 100GB
The MAX_WORKING_THREAD is default one i.e. 0(Internally SQL calculating 700+ threads)
The MAXDOP value 1
The Always on group with FULL Sync mode.

We are getting this issue intermittent and one of the NAV session consumes 400+ threads(even after MAXDOP to 1).We are unable to identify root cause.Can some one helps me that would be great.

Regards

I would slowly increase MAXDOP, try 2, 4 then 8 stop at 8. At the same time you would increase the worker threads, Take regular recordings under similar conditions and get it in balance.

FYI 400+ threads sounds pretty normal for using one core.

This of course all depends on the number of users and the complexity of the queries. For example a page with a lot of cues and flow fields (Especially if the flow fields don't have VSIFTS) can use a huge number of worker threads. I have seen a NAV server shut down with MAXDOP = 4 and 3,500 worker threads. Obviously the proper way to fix this is to look at the queries and fix the code, but as if it gets critical, then increase the worker threads.

How big is the database(s) how many users, how many tenants?

David_Singleton · 2017-10-19

Just to be clear, increasing MAXDOP and Worker Threads is not the solution to the problem, it's the band aid fix to get your system going again to give you time to analysis the code and fix the code problems.

sarma · 2017-10-20

Hi Guys,

Issue is:- some times NAV server consuming all threads in SQL server, and DB is becoming unresponsive.
I have to log into the DB as DAC and kill couple of sessions to make DB responsive again.The
observation was one NAV session consumes 400+ worker threads.

Just want to double check with you all, if i set MAXDOP to 1, how come nav is using multiple worker threads with in single DB session. Is NAV overwriting MAXDOP or something which i missed from SQL server concept?

Regards

David_Singleton · 2017-10-20

Sarma, how many concurrent users do you have, how big is the database, are the users launching background tasks, do you have web services or application servers running?

There are a lot of tasks that consume worker threads. 400 is quote normal, especially if you have very complex pages.

I would suggest that you double worker threads and monitor CPU usage and blocks to see what is happening. But really need more info ...

sarma · 2017-10-20

Hi David

DB Size: 400GB+
No of NAV Licences : 80-100
Features used:- Mobile NAV(add on from third pary company),Web servervices and NAV front-end
Max No of DB connections:140

David_Singleton · 2017-10-20

OK so it's not huge, and with 100gig of RAM you should be fine, so the thing to focus on is what is using up all the workers. Look at the long running queries, but really 3 worker threads per connection it not a lot. So either way you need to increase them.

How is performance otherwise? Are you getting a lot of blocking?

sarma · 2017-10-24

With default thread config one connection has 300+.Example provided below

Because NAV consumes all working threads it brings down the whole NAV environment.

Any suggestion are welcome

Example:-

SPID HOST #ofThreads
174 Axx_VPABCVPP02 3
504 Axx_VPABCVPP02 5
508 Axx_VPABCVPP02 4
400 Axx_VPABCVPP02 288
452 Axx_VPABCVPP02 391
135 Axx_VPABCVPP02 1
141 Axx_VPABCVPP02 2
175 Axx_VPABCVPP02 1
182 Axx_VPABCVPP02 1
76 Axx_VPABCVPP03 1
140 Axx_VPABCVPP03 1
155 Axx_VPABCVPP03 1
156 Axx_VPABCVPP03 3
167 Axx_VPABCVPP03 3
191 Axx_VPABCVPP03 1
65 Axx_VPXXXSQL01 1

David_Singleton · 2017-10-24

sarma wrote: »

...
Because NAV consumes all working threads it brings down the whole NAV environment.

Any suggestion are welcome

Have you tried any of the suggestions above?

in terms of a long term fix you need to figure out which code is generating the complex queries.

sarma · 2017-10-25

Hi David,

Today, we have increased the max thread pool to 1500 and monitoring the situation.We don't want to change MAXDOP because of NAV best practice guide suggests set it to 1.

sarma · 2017-10-25

Another biggest problem in current environment is deadlocks in NAV DB. On average, we have around 120+ deadlocks per day.We know the queries that are causing these dead locks,

But, NAV developers says these queries are generated by NAV System and they can't pin point exact code location based on Queries provided from DB side.

Any one of you are experienced these many number of deadlocks?

One point, we thought of running DB Snapshot isolation mode.

David_Singleton · 2017-10-25

sarma wrote: »

Another biggest problem in current environment is deadlocks in NAV DB. On average, we have around 120+ deadlocks per day.We know the queries that are causing these dead locks,.

That is a lot of deadlocks. Which really should not happen on a regular basis, they fixed a lot of code in 2013 and 2015 but you will still get dead locks. But these are code issues. On older versions VSIFT used to cause deadlocks, but I don't see a lot of that from 2013 on. I would more likely say it is due to custom code.

Since you have such complex queries using so many workers and causing dead locks, I would suggest that you need to look at the code and find where it is dead locking. Are you also getting blocks and time outs?

David_Singleton · 2017-10-25

Which tables are involved in the dead locks.

rmv_RU · 2018-05-23

the most blocked queries:
SELECT TOP 100
[Average Time Blocked] = (total_elapsed_time - total_worker_time) / qs.execution_count,
[Total Time Blocked] = total_elapsed_time - total_worker_time,
[Execution count] = qs.execution_count,
[Individual Query] = SUBSTRING (qt.text,qs.statement_start_offset/2,
(CASE
WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2),
[Parent Query] = qt.text,
[DatabaseName] = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY [Average Time Blocked] DESC;

the greatest use of I/O:
SELECT TOP 100
[Average IO] = (total_logical_reads + total_logical_writes) / qs.execution_count,
[Total IO] = (total_logical_reads + total_logical_writes),
[Execution count] = qs.execution_count,
[Individual Query] = SUBSTRING (qt.text,qs.statement_start_offset/2, (CASE
WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2),
[Parent Query] = qt.text,
[DatabaseName] = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY [Total IO] DESC

I/O workload:

select top 50
(total_logical_reads/execution_count) as avg_logical_reads,
(total_logical_writes/execution_count) as avg_logical_writes,
(total_physical_reads/execution_count) as avg_phys_reads,
Execution_count,
statement_start_offset as stmt_start_offset,
plan_handle,
qt.text
from sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
order by (total_logical_reads + total_logical_writes) Desc

Howdy, Stranger!

Quick Links

Categories

SQL Wait ASYNC_NETWORK_IO on NAV Production

Answers