Hi All,
While we are looking into NAV SQL Performance issue, we found that one of the top wait is "ASYNC_NETWORK_IO".As per my understading the NAV App server is not consuming the data generated by DB quickly.
Our Prod Setup is:
-2 Terminal services for NAV client, and they are load balanced.
-2 App Servers are load balanced
-1 DB server in AG group with Sync
Can someone help me how to trouble shoot this issue with NAV client? or some one come across this type of issue before.When i asked my Infrastructure team, they told me that every thing is fine, and nothing they suspect from NAV App server or Terminal server.
Is there any setting with in NAV defines Client memory size etc.? or is it down to NAV coding etc.?
Any help would be appreciate.
Regards
Sarma
0
Answers
"Everything's fine" may simply mean everything works, not necessarily "everything works at fastest possible pace"
Since you have an infrastructure team I guess you have quite decent and expensive hardware around, including all network gear. Yet it still may be worth to check a few things. If you google around for "TCP/IP offloading SQL Server problem" you will find a good few articles about how to not to setup SQL server's network cards to avoid SQL Server performance problems.
Massive ASYNC_NETWORK_IO does not always mean problems. The real meaning of it is that "the NAV App server is not consuming the data generated by DB quickly enough.". That's it. It does not have to consume the data slowly, it just means that the SQL Server is quicker delivering it. Which is good in fact, as long as users are not complaining about the slow NAV.
If they are, then the sad truth is that it is almost always down to NAV coding - investing time in NAV code and table structure/indexing optimisations usually give the best pay-offs.
Memory-wise as far as I know you can only play with the cache size settings on the NST server, I am not aware of any RTC client side settings. They would not make much sense or impact anyway as the data flows between the NST and SQL server, the RTC clients are involved in data processing.
Regards,
Slawek
Dynamics NAV, MS SQL Server, Wherescape RED;
PRINCE2 Practitioner - License GR657010572SG
GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
Had the same problem in NAV 2013.
How many CPU cores do you have on your SQL server? And what is you MAXDOP? if you have MAXDOP set to 4 or 8 then setting it to 1 may give you a quick fix whilst you sort out the root cause. But this will most likely slow down the system, but at least it wont lock out new users logins.
Ultimately take a look at your worker threads, as this is the most likely cause. But don't set them too high if you don't have enough CPU cores.
Noit always a problem but if it reaches the limit, then the server will refuse new connections and they will have to restart the service tier, which is never a good thing.
Ideally you want your Middle tier matched to the work load and the SQL server so that everything runs smoothy, and it is best to address these issues before the whole system grinds to a halt.
Dynamics NAV, MS SQL Server, Wherescape RED;
PRINCE2 Practitioner - License GR657010572SG
GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
How many CPU cores do you have? And how many worker threads?
Default worker threads for 4 or less CPU core is 512
And?
Dynamics NAV, MS SQL Server, Wherescape RED;
PRINCE2 Practitioner - License GR657010572SG
GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
Need more info please.
He is not getting errors. He is complaining about high level of ASYNC_NETWORK_IO waits, that's it.
Dynamics NAV, MS SQL Server, Wherescape RED;
PRINCE2 Practitioner - License GR657010572SG
GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
We have server in AWS environment,
The DB server has 1 CPU and 16 cores.
The SQL Memory 100GB
The MAX_WORKING_THREAD is default one i.e. 0(Internally SQL calculating 700+ threads)
The MAXDOP value 1
The Always on group with FULL Sync mode.
We are getting this issue intermittent and one of the NAV session consumes 400+ threads(even after MAXDOP to 1).We are unable to identify root cause.Can some one helps me that would be great.
Regards
High level of ASYNC_NETWORK_IO waits is not an issue on it own. It is just a measure of SQL Server waiting for something.
EDIT
The NST uses connection pooling and MARS so I guess it is possible that those 400 threads on one NST session is caused by NST handling many user requests through that connection
Slawek
Dynamics NAV, MS SQL Server, Wherescape RED;
PRINCE2 Practitioner - License GR657010572SG
GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
I would slowly increase MAXDOP, try 2, 4 then 8 stop at 8. At the same time you would increase the worker threads, Take regular recordings under similar conditions and get it in balance.
FYI 400+ threads sounds pretty normal for using one core.
This of course all depends on the number of users and the complexity of the queries. For example a page with a lot of cues and flow fields (Especially if the flow fields don't have VSIFTS) can use a huge number of worker threads. I have seen a NAV server shut down with MAXDOP = 4 and 3,500 worker threads. Obviously the proper way to fix this is to look at the queries and fix the code, but as if it gets critical, then increase the worker threads.
How big is the database(s) how many users, how many tenants?
Issue is:- some times NAV server consuming all threads in SQL server, and DB is becoming unresponsive.
I have to log into the DB as DAC and kill couple of sessions to make DB responsive again.The
observation was one NAV session consumes 400+ worker threads.
Just want to double check with you all, if i set MAXDOP to 1, how come nav is using multiple worker threads with in single DB session. Is NAV overwriting MAXDOP or something which i missed from SQL server concept?
Regards
There are a lot of tasks that consume worker threads. 400 is quote normal, especially if you have very complex pages.
I would suggest that you double worker threads and monitor CPU usage and blocks to see what is happening. But really need more info ...
DB Size: 400GB+
No of NAV Licences : 80-100
Features used:- Mobile NAV(add on from third pary company),Web servervices and NAV front-end
Max No of DB connections:140
How is performance otherwise? Are you getting a lot of blocking?
Because NAV consumes all working threads it brings down the whole NAV environment.
Any suggestion are welcome
Example:-
SPID HOST #ofThreads
174 Axx_VPABCVPP02 3
504 Axx_VPABCVPP02 5
508 Axx_VPABCVPP02 4
400 Axx_VPABCVPP02 288
452 Axx_VPABCVPP02 391
135 Axx_VPABCVPP02 1
141 Axx_VPABCVPP02 2
175 Axx_VPABCVPP02 1
182 Axx_VPABCVPP02 1
76 Axx_VPABCVPP03 1
140 Axx_VPABCVPP03 1
155 Axx_VPABCVPP03 1
156 Axx_VPABCVPP03 3
167 Axx_VPABCVPP03 3
191 Axx_VPABCVPP03 1
65 Axx_VPXXXSQL01 1
Have you tried any of the suggestions above?
in terms of a long term fix you need to figure out which code is generating the complex queries.
Today, we have increased the max thread pool to 1500 and monitoring the situation.We don't want to change MAXDOP because of NAV best practice guide suggests set it to 1.
But, NAV developers says these queries are generated by NAV System and they can't pin point exact code location based on Queries provided from DB side.
Any one of you are experienced these many number of deadlocks?
One point, we thought of running DB Snapshot isolation mode.
That is a lot of deadlocks. Which really should not happen on a regular basis, they fixed a lot of code in 2013 and 2015 but you will still get dead locks. But these are code issues. On older versions VSIFT used to cause deadlocks, but I don't see a lot of that from 2013 on. I would more likely say it is due to custom code.
Since you have such complex queries using so many workers and causing dead locks, I would suggest that you need to look at the code and find where it is dead locking. Are you also getting blocks and time outs?
SELECT TOP 100
[Average Time Blocked] = (total_elapsed_time - total_worker_time) / qs.execution_count,
[Total Time Blocked] = total_elapsed_time - total_worker_time,
[Execution count] = qs.execution_count,
[Individual Query] = SUBSTRING (qt.text,qs.statement_start_offset/2,
(CASE
WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2),
[Parent Query] = qt.text,
[DatabaseName] = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY [Average Time Blocked] DESC;
the greatest use of I/O:
SELECT TOP 100
[Average IO] = (total_logical_reads + total_logical_writes) / qs.execution_count,
[Total IO] = (total_logical_reads + total_logical_writes),
[Execution count] = qs.execution_count,
[Individual Query] = SUBSTRING (qt.text,qs.statement_start_offset/2, (CASE
WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2),
[Parent Query] = qt.text,
[DatabaseName] = DB_NAME(qt.dbid)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
ORDER BY [Total IO] DESC
I/O workload:
select top 50
(total_logical_reads/execution_count) as avg_logical_reads,
(total_logical_writes/execution_count) as avg_logical_writes,
(total_physical_reads/execution_count) as avg_phys_reads,
Execution_count,
statement_start_offset as stmt_start_offset,
plan_handle,
qt.text
from sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
order by (total_logical_reads + total_logical_writes) Desc
Nav, T-SQL.