SQL Server is overloaded and Servicetier is idle

Hey,

Details about us:
Currently live with AS400
~200 User
Around 10 concurrent users are doing light tests in NAV
Large customer base with a lot of telesales both b2c and b2b, so reducing any stuttering during the creation of orders is crucial.
Single Company
Dynamics NAV 2017
MS SQL Server 2016

We have one VM with an SQL Server for a Dynamics NAV Database and one VM with a ServiceTier. Every now and then we're running into performance issues most of those having to do with our 4,5 million customers. (like looking up their balance or trying to select one in a sales order)

I would've expected that having more than one ServiceTier with enough power and data cache should solve the issue. Right now it seems as though the ServiceTier is only handling the business logic and not actually caching the data and therefore lowering the load for the SQL Server.

Am I missing something in the settings or is the 3-Tier Architecture not meant to lower the load for the SQL Server? Of course there's things we can accomplish by adjusting the programming but it's happening with stock standard NAV functionalities.

Servicetier Settings:
0swu4gojupft.png

Cheers

Michael

Answers

  • Slawek_GuzekSlawek_Guzek Posts: 1,649Member
    edited 2019-03-15
    Try increasing the Data Cache Size from 10 to some higher number. Just remember that increasing the value by one doubles the cache memory, so do it in small steps, observing the memory allocation on the NST. Going too large is pointless as the guest OS might start to swap the memory out to the swapfile, also the host VM might start doing it too if there is too much memory pressure from from other VM guests.

    Each change requires restarting the NST as far as I am aware.

    The other issue might be that if you don't do the calc balance too often the data in cache might expire, rendering it useless. Unfortunately I don't know any time scale how long the data in the NST cache can sit not accessed without being expired.
    Slawek Guzek
    Dynamics NAV, MS SQL Server, Wherescape RED;
    PRINCE2 Practitioner - License GR657010572SG
    GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
  • MichaelG89MichaelG89 Posts: 5Member
    Thank you for your answer!

    Right now my go-to example of overloading the SQL Server is sorting the customer list for balance (LCY).
    Maybe the somewhat artificial nature of my test is causing the issue but I've tested the cache setting incrementally from 10 to 14 and the result has always been 20-25% CPU usage per client doing the sorting and around 10% CPU usage on the Servicetier.

    If I just open the customer list lots of times it's very slow but neither NST nor SQL Server is doing very much so maybe then my local PC is the limiting factor.

    As always actually emulating those crafty users and their creative behaviour is hard. The next time we'll run into issues I'll try playing around with the cache setting again.
  • Slawek_GuzekSlawek_Guzek Posts: 1,649Member
    edited 2019-03-15
    Increased case size will not help you with sorting by Balance (LCY) I'm afraid.

    The reasons are that 1) this is a calculated field, 2) NAV is not doing any sorting at all, it passes the query back to the SQL Server with required sorting order, in this case in form of ORDER BY SUM(...) which is never efficient.

    Opening Customer List may also be slow due to including the Balance (LCY) field on the page. Such fields should be generally kept in fact boxes, displaying the balance for currently highlighted customer only. If you include this field on the list page the each time you open it NAV will calculate the balance for each customer sending one very complicated query per one line on your customer page list. If you preset any arbitrary sorting different than sorting by PK (Customer no) you adding to the misery asking SQL Server to sort all the records (and possibly to calculate all the results before sorting) according to required sorting order.

    Slawek Guzek
    Dynamics NAV, MS SQL Server, Wherescape RED;
    PRINCE2 Practitioner - License GR657010572SG
    GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
  • BlackTigerBlackTiger Posts: 1,207Member
    There is no point to cache cached.
    SQL Server is smart enough.
    "You can’t just ask customers what they want and then try to give that to them.
    By the time you get it built, they’ll want something new.” Steve Jobs
  • Slawek_GuzekSlawek_Guzek Posts: 1,649Member
    edited 2019-03-15
    @BlackTiger you are wrong, for quite a few reasons.

    If "SQL Server was smart enough" there would be no cache in the NST. Both SQL and NST are developend by the same vendor, and the manufacturer of both products should know it better if the SQL Server level caching is good enough, and certainly would not waste time and resources to implement 'pointless' cache and all cache synchronisation mechanisms...
    Slawek Guzek
    Dynamics NAV, MS SQL Server, Wherescape RED;
    PRINCE2 Practitioner - License GR657010572SG
    GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
  • MichaelG89MichaelG89 Posts: 5Member
    @BlackTiger you are wrong, for quite a few reasons.

    If "SQL Server was smart enough" there would be no cache in the NST. Both SQL and NST are developend by the same vendor, and the manufacturer of both products should know it better if the SQL Server level caching is good enough, and certainly would not waste time and resources to implement 'pointless' cache and all cache synchronisation mechanisms...

    Thanks for your insights. So it seems as though the only way to handle the load caused by more complex queries it to increase the power of the SQL Server and that would mean needing a license for more than 8 cores.. I always thought STs are supposed to help with scaling. I'm not saying they are pointless but it definately feels somewhat misleading. SAPs 3 tier architecture appears superior in that regard.
  • Slawek_GuzekSlawek_Guzek Posts: 1,649Member
    Try it before you spend extra $$$$ on SQL Server licensing.

    I'd say that in this particular scenario in question the query generated by NAV will not benefit from increased concurrency capabilities, as, as far as I am aware, SQL Server sorting implementation is not parallel, therefore increasing the number of cores might not help.
    Slawek Guzek
    Dynamics NAV, MS SQL Server, Wherescape RED;
    PRINCE2 Practitioner - License GR657010572SG
    GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
  • FiddiFiddi Posts: 41Member
    So it seems as though the only way to handle the load caused by more complex queries it to increase the power of the SQL Server and that would mean needing a license for more than 8 cores.

    Before you update Hardware, you probably should switch on brain. B)
    First of all analyze where the real bottleneck is. Databases with this size need optimization by creating usefull aditinal indexes, by fixing badly designed code, or by doing things in an other way, than the standard does.

    First of all i would use the Monitor feature of the SQL-Server Mangement Studio (SSMS) to have a look what causes the problems. You may also analyze the performance- counters of the system(s).
  • MichaelG89MichaelG89 Posts: 5Member
    Try it before you spend extra $$$$ on SQL Server licensing.

    I'd say that in this particular scenario in question the query generated by NAV will not benefit from increased concurrency capabilities, as, as far as I am aware, SQL Server sorting implementation is not parallel, therefore increasing the number of cores might not help.

    I don't think that's correct since each client sorting for balance had an increased CPU usage of 12,5% and only with 8 clients did it reach 100%. Meaning Vanilla NAV has a built-in functionality that for our company means, 8 users (many of whom only work here during our seasonal 3 month-period) can accidentally lock down the entire SQL Server.

    What do you mean we should try? Because I did increase the ST cache and it really didn't change anything.

    We just have to weigh the risk against the cost. More CPUs would mean more wiggle room.. The real problem is how long it will take to identify issues when we are live. I doubt we will catch everything before that because users are crafty and 250 simultaneous users will absolutely find ways to break things in ways you cannot imagine, which is one of the reasons we want to change standard NAV as little as possible for now.
  • Slawek_GuzekSlawek_Guzek Posts: 1,649Member
    MichaelG89 wrote: »

    I don't think that's correct since each client sorting for balance had an increased CPU usage of 12,5% and only with 8 clients did it reach 100%.

    If those 8 clients are opening the Customer List sorted by the Balance LCY field then the behaviour described proves that whatever operation they do - it is a single threaded operation. Each client takes one SQL Server core off - most likely doing the final portion of the query - sorting.

    If you goal is to enable more users to open the Customer List at the same time then yes, increasing number of cores will help. But if your goal is to make one client to open the Customer List faster then adding extra cores will not make any difference.

    By the way - this is not "Vanilla NAV" functionality fault - this is the result of the workload you've thrown at your SQL Server. As I mentioned earlier the Balance (LCY) is a calculated field - NAV (or ratehr SQL Server) calculates it on-the-fly. It is not stored anywhere, and the data to make up the amount comes from the Detailed Ledger Entry table.

    If you have 4.5 million customers you probably have a good few dozen millions records in the Customer Ledger Entry table, and for each record Customer Ledger Entry you would normally have 4 records (or more) in Detailed Ledger Entry table. In order to sort your result SQL Server must first calculate the balance for the 4.5 million records, effectively summing up and grouping some 100m records I guess, and then once ALL the balances are calculated SQL Server needs to sort them according to your requirements.


    Slawek Guzek
    Dynamics NAV, MS SQL Server, Wherescape RED;
    PRINCE2 Practitioner - License GR657010572SG
    GDPR Certified Data Protection Officer - PECB License DPCDPO1025070-2018-03
  • FiddiFiddi Posts: 41Member
    As Slawek mentioned, the Balance is a value that is calculated at runtime by the NAV/SQL-Server, which needs - in your case - a lot of records for calculation.
    So in your case i would think, there is more need for additional memory than more CPU-cores. First you should check the SQL-Server "cache hit ratio", it should be above 90% for a good performance.
    The second thing you should think about is: Why do we need our customers sorted by Balance?
    Is it because you want to know which customers have open entries? If so, add a boolean FlowField to the customer table, which tests if there exits an open- Customer Ledger Entry for the customer. So if done, you may filter your customers on this FlowField with perhaps less load to the servers.
  • MichaelG89MichaelG89 Posts: 5Member
    edited 2019-03-26
    MichaelG89 wrote: »
    By the way - this is not "Vanilla NAV" functionality fault - this is the result of the workload you've thrown at your SQL Server. As I mentioned earlier the Balance (LCY) is a calculated field - NAV (or ratehr SQL Server) calculates it on-the-fly. It is not stored anywhere, and the data to make up the amount comes from the Detailed Ledger Entry table.
    Using Vanilla NAV functionality resulted in that workload. Sorting for a flowfield is very much something that is enabled in NAV from the start. I'm not quite sure what you are trying to say. I'm quite aware that the CRONUS NAV company wouldn't have any issues with that, which is why Im saying vanilla functionality not data. Our users are quite adamant about keeping their 4.5 million customers for some reason.
    Fiddi wrote: »
    As Slawek mentioned, the Balance is a value that is calculated at runtime by the NAV/SQL-Server, which needs - in your case - a lot of records for calculation.
    So in your case i would think, there is more need for additional memory than more CPU-cores. First you should check the SQL-Server "cache hit ratio", it should be above 90% for a good performance.
    The second thing you should think about is: Why do we need our customers sorted by Balance?
    Is it because you want to know which customers have open entries? If so, add a boolean FlowField to the customer table, which tests if there exits an open- Customer Ledger Entry for the customer. So if done, you may filter your customers on this FlowField with perhaps less load to the servers.

    Yes the real issue is not this isolated instance. Remove that column or remove sorting functionality - done. If we are only reacting to instances of users doing something to spike the server we will lose a lot of money in the time it takes to fix the issues. We need preventive measures.

    After finding as many instances of "sorting customers for balance" and fixing them the only thing left to do to prevent system stalls is add power.
Sign In or Register to comment.