How optimize Key Lookup with Wildcard on SELECT?

nav_studentnav_student Member Posts: 175
edited 2011-12-10 in SQL Performance
Hi guys

i´m monitoring a process with SQL Profiler and i see this:
CPU - 0
Read - 2
Writes - 0
Duration - 892
SELECT TOP 1 * FROM <database_companyname>$Value Entry" WITH (READUNCOMMITTED)   WHERE (("Item Ledger Entry No_"=@P1)) ORDER BY "Item Ledger Entry No_","Document No_","Document Line No_","Entry No_

The Index Seek shows me Cost of 49%
The Key Lookup shows me Cost of 52%

I read that i can optimize Key Lookup by creating an index with Included Columns but with a wildcard (*) how can acomplish that?
On the styrk´s book i see that every FINDSET, FINFIRST, ... generates a wilcard, on SQL Query.

Can you help me?

Comments

  • krikikriki Member, Moderator Posts: 9,094
    The technique is called "Covering index", but you can't do it with a SELECT *.
    The best that you can do is creating a covering index that at least includes the fields in the WHERE and the ORDER BY clause.

    But I think that that key already exists for your case.

    Have you done an index rebuild with a good fillfactor (generally 90% is not bad as a default for all tables).
    Regards,Alain Krikilion
    No PM,please use the forum. || May the <SOLVED>-attribute be in your title!


  • nav_studentnav_student Member Posts: 175
    Thanks Kriki,

    normally i check if the key exists on Dynamics NAV and on SQL (index), but i check only the fields on the ORDER BY.

    I have made a rebuild index with the values by default (Reorganize pages with the default amount of free space
    ) on SQL maintenance plans.


    The Fill factor is 90% (Change free space per page percentage to 10%) for all tables?

    According to Styrk
    To reduce the amount of "Page Splits" it is feasible to apply an "Index Fill-Factor" smaller than the standard 100%, leaving some free space for faster data insertion. With the corresponding MP task "Index Rebuild" this amount of free space could be defined; best practice recommendation is to apply 5% to 10% (= Fill-Factor of 95% or 90%).
    The attached script does not take care about FF, if required the FILLFACTOR clause has to be added (e.g. WITH FILLFACTOR = 90). Have in mind that those available standard features cannot apply specific FF according to the NAV requirements – sequential indexes (e.g. "Entry No.") might need a different FF (100%) than other indexes (FF on basis of relative index growth).

    http://dynamicsuser.net/blogs/stryk/archive/2010/02/10/technical-airlift-2009-munich-nav-sql-performance-optimization-indexes.aspx

    I´m a bit confused about the sequential indexes like Entry No.
  • krikikriki Member, Moderator Posts: 9,094
    The "Entry No." is perfect as clustered index because it is an always increasing value, so it will never generate page splits. When a page is full, it creates a new page. But we are sure we will never add a value in the middle of it. So for these indexes, it is best to have a fillfactor of 100%.
    But other indexes will have values added in the middle of it, so to avoid page splits as much as possible, we try to leave some empty space, but not too much because to lower the fillfactor, the slower reading can be.
    The problem with the rebuild index of SQL is that you cannot give a fillfactor per index, but only a fillfactor for all, so 90% to 95% is as best as we can get.

    The only tool I know that calculates the best fillfactor per index is is the tool of SQL perform, but they come at a price (I think a good price because they get the most out of your existing hardware instead of adding extra hardware).
    Regards,Alain Krikilion
    No PM,please use the forum. || May the <SOLVED>-attribute be in your title!


  • NaviDeveloper_NLNaviDeveloper_NL Member Posts: 42
    It's very strange to have only 2 reads and a duration of 892. A query with such a low number of reads is normally very fast. Are you sure these values are correct?
  • plc5700plc5700 Member Posts: 10
    You should not be looking at one single statement execution - but at many executions of a statement to see if there is anything you can optimize.
    The average execution time and read counts for the statement is very much relevant, but not for a single execution.

    If this only gets executed once - then you do not have a problem. However - I would expect this partcular statement to be execute many times within NAV.

    For this particular statement with only 2 reads - I would investigate if you have any IO_LATCH on your performance monitor while executing this statement.
    Best Regards

    Poul Lindholm Christiansen
    plc@cmcs.dk / +45 4050 9306
  • leebertleebert Member Posts: 2
    I'm not familiar with nav, but yes you can optimize wildcard searches in MS SQL Server, surely 2008, if not 2003. B/c of poor docu it became canonical that all wildcards are bad all the time. They are not, and there are workarounds.

    So, with a covering index, this is optimizable:

    select field1 from table1 where field1 like 'abcdefg%' ;

    but this is *not* optimizable

    select field1 from table1 where field1 like '%cdefg%' ;

    In fact this ability has existed in every DBMS engine for years. In Sybase 10, SQL 6, & DB/2 7, we would do this:

    field1
    BETWEEN "abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    AND "abcdefgzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"

    HOWEVER,

    It's possible to optimize a prefixed wildcard functional equivalent, by keeping a reversed version of those strings (in an ancilliary table) to accomodate prefixed wildcards.

    That is, storing abcdefg as gfedcba w/ a covering index
    so that the expression

    field1 LIKE "%defg"

    becomes

    field1reversed BETWEEN
    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaagfed'
    and
    'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzgfed'

    (assuming you've stored the string in a string-reversed indexed field ... the REVERSE() function is your pal(indrome).** )

    **really the function name oughtta be " ESREVER() "

    ===

    And I never tried it, I'd have to test this, if double-wildcards can be optimized:

    field1 LIKE '%cd%'

    might.. maybe... or not .... optimize as:

    field1 BETWEEN "cdaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" AND "cdzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
    AND
    field1rev BETWEEN 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaadc' and 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzdc'


    =====

    Test it. Your mileage may vary. A million+ rows might optimize just fine so long as the string search is sufficiently defined (minimum length string criteria....) so the plan doesn't escalate to a table scan. Even an index scan could be forced, however, w/ an index-only access directive (SQL Server has this, I believe).

    Perhaps parse the query plan first before running user queries, check against table scan escalations.

    That said, if a specialized seek-only research table will fit into RAM & is pinned in its own bufferpool, and that BP is itself pinned & won't page out (is Windows & SQL Server there yet?), further study might reveal that tablescans won't tax the machine overly much.

    And ... there's always full text search (FTS).
  • leebertleebert Member Posts: 2
    kriki wrote:
    The only tool I know that calculates the best fillfactor per index is is the tool of SQL perform, but they come at a price (I think a good price because they get the most out of your existing hardware instead of adding extra hardware).

    There's always the empirical approach ... benchmark test tables w/ the existing data & test a variety of fillfactors against the expected rowcount.
  • DenSterDenSter Member Posts: 8,304
    leebert wrote:
    There's always the empirical approach ...
    The good old trial and error :mrgreen:
  • strykstryk Member Posts: 645
    kriki wrote:
    The only tool I know that calculates the best fillfactor per index is is the tool of SQL perform, but they come at a price
    There are other tools available without any license fees and stuff :mrgreen:
    http://www.stryk.info/english/toolbox.html
    Jörg A. Stryk (MVP - Dynamics NAV)
    NAV/SQL Performance Optimization & Troubleshooting
    STRYK System Improvement
    The Blog - The Book - The Tool
  • strykstryk Member Posts: 645
    Hi guys

    i´m monitoring a process with SQL Profiler and i see this:
    CPU - 0
    Read - 2
    Writes - 0
    Duration - 892
    SELECT TOP 1 * FROM <database_companyname>$Value Entry" WITH (READUNCOMMITTED)   WHERE (("Item Ledger Entry No_"=@P1)) ORDER BY "Item Ledger Entry No_","Document No_","Document Line No_","Entry No_
    

    The Index Seek shows me Cost of 49%
    The Key Lookup shows me Cost of 52%

    I read that i can optimize Key Lookup by creating an index with Included Columns but with a wildcard (*) how can acomplish that?
    On the styrk´s book i see that every FINDSET, FINFIRST, ... generates a wilcard, on SQL Query.

    Can you help me?

    Have in mind that the "Cost" are just showing the ratio of the spent "effort", e.g. CPU time. Thus even a perfect operation might hav an operation costing "100%". So in your case we are talking about 49% and 52% od 0 milliseconds, means we're talking about microseconds here.
    Actually this query is performed optimally - from a SQL Server perspective. Problem is the remarkable difference between CPU time and Duration - where have these 800 msec been gone? This usuallx indicates I/O problems or blocks ...

    (And as kriki mentioned: to get rid of the "Key Lookup" you need to have covering indexes; creating such on large tables is not recommended!)
    Jörg A. Stryk (MVP - Dynamics NAV)
    NAV/SQL Performance Optimization & Troubleshooting
    STRYK System Improvement
    The Blog - The Book - The Tool
  • strykstryk Member Posts: 645
    kriki wrote:
    The "Entry No." is perfect as clustered index because it is an always increasing value, so it will never generate page splits

    This is not fully true. Yes, this will never create Page Splits. "Entry No." is a 4 Bytes integer, as the CI is used as "bookmark" info in all other NCI this keeps the indexes small.

    Big disadvantage with such sequential "Entry No." CI is, that it sorts the data contrary to the most common query requirements.
    E.g. "Value Entry" sorts by "Entry No.", but usually it is filtered by "Item Ledger Entry No.". This could mean, that VE belonging to a ILE are distributed over wide areas of the VE table, which could increase the effort for such "Key Lookups" - in worst case SQL starts scanning the CI! Having the VE clustered by "Item Ledger Entry No.", "Entry No." arranges the data according to this query requirement, the records are physically in a smaller range. This could dramatically improve certain queries, e.g. "Item Adjust Cost" routines etc.!

    Thus, generally it is a smart idea to have the CI on basis of the unique PK - but there might be reasons to change this. As so often, there's PRO and CONTRA ... it depends ...
    Jörg A. Stryk (MVP - Dynamics NAV)
    NAV/SQL Performance Optimization & Troubleshooting
    STRYK System Improvement
    The Blog - The Book - The Tool
Sign In or Register to comment.