How optimize Key Lookup with Wildcard on SELECT?

nav_student · 2011-09-28

Hi guys

i´m monitoring a process with SQL Profiler and i see this:
CPU - 0
Read - 2
Writes - 0
Duration - 892

SELECT TOP 1 * FROM <database_companyname>$Value Entry" WITH (READUNCOMMITTED)   WHERE (("Item Ledger Entry No_"=@P1)) ORDER BY "Item Ledger Entry No_","Document No_","Document Line No_","Entry No_

The Index Seek shows me Cost of 49%
The Key Lookup shows me Cost of 52%

I read that i can optimize Key Lookup by creating an index with Included Columns but with a wildcard (*) how can acomplish that?
On the styrk´s book i see that every FINDSET, FINFIRST, ... generates a wilcard, on SQL Query.

Can you help me?

kriki · 2011-10-05

The technique is called "Covering index", but you can't do it with a SELECT *.
The best that you can do is creating a covering index that at least includes the fields in the WHERE and the ORDER BY clause.

But I think that that key already exists for your case.

Have you done an index rebuild with a good fillfactor (generally 90% is not bad as a default for all tables).

nav_student · 2011-10-06

Thanks Kriki,

normally i check if the key exists on Dynamics NAV and on SQL (index), but i check only the fields on the ORDER BY.

I have made a rebuild index with the values by default (Reorganize pages with the default amount of free space
) on SQL maintenance plans.

The Fill factor is 90% (Change free space per page percentage to 10%) for all tables?

According to Styrk
To reduce the amount of "Page Splits" it is feasible to apply an "Index Fill-Factor" smaller than the standard 100%, leaving some free space for faster data insertion. With the corresponding MP task "Index Rebuild" this amount of free space could be defined; best practice recommendation is to apply 5% to 10% (= Fill-Factor of 95% or 90%).
The attached script does not take care about FF, if required the FILLFACTOR clause has to be added (e.g. WITH FILLFACTOR = 90). Have in mind that those available standard features cannot apply specific FF according to the NAV requirements – sequential indexes (e.g. "Entry No.") might need a different FF (100%) than other indexes (FF on basis of relative index growth).
http://dynamicsuser.net/blogs/stryk/archive/2010/02/10/technical-airlift-2009-munich-nav-sql-performance-optimization-indexes.aspx

I´m a bit confused about the sequential indexes like Entry No.

kriki · 2011-10-06

The "Entry No." is perfect as clustered index because it is an always increasing value, so it will never generate page splits. When a page is full, it creates a new page. But we are sure we will never add a value in the middle of it. So for these indexes, it is best to have a fillfactor of 100%.
But other indexes will have values added in the middle of it, so to avoid page splits as much as possible, we try to leave some empty space, but not too much because to lower the fillfactor, the slower reading can be.
The problem with the rebuild index of SQL is that you cannot give a fillfactor per index, but only a fillfactor for all, so 90% to 95% is as best as we can get.

The only tool I know that calculates the best fillfactor per index is is the tool of SQL perform, but they come at a price (I think a good price because they get the most out of your existing hardware instead of adding extra hardware).

NaviDeveloper_NL · 2011-10-24

It's very strange to have only 2 reads and a duration of 892. A query with such a low number of reads is normally very fast. Are you sure these values are correct?

plc5700 · 2011-10-24

You should not be looking at one single statement execution - but at many executions of a statement to see if there is anything you can optimize.
The average execution time and read counts for the statement is very much relevant, but not for a single execution.

If this only gets executed once - then you do not have a problem. However - I would expect this partcular statement to be execute many times within NAV.

For this particular statement with only 2 reads - I would investigate if you have any IO_LATCH on your performance monitor while executing this statement.

leebert · 2011-12-08

I'm not familiar with nav, but yes you can optimize wildcard searches in MS SQL Server, surely 2008, if not 2003. B/c of poor docu it became canonical that all wildcards are bad all the time. They are not, and there are workarounds.

So, with a covering index, this is optimizable:

select field1 from table1 where field1 like 'abcdefg%' ;

but this is *not* optimizable

select field1 from table1 where field1 like '%cdefg%' ;

In fact this ability has existed in every DBMS engine for years. In Sybase 10, SQL 6, & DB/2 7, we would do this:

field1
BETWEEN "abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
AND "abcdefgzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"

HOWEVER,

It's possible to optimize a prefixed wildcard functional equivalent, by keeping a reversed version of those strings (in an ancilliary table) to accomodate prefixed wildcards.

That is, storing abcdefg as gfedcba w/ a covering index
so that the expression

field1 LIKE "%defg"

becomes

field1reversed BETWEEN
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaagfed'
and
'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzgfed'

(assuming you've stored the string in a string-reversed indexed field ... the REVERSE() function is your pal(indrome).** )

**really the function name oughtta be " ESREVER() "

===

And I never tried it, I'd have to test this, if double-wildcards can be optimized:

field1 LIKE '%cd%'

might.. maybe... or not .... optimize as:

field1 BETWEEN "cdaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" AND "cdzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
AND
field1rev BETWEEN 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaadc' and 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzdc'

=====

Test it. Your mileage may vary. A million+ rows might optimize just fine so long as the string search is sufficiently defined (minimum length string criteria....) so the plan doesn't escalate to a table scan. Even an index scan could be forced, however, w/ an index-only access directive (SQL Server has this, I believe).

Perhaps parse the query plan first before running user queries, check against table scan escalations.

That said, if a specialized seek-only research table will fit into RAM & is pinned in its own bufferpool, and that BP is itself pinned & won't page out (is Windows & SQL Server there yet?), further study might reveal that tablescans won't tax the machine overly much.

And ... there's always full text search (FTS).

leebert · 2011-12-08

kriki wrote:

The only tool I know that calculates the best fillfactor per index is is the tool of SQL perform, but they come at a price (I think a good price because they get the most out of your existing hardware instead of adding extra hardware).

There's always the empirical approach ... benchmark test tables w/ the existing data & test a variety of fillfactors against the expected rowcount.

DenSter · 2011-12-08

leebert wrote:

There's always the empirical approach ...

The good old trial and error

stryk · 2011-12-10

kriki wrote:

The only tool I know that calculates the best fillfactor per index is is the tool of SQL perform, but they come at a price

There are other tools available without any license fees and stuff

http://www.stryk.info/english/toolbox.html

stryk · 2011-12-10

nav_student wrote:
Hi guys

i´m monitoring a process with SQL Profiler and i see this:
CPU - 0
Read - 2
Writes - 0
Duration - 892
SELECT TOP 1 * FROM <database_companyname>$Value Entry" WITH (READUNCOMMITTED)   WHERE (("Item Ledger Entry No_"=@P1)) ORDER BY "Item Ledger Entry No_","Document No_","Document Line No_","Entry No_
The Index Seek shows me Cost of 49%
The Key Lookup shows me Cost of 52%

I read that i can optimize Key Lookup by creating an index with Included Columns but with a wildcard (*) how can acomplish that?
On the styrk´s book i see that every FINDSET, FINFIRST, ... generates a wilcard, on SQL Query.

Can you help me?

Have in mind that the "Cost" are just showing the ratio of the spent "effort", e.g. CPU time. Thus even a perfect operation might hav an operation costing "100%". So in your case we are talking about 49% and 52% od 0 milliseconds, means we're talking about microseconds here.
Actually this query is performed optimally - from a SQL Server perspective. Problem is the remarkable difference between CPU time and Duration - where have these 800 msec been gone? This usuallx indicates I/O problems or blocks ...

(And as kriki mentioned: to get rid of the "Key Lookup" you need to have covering indexes; creating such on large tables is not recommended!)

stryk · 2011-12-10

kriki wrote:

The "Entry No." is perfect as clustered index because it is an always increasing value, so it will never generate page splits

This is not fully true. Yes, this will never create Page Splits. "Entry No." is a 4 Bytes integer, as the CI is used as "bookmark" info in all other NCI this keeps the indexes small.

Big disadvantage with such sequential "Entry No." CI is, that it sorts the data contrary to the most common query requirements.
E.g. "Value Entry" sorts by "Entry No.", but usually it is filtered by "Item Ledger Entry No.". This could mean, that VE belonging to a ILE are distributed over wide areas of the VE table, which could increase the effort for such "Key Lookups" - in worst case SQL starts scanning the CI! Having the VE clustered by "Item Ledger Entry No.", "Entry No." arranges the data according to this query requirement, the records are physically in a smaller range. This could dramatically improve certain queries, e.g. "Item Adjust Cost" routines etc.!

Thus, generally it is a smart idea to have the CI on basis of the unique PK - but there might be reasons to change this. As so often, there's PRO and CONTRA ... it depends ...

How optimize Key Lookup with Wildcard on SELECT?

Comments

Categories