Data Cleaning & Matchin

SoloperSoloper Member Posts: 102
edited 2010-05-25 in SQL General
Hi experts;

I need an advice for a specific issue.

I have a table with 65000 records. There are duplicate records in this table. These duplicate records depends on various fields. For example;

If name, surname and email info of a customer is the same;
If name, surname and mobile info of a customer is the same;
If name, surname and one of the other phone numbers is the same etc..

I need to match the records with these rules. By the way, I shouldnt delete duplicates, mark them with a number or another way and keep in the table.

Can you please share any usefull info? Should I use just TSQL and some queries? Any other specific software? Anyone has experience?

Thanks.

Comments

  • kinekine Member Posts: 12,562
    What about just looping through the entries, setting the filter for the selected fields on second record variable, looping the result (skipping the record with same primary key as the base record) and processing the duplicities? (and of course skip already processed record in the base loop).

    Where is the problem? :wink:
    Kamil Sacek
    MVP - Dynamics NAV
    My BLOG
    NAVERTICA a.s.
  • SoloperSoloper Member Posts: 102
    edited 2010-05-25
    I tried it but query works too long and looks like the process will never finish. In this case I think query processes 60.000 * 60.000 records.

    What should I do?
  • kinekine Member Posts: 12,562
    Native or SQL? If Native, have you selected correct key when filtering? Do you see the "Counter" counting the records?
    Kamil Sacek
    MVP - Dynamics NAV
    My BLOG
    NAVERTICA a.s.
  • SoloperSoloper Member Posts: 102
    SQL.

    I tried it in sql query builder. Here is the query;

    USE ProjectDB

    UPDATE dbo.[BASETable]

    SET DuplicateRecNo = (

    SELECT TOP 1 RecordNo
    FROM dbo.[ReferenceTable]
    WHERE [ReferenceTable].Name = [BASETable].Name AND
    [ReferenceTable].LastName = [BASETable].LastName AND
    [ReferenceTable].EMail = [BASETable].EMail)

    Thanks Kine;
  • kinekine Member Posts: 12,562
    It will mark all as duplicity, because it will at least find itself... ;-)

    Question is your code in NAV. Of course, going through 60000 record and filtering for 3 sets of filters could take some time, it is on you to display some dialog with the progress and check if all is optimized (indexes).
    Kamil Sacek
    MVP - Dynamics NAV
    My BLOG
    NAVERTICA a.s.
Sign In or Register to comment.