Data Cleaning & Matchin

Soloper
Member Posts: 102
Hi experts;
I need an advice for a specific issue.
I have a table with 65000 records. There are duplicate records in this table. These duplicate records depends on various fields. For example;
If name, surname and email info of a customer is the same;
If name, surname and mobile info of a customer is the same;
If name, surname and one of the other phone numbers is the same etc..
I need to match the records with these rules. By the way, I shouldnt delete duplicates, mark them with a number or another way and keep in the table.
Can you please share any usefull info? Should I use just TSQL and some queries? Any other specific software? Anyone has experience?
Thanks.
I need an advice for a specific issue.
I have a table with 65000 records. There are duplicate records in this table. These duplicate records depends on various fields. For example;
If name, surname and email info of a customer is the same;
If name, surname and mobile info of a customer is the same;
If name, surname and one of the other phone numbers is the same etc..
I need to match the records with these rules. By the way, I shouldnt delete duplicates, mark them with a number or another way and keep in the table.
Can you please share any usefull info? Should I use just TSQL and some queries? Any other specific software? Anyone has experience?
Thanks.
0
Comments
-
What about just looping through the entries, setting the filter for the selected fields on second record variable, looping the result (skipping the record with same primary key as the base record) and processing the duplicities? (and of course skip already processed record in the base loop).
Where is the problem?0 -
I tried it but query works too long and looks like the process will never finish. In this case I think query processes 60.000 * 60.000 records.
What should I do?0 -
Native or SQL? If Native, have you selected correct key when filtering? Do you see the "Counter" counting the records?0
-
SQL.
I tried it in sql query builder. Here is the query;
USE ProjectDB
UPDATE dbo.[BASETable]
SET DuplicateRecNo = (
SELECT TOP 1 RecordNo
FROM dbo.[ReferenceTable]
WHERE [ReferenceTable].Name = [BASETable].Name AND
[ReferenceTable].LastName = [BASETable].LastName AND
[ReferenceTable].EMail = [BASETable].EMail)
Thanks Kine;0 -
It will mark all as duplicity, because it will at least find itself... ;-)
Question is your code in NAV. Of course, going through 60000 record and filtering for 3 sets of filters could take some time, it is on you to display some dialog with the progress and check if all is optimized (indexes).0
Categories
- All Categories
- 73 General
- 73 Announcements
- 66.6K Microsoft Dynamics NAV
- 18.7K NAV Three Tier
- 38.4K NAV/Navision Classic Client
- 3.6K Navision Attain
- 2.4K Navision Financials
- 116 Navision DOS
- 851 Navision e-Commerce
- 1K NAV Tips & Tricks
- 772 NAV Dutch speaking only
- 617 NAV Courses, Exams & Certification
- 2K Microsoft Dynamics-Other
- 1.5K Dynamics AX
- 320 Dynamics CRM
- 111 Dynamics GP
- 10 Dynamics SL
- 1.5K Other
- 990 SQL General
- 383 SQL Performance
- 34 SQL Tips & Tricks
- 35 Design Patterns (General & Best Practices)
- 1 Architectural Patterns
- 10 Design Patterns
- 5 Implementation Patterns
- 53 3rd Party Products, Services & Events
- 1.6K General
- 1.1K General Chat
- 1.6K Website
- 83 Testing
- 1.2K Download section
- 23 How Tos section
- 252 Feedback
- 12 NAV TechDays 2013 Sessions
- 13 NAV TechDays 2012 Sessions