excelstarter1
Board Regular
- Joined
- Jul 20, 2017
- Messages
- 81
Hello VBA pros,
I have a tricky problem (well it is for me) regarding duplicates in a dataset. Regularly I receive manually generated customer lists from our finance/sales or another department including an overview of customers and their revenues in a specific period (customer name in column A and revenue in column B). Now the issues with those lists are duplicates, e.g. the customer Walmart appears several times as Walmart USA, Walmart (NY), Walmartt (with a typo), USA Walmart, NY Walmart and so on. I want to find all those duplicates and cluster the customers. The simple Excel function would not work. I already tried text functions like LEFT (for 3 characters, then run find duplicates). However, they are not sufficient.
Is there a way with VBA to identify duplicates by comparing the content of each individual cell to the full dataset (e.g. 1000 rows with customer names)? As the duplicate name could appear at the beginning (e.g. Walmart USA), the end (USA Walmart) or the middle (USA, Walmart, NY), it would make sense not to compare the full content of the cell but only 5 or 6 characters in order to take this issue into account.
Would be great if you could help me with a couples of ideas!
Thanks a lot!
Best regards
I have a tricky problem (well it is for me) regarding duplicates in a dataset. Regularly I receive manually generated customer lists from our finance/sales or another department including an overview of customers and their revenues in a specific period (customer name in column A and revenue in column B). Now the issues with those lists are duplicates, e.g. the customer Walmart appears several times as Walmart USA, Walmart (NY), Walmartt (with a typo), USA Walmart, NY Walmart and so on. I want to find all those duplicates and cluster the customers. The simple Excel function would not work. I already tried text functions like LEFT (for 3 characters, then run find duplicates). However, they are not sufficient.
Is there a way with VBA to identify duplicates by comparing the content of each individual cell to the full dataset (e.g. 1000 rows with customer names)? As the duplicate name could appear at the beginning (e.g. Walmart USA), the end (USA Walmart) or the middle (USA, Walmart, NY), it would make sense not to compare the full content of the cell but only 5 or 6 characters in order to take this issue into account.
Would be great if you could help me with a couples of ideas!
Thanks a lot!
Best regards