Identifying Duplicate Values

crimsonchin68 · Sep 28, 2018

Hi MrExcel,

I'm working with a large data dump (>3000 rows) that contains multiple columns but only two values that I'm primarily interested in: messages and message codes. Every message has a code and every code has a message, and what is supposed to be true is that every code has exactly one message and every message text has exactly one code. However, the input method for codes and messages is manual, meaning sometimes mistakes occur. Generally, this manifests itself as a typo in a message, leading to two different messages for the same code. It is also possible that there are identical messages with two or more different codes. I'm interested in finding those examples of duplicate codes and duplicate messages.

Right now we're using a pivot table with messages and codes as the axes. If my data was perfect there would be only one column per row filled in, and the value filled in would represent the number of times the code and its corresponding message appears in the system. When there are mistakes, multiple columns are populated, meaning that one message is used X times with code Z and Y times with code W. In the right of this table we have a "grand total" per uses of each message. We then use a COUNTIF formula to see if a row's grand total appears elsewhere in the same row, which would mean that there is only one code for the message. A "1" populates the COUNTIF column if the message has one code and "0" if it does not. We then use these rows tabbed "0" to filter the original data dump, because at that point the data beyond message and message code is important.

This seems to work in identifying codes that have multiple different messages, but in using VLOOKUP sometimes identifying messages that have multiple different codes fails because of the 255 character limit. So I have two questions: 1) can you think of a generally more efficient way of doing this process, or 2) can you think of a workaround to the VLOOKUP #VALUE error that occurs after the pivot table step?

RileyC · Sep 28, 2018

Do you have a list of the unique messages and codes? Are there sometimes two or more messages and two or more codes in the same columns? I would first separate each message and code so they're unique values (1-1) then compare it to the unique list to make sure that each one doesn't have any input errors and flag the ones that do.

Edit: also using match and index might be a way around the limit, can you use your own functions in VBA?

crimsonchin68 · Sep 28, 2018

Unfortunately what I'm dealing with is supposed to be the master list of messages and their codes (it's definitely a flawed system) so there's no list that contains every correct message and its correct code. What I'm thinking of doing right now is using VBA to create a hash of each message to compare rather than using the messages themselves.

RileyC · Sep 28, 2018

I'd start with the ones you do know are correct and create a new excel file that is the start of a master list of the codes. How do you know when a code is right or wrong?

Identifying Duplicate Values

crimsonchin68

New Member

Excel Facts

RileyC

New Member

crimsonchin68

New Member

RileyC

New Member

Similar threads

Forum statistics

Share this page

Identifying Duplicate Values

crimsonchin68

New Member

Excel Facts

RileyC

New Member

crimsonchin68

New Member

RileyC

New Member

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock