crimsonchin68
New Member
- Joined
- Sep 28, 2018
- Messages
- 2
Hi MrExcel,
I'm working with a large data dump (>3000 rows) that contains multiple columns but only two values that I'm primarily interested in: messages and message codes. Every message has a code and every code has a message, and what is supposed to be true is that every code has exactly one message and every message text has exactly one code. However, the input method for codes and messages is manual, meaning sometimes mistakes occur. Generally, this manifests itself as a typo in a message, leading to two different messages for the same code. It is also possible that there are identical messages with two or more different codes. I'm interested in finding those examples of duplicate codes and duplicate messages.
Right now we're using a pivot table with messages and codes as the axes. If my data was perfect there would be only one column per row filled in, and the value filled in would represent the number of times the code and its corresponding message appears in the system. When there are mistakes, multiple columns are populated, meaning that one message is used X times with code Z and Y times with code W. In the right of this table we have a "grand total" per uses of each message. We then use a COUNTIF formula to see if a row's grand total appears elsewhere in the same row, which would mean that there is only one code for the message. A "1" populates the COUNTIF column if the message has one code and "0" if it does not. We then use these rows tabbed "0" to filter the original data dump, because at that point the data beyond message and message code is important.
This seems to work in identifying codes that have multiple different messages, but in using VLOOKUP sometimes identifying messages that have multiple different codes fails because of the 255 character limit. So I have two questions: 1) can you think of a generally more efficient way of doing this process, or 2) can you think of a workaround to the VLOOKUP #VALUE error that occurs after the pivot table step?
I'm working with a large data dump (>3000 rows) that contains multiple columns but only two values that I'm primarily interested in: messages and message codes. Every message has a code and every code has a message, and what is supposed to be true is that every code has exactly one message and every message text has exactly one code. However, the input method for codes and messages is manual, meaning sometimes mistakes occur. Generally, this manifests itself as a typo in a message, leading to two different messages for the same code. It is also possible that there are identical messages with two or more different codes. I'm interested in finding those examples of duplicate codes and duplicate messages.
Right now we're using a pivot table with messages and codes as the axes. If my data was perfect there would be only one column per row filled in, and the value filled in would represent the number of times the code and its corresponding message appears in the system. When there are mistakes, multiple columns are populated, meaning that one message is used X times with code Z and Y times with code W. In the right of this table we have a "grand total" per uses of each message. We then use a COUNTIF formula to see if a row's grand total appears elsewhere in the same row, which would mean that there is only one code for the message. A "1" populates the COUNTIF column if the message has one code and "0" if it does not. We then use these rows tabbed "0" to filter the original data dump, because at that point the data beyond message and message code is important.
This seems to work in identifying codes that have multiple different messages, but in using VLOOKUP sometimes identifying messages that have multiple different codes fails because of the 255 character limit. So I have two questions: 1) can you think of a generally more efficient way of doing this process, or 2) can you think of a workaround to the VLOOKUP #VALUE error that occurs after the pivot table step?