Hi everyone, hope you are safe and healthy!!
I would like to hear ideas and/or code for finding matches and writing them out. I have a column with hundreds to thousands of rows of data, and I would like to find the closest matches within that same column in order to normalize the list.
finding matches.xlsx (sheet "Start")

| | L |
|---|---|
| 2 | Original |
| 3 | John Doe |
| 4 | John F Doe |
| 5 | John Frank Doe Robbins |
| 6 | John Frand Doe |
| 7 | John D Doe R |
| 8 | John F. Doe |
| 9 | John Doe |
| 10 | Doe, John |
Desired output:
finding matches.xlsx (sheet "Start")

| | L | M | N | O |
|---|---|---|---|---|
| 2 | Original | Proposed 1 | Proposed 2 | Proposed 3 |
| 3 | John Doe | John Doe | John F. Doe | John Frank Doe Robbins |
| 4 | John F Doe | John Doe | John F. Doe | John Frank Doe Robbins |
| 5 | John Frank Doe Robbins | John Doe | John F. Doe | John Frank Doe Robbins |
| 6 | John Frand Doe | John Doe | John F. Doe | John Frank Doe Robbins |
| 7 | John D Doe R | John Doe | John F. Doe | John Frank Doe Robbins |
| 8 | John F. Doe | John Doe | John F. Doe | John Frank Doe Robbins |
| 9 | John Doe | John Doe | John F. Doe | John Frank Doe Robbins |
| 10 | Doe, John | John Doe | John F. Doe | John Frank Doe Robbins |
To consider:
- The example above shows only related strings; the real column has hundreds to thousands of values in random order
- Most cells contain more than one word, and the text can include:
- "," (commas)
- "." (periods)
- " " (single spaces)
- "  " (multiple spaces or other whitespace)
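One way to deal with the points above, and with the "closest matches within the same column" part of the question, is to compare normalized keys instead of the raw cell text. Below is a minimal Python sketch, assuming that case, commas, periods, spacing and word order should all be ignored when comparing. `match_key` and `closest_matches` are hypothetical helper names; the fuzzy scoring comes from Python's standard-library `difflib`, not from any Excel feature.

```python
import re
from difflib import get_close_matches

def match_key(value: str) -> str:
    # Ignore case, treat commas, periods and any run of whitespace as separators,
    # and sort the words so "Doe, John" and "John Doe" produce the same key.
    words = re.findall(r"[^\s,.]+", value.casefold())
    return " ".join(sorted(words))

def closest_matches(values: list[str], n: int = 3, cutoff: float = 0.6) -> dict[str, list[str]]:
    # Remember one original spelling per key (the first one seen in the column).
    spelling: dict[str, str] = {}
    for v in values:
        spelling.setdefault(match_key(v), v)
    keys = list(spelling)

    result: dict[str, list[str]] = {}
    for v in values:
        k = match_key(v)
        # Compare against every other key; values with an identical key are
        # already treated as the same name, so they are not listed.
        others = [o for o in keys if o != k]
        hits = get_close_matches(k, others, n=n, cutoff=cutoff)
        result[v] = [spelling[h] for h in hits]  # best match first
    return result
```

Each original value maps to up to `n` other spellings, best match first, so the lists will differ from row to row and will not reproduce the exact Proposed columns above. Every row is compared with every distinct key, so the cost grows quadratically with the number of distinct values; `cutoff` (0 to 1) controls how similar two keys must be before they are reported.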
It might be easier to produce the following output instead, keeping only the shortest and the longest match:
finding matches.xlsx (sheet "Start")

| | L | M | O |
|---|---|---|---|
| 2 | Original | Proposed 1 | Proposed 3 |
| 3 | John Doe | John Doe | John Frank Doe Robbins |
| 4 | John F Doe | John Doe | John Frank Doe Robbins |
| 5 | John Frank Doe Robbins | John Doe | John Frank Doe Robbins |
| 6 | John Frand Doe | John Doe | John Frank Doe Robbins |
| 7 | John D Doe R | John Doe | John Frank Doe Robbins |
| 8 | John F. Doe | John Doe | John Frank Doe Robbins |
| 9 | John Doe | John Doe | John Frank Doe Robbins |
| 10 | Doe, John | John Doe | John Frank Doe Robbins |
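For the shortest/longest version, a minimal sketch (Python, standard library only) could first group values that share most of their words, then take the shortest and the longest original spelling in each group as Proposed 1 and Proposed 3. The grouping rule is an assumption, not something stated in the question: two values are linked when the number of shared words divided by the word count of the shorter value (the overlap coefficient) is at least `threshold`, and links are chained (single linkage). `words`, `overlap`, `group_similar` and `propose_short_long` are hypothetical names.

```python
import re
from itertools import combinations

def words(value: str) -> set[str]:
    # Case-insensitive set of words; commas, periods and whitespace are separators.
    return set(re.findall(r"[^\s,.]+", value.casefold()))

def overlap(a: set[str], b: set[str]) -> float:
    # Overlap coefficient: shared words divided by the size of the smaller set.
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def group_similar(values: list[str], threshold: float = 0.6) -> list[list[str]]:
    # Union-find: every pair scoring at or above the threshold ends up in one group.
    parent = list(range(len(values)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    word_sets = [words(v) for v in values]
    for i, j in combinations(range(len(values)), 2):
        if overlap(word_sets[i], word_sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())

def propose_short_long(values: list[str], threshold: float = 0.6) -> dict[str, tuple[str, str]]:
    # For each original value: (shortest, longest) spelling in its group.
    # Length is measured on the raw text; ties are broken by comparing the text itself.
    proposals: dict[str, tuple[str, str]] = {}
    for group in group_similar(values, threshold):
        shortest = min(group, key=lambda s: (len(s), s))
        longest = max(group, key=lambda s: (len(s), s))
        for v in group:
            proposals[v] = (shortest, longest)
    return proposals

if __name__ == "__main__":
    column = ["John Doe", "Doe, John", "John Frank Doe Robbins", "Jane Smith"]
    print(propose_short_long(column)["Doe, John"])  # ('John Doe', 'John Frank Doe Robbins')
```

With the sample rows above, every value lands in a single group, so each row gets "John Doe" and "John Frank Doe Robbins". Chaining is the weak spot: a one-word cell such as "John" would link to every value containing "John", so on the real column the threshold (or a minimum word count) would probably need tuning.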
Thank you, and have a great and safe weekend!!!