Need to cleanse a sheet with 13k rows, help!

shivakanou

New Member
Joined
May 19, 2015
Messages
8
Hello everybody!

I'm new on this forum and ended up here searching for solutions to my problem.

I was assigned to cleanse an Excell sheet with 13k rows, all these rows are clients informations like name and address and the reason is due to the fact that our employees registered the clients with different informations (i.e Sa, S/A, SA instead of S.A. & Ltda, ltda, l.t.d.a instead of LTDA.), so now we need to standardize everything.

I've been looking some formulas, but so far I have no idea how to turn this process into something easier instead of dealing with every row of information (that would take quite some time).

I hope somebody here can help me :(

Thank you!
 

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
Welcome to the forum, you should look at using Find and Replace to assist with some of this. As you will have many records with different client information you should be able to change certain parts fairly quickly.
 
Upvote 0
Welcome to the forum, you should look at using Find and Replace to assist with some of this. As you will have many records with different client information you should be able to change certain parts fairly quickly.

Any suggestion on how to deal with the other records?
Like when people typped "uni." and "univer." instead of "university"?

Thanks!
 
Upvote 0
Try the sample below in a standard module. Change the word "Dictionary" in the code to the name of the sheet containing the word substitution list or name the sheet "Dictionary". Words, word fragments and/or phrases to be found should be listed in column A of the "Dictionary" sheet, words or phrases to be substituted should be in the adjacent cell of column B of the "Dictionary" sheet. Activate the sheet that requires the substitution(s) before running the code. Please try it on a backup copy first. It is generally best to list the words to be found (column A) from longest number of characters to shortest number of characters. Also include leading and/or trailing spaces in column A. Especially for word fragments.

You may be able to find a procedure on this forum that will split sentences and phrases into individual words that can be used in column A if you don't have that list already.

Hope it helps.

Gary

In a standard module:
Code:
Public Sub WordSwap()

Dim sFind As String
Dim sReplace As String
Dim oDictionary As Worksheet
Dim oWordList As Range
Dim oCell As Range
Dim oSearch As Range

Set oDictionary = ThisWorkbook.Worksheets("Dictionary") ' Change "Dictionary" to the name of the sheet containing word list
Set oWordList = oDictionary.Range("A1:A" & oDictionary.Range("A" & Rows.Count).End(xlUp).Row)

Set oSearch = ActiveSheet.UsedRange

For Each oCell In oWordList
    sFind = oCell.Text
    sReplace = oCell.Offset(0, 1).Text
    oSearch.Replace what:=sFind, replacement:=sReplace, Lookat:=xlPart, MatchCase:=False
Next oCell

'"Dictionary" Worksheet setup

'Column A       Column B
'unversity      university  < List longer words first
'univer.        university
'uni.           university
'uni            university  < needs trailing space to preclude 'Universityversity' etc
'Sa             S.A.
'S/A            S.A.
'SA             S.A.
'LTDA           LTDA
'LTDA           LTDA
'l.t.d.a        LTDA
        
'^ Find this    ^ Replaces with this

End Sub
 
Upvote 0
Try the sample below in a standard module. Change the word "Dictionary" in the code to the name of the sheet containing the word substitution list or name the sheet "Dictionary". Words, word fragments and/or phrases to be found should be listed in column A of the "Dictionary" sheet, words or phrases to be substituted should be in the adjacent cell of column B of the "Dictionary" sheet. Activate the sheet that requires the substitution(s) before running the code. Please try it on a backup copy first. It is generally best to list the words to be found (column A) from longest number of characters to shortest number of characters. Also include leading and/or trailing spaces in column A. Especially for word fragments.

You may be able to find a procedure on this forum that will split sentences and phrases into individual words that can be used in column A if you don't have that list already.

Hope it helps.

Gary

In a standard module:
Code:
Public Sub WordSwap()

Dim sFind As String
Dim sReplace As String
Dim oDictionary As Worksheet
Dim oWordList As Range
Dim oCell As Range
Dim oSearch As Range

Set oDictionary = ThisWorkbook.Worksheets("Dictionary") ' Change "Dictionary" to the name of the sheet containing word list
Set oWordList = oDictionary.Range("A1:A" & oDictionary.Range("A" & Rows.Count).End(xlUp).Row)

Set oSearch = ActiveSheet.UsedRange

For Each oCell In oWordList
    sFind = oCell.Text
    sReplace = oCell.Offset(0, 1).Text
    oSearch.Replace what:=sFind, replacement:=sReplace, Lookat:=xlPart, MatchCase:=False
Next oCell

'"Dictionary" Worksheet setup

'Column A       Column B
'unversity      university  < List longer words first
'univer.        university
'uni.           university
'uni            university  < needs trailing space to preclude 'Universityversity' etc
'Sa             S.A.
'S/A            S.A.
'SA             S.A.
'LTDA           LTDA
'LTDA           LTDA
'l.t.d.a        LTDA
        
'^ Find this    ^ Replaces with this

End Sub


Thank you! I will try this for sure!

One more thing, just to validade the exact word that I'm looking to replace.

I have some knowledge on DB's so I know that in SQL I can determine the exact word I'm looking for, like:
"%sa" if there can be anything before sa (sa% same, but after).

Is there any way to do it in excell?

Thank you!
 
Upvote 0
In the following line, "Lookat:=xlPart" causes it to look for the specified character string (in column A, this example) anywhere in the word, phrase, or sentence. You can change that to "Lookat:xlWhole" to operate on only full words(exact match for entire cell). MatchCase can also be set to true if needed.

Code:
oSearch.Replace what:=sFind, replacement:=sReplace, Lookat:=xlPart, MatchCase:=False
 
Upvote 0
The sample below will create a list (single instance) of every individual word (separated by one or more spaces) in your data. You can use it for the column A data of the previous procedure.

Code:
Public Sub Create_Word_List()

Dim oDict As Object ' Use above line to enable intellisense for dictionary object
Dim oTarget As Range
Dim oCell As Range
Dim sDelimiter As String
Dim vSplit As Variant
Dim vElement As Variant
Dim oSheet As Worksheet
Dim lLastRow As Long
Dim lLastCol As Long
Dim iMinLength As Integer

'Sheet from which the words are to be extracted must be active prior to execution

lLastRow = ActiveSheet.Cells.Find(what:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByRows).Row
lLastCol = ActiveSheet.Cells.Find(what:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByColumns).Column

Set oDict = CreateObject("scripting.dictionary")

iMinLength = 1 ' The shortest word length to process (number of characters)
sDelimiter = " " 'Split sentences and phrases at this character (blank space recommended, may be more than one character)

Set oTarget = ActiveSheet.Range("A1").Resize(lLastRow, lLastCol)

For Each oCell In oTarget

    vSplit = Split(oCell.Text, sDelimiter)
    
    For Each vElement In vSplit
    
        If Not oDict.Exists(vElement) Then
            oDict.Add vElement, 0
        End If
        
    Next vElement

Next oCell

On Error Resume Next

ThisWorkbook.Worksheets("Wordlist").Activate

If Err Then

    Set oSheet = ThisWorkbook.Worksheets.Add
    oSheet.Name = "Wordlist"
    Err.Clear
    
 Else
 
    Set oSheet = ThisWorkbook.Worksheets("Wordlist")

End If

On Error GoTo 0

oSheet.UsedRange.ClearContents

Set oCell = oSheet.Range("A1")

For Each vElement In oDict

      oCell.Value = vElement
      Set oCell = oCell.Offset(1, 0)

Next vElement

End Sub
 
Upvote 0
I have worked a large cleansing project and avoided such an auto replacement method. Something about paranoia and Murphy's Law, including the Fifth.
I think you should, for due diligence and accuracy, still 'touch' each and every record.
We also had an issue with companies with extremely similar names yet separate business entities.
We validate Business names and addresses. DUNS numbering and/or BvD numbering was key to cleansing the many large companies.
I very much like McMaster's solution to your request, but am concerned as to certain remaining issues for your project....
 
Upvote 0

Forum statistics

Threads
1,223,246
Messages
6,170,987
Members
452,373
Latest member
TimReeks

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top