Raw data--cleansing macro?

tc88

Board Regular
Joined
Jul 6, 2011
Messages
80
Hello all-<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
This is my first post here, but I have been browsing for the past couple of weeks at my new job. I am an intern in the accounting/finance department at a publicly traded company here in the US and we are currently working on reducing waste in business processes. One of those processes revolves around exporting data from SAP into Excel, fixing some of the data, and then removing irrelevant data. I am using Excel 2010 on Windows XP SP 3.
The first step I have set up is to check if any of the data needs to be reformatted (it is missing a “0” in the middle of the number). I have the formula as follows, and it works correctly (Column M in dummy data below)
Code:
=IF(OR(LEFT(C2,6)="104001",LEFT(C2,6)="104003",LEFT(C2,6)="104004"),CONCATENATE("CTR ",LEFT(C2,2),"0",MID(C2,3,4)),D2)
<o:p></o:p>

From there, I use a nested IF based on the column for the above formula. That formula is as follows (Column N in dummy data below)
Code:
=IF(OR(M2="CTR 1004001",M2="CTR 1004003",M2="CTR 1004004"),M2,IF(AND(RIGHT(D2,5)>="12475",RIGHT(D2,5)<="12739"),D2,"IRRELEVANT"))
<o:p></o:p>

I have that column conditionally formatted to show “IRRELEVANT” with a red fill, and up to this point I have absolutely no problems and the formulas appear to work correctly. However, from there, I must manually delete all “IRRELEVANT” rows to get out the data for our specific facility. If this were a few rows it wouldn’t be a big deal but the raw data is ~2000 rows and my manually cleansed data is ~600 rows, so as you can see I am manually clearing out quite a bit of data.<o:p></o:p>
This is a process that is done monthly, and while I believe a macro is the best way to do this, I don’t have any experience writing macros/using VBA. I consider myself a strong Excel user, but I am not an expert.<o:p></o:p>

Below is some dummy data so you have a better idea of what I am talking about:

Excel Workbook
BCDEFGHIJKLMN
1Cost ElementNameAuxAcctAs1Name of offsetting accountOffst.acctOrderOffActCoCdVal.in rep.cur.Postg DateCreated onFormatting FixRelevant?
2Facilities10357E EN DUMMY DATA13967010009,956.0005/31/201106/02/2011IRRELEVANT
3Facilities104004-PRJ DUMMY DATA15629510002,645.0005/31/201106/01/2011CTR 1004004CTR 1004004
4Fringe10384 BD DUMMY DATA115746100021.5605/31/201106/02/2011IRRELEVANT
5Fringe10436 RD DUMMY DATA1157461000545.2105/31/201106/02/2011IRRELEVANT
6GeneralJCTE000204193CTR 12708A/P AMCL210036154710S10005.8905/04/201105/05/2011CTR 12708CTR 12708
7GeneralJCTE000204193CTR 1004003A/P AMCL210036156295S100016.6705/04/201105/05/2011CTR 1004003CTR 1004003
8GeneralJCTE000204193CTR 1004003A/P AMCL210036157010S10006.2805/04/201105/05/2011CTR 1004003CTR 1004003
9Other104003_DSTDUMMYDATA1564301000-2,215.6505/31/201106/02/2011CTR 1004003CTR 1004003
10Other12335_DST DISTRDUMMY DATA1569501000-312.4505/31/201106/02/2011IRRELEVANT
Sheet1
Excel 2010
Cell Formulas
RangeFormula
M2=IF(OR(LEFT(C2,6)="104001",LEFT(C2,6)="104003",LEFT(C2,6)="104004"),CONCATENATE("CTR ",LEFT(C2,2),"0",MID(C2,3,4)),D2)
N2=IF(OR(M2="CTR 1004001",M2="CTR 1004003",M2="CTR 1004004"),M2,IF(AND(RIGHT(D2,5)>="12475",RIGHT(D2,5)<="12739"),D2,"IRRELEVANT"))



Thanks in advance for any assistance!
<o:p></o:p>
 
Last edited:
I'm aware that the "Sheet1.Range("N2").Select" isn't the best practice, but for the life of me, I can't figure out how to apply conditional formatting reliably without selecting the first cell.

If anyone else is still following this thread and can chime in with a non-Sheet1.Range("N2").Select solution, I'd appreciate it for my own education.

You can avoid selecting the cell if you use R1C1 reference style for the formula.
 
Upvote 0

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
ARGH...that's what I thought. Thanks Andrew! I'll keep that bit in my back pocket for future reference.
 
Upvote 0
OK, try this:

Code:
Option Explicit
 
Sub Foo()
 
Dim lr As Long
Dim rng As Range
 
Application.ScreenUpdating = False
 
lr = Sheet1.Cells.Find(what:="*", searchorder:=xlByRows, searchdirection:=xlPrevious).Row
Sheet1.Range("M1").Value = "Formatting Fix"
Sheet1.Range("N1").Value = "Relevant"
Sheet1.Range("M2:M" & lr).Formula = "=IF(OR(LEFT(C2,6)=""104001"",LEFT(C2,6)=""104003"",LEFT(C2,6)=""104004""),CONCATENATE(""CTR "",LEFT(C2,2),""0"",MID(C2,3,4)),IF(OR(LEFT(C2,6)=""100404"",LEFT(C2,6)=""100403""),CONCATENATE(""CTR "",LEFT(C2,5),""0"",MID(C2,6,1)),D2))"
Sheet1.Range("N2:N" & lr).Formula = "=IF(OR(M2=""CTR 1004001"",M2=""CTR 1004003"",M2=""CTR 1004004""),M2,IF(OR(RIGHT(D2,7)=""1004001"",RIGHT(D2,7)=""1004003"",RIGHT(D2,7)=""1004004""),D2,IF(AND(RIGHT(D2,5)>=""12475"",RIGHT(D2,5)<=""12739""),D2,IF(OR(A2=""5050"",RIGHT(C2,7)=""charges""),""Check CTR"",""IRRELEVANT""))))"
 
 
Set rng = Sheet1.Range("B1:N" & lr)
 
With rng
    .AutoFilter Field:=13, Criteria1:= _
        "IRRELEVANT"
            On Error Resume Next
    .Offset(1, 0).Resize(.Rows.Count - 1, 1). _
            SpecialCells(xlCellTypeVisible).EntireRow.Delete
        On Error GoTo 0
 
    .AutoFilter
End With
lr = Sheet1.Cells.Find(what:="*", searchorder:=xlByRows, searchdirection:=xlPrevious).Row
 
Sheet1.Range("D2:D" & lr).Value = Sheet1.Range("M2:M" & lr).Value
 
Sheet1.Cells.FormatConditions.Delete
Sheet1.Range("N2").Select
Sheet1.Range("$N$2:$N$" & lr).FormatConditions.Add Type:=xlExpression, Formula1:= _
        "=IF($N2=""Check CTR"",TRUE)"
Sheet1.Range("$N2:$N" & lr).FormatConditions(Sheet1.Range("$N2:$N" & lr).FormatConditions.Count).SetFirstPriority
    With Sheet1.Range("$N2:$N" & lr).FormatConditions(1).Interior
        .PatternColorIndex = xlAutomatic
        .Color = 65535
        .TintAndShade = 0
    End With
Sheet1.Range("$N2:$N" & lr).FormatConditions(1).StopIfTrue = False
 
Application.ScreenUpdating = True
End Sub

I'm aware that the "Sheet1.Range("N2").Select" isn't the best practice, but for the life of me, I can't figure out how to apply conditional formatting reliably without selecting the first cell.

If anyone else is still following this thread and can chime in with a non-Sheet1.Range("N2").Select solution, I'd appreciate it for my own education.

That works! There are some nuances in the data that are just going to require some human data cleansing, but this will make the process MUCH faster!
 
Upvote 0
HURRAY!! Glad everything worked out. Good luck with the rest of your project and feel free to post back should you need anything else!
 
Upvote 0
HURRAY!! Glad everything worked out. Good luck with the rest of your project and feel free to post back should you need anything else!

Bringing this thread back for a quick question-

Each month, the raw data that this macro runs on is exported out of SAP into a temporary Excel file which is then saved as an appropriately named file. After it is cleansed (manually before this macro), the clean data is then copied into a master file.

Since I won't be here next month when they run this macro on their own, what's the best way to store this macro to make it simple and quick for them to use it?
 
Upvote 0
Typically I have an "Instructions" tab with a "Press to run code" type of button. Then I have a "raw data" where the end-user copies the data and an "Output" tab. A similar approach may work for you...
 
Upvote 0
Typically I have an "Instructions" tab with a "Press to run code" type of button. Then I have a "raw data" where the end-user copies the data and an "Output" tab. A similar approach may work for you...

Is it possible to modify the most recent macro to keep the raw data on its own sheet, and then create a new worksheet with the cleansed data?
 
Upvote 0
sure, not tested but this should work:

Code:
Option Explicit
 
Sub Foo()
 
Dim lr As Long
Dim rng As Range
 
Application.ScreenUpdating = False

Sheet2.Cells.Copy Destination:=Sheet1.Range("A1")

lr = Sheet1.Cells.Find(what:="*", searchorder:=xlByRows, searchdirection:=xlPrevious).Row
Sheet1.Range("M1").Value = "Formatting Fix"
Sheet1.Range("N1").Value = "Relevant"
Sheet1.Range("M2:M" & lr).Formula = "=IF(OR(LEFT(C2,6)=""104001"",LEFT(C2,6)=""104003"",LEFT(C2,6)=""104004""),CONCATENATE(""CTR "",LEFT(C2,2),""0"",MID(C2,3,4)),IF(OR(LEFT(C2,6)=""100404"",LEFT(C2,6)=""100403""),CONCATENATE(""CTR "",LEFT(C2,5),""0"",MID(C2,6,1)),D2))"
Sheet1.Range("N2:N" & lr).Formula = "=IF(OR(M2=""CTR 1004001"",M2=""CTR 1004003"",M2=""CTR 1004004""),M2,IF(OR(RIGHT(D2,7)=""1004001"",RIGHT(D2,7)=""1004003"",RIGHT(D2,7)=""1004004""),D2,IF(AND(RIGHT(D2,5)>=""12475"",RIGHT(D2,5)<=""12739""),D2,IF(OR(A2=""5050"",RIGHT(C2,7)=""charges""),""Check CTR"",""IRRELEVANT""))))"
 
 
Set rng = Sheet1.Range("B1:N" & lr)
 
With rng
    .AutoFilter Field:=13, Criteria1:= _
        "IRRELEVANT"
            On Error Resume Next
    .Offset(1, 0).Resize(.Rows.Count - 1, 1). _
            SpecialCells(xlCellTypeVisible).EntireRow.Delete
        On Error GoTo 0
 
    .AutoFilter
End With
lr = Sheet1.Cells.Find(what:="*", searchorder:=xlByRows, searchdirection:=xlPrevious).Row
 
Sheet1.Range("D2:D" & lr).Value = Sheet1.Range("M2:M" & lr).Value
 
Sheet1.Cells.FormatConditions.Delete
Sheet1.Range("N2").Select
Sheet1.Range("$N$2:$N$" & lr).FormatConditions.Add Type:=xlExpression, Formula1:= _
        "=IF($N2=""Check CTR"",TRUE)"
Sheet1.Range("$N2:$N" & lr).FormatConditions(Sheet1.Range("$N2:$N" & lr).FormatConditions.Count).SetFirstPriority
    With Sheet1.Range("$N2:$N" & lr).FormatConditions(1).Interior
        .PatternColorIndex = xlAutomatic
        .Color = 65535
        .TintAndShade = 0
    End With
Sheet1.Range("$N2:$N" & lr).FormatConditions(1).StopIfTrue = False
 
Application.ScreenUpdating = True
End Sub

This assumes that your raw data is being put in Sheet2 and Sheet1 being where you want the end product to be.
 
Upvote 0
sure, not tested but this should work:

Code:
Option Explicit
 
Sub Foo()
 
Dim lr As Long
Dim rng As Range
 
Application.ScreenUpdating = False
 
Sheet2.Cells.Copy Destination:=Sheet1.Range("A1")
 
lr = Sheet1.Cells.Find(what:="*", searchorder:=xlByRows, searchdirection:=xlPrevious).Row
Sheet1.Range("M1").Value = "Formatting Fix"
Sheet1.Range("N1").Value = "Relevant"
Sheet1.Range("M2:M" & lr).Formula = "=IF(OR(LEFT(C2,6)=""104001"",LEFT(C2,6)=""104003"",LEFT(C2,6)=""104004""),CONCATENATE(""CTR "",LEFT(C2,2),""0"",MID(C2,3,4)),IF(OR(LEFT(C2,6)=""100404"",LEFT(C2,6)=""100403""),CONCATENATE(""CTR "",LEFT(C2,5),""0"",MID(C2,6,1)),D2))"
Sheet1.Range("N2:N" & lr).Formula = "=IF(OR(M2=""CTR 1004001"",M2=""CTR 1004003"",M2=""CTR 1004004""),M2,IF(OR(RIGHT(D2,7)=""1004001"",RIGHT(D2,7)=""1004003"",RIGHT(D2,7)=""1004004""),D2,IF(AND(RIGHT(D2,5)>=""12475"",RIGHT(D2,5)<=""12739""),D2,IF(OR(A2=""5050"",RIGHT(C2,7)=""charges""),""Check CTR"",""IRRELEVANT""))))"
 
 
Set rng = Sheet1.Range("B1:N" & lr)
 
With rng
    .AutoFilter Field:=13, Criteria1:= _
        "IRRELEVANT"
            On Error Resume Next
    .Offset(1, 0).Resize(.Rows.Count - 1, 1). _
            SpecialCells(xlCellTypeVisible).EntireRow.Delete
        On Error GoTo 0
 
    .AutoFilter
End With
lr = Sheet1.Cells.Find(what:="*", searchorder:=xlByRows, searchdirection:=xlPrevious).Row
 
Sheet1.Range("D2:D" & lr).Value = Sheet1.Range("M2:M" & lr).Value
 
Sheet1.Cells.FormatConditions.Delete
Sheet1.Range("N2").Select
Sheet1.Range("$N$2:$N$" & lr).FormatConditions.Add Type:=xlExpression, Formula1:= _
        "=IF($N2=""Check CTR"",TRUE)"
Sheet1.Range("$N2:$N" & lr).FormatConditions(Sheet1.Range("$N2:$N" & lr).FormatConditions.Count).SetFirstPriority
    With Sheet1.Range("$N2:$N" & lr).FormatConditions(1).Interior
        .PatternColorIndex = xlAutomatic
        .Color = 65535
        .TintAndShade = 0
    End With
Sheet1.Range("$N2:$N" & lr).FormatConditions(1).StopIfTrue = False
 
Application.ScreenUpdating = True
End Sub

This assumes that your raw data is being put in Sheet2 and Sheet1 being where you want the end product to be.

What sheet should this macro be placed under in VBA?

Thanks again for all your help, I'm just lost when it comes to VBA.
 
Upvote 0
No problem. I wouldn't put it in a sheet, I'd put it in a module.

And the code you'd put on the button click event would be:

Call Foo
 
Upvote 0

Forum statistics

Threads
1,225,155
Messages
6,183,215
Members
453,151
Latest member
Lizamaison

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top