Macro to deduplicate based on A and delete?

cloobless · Apr 18, 2024

Hello -- I've been working on this since this morning (starting in a batch file, then resorting to AutoHotkey). I know it's simple, but...so am I.

I have a .csv file that is automatically generated throughout the day in the following format. It is a concatenation of a bunch of other files. When the file is created, I need to very quickly scrub it, and I think the fastest way to do this is using a macro, but I'm open to any suggestion that works.

What I would like to do is the following:

Delete all instances of the header (Music,Description,%Length (1 Track),Category Style: Code) -- there will be multiple instances in the data.
Delete all instances of the footer (Sorted by MusicMaker) -- will be multiple
Delete all duplicates using the first value (Music) as the comparison. So even if the other values are different, delete all but one instance of "BACH".
Collapse the list so that empty rows are deleted.

Any help offered is deeply appreciated. Thank you.

The initial data looks like this:
A

Music,Description,%Length (1 Track),Category Style: Code

BACH,"Classical Hits",70.679012,47.7678

LEDZEP,"Rock",14.666667,5912241.4000

RUSH,"Canandian Stuff",197.0538,224130.0920

Sorted by MusicMaker

I would like it to end up looking like this:

BACH

LEDZEP

RUSH

mumps · Apr 18, 2024

I think one problem you will have is how Excel will know that BACH, LEDZEP and RUSH all refer to music. You may need a list on another sheet, for example, of all the strings that refer to music so Excel can refer to that list.

cloobless · Apr 18, 2024

mumps said:
I think one problem you will have is how Excel will know that BACH, LEDZEP and RUSH all refer to music. You may need a list on another sheet, for example, of all the strings that refer to music so Excel can refer to that list.

"Music" is simply the first delimited value in any row. It's value1 in the delimited list.

Value1,Value2,Value3

Etc.
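
To make "value1" concrete, here is a minimal VBA sketch of pulling the first delimited value out of one line. `FirstField` is a hypothetical helper name, and the sketch assumes the first value is never quoted and never contains a comma, as in the sample data above.

```vba
' Returns the text before the first comma in a line.
' Assumption: the first value is unquoted and contains no commas.
Function FirstField(ByVal csvLine As String) As String
    If Len(csvLine) = 0 Then Exit Function   ' empty line -> empty result
    FirstField = Split(csvLine, ",")(0)
End Function
```

For the BACH row in the sample, this would return BACH regardless of what the other three values are.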

mumps · Apr 18, 2024

I’m not sure what you mean. Please clarify in detail using a few examples from your data.

rpaulson · Apr 18, 2024

Power Query is the way to go. NO VBA code required.
It will literally take 2 minutes to set up.

Filter the rows to remove "Music" in the header, blank cells, and "Sorted by MusicMaker" from the footer.
Then click on the column header and remove duplicates.

Going forward, you just refresh and you will have the new data.

cloobless · Apr 18, 2024

mumps said:
I’m not sure what you mean. Please clarify in detail using a few examples from your data.

The original post contains everything I know how to outline, so I'm not certain what is missing.

1. My .csv file has delimited data. I want to delete duplicate rows based on matching the first value. Only the first value matters for determining a duplicate.
BACH,value2,value3
BACH,value2,value3
----

2. I want to delete all instances of "Music,Description,%Length (1 Track),Category Style: Code" in the data. These are just repeated header rows from the previous files that were concatenated into this working file.

3. I want to delete all instances of "Sorted by MusicMaker". These are just a footer row that was similarly inserted during a prior process.

4. I want to delete blank rows.

What I want to be left with is a list, in column A, consisting of the deduplicated value1 data:
BACH
RUSH
FIONAAPPLE
BEETHOVEN
HENDRIX

cloobless · Apr 18, 2024

rpaulson said:
Power Query is the way to go. NO VBA code required.
It will literally take 2 minutes to set up.

Filter the rows to remove "Music" in the header, blank cells, and "Sorted by MusicMaker" from the footer.
Then click on the column header and remove duplicates.

Going forward, you just refresh and you will have the new data.

Thank you. Unfortunately, this is an older machine with Excel 2010 running on it, so I don't think that's an option.

mumps · Apr 18, 2024

Got it. I have to go out now. I’ll have a look at it tomorrow.

cloobless · Apr 18, 2024

mumps said:
Got it. I have to go out now. I’ll have a look at it tomorrow.

Thank you.

mumps · Apr 19, 2024

This macro assumes that each line of the file sits unsplit in a single cell in column A, with a header in cell A1 and your data starting in row 2.

VBA Code:

Sub DeleteDups()
    Dim lRow As Long, v As Variant, i As Long, dic As Object, key As String
    Application.ScreenUpdating = False
    ' Delete rows that are blank in column A; SpecialCells raises an error if there are none.
    On Error Resume Next
    Range("A:A").SpecialCells(xlBlanks).EntireRow.Delete
    On Error GoTo 0
    lRow = Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    ' Filter column A for the repeated header and footer lines and delete the matches below row 1.
    Range("A1:A" & lRow).AutoFilter Field:=1, Criteria1:= _
        "=Music,Description,%Length (1 Track),Category Style: Code", Operator:=xlOr, Criteria2:="=Sorted by MusicMaker"
    ActiveSheet.AutoFilter.Range.Offset(1).EntireRow.Delete
    Range("A1").AutoFilter
    lRow = Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    If lRow >= 2 Then
        ' Read from A1 so that v is a 2-D array even when only one data row remains.
        v = Range("A1:A" & lRow).Value
        Set dic = CreateObject("Scripting.Dictionary")
        ' Walk top-down so the keys keep the order of their first appearance.
        For i = 2 To UBound(v)
            key = Split(v(i, 1), ",")(0)
            If Not dic.exists(key) Then dic.Add key, Nothing
        Next i
        ' Replace the data rows with the deduplicated first values.
        Rows("2:" & lRow).ClearContents
        Range("A2").Resize(dic.Count) = Application.Transpose(dic.keys)
    End If
    Application.ScreenUpdating = True
End Sub
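
If the .csv is opened directly in Excel, the values may be split across columns rather than left as one line per cell, so a variation is to read the file as plain text and apply the same four steps there. The following is only a sketch under assumptions: the path in `CSV_PATH` is hypothetical, the names `UniqueFirstFields` and `ImportUniqueMusic` are made up for illustration, and the first value is assumed to be unquoted and free of commas.

```vba
' Returns the distinct first values of the given lines, in first-seen order.
' Blank lines, repeated header lines, and footer lines are skipped.
Function UniqueFirstFields(textLines As Variant) As Variant
    Const HEADER_LINE As String = "Music,Description,%Length (1 Track),Category Style: Code"
    Const FOOTER_LINE As String = "Sorted by MusicMaker"
    Dim dic As Object, i As Long, s As String
    Set dic = CreateObject("Scripting.Dictionary")
    For i = LBound(textLines) To UBound(textLines)
        s = Trim$(textLines(i))
        If Len(s) > 0 And s <> HEADER_LINE And s <> FOOTER_LINE Then
            s = Split(s, ",")(0)          ' keep only the first delimited value
            If Not dic.Exists(s) Then dic.Add s, Nothing
        End If
    Next i
    UniqueFirstFields = dic.Keys          ' zero-based array; empty if nothing was kept
End Function

' Reads the file as text and writes the cleaned list to column A of the active sheet.
Sub ImportUniqueMusic()
    Const CSV_PATH As String = "C:\Temp\music.csv"   ' hypothetical path, adjust as needed
    Dim f As Integer, content As String, keys As Variant
    Dim out() As Variant, i As Long, n As Long

    ' Read the whole file into one string.
    f = FreeFile
    Open CSV_PATH For Binary Access Read As #f
    content = Space$(LOF(f))
    Get #f, , content
    Close #f

    ' Normalise line endings, then split into lines.
    content = Replace(content, vbCrLf, vbLf)
    content = Replace(content, vbCr, vbLf)
    keys = UniqueFirstFields(Split(content, vbLf))

    ' Write the result as a single column starting at A1.
    Columns("A").ClearContents
    n = UBound(keys) + 1
    If n > 0 Then
        ReDim out(1 To n, 1 To 1)
        For i = 1 To n
            out(i, 1) = keys(i - 1)
        Next i
        Range("A1").Resize(n, 1).Value = out
    End If
End Sub
```

Because the list is built from the text itself, no header row is needed in A1, and the values come out in the order they first appear in the file. Note that this overwrites column A of whichever sheet is active, so it is best run on an empty sheet.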
