Removing Duplicate Entries

hawaean

New Member
Joined
Aug 25, 2016
Messages
36
Hi Excel Gurus.

I'd like to remove duplicate timestamps in a subtitle file. In the show, the subtitle text will display the first timestamped text and ignore any other with the same timestamp. I'd like to "replace" the second time stamp along with 2 rows above it (a blank space and line number).

I figure the solution includes a search function for an anchor, like the number (although, the risk is there's a subtitle number that might be mistakenly ID'ed), or something unique, like the "-->". From there, it's just a brain twister for me.

It might be possible to use a row number counting function, such as: find X > check 2 rows up > delete X rows. As far as I can tell, these subtitles have duplicate timestamps because they split up multiple lines. However, I don't know for a fact that all the subtitles across the entire show follow this one-line-per-timestamp rule.

1) Yes, I know there are subtitle tools out there. They do not fix this.
2) No, I don't want to do this manually, considering would be hundreds of these to go through.
3) While I'm usually not a fan of VBA solutions, I'd be open to it in this case.
4) Yes, this is pretty trivial. But you'll be saving me from hours of headaches!

Here's a sample of the data:

1368​
00:31:44,820 --> 00:31:47,573​
SO YOU DON'T SEE ME WALK​
1369​
00:31:44,820 --> 00:31:47,573​
INTO THE KITCHEN WITH THEM.​
1370​
00:31:47,614 --> 00:31:49,616​
DID YOUR MOTHER MENTION​
1371​
00:31:47,614 --> 00:31:49,616​
IF THEY WERE COMING?​
1372​
00:31:49,658 --> 00:31:51,201​
SHE NEVER GOT BACK TO ME.​
1373​
00:31:51,243 --> 00:31:53,078​
UH, NO,​
1374​
00:31:51,243 --> 00:31:53,078​
SHE'S GOT A THING TONIGHT.​
1375​
00:31:53,119 --> 00:31:55,997​
TOO BAD. HEY, WE CAN​
1376​
00:31:53,119 --> 00:31:55,997​
SAVE HER A PIECE OF CAKE.​
1377​
00:31:56,039 --> 00:31:57,999​
THAT WOULD BE NICE.​
1378​
00:31:58,041 --> 00:32:03,463​
♪ ****, THIS BAR IS THUMPING,​
1379​
00:31:58,041 --> 00:32:03,463​
THIS BRAIN RELEASED NOW ♪​
1380​
00:32:03,505 --> 00:32:06,132​
♪ MY LITTLE FOOTPRINTS​
1381​
00:32:03,505 --> 00:32:06,132​
HANGING... ♪​
1382​
00:32:06,174 --> 00:32:10,470​
ATTENTION, GUESTS, THE BUFFET​
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
Try using Power Query
VBA Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Column1"}),
    Custom1 = Table.FromRows(List.Split(#"Replaced Value"[Column1],4)),
    #"Removed Duplicates" = Table.Distinct(Custom1, {"Column2"}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Duplicates", {}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"})
in
    #"Removed Columns"
Value
1368
00:31:44,820 --> 00:31:47,573
SO YOU DON'T SEE ME WALK
1370
00:31:47,614 --> 00:31:49,616
DID YOUR MOTHER MENTION
1372
00:31:49,658 --> 00:31:51,201
SHE NEVER GOT BACK TO ME.
1373
00:31:51,243 --> 00:31:53,078
UH, NO,
1375
00:31:53,119 --> 00:31:55,997
TOO BAD. HEY, WE CAN
1377
00:31:56,039 --> 00:31:57,999
THAT WOULD BE NICE.
1378
00:31:58,041 --> 00:32:03,463
♪ ****, THIS BAR IS THUMPING,
1380
00:32:03,505 --> 00:32:06,132
♪ MY LITTLE FOOTPRINTS
1382
00:32:06,174 --> 00:32:10,470
ATTENTION, GUESTS, THE BUFFET
 
Upvote 0
Try using Power Query
VBA Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Column1"}),
    Custom1 = Table.FromRows(List.Split(#"Replaced Value"[Column1],4)),
    #"Removed Duplicates" = Table.Distinct(Custom1, {"Column2"}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Duplicates", {}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"})
in
    #"Removed Columns"
Value
1368
00:31:44,820 --> 00:31:47,573
SO YOU DON'T SEE ME WALK
1370
00:31:47,614 --> 00:31:49,616
DID YOUR MOTHER MENTION
1372
00:31:49,658 --> 00:31:51,201
SHE NEVER GOT BACK TO ME.
1373
00:31:51,243 --> 00:31:53,078
UH, NO,
1375
00:31:53,119 --> 00:31:55,997
TOO BAD. HEY, WE CAN
1377
00:31:56,039 --> 00:31:57,999
THAT WOULD BE NICE.
1378
00:31:58,041 --> 00:32:03,463
♪ ****, THIS BAR IS THUMPING,
1380
00:32:03,505 --> 00:32:06,132
♪ MY LITTLE FOOTPRINTS
1382
00:32:06,174 --> 00:32:10,470
ATTENTION, GUESTS, THE BUFFET

Thanks for the idea. I've never used the Power Query before. It's not quite what I'm looking for, but close.

Maybe I should have provided an example from the start, but here's what it should look like:

IMPORT TEXT:

1375
00:31:53,119 --> 00:31:55,997
TOO BAD. HEY, WE CAN
1376
00:31:53,119 --> 00:31:55,997
SAVE HER A PIECE OF CAKE.


EXPECTED RESULT:

1375
00:31:53,119 --> 00:31:55,997
TOO BAD. HEY, WE CAN
SAVE HER A PIECE OF CAKE.

I don't know why the subs timestamp the second line. It should really be 1 timestamp and both lines. So, I would need the 2nd 'set' of duplicate timestamped text to be consolidated (ie. remove the space, number and second timestamp) for every occurance.
 
Upvote 0
@hawaean
Try this:
VBA Code:
Sub hawaean_1()
'00:31:44,820 --> 00:31:47,573?
Dim i As Long
Dim tx As String
Dim va

With Range("A1", Cells(Rows.Count, "A").End(xlUp))
    va = .Value
    tx = "ZZ_ZZ"
    For i = 1 To UBound(va, 1)
        
        If Left(va(i, 1), 12) = tx Then
           va(i, 1) = "#N/A"
           va(i - 1, 1) = "#N/A"
           If Len(va(i - 2, 1)) = 1 Then va(i - 2, 1) = "#N/A"
        End If
        
        If va(i, 1) Like "##:##:##,*" Then
            tx = Left(va(i, 1), 12)
        End If
    Next
    
    .Value = va
    .SpecialCells(xlConstants, xlErrors).EntireRow.Delete
End With

End Sub

Result:
Book1
A
11368
200:31:44,820 --> 00:31:47,573
3SO YOU DON'T SEE ME WALK
4INTO THE KITCHEN WITH THEM.
5
61370
700:31:47,614 --> 00:31:49,616
8DID YOUR MOTHER MENTION
9IF THEY WERE COMING?
10
111372
1200:31:49,658 --> 00:31:51,201
13SHE NEVER GOT BACK TO ME.
14
151373
1600:31:51,243 --> 00:31:53,078
17UH, NO,
18SHE'S GOT A THING TONIGHT.
19
201375
2100:31:53,119 --> 00:31:55,997
22TOO BAD. HEY, WE CAN
23SAVE HER A PIECE OF CAKE.
24
251377
2600:31:56,039 --> 00:31:57,999
27THAT WOULD BE NICE.
28
291378
3000:31:58,041 --> 00:32:03,463
31♪ ****, THIS BAR IS THUMPING,
32THIS BRAIN RELEASED NOW ♪
33
341380
3500:32:03,505 --> 00:32:06,132
36♪ MY LITTLE FOOTPRINTS
37HANGING... ♪
38
391382
4000:32:06,174 --> 00:32:10,470
41ATTENTION, GUESTS, THE BUFFET
Sheet1
 
Upvote 0
I believe it is faster code.
Data range starts from A2.
Result is given from B2. If you want the result replace the original data change this line
VBA Code:
Range("B2").Resize(UBound(A, 1), 1) = B
as
Range("A2").Resize(UBound(A, 1), 1) = B
Macro code
VBA Code:
Sub RemoveDuplicates()
Dim A
Dim Lr&, T&, Ro&
Ro = 1
Lr = Range("A" & Rows.Count).End(xlUp).Row
A = Range("A2:A" & Lr)
ReDim B(1 To UBound(A, 1), 1 To 1)
With CreateObject("scripting.dictionary")
For T = 1 To UBound(A, 1) - 2 Step 4
If Not .Exists(A(T + 1, 1)) Then
.Add (A(T + 1, 1)), 1
B(Ro, 1) = A(T, 1): B(Ro + 1, 1) = A(T + 1, 1): B(Ro + 2, 1) = A(T + 2, 1)      ': B(Ro + 3, 1) = A(T + 3, 1)
Ro = Ro + 4
End If
Next T

Range("B2").Resize(UBound(A, 1), 1) = B
End With
End Sub
 
Last edited:
Upvote 0
In the above code change this line
VBA Code:
B(Ro, 1) = A(T, 1): B(Ro + 1, 1) = A(T + 1, 1): B(Ro + 2, 1) = A(T + 2, 1)      ': B(Ro + 3, 1) = A(T + 3, 1)
as
VBA Code:
B(Ro, 1) = A(T, 1): B(Ro + 1, 1) = A(T + 1, 1): B(Ro + 2, 1) = A(T + 2, 1)
 
Upvote 0
Here's a formula:

Book1
AB
113681368
200:31:44,820 --> 00:31:47,57300:31:44,820 --> 00:31:47,573
3SO YOU DON'T SEE ME WALKSO YOU DON'T SEE ME WALK
4INTO THE KITCHEN WITH THEM.
51369
600:31:44,820 --> 00:31:47,5731370
7INTO THE KITCHEN WITH THEM.00:31:47,614 --> 00:31:49,616
8DID YOUR MOTHER MENTION
91370IF THEY WERE COMING?
1000:31:47,614 --> 00:31:49,616
11DID YOUR MOTHER MENTION1372
1200:31:49,658 --> 00:31:51,201
131371SHE NEVER GOT BACK TO ME.
1400:31:47,614 --> 00:31:49,616
15IF THEY WERE COMING?1373
1600:31:51,243 --> 00:31:53,078
171372UH, NO,
1800:31:49,658 --> 00:31:51,201SHE'S GOT A THING TONIGHT.
19SHE NEVER GOT BACK TO ME.
201375
21137300:31:53,119 --> 00:31:55,997
2200:31:51,243 --> 00:31:53,078TOO BAD. HEY, WE CAN
23UH, NO,SAVE HER A PIECE OF CAKE.
24
2513741377
2600:31:51,243 --> 00:31:53,07800:31:56,039 --> 00:31:57,999
27SHE'S GOT A THING TONIGHT.THAT WOULD BE NICE.
28
2913751378
3000:31:53,119 --> 00:31:55,99700:31:58,041 --> 00:32:03,463
31TOO BAD. HEY, WE CAN♪ ****, THIS BAR IS THUMPING,
32THIS BRAIN RELEASED NOW ♪
331376
3400:31:53,119 --> 00:31:55,9971380
35SAVE HER A PIECE OF CAKE.00:32:03,505 --> 00:32:06,132
36♪ MY LITTLE FOOTPRINTS
371377HANGING... ♪
3800:31:56,039 --> 00:31:57,999
39THAT WOULD BE NICE.1382
4000:32:06,174 --> 00:32:10,470
411378ATTENTION, GUESTS, THE BUFFET
4200:31:58,041 --> 00:32:03,463
43♪ ****, THIS BAR IS THUMPING,
44
451379
4600:31:58,041 --> 00:32:03,463
47THIS BRAIN RELEASED NOW ♪
48
491380
5000:32:03,505 --> 00:32:06,132
51♪ MY LITTLE FOOTPRINTS
52
531381
5400:32:03,505 --> 00:32:06,132
55HANGING... ♪
56
571382
5800:32:06,174 --> 00:32:10,470
59ATTENTION, GUESTS, THE BUFFET
Sheet4
Cell Formulas
RangeFormula
B1:B41B1=LET(a,A1:A59,s,SEQUENCE(ROWS(a)),b,IFS(s<4,1,1,SWITCH(MOD(s,4),0,IFERROR(INDEX(a,s-2)<>INDEX(a,s+2),0),1,INDEX(a,s-3)<>INDEX(a,s+1),2,INDEX(a,s)<>INDEX(a,s-4),3,1)),FILTER(a,b))
Dynamic array formulas.
 
Upvote 0
I believe it is faster code.
Data range starts from A2.
Result is given from B2. If you want the result replace the original data change this line
VBA Code:
Range("B2").Resize(UBound(A, 1), 1) = B
as
Range("A2").Resize(UBound(A, 1), 1) = B
Macro code
VBA Code:
Sub RemoveDuplicates()
Dim A
Dim Lr&, T&, Ro&
Ro = 1
Lr = Range("A" & Rows.Count).End(xlUp).Row
A = Range("A2:A" & Lr)
ReDim B(1 To UBound(A, 1), 1 To 1)
With CreateObject("scripting.dictionary")
For T = 1 To UBound(A, 1) - 2 Step 4
If Not .Exists(A(T + 1, 1)) Then
.Add (A(T + 1, 1)), 1
B(Ro, 1) = A(T, 1): B(Ro + 1, 1) = A(T + 1, 1): B(Ro + 2, 1) = A(T + 2, 1)      ': B(Ro + 3, 1) = A(T + 3, 1)
Ro = Ro + 4
End If
Next T

Range("B2").Resize(UBound(A, 1), 1) = B
End With
End Sub

Thanks for this solution as well. I haven't gotten it to work. The macro triggers a '429' runtime error: activeX component can't create object. Debugger highlights: With CreateObject("scripting.dictionary")
 
Upvote 0
@hawaean
Try this:
VBA Code:
Sub hawaean_1()
'00:31:44,820 --> 00:31:47,573?
Dim i As Long
Dim tx As String
Dim va

With Range("A1", Cells(Rows.Count, "A").End(xlUp))
    va = .Value
    tx = "ZZ_ZZ"
    For i = 1 To UBound(va, 1)
       
        If Left(va(i, 1), 12) = tx Then
           va(i, 1) = "#N/A"
           va(i - 1, 1) = "#N/A"
           If Len(va(i - 2, 1)) = 1 Then va(i - 2, 1) = "#N/A"
        End If
       
        If va(i, 1) Like "##:##:##,*" Then
            tx = Left(va(i, 1), 12)
        End If
    Next
   
    .Value = va
    .SpecialCells(xlConstants, xlErrors).EntireRow.Delete
End With

End Sub

Result:
Book1
A
11368
200:31:44,820 --> 00:31:47,573
3SO YOU DON'T SEE ME WALK
4INTO THE KITCHEN WITH THEM.
5
61370
700:31:47,614 --> 00:31:49,616
8DID YOUR MOTHER MENTION
9IF THEY WERE COMING?
10
111372
1200:31:49,658 --> 00:31:51,201
13SHE NEVER GOT BACK TO ME.
14
151373
1600:31:51,243 --> 00:31:53,078
17UH, NO,
18SHE'S GOT A THING TONIGHT.
19
201375
2100:31:53,119 --> 00:31:55,997
22TOO BAD. HEY, WE CAN
23SAVE HER A PIECE OF CAKE.
24
251377
2600:31:56,039 --> 00:31:57,999
27THAT WOULD BE NICE.
28
291378
3000:31:58,041 --> 00:32:03,463
31♪ ****, THIS BAR IS THUMPING,
32THIS BRAIN RELEASED NOW ♪
33
341380
3500:32:03,505 --> 00:32:06,132
36♪ MY LITTLE FOOTPRINTS
37HANGING... ♪
38
391382
4000:32:06,174 --> 00:32:10,470
41ATTENTION, GUESTS, THE BUFFET
Sheet1
I assume I need to change Sub hawaean_1() to something else, but it doesn't work with the table name, sheetname or filename. What do I need to put there?
 
Upvote 0
I assume I need to change Sub hawaean_1() to something else, but it doesn't work with the table name, sheetname or filename. What do I need to put there?
Just copy the subtitle file content and paste into a clean sheet starting at cell A1 and run the code.
but it doesn't work with the table name
Don't put it on a named table.

First, try copying your example in post #1 and paste it to a clean sheet and run the code, see the result.
 
Upvote 0

Forum statistics

Threads
1,223,882
Messages
6,175,166
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top