VBA - removing duplicate values and/or letters within a cell

zookeepertx

Well-known Member
Joined
May 27, 2011
Messages
589
Office Version
  1. 365
Platform
  1. Windows
Hi there!
I have a spreadsheet with data in columns A-W. Column F contains 11 digit Purchase Order numbers; sometimes a number will be in a given cell more than once and I need to remove the duplicates.
Problem is, there are a few different scenarios:
Generally the dups will have another, 4 digit number either before or after the PO# with a "/" between the 4 digits and the 11 digits and that's the one I need to get rid of.​
Or, the dup will have "PO" before it and that's the one I need to get rid of.​
Or ,all that's in a cell is 1 or more PO#s, each of which has "PO" in front of it and I just need to get rid of each instance of "PO".​

For example:
ABCDEFGHIJ
10100603871/8129 10100603871
10100604443/2009 PO10100604443 10100604372/2009 10100604443 PO10100604372 10100604372
10100580054 10100584640
PO10100612531 PO10100595725 PO10100607852 PO10100604443
10100604160 10100615922 PO10100615922
10100615922 PO10100615922 10100615922/8111
10100606384 PO10100606384
In each of these cases, the things I've colored red need to be deleted, leaving only the digits that are black.

I've got a macro that does a lot of sorting, inserting columns and such - simple stuff, really - but I just need to add a bit in the middle to take care of these duplicates and letters.

I have no doubt somebody here can come up with a way to do this easily but I just can't see it. Anybody got a solution? (I'll continue to try and figure it out in the meantime, but am not having success so far)

Thanks!!
Jenny
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
Hi

Let's start with the codes with 11 digits followed by a slash followed by 4 digits

To get rid of the codes like 10100604372/2009 replace ???????????/???? with nothing

Either select the range and do the replace directly in the sheet ...
... or do the same in vba

Please comment.
 
Upvote 0
Hello,

Oooh, that's excellent! Works perfectly on the numbers formatted like that! So, then I used the same technique to get rid of the "PO" in the cells that those existed and that worked, too!
But then I remembered that once in a while the numbers are formatted with "4 digits and /" BEFORE the 11 digit number. I thought that should be easy and work the same way, so I created a cell with that formatting and tried it but that didn't turn out as expected. I started with the cell showing 10100583698 1234/10100583698. The code was set to look for ????/??????????? and replace with nothing. Well, THAT did happen, but, much to my surprise, the number remaining in the cell had an extra 0 in the middle of it - 101000583698. (It also wasn't formatted as a number any more, but that's an easy fix). If the number with the 4 digits in front comes FIRST in the cell - 1234/10100583698 10100583698 - then all is well and I end up with 10100583698 left in the cell!
It's some kind of magic, isn't it? LOL!

Anyway, your solution works perfectly in the majority of what's encountered. Any idea what weirdness is happening with the other cells?

Thanks!
Jenny
 
Upvote 0
Hi

With both the 4 digit numbers before and after the 11 digit numbers that simple replace won't work because the "?" does not represent a digit, it stands for any character.

This is a vba solution.
I've written the result in column G to the left of the cells processed just for testing.

Try:

VBA Code:
Sub DeletePOs()
Dim r As Range, rC As Range
Dim s As String

Set r = Range("F1", Range("F" & Rows.Count).End(xlUp))
With CreateObject("VBScript.RegExp")
    .Global = True
    For Each rC In r
        .Pattern = "\d{11}/\d{4}|\d{4}/\d{11}"
        s = .Replace(rC.Value, "")
        .Pattern = "(\d{11})(.+)(\1)"
        While .test(s)
            s = .Replace(s, "$1$2")
        Wend
        rC.Offset(, 1).Value = Application.Trim(Replace(s, "PO", ""))
    Next rC
End With

End Sub

I tested on this data:


Book1
FG
110100603871/8129 1010060387110100603871
210100604443/2009 PO10100604443 10100604372/2009 10100604443 PO10100604372 1010060437210100604443 10100604372
310100580054 1010058464010100580054 10100584640
4PO10100612531 PO10100595725 PO10100607852 PO1010060444310100612531 10100595725 10100607852 10100604443
510100604160 10100615922 PO1010061592210100604160 10100615922
610100615922 PO10100615922 10100615922/811110100615922
710100606384 PO1010060638410100606384
82009/10100604443 PO10100604443 2010/10100604372 10100604443 PO10100604372 1010060437210100604443 10100604372
92009/10100604443 PO10100604443 10100604372/2010 10100604443 PO10100604372 1010060437210100604443 10100604372
1010100604443/2009 PO10100604443 2010/10100604372 10100604443 PO10100604372 1010060437210100604443 10100604372
Sheet1
 
Upvote 0
Solution
Okay, WOW! That works perfectly!! But I have NO idea how it works, LOL! Any chance you'd have time to explain it to my poor, feeble mind? o_O

I have questions like:
1 - How did "s" get its value?​
2 - Is rC just the value of the cell being evaluated at that moment?​
3 - In the line s = .Replace(rC.Value, "") --- to ME it seems like that would replace the entire contents of the cell with nothing. Obviously that's not true, but I can't figure out why it does what it does.​
4 - Apparently .Pattern = "\d{11}/\d{4}|\d{4}/\d{11} tells it to find instances of the pattern of 11digits/4digits or 4digits/11 digits, but what does the "d" represent? And the curly brackets? And the backslashes?​
5 - Then, in the line .Pattern = "(\d{11})(.+)(\1)" well.... I just can't even come up with a theory ?...​
6 - Next comes s = .Replace(s, "$1$2") annnnnd I'm still lost
I'm guessing that last line - rC.Offset(, 1).Value = Application.Trim(Replace(s, "PO", "")) - is what tells it to put the result in the next column, right? Can it put the result back into column F, replacing what was in there to start with? If not, I can just insert a blank column for the result to go in and then delete the original column; that's no problem.​
I'm feeling SO stupid right now! I'm sorry to be such a pain. :oops:
Last question: Can I insert your code into the middle of my existing macro, and where would YOU put it? I'll put my current code below.​

VBA Code:
Sub Expeditors()
'JennyD10042018

With Application
    .ScreenUpdating = False
    .EnableEvents = False
    .Calculation = xlCalculationManual
    .DisplayAlerts = False
End With
Dim EA As Long, lr As Long

'Format entire sheet
Cells.Select
With Selection
    .HorizontalAlignment = xlLeft
    .VerticalAlignment = xlCenter
End With
With Selection.Font
    .Name = "Arial"
    .Size = 8
End With

'Format all columns with dates in them
Columns("H:V").Select
Selection.NumberFormat = "mm/dd/yy;@"
With Selection
    .HorizontalAlignment = xlCenter
    .ColumnWidth = 7
End With

'Find last row with data
lr = Range("B" & Rows.count).End(xlUp).Row

'Add thin borders to data sections
Range("A2:W" & lr).Select
With Selection
    .Borders.LineStyle = xlContinuous
    .Borders.Weight = xlThin
End With

'Color column O yellow
Range("O2:O" & lr).Select
With Selection.Interior
    .Pattern = xlSolid
    .PatternColorIndex = xlAutomatic
    .Color = 65535
End With

'Sort by A
Range("A2:W" & lr).Select
    Selection.Sort Key1:=Range("A1"), Order1:=xlAscending
   
'Insert 2 blank rows between Import invoices starting with 2 vs 7
For i = Cells(Rows.count, 1).End(xlUp).Row To 2 Step -1
    If Left(Cells(i, 1), 1) = 7 And Left(Cells(i, 1).Offset(-1), 1) = 2 Then
        With Cells(i, 1)
            .Resize(2).EntireRow.Insert Shift:=xlDown
                With .Offset(-2).Resize(2).EntireRow
                    .Clear
                    .Interior.Color = vbWhite
                End With
        End With
    End If
Next i

'Insert columns
Range("A:A,B:B,G:G").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromRight

'In what is now A, number rows in each section sequentially
With Range("A2")
    .Value = "1"
    .AutoFill Destination:=Range("A2").Resize(EA - 1), Type:=xlFillSeries
End With

With Range("A" & EA + 3)
    .Value = "1"
    .AutoFill Destination:=Range("A" & EA + 3).Resize((lr) - (EA + 2)), Type:=xlFillSeries
End With

'Set all column widths
Range("A:B,G:G").EntireColumn.AutoFit
Columns("C:D").ColumnWidth = 3.29
Columns("E:E").ColumnWidth = 4.17
Columns("F:F").ColumnWidth = 4.67
Columns("H:I").ColumnWidth = 19.86
Columns("J:J").ColumnWidth = 20
Columns("K:Y").ColumnWidth = 6.29

'Add headers to inserted columns
Range("C1") = Range("D1").Value
Range("I1").Value = Range("J1").Value

'Hide columns not needed right away
Range("E:G,L:M,O:S,V:W").Select
Selection.EntireColumn.Hidden = True

Cells.Select
Selection.RowHeight = 11.25

Range("A1").Select
ActiveWindow.ScrollRow = 1

With Application
    .ScreenUpdating = True
    .EnableEvents = True
    .Calculation = xlCalculationAutomatic
    .DisplayAlerts = True
End With

End Sub

Thank you again!

Jenny
 
Upvote 0
I'm guessing that last line - rC.Offset(, 1).Value = Application.Trim(Replace(s, "PO", "")) - is what tells it to put the result in the next column, right? Can it put the result back into column F, replacing what was in there to start with? If not, I can just insert a blank column for the result to go in and then delete the original column; that's no problem.
Last question: Can I insert your code into the middle of my existing macro, and where would YOU put it? I'll put my current code below.​

Hi

I'm glad it's working OK.

To put the result in column F just lose the .Offset()

VBA Code:
rC.Value = Application.Trim(Replace(s, "PO", ""))

The code I posted just changes the cells in column F, doesn't change anything else, so you can put it wherever you deem fit.

About the rest of the code, it uses Regular Expressions. If you want to learn something about them you can start here:


Note: regular expression were not developed for vba. They are used in most languages and, in fact, they were here before PC's even existed.
 
Upvote 0
(Sorry for the delayed reply. Month-end rolled around and they made me go do my "regular" work instead of making macros to help other people. Isn't that mean? Macros are much more interesting, LOL!)

Thank you so much for your help; this works perfectly! I even inserted it into my existing code and it worked exactly as needed! You're awesome! I'll be able to adapt this to a couple of other existing macros and give them a little boost, too.

Jenny
 
Upvote 0
Hi

I'm glad it's working OK.

To put the result in column F just lose the .Offset()

VBA Code:
rC.Value = Application.Trim(Replace(s, "PO", ""))

The code I posted just changes the cells in column F, doesn't change anything else, so you can put it wherever you deem fit.

About the rest of the code, it uses Regular Expressions. If you want to learn something about them you can start here:


Note: regular expression were not developed for vba. They are used in most languages and, in fact, they were here before PC's even existed.
Hi again PGC.

I've run into a problem on this subject that I didn't foresee. (Actually nobody told me about it till just the other day)

Sometimes the raw data numbers will have just 4 digits and a space in front of the 11 digit PO number but with no / between them.
And some may have NA and a space in front of the PO number.
And thirdly, some will have 2 letters a dash and then 4 numbers then a space before the PO number.
Lastly, some PO numbers may be followed by a dash and then 4 numbers.

In all of the above cases, I'd like to get rid of all of the bolded portions, leaving only the 11 digit PO number. (Too bad nobody told me this from the beginning.)
I tried starting out by adjusting the current code to the "4 digits and a space in front of the PO number" but failed at every attempt. I give up and am now back here begging for help again.

Below I'm putting the table from post #1 above, but I've replaced the data in the last 4 rows with the new scenarios. Below that, I'll put the code for this part as it stands now.

ABCDEFGHIJ
10100603871/8129 10100603871
10100604443/2009 PO10100604443 10100604372/2009 10100604443 PO10100604372 10100604372
PO10100612531 PO10100595725 PO10100607852 PO10100604443
9289 10100612579
NA 10100601761
DN-9289 10100612579
10100601761-7079

VBA Code:
Set r = Range("F2", Range("F" & Rows.count).End(xlUp))
With CreateObject("VBScript.RegExp")
    .Global = True
    For Each rC In r
        .Pattern = "\d{11}/\d{4}|\d{4}/\d{11}"
        s = .Replace(rC.Value, "")
        .Pattern = "(\d{11})(.+)(\1)"
        While .Test(s)
            s = .Replace(s, "$1$2")
        Wend
        rC.Value = Application.Trim(Replace(s, "PO", ""))
    Next rC
End With

I would really appreciate any help you could offer! (If they make any more changes to this report, I may just tell them to deal with it, LOL!)

(I'll be leaving here at 3:00 Central Time (almost 2 hrs from now) and will be back here to work on Monday, so I'll probably have to try any solutions then, but I'll definitely be back.

Jenny
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,334
Members
452,636
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top