Editing Data and Comparing lists

mesolomo

New Member
Joined
Jun 15, 2009
Messages
38
I'm currently working on sorting a column of data from a PDF file into separate columns to later compare to another sheet. I'm having a few problems getting the data into a workable format, and also don't know how to compare to the other file. These are the specific issues, if someone can help me!

Ultimately, I want the data in columns that look like this:
Name, City, State, Zip, $123.00, mm/dd/yy

Currently it looks like this:

A
1 Name
2 City State Zip
3 $123.00 mm/dd/yy

The sheets I'm working with are thousands of rows long, with these three rows of data repeating for every person. I have two problems with them, however:

1. I ultimately need each of those pieces of information into a seperate cell, in separate columns. I've been transposing the data then doing "Text to Columns" to get it from three separate rows in Column A to one row in Columns A, B, C etc.

The problem with this is the names aren't always uniform. Sometimes there's Mr. or Mrs. in front of them, sometimes they have a middle initial, or an "esq." behind them, which completely messes up the "Text to Columns." Is there a better way to break them up into separate cells?


2. Secondly, not all of the names are individuals- some are businesses or charities. I want to remove all of these (and the two lines of data that go with each of them), so I only have individual people. Do you have any idea of how to make excel know the difference between, for example, "Mr. John Smith" and "Smith and Sons Realty"?

3. Lastly Once I have all of this data spread out into the different columns
(Last Name, First Name, City, State, Zip, Date, Amount), I need to compare it to another list. Basically, I've got one list of data configured in a similar way, and I want to identify any repeats between the two lists. I know that excel has a way to delete repeats, but what I need is a way to delete everything BUT repeats. Any ideas?

Thank you for your help!
 
mesolomo******** type=text/javascript> vbmenu_register("postmenu_1971693", true); *********>

What I want to see is only "Names" and the results that you want from the sample in your post #15.
try this anyway
Code:
Sub test()
Dim a, b(), c(), i As Long, n As Long
a = Range("a1").CurrentRegion.Value
ReDim b(1 To UBound(a, 1) \ 3 + 1, 1 To 6)
ReDim c(1 To UBound(a, 1) \ 3 + 1, 1 To 4)
On Error Resume Next
For i = 1 To UBound(a, 1) Step 3
    n = n + 1
    If a(i, 1) Like "*.*,*" Then
        b(n, 1) =  Split(Trim(Split(a(i, 1), ".",2)(1)), ",")(0)
    Else
        b(n, 1) = a(i, 1)
    End If
    b(n, 2) = Split(a(i + 1, 1), ",")(0)
    b(n, 3) = Split(a(i + 2, 1))(1)
    b(n, 4) = Split(a(i + 2, 1))(0)
    b(n, 5) = Year(DateValue(b(n, 3))) & "-" & _
          Year(DateValue(b(n, 3))) + 1
    b(n, 6) = "Sestak"
    c(n, 1) = Split(b(n, 1))(0) : c(n, 2) = Split(b(b, 1))(1)
    c(n, 3) = a(i + 1, 1) : c(n, 4) = a(i + 2, 1)
Next
Range("c1").Resize(n, 4).Value = c
Sheets("sheet2").Cells(1).Resize(n, 6).Value = b
End Sub
 
Upvote 0
That macro had the "type mismatch again." This time when I ran it, it immediately went back to the VBA screen and highlighted the second "b" in the first row beginning with "c"

I wasn't clear from your last post- is there more information that you would like me to post to help you? Thanks!
 
Upvote 0
This is the line:

c(n, 1) = Split(b(n, 1))(0): c(n, 2) = Split(b(b, 1))(1)

The second b is what's highlighted.
 
Upvote 0
OOps
That line should be
Rich (BB code):
    c(n, 1) = Split(b(n, 1))(0) : c(n, 2) = Split(b(n, 1))(1)
 
Upvote 0
It moved the data to columns, which is good, but it splits the names in a way that loses the last names in some, and in some keeps it, but it's mixed up because then a row will have some first names and some last names, which doesn't really work. Here's the screenshot:

Excel Workbook
ABCDEFG
1Edgewater PartnershipEdgewaterPartnershipBaltimore, MD 212023118$2,500.00 10/24/2007
2Baltimore, MD 212023118GeneralRobertWashington, DC 200164450$2,500.00 10/24/2007
3$2,500.00 10/24/2007HonorableRobertWashington, DC 200062127$1,000.00 10/24/2007
4General Robert DiamondLawOfficesWashington, DC 200025913$500.00 10/24/2007
5Washington, DC 200164450MrLaurencePotomac, MD 208545441$500.00 10/24/2007
6$2,500.00 10/24/2007MrRalphWashington, DC 200055002$5,000.00 10/24/2007
7Honorable Robert Borski Jr.Mr.AndrewPhiladelphia, PA 191303163$2,000.00 10/23/2007
8Washington, DC 200062127Mr.CurtisFayetteville, GA 302151933$500.00 10/24/2007
9$1,000.00 10/24/2007Mr.DavidWashington, DC 200103044$2,500.00 10/24/2007
10Law Offices of Frederick H. GraefeMr.DavidPotomac, MD 208545424$3,400.00 10/23/2007
11Washington, DC 200025913Mr.DerekThousand Oaks, CA 913621102$500.00 10/24/2007
12$500.00 10/24/2007Mr.DonaldNewtown Square, PA 19073$1,000.00 10/25/2007
13Mr Laurence HarrisMr.FredPhiladelphia, PA 191035774$500.00 10/23/2007
14Potomac, MD 208545441Mr.HerbertSpring House, PA 19477$1,000.00 10/24/2007
15$500.00 10/24/2007Mr.JamesGibbsboro, NJ 080261219$1,000.00 10/23/2007
16Mr Ralph BazilioMr.JamieWashington, DC 200364410$2,500.00 10/24/2007
17Washington, DC 200055002Mr.JeffreyWashington, DC 200055002$5,000.00 10/24/2007
18$5,000.00 10/24/2007Mr.JohnPhiladelphia, PA 191024008$500.00 10/24/2007
19Mr. Andrew WigglesworthMr.LawrenceBowie, MD 207214213$5,000.00 10/24/2007
20Philadelphia, PA 191303163Mr.MichaelWashington, DC 200042436$2,000.00 10/24/2007
21$2,000.00 10/23/2007Mr.MichaelWashington, DC 200121449$5,000.00 10/24/2007
22Mr. Curtis JonesMr.MichaelWashington, DC 200095535$1,000.00 10/24/2007
23Fayetteville, GA 302151933Mr.PhilipBala Cynwyd, PA 190042640$1,000.00 10/23/2007
24$500.00 10/24/2007Mr.R.J.Jenkintown, PA 190463837$500.00 10/23/2007
25Mr. David MoreyMr.RandallPhiladelphia, PA 191514541$500.00 10/24/2007
26Washington, DC 200103044Mr.RobertPhiladelphia, PA 191295504$500.00 10/24/2007
27$2,500.00 10/24/2007Mr.RonaldHaverford, PA 190411607$1,000.00 10/25/2007
28Mr. David UrbanMr.ShawnAlexandria, VA 223023401$3,300.00 10/24/2007
Sheet1
 
Upvote 0
try
Code:
Sub test()
Dim a, b(), c(), i As Long, n As Long, x
a = Range("a1").CurrentRegion.Value
ReDim b(1 To UBound(a, 1) \ 3 + 1, 1 To 6)
ReDim c(1 To UBound(a, 1) \ 3 + 1, 1 To 4)
On Error Resume Next
With CreateObject("VBScript.RegExp")
    .Pattern = "((M(s|rs?|is)|jr)\.?(\s|$)|general |Honorable |(.+ of ))"
    .IgnoreCase = True
    .Global = True
    For i = 1 To UBound(a, 1) Step 3
        n = n + 1
        b(n, 1) = .replace(a(i, 1), "")
        b(n, 2) = Split(a(i + 1, 1), ",")(0) 
        b(n, 3) = Split(a(i + 2, 1))(1)
        b(n, 4) = Split(a(i + 2, 1))(0)
        b(n, 5) = Year(DateValue(b(n, 3))) & "-" & _
              Year(DateValue(b(n, 3))) + 1
        b(n, 6) = "Sestak"
        x = Split(b(n, 1), 2)
        c(n, 1) = x(0) : c(n, 2) = x(1)
        c(n, 3) = a(i + 1, 1) : c(n, 4) = a(i + 2, 1)
    Next
End With
Range("c1").Resize(n, 4).Value = c
Sheets("sheet2").Cells(1).Resize(n, 6).Value = b
End Sub
 
Upvote 0
For just reading, this is great, but I need to sort it against another list, looking to find all of the people who appear on this list and two others, so I really need to be able to isolate the last names. If it was down to just the first and last names in cell A, I could just split them, but how it is now, I don't really know anyway to separate the names so I can isolate the last names. Any ideas? Thanks!!!!
 
Upvote 0
Since your data in Name column has no logical separator...
My answer is Impossible.
 
Upvote 0
Hmm, that's really what I was afraid of. What you gave me gets me a most of the way there, though, and I really appreciate it! It looks like I'll have to go through and do the rest manually, but this will make it a lot easier. Thanks!
 
Upvote 0

Forum statistics

Threads
1,226,771
Messages
6,192,919
Members
453,767
Latest member
922aloose

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top