Editing Data and Comparing lists

mesolomo

New Member
Joined
Jun 15, 2009
Messages
38
I'm currently working on sorting a column of data from a PDF file into separate columns to later compare to another sheet. I'm having a few problems getting the data into a workable format, and also don't know how to compare to the other file. These are the specific issues, if someone can help me!

Ultimately, I want the data in columns that look like this:
Name, City, State, Zip, $123.00, mm/dd/yy

Currently it looks like this:

A
1 Name
2 City State Zip
3 $123.00 mm/dd/yy

The sheets I'm working with are thousands of rows long, with these three rows of data repeating for every person. I have two problems with them, however:

1. I ultimately need each of those pieces of information into a seperate cell, in separate columns. I've been transposing the data then doing "Text to Columns" to get it from three separate rows in Column A to one row in Columns A, B, C etc.

The problem with this is the names aren't always uniform. Sometimes there's Mr. or Mrs. in front of them, sometimes they have a middle initial, or an "esq." behind them, which completely messes up the "Text to Columns." Is there a better way to break them up into separate cells?


2. Secondly, not all of the names are individuals- some are businesses or charities. I want to remove all of these (and the two lines of data that go with each of them), so I only have individual people. Do you have any idea of how to make excel know the difference between, for example, "Mr. John Smith" and "Smith and Sons Realty"?

3. Lastly Once I have all of this data spread out into the different columns
(Last Name, First Name, City, State, Zip, Date, Amount), I need to compare it to another list. Basically, I've got one list of data configured in a similar way, and I want to identify any repeats between the two lists. I know that excel has a way to delete repeats, but what I need is a way to delete everything BUT repeats. Any ideas?

Thank you for your help!
 
Hi jacophile,

yes, i read your post, but I'm not sure I understand it. What should I be special pasting? Do you mean part of the data, or the macro?

Let me know and thanks for your help!!!
 
Upvote 0
Actually my suggestion was to do it without a macro. When you paste from the clipboard into excel with data from a PDF document using Paste (from the menu), you get the format problems you describe. But if you use Paste Special (Edit\Paste Special) and select the third option which is Text, you actually get a format like you require.

Sorry to be a kill joy but I thought you may not need a macro and if you did you could just record the process if you need to automate it.
 
Upvote 0
Maybe it's something about the PDF I'm using, but unfortunately it formats the same way. Thanks for the suggestion though!
 
Upvote 0
Seiya and Hiker: I may have had an idea. I think maybe I've gotten too complex with this, and don't actually need the names formatted correctly.

First, let me explain what I'm ultimately doing, then I'll explain my idea.

Basically, I have 5 sheets of data formatted like this (we'll call these "A Lists"):
Excel Workbook
ABCDEFG
448ASPLEN, CHRISCHALFONT3/31/20072300Murphy2007-2008Individual
449ATKINSON, MICHAELPRINCETON6/23/20082300Murphy2007-2008Individual
450AUBREY, JENNIFERCHICAGO6/23/2008500Murphy2007-2008Individual
451AUGUST, JERALD DAVIDWEST PALM BEACH6/30/20081000Murphy2007-2008Individual
452AUSTIN, REGINAPHILADELPHIA11/3/2008300Murphy2007-2008Individual
453AUSTIN, REGINAPHILADELPHIA11/3/2008300Murphy2007-2008Individual
454AYARS, ROBERTMEDIA9/25/2008250Murphy2007-2008Individual
455BADER, DAVID M.ERWINNA11/29/2007300Murphy2007-2008Individual
456BADER, DAVID M.ERWINNA9/25/2008300Murphy2007-2008Individual
457BAER, ANDREWPHILADELPHIA10/17/20081000Murphy2007-2008Individual
458BAER, ANDREWPHILADELPHIA10/17/20081000Murphy2007-2008Individual
459BAICKER, MARKCARVERSVILLE9/9/2008500Murphy2007-2008Individual
460BAIERLEIN, DANIELEWASHINGTON6/30/2007250Murphy2007-2008Individual
461BAKER, CAMERONBELVEDERE TIBURON4/2/2008500Murphy2007-2008Individual
462BAKER, KENAMBLER8/6/2008250Murphy2007-2008Individual
Murphy


Then, I have 5 sheets of data formatted like the ones you've been working on, we'll call them "B Lists". They're the ones that currently look like this:

Excel Workbook
A
1Edgewater Partnership
2Baltimore, MD 212023118
3$2,500.00 10/24/2007
4General Robert Diamond
5Washington, DC 200164450
6$2,500.00 10/24/2007
7Honorable Robert Borski Jr.
8Washington, DC 200062127
9$1,000.00 10/24/2007
10Law Offices of Frederick H. Graefe
11Washington, DC 200025913
12$500.00 10/24/2007
13Mr Laurence Harris
14Potomac, MD 208545441
15$500.00 10/24/2007
16Mr Ralph Bazilio
17Washington, DC 200055002
18$5,000.00 10/24/2007
19Mr. Andrew Wigglesworth
20Philadelphia, PA 191303163
21$2,000.00 10/23/2007
22Mr. Curtis Jones
23Fayetteville, GA 302151933
24$500.00 10/24/2007
25Mr. David Morey
26Washington, DC 200103044
27$2,500.00 10/24/2007
unsorted 07



What I ultimately want is a list of all of the people who appear at least once on the B lists and at least twice on the A Lists (either on two different lists, or twice on the same list).

It seems to me, that rather than having the names formatted the same way and comparing them like that, couldn't we set up some sort of program to take the correctly formatted names on the A list, compare them to the B list, and pull any that appear once on the B lists and twice on the A list. For example: If "John Smith" appears twice on the A lists, and then is compared to the B lists, it would look for the name anywhere in column A, and therefore recognize specifically the "John Smith" in "Mr. John Smith, Attorney at Law" (and it would simply ignore all of the companies and organizations that I don't want). Then it would simply keep the names on the A list (which are the ones formatted well) that appear on the B list, and delete the others, thus giving me the master list I need, which is simply all the names on the A list that are there twice that are also on the B list.

Is this possible? I really hate to ask more from you both, since you've both done so much, but it just occurred to me that this actually gets around the name formatting issue and probably is a faster way than what I was planning to do to compare them, so I'd love to know if it would work.

Let me know if you have any ideas, and let me know if you need any more information or want me to explain my idea or what I need any more.

Thanks!!!

And I have
 
Upvote 0
You have

1) 5 Sheets of ListA/B each.
2) compare what is appeared twice in ListA with ListB


What do you want when a look alike found ?
 
Upvote 0
Basically, I want to weed out any data sets where the name does not appear twice on any of the A lists and once on a B list. Just to explain it a bit more, these are donor lists for local political candidates, and I'm trying to sort it to find who has given to a combination of candidates.


If you notice, each of the well-formatted sheets I've posted have one name that runs all the way down one of the latter columns (such as: Sestak, Nutter, Murphy, etc.) These are the names of the candidate that each list is for, so each of these names represents a different list.

My ultimate master list should have in each row: the name from Column A, the city, and the number of times they appear on the Sestak list, with the dollar value for each appearance (and ideally the dates for each one as well), the number of times they appear on the Nutter list, the dollar amounts and dates with each one, and so on for each of the 10 candidates I'm sorting, so that I can look at each donor name, see where they're from, who they gave to, when they gave, and how much they gave each time.

This is sort of the format I'm envisioning (I just made up the numbers so you could see what I'm talking about):
Excel Workbook
ABCDEFGHIJKLMNOPQRSTUV
1Sestak- AMurphy-AFattah (cong.)-ASchwartz-ABrady (cong)-ANutter-BKnox-BBrady (may)-BFattah (may)-BEvans-B
2ASPLEN, CHRISCHALFONT$500,$2507/5/07, 10/4/07$3005/4/2008$300, $2004/6/07, 5/2/08$3005/3/2007
3ATKINSON, MICHAELPRINCETON$200, $400, $10010/3/06, 2/4/07, 7/4/07$4003/4/2008
4AUBREY, JENNIFERCHICAGO$3005/3/2007$2509/10/2007$1,0003/3/2008
5AUGUST, JERALD DAVIDWEST PALM BEACH$200, $2001/5/07, 9/4/07$2006/4/2007$5008/9/2008
6AUSTIN, REGINAPHILADELPHIA$4008/3/2007$3006/3/2007$3005/7/2007$3006/4/2008
7AUSTIN, REGINAPHILADELPHIA$200, $2004/5/07, 2/3/08$7003/2/2007
8
Sheet1


I don't really care how it's formatted, just that it presents that information. And, of course, the only people on this list are those who gave at least twice to a list A candidate and at least once to a list B candidate (I just also want to be able to see the details of the gift, as I have it displayed).

The names across the top of the table I just posted are the names of the candidates (with Brady and Fattah having each run for two different positions, therefore they're on there twice). Each list I have is all the donors for each of these people. The letter next to their name in the top row indicates if they're A list or B list.

Any ideas?

Thanks for your help!!!
 
Upvote 0
I still don't understand....

Do you want to change the appearance the data from ListA to the one you posted last ?

When dups found, combine them ?
 
Upvote 0
First of all, I see I made a mistake in my example chart: 'Regina Austin' should only appear once- that's the whole point of the sort I'm talking about.

What I need is a way to sort the lists, then change their format. So, the first step is to sort all 5 list A's against themselves and eachother, so the only names that remain are the ones that appear at least twice.

The second step is taking the sorted names and sorting them again against all of the B lists, so that the only names that remain are the ones that appeared at least twice on the A lists.

Once the names are sorted, so the only ones that remain are the ones that appear multiple times, I'm looking for it to display in a way that i can read a list of people who are multiple donors, and see who they gave to and when. It could be a new list, or a reformatting of list A, it doesn't matter.

So, part of this is combining duplicates, but before the duplicates are combined taking the information from each time they appear and condensing it into one line.

So, in the sample I posted, the first name, 'Chris Asplen" would have appeared:

1. twice on the Sestak list, 2. once on the Fattah (cong) list, 3. twice on the nutter list, and 4. twice on the Fattah (may) list.

But, instead of showing all of those appearances one beneath the other, the sorted list will display them in one row, simply marking where they appeared and how much they donated.

One aspect of this is that the macro would need to distinguish between each of the 5 list A's and the 5 list B's. They could be processed separately, or all of the List A's could be combined and all of the list B's combined, with the program differentiating the name that's listed in column E (the candidates name)

Hopefully I explained it better this time. I can envision this, and I'm pretty sure it's possible, but I'm not able to explain it in any tech terms.

Thanks for your help!!
 
Upvote 0

Forum statistics

Threads
1,226,771
Messages
6,192,918
Members
453,766
Latest member
Gskier

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top