Diff 2 files to find new rows?

fishpatrol

New Member
Joined
Apr 20, 2004
Messages
25
I have two tab-delimited text files I'd like to compare. They're lists of products. The purpose of the comparison is to discover if any new products have been added to the list. Here are two samples from March 13th and 15th:

tta20140313.txt
Code:
775111882	2014-03-15 05:10:11	HD	10.99
604125993	2014-03-15 05:10:13	HD	19.99
555752861	2014-03-15 05:10:16	HD	14.99
365218396	2014-03-15 05:10:18	RD	19.99
334505972	2014-03-13 06:29:35	HD	34.99
273913609	2014-03-15 05:10:20	RD	19.99
385170051	2014-03-15 05:10:23	RD	8.99
357040155	2014-03-15 05:10:26	HD	19.99
723854213	2014-03-15 05:10:28	HD	23.99
256222865	2014-03-15 05:10:31	RD	19.99
499299827	2014-03-15 05:10:33	HD	19.99
366713030	2014-03-15 05:10:40	RD	29.99

tta20140315.txt
Code:
775111882	2014-03-15 05:10:11	HD	14.99
604125993	2014-03-15 05:10:13	HD	19.99
555752861	2014-03-15 05:10:16	HD	19.99
365218396	2014-03-15 05:10:18	RD	19.99
273913609	2014-03-15 05:10:20	RD	19.99
385170051	2014-03-15 05:10:23	RD	8.99
357040155	2014-03-15 05:10:26	HD	19.99
723854213	2014-03-15 05:10:28	HD	23.99
256222865	2014-03-15 05:10:31	RD	19.99
499299827	2014-03-15 05:10:33	HD	29.99
491001259	2014-03-15 05:10:36	RD	11.99
366713030	2014-03-15 05:10:40	RD	29.99

Between these two sample files, there are several differences. Several of the prices have changed: Not important. With the 8-digit IDs in Column A, there's one ID (334505972) that's present in the 0313 file that isn't in the 0315 file: Not important. There's also one ID (491001259) present in the 0315 file that isn't in the 0313 file: That's important! That ID in the newer file (by date) and the other data in its row is what I want to extract. I'd like the outcome of this comparison to be this:

Code:
491001259	2014-03-15 05:10:36	RD	11.99

The only wrinkle (that I'm aware of) is that the ID order is different from file to file. So a diff (or Compare.xla) will report lots of differences in the file, based on the order, that aren't of interest to me. I need a solution that disregards row order and finds IDs that are unique to the most recent file. Any ideas? Thanks
 

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
I did the following:

1) I copied your 0313 file into Sheet1 of my Excel (columns A to D)
2) I copied your 0315 file into Sheet2 of my Excel (columns A to D)
3) I added a formula in column E of Sheet2, to get the following:

Excel 2010
ABCDE
17751118822014-03-15 05:10HD14.99 
26041259932014-03-15 05:10HD19.99
35557528612014-03-15 05:10HD19.99
43652183962014-03-15 05:10RD19.99
52739136092014-03-15 05:10RD19.99
63851700512014-03-15 05:10RD8.99
73570401552014-03-15 05:10HD19.99
87238542132014-03-15 05:10HD23.99
92562228652014-03-15 05:10RD19.99
104992998272014-03-15 05:10HD29.99
114910012592014-03-15 05:10RD11.99New item
123667130302014-03-15 05:10RD29.99
Sheet2
Cell Formulas
RangeFormula
E1=IF(ISERROR(MATCH(A1,Sheet1!A:A,0)),"New item","")

4) I converted columns A to D of Sheet2 into a Table (it is enough to enter Ctrl-t and answer about the data location).
5) I used autofilter at the top of row D to get only the new rows. Ready!

Best regards,

J.Ty.
 
Upvote 0
Thanks J.Ty. This certainly does the job. What's difficult about this approach is the number of manual steps involved. I need to run this diff, oh, five days a week or so. :) I don't do it nearly that often now because I do a similar process with a COUNTIF formula: opening the files and pasting the formula down all the rows, auto filtering the results.

What if the formula work were done in a separate file? Let's say the text files are both opened as Sheet1 and Sheet2 of an Excel document and saved as "Diff.xls". Could you have a second Excel document, let's say it's named "Differ.xls". It has a similar formula to what you've written in Column A, rows 1:15000 (which is safely in range) that compares Sheet1 & Sheet2, and has auto filter set up on "New item." The Diff.xls file would have to be present/open first, but then I could open Differ.xls, reapply the filter and be done.

Does that sound like it would work? How would your formula be changed to accommodate this approach? Or I'm open to other strategies. The fewer steps, the more likely I'll do it daily instead of letting it slip. Thanks again.
 
Upvote 0
Download this spreadsheet:
http://www.mimuw.edu.pl/~jty/MrExcel/fishpatrol.xlsx

The tab names tell everything. Formulas are only in the last tab, copy them down for as many rows as necessary. Beware, the formulas in row 1 are different from the rest, so do not copy this row, but one of the lower ones.

Please let me know if it is OK now.

J.Ty.
 
Upvote 0
Hmm, I opened the fishpatrol.xlsx file and saw a column of zeros and ones, a column of #N/A, and several blank columns containing formulas. Not sure what to do with that. So I adapted your original formula and followed the idea I proposed about separate Diff/Differ Excel documents.

1. I imported both text files into a new Excel document, putting the earlier file in Sheet1 and the newer file in Sheet2. (I'm using a very simple Automator macro on the Mac do this.) I save the file to Diff.xlsx.

2. I created a new Excel document, saved as Differ.xlsx. I created header rows to match the import files: ID, Date, Code, Price.

3. Still in Differ.xlsx, I inserted this code into A2:
Code:
=IF(ISERROR(MATCH([Diff.xlsx]Sheet2!A2,[Diff.xlsx]Sheet1!$A$1:$A$15000,0)),[Diff.xlsx]Sheet2!A2,"Present")
This identifies already-present IDs as "Present" and new IDs by printing the ID number.

4. Still in Differ.xlsx, I inserted this code into B2:
Code:
=VLOOKUP($A2, [Diff.xlsx]Sheet2!$A$1:$B$15000, 2, FALSE)
If the result in A2 is an ID, this code fills in its date. Otherwise it returns #N/A. For columns C & D, use the same VLOOKUP formula, expanding the range and column number appropriately.

5. Still in Differ.xlsx, Autofilter column A for Does Not Equal "Present" AND Does Not Equal "0" (since the 1:15000 range is larger than the range of values). There's the list of new IDs with their data. Done!

Thanks!
 
Upvote 0
Sorry for not explaining. Columns A and B are help data, you can hide them. Then columns C,D,E,F contain the new records (i.e., thos epresent in 0315 but not in 0323).

J.Ty.
 
Upvote 0

Forum statistics

Threads
1,223,247
Messages
6,171,007
Members
452,374
Latest member
keccles

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top