Diff 2 files to find new rows?

fishpatrol · Mar 17, 2014

I have two tab-delimited text files I'd like to compare. They're lists of products. The purpose of the comparison is to discover if any new products have been added to the list. Here are two samples from March 13th and 15th:

tta20140313.txt

Code:

775111882	2014-03-15 05:10:11	HD	10.99
604125993	2014-03-15 05:10:13	HD	19.99
555752861	2014-03-15 05:10:16	HD	14.99
365218396	2014-03-15 05:10:18	RD	19.99
334505972	2014-03-13 06:29:35	HD	34.99
273913609	2014-03-15 05:10:20	RD	19.99
385170051	2014-03-15 05:10:23	RD	8.99
357040155	2014-03-15 05:10:26	HD	19.99
723854213	2014-03-15 05:10:28	HD	23.99
256222865	2014-03-15 05:10:31	RD	19.99
499299827	2014-03-15 05:10:33	HD	19.99
366713030	2014-03-15 05:10:40	RD	29.99

tta20140315.txt

Code:

775111882	2014-03-15 05:10:11	HD	14.99
604125993	2014-03-15 05:10:13	HD	19.99
555752861	2014-03-15 05:10:16	HD	19.99
365218396	2014-03-15 05:10:18	RD	19.99
273913609	2014-03-15 05:10:20	RD	19.99
385170051	2014-03-15 05:10:23	RD	8.99
357040155	2014-03-15 05:10:26	HD	19.99
723854213	2014-03-15 05:10:28	HD	23.99
256222865	2014-03-15 05:10:31	RD	19.99
499299827	2014-03-15 05:10:33	HD	29.99
491001259	2014-03-15 05:10:36	RD	11.99
366713030	2014-03-15 05:10:40	RD	29.99

Between these two sample files, there are several differences. Several of the prices have changed: Not important. With the 8-digit IDs in Column A, there's one ID (334505972) that's present in the 0313 file that isn't in the 0315 file: Not important. There's also one ID (491001259) present in the 0315 file that isn't in the 0313 file: That's important! That ID in the newer file (by date) and the other data in its row is what I want to extract. I'd like the outcome of this comparison to be this:

Code:

491001259	2014-03-15 05:10:36	RD	11.99

The only wrinkle (that I'm aware of) is that the ID order is different from file to file. So a diff (or Compare.xla) will report lots of differences in the file, based on the order, that aren't of interest to me. I need a solution that disregards row order and finds IDs that are unique to the most recent file. Any ideas? Thanks

J.Ty. · Mar 17, 2014

I did the following:

1) I copied your 0313 file into Sheet1 of my Excel (columns A to D)
2) I copied your 0315 file into Sheet2 of my Excel (columns A to D)
3) I added a formula in column E of Sheet2, to get the following:

Excel 2010

A

B

C

D

E

1

775111882

2014-03-15 05:10

HD

14.99

2

604125993

2014-03-15 05:10

HD

19.99

3

555752861

2014-03-15 05:10

HD

19.99

4

365218396

2014-03-15 05:10

RD

19.99

5

273913609

2014-03-15 05:10

RD

19.99

6

385170051

2014-03-15 05:10

RD

8.99

7

357040155

2014-03-15 05:10

HD

19.99

8

723854213

2014-03-15 05:10

HD

23.99

9

256222865

2014-03-15 05:10

RD

19.99

10

499299827

2014-03-15 05:10

HD

29.99

11

491001259

2014-03-15 05:10

RD

11.99

New item

12

366713030

2014-03-15 05:10

RD

29.99

Sheet2

Cell Formulas
Range	Formula
E1	=IF(ISERROR(MATCH(A1,Sheet1!A:A,0)),"New item","")

4) I converted columns A to D of Sheet2 into a Table (it is enough to enter Ctrl-t and answer about the data location).
5) I used autofilter at the top of row D to get only the new rows. Ready!

Best regards,

J.Ty.

fishpatrol · Mar 17, 2014

Thanks J.Ty. This certainly does the job. What's difficult about this approach is the number of manual steps involved. I need to run this diff, oh, five days a week or so.

I don't do it nearly that often now because I do a similar process with a COUNTIF formula: opening the files and pasting the formula down all the rows, auto filtering the results.

What if the formula work were done in a separate file? Let's say the text files are both opened as Sheet1 and Sheet2 of an Excel document and saved as "Diff.xls". Could you have a second Excel document, let's say it's named "Differ.xls". It has a similar formula to what you've written in Column A, rows 1:15000 (which is safely in range) that compares Sheet1 & Sheet2, and has auto filter set up on "New item." The Diff.xls file would have to be present/open first, but then I could open Differ.xls, reapply the filter and be done.

Does that sound like it would work? How would your formula be changed to accommodate this approach? Or I'm open to other strategies. The fewer steps, the more likely I'll do it daily instead of letting it slip. Thanks again.

J.Ty. · Mar 17, 2014

Download this spreadsheet:
http://www.mimuw.edu.pl/~jty/MrExcel/fishpatrol.xlsx

The tab names tell everything. Formulas are only in the last tab, copy them down for as many rows as necessary. Beware, the formulas in row 1 are different from the rest, so do not copy this row, but one of the lower ones.

Please let me know if it is OK now.

J.Ty.

fishpatrol · Mar 18, 2014

Hmm, I opened the fishpatrol.xlsx file and saw a column of zeros and ones, a column of #N/A, and several blank columns containing formulas. Not sure what to do with that. So I adapted your original formula and followed the idea I proposed about separate Diff/Differ Excel documents.

1. I imported both text files into a new Excel document, putting the earlier file in Sheet1 and the newer file in Sheet2. (I'm using a very simple Automator macro on the Mac do this.) I save the file to Diff.xlsx.

2. I created a new Excel document, saved as Differ.xlsx. I created header rows to match the import files: ID, Date, Code, Price.

3. Still in Differ.xlsx, I inserted this code into A2:

Code:

=IF(ISERROR(MATCH([Diff.xlsx]Sheet2!A2,[Diff.xlsx]Sheet1!$A$1:$A$15000,0)),[Diff.xlsx]Sheet2!A2,"Present")

This identifies already-present IDs as "Present" and new IDs by printing the ID number.

4. Still in Differ.xlsx, I inserted this code into B2:

Code:

=VLOOKUP($A2, [Diff.xlsx]Sheet2!$A$1:$B$15000, 2, FALSE)

If the result in A2 is an ID, this code fills in its date. Otherwise it returns #N/A. For columns C & D, use the same VLOOKUP formula, expanding the range and column number appropriately.

5. Still in Differ.xlsx, Autofilter column A for Does Not Equal "Present" AND Does Not Equal "0" (since the 1:15000 range is larger than the range of values). There's the list of new IDs with their data. Done!

Thanks!

J.Ty. · Mar 18, 2014

Sorry for not explaining. Columns A and B are help data, you can hide them. Then columns C,D,E,F contain the new records (i.e., thos epresent in 0315 but not in 0323).

J.Ty.

Diff 2 files to find new rows?

fishpatrol

New Member

J.Ty.

Well-known Member

fishpatrol

New Member

J.Ty.

Well-known Member

fishpatrol

New Member

J.Ty.

Well-known Member

Share this page

Diff 2 files to find new rows?

fishpatrol

New Member

J.Ty.

Well-known Member

fishpatrol

New Member

J.Ty.

Well-known Member

fishpatrol

New Member

J.Ty.

Well-known Member

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock