Compare two huge text files and extract the differences

bagadiamohit

New Member
Joined
May 11, 2013
Messages
5
Hi guys,

I have a serious problem here. Any kind of help is much appreciated!

I have two huge text files (about 130 MB each), each containing thousands of pipe-delimited records. I need to compare the two files, using VBA or any other means, and generate a spreadsheet that contains the header row plus two additional columns. The first additional column holds the file name, and the second shows which columns contain the mismatch. A single record can have multiple discrepancies. One file can also contain records that are not found in the other file, and this condition should be recorded in the spreadsheet as well.

Example:
File 1 (taking one record from each file):
00000018063|112295|000|0005|0009|0013|1| | |Y| | |106| | |1| | | | | | | | | | | | |000822090|99996|000|112295|C| | | | |000000|00000|0|1264|112295|000003883|N|000|1272|00| |00000018063|N///

File 2:
00000018063|112295|000|0005|0013|0017|1| | |Y| | |106| | |1| | | | | | | | | | | | |000822090|99996|000|112295|C| | | | |000000|00000|0|1260|112295|000003883|N|000|1272|00| |00000018063|N///
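
A minimal sketch of how two such records could be compared field by field. It is written in Python rather than VBA, and the function name `diff_fields` is hypothetical, not something taken from the files or from any library. It assumes `|` is the only delimiter and that a field missing from one record counts as a mismatch.

```python
def diff_fields(rec_a: str, rec_b: str, sep: str = "|") -> list[int]:
    """Return the 1-based positions of the fields that differ."""
    # Drop the line terminator so it is not mistaken for a difference.
    a = rec_a.rstrip("\r\n").split(sep)
    b = rec_b.rstrip("\r\n").split(sep)
    longest = max(len(a), len(b))
    # A position counts as different when either record has no such
    # field or when the two values are not identical strings.
    return [
        i + 1
        for i in range(longest)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    ]


if __name__ == "__main__":
    # Short made-up records, only to show the call.
    print(diff_fields("001|0005|0009", "001|0005|0013"))
```

The comparison is purely textual, so `0009` and `9` would be reported as different; that matches the zero-padded values in the sample records but is still an assumption about the requirement.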



In the above example, one record is taken from each file. The two records differ in three fields: 0009 vs 0013, 0013 vs 0017, and 1264 vs 1260. So the output should look like this:

[TABLE="class: grid, width: 4780"]
<tbody>[TR]
[TD]HH_NUMBER[/TD]
[TD]CLASSIFICATION_DATE[/TD]
[TD]PERSON_CODE[/TD]
[TD]MV_MIN_OF_PGM[/TD]
[TD]DURATION_IN_MINUTES[/TD]
[TD]MV_END_MOP[/TD]
[TD]HH_TIME_ZONE[/TD]
[TD]DATE_CLOSED[/TD]
[TD]LIVE_PLAY_IND[/TD]
[TD]ACPM_SAMPLE_IND[/TD]
[TD]LAN[/TD]
[TD]ORIGIN[/TD]
[TD]MM_MKT_CODE[/TD]
[TD]MM_METRO_IND[/TD]
[TD]PERSON_SEX[/TD]
[TD]USAGE_IND[/TD]
[TD]HISP_SAMPLE_IND[/TD]
[TD]VISITOR_IND[/TD]
[TD]HOH_IND[/TD]
[TD]LOH_IND[/TD]
[TD]PARTTIME_WW_IND[/TD]
[TD]FULLTIME_WW_IND[/TD]
[TD]WW_IND[/TD]
[TD]ACN_JOB_CODE[/TD]
[TD]CENSUS_JOB_CODE[/TD]
[TD]EDUCATION[/TD]
[TD]LONG_TERM_VISITOR[/TD]
[TD]AGE[/TD]
[TD]ACN_PROG_CODE[/TD]
[TD]RPRT_TCAST_NO[/TD]
[TD]COMPLEX_PROG_NO[/TD]
[TD]RPRT_DATE[/TD]
[TD]NET_CODE_AL[/TD]
[TD]PRIOR_VIEW_IND[/TD]
[TD]PRIOR_EVEN_IND[/TD]
[TD]PRIOR_RECORD_IND[/TD]
[TD]LIVE_VIEW_IND[/TD]
[TD]ACN_EVENT_DATE[/TD]
[TD]PLAY_DELAY[/TD]
[TD]SDP_IND[/TD]
[TD]NY_ST_TIME[/TD]
[TD]NY_DATE[/TD]
[TD]VIEWING_WEIGHT[/TD]
[TD]PRIOR_LIVE_REC_IND[/TD]
[TD]DMA_CODE[/TD]
[TD]NY_END_TIME[/TD]
[TD]PLAYBACK_SOURCE[/TD]
[TD]EXTENDED_HOME_IND[/TD]
[TD]PRIMARY_HHLD_ID[/TD]
[TD]NPMH_SAMPLE_IND[/TD]
[TD]File Mismatch[/TD]
[TD]Mismatch Reason[/TD]
[/TR]
[TR]
[TD]00000012596[/TD]
[TD]112295[/TD]
[TD]000[/TD]
[TD]0010[/TD]
[TD]0006[/TD]
[TD]0015[/TD]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD]Y[/TD]
[TD][/TD]
[TD][/TD]
[TD]106[/TD]
[TD][/TD]
[TD][/TD]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]000199085[/TD]
[TD]99875[/TD]
[TD]000[/TD]
[TD]112295[/TD]
[TD]C[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]000000[/TD]
[TD]00000[/TD]
[TD]0[/TD]
[TD]1329[/TD]
[TD]112295[/TD]
[TD]000003492[/TD]
[TD]N[/TD]
[TD]000[/TD]
[TD]1334[/TD]
[TD]00[/TD]
[TD][/TD]
[TD]00000012596[/TD]
[TD]N///[/TD]
[TD]Media Events[/TD]
[TD]Mismatches in MV_MIN_OF_PGM AND MV_END_MOP[/TD]
[/TR]
[TR]
[TD]00000012596[/TD]
[TD]112295[/TD]
[TD]000[/TD]
[TD]0014[/TD]
[TD]0006[/TD]
[TD]0019[/TD]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD]Y[/TD]
[TD][/TD]
[TD][/TD]
[TD]106[/TD]
[TD][/TD]
[TD][/TD]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]000199085[/TD]
[TD]99875[/TD]
[TD]000[/TD]
[TD]112295[/TD]
[TD]C[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]000000[/TD]
[TD]00000[/TD]
[TD]0[/TD]
[TD]1329[/TD]
[TD]112295[/TD]
[TD]000003492[/TD]
[TD]N[/TD]
[TD]000[/TD]
[TD]1334[/TD]
[TD]00[/TD]
[TD][/TD]
[TD]00000012596[/TD]
[TD]N///[/TD]
[TD]PROL[/TD]
[TD]Mismatches in MV_MIN_OF_PGM AND MV_END_MOP[/TD]
[/TR]
[TR]
[TD]00000011861[/TD]
[TD]112295[/TD]
[TD]002[/TD]
[TD]0126[/TD]
[TD]0001[/TD]
[TD]0126[/TD]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD]Y[/TD]
[TD][/TD]
[TD][/TD]
[TD]106[/TD]
[TD][/TD]
[TD]M[/TD]
[TD]1[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]032[/TD]
[TD]000092153[/TD]
[TD]99873[/TD]
[TD]002[/TD]
[TD]112295[/TD]
[TD]C[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]000000[/TD]
[TD]00000[/TD]
[TD]0[/TD]
[TD]1110[/TD]
[TD]112295[/TD]
[TD]000003905[/TD]
[TD]N[/TD]
[TD]000[/TD]
[TD]1110[/TD]
[TD]00[/TD]
[TD][/TD]
[TD]00000011861[/TD]
[TD]N///[/TD]
[TD]Media Events[/TD]
[TD]Records present in Media Events file but missing in PROL file[/TD]
[/TR]
</tbody>[/TABLE]

The last two columns show which file the record comes from and the reason for the mismatch.

Any help is highly appreciated! Please try to help me out.
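
The output described above could be produced roughly as follows. This is a sketch in Python, not VBA, and it rests on assumptions that are not stated in the post: the names `load_index` and `compare_files` are hypothetical, records are matched on a set of key columns chosen by the caller (`key_cols`, 0-based), those keys are unique within each file, and one file's records fit in memory. The result is written as a CSV file that Excel can open.

```python
import csv


def load_index(path, key_cols, sep="|"):
    """Read a delimited file into a dict: key tuple -> list of fields."""
    index = {}
    with open(path, encoding="utf-8", newline="") as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split(sep)
            index[tuple(fields[i] for i in key_cols)] = fields
    return index


def compare_files(path_a, path_b, header, key_cols, out_path,
                  name_a="File 1", name_b="File 2", sep="|"):
    """Write mismatched and unmatched records to out_path as CSV."""
    index_b = load_index(path_b, key_cols, sep)  # assumption: fits in memory
    seen = set()
    with open(path_a, encoding="utf-8", newline="") as fa, \
            open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header + ["File Mismatch", "Mismatch Reason"])
        for line in fa:  # stream the first file one record at a time
            rec_a = line.rstrip("\r\n").split(sep)
            key = tuple(rec_a[i] for i in key_cols)
            seen.add(key)
            rec_b = index_b.get(key)
            if rec_b is None:
                writer.writerow(rec_a + [
                    name_a,
                    f"Record present in {name_a} file "
                    f"but missing in {name_b} file",
                ])
                continue
            # Names of the columns whose values differ.
            bad = [h for h, x, y in zip(header, rec_a, rec_b) if x != y]
            if bad:
                reason = "Mismatches in " + " AND ".join(bad)
                writer.writerow(rec_a + [name_a, reason])
                writer.writerow(rec_b + [name_b, reason])
        # Records that exist only in the second file.
        for key, rec_b in index_b.items():
            if key not in seen:
                writer.writerow(rec_b + [
                    name_b,
                    f"Record present in {name_b} file "
                    f"but missing in {name_a} file",
                ])
```

A call might look like `compare_files("media_events.txt", "prol.txt", header, [0, 1, 2], "diff.csv", "Media Events", "PROL")`, where the file paths and the choice of key columns are placeholders. If the chosen key is not unique within a file, later records overwrite earlier ones in the index, so the key columns would need to be picked with care.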
 
