Seeking suggestions for fastest approach to re-organise data

JackDanIce · Apr 15, 2019

Hi,

I have data in > 2,500 CSV files to aggregate, clean and reduce to 2 columns.

Each file contains at minimum an address and a restricted key word, e.g. 1 Abbey Road, License Submitted (the key word here is "License Submitted")

Some files have more than 1 address and key word, e.g. 1 Abbey Road, License Submitted, 2 Abbey Road, London, License Approved

Commas are used as the delimiting character (which unfortunately means some addresses span 1 cell and others more than 1 cell); the key words are known in advance and can be placed into a dictionary.

The task is to create a 2 column list where column A is the address and column B is the key word.

Using a nested loop to test whether each cell value exists in the dictionary, and processing it accordingly, seems inefficient. The key words are needed for the final stats (I will probably pivot the data once the 2 column list is created).

Any suggestions or code design for this?

TIA,
Jack

mole999 · Apr 15, 2019

I would be tempted to try to import all the CSVs and process them once; not sure if that is feasible. Maybe add/gather a log of the files imported.

JackDanIce · Apr 15, 2019

Hi Mole, thanks for the reply:

Code:

Sub Main()


    Application.ScreenUpdating = False
    
    Read_Data
    Clean_Data
    Rearrange_Data
    
    Application.ScreenUpdating = True
    
End Sub

I'm trying to avoid listing all my code; however, Read_Data first reads in all the data from the CSV files. Each row can extend up to column I or J.

Clean_Data clears unnecessary characters ("#EANF#" and the character codes held in oddChars(1) = 187: oddChars(2) = 191: oddChars(3) = 239).
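For anyone following along, a minimal sketch of what that cleaning step could look like for a single cell value. CleanCell is a hypothetical name, not the actual Clean_Data code, and the final Trim$ is an added assumption. As an aside, codes 239, 187 and 191 are what a UTF-8 byte-order mark looks like when the file is read as ANSI text, which is probably where those characters come from.

```vba
' Sketch only: CleanCell is a hypothetical helper, not the original Clean_Data.
Function CleanCell(ByVal s As String) As String
    Dim oddChars As Variant
    Dim i As Long

    ' Character codes quoted in the post
    oddChars = Array(187, 191, 239)

    ' Remove the "#EANF#" marker wherever it appears
    s = Replace(s, "#EANF#", "")

    ' Remove each odd character in turn
    For i = LBound(oddChars) To UBound(oddChars)
        s = Replace(s, Chr(oddChars(i)), "")
    Next i

    ' Assumption: leading/trailing spaces are not wanted either
    CleanCell = Trim$(s)
End Function
```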

Rearrange_Data is where I will be reducing the data into 2 columns. I didn't state earlier that each "pair" of address and key word needs to go in the next empty row of the 2 output columns.

e.g. with initial example, output:

1 Abbey Road, License Submitted
1 Abbey Road, License Submitted
2 Abbey Road, License Approved

Some csv files may have repeating addresses, this is not a problem.
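One possible shape for the Rearrange_Data step, as a rough sketch rather than a tested solution. It assumes the cleaned data is already in a 1-based 2-D Variant array and the key words are keys of a Scripting.Dictionary; PairsFromData, pairCount and Rearrange_Data_Sketch are hypothetical names. Each cell is visited once: cells are joined into an address until a key word turns up, then the pair is stored and the address is reset. The loop is still nested (rows, then columns), but everything happens in memory, with one dictionary lookup per cell and one write to the sheet at the end.

```vba
' Sketch only: names are hypothetical.
' data     : 2-D Variant array (1-based), e.g. from Range.Value
' keyWords : Scripting.Dictionary whose keys are the permitted key words
Function PairsFromData(ByVal data As Variant, ByVal keyWords As Object, _
                       ByRef pairCount As Long) As Variant
    Dim out() As Variant
    Dim r As Long, c As Long
    Dim cellText As String, addr As String

    ' Worst case: every cell is a key word, so this size is always enough
    ReDim out(1 To UBound(data, 1) * UBound(data, 2), 1 To 2)
    pairCount = 0

    For r = 1 To UBound(data, 1)
        addr = ""   ' an address with no key word after it is dropped
        For c = 1 To UBound(data, 2)
            cellText = Trim$(CStr(data(r, c)))
            If Len(cellText) > 0 Then
                If keyWords.Exists(cellText) Then
                    ' Key word found: close off the address collected so far
                    pairCount = pairCount + 1
                    out(pairCount, 1) = addr
                    out(pairCount, 2) = cellText
                    addr = ""
                ElseIf Len(addr) = 0 Then
                    addr = cellText
                Else
                    ' Address was split across cells by its own commas
                    addr = addr & ", " & cellText
                End If
            End If
        Next c
    Next r

    PairsFromData = out   ' only rows 1..pairCount are filled
End Function

Sub Rearrange_Data_Sketch()
    Dim keyWords As Object, data As Variant, pairs As Variant, n As Long

    Set keyWords = CreateObject("Scripting.Dictionary")
    keyWords.CompareMode = vbTextCompare     ' ignore case when matching
    keyWords.Add "License Submitted", True
    keyWords.Add "License Approved", True

    data = ActiveSheet.UsedRange.Value       ' assumes more than one cell of data
    pairs = PairsFromData(data, keyWords, n)

    ' Write only the filled rows to a new sheet in one go
    If n > 0 Then Worksheets.Add.Range("A1").Resize(n, 2).Value = pairs
End Sub
```

With the row 1 Abbey Road, License Submitted, 2 Abbey Road, London, License Approved this should give two pairs, the second with the address "2 Abbey Road, London".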

Best,
Jack

GR00007 · Apr 15, 2019

When I've wanted to do something quick with a bunch of .csv's, I've concatenated them with a DOS command first.
If you open a command window in the folder where the files are you can type:

Code:

type *.csv > x.txt

x.txt would then contain the contents of all the .csv files (I used a .txt extension so the output file is not picked up by *.csv and included in itself)
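If the same step needs to be run from VBA rather than typed by hand, one way is to hand the command to cmd.exe through WScript.Shell. This is only a sketch: ConcatenateCsvFiles and folderPath are hypothetical names, and it assumes the folder exists and is writable.

```vba
' Sketch only: ConcatenateCsvFiles and folderPath are hypothetical names.
Sub ConcatenateCsvFiles(ByVal folderPath As String)
    Dim sh As Object
    Dim cmd As String

    Set sh = CreateObject("WScript.Shell")

    ' cmd /c runs the command and then exits; cd /d also switches drive
    cmd = "cmd /c cd /d """ & folderPath & """ && type *.csv > x.txt"

    ' 0 = hidden window, True = wait until the command has finished
    sh.Run cmd, 0, True
End Sub
```

Waiting for the command to finish matters here, otherwise the next step could try to open x.txt before it has been fully written.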

JackDanIce · Apr 15, 2019

Hey didn’t know of that, thanks GR00007

I’ll try and reply back

JackDanIce · Apr 24, 2019

Hi, sorry for delayed reply, needed some help here to get the cmd part working: https://www.mrexcel.com/forum/gener...ions/1095228-code-automate-cmd-steps-vba.html

Great trick to remember for future need, re: type *.csv > x.txt

Data is in an array now, final part to create a 2 column list and output, thanks for the help!

GR00007 · Apr 24, 2019

Glad that worked for you.
The ">" below is a redirect, outputting the results of "Type *.csv" into the named file x.txt

Code:

Type *.csv > x.txt

JackDanIce · Apr 24, 2019

Thanks, barely can remember any DOS programming pre Windows days.. seem to recall things like pipes and stuff!
