Combine CSV Files with Junk Rows

radonwilson · Mar 23, 2023

I have 4 different .csv files that sit in a folder. I want to combine all of these.

But there is a catch: the first 2 files have 5 columns and 4 junk rows at the top that need to be removed.
The last 2 files have 6 columns and 5 junk rows at the top.

I am looking for a dynamic way of combining all my CSV files from a folder.

Download Files

Expected result:

Result.xlsx
   | A | B | C | D | E | F
1 | Date | Settlement ID | closing fees | promo rebates | TDS | total
2 | 01-01-2022 | 111222333 | 1 | 2.5 | null | 3.5
3 | 02-01-2022 | 111222333 | 1 | 2.5 | null | 3.5
4 | 03-01-2022 | 111222333 | 1 | 2.5 | null | 3.5
5 | 04-01-2022 | 111222333 | 1 | 2.5 | null | 3.5
6 | 05-01-2022 | 111222333 | 1 | 2.5 | null | 3.5
7 | 01-02-2022 | 222333444 | 2 | 3.5 | null | 4.5
8 | 02-02-2022 | 222333444 | 2 | 3.5 | null | 4.5
9 | 03-02-2022 | 222333444 | 2 | 3.5 | null | 4.5
10 | 04-02-2022 | 222333444 | 2 | 3.5 | null | 4.5
11 | 01-03-2022 | 444555666 | 3 | 5.5 | 1 | 9.5
12 | 02-03-2022 | 444555666 | 3 | 5.5 | 1 | 9.5
13 | 03-03-2022 | 444555666 | 3 | 5.5 | 1 | 9.5
14 | 01-04-2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
15 | 02-04-2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
16 | 03-04-2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
Sheet1

jdellasala · Mar 23, 2023

The first time I used From File or Folder, I didn't get the TDS column because the "First" file didn't have it, and I then discovered I wasn't able to change which file is used as the Sample file. So I pulled in the folder and sorted the Name column Descending so that the Sample file (which is always the first file listed) had the TDS column.
After expanding the Binary column, PQ generated steps and 4 queries that it put into their own folder. In the main Query (Files) I inserted the step RemovedColumns before the generated step Removed Other Columns and then removed that step. The final code for the Files query looked like this:

Power Query:

let
    Source = Folder.Files("C:\Temp\Files"),
    RemovedOtherColumns = Table.SelectColumns(Source,{"Name", "Content"}),
    SortedRows = Table.Sort(RemovedOtherColumns,{{"Name", Order.Descending}}),
    #"Filtered Hidden Files1" = Table.SelectRows(SortedRows, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    RemovedColumns = Table.RemoveColumns(#"Invoke Custom Function1",{"Content"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(RemovedColumns, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
in
    #"Expanded Table Column1"

but it produced a table like this:

Book1
   | A | B | C | D | E | F | G
1 | Name | Column1 | Column2 | Column3 | Column4 | Column5 | Column6
2 | 04-Apr-2022.csv | Line 1 | | | | |
3 | 04-Apr-2022.csv | Line 2 | | | | |
4 | 04-Apr-2022.csv | Line 3 | | | | |
5 | 04-Apr-2022.csv | Line 4 | | | | |
6 | 04-Apr-2022.csv | Line 5 | | | | |
7 | 04-Apr-2022.csv | Date | Settlement ID | closing fees | promo rebates | TDS | total
8 | 04-Apr-2022.csv | 01-04-2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
9 | 04-Apr-2022.csv | 02-04-2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
10 | 04-Apr-2022.csv | 03-04-2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
11 | 03-Mar-2022.csv | Line 1 | | | | |
12 | 03-Mar-2022.csv | Line 2 | | | | |
13 | 03-Mar-2022.csv | Line 3 | | | | |
14 | 03-Mar-2022.csv | Line 4 | | | | |
15 | 03-Mar-2022.csv | Line 5 | | | | |
16 | 03-Mar-2022.csv | Date | Settlement ID | closing fees | promo rebates | TDS | total
17 | 03-Mar-2022.csv | 01-03-2022 | 444555666 | 3 | 5.5 | 1 | 9.5
18 | 03-Mar-2022.csv | 02-03-2022 | 444555666 | 3 | 5.5 | 1 | 9.5
19 | 03-Mar-2022.csv | 03-03-2022 | 444555666 | 3 | 5.5 | 1 | 9.5
20 | 02-Feb-2022.csv | Line 1 | | | | |
21 | 02-Feb-2022.csv | Line 2 | | | | |
22 | 02-Feb-2022.csv | Line 3 | | | | |
23 | 02-Feb-2022.csv | Line 4 | | | | |
24 | 02-Feb-2022.csv | Date | Settlement ID | closing fees | promo rebates | total |
25 | 02-Feb-2022.csv | 01-02-2022 | 222333444 | 2 | 3.5 | 4.5 |
26 | 02-Feb-2022.csv | 02-02-2022 | 222333444 | 2 | 3.5 | 4.5 |
27 | 02-Feb-2022.csv | 03-02-2022 | 222333444 | 2 | 3.5 | 4.5 |
28 | 02-Feb-2022.csv | 04-02-2022 | 222333444 | 2 | 3.5 | 4.5 |
29 | 01-Jan-2022.csv | Line 1 | | | | |
30 | 01-Jan-2022.csv | Line 2 | | | | |
31 | 01-Jan-2022.csv | Line 3 | | | | |
32 | 01-Jan-2022.csv | Line 4 | | | | |
33 | 01-Jan-2022.csv | Date | Settlement ID | closing fees | promo rebates | total |
34 | 01-Jan-2022.csv | 01-01-2022 | 111222333 | 1 | 2.5 | 3.5 |
35 | 01-Jan-2022.csv | 02-01-2022 | 111222333 | 1 | 2.5 | 3.5 |
36 | 01-Jan-2022.csv | 03-01-2022 | 111222333 | 1 | 2.5 | 3.5 |
37 | 01-Jan-2022.csv | 04-01-2022 | 111222333 | 1 | 2.5 | 3.5 |
38 | 01-Jan-2022.csv | 05-01-2022 | 111222333 | 1 | 2.5 | 3.5 |
Sheet2

Obviously that's not what you want, but the trick is to modify the Transform Sample File query like this:

Power Query:

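let
    // Read every file as 6 text columns; files with only 5 columns leave Column6 empty
    Source = Csv.Document(Parameter1,[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    // Drop the junk rows, which all start with "Line" in Column1
    FilteredRows = Table.SelectRows(Source, each not Text.StartsWith([Column1], "Line")),
    // Proper-case Column3, Column4 and Column6 (e.g. "closing fees" becomes "Closing Fees"); numeric text is unaffected
    CapitalizedEachWord = Table.TransformColumns(FilteredRows,{{"Column3", Text.Proper, type text}, {"Column4", Text.Proper, type text}, {"Column6", Text.Proper, type text}}),
    // The first remaining row is the header row
    PromotedHeaders = Table.PromoteHeaders(CapitalizedEachWord, [PromoteAllScalars=true])
in
    PromotedHeaders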
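
The filter above relies on every junk row starting with "Line". As a hedged alternative, the sketch below skips rows from the top until the header row is reached, so it does not matter whether a file has 4 or 5 junk rows or what they contain. It assumes the header row always has "Date" in Column1; the inline table is a made-up stand-in for the output of Csv.Document, not the real files.

```powerquery
let
    // Made-up stand-in for one file as returned by Csv.Document (all values are text)
    Source = #table(
        {"Column1", "Column2", "Column3"},
        {
            {"Line 1", "", ""},
            {"Line 2", "", ""},
            {"Date", "Settlement ID", "total"},
            {"01-01-2022", "111222333", "3.5"}
        }
    ),
    // Skip rows from the top for as long as Column1 is not the header text "Date"
    SkippedJunk = Table.Skip(Source, each [Column1] <> "Date"),
    // The first remaining row is the header row
    PromotedHeaders = Table.PromoteHeaders(SkippedJunk, [PromoteAllScalars = true])
in
    PromotedHeaders
```

In the Transform Sample File query, the same Table.Skip step could take the place of FilteredRows, provided the "Date" assumption holds for every file.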

With the tables now cleaned up, just a few steps were needed to complete the final table:

Power Query:

let
    Source = Folder.Files("C:\Temp\Files"),
    // Descending sort lists a file that has the TDS column first
    SortedRows = Table.Sort(Source,{{"Name", Order.Descending}}),
    #"Filtered Hidden Files1" = Table.SelectRows(SortedRows, each [Attributes]?[Hidden]? <> true),
    // Keep Name so each row shows which file it came from
    RemovedOtherColumns = Table.SelectColumns(#"Filtered Hidden Files1",{"Name", "Content"}),
    #"Invoke Custom Function1" = Table.AddColumn(RemovedOtherColumns, "Transform File", each #"Transform File"([Content])),
    RemovedContent = Table.RemoveColumns(#"Invoke Custom Function1",{"Content"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(RemovedContent, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    // Set the imported Total aside; it is recalculated below
    RenamedColumns = Table.RenameColumns(#"Expanded Table Column1",{{"Total", "Old Total"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Name", type text}, {"Date", type date}, {"Settlement ID", Int64.Type}, {"Closing Fees", type number}, {"Promo Rebates", type number}, {"TDS", type number}, {"Old Total", type number}}),
    // List.Sum ignores nulls, so rows with no TDS value still get a total
    InsertedSum = Table.AddColumn(ChangedType, "Total", each List.Sum({[Closing Fees], [Promo Rebates], [TDS]}), type number),
    RemovedColumns = Table.RemoveColumns(InsertedSum,{"Old Total"}),
    SortedRows2 = Table.Sort(RemovedColumns,{{"Name", Order.Ascending}, {"Date", Order.Ascending}, {"Settlement ID", Order.Ascending}})
in
    SortedRows2
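
The InsertedSum step uses List.Sum rather than the + operator. A small illustration of why that matters when TDS is empty, using a made-up record in place of a table row: adding a null with + gives null, while List.Sum skips nulls.

```powerquery
let
    // Made-up row from a file that has no TDS value
    Row = [#"Closing Fees" = 1, #"Promo Rebates" = 2.5, TDS = null],
    // The + operator propagates null, so this evaluates to null
    WithPlus = Row[#"Closing Fees"] + Row[#"Promo Rebates"] + Row[TDS],
    // List.Sum ignores null items, so this evaluates to 3.5
    WithListSum = List.Sum({Row[#"Closing Fees"], Row[#"Promo Rebates"], Row[TDS]})
in
    [Plus = WithPlus, ListSum = WithListSum]
```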

resulting in this table:

Book1
   | A | B | C | D | E | F | G
1 | Name | Date | Settlement ID | Closing Fees | Promo Rebates | TDS | Total
2 | 01-Jan-2022.csv | 01/01/2022 | 111222333 | 1 | 2.5 | | 3.5
3 | 01-Jan-2022.csv | 02/01/2022 | 111222333 | 1 | 2.5 | | 3.5
4 | 01-Jan-2022.csv | 03/01/2022 | 111222333 | 1 | 2.5 | | 3.5
5 | 01-Jan-2022.csv | 04/01/2022 | 111222333 | 1 | 2.5 | | 3.5
6 | 01-Jan-2022.csv | 05/01/2022 | 111222333 | 1 | 2.5 | | 3.5
7 | 02-Feb-2022.csv | 01/02/2022 | 222333444 | 2 | 3.5 | | 5.5
8 | 02-Feb-2022.csv | 02/02/2022 | 222333444 | 2 | 3.5 | | 5.5
9 | 02-Feb-2022.csv | 03/02/2022 | 222333444 | 2 | 3.5 | | 5.5
10 | 02-Feb-2022.csv | 04/02/2022 | 222333444 | 2 | 3.5 | | 5.5
11 | 03-Mar-2022.csv | 01/03/2022 | 444555666 | 3 | 5.5 | 1 | 9.5
12 | 03-Mar-2022.csv | 02/03/2022 | 444555666 | 3 | 5.5 | 1 | 9.5
13 | 03-Mar-2022.csv | 03/03/2022 | 444555666 | 3 | 5.5 | 1 | 9.5
14 | 04-Apr-2022.csv | 01/04/2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
15 | 04-Apr-2022.csv | 02/04/2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
16 | 04-Apr-2022.csv | 03/04/2022 | 333777888 | 7.5 | 1.4 | 0.75 | 9.65
Sheet1

As noted, I manually kept the (file) Name column. Normally PQ would have removed it, but I thought it was useful to know what data came from which file.
I'm on the Insider Beta Channel and HOPE that not being able to select the Sample File is a temporary bug!
Hope that answers your question.

radonwilson · Mar 23, 2023

Thanks for your answer.

My question has 2 main concerns.

1. Removing top junk rows from CSV files.

(I want something like this, but whenever I use the Excel.Workbook() function I get an error, because it doesn't work on CSV files.)

2. Adding that extra TDS column.

jdellasala said:
The first time I used From File or Folder, I didn't find the TDS column because the "First" file didn't have it, and then discovered I wasn't able to change which file to use as the Sample file. So I pulled in the folder and reverse sorted the Name column Descending so that the Sample file (which is always the first file listed) had the TDS column.

You are right that the first 2 files don't have the TDS column. In my work, the first n months generally won't have it. As soon as the first file with a TDS column is saved in the folder, I want my query to update accordingly.
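
One possible way to make the TDS column appear as soon as any file contains it, sketched under assumptions: instead of expanding with Table.ColumnNames(#"Transform File"(#"Sample File")), build the column list from all of the transformed tables. The Jan, Mar and Files tables below are made-up stand-ins for the cleaned files and the folder query; only the AllColumns and Expanded steps are the point.

```powerquery
let
    // Made-up stand-ins for two cleaned files: one without TDS, one with it
    Jan = #table({"Date", "Total"}, {{"01-01-2022", 3.5}}),
    Mar = #table({"Date", "TDS", "Total"}, {{"01-03-2022", 1, 9.5}}),
    // Made-up stand-in for the folder query after the "Transform File" column is added
    Files = #table(
        {"Name", "Transform File"},
        {{"01-Jan-2022.csv", Jan}, {"03-Mar-2022.csv", Mar}}
    ),
    // Union of the column names found in every file, not only in the Sample File
    AllColumns = List.Union(List.Transform(Files[Transform File], Table.ColumnNames)),
    // Files that lack a column get null in it
    Expanded = Table.ExpandTableColumn(Files, "Transform File", AllColumns)
in
    Expanded
```

With this approach the sort order of the files no longer decides which columns are expanded, although later steps that refer to TDS by name would still fail while no file in the folder has that column.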
