Combine CSV Files with Junk Rows

radonwilson

Board Regular
Joined
Jun 23, 2021
Messages
50
Office Version
  1. 2019
Platform
  1. Windows
I have 4 different .csv files that sit in a folder. I want to combine all of these.

But there is a catch, the first 2 files have 4 top junk rows that need to be removed and 5 columns.
And the last 2 files have 5 top junks rows and 6 columns.

I am looking for a dynamic way of combining all my CSV files from a folder.


Download Files

Expecting Result:-
Result.xlsx
ABCDEF
1DateSettlement IDclosing feespromo rebatesTDStotal
201-01-202211122233312.5null3.5
302-01-202211122233312.5null3.5
403-01-202211122233312.5null3.5
504-01-202211122233312.5null3.5
605-01-202211122233312.5null3.5
701-02-202222233344423.5null4.5
802-02-202222233344423.5null4.5
903-02-202222233344423.5null4.5
1004-02-202222233344423.5null4.5
1101-03-202244455566635.519.5
1202-03-202244455566635.519.5
1303-03-202244455566635.519.5
1401-04-20223337778887.51.40.759.65
1502-04-20223337778887.51.40.759.65
1603-04-20223337778887.51.40.759.65
Sheet1
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
The first time I used From File or Folder, I didn't find the TDS column because the "First" file didn't have it, and then discovered I wasn't able to change which file to use as the Sample file. So I pulled in the folder and reverse sorted the Name column Descending so that the Sample file (which is always the first file listed) had the TDS column.
After expanding the Binary column, PQ generated steps and 4 queries that it put into their own folder. In the main Query (Files) I inserted the step RemovedColumns before the generated step Removed Other Columns and then removed that step. The final code for the Files query looked like this
Power Query:
let
    Source = Folder.Files("C:\Temp\Files"),
    RemovedOtherColumns = Table.SelectColumns(Source,{"Name", "Content"}),
    SortedRows = Table.Sort(RemovedOtherColumns,{{"Name", Order.Descending}}),
    #"Filtered Hidden Files1" = Table.SelectRows(SortedRows, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    RemovedColumns = Table.RemoveColumns(#"Invoke Custom Function1",{"Content"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(RemovedColumns, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
in
    #"Expanded Table Column1"
but created a table like this
Book1
ABCDEFG
1NameColumn1Column2Column3Column4Column5Column6
204-Apr-2022.csvLine 1
304-Apr-2022.csvLine 2
404-Apr-2022.csvLine 3
504-Apr-2022.csvLine 4
604-Apr-2022.csvLine 5
704-Apr-2022.csvDateSettlement IDclosing feespromo rebatesTDStotal
804-Apr-2022.csv01-04-20223337778887.51.40.759.65
904-Apr-2022.csv02-04-20223337778887.51.40.759.65
1004-Apr-2022.csv03-04-20223337778887.51.40.759.65
1103-Mar-2022.csvLine 1
1203-Mar-2022.csvLine 2
1303-Mar-2022.csvLine 3
1403-Mar-2022.csvLine 4
1503-Mar-2022.csvLine 5
1603-Mar-2022.csvDateSettlement IDclosing feespromo rebatesTDStotal
1703-Mar-2022.csv01-03-202244455566635.519.5
1803-Mar-2022.csv02-03-202244455566635.519.5
1903-Mar-2022.csv03-03-202244455566635.519.5
2002-Feb-2022.csvLine 1
2102-Feb-2022.csvLine 2
2202-Feb-2022.csvLine 3
2302-Feb-2022.csvLine 4
2402-Feb-2022.csvDateSettlement IDclosing feespromo rebatestotal
2502-Feb-2022.csv01-02-202222233344423.54.5
2602-Feb-2022.csv02-02-202222233344423.54.5
2702-Feb-2022.csv03-02-202222233344423.54.5
2802-Feb-2022.csv04-02-202222233344423.54.5
2901-Jan-2022.csvLine 1
3001-Jan-2022.csvLine 2
3101-Jan-2022.csvLine 3
3201-Jan-2022.csvLine 4
3301-Jan-2022.csvDateSettlement IDclosing feespromo rebatestotal
3401-Jan-2022.csv01-01-202211122233312.53.5
3501-Jan-2022.csv02-01-202211122233312.53.5
3601-Jan-2022.csv03-01-202211122233312.53.5
3701-Jan-2022.csv04-01-202211122233312.53.5
3801-Jan-2022.csv05-01-202211122233312.53.5
Sheet2

Obviously that's not what you want, but the trick is to modify the Transform Sample File query like this:
Power Query:
let
    Source = Csv.Document(Parameter1,[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    FilteredRows = Table.SelectRows(Source, each not Text.StartsWith([Column1], "Line")),
    CapitalizedEachWord = Table.TransformColumns(FilteredRows,{{"Column3", Text.Proper, type text}, {"Column4", Text.Proper, type text}, {"Column6", Text.Proper, type text}}),
    PromotedHeaders = Table.PromoteHeaders(CapitalizedEachWord, [PromoteAllScalars=true])
in
    PromotedHeaders
With the tables now cleaned up, just a few steps were needed to complete the final table:
Power Query:
let
    Source = Folder.Files("C:\Temp\Files"),
    SortedRows = Table.Sort(Source,{{"Name", Order.Descending}}),
    #"Filtered Hidden Files1" = Table.SelectRows(SortedRows, each [Attributes]?[Hidden]? <> true),
    RemovedOtherColumns = Table.SelectColumns(#"Filtered Hidden Files1",{"Name", "Content"}),
    #"Invoke Custom Function1" = Table.AddColumn(RemovedOtherColumns, "Transform File", each #"Transform File"([Content])),
    RemovedContent = Table.RemoveColumns(#"Invoke Custom Function1",{"Content"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(RemovedContent, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    RenamedColumns = Table.RenameColumns(#"Expanded Table Column1",{{"Total", "Old Total"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Name", type text}, {"Date", type date}, {"Settlement ID", Int64.Type}, {"Closing Fees", type number}, {"Promo Rebates", type number}, {"TDS", type number}, {"Old Total", type number}}),
    InsertedSum = Table.AddColumn(ChangedType, "Total", each List.Sum({[Closing Fees], [Promo Rebates], [TDS]}), type number),
    RemovedColumns = Table.RemoveColumns(InsertedSum,{"Old Total"}),
    SortedRows2 = Table.Sort(RemovedColumns,{{"Name", Order.Ascending}, {"Date", Order.Ascending}, {"Settlement ID", Order.Ascending}})
in
    SortedRows2
and resulting in this table:
Book1
ABCDEFG
1NameDateSettlement IDClosing FeesPromo RebatesTDSTotal
201-Jan-2022.csv01/01/202211122233312.53.5
301-Jan-2022.csv02/01/202211122233312.53.5
401-Jan-2022.csv03/01/202211122233312.53.5
501-Jan-2022.csv04/01/202211122233312.53.5
601-Jan-2022.csv05/01/202211122233312.53.5
702-Feb-2022.csv01/02/202222233344423.55.5
802-Feb-2022.csv02/02/202222233344423.55.5
902-Feb-2022.csv03/02/202222233344423.55.5
1002-Feb-2022.csv04/02/202222233344423.55.5
1103-Mar-2022.csv01/03/202244455566635.519.5
1203-Mar-2022.csv02/03/202244455566635.519.5
1303-Mar-2022.csv03/03/202244455566635.519.5
1404-Apr-2022.csv01/04/20223337778887.51.40.759.65
1504-Apr-2022.csv02/04/20223337778887.51.40.759.65
1604-Apr-2022.csv03/04/20223337778887.51.40.759.65
Sheet1

As noted, I manually kept the (file) Name column. Normally PQ would have removed it, but I thought it was useful to know what data came from which file.
I'm on the Insider Beta Channel and HOPE that not being able to select the Sample File is a temporary bug!
Hope that answers your question.
 
Upvote 0
The first time I used From File or Folder, I didn't find the TDS column because the "First" file didn't have it, and then discovered I wasn't able to change which file to use as the Sample file. So I pulled in the folder and reverse sorted the Name column Descending so that the Sample file (which is always the first file listed) had the TDS column.
After expanding the Binary column, PQ generated steps and 4 queries that it put into their own folder. In the main Query (Files) I inserted the step RemovedColumns before the generated step Removed Other Columns and then removed that step. The final code for the Files query looked like this
Power Query:
let
    Source = Folder.Files("C:\Temp\Files"),
    RemovedOtherColumns = Table.SelectColumns(Source,{"Name", "Content"}),
    SortedRows = Table.Sort(RemovedOtherColumns,{{"Name", Order.Descending}}),
    #"Filtered Hidden Files1" = Table.SelectRows(SortedRows, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    RemovedColumns = Table.RemoveColumns(#"Invoke Custom Function1",{"Content"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(RemovedColumns, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
in
    #"Expanded Table Column1"
but created a table like this
Book1
ABCDEFG
1NameColumn1Column2Column3Column4Column5Column6
204-Apr-2022.csvLine 1
304-Apr-2022.csvLine 2
404-Apr-2022.csvLine 3
504-Apr-2022.csvLine 4
604-Apr-2022.csvLine 5
704-Apr-2022.csvDateSettlement IDclosing feespromo rebatesTDStotal
804-Apr-2022.csv01-04-20223337778887.51.40.759.65
904-Apr-2022.csv02-04-20223337778887.51.40.759.65
1004-Apr-2022.csv03-04-20223337778887.51.40.759.65
1103-Mar-2022.csvLine 1
1203-Mar-2022.csvLine 2
1303-Mar-2022.csvLine 3
1403-Mar-2022.csvLine 4
1503-Mar-2022.csvLine 5
1603-Mar-2022.csvDateSettlement IDclosing feespromo rebatesTDStotal
1703-Mar-2022.csv01-03-202244455566635.519.5
1803-Mar-2022.csv02-03-202244455566635.519.5
1903-Mar-2022.csv03-03-202244455566635.519.5
2002-Feb-2022.csvLine 1
2102-Feb-2022.csvLine 2
2202-Feb-2022.csvLine 3
2302-Feb-2022.csvLine 4
2402-Feb-2022.csvDateSettlement IDclosing feespromo rebatestotal
2502-Feb-2022.csv01-02-202222233344423.54.5
2602-Feb-2022.csv02-02-202222233344423.54.5
2702-Feb-2022.csv03-02-202222233344423.54.5
2802-Feb-2022.csv04-02-202222233344423.54.5
2901-Jan-2022.csvLine 1
3001-Jan-2022.csvLine 2
3101-Jan-2022.csvLine 3
3201-Jan-2022.csvLine 4
3301-Jan-2022.csvDateSettlement IDclosing feespromo rebatestotal
3401-Jan-2022.csv01-01-202211122233312.53.5
3501-Jan-2022.csv02-01-202211122233312.53.5
3601-Jan-2022.csv03-01-202211122233312.53.5
3701-Jan-2022.csv04-01-202211122233312.53.5
3801-Jan-2022.csv05-01-202211122233312.53.5
Sheet2

Obviously that's not what you want, but the trick is to modify the Transform Sample File query like this:
Power Query:
let
    Source = Csv.Document(Parameter1,[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    FilteredRows = Table.SelectRows(Source, each not Text.StartsWith([Column1], "Line")),
    CapitalizedEachWord = Table.TransformColumns(FilteredRows,{{"Column3", Text.Proper, type text}, {"Column4", Text.Proper, type text}, {"Column6", Text.Proper, type text}}),
    PromotedHeaders = Table.PromoteHeaders(CapitalizedEachWord, [PromoteAllScalars=true])
in
    PromotedHeaders
With the tables now cleaned up, just a few steps were needed to complete the final table:
Power Query:
let
    Source = Folder.Files("C:\Temp\Files"),
    SortedRows = Table.Sort(Source,{{"Name", Order.Descending}}),
    #"Filtered Hidden Files1" = Table.SelectRows(SortedRows, each [Attributes]?[Hidden]? <> true),
    RemovedOtherColumns = Table.SelectColumns(#"Filtered Hidden Files1",{"Name", "Content"}),
    #"Invoke Custom Function1" = Table.AddColumn(RemovedOtherColumns, "Transform File", each #"Transform File"([Content])),
    RemovedContent = Table.RemoveColumns(#"Invoke Custom Function1",{"Content"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(RemovedContent, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    RenamedColumns = Table.RenameColumns(#"Expanded Table Column1",{{"Total", "Old Total"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Name", type text}, {"Date", type date}, {"Settlement ID", Int64.Type}, {"Closing Fees", type number}, {"Promo Rebates", type number}, {"TDS", type number}, {"Old Total", type number}}),
    InsertedSum = Table.AddColumn(ChangedType, "Total", each List.Sum({[Closing Fees], [Promo Rebates], [TDS]}), type number),
    RemovedColumns = Table.RemoveColumns(InsertedSum,{"Old Total"}),
    SortedRows2 = Table.Sort(RemovedColumns,{{"Name", Order.Ascending}, {"Date", Order.Ascending}, {"Settlement ID", Order.Ascending}})
in
    SortedRows2
and resulting in this table:
Book1
ABCDEFG
1NameDateSettlement IDClosing FeesPromo RebatesTDSTotal
201-Jan-2022.csv01/01/202211122233312.53.5
301-Jan-2022.csv02/01/202211122233312.53.5
401-Jan-2022.csv03/01/202211122233312.53.5
501-Jan-2022.csv04/01/202211122233312.53.5
601-Jan-2022.csv05/01/202211122233312.53.5
702-Feb-2022.csv01/02/202222233344423.55.5
802-Feb-2022.csv02/02/202222233344423.55.5
902-Feb-2022.csv03/02/202222233344423.55.5
1002-Feb-2022.csv04/02/202222233344423.55.5
1103-Mar-2022.csv01/03/202244455566635.519.5
1203-Mar-2022.csv02/03/202244455566635.519.5
1303-Mar-2022.csv03/03/202244455566635.519.5
1404-Apr-2022.csv01/04/20223337778887.51.40.759.65
1504-Apr-2022.csv02/04/20223337778887.51.40.759.65
1604-Apr-2022.csv03/04/20223337778887.51.40.759.65
Sheet1

As noted, I manually kept the (file) Name column. Normally PQ would have removed it, but I thought it was useful to know what data came from which file.
I'm on the Insider Beta Channel and HOPE that not being able to select the Sample File is a temporary bug!
Hope that answers your question.
Thanks for your answer.

There are 2 main concerns of my question.

1. Removing top junk rows from CSV files.
(I want something like this but whenever I am using this function i.e. Excel.Workbook(). I am getting an error because this function doesn't work on CSV files).


2. Adding that extra TDS column.

The first time I used From File or Folder, I didn't find the TDS column because the "First" file didn't have it, and then discovered I wasn't able to change which file to use as the Sample file. So I pulled in the folder and reverse sorted the Name column Descending so that the Sample file (which is always the first file listed) had the TDS column.
You discovered that right the first 2 files won't have the TDS column. Generally, as per my work, the first n numbers of months won't have that TDS column. As soon as the first file with the TDS column is saved in the folder, I want my query to be updated accordingly.
 
Upvote 0

Forum statistics

Threads
1,225,348
Messages
6,184,432
Members
453,231
Latest member
HerGP

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top