I have a function I use for basic renaming and cleanup on imported .csv files (end of post). I have managed to figure everything out except the removal of empty rows (commented out in the code).
I have a table of metadata in my spreadsheet for each large table I am importing. The first few rows of the corporate department table are shown and it's named "tblDeptMeta" in the source spreadsheet. Since I don't need the ComboKey field in the source data it's marked with a "Y" in the Remove Col field. Dept is the key field so I want to remove duplicates and empty rows.
I call the function below as follows:
...
Source = Excel.Workbook(File.Contents(FileSource), null, true),
PickSheet = Source{[Item="Actual",Kind="Sheet"]}[Data],
PromoteHeaders = Table.PromoteHeaders(PickSheet, [PromoteAllScalars=true]),
RemoveExtraCols = Table.RemoveColumns(PromoteHeaders,{"Column23"}),
MetaDataSource = Excel.CurrentWorkbook() { [Name = "tblDeptMeta"]}[Content],
TableSetMetaData = fnSetMetaDataFromTable(RemoveExtraCols, MetaDataSource),
...
All of the other activities in the function (Rename, Remove Errors) are a command taking a single column such as RemoveRowsWithErrors. However, the M to remove empty rows is Table.SelectRows(..., each [Column] <> null and [Column] <> "". I am not clear on how to use my list of columns with the Remove Empty flag inside that command.
I have a table of metadata in my spreadsheet for each large table I am importing. The first few rows of the corporate department table are shown and it's named "tblDeptMeta" in the source spreadsheet. Since I don't need the ComboKey field in the source data it's marked with a "Y" in the Remove Col field. Dept is the key field so I want to remove duplicates and empty rows.
Dept | |||||||
Old Name | Remove Col | New Name | New Type | Remove Errors | Remove Empty | Distinct | Comments |
ComboKey | Y | ||||||
Unit | type text | ||||||
Dept | type text | Y | Y | ||||
Descr | type text | ||||||
Operating Unit | type text |
I call the function below as follows:
...
Source = Excel.Workbook(File.Contents(FileSource), null, true),
PickSheet = Source{[Item="Actual",Kind="Sheet"]}[Data],
PromoteHeaders = Table.PromoteHeaders(PickSheet, [PromoteAllScalars=true]),
RemoveExtraCols = Table.RemoveColumns(PromoteHeaders,{"Column23"}),
MetaDataSource = Excel.CurrentWorkbook() { [Name = "tblDeptMeta"]}[Content],
TableSetMetaData = fnSetMetaDataFromTable(RemoveExtraCols, MetaDataSource),
...
All of the other activities in the function (Rename, Remove Errors) are a command taking a single column such as RemoveRowsWithErrors. However, the M to remove empty rows is Table.SelectRows(..., each [Column] <> null and [Column] <> "". I am not clear on how to use my list of columns with the Remove Empty flag inside that command.
VBA Code:
/* --- fnSetMetaDataFromTable ---
MetaTable columns:
0 "Old Name" - Name from source Text
1 "Remove Col" - Delete the column? Y/N
2 "New Name" - New name of column Text
3 "New Type" - New column type Text - standard type names
4 "Remove Errors" - Remove errors from column e.g. dates Y/N
5 "Remove Empty" - Remove blanks and nulls Y/N
Not used... have to cycle for logical OR
6 "Distinct" - Eliminate duplicates Y/N
Always a difficult decision to know if column names or column positions are more likely to change.
This function is written to use column numbers as above
*/
(Source as table, MetaTable as table) as table =>
let
// MySchema = Table.Schema(Source),
/* ========================================================================
GET THE LISTS FOR ALL THE CHANGES NEEDED.
DO NOT CHANGE THE SOURCE TABLE
ALL ACTIONS ARE AGAINST THE META TABLE
*/
// Identify the unwanted columns first and then the remainder
WhatIsColumn = Table.ColumnNames(MetaTable){1},
FilterColsToDelete = Table.SelectRows(MetaTable,
each List.Contains(
{"Y", "Yes"},
Record.Field(_, Table.ColumnNames(MetaTable){1}))),
ListZapVals = Table.Column(FilterColsToDelete, Table.ColumnNames(MetaTable){0} ),
FilterColsToSave = Table.SelectRows(MetaTable,
each List.Contains(
{null, "", "N", "No"},
Record.Field(_, Table.ColumnNames(MetaTable){1}))),
// Identify the columns to rename and create the lists of old and new names
FilterColsToRename = Table.SelectRows(FilterColsToSave,
each not List.Contains(
{null},
Record.Field(_, Table.ColumnNames(MetaTable){2}))),
ListOldNameVals = Table.Column(FilterColsToRename, Table.ColumnNames(MetaTable){0} ),
ListNewNameVals = Table.Column(FilterColsToRename, Table.ColumnNames(MetaTable){2} ),
// Identify the columns to change the type. We will do this based on the old names and first filter where there is a type listed
FilterColsToChangeType = Table.SelectRows(FilterColsToSave,
each not List.Contains(
{null},
Record.Field(_, Table.ColumnNames(MetaTable){3}))),
ListChangeTypeNameVals = Table.Column(FilterColsToChangeType, Table.ColumnNames(MetaTable){0} ),
RawTypeList = Table.Column(FilterColsToChangeType, Table.ColumnNames(MetaTable){3} ),
ConvertedRawList = List.Transform(RawTypeList, each Expression.Evaluate( _, #shared) ),
// Identify the columns for distinct values
FilterColsToZapDupes = Table.SelectRows(FilterColsToSave,
each not List.Contains(
{null, "", "N", "No"},
Record.Field(_, Table.ColumnNames(MetaTable){6}))),
ListDistinctNameVals = Table.Column(FilterColsToZapDupes, Table.ColumnNames(MetaTable){0} ),
/* NOT USED but kept for reference
// With the basic Expression.Evaluate it doesn't recognize types that aren't primitive.
// This code will change any fancy terms to the primitive term
FixedTypeList = List.ReplaceMatchingItems(RawTypeList, {{ "Int64.Type", "type number"}, {"Currency.Type", "type number"}}),
// Convert the primitive (text) values to a type
ConvertedFixedList = List.Transform(FixedTypeList, Expression.Evaluate ),
*/
/* =======================================================
IMPLEMENT CHANGES FOR THE DESIGNATED COLUMNS
*/
// -- Implement actions for the columns --
// Delete the unwanted columns
RemoveCol = if not List.IsEmpty(ListZapVals) then
Table.RemoveColumns(Source,ListZapVals)
else Source,
// Change the column types to the types
ChangeColTypes = if not List.IsEmpty(ListChangeTypeNameVals) then
Table.TransformColumnTypes(RemoveCol, List.Zip({ListChangeTypeNameVals, ConvertedRawList}))
else RemoveCol,
/* ------------------------------------------------------------------
If we have to use the primitives, this search-and-replace text from biccountant.com will do it with hard-coded values
Now we have to filter yet again to find types that were not primitives and change them manually
Use the old column name before we rename the columns since renamed columns may not need a new name
This seems to work OK if no fields match the criteria, but could put in a List.Count
// Change type back to Int64
FilterColsToInt64 = Table.SelectRows(FilterColsToSave, each ([New Type] = "Int64.Type")),
ListInt64Fields = Table.Column(FilterColsToInt64, Table.ColumnNames(Rename){0} ),
ConvertFieldsToInt64 = Table.TransformColumnTypes(ChangeColTypes,
List.Transform(ListInt64Fields,
each {_, Int64.Type}
)
),
// Change type back to Currency
FilterColsToCurrency = Table.SelectRows(FilterColsToSave, each ([New Type] = "Currency.Type")),
ListCurrencyFields = Table.Column(FilterColsToCurrency, Table.ColumnNames(Rename){0} ),
ConvertFieldsToCurrency = Table.TransformColumnTypes(ConvertFieldsToInt64,
List.Transform(ListCurrencyFields,
each {_, Currency.Type}
)
),
------------------------------------------------------------------
*/
// Remove rows with errors
FilterColsToRemoveErrors = Table.SelectRows(MetaTable,
each List.Contains(
{"Y", "Yes"},
Record.Field(_, Table.ColumnNames(MetaTable){4}))),
ListErrorNameVals = Table.Column(FilterColsToRemoveErrors, Table.ColumnNames(MetaTable){0} ),
RemoveErrorsFromTable = if not List.IsEmpty(ListErrorNameVals) then
Table.RemoveRowsWithErrors(ChangeColTypes, ListErrorNameVals)
else ChangeColTypes,
/* -----------------------------------------------------------------
Remove empty rows
FilterColsToRemoveEmpty = Table.SelectRows(MetaTable,
each List.Contains(
{"Y", "Yes"},
Record.Field(_, Table.ColumnNames(MetaTable){5}))),
ListEmptyNameVals = Table.Column(FilterColsToRemoveEmpty, Table.ColumnNames(MetaTable){5} ),
// FilterEmptyRowsFromTable = Table.SelectRows(NewTable, each [ListEmptyNameVals] <> null and [ListEmptyNameVals] <> ""),
------------------------------------------------------------------
*/
// Eliminate duplicates
RemoveDuplicateRows = if not List.IsEmpty(ListDistinctNameVals) then
Table.Distinct(RemoveErrorsFromTable, ListDistinctNameVals)
else RemoveErrorsFromTable,
// Finally, rename the columns with the new name
RenameCols = if not List.IsEmpty(ListNewNameVals) then
Table.RenameColumns(RemoveDuplicateRows, List.Zip({ListOldNameVals, ListNewNameVals}))
else RemoveDuplicateRows
in
RenameCols