- Excel Version
- 365
- 2019
- 2016
- 2013
When I read Bill's "How Would You Clean This Data" challenge couple months ago, I thought that it could be easily done by using VBA. However, the article was all about Power Query. I was interested in learning the M language. I was even more interested in comparing VBA with M language.
During the process of solving the problem using the M language, I realize that M should be easy to learn for anyone already using VBA. By using M instead of the Power Query interface, your Power Query powers will double.
In this article, you will find my way of cleaning the data by using the List.Generate function in the M language.
To refresh your memory of the original problem, the following is the source data defined by "UglyData" name.
The following is the desired result:
I am impressed by the power of Power Query. It serves the result as a query table in the worksheet. The user-friendly interface with simple buttons that enables anyone to transform your data and get the job done. I strongly believe that every Excel user should try Power Query to deal with data transformation at least once. It is simply amazing. But, if you limit yourself to the commands found in the Power Query ribbon, you will be using less than half of the entire power available in the tool. If you are a VBA programmer and you are interested unleashing the entire power of Power Query, it is easy to transfer your VBA skills to learn the M language. Soon, you will find you have unlocked more power in Power Query.
Do you remember when you were learning VBA? You likely relied on the macro recorder and the intellisense code completion. Early in your learning journey, the macro recorder likely saved you hours.
When you use the ribbon icons in Power Query, Excel generates M code just like macro recorder in VBA. The M code can be revealed by using the Advanced Editor in Power Query.
Power Query ribbon
However, for me, the auto-generated M code is hard to read. The auto-generated code uses long variable names including spaces - quoted identifiers.
As a traditional programmer, variable names should never have spaces. Besides, quotes surrounding the variable names make it really hard to understand if it is a text value or variable. So, I defined my first rule while coding in M language - Do not use spaces in variable names and change the auto generated variable names immediately. Basically, use
When I first looked at the M code in the Advanced Editor, I noticed that it looks nothing like a procedural language. This makes sense since it is not a procedural but functional language.
The M code recorder builds the final code as steps that consist of variables. It uses the previous variable to create the next variable. Eventually it arrives at your final result. The recorded code looks like a strict structure. However, the M language does not care if you use white space or new lines while you are coding. This means what when you start manually coding in the editor, you can write your code just as you would write in VBA. The important rule is that you add a comma after each variable definition (except for the last variable).
This became my second rule - Treat M code like VBA code in the editor. It simply makes the code more readable. The code will have a familiar look for experienced VBA readers. You will also find it more natural to type code in this format.
The following code that I applied my rules works exactly the same with the auto-generated one.
My last rule is a little bit different that the first two rules - Find corresponding M functions for VBA methods.
VBA programmers are familiar with the For… Next loop in VBA. The equivalent functionality in M is List.Generate. You will see this structure used when cleaning the data in Bill’s problem. This way we can make a comparison between coding in two different programming languages.
Let's create a new blank query and copy and paste the following code in the Advanced Editor.
Now we have the source data from the worksheet range called "UglyData" in Power Query. It would create the same table if we loaded this query to a worksheet since we didn't do anything on it yet - except employee quarterly column headers since Power Query treats table columns as fields and field names must be unique, so Q1, Q2, Q3, and Q4 for each employee will be automatically suffixed with an incremental number following an underscore. So, let's start transforming data.
My intention is building two nested loops where the inner loop is looping through the category descriptions (rows) while the outer loop is looping through the employees (columns). The following VBA code shows how I would do the same thing if I was solving this problem by using VBA since the source data has predefined columns. So, I will be retrieving quarterly data for each employee and for each category from the corresponding rows and columns. It could be done in many different ways in VBA, and likely better than the code below, but I intentionally wrote it as below to make it look like the M code that I will write in this article.
We obviously need to get the employee and category count as well as the category names from the source data. Let's modify the M code to create corresponding variables.
The let expression simply encapsulates uniquely named variables consist of assigned and/or calculated values separated by comma. Each variable can be used in a subsequent expression to calculate another value defined as another variable. As you can see, we created three helper variables that we will use in the next steps below.
The Result variable is only for showing a list of these three helper variables by using List.Combine function in this step. In the screenshot below, I clicked on the last list element, CategoryNames, to reveal the list content consist of category names as it is also a list element in the generated list, and it won't be displayed like the first two primitive value types in the result list which are employee and category counts.
List.Combine to create a new list
Now we can build the outer loop by using the same algorithm as we did in VBA. The corresponding M function that will let us build a loop is List.Generate function. List.Generate doesn't look like the same For...Next statement in VBA but it is doing the iteration that we need.
Let's modify the M code to create a list of employee names retrieved from the source range. Remember that employee columns are placed in 7th, 12th, and 17th columns in the worksheet. We will use Table.ColumnNames function to get the column names in the source data, and return nth column's name from the list by calculating the n with
And the result:
List.Generate to create a loop
Let's take a closer look at the List.Generate function. It takes four function values as parameters called initial, condition, next, and selector.
So, it is now more obvious what
The each keyword forms a function that returns an object by using the record in the current scope. And it is just a shortcut way of writing the actual function. So, the following function returns a boolean value according to i field value.
Let's rewrite the condition function without the each keyword and compare.
By using the each keyword, we simply create the function without using scopeDataObject function parameter as it is automatically injected into the function by the host function, which is List.Generate. This saves us using an unnecessary variable, makes the code more readable, and probably also some memory at the background.
According to the desired result, we need to list all category names for each employee name. We have the employee names list so far. If we simply repeat employee names for each category name, and eventually add corresponding quarterly columns to the result, then we will get what we need. However, to return multiple columns, we need more than a list. Look at the following samples below.
A List simply is a column in Power Query.
This query creates the following list with three string values.
Create a basic list
A Record is on the other hand, a row consists of named fields.
And this query creates the following record with three named fields.
Create a basic record
Notice that we use curly braces to create a list, and square brackets to create a record that contains field definitions and values.
What happens if we used records as list items? I will also include a Name field in the record this time.
This is a list of records just as expected, and beautiful! If we click on a cell, then we will see the content of the record.
List of records
Now, we have a list of records. All we need to do is convert the list to a table, and then expanding the record fields as table columns.
That's it!
Table.FromList and Table.ExpandRecordColumn
Let's apply this logic to our actual query now. Instead of creating the inner loop for the category names right away, I will write the code to retrieve only the first-row values from the source data, which is the "Administrative" category row.
Perfect! We have the result table for "Administrative" category. The next step is creating an inner loop to loop through all categories.
Before that, let's examine how we are building the record in selector function.
Let's look at the parameters of the inner loop.
Thanks for reading! If you enjoyed this article, then please let me know.
During the process of solving the problem using the M language, I realize that M should be easy to learn for anyone already using VBA. By using M instead of the Power Query interface, your Power Query powers will double.
In this article, you will find my way of cleaning the data by using the List.Generate function in the M language.
To refresh your memory of the original problem, the following is the source data defined by "UglyData" name.
Category Description | Dept. Total | Q1 | Q2 | Q3 | Q4 | Employee 1 | Q1 | Q2 | Q3 | Q4 | Employee 2 | Q1 | Q2 | Q3 | Q4 | Employee 3 | Q1 | Q2 | Q3 | Q4 |
Administrative | 48.0 | 40.0 | 8.0 | 0.0 | 0.0 | 18.0 | 14.0 | 4.0 | 0.0 | 0.0 | 15.0 | 13.0 | 2.0 | 0.0 | 0.0 | 15.0 | 13.0 | 2.0 | 0.0 | 0.0 |
Holiday | 126.0 | 0.0 | 44.0 | 20.0 | 62.0 | 56.0 | 0.0 | 16.0 | 8.0 | 32.0 | 35.0 | 0.0 | 14.0 | 6.0 | 15.0 | 35.0 | 0.0 | 14.0 | 6.0 | 15.0 |
PTO/LOA/Jury Duty | 66.0 | 0.0 | 66.0 | 0.0 | 0.0 | 24.0 | 0.0 | 24.0 | 0.0 | 0.0 | 21.0 | 0.0 | 21.0 | 0.0 | 0.0 | 21.0 | 0.0 | 21.0 | 0.0 | 0.0 |
Project A | 53.0 | 41.5 | 11.5 | 0.0 | 0.0 | 21.0 | 16.5 | 4.5 | 0.0 | 0.0 | 16.0 | 12.5 | 3.5 | 0.0 | 0.0 | 16.0 | 12.5 | 3.5 | 0.0 | 0.0 |
Project B | 212.5 | 135.0 | 77.5 | 0.0 | 0.0 | 82.5 | 53.0 | 29.5 | 0.0 | 0.0 | 65.0 | 41.0 | 24.0 | 0.0 | 0.0 | 65.0 | 41.0 | 24.0 | 0.0 | 0.0 |
Project C | 152.0 | 141.5 | 10.5 | 0.0 | 0.0 | 63.0 | 60.5 | 2.5 | 0.0 | 0.0 | 44.5 | 40.5 | 4.0 | 0.0 | 0.0 | 44.5 | 40.5 | 4.0 | 0.0 | 0.0 |
The following is the desired result:
Category Description | Employee Name | Q1 | Q2 | Q3 | Q4 | Total |
Administrative | Employee 1 | 14 | 4 | 0 | 0 | 18 |
Holiday | Employee 1 | 0 | 16 | 8 | 32 | 56 |
PTO/LOA/Jury Duty | Employee 1 | 0 | 24 | 0 | 0 | 24 |
Project A | Employee 1 | 16.5 | 4.5 | 0 | 0 | 21 |
Project B | Employee 1 | 53 | 29.5 | 0 | 0 | 82.5 |
Project C | Employee 1 | 60.5 | 2.5 | 0 | 0 | 63 |
Administrative | Employee 2 | 13 | 2 | 0 | 0 | 15 |
Holiday | Employee 2 | 0 | 14 | 6 | 15 | 35 |
PTO/LOA/Jury Duty | Employee 2 | 0 | 21 | 0 | 0 | 21 |
Project A | Employee 2 | 12.5 | 3.5 | 0 | 0 | 16 |
Project B | Employee 2 | 41 | 24 | 0 | 0 | 65 |
Project C | Employee 2 | 40.5 | 4 | 0 | 0 | 44.5 |
Administrative | Employee 3 | 13 | 2 | 0 | 0 | 15 |
Holiday | Employee 3 | 0 | 14 | 6 | 15 | 35 |
PTO/LOA/Jury Duty | Employee 3 | 0 | 21 | 0 | 0 | 21 |
Project A | Employee 3 | 12.5 | 3.5 | 0 | 0 | 16 |
Project B | Employee 3 | 41 | 24 | 0 | 0 | 65 |
Project C | Employee 3 | 40.5 | 4 | 0 | 0 | 44.5 |
I am impressed by the power of Power Query. It serves the result as a query table in the worksheet. The user-friendly interface with simple buttons that enables anyone to transform your data and get the job done. I strongly believe that every Excel user should try Power Query to deal with data transformation at least once. It is simply amazing. But, if you limit yourself to the commands found in the Power Query ribbon, you will be using less than half of the entire power available in the tool. If you are a VBA programmer and you are interested unleashing the entire power of Power Query, it is easy to transfer your VBA skills to learn the M language. Soon, you will find you have unlocked more power in Power Query.
Do you remember when you were learning VBA? You likely relied on the macro recorder and the intellisense code completion. Early in your learning journey, the macro recorder likely saved you hours.
When you use the ribbon icons in Power Query, Excel generates M code just like macro recorder in VBA. The M code can be revealed by using the Advanced Editor in Power Query.
Power Query ribbon
However, for me, the auto-generated M code is hard to read. The auto-generated code uses long variable names including spaces - quoted identifiers.
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Category Description", type text},
{"Dept. Total", type number}, {"Q1", type number}, {"Q2", type number}, {"Q3", Int64.Type},
{"Q4", Int64.Type}, {"Employee 1", type number}, {"Q1_1", type number}, {"Q2_2", type number},
{"Q3_3", Int64.Type}, {"Q4_4", Int64.Type}, {"Employee 2", type number}, {"Q1_5", type number},
{"Q2_6", type number}, {"Q3_7", Int64.Type}, {"Q4_8", Int64.Type}, {"Employee 3", type number},
{"Q1_9", type number}, {"Q2_10", type number}, {"Q3_11", Int64.Type}, {"Q4_12", Int64.Type}})
in
#"Changed Type"
As a traditional programmer, variable names should never have spaces. Besides, quotes surrounding the variable names make it really hard to understand if it is a text value or variable. So, I defined my first rule while coding in M language - Do not use spaces in variable names and change the auto generated variable names immediately. Basically, use
ChangedType
instead of #"Changed Type"
.When I first looked at the M code in the Advanced Editor, I noticed that it looks nothing like a procedural language. This makes sense since it is not a procedural but functional language.
The M code recorder builds the final code as steps that consist of variables. It uses the previous variable to create the next variable. Eventually it arrives at your final result. The recorded code looks like a strict structure. However, the M language does not care if you use white space or new lines while you are coding. This means what when you start manually coding in the editor, you can write your code just as you would write in VBA. The important rule is that you add a comma after each variable definition (except for the last variable).
This became my second rule - Treat M code like VBA code in the editor. It simply makes the code more readable. The code will have a familiar look for experienced VBA readers. You will also find it more natural to type code in this format.
The following code that I applied my rules works exactly the same with the auto-generated one.
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
PromotedHeaders = Table.PromoteHeaders(
Source,
[PromoteAllScalars=true]
),
ChangedType = Table.TransformColumnTypes(
PromotedHeaders,
{
{"Category Description", type text},
{"Dept. Total", type number}, {"Q1", type number}, {"Q2", type number}, {"Q3", Int64.Type}, {"Q4", Int64.Type},
{"Employee 1", type number}, {"Q1_1", type number}, {"Q2_2", type number}, {"Q3_3", Int64.Type}, {"Q4_4", Int64.Type},
{"Employee 2", type number}, {"Q1_5", type number}, {"Q2_6", type number}, {"Q3_7", Int64.Type}, {"Q4_8", Int64.Type},
{"Employee 3", type number}, {"Q1_9", type number}, {"Q2_10", type number}, {"Q3_11", Int64.Type}, {"Q4_12", Int64.Type}
}
)
in
ChangedType
My last rule is a little bit different that the first two rules - Find corresponding M functions for VBA methods.
VBA programmers are familiar with the For… Next loop in VBA. The equivalent functionality in M is List.Generate. You will see this structure used when cleaning the data in Bill’s problem. This way we can make a comparison between coding in two different programming languages.
Let's create a new blank query and copy and paste the following code in the Advanced Editor.
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
SourceWithHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
in
SourceWithHeaders
Now we have the source data from the worksheet range called "UglyData" in Power Query. It would create the same table if we loaded this query to a worksheet since we didn't do anything on it yet - except employee quarterly column headers since Power Query treats table columns as fields and field names must be unique, so Q1, Q2, Q3, and Q4 for each employee will be automatically suffixed with an incremental number following an underscore. So, let's start transforming data.
My intention is building two nested loops where the inner loop is looping through the category descriptions (rows) while the outer loop is looping through the employees (columns). The following VBA code shows how I would do the same thing if I was solving this problem by using VBA since the source data has predefined columns. So, I will be retrieving quarterly data for each employee and for each category from the corresponding rows and columns. It could be done in many different ways in VBA, and likely better than the code below, but I intentionally wrote it as below to make it look like the M code that I will write in this article.
VBA Code:
Sub TransformData()
Dim rngSource As Range
Dim rngCategoryNames As Range
Dim intEmployeeCount As Integer
Dim intCategoryCount As Integer
Dim i As Integer
Dim j As Integer
Dim strCategory As String
Dim strEmployee As String
Dim dblQ1 As Double
Dim dblQ2 As Double
Dim dblQ3 As Double
Dim dblQ4 As Double
Dim dblTotal As Double
Set rngSource = Sheet1.Range("UglyData")
intEmployeeCount = (rngSource.Columns.Count - 6) / 5
Set rngCategoryNames = rngSource.Columns(1).Offset(1).Resize(rngSource.Columns(1).Rows.Count - 1)
intCategoryCount = rngCategoryNames.Rows.Count
ReDim arrResult(intEmployeeCount * intCategoryCount, 6)
arrResult(0, 0) = "Category Description"
arrResult(0, 1) = "Employee Name"
arrResult(0, 2) = "Q1"
arrResult(0, 3) = "Q2"
arrResult(0, 4) = "Q3"
arrResult(0, 5) = "Q4"
arrResult(0, 6) = "Total"
For i = 0 To intEmployeeCount - 1
strEmployee = rngSource.Cells(1, i * 5 + 6)
' Note that we already used first-row index, 0
' for storing table headers.
' So, we start j counter from 1.
For j = 1 To intCategoryCount
strCategory = rngCategoryNames.Cells(j).Value
dblQ1 = rngSource.Cells(j + 1, i * 5 + 7)
dblQ2 = rngSource.Cells(j + 1, i * 5 + 8)
dblQ3 = rngSource.Cells(j + 1, i * 5 + 9)
dblQ4 = rngSource.Cells(j + 1, i * 5 + 10)
dblTotal = dblQ1 + dblQ2 + dblQ3 + dblQ4
arrResult(j + i * intCategoryCount, 0) = strCategory
arrResult(j + i * intCategoryCount, 1) = strEmployee
arrResult(j + i * intCategoryCount, 2) = dblQ1
arrResult(j + i * intCategoryCount, 3) = dblQ2
arrResult(j + i * intCategoryCount, 4) = dblQ3
arrResult(j + i * intCategoryCount, 5) = dblQ4
arrResult(j + i * intCategoryCount, 6) = dblTotal
Next j
Next i
Sheet1.Range("A30").Resize(UBound(arrResult, 1) + 1, UBound(arrResult, 2) + 1).Value = arrResult
End Sub
We obviously need to get the employee and category count as well as the category names from the source data. Let's modify the M code to create corresponding variables.
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
SourceWithHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
// First six columns are not employee related,
// and each employee has five columns including employee total and four quarterly columns
EmployeeCount = (Table.ColumnCount(Source) - 6) / 5,
CategoryNames = Table.Column(SourceWithHeaders, "Category Description"),
CategoryCount = List.Count(CategoryNames),
Result = List.Combine({{EmployeeCount, CategoryCount, CategoryNames}})
in
Result
The let expression simply encapsulates uniquely named variables consist of assigned and/or calculated values separated by comma. Each variable can be used in a subsequent expression to calculate another value defined as another variable. As you can see, we created three helper variables that we will use in the next steps below.
The Result variable is only for showing a list of these three helper variables by using List.Combine function in this step. In the screenshot below, I clicked on the last list element, CategoryNames, to reveal the list content consist of category names as it is also a list element in the generated list, and it won't be displayed like the first two primitive value types in the result list which are employee and category counts.
List.Combine to create a new list
Now we can build the outer loop by using the same algorithm as we did in VBA. The corresponding M function that will let us build a loop is List.Generate function. List.Generate doesn't look like the same For...Next statement in VBA but it is doing the iteration that we need.
Let's modify the M code to create a list of employee names retrieved from the source range. Remember that employee columns are placed in 7th, 12th, and 17th columns in the worksheet. We will use Table.ColumnNames function to get the column names in the source data, and return nth column's name from the list by calculating the n with
i * 5 + 1
formula as 6, 11, and 16. Notice that indexes start from zero in M language functions as different than the actual column index numbers in the worksheet, so ColumnNames{6}
is actually 7th column in the worksheet and so on.
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
SourceWithHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ColumnNames = Table.ColumnNames(SourceWithHeaders),
EmployeeCount = (Table.ColumnCount(Source) - 6) / 5,
CategoryNames = Table.Column(SourceWithHeaders, "Category Description"),
CategoryCount = List.Count(CategoryNames),
Result = List.Generate(
() => [i = 0],
each [i] < EmployeeCount,
each [i = [i] + 1],
each ColumnNames{[i] * 5 + 6}
)
in
Result
And the result:
List.Generate to create a loop
Let's take a closer look at the List.Generate function. It takes four function values as parameters called initial, condition, next, and selector.
- Initial function:
() => [i = 0]
- List.Generate starts with an initial function returning a record which contains the control variable field named i which is equal to 0 initially. This function is executed only once to assign starting values of the variables that will be used in the iteration and return a record. This record is then used in the scope of List.Generate function.
- Condition function:
each [i] < EmployeeCount
- List.Generate checks the condition function value on each iteration, and only execute the next iteration if this function returns true. Here we are using i field as the control value that will be incremented in the next function and continue looping while it is less than the employee count.
- Next function:
each [i = [i] + 1]
- This function sets new values for the variables that were assigned in the initial function which is only the control value hold by i field in our code and return as the updated record to be used in the scope.
- Selector function:
each ColumnNames{[i] * 5 + 6}
- Finally, the function that returns something meaningful from the iteration. We can omit selector function, in which case iteration will return the record assigned in next function contains the incremented value of i field. However, we need to select and return values from the corresponding columns of the source data, so we do not omit selector function in this implementation.
Power Query:
let
fnMyFunction = (x) => x + 5,
Result = fnMyFunction(10)
in
Result
(x) => x + 5
is the function in this assignment. (x) before the function definition symbol, which is =>, is where we include the expected parameters by the function, which is x in this function. The calculation after the function definition symbol is the return value of the function.Result = fnMyFunction(10)
is where we call the function as we assigned fnMyFunction variable as a function value. This query will return 15 as result.So, it is now more obvious what
() => [i = 1]
initial function is doing in our actual query. Without taking any parameter, it is creating a record contains i field which will be used as the parameter by the other three functions in the scope of the List.Generate function. However, we do not see the same function structure in the other three functions, but each keyword. Therefore, it is time to talk a little bit about each keyword in M language.The each keyword forms a function that returns an object by using the record in the current scope. And it is just a shortcut way of writing the actual function. So, the following function returns a boolean value according to i field value.
Power Query:
each [i] <= EmployeeCount
Let's rewrite the condition function without the each keyword and compare.
Power Query:
(scopeDataObject) => scopeDataObject[i] <= EmployeeCount
By using the each keyword, we simply create the function without using scopeDataObject function parameter as it is automatically injected into the function by the host function, which is List.Generate. This saves us using an unnecessary variable, makes the code more readable, and probably also some memory at the background.
According to the desired result, we need to list all category names for each employee name. We have the employee names list so far. If we simply repeat employee names for each category name, and eventually add corresponding quarterly columns to the result, then we will get what we need. However, to return multiple columns, we need more than a list. Look at the following samples below.
A List simply is a column in Power Query.
Power Query:
let
SourceList = {"E1", "E2", "E3"}
in
SourceList
This query creates the following list with three string values.
Create a basic list
A Record is on the other hand, a row consists of named fields.
Power Query:
let
SourceRecord = [Q1=1, Q2=2, Q3=4]
in
SourceRecord
And this query creates the following record with three named fields.
Create a basic record
Notice that we use curly braces to create a list, and square brackets to create a record that contains field definitions and values.
What happens if we used records as list items? I will also include a Name field in the record this time.
Power Query:
let
SourceList = {
[Name = "E1", Q1 = 1, Q2 = 2, Q3 = 4],
[Name = "E2", Q1 = 3, Q2 = 5, Q3 = 2],
[Name = "E3", Q1 = 5, Q2 = 1, Q3 = 3]
}
in
SourceList
This is a list of records just as expected, and beautiful! If we click on a cell, then we will see the content of the record.
List of records
Now, we have a list of records. All we need to do is convert the list to a table, and then expanding the record fields as table columns.
Power Query:
let
SourceList = {
[Name = "E1", Q1 = 1, Q2 = 2, Q3 = 4],
[Name = "E2", Q1 = 3, Q2 = 5, Q3 = 2],
[Name = "E3", Q1 = 5, Q2 = 1, Q3 = 3]
},
TableFromList = Table.FromList(
SourceList,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
Result = Table.ExpandRecordColumn(
TableFromList,
"Column1",
{"Name", "Q1", "Q2", "Q3"}
)
in
Result
That's it!
Table.FromList and Table.ExpandRecordColumn
Let's apply this logic to our actual query now. Instead of creating the inner loop for the category names right away, I will write the code to retrieve only the first-row values from the source data, which is the "Administrative" category row.
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
SourceWithHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ColumnNames = Table.ColumnNames(SourceWithHeaders),
EmployeeCount = (Table.ColumnCount(Source) - 6) / 5,
CategoryNames = Table.Column(SourceWithHeaders, "Category Description"),
CategoryCount = List.Count(CategoryNames),
MergedList = List.Generate(
() => [i = 0],
(i) => [i] < EmployeeCount,
each [i = [i] + 1],
each [
Category = CategoryNames{0},
Employee = ColumnNames{[i] * 5 + 6},
Q1 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 7}),
Q2 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 8}),
Q3 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 9}),
Q4 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 10}),
Total = List.Sum({Q1, Q2, Q3, Q4})
]
),
TableFromList = Table.FromList(MergedList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
Result = Table.ExpandRecordColumn(
TableFromList,
"Column1",
{"Category", "Employee", "Q1", "Q2", "Q3", "Q4", "Total"}
)
in
Result
Perfect! We have the result table for "Administrative" category. The next step is creating an inner loop to loop through all categories.
Category | Employee | Q1 | Q2 | Q3 | Q4 | Total |
Administrative | Employee 1 | 14 | 4 | 0 | 0 | 18 |
Administrative | Employee 2 | 13 | 2 | 0 | 0 | 15 |
Administrative | Employee 3 | 13 | 2 | 0 | 0 | 15 |
Before that, let's examine how we are building the record in selector function.
Power Query:
[
Category = CategoryNames{0},
Employee = ColumnNames{[i] * 5 + 6},
Q1 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 7}),
Q2 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 8}),
Q3 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 9}),
Q4 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 10}),
Total = List.Sum({Q1, Q2, Q3, Q4})
]
Category = CategoryNames{0}
- We set the Category field by selecting the first item in the CategoryNames list.Employee = ColumnNames{[i] * 5 + 6}
- We already know how we get the employee name from the specific column in source table's header.Q1 = Record.Field(SourceWithHeaders{0}, ColumnNames{[i] * 5 + 7})
- This is something new, because we are reading from the record which is the first row of the source data. Record.Field function takes two parameters; first parameter is the source, the first record (source table row) here, and the second parameter is the name of the record field (table column header).
Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="UglyData"]}[Content],
SourceWithHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ColumnNames = Table.ColumnNames(SourceWithHeaders),
EmployeeCount = (Table.ColumnCount(Source) - 6) / 5,
CategoryNames = Table.Column(SourceWithHeaders, "Category Description"),
CategoryCount = List.Count(CategoryNames),
MergedList = List.Generate(
() => [i = 0],
each [i] < EmployeeCount,
each [i = [i] + 1],
// We are changing the selector function to return a new list instead
each List.Generate(
() => [i = [i], j = 0],
each [j] < CategoryCount,
each [i = [i], j = [j] + 1],
each [
Category = CategoryNames{[j]},
Employee = ColumnNames{[i] * 5 + 6},
Q1 = Record.Field(SourceWithHeaders{[j]}, ColumnNames{[i] * 5 + 7}),
Q2 = Record.Field(SourceWithHeaders{[j]}, ColumnNames{[i] * 5 + 8}),
Q3 = Record.Field(SourceWithHeaders{[j]}, ColumnNames{[i] * 5 + 9}),
Q4 = Record.Field(SourceWithHeaders{[j]}, ColumnNames{[i] * 5 + 10}),
Total = List.Sum({Q1, Q2, Q3, Q4})
]
)
),
TableFromList = Table.FromList(
MergedList,
Splitter.SplitByNothing(),
null,
null,
ExtraValues.Error
),
TableExpandListColumn = Table.ExpandListColumn(
TableFromList,
"Column1"
),
TableExpandRecordColumn = Table.ExpandRecordColumn(
TableExpandListColumn,
"Column1",
{"Category", "Employee", "Q1", "Q2", "Q3", "Q4", "Total"},
{"Category Description", "Employee Name", "Q1", "Q2", "Q3", "Q4", "Total"}
),
Result = Table.TransformColumnTypes(
TableExpandRecordColumn,
{{"Q1", type number}, {"Q2", type number}, {"Q3", type number}, {"Q4", type number}, {"Total", type number}}
)
in
Result
Let's look at the parameters of the inner loop.
- Initial function:
() => [i = [i], j = 0]
- This time we have two value fields that will be used in the function scope.
- i is the outer loop control value. We assign i field value of the inner loop's record by using the outer loop's record control value. We are simply carrying over the employee counter inside the inner loop. It will be still incremented by the outer loop, but we need it in the inner loop to find the current employee columns, including employee name in the header and quarterly data.
- j is the inner loop control value, and it is initially 0. It will be incremented in the next function just like it happens in the outer loop.
- This time we have two value fields that will be used in the function scope.
- Condition function:
each [j] < CategoryCount
- In the inner loop, we are using j field as the control value that will be incremented in the next function and continue looping while it is less than category count. Nothing different than the outer loop as we did for incrementing i field value.
- Next function:
each [i = [i], j = [j] + 1]
- We initiated two fields in the scope record in the inner loop and we need to pass them to the next iteration. We keep the i field value same because it is just carrying over the outer loop control value, and we increment j field value by 1.
- Selector function:
Q1 = Record.Field(SourceWithHeaders{[j]}, ColumnNames{[i] * 5 + 7})
- Nothing much is different here. We get the category and employee names. We get the corresponding employee quarterly column values. As a difference, we are just selecting the current row instead of using only "Administrator" category by using 0 index.
Category Description | Employee Name | Q1 | Q2 | Q3 | Q4 | Total |
Administrative | Employee 1 | 14 | 4 | 0 | 0 | 18 |
Holiday | Employee 1 | 0 | 16 | 8 | 32 | 56 |
PTO/LOA/Jury Duty | Employee 1 | 0 | 24 | 0 | 0 | 24 |
Project A | Employee 1 | 16.5 | 4.5 | 0 | 0 | 21 |
Project B | Employee 1 | 53 | 29.5 | 0 | 0 | 82.5 |
Project C | Employee 1 | 60.5 | 2.5 | 0 | 0 | 63 |
Administrative | Employee 2 | 13 | 2 | 0 | 0 | 15 |
Holiday | Employee 2 | 0 | 14 | 6 | 15 | 35 |
PTO/LOA/Jury Duty | Employee 2 | 0 | 21 | 0 | 0 | 21 |
Project A | Employee 2 | 12.5 | 3.5 | 0 | 0 | 16 |
Project B | Employee 2 | 41 | 24 | 0 | 0 | 65 |
Project C | Employee 2 | 40.5 | 4 | 0 | 0 | 44.5 |
Administrative | Employee 3 | 13 | 2 | 0 | 0 | 15 |
Holiday | Employee 3 | 0 | 14 | 6 | 15 | 35 |
PTO/LOA/Jury Duty | Employee 3 | 0 | 21 | 0 | 0 | 21 |
Project A | Employee 3 | 12.5 | 3.5 | 0 | 0 | 16 |
Project B | Employee 3 | 41 | 24 | 0 | 0 | 65 |
Project C | Employee 3 | 40.5 | 4 | 0 | 0 | 44.5 |
Thanks for reading! If you enjoyed this article, then please let me know.