Enhancing the Performace of Query for huge excel data

Chandna

New Member
Joined
Jan 16, 2015
Messages
6
Hi,
We do have 1000 excel files in a folder and each excel contains one spreadsheet with 15000 rows of data. We were reading the data from these 1000 files into a single excel sheet using POWER QUERY. For this we got a memory out of bound exception due to limited number of rows in a single excel sheet.:mad:
Our motive was,
STEP1: To read the data from all files and apply text filter on the entire data together.
STEP2: To find the count of the filtered rows (maximum will be around 2000).
There are additional functionality is to be performed, but this is currently out of scope of this query. Since, we are stuck at the initial step itself.
Alternative way,
We thought of another way wherein we thought of the following steps:
STEP1: To apply text filter on the data of each excel file individually.
STEP2: Combine the filtered data from every sheet into a single excel sheet.
STEP3: Find the number of rows.
Could you please tell me how to loop through each file in the folder and apply filter over each. Basically we want to know if the Step 1 is achievable using POWER QUERY.
If not kindly, suggest any other way to find the solution for the problem statement.:confused:
 
Last edited:

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
To less informations to answer your question but...yes PQ can do this (both your ways)
What is a structure of your excel files (sheets, table)? Can you show a part of these files?
Many more questions.
 
Upvote 0
Upvote 0
Hi Bill,
Please find below the sample input.

[TABLE="width: 500"]
<tbody>[TR]
[TD]ROW
[/TD]
[TD]SESSION
[/TD]
[/TR]
[TR]
[TD]1
[/TD]
[TD]Setting Context Data for export 805376005 : TRUE
[/TD]
[/TR]
[TR]
[TD]2
[/TD]
[TD]Setting Context Data for export 537203808 : TRUE
[/TD]
[/TR]
[TR]
[TD]3
[/TD]
[TD]PS Filter constructor
[/TD]
[/TR]
[TR]
[TD]4
[/TD]
[TD]ReplaceCardiacTagsFilter: Created new ReplaceCardiacTags Filter
[/TD]
[/TR]
[TR]
[TD]5
[/TD]
[TD]ReplaceCardiacTagsFilter: Created new ReplaceCardiacTags Filter
[/TD]
[/TR]
[TR]
[TD]6
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]7
[/TD]
[TD]Setting ContextData for MR Export converter
[/TD]
[/TR]
[TR]
[TD]8
[/TD]
[TD]Setting Context Data for export 537203795 : FALSE
[/TD]
[/TR]
[TR]
[TD]9
[/TD]
[TD]Setting Context Data for export 805376005 : TRUE
[/TD]
[/TR]
[TR]
[TD]10
[/TD]
[TD]Application: finished
[/TD]
[/TR]
[TR]
[TD]11
[/TD]
[TD]Setting Context Data for export 537203808 : TRUE
[/TD]
[/TR]
[TR]
[TD]12
[/TD]
[TD]PS Filter constructor
[/TD]
[/TR]
[TR]
[TD]13
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]14
[/TD]
[TD]$Series Completed Arrived
[/TD]
[/TR]
[TR]
[TD]15
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]16
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]17
[/TD]
[TD]Rxed study completed event
[/TD]
[/TR]
[TR]
[TD]18
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]19
[/TD]
[TD]ServletManager.InternalCleanup called, starting usage calculation
[/TD]
[/TR]
[TR]
[TD]20
[/TD]
[TD]Launched application: AVA, PID: 1172
[/TD]
[/TR]
[TR]
[TD]21
[/TD]
[TD]ServletManager.InternalCleanup called, total usage = 0, reduced = 0, pp = 0, concurrent users = 0
[/TD]
[/TR]
[TR]
[TD]22
[/TD]
[TD]New dummy servlet returned for user 8e8f76ae49f892d8. weight == 1.5
[/TD]
[/TR]
[TR]
[TD]23
[/TD]
[TD]Application: finished
[/TD]
[/TR]
[TR]
[TD]24
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]25
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]26
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]27
[/TD]
[TD]Application: finished
[/TD]
[/TR]
[TR]
[TD]28
[/TD]
[TD]$Series Completed Arrived
[/TD]
[/TR]
[TR]
[TD]29
[/TD]
[TD]Peeked at event from incoming queue
[/TD]
[/TR]
[TR]
[TD]30
[/TD]
[TD]Removed event from queue
[/TD]
[/TR]
</tbody>[/TABLE]

The input is provided in the form of excel sheet. There are 1000 such excel sheet, in 1 folder, each with 15000 rows. For e.g. sake we have provided you with a sample data of 30 rows.

The data to be filtered from all the sheets are- "Application: started" OR "Application: finished"

The first approach we implemented that showed us memory out of bound exception is as below:

let
Source = Folder.Files("F:\PickingDataFromXML\SampleData"),
WorkBookContent = Table.AddColumn(Source, "Custom", each Excel.Workbook([Content])),
ExpandWorkBookContent = Table.ExpandTableColumn(WorkBookContent , "Custom", {"Data", "Kind"}, {"Custom.Data", "Custom.Kind"}),
WorkSheetData = Table.SelectRows(ExpandWorkBookContent , each ([Custom.Kind] = "Sheet")),
HeaderRow = Table.AddColumn(WorkSheetData , "Custom", each if [Custom.Kind]<>"Table" then Table.PromoteHeaders([Custom.Data]) else [Custom.Data]),
ExpandTable = Table.ExpandTableColumn(HeaderRow , "Custom", {"Text"}, {"Custom.Text"}),
SelectedColumn = Table.SelectColumns(ExpandTable ,{"Custom.Text"}),
FilteredRows = Table.SelectRows(SelectedColumn, each Text.Contains([Custom.Text], "Application: started") or Text.Contains([Custom.Text], "Application: finished")),
AddedIndex = Table.AddIndexColumn(FilteredRows, "Index", 0, 1)
in
AddedIndex


So, we are planning to go for the alternative approach.
Alternative way,
Following are the steps:
STEP1: To apply text filter on the data of each excel file individually.
STEP2: Combine the filtered data from every sheet into a single excel sheet.
STEP3: Find the number of rows.
Could you please tell me how to loop through each file in the folder and apply filter over each. Basically we want to know if the Step 1 is achievable using POWER QUERY.
If not kindly, suggest any other way to find the solution for the problem statement.
 
Last edited:
Upvote 0
ooops...sorry....I did not read your STEP3 :-(.
So if you want count "Application: started", Application: finished" use this code below
Main query (FunctionUsing2)

let
Source = Folder.Files("C:\SampleData"),
RemOthCol = Table.SelectColumns(Source,{"Name", "Folder Path"}),
ChOrder = Table.ReorderColumns(RemOthCol,{"Folder Path", "Name"}),
MerCol = Table.CombineColumns(ChOrder,{"Folder Path", "Name"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Files"),
TblFun = Table.AddColumn(MerCol, "Function", each Fun2([Files])),
TblFromVal = Table.FromValue(List.Sum(TblFun[Function])),
ChNameCol = Table.RenameColumns(TblFromVal,{{"Value", "Start/End Count"}})
in
ChNameCol


Function for main query (Fun2)

let
myFunction = (file as text) =>
let
Source = Excel.Workbook(File.Contents(file)),
Filter1 = Table.SelectRows(Source, each [Kind] = "Sheet"),
PromHead = Table.AddColumn(Filter1, "Custom", each Table.PromoteHeaders([Data])),
RemOthCol = Table.SelectColumns(PromHead,{"Custom"}),
Count1 = Table.AddColumn(RemOthCol, "Custom.1", each Table.RowCount(Table.SelectRows([Custom], each ([SESSION]="Application: started") or ([SESSION] = "Application: finished")))),
RemCol = Table.SelectColumns(Count1,{"Custom.1"}){0}[Custom.1]
in
RemCol
in
myFunction
 
Upvote 0
Hi Bill, Where do we need to place the function Fun2. I am new to functions in Power Query. Please guide.
 
Upvote 0
Hi Chandna :-)

Excel window: Power Query -> From Other Sources -> Blank query.
Power Query window: View -> Advanced Editor
Advanced Editor window: Remove all lines then paste Fun2 code and click "Ok"
Power Query window: Change the name of the query to "Fun2".
Repeat steps above for "FunctionUsing2"

Hope this help.
 
Upvote 0
Can we take the data from the current worksheet and display the desired result into another sheet.
e.g. Data in Sheet1(current worksheet):
[TABLE="width: 500"]
<tbody>[TR]
[TD]ROW
[/TD]
[TD]SESSION
[/TD]
[/TR]
[TR]
[TD]1
[/TD]
[TD]Setting Context Data for export 805376005 : TRUE
[/TD]
[/TR]
[TR]
[TD]2
[/TD]
[TD]Setting Context Data for export 537203808 : TRUE
[/TD]
[/TR]
[TR]
[TD]3
[/TD]
[TD]PS Filter constructor
[/TD]
[/TR]
[TR]
[TD]4
[/TD]
[TD]ReplaceCardiacTagsFilter: Created new ReplaceCardiacTags Filter
[/TD]
[/TR]
[TR]
[TD]5
[/TD]
[TD]ReplaceCardiacTagsFilter: Created new ReplaceCardiacTags Filter
[/TD]
[/TR]
[TR]
[TD]6
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]7
[/TD]
[TD]Setting ContextData for MR Export converter
[/TD]
[/TR]
[TR]
[TD]8
[/TD]
[TD]Setting Context Data for export 537203795 : FALSE
[/TD]
[/TR]
[TR]
[TD]9
[/TD]
[TD]Setting Context Data for export 805376005 : TRUE
[/TD]
[/TR]
[TR]
[TD]10
[/TD]
[TD]Application: finished
[/TD]
[/TR]
[TR]
[TD]11
[/TD]
[TD]Setting Context Data for export 537203808 : TRUE
[/TD]
[/TR]
[TR]
[TD]12
[/TD]
[TD]PS Filter constructor
[/TD]
[/TR]
[TR]
[TD]13
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]14
[/TD]
[TD]$Series Completed Arrived
[/TD]
[/TR]
[TR]
[TD]15
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]16
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]17
[/TD]
[TD]Rxed study completed event
[/TD]
[/TR]
[TR]
[TD]18
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]19
[/TD]
[TD]ServletManager.InternalCleanup called, starting usage calculation
[/TD]
[/TR]
[TR]
[TD]20
[/TD]
[TD]Launched application: AVA, PID: 1172
[/TD]
[/TR]
[TR]
[TD]21
[/TD]
[TD]ServletManager.InternalCleanup called, total usage = 0, reduced = 0, pp = 0, concurrent users = 0
[/TD]
[/TR]
[TR]
[TD]22
[/TD]
[TD]New dummy servlet returned for user 8e8f76ae49f892d8. weight == 1.5
[/TD]
[/TR]
[TR]
[TD]23
[/TD]
[TD]Application: finished
[/TD]
[/TR]
[TR]
[TD]24
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]25
[/TD]
[TD]Loading the users database
[/TD]
[/TR]
[TR]
[TD]26
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]27
[/TD]
[TD]Application: finished
[/TD]
[/TR]
[TR]
[TD]28
[/TD]
[TD]$Series Completed Arrived
[/TD]
[/TR]
[TR]
[TD]29
[/TD]
[TD]Peeked at event from incoming queue
[/TD]
[/TR]
[TR]
[TD]30
[/TD]
[TD]Removed event from queue
[/TD]
[/TR]
</tbody>[/TABLE]

Power Query applied on data of Current Sheet (Sheet 1)
let
Source = Excel.CurrentWorkbook(){[Name="FilteredTable"]}[Content],
IndexColumn = Table.AddIndexColumn(Source, "Index", 0, 1),
FilteredRows = Table.SelectRows(IndexColumn , each (Text.Contains([Text], "Application: started"))),
in
FilteredRows

OUTPUT : write the result in Sheet2 (a new sheet).
[TABLE="width: 500"]
<tbody>[TR]
[TD]ROW
[/TD]
[TD]SESSION
[/TD]
[/TR]
[TR]
[TD]1
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]2
[/TD]
[TD]Application: started
[/TD]
[/TR]
[TR]
[TD]3
[/TD]
[TD]Application: started
[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
You have to select your result table and cut it, then you can paste it to sheet2 and refresh (right mouse click or from workbook queries window).
Is your table has two columns (ROW, SESSION) or one (SESSION)? This is not clear to me.
 
Upvote 0

Forum statistics

Threads
1,224,060
Messages
6,176,145
Members
452,707
Latest member
laplajewelry

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top