PQ to extract only dates or numbers per row

brawnystaff

Board Regular
Joined
Aug 9, 2012
Messages
109
Office Version
  1. 365
I OCR'ed a PDF containing a printout from ledger data and put it into an Excel worksheet. However, the OCR process puts the data in differnet columns depending on the page, as there was alot of text in it that I do not need.

The PDF ledger contained about 10 columns, and the data I want is either a date or number (sometimes with a decimal, sometimes not) in their respective cell. Other cells that contain either words or words with letters I do not need.

Is there a way in Power Query to extract out only cells with dates or numbers for each row? The PQ editor is showing 18 columns, as the "junk text" is inner-mixed in each row depending on the page. Looking to only extract cells with dates or numbers to only get the 10 columns. Any ideas? Thanks.
 

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
I am not sure what are the other options to do this, but this is my attempt:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Get the column names as a list
    Fields = Table.ColumnNames(Source),
    // Transform operation for the column transformation
    // Basically checking each field against its value type 
    Transformer = List.Transform(Fields, each {_, each if Value.Is(_, type date) or Value.Is(_, type datetime) or Value.Is(_, type number) then _ else null }),
    // Apply the transformation to mark invalid values as null 
    MarkValids = Table.TransformColumns(Source, Transformer),
    // Filter the columns to pick values other than null
    // Assumption is each row has the same number of columns with valid values 
    Filter = Table.TransformRows(MarkValids, each 
        List.Accumulate({0..Record.FieldCount(_) - 1},
                {},
                (state, current) => let 
                                        Field = Record.FieldNames(_){current},
                                        This = Record.SelectFields(_, Field),
                                        Result = if Record.Field(_, Field) is null then state else state & Record.ToList(This) 
                                    in
                                        Result
        )
    ),
    // Create a list of filtered records
    // Generating dynamic field/column names 
    List = List.Transform(Filter, each Record.FromList(_, List.Transform({1..List.Count(_)}, each "field" & Number.ToText(_)))),
    // Create a table contains the filtered records
    CreateTable = Table.FromList(List, Splitter.SplitByNothing(), {"Column"}, null, ExtraValues.Error),
    // Expand the records in table rows
    // Again, using the dynamically created field names
    Result = Table.ExpandRecordColumn(CreateTable, "Column", List.Transform({1..Record.FieldCount(CreateTable{0}[Column])}, each "field" & Number.ToText(_)))
in
    Result

I have this sample table:
field1field2field3field4field5field6field7
1/1/2012c3asdasd3asdasd
d1/2/20125asdfsdewr2
1/3/2012sff613werasdasd
gsdasd1/4/2012sdd7cxcz55
1/5/2012ertsdfzxc54ads
wre1/6/2012dfgsdfdfg799
1/7/2012fggfdfgada88asdasd
rtdfg1/8/2012sdfsfd22


And the code generates the following output.

1689639286084.png
 
Upvote 0

Forum statistics

Threads
1,223,948
Messages
6,175,565
Members
452,652
Latest member
eduedu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top