Concatenate split row in table

Shortmeister1

Board Regular
Joined
Feb 19, 2008
Messages
212
Office Version
  1. 365
  2. 2016
  3. 2013
  4. 2010
  5. 2007
  6. 2003 or older
Platform
  1. Windows
Hi

I'm trying to set up and automate my bank statements which are imported from PDF tables. Unfortunately, the tables have row splits in unhelpful places.

So for example,
TypeDescription
DEBIT CARDTHE CAFE -
TRANSACTIONTHANK, YOU, ENFIELD GB
ONLINE TRANSACTIONCREDIT CARD PAYMENT ************************************************
CORRECTLY SPLIT OVER TWO ROWS

Of course it should look like this:

TypeDescription
DEBIT CARD TRANSACTIONTHE CAFE - THANK, YOU, ENFIELD GB
ONLINE TRANSACTIONCREDIT CARD PAYMENT ************************************************
CORRECTLY SPLIT OVER TWO ROWS

I'm not sure how to set up the logic to concatenate some entries but not others.

Anyone able to help?

Cheers
Martin
 

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
Hi @martinshort
I'm not quite there with an answer just yet, but I do have a question regarding the logic. Do your bank statements always have their Online Transaction descriptions begin with "CREDIT CARD PAYMENT", or a collection of standard payment types? That would be helpful to use to identify what should or should not be concatenated.
 
Upvote 0
Hi Johnny

Thanks for the reply. The full list (at present) is as follows:

ONLINE TRANSACTIONSingle Record
DEBIT CARD TRANSACTIONUsually split but I've found some examples where it isn't split.
AUTOMATED CREDITSometimes split between two rows.
ATM TRANSACTIONSingle Record

Once I've got a way of merging the two records, I'm quiteable to do the grunt work myself.

One thing that might prove useful is that the PaidIn and PaidOut columns will be null on the "duplicate" rows.

I think that solving this will vastly improve my problem solving in M.

Really appreciated if you can assist.

BR
Martin
 
Upvote 0
I don't know what you can/cannot provide due to the sensitivity of the data but I'm wondering if you would be able to take out the Column Detection OR Concatenate all of the Columns that you are provided with. So each item is, essentially in 1 row/1 column.

Then, in a new Custom Column, extract only the Transaction type with an If Text.Contains([The 1 Row/Column Colum],"Online Transaction", Comparer.OrdinalIgnoreCase) then "ONLINE TRANSACTION" and then else If until you get all of the transaction types identified. There should be no other results other than those 4 types in this new Custom Column.

Then in a new Custom Column, play with using a Text.Remove of some kind (there are a number of them) to remove the number of characters +1 of that Transaction Type custom column. This will result in only those characters that remain after removing the Transaction Type.

I hope this makes sense and gives you an idea on a path to take.
 
Upvote 0
Thank you very much for this.

I'll give it a whirl tomorrow and come crawling back if I can't get it to work. ?

Worst case, I can build a redacted example and share it, but I'll try and work through it myself first.
 
Upvote 0
Interestingly enough, I've never worked with PDF as a data source, and then I see your message and converse with you...cut to late this morning, my boss has an exciting challenge...Power Query with PDF. So, it's like you helped me get ready for it. HA
 
Upvote 0
Hi Johnny

Well it's taken a little while - not least as I had to manually type out the statement so that I could reconcile it. (**** the software test analyst inside me!!). I have finished.

You'd probably think it's a bit of code soup, but these are the steps I took:
  1. Use standard Power Query generated code to open all PDFs in folder.
  2. Promote Headers
  3. Remove extraneous rows e.g. extra header columns.
  4. Convert blank cells to nulls.
  5. Fill Down [Date] as there are loads of nulls in the date column.
  6. Remove Null Dates
  7. Clean Text
  8. Add base 1 index.
  9. Add a splitRow indicator: = Table.AddColumn(AddIndex, "SplitRow", each if [Balance] = null then 2 else 1)
  10. Add a MergeRowRequired column, which basically offsets the split row indicator so that I can accurately track the first & second rows in a record where they exist
    = Table.ExpandListColumn(Table.AddColumn(SplitRowId, "MergeRowReq", each List.Range(SplitRowId[SplitRow],[Index],1)), "MergeRowReq")
  11. List the next row's transaction type value in the "previous" row.
    = Table.ExpandListColumn(Table.AddColumn(MrgRequired, "NextRowType", each List.Range(MrgRequired[Type],[Index],1)), "NextRowType")
  12. Concatenate this value to find the final output. (Something I didn't know is that you can't concatenate a field to a null field, so I had to nest an IF statement to take care of any nulls.)
    = Table.AddColumn(NextRowType, "TypeFinal", each if [MergeRowReq] = 1 then [Type] else ([Type] & " " & (if [NextRowType] = null then "" else [NextRowType])))
  13. Repeat steps 11 & 12 for the Transaction Description.
  14. Add a whole load of steps to remove columns, clean up, set variable types.
  15. Spend several hours preparing testing and a few minutes actually running the tests.
Well it worked perfectly. Admittedly, I haven't tested it on next month's bank statement - fingers crossed that it won't cause any problems.

a) This should make recording and analysing my bank statements much easier and will help significantly if the taxman decides to audit me and my little consultancy.
b) The amount of stuff I've learned doing this little side project should help a huge amount in my day job.

So thank you hugely for your assistance, and do feel that you can comment if you think that I've made a dog's dinner of any of the steps.

Best regards
Martin

Code in full should it prove useful to anyone.
Power Query:
let
    Source = Folder.Files("D:\Shortie Documents\Household\Bank"),
    FilterPDFOnly = Table.SelectRows(Source, each ([Extension] = ".pdf")),
    FilteredHiddenFiles = Table.SelectRows(FilterPDFOnly, each [Attributes]?[Hidden]? <> true),
    InvokeCustomFunction = Table.AddColumn(FilteredHiddenFiles, "Transform File", each #"Transform File"([Content])),
    #"Renamed ColumnsName" = Table.RenameColumns(InvokeCustomFunction, {"Name", "Source.Name"}),
    RemovedColumns = Table.SelectColumns(#"Renamed ColumnsName", {"Source.Name", "Transform File"}),
    ExpandedTableColumn = Table.ExpandTableColumn(RemovedColumns, "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    FilteredChoosePDFTables = Table.SelectRows(ExpandedTableColumn, each ([Id] = "Table002" or [Id] = "Table004" or [Id] = "Table006" or [Id] = "Table008")),
    ExpandedData = Table.ExpandTableColumn(FilteredChoosePDFTables, "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6"}),
    RemovedColumns1 = Table.RemoveColumns(ExpandedData,{"Source.Name", "Id", "Name", "Kind"}),
    PromotedHeaders = Table.PromoteHeaders(RemovedColumns1, [PromoteAllScalars=true]),
    RemoveExtraneousRows = Table.SelectRows(PromotedHeaders, each ([Type] <> "BROUGHT FORWARD") and ([Date] <> "Date")),
    ConvertBlanksToNull = Table.ReplaceValue(RemoveExtraneousRows,"",null,Replacer.ReplaceValue,{"Date"}),
    FilledDownDate = Table.FillDown(ConvertBlanksToNull,{"Date"}),
    RemoveNullDate = Table.SelectRows(FilledDownDate, each ([Date] <> null)),
    CleanText = Table.TransformColumns(RemoveNullDate,{{"Type", Text.Clean, type text}}),
    CleanType = Table.ReplaceValue(CleanText,"DEBIT CARDTRANSACTION","DEBIT CARD TRANSACTION",Replacer.ReplaceText,{"Type"}),
    AddIndex = Table.AddIndexColumn(CleanType, "Index", 1, 1, Int64.Type),
    SplitRowId = Table.AddColumn(AddIndex, "SplitRow", each if [Balance] = null then 2 else 1) //Two means it's the second line of the split row
,
    MrgRequired = Table.ExpandListColumn(Table.AddColumn(SplitRowId, "MergeRowReq", each List.Range(SplitRowId[SplitRow],[Index],1)), "MergeRowReq") //Move the splitRowId up 1 line.
,
    NextRowType = Table.ExpandListColumn(Table.AddColumn(MrgRequired, "NextRowType", each List.Range(MrgRequired[Type],[Index],1)), "NextRowType"),
    TypeFinal = Table.AddColumn(NextRowType, "TypeFinal", each if [MergeRowReq] = 1 then [Type] else ([Type] & " " & (if [NextRowType] = null then "" else [NextRowType]))) //Finally concatenate the two cells.
,
    NextRowDesc = Table.ExpandListColumn(Table.AddColumn(TypeFinal, "NextRowDesc", each List.Range(MrgRequired[Description],[Index],1)), "NextRowDesc"),
    DescFinal = Table.AddColumn(NextRowDesc, "DescFinal", each if [MergeRowReq] = 1 then [Description] else [Description] & " " & [NextRowDesc]),
    RemoveNullRows = Table.SelectRows(DescFinal, each ([Balance] <> null)),
    #"Removed Columns" = Table.RemoveColumns(RemoveNullRows,{"Type", "Description", "Index", "SplitRow", "MergeRowReq", "NextRowType", "NextRowDesc"}),
    ReorderedColumns = Table.ReorderColumns(#"Removed Columns",{"Date", "TypeFinal", "DescFinal", "Paid in", "Paid out", "Balance"}),
    RenamedColumns = Table.RenameColumns(ReorderedColumns,{{"TypeFinal", "Type"}, {"DescFinal", "Description"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Paid out", Currency.Type}, {"Balance", Currency.Type}})
in
    ChangedType
 
Upvote 0

Forum statistics

Threads
1,225,406
Messages
6,184,801
Members
453,260
Latest member
kilpbsk

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top