Adding Columns for section title in a PDF

DRSteele

Well-known Member
Joined
Mar 31, 2015
Messages
2,654
Office Version
  1. 365
Platform
  1. Windows
Can someone please help me?

I used PQ to import a PDF (this one). I want to create two new columns, one for all the broad section headings (e.g., CANADIAN STOCKS) and another for the all sub-headings (e.g., Broad Market). Like this example I manually manipulated in Excel:

1603399597218.png


How can I do that in PQ?
 
Last edited:

Excel Facts

Can Excel fill bagel flavors?
You can teach Excel a new custom list. Type the list in cells, File, Options, Advanced, Edit Custom Lists, Import, OK
maybe use Conditional Columns with Contain from Fund Name then FillUp / FillDown (what will be easier for you) and after all filter null from second column
this is for CANADIAN STOCKS and Broad Market only. You need define more for each Conditional Column

Power Query:
let
    Source = Pdf.Tables(File.Contents("path_to\performance-can-en-ca.pdf"), [Implementation="1.1"]),
    Table001 = Source{[Id="Table001"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Table001, [PromoteAllScalars=true]),
    #"Added Conditional Column" = Table.AddColumn(#"Promoted Headers", "CANADIAN STOCKS", each if Text.Contains([Fund Name], "CANADIAN") then [Fund Name] else null),
    #"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "Broad Market", each if Text.Contains([Fund Name], "Broad Market") then [Fund Name] else null)
in
    #"Added Conditional Column1"
 
Last edited:
Upvote 0
some should be contains but some equals. Each mistake can be easy fixed
Power Query:
let
    Source = Pdf.Tables(File.Contents("C:\Users\shade\Desktop\performance-can-en-ca.pdf"), [Implementation="1.1"]),
    Table001 = Source{[Id="Table001"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Table001, [PromoteAllScalars=true]),
    #"Added Conditional Column" = Table.AddColumn(#"Promoted Headers", "CANADIAN STOCKS", each if Text.Contains([Fund Name], "CANADIAN") then [Fund Name] else if Text.Contains([Fund Name], "U.S. STOCKS") then [Fund Name] else null),
    #"Added Conditional Column1" = Table.AddColumn(#"Added Conditional Column", "Broad Market", each if Text.Contains([Fund Name], "Broad Market") then [Fund Name] else if [Fund Name] = "Dividend" then [Fund Name] else if Text.Contains([Fund Name], "Factor") then [Fund Name] else if [Fund Name] = "ESG" then [Fund Name] else if Text.Contains([Fund Name], "Sectors") then [Fund Name] else null)
in
    #"Added Conditional Column1"
 
Upvote 0
Sandy, thanks for your help.

That code does not really work. Even if it did, it requires user input. Next time I download that PDF and try to parse it the headings may have changed entirely - there might be new ones, altered ones, deleted ones.

What I need is a more general answer so that I can apply it to other PDFs. The PDF I used as an example has two annoying section headings (one broad one and the other a sub-heading) that span all the columns, which might be handy given that all the other columns in those rows are blanks.

I really hope this can be done.
 
Upvote 0
result of the code above (small part)

CANADIAN STOCKSBroad MarketFund NameTicker
Symbol
CANADIAN STOCKSnullCANADIAN STOCKSnull
nullBroad MarketBroad Marketnull
nullnulliShares Core S&P/TSX Capped Composite Index ETFXIC
nullnulliShares S&P/TSX 60 Index ETFXIU
nullnulliShares S&P/TSX Completion Index ETFXMD
nullnulliShares S&P/TSX SmallCap Index ETFXCS
nullDividendDividendnull
nullnulliShares Core MSCI Canadian Quality Dividend Index ETFXDIV
nullnulliShares S&P/TSX Composite High Dividend Index ETFXEI
nullnulliShares Canadian Select Dividend Index ETFXDV
nullnulliShares S&P/TSX Canadian Dividend Aristocrats Index ETFCDZ
nullFactorFactornull
nullnulliShares MSCI Min Vol Canada Index ETFXMV
nullnulliShares MSCI Multifactor Canada Index ETFXFC
nullnulliShares Canadian Growth Index ETFXCG
nullnulliShares Canadian Value Index ETFXCV
nullnulliShares Canadian Fundamental Index ETFCRQ
nullESGESGnull
nullnulliShares ESG Advanced MSCI Canada Index ETFXCSR
nullnulliShares ESG Aware MSCI Canada Index ETFXESG
nullnulliShares Jantzi Social Index ETFXEN
nullSectorsSectorsnull
nullnulliShares Equal Weight Banc & Lifeco ETFCEW
nullnulliShares S&P/TSX Capped Consumer Staples Index ETFXST
nullnulliShares S&P/TSX Capped Energy Index ETFXEG
nullnulliShares S&P/TSX Capped Financials Index ETFXFN
nullnulliShares S&P/TSX Capped Information Technology Index ETFXIT
nullnulliShares S&P/TSX Capped Materials Index ETFXMA
nullnulliShares S&P/TSX Capped REIT Index ETFXRE
nullnulliShares S&P/TSX Capped Utilities Index ETFXUT
U.S. STOCKSnullU.S. STOCKSnull
nullBroad MarketBroad Marketnull


did you "import" this pdf?
 
Upvote 0
That does not match the sample I provided. Each row needs a Type and a Style and the original headings need to be removed.
 
Upvote 0
it doesn't matter, you can define headers as you want in Conditional Column, btw you didn't read post#2
cc.png

how well do you know Power Query?

maybe it will help
 
Last edited:
Upvote 0
It does matter. Each fund has to have two new columns attached, one that says CANADIAN STOCKS, for example, and one that says DIVIDEND, for example. And the rows those field cam from must be deleted.
 
Upvote 0
eg. you need 2 new columns: Type and Style so name Conditional Columns suitably (post#7), define all things what you need in these columns then fill these columns Down then use filter for null on Ticker column. If in Type and Style will be any null value use FillUp
so... it doesn't matter ;)
I gave you an example only (working for me)
 
Upvote 0
I don't want every single record in the new Type column to say CANADIAN STOCKS. The heading changes to U.S.Stocks, then to INTERNATIONAL STOCKS, then to ... well, could be anything. The point is that it changes. The same goes for the new Style columns. I can't just fill on down. It has to fill down a word a correct number of times and then fill down a different word.
 
Upvote 0

Forum statistics

Threads
1,223,761
Messages
6,174,342
Members
452,555
Latest member
colc007

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top