Getting data in correct column from pdf file

MasterBash

Board Regular
Joined
Jan 22, 2022
Messages
70
Office Version
  1. 365
Platform
  1. Windows
Hello,

I am trying to open a commercial invoice in pdf format. Half the data is useless to me, so I am trying to get only the data I need. For obvious reasons, I can't upload the pdf file but I will try to give examples.

The data is both page and table. Here is a screenshot :

source.png


So under data, I select all 25 columns to try to re-arrange the data in the same column. Under Kind, I selected Table, because the data I need is under Table.

Let me give an example to what it looks like :

NullRandom WordAccountNumberRandomAddressRandomAddressnullnulll
NullnullnullRandomNumberRandomWordNullNull
Client NameRandomAddressStreetNumberRandomNumberNullRandomWordNull
AK467987 558734 ProductNameProductWeightnullProductPricenullnull
Random AddressnullAccountNumbernullnullnullnull
WI26464773Y1754988922 ProductNamenullProductWeightnullProductPrice
IL4681167941nullnull889846547 ProductNameProductWeightProductPrice
FL467864312 48858224ProductNameProductWeightProductPriceNullNull
NullRandom wordNullNullNullNullNull
Random wordRandom wordnullRandom wordnullnullnull
KS86671998 77845224 ProductNamenullProductWeightnullnullProductPrice
KS843667132null6483138 ProductNamenullProductWeightProductPrice

A giant mess.

What it should look like :

US StateProductNameProductWeightQuantityProductPrice

ProductName may contain numbers, but they are at most 3 digits long like : 54" or Name_347 or B_019, 160". ProductName may also contain spaces.

What I am currently doing :
Remove empty from column1 and add a formula to filter only rows with 2 characters.
Next, I merged column 2,3,4,5, used trim. Split with space as delimiter (left). Deleted column 2. Used trim again, split again with space (left). Deleted column.

So I was able to get rid of the 2 series of numbers and letters, now it looks like this :

AKProductName ProductWeight ProductPrice
WIProductName ProductWeightProductPrice
ILProductName ProductWeightProductPrice
FLProductName ProductWeight ProductPrice
KSProductName ProductWeightProductPrice
KSProductName ProductWeightProductPrice

How can I split those correctly ? I can't use split with a space as delimiter or a number as a delimiter, because the product name may contain a space, a number, " or _

An example :
This_is a 120" product 6.375 KG 0.475 144.78
This is a_product 8.000 NO 30 900.37

Product nameProduct WeightProduct QuantityProduct Price
This_is a 120" product6.375 KG0.475 L144.78
This is a_product8.000 NO30 PC900.37

There are more useless information I will be removing eventually, such as KG, NO, L, PC, and I have other columns with useless information I wish to also remove, but I would like to learn how to split the columns correctly first. Especially between Product name and the rest.

I am stuck at this step. If anyone can help, I appreciate it.

Thank you.
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Found a solution. Instead of merging only 4 columns, I merged them all except for the first one. After splitting twice from the left using a space as delimiter, I just started splitting from the right with a space as delimiter
 
Upvote 0
Solution

Forum statistics

Threads
1,224,818
Messages
6,181,152
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top