MasterBash
Board Regular
- Joined
- Jan 22, 2022
- Messages
- 63
- Office Version
- 365
- Platform
- Windows
Hello,
I am trying to open a commercial invoice in pdf format. Half the data is useless to me, so I am trying to get only the data I need. For obvious reasons, I can't upload the pdf file but I will try to give examples.
The data is both page and table. Here is a screenshot :
So under data, I select all 25 columns to try to re-arrange the data in the same column. Under Kind, I selected Table, because the data I need is under Table.
Let me give an example to what it looks like :
A giant mess.
What it should look like :
ProductName may contain numbers, but they are at most 3 digits long like : 54" or Name_347 or B_019, 160". ProductName may also contain spaces.
What I am currently doing :
Remove empty from column1 and add a formula to filter only rows with 2 characters.
Next, I merged column 2,3,4,5, used trim. Split with space as delimiter (left). Deleted column 2. Used trim again, split again with space (left). Deleted column.
So I was able to get rid of the 2 series of numbers and letters, now it looks like this :
How can I split those correctly ? I can't use split with a space as delimiter or a number as a delimiter, because the product name may contain a space, a number, " or _
An example :
This_is a 120" product 6.375 KG 0.475 144.78
This is a_product 8.000 NO 30 900.37
There are more useless information I will be removing eventually, such as KG, NO, L, PC, and I have other columns with useless information I wish to also remove, but I would like to learn how to split the columns correctly first. Especially between Product name and the rest.
I am stuck at this step. If anyone can help, I appreciate it.
Thank you.
I am trying to open a commercial invoice in pdf format. Half the data is useless to me, so I am trying to get only the data I need. For obvious reasons, I can't upload the pdf file but I will try to give examples.
The data is both page and table. Here is a screenshot :
So under data, I select all 25 columns to try to re-arrange the data in the same column. Under Kind, I selected Table, because the data I need is under Table.
Let me give an example to what it looks like :
Null | Random Word | AccountNumber | RandomAddress | RandomAddress | null | nulll |
Null | null | null | RandomNumber | RandomWord | Null | Null |
Client Name | RandomAddress | StreetNumber | RandomNumber | Null | RandomWord | Null |
AK | 467987 558734 ProductName | ProductWeight | null | ProductPrice | null | null |
Random Address | null | AccountNumber | null | null | null | null |
WI | 26464773Y1 | 754988922 ProductName | null | ProductWeight | null | ProductPrice |
IL | 4681167941 | null | null | 889846547 ProductName | ProductWeight | ProductPrice |
FL | 467864312 48858224 | ProductName | ProductWeight | ProductPrice | Null | Null |
Null | Random word | Null | Null | Null | Null | Null |
Random word | Random word | null | Random word | null | null | null |
KS | 86671998 77845224 ProductName | null | ProductWeight | null | null | ProductPrice |
KS | 843667132 | null | 6483138 ProductName | null | ProductWeight | ProductPrice |
A giant mess.
What it should look like :
US State | ProductName | ProductWeight | Quantity | ProductPrice |
ProductName may contain numbers, but they are at most 3 digits long like : 54" or Name_347 or B_019, 160". ProductName may also contain spaces.
What I am currently doing :
Remove empty from column1 and add a formula to filter only rows with 2 characters.
Next, I merged column 2,3,4,5, used trim. Split with space as delimiter (left). Deleted column 2. Used trim again, split again with space (left). Deleted column.
So I was able to get rid of the 2 series of numbers and letters, now it looks like this :
AK | ProductName ProductWeight ProductPrice | |||
WI | ProductName ProductWeight | ProductPrice | ||
IL | ProductName ProductWeight | ProductPrice | ||
FL | ProductName ProductWeight ProductPrice | |||
KS | ProductName ProductWeight | ProductPrice | ||
KS | ProductName ProductWeight | ProductPrice |
How can I split those correctly ? I can't use split with a space as delimiter or a number as a delimiter, because the product name may contain a space, a number, " or _
An example :
This_is a 120" product 6.375 KG 0.475 144.78
This is a_product 8.000 NO 30 900.37
Product name | Product Weight | Product Quantity | Product Price |
This_is a 120" product | 6.375 KG | 0.475 L | 144.78 |
This is a_product | 8.000 NO | 30 PC | 900.37 |
There are more useless information I will be removing eventually, such as KG, NO, L, PC, and I have other columns with useless information I wish to also remove, but I would like to learn how to split the columns correctly first. Especially between Product name and the rest.
I am stuck at this step. If anyone can help, I appreciate it.
Thank you.