recognise format of code in Power Query to extract

ooptennoort

Board Regular
Joined
Mar 29, 2021
Messages
67
Office Version
  1. 365
Platform
  1. Windows
>10k of rows of short sentences have within then, usually, a code. I need to extract this code, but how do I get PQ to recognise its format, which is:

230F-115-04
301F-115-05
30A-115-26
30F-1180-02
30F-118-03
30F-1252-04
130H-1252-05
30F-1252-18
30F-126-01
40G-127-01
30F-108-01
31F-299-91
30F-130-01
30F-131-06
30F-132-01

I've tried with text.before and between and after, but have never gotten just -and all of- the code. Very challenging. Help is very much appreciated (if want I want is possible at all ;) ).
 

Excel Facts

Convert text numbers to real numbers
Select a column containing text numbers. Press Alt+D E F to quickly convert text to numbers. Faster than "Convert to Number"
Hi @ooptennoort ,
Would this do the trick for you?
Book1
AB
1CommentGetCodes
2bla blabla bla 230F-115-04 blablablabla blablabla blablablabla blablabla blabla230F-115-04
3blabla blablablablablabla blablabla bla 301F-115-05 blabla bla301F-115-05
4bla bla bla bla bla bla bla blablablablablablablablablabla 30A-115-26 blablablablablablablabla blablablablablabla blabla30A-115-26
5blablablablablabla 30F-1180-02 blablablablablablablablablablablablabla30F-1180-02
6blablablablablablablablablabla 30F-118-03 blablablabla 130H-1252-0530F-118-03, 130H-1252-05
7blablablablablablablablablablablabla 30F-1252-04 blablablablablablablablablabla30F-1252-04
8blablablabla400 blablablablabla 130H-1252-05 blablabla130H-1252-05
95000 blablablablablablablablablablablablablablablablablablabla 30F-1252-18 blablablablablablablablablablablablablablablablabla30F-1252-18
10blabla 30F-126-01 blablablablablablablablabla30F-126-01
11blablablablablablablablablablablablabla 40G-127-01 blabla40G-127-01
12blablablablablablablablablablablablabla 30F-108-01 blablablabla30F-108-01
13blablabla 31F-299-91 blablablablablablablablablablablablablablablablabla31F-299-91
14blablablablablablablablablablablablablablablablablablabla 30F-130-01 blablablablablablablablablablablablablablabla30F-130-01
15blablablablablablablablablablablablablablablabla 30F-131-06 blablablablablablablablabla30F-131-06
16301F-115-05 blabla 30F-132-01 blablablablablablablablablablablabla 105301F-115-05, 30F-132-01
1730F-130-01 blablabla30F-130-01
18blablabla 30F-130-0130F-130-01
Sheet2

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Comment", type text}}),
    GetCodes = Table.AddColumn(#"Changed Type", "GetCodes", each let Words = Text.Split([Comment]," "),
Codes = List.Select(Words, each List.AllTrue({ Text.Contains(_, "-"),
Text.PositionOfAny(_, {"0".."9"} ) <> -1,
Text.PositionOfAny(_, {"A".."Z"} ) <> -1})),
Combine = Text.Combine(Codes, ", ")
in
Combine)
in
    GetCodes
Wow!! Certainly! With the one exception that more often than not a code is followed by a comma (no space). But this I can easily correct of course (with interface). I'll try Bo_Ry's suggestion too, now (before I mark best solution). Thank you!!
 
Upvote 0
Use data from GraH

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    Regex = Table.AddColumn(Source, "Regex", each  Text.Combine(Json.Document(Web.Page(
    "<meta http-equiv='X-UA-Compatible' content='IE=edge'> <script>
    document.write(JSON.stringify('"& [Comment] &"'.match(/\d+[A-Z]-\d+-\d+/g)));
    </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}),", "))
in
    Regex


Power Query Regex.xlsx
AB
1CommentRegex
2bla blabla bla 230F-115-04 blablablabla blablabla blablablabla blablabla blabla230F-115-04
3blabla blablablablablabla blablabla bla 301F-115-05 blabla bla301F-115-05
4bla bla bla bla bla bla bla blablablablablablablablablabla 30A-115-26 blablablablablablablabla blablablablablabla blabla30A-115-26
5blablablablablabla 30F-1180-02 blablablablablablablablablablablablabla30F-1180-02
6blablablablablablablablablabla 30F-118-03 blablablabla 130H-1252-0530F-118-03, 130H-1252-05
7blablablablablablablablablablablabla 30F-1252-04 blablablablablablablablablabla30F-1252-04
8blablablabla400 blablablablabla 130H-1252-05 blablabla130H-1252-05
95000 blablablablablablablablablablablablablablablablablablabla 30F-1252-18 blablablablablablablablablablablablablablablablabla30F-1252-18
10blabla 30F-126-01 blablablablablablablablabla30F-126-01
11blablablablablablablablablablablablabla 40G-127-01 blabla40G-127-01
12blablablablablablablablablablablablabla 30F-108-01 blablablabla30F-108-01
13blablabla 31F-299-91 blablablablablablablablablablablablablablablablabla31F-299-91
14blablablablablablablablablablablablablablablablablablabla 30F-130-01 blablablablablablablablablablablablablablabla30F-130-01
15blablablablablablablablablablablablablablablabla 30F-131-06 blablablablablablablablabla30F-131-06
16301F-115-05 blabla 30F-132-01 blablablablablablablablablablablabla 105301F-115-05, 30F-132-01
1730F-130-01 blablabla30F-130-01
18blablabla 30F-130-0130F-130-01
Sheet1
Wow!! This method is even better in that it allows (omits) for comma's that follow the code (no space)!! Thank you. I'll try to find time to read Imke's article next week... Though this is the best solution, it would be really best if I can understand the code ;p Thank you (both!)!! Amazing!
 
Upvote 0
The most important is this line

document.write(JSON.stringify('"& [Comment] &"'.match(/\d+[A-Z]-\d+-\d+/g)))

where \d+[A-Z]-\d+-\d+ is regex Pattern to capture string

\d+ Any number more than 1 character
[A-Z] Any from A-Z
- dash

 
Upvote 0
Wow!! Certainly! With the one exception that more often than not a code is followed by a comma (no space). But this I can easily correct of course (with interface). I'll try Bo_Ry's suggestion too, now (before I mark best solution). Thank you!!
One can add a step to replace "," by " ". That's why I asked for a sample.
But indeed the regex trick is awesome.
 
Upvote 0
Trying to use run the Regex trick, I run into a problem right away. Am I right that it is to with my worksheet not being a table. The error text is:

Expression.Error: The field 'Comment' of the record wasn't found.
Details:
Name=Sheet1
Data=

Item=Sheet1
Kind=Sheet
Hidden=FALSE

I've tried replacing bits of the Advanced Editor with the trick code but M isn't straightforward enough for a novice such as myself.
Simplifying my workbook, convert to range etc. might do the trick but... can the Regex trick be altered to be inserted mid Adv.Ed. so to speak?
 
Upvote 0
...somehow not all of the error text seems to have uploaded:

Data= should have been followed by [ Table ]
 
Upvote 0

Forum statistics

Threads
1,224,746
Messages
6,180,703
Members
452,994
Latest member
Janick

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top