Extract all number of 6 digits in a string

Heremion

New Member
Joined
Nov 29, 2023
Messages
9
Office Version
  1. 365
Platform
  1. Windows
Hello everybody!

I'm a beginner in PowerQuery and I try to extract all numbers of 6 digits in a string for each of my records.

For example, following string I have :
497887;443445;#789788,112456,test;999877;john 788487

From this one, I'd like to extract the following value : 497887;443445;#789788,112456,;999877 788487

I'd like to include all separators (space, comma, etc...)

Could you help me please :(

Thanks a lot!
 

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
This probably isn't the easiest way, but assuming your column is called Column1, you could do:

Power Query:
Text.Remove(Text.Lower([Column1]),{"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"})
 
Upvote 1
@ExcelToDAX - I think it is the best way to keep the existing delimiters and symbols.
Perhaps a little shortcut for the letter list in your code.
Power Query:
Text.Remove(Text.Lower([Column1]),{"a".."z"})
 
Upvote 1
Solution
@ExcelToDAX - I think it is the best way to keep the existing delimiters and symbols.
Perhaps a little shortcut for the letter list in your code.
Power Query:
Text.Remove(Text.Lower([Column1]),{"a".."z"})
beautiful!! I knew there was some kind of shortcut :)
 
Upvote 0
Thanks for your answers! :) I completed your code with some others caracters to remove and its works fine!

Thanks a lot!
 
Upvote 0
Thanks for your answers! :) I completed your code with some others caracters to remove and its works fine!

Thanks a lot!
Good to hear it works @Heremion.
Can you tell us how you added other characters to remove them? I feel like the initial answer from @ExcelToDAX helped you better.

beautiful!! I knew there was some kind of shortcut :)
@ExcelToDAX - Actually, there is a recent trick about using Regex in PowerQuery that I have learned from @lrobbo314 here. Regex is really easy way of doing this. However, when I realized the delimiters and special chars should stay, I didn't post it. Still, as a reference, here is that solution for future readers.

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddColumn = Table.TransformColumns(Source, 
        {"TextColumn", 
            (txt as text) =>
                let 
                    Matches = Web.Page("<script>document.write('" & txt & "'.match(/\d{6}/g))</script>"),
                    Numbers =  Matches{0}[Data]{0}[Children]{1}[Children]{0}[Text]
                            
                in
                    if Numbers = "null" then null else Numbers 
        }
    )
in
    AddColumn
Source and Result:
TextColumnTextColumn
497887;443445;#789788,112456,test;999877;john 788487497887,443445,789788,112456,999877,788487
497881;443445;#789788,112456,test;999877;john 788484497881,443445,789788,112456,999877,788484
497882;443445;#789788,112456,test;999877;john 788483497882,443445,789788,112456,999877,788483
49788;44345;#78988,11256,test;9877;john 78883
 
Upvote 0
A little modification to the Regular Expression as well as some Javascript to join the values in the array with a blank delimiter.

Book2
ABC
1TextColumnTextColumn
2497887;443445;#789788,112456,test;999877;john 788487497887;443445;789788,112456,;999877;788487
3497881;443445;#789788,112456,test;999877;john 788484497881;443445;789788,112456,;999877;788484
4497882;443445;#789788,112456,test;999877;john 788483497882;443445;789788,112456,;999877;788483
549788;44345;#78988,11256,test;9877;john 78883
Sheet1


Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddColumn = Table.TransformColumns(Source,
        {"TextColumn",
            (txt as text) =>
                let
                    Matches = Web.Page("<script>document.write('" & txt & "'.match(/(\d{6}|;|,)/g).join(''))</script>"),
                    Numbers =  Matches{0}[Data]{0}[Children]{1}[Children]{0}[Text]
                           
                in
                    if List.ContainsAny(Text.ToList(Numbers),{"0".."9"}) then Numbers else null
        }
    )
in
    AddColumn
 
Upvote 0
Nice touch, @lrobbo314!

Maybe a small Regex pattern change to avoid adjacent delimiters when the next item is not a 6-digits number:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    AddColumn = Table.TransformColumns(Source,
        {"TextColumn",
            (txt as text) =>
                let
                    Matches = Web.Page("<script>document.write('" & txt & "'.match(/(\d{6}[;,]?)/g).join(''))</script>"),
                    Found = Matches{0}[Data]{0},
                    Numbers = if Table.RowCount(Found[Children]) = 1 then "" else Found[Children]{1}[Children]{0}[Text]
                in
                    if List.ContainsAny(Text.ToList(Numbers),{"0".."9"}) then Numbers else null
        }
    )
in
    AddColumn

TextColumnTextColumn
497887;443445;#789788,112456,test;999877;john 788487497887;443445;789788,112456,999877;788487
497881;443445;#789788,112456,test;999877;john 788484497881;443445;789788,112456,999877;788484
497882;443445;#789788,112456,test;999877;john 788483497882;443445;789788,112456,999877;788483
49788;44345;#78988,11256,test;9877;john 78883

(I like the way you are using the List.ContainsAny() function by the way.)
 
Upvote 0
Thank you, sir. I dig the changes you made to the Regex.

Just because this was kind of fun, I did it with Python in Power BI as well.

Power Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMrE0t7AwtzYxMTYxMbVWNrcA8XUMDY1MTM10SlKLS6wtLS0tzM2ts/Iz8hSAciYW5kqxOlCNhiRpNEFoNCJJozFCI1gfRBtEF1wTshaQjlgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TextColumn = _t]),
    RunPython = Python.Execute("import pandas as pd#(lf)import re#(lf)dataset[""TextColumn""] = dataset['TextColumn'].apply(lambda x: ''.join(re.findall('(\d{6}[;,]?)',x))).apply(lambda x: x if bool(re.search(r'\d',x)) else None)",[dataset=Source])
in
    RunPython
 
Upvote 0

Forum statistics

Threads
1,223,880
Messages
6,175,155
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top