Extract numerical data from a specific type of substrings

vladimiratanasiu

Active Member
Joined
Dec 17, 2010
Messages
347
Office Version
  1. 365
  2. 2021
Platform
  1. Windows
Hello!

I have a very large database of products. Each product is defined by a string, that includes information about its name (e.g. METRO CHEF MURATURI ASORTATE, MIRINDA STRUGURI SI PEPENE GALBEN etc. – see the red colored texts) and quantity. The quantity of the contents is defined by specific units of measure (UM - see black colored texts), depending on the type of product: weight – (kilo)grams – KG /G; volume – (mili)litre – ML /L; length – centimeters – CM; number of pieces / set - BUC. Some products are not allocated any UM (e.g. ORNAMENT MOS CRACIUN BARBA, CIOCAN LEMN PT CARNE). Most of products have the UM substrings data placed at the end of the string, but in some cases they are inserted in the middle of it (e.g. SOS USTUROI 420G UNIVER, CANA OPAL 250ML PARADISE). No name has the UM at the beginning of the string. I mention that all UM substrings have a maximum number of 4 digits, and are separated from the rest of string’s elements by space(s). Rarely, some values are expressed as decimal numbers with dots (e.g. MIRINDA STRUGURI SI PEPENE GALBEN 0.5L, ZAREA DACIC CIDRU AFINE 0.275L). I need to extract in another column the numerical data of UM substrings, defined on the basis of the above mentioned UMs. I wonder if the substrings could be defined as wildcards, built on the basis of UMs standard names. If the UM substring misses from a string, I would like result be blank or 0 in the destination cell.

Thank you!

P.S. I attached a screenshot too of the table , as the XL2BB couldn't import the original table formatted completly.

Book1.xlsx
AB
1Initial dataDesired results
2ORNAMENT MOS CRACIUN BARBA
3METRO CHEF MURATURI ASORTATE 1600G1600
4MIRINDA STRUGURI SI PEPENE GALBEN 0.5L0,5
5MIRINDA STRUGURI/PEPENE GALBEN 2L2
6IAURT CAPRA AMILACT 340ML340
7COSMIN MUSLI 30% FRUCTE 250G250
8DR.O PRAF TORT GELLE FRUCTE PADURE 8G8
9CHIO POMBAR PIZZA 40G40
10GEL MAINI AMANTE 65ML65
11FORMA TEFLON CHEC 25CM ZENKER25
12TAVA CUPTOR OTI 38X26CM38X26
13MONODOZE CAFEA CAPSULE BAROCCO 50BUCATI50
14CIOCAN LEMN PT CARNE
15APARATE RAS GILLETTE VENUS3 3BUCATI3
16ZAREA DACIC CIDRU AFINE 0.275L0.275
17GIANA ULEI TURTE MASLINE 1000ML1000
18SOS USTUROI 420G UNIVER420
19CANA OPAL 250ML PARADISE250
Sheet1
 

Attachments

  • Untitled.png
    Untitled.png
    18.1 KB · Views: 12
Last edited:

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Maybe something like this.

VBA Code:
Sub ExtractNumbers()
    Dim WS As Worksheet
    Dim CellRange As Range
    Dim R As Range
    Dim I As Long
    Dim NS As String
    Dim SA As Variant

    Set WS = ActiveSheet

    With WS
        Set CellRange = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With

    For Each R In CellRange
        SA = Split(Application.Trim(Replace(R.Value, Chr(9), " ")))
        For I = UBound(SA) To 0 Step -1
            NS = OnlyNumbers(CStr(SA(I)))
            If NS <> "" Then
                R.Offset(0, 1).Value = NS
                If InStr(SA(I), ".") > 0 Or InStr(SA(I), ",") > 0 Or InStr(SA(I), "X") > 0 Then
                    NS = SA(I)
                    Do While Not IsNumeric(Right(NS, 1))
                        NS = Left(NS, Len(NS) - 1)
                    Loop
                    R.Offset(0, 1).Value = NS
                End If
                Exit For
            Else
                R.Offset(0, 1).Value = ""
            End If
        Next I
    Next R
End Sub


Function OnlyNumbers(S As String) As String
    Dim RX As Object

    Set RX = CreateObject("VBScript.RegExp")
    
    RX.Global = True
    RX.Pattern = "[^0-9]"
    
    OnlyNumbers = RX.Replace(S, "")
End Function
 
Upvote 0
Maybe something like this.

VBA Code:
Sub ExtractNumbers()
    Dim WS As Worksheet
    Dim CellRange As Range
    Dim R As Range
    Dim I As Long
    Dim NS As String
    Dim SA As Variant

    Set WS = ActiveSheet

    With WS
        Set CellRange = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With

    For Each R In CellRange
        SA = Split(Application.Trim(Replace(R.Value, Chr(9), " ")))
        For I = UBound(SA) To 0 Step -1
            NS = OnlyNumbers(CStr(SA(I)))
            If NS <> "" Then
                R.Offset(0, 1).Value = NS
                If InStr(SA(I), ".") > 0 Or InStr(SA(I), ",") > 0 Or InStr(SA(I), "X") > 0 Then
                    NS = SA(I)
                    Do While Not IsNumeric(Right(NS, 1))
                        NS = Left(NS, Len(NS) - 1)
                    Loop
                    R.Offset(0, 1).Value = NS
                End If
                Exit For
            Else
                R.Offset(0, 1).Value = ""
            End If
        Next I
    Next R
End Sub


Function OnlyNumbers(S As String) As String
    Dim RX As Object

    Set RX = CreateObject("VBScript.RegExp")
  
    RX.Global = True
    RX.Pattern = "[^0-9]"
  
    OnlyNumbers = RX.Replace(S, "")
End Function

Thank you

Maybe something like this.

VBA Code:
Sub ExtractNumbers()
    Dim WS As Worksheet
    Dim CellRange As Range
    Dim R As Range
    Dim I As Long
    Dim NS As String
    Dim SA As Variant

    Set WS = ActiveSheet

    With WS
        Set CellRange = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With

    For Each R In CellRange
        SA = Split(Application.Trim(Replace(R.Value, Chr(9), " ")))
        For I = UBound(SA) To 0 Step -1
            NS = OnlyNumbers(CStr(SA(I)))
            If NS <> "" Then
                R.Offset(0, 1).Value = NS
                If InStr(SA(I), ".") > 0 Or InStr(SA(I), ",") > 0 Or InStr(SA(I), "X") > 0 Then
                    NS = SA(I)
                    Do While Not IsNumeric(Right(NS, 1))
                        NS = Left(NS, Len(NS) - 1)
                    Loop
                    R.Offset(0, 1).Value = NS
                End If
                Exit For
            Else
                R.Offset(0, 1).Value = ""
            End If
        Next I
    Next R
End Sub


Function OnlyNumbers(S As String) As String
    Dim RX As Object

    Set RX = CreateObject("VBScript.RegExp")
   
    RX.Global = True
    RX.Pattern = "[^0-9]"
   
    OnlyNumbers = RX.Replace(S, "")
End Function
Thank you very much, Rlv01! Your macro works perfectly in my previous example. Considering my real database, I think they are still necessary some changes within the macro and I don't know how to make them. As it's seen, in my real table the source of products data is placed in column I, starting from cell I7 to about I15000. Similarly, the target / destination range of numerical values extracted is column P, keeping the same structure. At the same time, the column Q requires data about UMs name / acronym. In this sense, I ask you politely if is any way to extract in every cell of column Q, similarly to the column P, the UM name from corresponding substrings of column I (see the yellow coloures cells).

Thank you!

Book1
AIPQ
1MAKTXBRGEWGEWEI
2Date de baza 1Date de baza 1Date de baza 1
3Descriere material lb romanaGreutate brutăUnitate de greutate
4CHARQUANUNIT
540173
6denumirea produsului in 40 caracteregreutatea bruta
71LEGUME-FRUCTE PRET REDUS 0.4
82ARDEI IUTE GALBEN KG
93BOROMIR CHEC MARMORAT CIOCOLATA/ALUNE 400G400G
104FOX SALAM RUSTIC GASTRO KGKG
117JUCARIE MASINA FARMER TRUCK
1211JUCARIE SET ANIMALE
1312ORNAMENT MOS CRACIUN BARBA
1413METRO CHEF MURATURI ASORTATE 1600G1600G
1514MIRINDA STRUGURI SI PEPENE GALBEN 0.5L500ML
1615MIRINDA STRUGURI/PEPENE GALBEN 2L2L
1716LIPTON ICE TEA GREEN WHITE PIERSICA 1.5L1.5L
1817IAURT CAPRA AMILACT 340ML340ML
Sheet1
 
Upvote 0
VBA Code:
Sub ExtractNumbers2()
    Dim WS As Worksheet
    Dim CellRange As Range
    Dim R As Range
    Dim I As Long
    Dim NS As String, S As String
    Dim SA As Variant

    Set WS = ActiveSheet

    With WS
        Set CellRange = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With

    For Each R In CellRange
        SA = Split(Application.Trim(Replace(R.Value, Chr(9), " ")))
        For I = UBound(SA) To 0 Step -1
            NS = OnlyNumbers(CStr(SA(I)))
            If NS <> "" Then
                NS = SA(I)
                S = CStr(SA(I))
                Do While Not IsNumeric(Right(NS, 1))
                    NS = Left(NS, Len(NS) - 1)
                Loop
                R.Offset(0, 1).Value = NS
                S = Mid(S, Len(NS) + 1, Len(S))
                If Len(S) <= 4 Then
                    R.Offset(0, 2).Value = S
                Else
                    R.Offset(0, 2).Value = ""
                End If
                Exit For
            Else
                R.Offset(0, 1).Value = ""
                R.Offset(0, 2).Value = ""
            End If
        Next I
    Next R
End Sub
 
Upvote 0
VBA Code:
Sub ExtractNumbers2()
    Dim WS As Worksheet
    Dim CellRange As Range
    Dim R As Range
    Dim I As Long
    Dim NS As String, S As String
    Dim SA As Variant

    Set WS = ActiveSheet

    With WS
        Set CellRange = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row)
    End With

    For Each R In CellRange
        SA = Split(Application.Trim(Replace(R.Value, Chr(9), " ")))
        For I = UBound(SA) To 0 Step -1
            NS = OnlyNumbers(CStr(SA(I)))
            If NS <> "" Then
                NS = SA(I)
                S = CStr(SA(I))
                Do While Not IsNumeric(Right(NS, 1))
                    NS = Left(NS, Len(NS) - 1)
                Loop
                R.Offset(0, 1).Value = NS
                S = Mid(S, Len(NS) + 1, Len(S))
                If Len(S) <= 4 Then
                    R.Offset(0, 2).Value = S
                Else
                    R.Offset(0, 2).Value = ""
                End If
                Exit For
            Else
                R.Offset(0, 1).Value = ""
                R.Offset(0, 2).Value = ""
            End If
        Next I
    Next R
End Sub
Hello, rlv01! I tried to run the macro updated. Unfortunately, it generates the photo messages from below. I attached the full structure of database Fisier Migrare Materiale RETAIL.xlsm, containing more relevant products for testing. Please, verify it and tell me how could I solve this issue. Thank you!
 

Attachments

  • Error Only numbers.png
    Error Only numbers.png
    180.6 KB · Views: 14
  • Sub ExtractNumbers.png
    Sub ExtractNumbers.png
    178 KB · Views: 14
Upvote 0
In B3 enter 1600
Select B4
Data--> (in Data tools) click Flash Fill.
In column B you get required result.

Hello, kvsrinivasamurthy! I tested before your option too, but it is not the proper one for my issue. I need only numerical results from specific substrings and it gives many errors like the yellow coloured ones. Please, analyze all content of this topic, and if you find / know another efficient solution I'm very glad for your support. Thank you!
 

Attachments

  • Flash Fill.png
    Flash Fill.png
    18.4 KB · Views: 12
Upvote 0
Do not enter BARBA in B2.
Enter 1600 in B3. select B4 then click Flash Fill.
If you read all posts, it's mentioned that the database include names of products as strings. Plenty of them have alpha numerical substrings, other miss the numerical part. The previous macro needs some adjustments, but basically it covers the necessary conditions. If you search for an efficient solution, please take into account all data posted before. Thank you!
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,242
Members
452,623
Latest member
russelllowellpercy

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top