Extract Data Only Roman Number

muhammad susanto

Well-known Member
Joined
Jan 8, 2013
Messages
2,102
Office Version
  1. 365
  2. 2021
Platform
  1. Windows
hi all...

how to extract data contains roman number like this :

<style type="text/css">
table.tableizer-table {
font-size: 12px;
border: 1px solid #CCC ;
font-family: Arial, Helvetica, sans-serif;
}
.tableizer-table td {
padding: 4px;
margin: 3px;
border: 1px solid #CCC ;
}
.tableizer-table th {
background-color: #104E8B ;
color: #FFF ;
font-weight: bold;
}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>data</th><th>target</th></tr></thead><tbody>
<tr><td>Rumah Negara Golongan I Permanen</td><td>I</td></tr>
<tr><td>Rumah Negara Golongan II Permanen</td><td>II</td></tr>
<tr><td>Rumah Negara Golongan III Permanen</td><td>III</td></tr>
<tr><td>Rumah Negara Golongan IIA Permanen</td><td>II</td></tr>
</tbody></table>


thanks in advance..

.sst
 
Michael Douglas IIA Actor

You would want II returned to you; however, strooman's function returns MDII instead. Any name starting with an upper case letter that is a Roman Number "digit" letter will have that letter returned to you using strooman's code.

This is correct. You also need valid Roman numerals
We can turn the Function in a Sub and adjust it a little bit by using the Split function.

Select your cell you want to scan and run this code.

Code:
Sub extractRoman()
Dim arrItems As Variant
Dim x As Long, y As Long
Dim result As String
Dim myMatch As Object, regEx As Object

    Set regEx = CreateObject("vbscript.regexp")
    strPattern = "(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))"
    
    strInputText = Selection
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With
    
    Set myMatch = regEx.Execute(strInputText)
    
    For x = 0 To myMatch.Count - 1
            result = result & myMatch.Item(x) & ","
    Next
    arrItems = Split(Mid(result, 1), ",")
    Selection.Offset(0, 1).Resize(1, UBound(arrItems)) = arrItems
End Sub

[Table="width:, class:grid"][tr][td]Row\Col[/td][td]
A​
[/td][td]
B​
[/td][td]
C​
[/td][td]
D​
[/td][td]
E​
[/td][td]
F​
[/td][/tr][tr][td]
1​
[/td][td="bgcolor:#0070C0"]Your_Line[/td][td="bgcolor:#0070C0"]First[/td][td="bgcolor:#0070C0"]Second[/td][td="bgcolor:#0070C0"]Third[/td][td="bgcolor:#0070C0"]Fourth[/td][td="bgcolor:#0070C0"]Fifth[/td][/tr]
[tr][td]
2​
[/td][td]Rumah Negara Golongan I Permanen[/td][td="bgcolor:#FFFF00"]I[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
3​
[/td][td]Rumah Negara Golongan II Permanen[/td][td="bgcolor:#FFFF00"]II[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
4​
[/td][td]Of the letters used CCLIII commonly in Roman numerals[/td][td="bgcolor:#FFFF00"]CCLIII[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
5​
[/td][td]Rumah Negara Golongan IIA Permanen[/td][td="bgcolor:#FFFF00"]II[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
6​
[/td][td]Super Bowl XXX.[/td][td="bgcolor:#FFFF00"]XXX[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
7​
[/td][td]Years in Roman numerals: a.d. MCMLXXXIX = a.d. 1989.[/td][td="bgcolor:#FFFF00"]MCMLXXXIX[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
8​
[/td][td]This is the number 13, XIII[/td][td="bgcolor:#FFFF00"]XIII[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
9​
[/td][td]We live in the year 2018 wich is MMXVIII[/td][td="bgcolor:#FFFF00"]MMXVIII[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
10​
[/td][td]MCMXCIX[/td][td="bgcolor:#FFFF00"]MCMXCIX[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
11​
[/td][td]Combination of MMXVIII and CMA[/td][td="bgcolor:#FFFF00"]C[/td][td="bgcolor:#FFFF00"]MMXVIII[/td][td="bgcolor:#FFFF00"]C[/td][td="bgcolor:#FFFF00"]M[/td][td="bgcolor:#FFFF00"][/td][/tr]
[tr][td]
12​
[/td][td]Michael Douglas IIA Actor[/td][td="bgcolor:#FFFF00"]M[/td][td="bgcolor:#FFFF00"]D[/td][td="bgcolor:#FFFF00"]II[/td][td="bgcolor:#FFFF00"][/td][td="bgcolor:#FFFF00"][/td][/tr]
[/table]
 
Upvote 0

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
Assuming that the non-Roman Number text is either in all lower case or in proper case, here is a macro that I believe will return only real Roman Numbers as long as that Roman Number begins the text or is preceded by a space character...
Code:
[table="width: 500"]
[tr]
	[td]Sub ExtractRomanNumbers()
  Dim R As Long, X As Long, C As Long, Z As Long, LastRow As Long, MaxWordCount As Long
  Dim Arr As Variant, Data As Variant, Result As Variant
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  MaxWordCount = Evaluate(Replace("1+MAX(LEN(A2:A#)-LEN(SUBSTITUTE(A2:A#,"" "","""")))", "#", LastRow))
  Data = Range("A2:A" & LastRow)
  ReDim Result(1 To UBound(Data), 1 To MaxWordCount)
  For R = 1 To UBound(Data)
    Arr = Split(Data(R, 1))
    C = 0
    For X = 0 To UBound(Arr)
      If Arr(X) Like "[IVXLCMD]*" Then
        If Mid(Arr(X) & " ", 2, 1) Like "[IVXLCMD ]" Then
          C = C + 1
          For Z = 2 To Len(Arr(X)) + 1
            If Mid(Arr(X) & " ", Z, 1) Like "[!IVXLCMD]" Then
              Result(R, C) = Left(Arr(X), Z - 1)
              Exit For
            End If
          Next
        End If
      End If
    Next
  Next
  Range("B2").Resize(UBound(Result, 1), UBound(Result, 2)) = Result
End Sub[/td]
[/tr]
[/table]
 
Upvote 0
As a follow up to the code I posted in Message #22 above, I modified the code to remove the restriction that the Roman Numbers had to be preceded by a space character (when it did not start the text). This will allow the Roman Number to be preceded by or followed by parentheses, brackets, braces, quote marks, dashes, etc. I think I have caught all of the problem areas, but if you come across one, please let me know so I can try to fix the macro for it.
Code:
[table="width: 500"]
[tr]
	[td]Sub ExtractRomanNumbers()
  Dim R As Long, X As Long, C As Long, Z As Long, LastRow As Long, MaxWordCount As Long
  Dim Arr As Variant, Data As Variant, Result As Variant
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  MaxWordCount = Evaluate(Replace("1+MAX(LEN(A2:A#)-LEN(SUBSTITUTE(A2:A#,"" "","""")))", "#", LastRow))
  Data = Range("A2:A" & LastRow)
  ReDim Result(1 To UBound(Data), 1 To MaxWordCount)
  For R = 1 To UBound(Data)
    For X = 1 To Len(Data(R, 1))
      If Mid(Data(R, 1) & " ", X, 1) Like "[!IVXLCMD]" Or Mid(Data(R, 1), X + 1, 1) Like "[a-z]" Then
        Mid(Data(R, 1), X) = " "
      End If
    Next
    Arr = Split(Application.Trim(Data(R, 1)))
    C = 0
    For X = 0 To UBound(Arr)
      C = C + 1
      Result(R, C) = Arr(X)
    Next
  Next
  Range("B2").Resize(UBound(Result, 1), UBound(Result, 2)) = Result
End Sub[/td]
[/tr]
[/table]
 
Upvote 0
Very thorough solution Rick. You gave it a good try to come up with something so robust. Definitly come in handy.
 
Upvote 0

Forum statistics

Threads
1,224,829
Messages
6,181,219
Members
453,024
Latest member
Wingit77

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top