Regex to split substring

dmd123

New Member
Joined
Aug 16, 2016
Messages
5
Hi,

I am trying to figure out how to implement a regex solution to a data field and was looking for some help.

I have got a fair bit done using mid and search/find but I am still doing a lot of clean up to the data after extraction to compensate for the different way people record the same info. The regular expression I want to extract from is:

## hours @ £##.##

I have been using the @ sign to get the hours and the £ sign to get the cost. However....some people write hrs, others write hour or hours and some just hr. The hours may be part values as well, so 10.5 or 9.25 etc. For the price some will include the decimal place and others won't. The expression sits within a larger string and is never in the same place. I had tried to use find and replace to standardise the hours data but it starts to change existing text that includes some of the combinations - for example gilchrist becomes gilchourist, not ideal!!

The £ value also has issues, some people put a space after the £ sign whilst others incude no decimal point so £15 instead of £15.00 I have used Search within MID to get the starting point for extraction but because of the variation I often have values coming out with other aspects of the string that I don't want to see.

By using the @ sign I can get at the data either side of it. The main issue is extracting the hours. I thought I could use a regex to pick up on the variation and still pull through the numerical value for the hours irrespective of whether someone has input hr/hrs/hour/hours.

I was contemplating writing a series of formulas using mid and then choosing the one that returned the cleanest value However, when tweaking the formula manually this changes from returning 3 characters to returning eight. Whilst it works I get the feeling there is a better way and when I looked I saw regex and thought that might work. I am looking to get the hours in one cell in the row and the £value in another.

An example of the output would be that the expression 10 hrs @ £50.50 results in 10 in one cell and £50.50 in another.

Thanks for your help.
 

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
Excel works best if each field of a record is entered into a different cell. e.g. 10 in one column and 50.50 in another.

As you are discovering, combining different fields in the same cell just ads a ton of busy work to parse the entry.
 
Last edited:
Upvote 0
I know. However, it has been my experience that, a lot of people don't know how best to use it.

I have had a passing awareness of regex but only started looking at it more seriously the other day. It is quite an unusual shorthand compared to excel. It looks powerful. I found some info on stack exchange that let me get to the data I was after and pull it out irrespective of where it lay. This is much cleaner than my usual text formulas! Here is the link:

https://stackoverflow.com/questions...gex-in-microsoft-excel-both-in-cell-and-loops
 
Upvote 0
I am trying to figure out how to implement a regex solution to a data field and was looking for some help.

I have got a fair bit done using mid and search/find but I am still doing a lot of clean up to the data after extraction to compensate for the different way people record the same info.
I think you should be able to do this quite efficiently without using RegExp, but I won't know for sure until I see a representative sample of the data itself that you have to parse. If you could post, say, 5 examples of the text (I understand about the variance in the hours, so don't worry about varying them, I need to see the overall text value itself). I am also interested in what cells these text examples would be in on your worksheet (makes writing code easier). Oh, and is that symbol in front of the last number a currency symbol? If so, which is it (I see two individual characters, an  and a £ next to each other)?
 
Upvote 0
Thanks that would be great :).

The character is currency, UK pound sign. The data appears in column C and here is a sample of the text below.


<tbody>
[TD="class: xl65"]A. Stranger 23.25 hours @ £9.00 per hour plus varying length text.[/TD]

[TD="class: xl65"]AB Stranger 72hours @ £12.[/TD]

[TD="class: xl65"]Person on holiday only 5 hours @ £19.99. Additional varying length text here.[/TD]

[TD="class: xl65"]New purchase order ref 123456. 12.5 hours @ £19.50. Additional text here.[/TD]

[TD="class: xl65"]ABC McStranger received 152 hours @ £14.25. Additional varying length text here.[/TD]

</tbody>
 
Upvote 0
Thanks that would be great :).

The character is currency, UK pound sign. The data appears in column C and here is a sample of the text below.


<tbody>
[TD="class: xl65"]A. Stranger 23.25 hours @ £9.00 per hour plus varying length text.[/TD]

[TD="class: xl65"]AB Stranger 72hours @ £12.[/TD]

[TD="class: xl65"]Person on holiday only 5 hours @ £19.99. Additional varying length text here.[/TD]

[TD="class: xl65"]New purchase order ref 123456. 12.5 hours @ £19.50. Additional text here.[/TD]

[TD="class: xl65"]ABC McStranger received 152 hours @ £14.25. Additional varying length text here.[/TD]

</tbody>
See if these two UDFs (User Defined Functions) work for you...
Code:
Function GetHours(ByVal S As String) As Double
  Dim Parts() As String
  Parts = Split(Replace(S, " @", "@"), "@")
  Parts = Split(Parts(0))
  If Parts(UBound(Parts)) Like "[0-9.]*" Then
    GetHours = Val(Parts(UBound(Parts)))
  Else
    GetHours = Parts(UBound(Parts) - 1)
  End If
End Function

Function GetPrice(ByVal S As String) As Double
  GetPrice = Split(Val(Split(Replace(S, "£", ""), "@")(1)))(0)
End Function
 
Upvote 0
Hi Rick, thanks very much, this works a treat!

Is there a good reference book on VBA you could point me to so I can learn to develop functions like this myself?

Thanks again!
 
Upvote 0
Is there a good reference book on VBA you could point me to so I can learn to develop functions like this myself?
I am the wrong person to ask that question of. I am completely self-taught and I have been programming since 1981 when I bought my first personal computer (a TI-99/4). While I did do some programming in Fortran, C, C++ and awk (a UNIX scripting language), most of my programming efforts were in various forms of BASIC (a predecessor to VB/VBA), then the compiled version of Visual Basic and over the last several years VBA (the language built into Excel). So what I know comes from over 37 years of hands on coding experience. The best I can do for you is to refer you to a gigantic list of references which a gentleman who posts under the name hike95 has compiled and generously shared in the past... hopefully that will help you.

https://www.mrexcel.com/forum/excel...w-use-vbulletin-excel-2007-a.html#post4760720
 
Upvote 0
I am trying to figure out how to implement a regex solution ...
If you still wanted to consider a regex approach, you could try this UDF, used as shown in columns B:C in the screen shot below.

Code:
Function HrCost(S As String, HC As String) As Double
  With CreateObject("VBScript.RegExp")
    .Pattern = "(\d+\.?\d*)(\D+@\D+)(\d+\.?\d*)"
    HrCost = .Execute(S)(0).Submatches(IIf(UCase(HC) = "H", 0, 2))
  End With
End Function

I was contemplating writing ... formulas
Columns E:F have some suggestions that seem to work for the sample data given.


Excel 2016 (Windows) 32 bit
ABCDEF
1A. Stranger 23.25 hours @ £9.00 per hour plus varying length text.23.25923.259
2AB Stranger 72hours @ £12.72127212
3Person on holiday only 5 hours @ £19.99. Additional varying length text here.519.99519.99
4New purchase order ref 123456. 12.5 hours @ £19.50. Additional text here.12.519.512.519.5
5ABC McStranger received 152 hours @ £14.25. Additional varying length text here.15214.2515214.25
Sheet1
Cell Formulas
RangeFormula
B1=HrCost(A1,"H")
C1=HrCost(A1,"C")
E1=LEFT(RIGHT(SUBSTITUTE(TRIM(SUBSTITUTE(UPPER(LEFT(A1,FIND("@",A1)-1)),"H"," H"))," ",REPT(" ",20)),40),20)+0
F1=REPLACE(LEFT(A1,FIND(" ",SUBSTITUTE(A1,". "," ")&" ",FIND("£",A1)+2)-1),1,FIND("£",A1),"")+0
 
Upvote 0
Hi Peter,

Thanks very much. I am intrigued by regex and I like the formulas too. So it is great to have the choice.

I have them all on one sheet at minute and it doesn't like it, going really slow.

I am going to work through the data and pick which of the regex and/or formulas that get the most results and is also easiest on the processors. The regex and formula all work for the test cases given - however, the level of variation in user input means that some pick up more of the variations than others.

Thanks again Peter. Much appreciated.
 
Upvote 0

Forum statistics

Threads
1,223,885
Messages
6,175,183
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top