Extracting year from text string

Ducklings23

New Member
Joined
Mar 11, 2020
Messages
20
Office Version
  1. 2016
Platform
  1. Windows
Hi all. I'm trying to extract the year (so four digit number) from various strings of text. The text may or may not contain other numbers. For context - worksheet is a list of incident reports and in some instances the date is contained within the description of the incident. I need to break the year out. Most of the time it just say things like "it was in November or late 2015" etc. I just want to extract any 4 digit number from the text, then I can check if it is a year or not afterwards. The year can appear in any part of the text string. Hope that makes sense! Thanks in advance.
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
Hi all. I'm trying to extract the year (so four digit number) from various strings of text. The text may or may not contain other numbers. For context - worksheet is a list of incident reports and in some instances the date is contained within the description of the incident. I need to break the year out. Most of the time it just say things like "it was in November or late 2015" etc. I just want to extract any 4 digit number from the text, then I can check if it is a year or not afterwards. The year can appear in any part of the text string. Hope that makes sense! Thanks in advance.
Please post a sample of a bunch of these strings.
Try to cover all the different structure possibilities, so we get a complete picture of all the different ways these strings may appear, so we can devise an answer that should work in most (if not all) cases for you.
 
Upvote 0
Perhaps a user-defined function like this might help? To implement ..
1. Right click the sheet name tab and choose "View Code".
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below (you can use the icon at the top right of the code pane below) into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Enter the formula as shown in the screen shot below and copy down.
6. Your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

VBA Code:
Function FourDigits(s As String) As String
  Dim M As Variant
 
  With CreateObject("VBSCript.RegExp")
    .Global = True
    .Pattern = "(^|\D)(\d{4})(?=\D|$)"
    For Each M In .Execute(s)
      FourDigits = FourDigits & ", " & M.Submatches(1)
    Next M
  End With
  FourDigits = Mid(FourDigits, 3)
End Function

Ducklings23.xlsm
AB
1it was in November or late 20152015
2It was 2015 or 2014 in November2015, 2014
3 
4cbcb 
523659 abc 23452345
620122012
Sheet1
Cell Formulas
RangeFormula
B1:B6B1=FourDigits(A1)
 
Upvote 0
Or

VBA Code:
Function jec(cell As String) As String
  c00 = Split(cell)
  For i = 0 To UBound(c00)
    If Len(c00(i)) = 4 And IsNumeric(c00(i)) Then jec = jec & IIf(jec = "", "", ", ") & c00(i)
  Next
End Function

Excel Formula:
=jec(A1)
 
Upvote 0
@JEC
IsNumeric is not a very robust test. Try your function with these texts
"I paid $215 in 2021"
"I used 22.5 cans"
 
Upvote 0
True! Then this one is better

VBA Code:
Function jec(cell As String) As String
  c00 = Split(cell)
  For i = 0 To UBound(c00)
    If c00(i) Like "[0-9][0-9][0-9][0-9]" Then jec = jec & IIf(jec = "", "", ", ") & c00(i)
  Next
End Function
 
Upvote 0
Then this one is better
True, but then it would miss ones like this (if such data is possible)
"it was on 5/8/2015"

.. or if there was a data entry typo
"it was in November or late2015"
 
Upvote 0
Yes, such data can only be handled with regex.
 
Upvote 0
Yes, such data can only be handled with regex.
Well, I think it can be done without RegExp, but I generally prefer to avoid so much looping.

VBA Code:
Function Digits4(s As String) As String
  Dim i As Long
 
  s = "." & s & "."
  For i = 1 To Len(s) - 5
    If Mid(s, i, 6) Like "[!0-9][0-9][0-9][0-9][0-9][!0-9]" Then Digits4 = Digits4 & ", " & Mid(s, i + 1, 4)
  Next i
  Digits4 = Mid(Digits4, 3)
End Function

Ducklings23.xlsm
AB
1it was in November or late 20152015
2It was 2015 or 2014 in November2015, 2014
3 
4cbcb 
523659 abc 23452345
620122012
7It was on 1Dec 20152015
8I paid $215 in 2021, 2020 and 20182021, 2020, 2018
9I used 22.5 cans 
10it was on 5/8/20152015
11it was in November or late20152015
Sheet1
Cell Formulas
RangeFormula
B1:B11B1=Digits4(A1)
 
Upvote 0
I don't like looping too, but sometimes Regex is not very easy to read. Especially if you have never worked with regex before.

Very powerful tool and takes time to learn! I like it.:cool:
 
Upvote 0

Forum statistics

Threads
1,224,517
Messages
6,179,234
Members
452,898
Latest member
Capolavoro009

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top