Find specific lines then find data within those lines - that begins after a specific string

User45

New Member
Joined
Jun 28, 2013
Messages
25
Hello,
Hoping someone can help me with this complicated task.What I have is a file with hundreds of lines (all in Column A; like A1,A2,A3...):
Rich (BB code):
cell1
cell2
>>>>cell3\randomdata\city\London\randomdata\randomdata\country\England*&^\randomdata\
cell4
>>cell5 Language: English
cell6
cell7
>>>>cell8\randomdata\randomdata\randomdata\randomdata\country\S*pai^n\randomdata\city\Madrid\
cell9
cell10
cell11
>>cell12 Language: Spanish
cell13
...

In cell3 and cell8 ,the word "city" and the name of the city are always next to each other(same is true for country), as shown above, and they are always separated with a backslash "\". Cells that contain this info always begin with ">>>>".The name of the country can have special characters, as shown on cell3 and cell8.All characters except for "%" are possible.The data we need to extract(city/country names) are always between "\" characters,which basically indicates the beginning and end of the data we need to extract.
Cell5 and Cell12 contain a language info.These cells will always begin with ">>" and the name of the language is always listed after "Language: ".There is no other info in Cell5 and Cell12.There is only one instance of ":" character.
Cell5's info is associated with Cell3's info(and cell12 to cell8) and it always comes after it, but not necessity on the very next cell.As shown above, there can be multiple cells between cells that begin with ">>>>" and ">>".


The goal is to have:
Rich (BB code):
city\London\country\England*&^\Language\English
city\Madrid\country\S*pai^n\Language\Spanish

I'm using Excel2013. Any help is appreciated !!
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
So, to clarify.

Somewhere between Nth occurence of >>>> and the N+1 >>>>, there will always be a >>, indicating the language? There will always be at least one - and never more than one >>, in-between the >>>>'s?

If this is correct, then what happens at the very end of the list, when you encounter the last >>>>?
 
Upvote 0
Correct.
Sometime after >>>> ,the next cell that we care about is always >> , then after a while >>>>, then >> again .
No two >> cells are "neighbors".
No two >>>> cells are "neighbors".
It is guaranteed that the last cell of interest begins with >> . There is always >> after the >>>> .

start
>>>>
>>
>>>>
>>
...
>>>>
>>
end

EDIT: When I encounter the last >>>> , I should continue in order to find the next >> , to "combine" their data.
 
Last edited:
Upvote 0
Hi @iliace, I don't have time to look at this but if you do, how about trying 2 arrays, one mapping ">>>>" and the other ">>"? The indexes then match the data.
 
Upvote 0
This seems to work with the posted data. Please test this with your data.
It assumes pairing of >>>> and >> always exists.
Code:
Option Explicit
Public Sub ExtractData()
Dim bNewEntry As Boolean
Dim i As Long
Dim oDict As Object
Dim sCity As String, sCountry As String, sLanguage As String, sInput As String
Dim vSplit As Variant
'\\ Create dictionary for array. It will also avoid duplicates, if any
Set oDict = CreateObject("Scripting.Dictionary")
For i = 1 To Cells(Rows.Count, "A").End(xlUp).Row
    'Check for entry of first city and country
   If InStr(Cells(i, 1).Value, ">>>>") = 1 Then
        vSplit = Split(Cells(i, 1).Value, "\")
        sCity = Application.Index(vSplit, Application.Match("city", vSplit, 0) + 1)
        sCountry = Application.Index(vSplit, Application.Match("country", vSplit, 0) + 1)
    'Check for Language
   ElseIf InStr(Cells(i, 1).Value, ">>") = 1 Then
        sLanguage = Mid(Cells(i, 1).Value, InStr(Cells(i, 1).Value, ":") + 1, 99)
        bNewEntry = True
   End If
    'Update new entry
   If bNewEntry Then
        sInput = "city\" & sCity & "\country\" & sCountry & "\Language\" & sLanguage
   Else
        sInput = ""
   End If
    'Check if it is there. If not add new record
   If Not oDict.exists(sInput) And bNewEntry Then
        oDict.Add sInput, sInput
        bNewEntry = False
   End If
Next i
'Paste data in the cells
Range("B1").Resize(oDict.Count, 1).Value = Application.Transpose(oDict.keys)
'Reset
Set oDict = Nothing
End Sub
It should paste results in cell B1 onwards.
 
Upvote 0
Thank you Taurean ! That works like a magic.

Another question.What if Cell3 is
>>>>cell3\randomdata\city\London\randomdata\country\England*&^\randomdata\continent\Europe

But Cell8 does not have "continent" info(same as written above), and if I were to include Cell3's continent info to make it like
Code:
city\London\country\England*&^\Language\English\continent\Europe
city\Madrid\country\S*pai^n\Language\Spanish

What changes would I need to make? Right now when I define sContinent As String , and make associated changes to code, it does not work because it can't find continent info on Cell8 so it gives "Run-Time Error 13, Type mismatch" when I run it.
 
Upvote 0
Thank you Taurean ! That works like a magic.

Another question.What if Cell3 is
>>>>cell3\randomdata\city\London\randomdata\country\England*&^\randomdata\continent\Europe

But Cell8 does not have "continent" info(same as written above), and if I were to include Cell3's continent info to make it like
Code:
city\London\country\England*&^\Language\English\continent\Europe
city\Madrid\country\S*pai^n\Language\Spanish

What changes would I need to make? Right now when I define sContinent As String , and make associated changes to code, it does not work because it can't find continent info on Cell8 so it gives "Run-Time Error 13, Type mismatch" when I run it.

I have a question about your data. Is the word "cell" followed by a number in any of your cells (including after the >>>> as shown) or did you use cell1, cell2, etc. as an identifier only? What I am trying to understand is if all your cells are blank except for those that start with ">>>>" and ">>".

Also, your original >>>> data showed the text ending with a backslash, but your "continent" example above does not end with a backslash... was that a mistake or done on purpose?
 
Last edited:
Upvote 0
I used "cell3","cell8" to identify them easier in here.In reality they are like this
>>>>\randomdata\city\London\randomdata\randomdata\etc\
The cells that begin with >>>> always end with "\". I made a mistake on my last post.It should have been \continent\Europe\
The cells that being with >>>> are always followed by "\". Example ">>>>\...\...\...\" , so there is no digit immediately after >>>> .


It is guaranteed that there are no blank cells.
Cell1, Cell2, Cell4 ... contain random data, but they never begin with >>>> or >>
 
Last edited:
Upvote 0
I think this somewhat "shortish" macro will do what you want...
Code:
Sub ExtractData()
  Dim X As Long, Lines() As String, DataOut As Variant
  Lines = Split(Application.Trim(Join(Application.Transpose(Range("A1", Cells(Rows.Count, "A").End(xlUp)).Value), " ")), " >>>>")
  ReDim DataOut(1 To UBound(Lines) + 1, 1 To 1)
  For X = 1 To UBound(Lines)
    DataOut(X, 1) = "city\" & Split(Split(Lines(X), "\city\")(1), "\")(0)
    DataOut(X, 1) = DataOut(X, 1) & "\country\" & Split(Split(Lines(X), "\country\")(1), "\")(0)
    DataOut(X, 1) = DataOut(X, 1) & "\Language\" & Split(Split(Lines(X), "Language: ")(1), " ")(0)
    If InStr(Lines(X), "continent") Then
      DataOut(X, 1) = DataOut(X, 1) & "\continent\" & Split(Split(Lines(X), "\continent\")(1), "\")(0)
    End If
  Next
  Range("B1:B" & UBound(DataOut)) = DataOut
End Sub
 
Upvote 0

Forum statistics

Threads
1,224,811
Messages
6,181,080
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top