Help with formula to tabulate data in stacked form

Fractalis

Active Member
Joined
Oct 11, 2011
Messages
328
Office Version
  1. 365
Platform
  1. Windows
Hi all,

I have the below text in column A that has data with pairs of parameter,value in stacked way and separated with " = ". For this I'm trying to have a formula to tabulate it
in order to extract some parameters of each block and put them in each line.

In B2 I have this formula so far "=MID(INDEX($A$1:$A$79,MATCH("*"&B$1&"*",$A$1:$A$79,0)),16,10)" but when I copy accross and down, only
shows the data of first block.

How to tabulate the values of parameters ID,NAME,TYPE,RSGSN?

The output I'm trying to get is from column B to E.

Thanks in advance


file.xlsx
ABCDE
1IDNAMETYPERSGSN
2some text some text0INV445ECT18560
3some text0INV445ECT6103
4some text some text0INV445ECT8180
5some text some text2MODULE18HSSD2962
62MODULE18HSSD13603
7 RSN = 12MODULE18HSSD764
8 ID = 02MODULE18HSSD6200
9 NAME = INV445
10 TYPE = ECT
11 AMDEF = STR
12 RSGSN = 18560
13 AMRUL = TOMIZ
14 RREST = 0
15 PERAR = FALSE
16 ERANN = FALSE
17 RANNO = FALSE
18
19 RSGSN = 6103
20 AMRUL = TOMIZ
21 RREST = 0
22 PERAR = FALSE
23 ERANN = FALSE
24 RANNO = FALSE
25
26 RSGSN = 8180
27 AMRUL = TOMIZ
28 RREST = 0
29 PERAR = FALSE
30 ERANN = FALSE
31 RANNO = FALSE
32
33some text some text
34
35some text some text
36
37--- END
38some text some text
39
40some text some text
41some text some text
42some text some text
43some text some text
44
45 RSN = 1
46 ID = 2
47 NAME = MODULE18
48 TYPE = HSSD
49 AMDEF =SSN
50 RSGSN = 2962
51 AMRUL = STR
52 PERAR = FALSE
53 ERANN = FALSE
54 RANNO = FALSE
55
56 RSGSN = 13603
57 AMRUL = STR
58 PERAR = FALSE
59 ERANN = FALSE
60 RANNO = FALSE
61
62 RSGSN = 764
63 AMRUL = TOMIZ
64 RREST = 0
65 PERAR = FALSE
66 ERANN = FALSE
67 RANNO = FALSE
68
69 RSGSN = 6200
70 AMRUL = TOMIZ
71 RREST = 0
72 PERAR = FALSE
73 ERANN = FALSE
74 RANNO = FALSE
75
76some text some text
77
78some text some text
79
80--- END
Sheet1
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Wow, this would have been so much easier if I'd just written a macro! But I got started on this approach, and after I got started, I was just stubborn enough to finish this method. Consider:

Book1
ABCDEFGHIJ
1IDNAMETYPERSGSNRowStartEndMax
2some text some text0INV445ECT1856021373
3some text0INV445ECT6103538804
4some text some text0INV445ECT8180    
5some text some text2MODULE18HSSD2962    
62MODULE18HSSD13603    
7 RSN = 12MODULE18HSSD764 
8 ID = 02MODULE18HSSD6200
9 NAME = INV445    
10 TYPE = ECT    
11 AMDEF = STR    
12 RSGSN = 18560    
13 AMRUL = TOMIZ
14 RREST = 0
15 PERAR = FALSE
16 ERANN = FALSE
17 RANNO = FALSE
18
19 RSGSN = 6103
20 AMRUL = TOMIZ
21 RREST = 0
22 PERAR = FALSE
23 ERANN = FALSE
24 RANNO = FALSE
25
26 RSGSN = 8180
27 AMRUL = TOMIZ
28 RREST = 0
29 PERAR = FALSE
30 ERANN = FALSE
31 RANNO = FALSE
32
33some text some text
34
35some text some text
36
37--- END
38some text some text
39
40some text some text
41some text some text
42some text some text
43some text some text
44
45 RSN = 1
46 ID = 2
47 NAME = MODULE18
48 TYPE = HSSD
49 AMDEF =SSN
50 RSGSN = 2962
51 AMRUL = STR
52 PERAR = FALSE
53 ERANN = FALSE
54 RANNO = FALSE
55
56 RSGSN = 13603
57 AMRUL = STR
58 PERAR = FALSE
59 ERANN = FALSE
60 RANNO = FALSE
61
62 RSGSN = 764
63 AMRUL = TOMIZ
64 RREST = 0
65 PERAR = FALSE
66 ERANN = FALSE
67 RANNO = FALSE
68
69 RSGSN = 6200
70 AMRUL = TOMIZ
71 RREST = 0
72 PERAR = FALSE
73 ERANN = FALSE
74 RANNO = FALSE
75
76some text some text
77
78some text some text
79
80--- END
Sheet9
Cell Formulas
RangeFormula
B2:E9B2=IFERROR(MID(INDEX($A:$A,AGGREGATE(15,6,ROW(INDIRECT(VLOOKUP(ROW(),$G$2:$J$100,2)&":"&VLOOKUP(ROW(),$G$2:$J$100,3)))/ISNUMBER(SEARCH(B$1&" =",INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,2)):INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,3)))),ROW()-VLOOKUP(ROW(),$G$2:$J$100,1)+1)),17,10),IF(ROW()>SUM($J:$J)+ROW(B$2)-1,"",B1))
B10:E11B10=IFERROR(MID(INDEX($A:$A,AGGREGATE(15,6,ROW(INDIRECT(VLOOKUP(ROW(),$G$2:$J$100,2)&":"&VLOOKUP(ROW(),$G$2:$J$100,3)))/ISNUMBER(SEARCH(B$1&" =",INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,2)):INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,3)))),ROW()-VLOOKUP(ROW(),$G$2:$J$100,1)+1)),17,10),IF(ROW()>SUM($J:$J)+ROW(B$2),"",B9))
B12:E12B12=IFERROR(MID(INDEX($A:$A,AGGREGATE(15,6,ROW(INDIRECT(VLOOKUP(ROW(),$G$2:$J$100,2)&":"&VLOOKUP(ROW(),$G$2:$J$100,3)))/ISNUMBER(SEARCH(B$1&" =",INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,2)):INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,3)))),ROW()-VLOOKUP(ROW(),$G$2:$J$100,1)+1)),17,10),"")
G2:G6G2=IF(J2="","",ROW($B$2)+SUM(J$1:J1))
I2:I6I2=IFERROR(AGGREGATE(15,6,ROW(INDEX(A:A,H2):INDEX(A:A,H2+1000))/ISNUMBER(SEARCH("END",INDEX(A:A,H2):INDEX(A:A,H2+1000))),1),"")
H3:H6H3=IFERROR(IF(COUNTIF(INDEX(A:A,I2+1):INDEX(A:A,I2+1000),"*END*"),I2+1,""),"")
J2:J7J2=IF(I2="","",MAX(COUNTIF(INDEX(A:A,H2):INDEX(A:A,I2),"*"&$B$1:$E$1&" =*")))


I first came up with a helper table (G1:J10). This table tabulates certain information from each record in the A column. It has the start and end of each record. The first record starts on row 1, and ends with the text "END" in column A. (We may need to include the dashes, because "end" in some of the other random text would throw things off.) The next record starts 1 row after that, and looks for the next END, and so on. Then the Max column finds the maximum number of elements of each type in that range. So in the first record, ID happens once, Name happens once, Type happens once, and RSGSN happens 3 times, so the max is 3. Finally, the Row column is the row where the output for each record will start.

Once we have all that, then the formula in B2 searches for every row with "ID =" in the A1:A37 range and displays them in order. If there are less than the number of rows, it just repeats the cell above it.

This would be much easier with VBA, or even the newer functions available in Excel 365. But take a look and let me know what you think.
 
Upvote 0
Hi

Please check this

Two methods have been provided- using SORTBY, TEXTSPLIT etc (may work only in Office 365 and beyond), and using older functions (should work in older versions also)

They have some intermediate steps, but the all of them are linked, and hence the methods should work fine

A simplistic version of the input is in Cells A5:A13

Output is in cells O2:S3

1. Splitting them into left and right of "="
2. Removing blank entries (if any)
3. Finding serial numbers of each set of entries
4. Sorting them in the order ID- NAME- RSN- TYPE
5. Using FILTER to arrange them in the output format

The intermediate steps may be kept in a backup sheet as well

Method 1
Ordering random entries.xlsx
ABCDEFGHIJKLMNOPQRS
2Last row12S NoIDNAMERSNTYPE
310INV4451ECT
4InputSplit OutRemoved BlanksSl NoOrderSortedOrder21INV4462HSSD
5RSN=1RSN1RSN113ID011RSN33    
6ID=0ID0ID011NAMEINV44512ID1
7NAME=INV445NAMEINV445NAMEINV44512RSN113NAME2
8TYPE=ECTTYPEECTTYPEECT14TYPEECT14TYPE4
9 RSN223ID1215
10RSN=2RSN2ID121NAMEINV44622
11ID=1ID1NAMEINV44622RSN223
12NAME=INV446NAMEINV446TYPEHSSD24TYPEHSSD24
13TYPE=HSSDTYPEHSSD
Sheet3
Cell Formulas
RangeFormula
G2G2=ROW(OFFSET(F4,COUNTA(F4:F1000),1))-1
P3:S5P3=IFERROR(FILTER($I$5:$I$100,($H$5:$H$100=P$2)*($J$5:$J$100=$O3)),"")
D5:E12D5=FILTER(B5:C13,B5:B13<>"")
F5:F12F5=COUNTIF(D$5:D5,"RSN")
G5:G12G5=VLOOKUP(D5,$M$5:$N$9,2,0)
H5:K12H5=SORTBY(D5:INDIRECT("G"&G2),F5:INDIRECT("F"&G2),1,G5:INDIRECT("G"&G2),1)
B10:C13,B9,B5:C8B5=IFERROR(TEXTSPLIT(A5,"="),"")
Dynamic array formulas.


Method 2
Ordering random entries.xlsx
ABCDEFGHIJKLMNOPQRS
2Last row12S NoIDNAMERSNTYPE
310INV4451ECT
4InputSplit OutOrderSl NoSortedOrder21INV4462HSSD
5RSN=1RSN131ID011RSN33    
6ID=0ID011NAMEINV44521ID1
7NAME=INV445NAMEINV44521RSN131NAME2
8TYPE=ECTTYPEECT41TYPEECT41TYPE4
9  510000ID1125
10RSN=2RSN232NAMEINV44622
11ID=1ID112RSN232
12NAME=INV446NAMEINV44622TYPEHSSD42
13TYPE=HSSDTYPEHSSD42510000
Sheet3 (2)
Cell Formulas
RangeFormula
G2G2=ROW(OFFSET(F4,COUNTIF(F4:F1000,"<>"),1))-1-COUNTIF(A5:A13,"")
P3:S5P3=IFERROR(FILTER($G$5:$G$100,($F$5:$F$100=P$2)*($I$5:$I$100=$O3)),"")
B5:B13B5=IFERROR(LEFT(A5,FIND("=",A5)-1),"")
C5:C13C5=IFERROR(SUBSTITUTE(A5,B5&"=",""),"")
D5:D13D5=VLOOKUP(B5,$K$5:$L$9,2,0)
E5:E13E5=IF(A5<>0,COUNTIF(B$5:B5,"RSN"),10000)
F5:I13F5=SORT(SORT(B5:E13,3,1),4,1)
Dynamic array formulas.
 
Upvote 0
Wow, this would have been so much easier if I'd just written a macro! But I got started on this approach, and after I got started, I was just stubborn enough to finish this method. Consider:

But take a look and let me know what you think.
Hi @Eric W , I never thought that could be too difficult. Maybe I forget to say "if someone thinks is too difficult to do it with Excel Formula, could be with VBA". I thought that if I had been able to find the solution for the first block (with my modest formula), it would only need a small tweak for the others. I've tried your formula with actual data (more than 14K rows) and even find the correct values for Row, Start, End, Max in column G:J, in column B:E shows empty from B:D and "R´s" in column E. I am ashamed for all the time you could invest in helping me, I have access to office 365 to try or VBA if somebody can. Regards

Hi

Please check this

Two methods have been provided- using SORTBY, TEXTSPLIT etc (may work only in Office 365 and beyond), and using older functions (should work in older versions also)
Thanks @arun_eagertolearn for your help, my actual data has more than 14K. I'll need to modify the input to look like yours in order try your method. I'm trying to understand so far your logic.. Regards
 
Last edited:
Upvote 0
One option with 365. Start with Column E then fill down B.
Book1
BCDE
1IDNAMETYPERSGSN
20INV445ECT18560
30INV445ECT6103
40INV445ECT8180
52MODULE18HSSD2962
62MODULE18HSSD13603
72MODULE18HSSD764
82MODULE18HSSD6200
Sheet13
Cell Formulas
RangeFormula
E2:E8E2=TEXTAFTER(FILTER($A$2:$A$80,ISNUMBER(SEARCH(E1,$A$2:$A$80))),"= ")
B2:D8B2=LET(t,TAKE($A$2:$A$80,XMATCH("*" & $E2 & "*",$A$2:$A$80,2)),TEXTAFTER(XLOOKUP("*" & B$1:D$1 & "*",t,t,,2,-1),"= "))
Dynamic array formulas.
 
Upvote 0
One option with 365. Start with Column E then fill down B.
Thanks @Cubist for your help. I've tried with actual data and since to work, only one minor issue I think. Not sure why when values of parameter "RSGSN" have only one digit, the values in column B:C are shown like "#N/A". You could try changing all values or some values of "RSGSN" to one digit to see the issue. Regards
 
Upvote 0
Hi @Eric W , I never thought that could be too difficult. Maybe I forget to say "if someone thinks is too difficult to do it with Excel Formula, could be with VBA". I thought that if I had been able to find the solution for the first block (with my modest formula), it would only need a small tweak for the others. I've tried your formula with actual data (more than 14K rows) and even find the correct values for Row, Start, End, Max in column G:J, in column B:E shows empty from B:D and "R´s" in column E. I am ashamed for all the time you could invest in helping me, I have access to office 365 to try or VBA if somebody can. Regards
Don't worry about the time spent, no one was twisting my arm! 😉

I'm not sure why the B:E columns aren't populating correctly, but I suspect that it's because the number of spaces that the values start with aren't what I expected. I tried adding a TRIM to the function, see if this helps:

B2:

Excel Formula:
=IFERROR(MID(TRIM(INDEX($A:$A,AGGREGATE(15,6,ROW(INDIRECT(VLOOKUP(ROW(),$G$2:$J$100,2)&":"&VLOOKUP(ROW(),$G$2:$J$100,3)))/ISNUMBER(SEARCH(B$1&" =",INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,2)):INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,3)))),ROW()-VLOOKUP(ROW(),$G$2:$J$100,1)+1))),LEN(B$1)+4,10),IF(ROW()>SUM($J:$J)+ROW(B$2)-1,"",B1))

If you have Excel 365 to work with, I'd suggest giving Cubist's formulas a shot, since they're far shorter. If you'd like me or Cubist to tweak our formulas, you'll need to show us the values that should be there, but aren't showing correctly.

Also, just to be sure, are you expecting 1 value each of ID, Name, and Type, and multiple of RSGSN?

Let us know. If you'd like to try a VBA solution, let me know.
 
Upvote 0
Thanks @Cubist for your help. I've tried with actual data and since to work, only one minor issue I think. Not sure why when values of parameter "RSGSN" have only one digit, the values in column B:C are shown like "#N/A". You could try changing all values or some values of "RSGSN" to one digit to see the issue. Regards
Minor change to the formula. Try this.
Book1
BCDE
1IDNAMETYPERSGSN
20INV445ECT1
30INV445ECT2
40INV445ECT8180
52MODULE18HSSD2962
62MODULE18HSSD13603
72MODULE18HSSD764
82MODULE18HSSD6200
Sheet2
Cell Formulas
RangeFormula
E2:E8E2=TEXTAFTER(FILTER($A$2:$A$80,ISNUMBER(SEARCH(E1,$A$2:$A$80))),"= ")
B2:D8B2=LET(t,TAKE($A$2:$A$80,XMATCH("*RSGSN = " & $E2 & "*",$A$2:$A$80,2)),TEXTAFTER(XLOOKUP("*" & B$1:D$1 & "*",t,t,,2,-1),"= "))
Dynamic array formulas.
 
Upvote 0
I ended up writing the macro, and it really did take less time than the formulas. If you'd like to try it:

Open a COPY of your workbook. Press Alt-F11 to open the VBA editor. Press Alt-IM to Insert a Module. Copy the following code to the window that opens:

VBA Code:
Sub GetData()
Dim InputCell As Range, OutputCell As Range, MyTab As Variant, variables As Variant
Dim i As Long, j As Long, res(1 To 1, 1 To 4) As String, loc As Long, w As String, r As Long

    Set InputCell = Sheets("Sheet9").Range("A2")            ' Where does the text data start
    Set OutputCell = Sheets("Sheet9").Range("M1")           ' Where do you want the results?
    variables = Array("ID", "NAME", "TYPE", "RSGSN")        ' What values are you looking for?
    
    OutputCell.Resize(, 4) = variables                      ' Set the values as headers
' Get the input table
    MyTab = Range(InputCell, InputCell.Offset(InputCell.Offset(1000000).End(xlUp).Row - InputCell.Row + 1)).Value
    
    Application.ScreenUpdating = False
    r = 1                                                   ' Set the first output row (as an offset)
    For i = 1 To UBound(MyTab)                              ' check each row in the input table
        w = WorksheetFunction.Trim(MyTab(i, 1))             ' remove excess spaces
        For j = 0 To 3                                      ' check to see if any of the values are found
            loc = InStr(1, w, variables(j) & " =", vbTextCompare)   ' where does the value start?
            If loc > 0 Then                                         ' Did we find one?
                If res(1, j + 1) <> "" Then                         ' Have we already got one we haven't printed?
                    OutputCell.Offset(r).Resize(, 4).Value = res    ' yes, print it
                    r = r + 1                                       ' increment the output row
                End If
                res(1, j + 1) = Trim(Mid(w, loc + Len(variables(j)) + 2))   ' save the found value
                Exit For                                            ' exit the loop
            End If
        Next j
        If UCase(Right(Trim(w), 3)) = "END" Then            ' Is this an END line?
            OutputCell.Resize(, 4).Offset(r) = res          ' Yes, display the record in progress
            r = r + 1                                       ' Increment the output row
            Erase res                                       ' Clear the record in progress
        End If
    Next i
    OutputCell.Offset(r).Resize(, 4).Value = res            ' Display the last record
    Application.ScreenUpdating = True
    
End Sub

On the first 2 lines that start with "Set InputCell = " and "Set OutputCell = ", change the sheet and cell names to where your data starts, and where you want the results. Go back to Excel. Press Alt-F8 to open the macro selector. Select GetData and click Run.
 
Last edited:
Upvote 0
Don't worry about the time spent, no one was twisting my arm! 😉

I'm not sure why the B:E columns aren't populating correctly, but I suspect that it's because the number of spaces that the values start with aren't what I expected. I tried adding a TRIM to the function, see if this helps:

B2:

Excel Formula:
=IFERROR(MID(TRIM(INDEX($A:$A,AGGREGATE(15,6,ROW(INDIRECT(VLOOKUP(ROW(),$G$2:$J$100,2)&":"&VLOOKUP(ROW(),$G$2:$J$100,3)))/ISNUMBER(SEARCH(B$1&" =",INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,2)):INDEX($A:$A,VLOOKUP(ROW(),$G$2:$J$100,3)))),ROW()-VLOOKUP(ROW(),$G$2:$J$100,1)+1))),LEN(B$1)+4,10),IF(ROW()>SUM($J:$J)+ROW(B$2)-1,"",B1))

If you have Excel 365 to work with, I'd suggest giving Cubist's formulas a shot, since they're far shorter. If you'd like me or Cubist to tweak our formulas, you'll need to show us the values that should be there, but aren't showing correctly.

Also, just to be sure, are you expecting 1 value each of ID, Name, and Type, and multiple of RSGSN?

Let us know. If you'd like to try a VBA solution, let me know.
I've tried with your modified formula containing TRIM() and now the values are polulating but only for values related with Type=ECT and doesn´t show the values related with type=HSSD. (* In actual data the parameter names are larger, so maybe needs something independent of the parameter name length)

After trying your Formula solutions, for the sake of perfomance and for ease, I now think could be better VBA solution since when I try with actual data (more than 14K rows) it happen to be very slow.

Image is better than 1000 words, so I'm attaching an image explaining the structure of the input data in column A. Basically is conformed of one ore more blocks. Each block has a main sub-block and 0 or more sub-blocks and ends with line "--- END". Main Sub-block contains all parameters (including TYPE) and the other sub-blocks only some parameters (inlcuding RSGSN, but TYPE is not included). All Parameters in each Main sub-block or in secundary sub-blocks always have a value related for each parameter. Thanks again for any help.
 

Attachments

  • Blocks.jpg
    Blocks.jpg
    160.3 KB · Views: 7
Upvote 0

Forum statistics

Threads
1,221,846
Messages
6,162,376
Members
451,760
Latest member
samue Thon Ajaladin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top