Method to transpose vertically stacked, variable sized, subtotaled data from an HTML format to an Excel range suitable for Pivot Tables

curtishavak

New Member
Joined
May 31, 2012
Messages
41
Firstly, apologies for the novel, but I don't know any other way to explain it...

I need to import a (poorly-formatted) HTML report generated from Oracle into Excel. I do this on a regular basis and have developed a technique and even a few basic recorded (I have zero VBA knowledge) macros. However, one of the key things that I run into is the need to use autofill for non consistent number of cells.

I start with a copy paste of the HTML text into Excel, cell A1. I get something that looks like this:
A / B / C / D
1) Total for Posting Year/Month: 2010/04 $4,000
2) Total for Posting Year/Month: 2010/06 $5,000
3) Total for Posting Year/Month: 2010/09 $6,000
4) Total for Customer: ABC, Inc. $15,000
5) Total for Market Segment: Aerospace $93,134
6) Total for Customer Class: Industrial $405,231

All text/values are in column A (A1, A2, A3, etc.) and this is a very small snippet of a much larger report (thousands of lines like this).

Basically this is a descending summation: All totals for each posting are summed from lines of orders that shipped in that month/year and these shipments are all for the customer that comes below them in the list (ABC, Inc.) and ABC, Inc. and all such customers with posting months (non-consecutive, so note that if shipments were only in 4, 6 and 9, all other months are left out of the subtotals completely instead of giving zero values as place-holders) are summed into their Market Segment (Aerospace) and likewise all Market segments are then grouped by Customer Class (Industrial) which corresponds to a sales division.

My goal is to reduce the list to the most basic subtotal (posting year/month) with categories Customer name, Market Segment and Customer Class being listed to the right in their own columns so that I can use a Pivot Table. After a lot of playing with text-to-columns, filtering, indexing the rows and then flipping the whole list upside down so I can use Auto Fill, I get a list that looks like this:

A / B / C / D
Index/ "Totals for" / Category / Shipments
1) 6 / Total for Customer Class: / Industrial /$405,231
2) 5 / Total for Market Segment: / Aerospace / $93,134
3) 4 / Total for Customer: / ABC, Inc. / $15,000
4) 3 / Total for Posting Year/Month: / 2010/09 / $6,000
5) 2 / Total for Posting Year/Month: / 2010/06 / $5,000
6) 1 / Total for Posting Year/Month: / 2010/04 / $4,000


I then use filter on row B to, say, customer class. In this example, this would show one row, but imagine, again, that this contains thousands of values. I then then the following formula into E1:

= C1

And then use auto fill to copy the formula down into the filtered cells in Row E.

I then filter column B by the next "Total for", namely Market Segment and in Row F, use the same formula and use autofill to copy it down.

I do the same then for Customer Name in Row G.

Finally, I unfilter column B, highlight rows E, F and G, copy and paste special values to eliminate the formulas. This gives me something like this (note, I had to shorten some things to fit all this in this post and I couldn't fit in column G. I also added more rows to help illustrate what comes next):

A / B / C / D / E / F
Ind/ "Totals for" / Category / Shipments / Cust. Class / Mkt. Seg
1)12 / T...Customer Class: / Medical /$405,231 / Medical /
2)11 / T...Market Segment:/ Neuro / $93,134 / /Neuro
3)10 / T...Customer: / EFG, Inc. / $15,000 / /
4)9 / T...Year/Month: / 2010/08 / $6,000 / /
5)8 / T...Year/Month: / 2010/09 / $5,000 / /
6)7 / T...Year/Month: / 2010/012 / $4,000 / /
7)6 / T...Customer Class: / Industrial /$405,231 / Industrial /
8)5 / T...Market Segment:/ Aerospace / $93,134 / /Aerospace
9)4 / T...Customer: / ABC, Inc. / $15,000 / /
10)3 / T...Year/Month: / 2010/09 / $6,000 / /
11)2 / T...Year/Month: / 2010/06 / $5,000 / /
12)1 / T...Year/Month: / 2010/04 / $4,000 / /

Now, for shorter reports, I have just been doing the highlighting cell E1, double-clicking the bottom right corner to auto fill down the value until it stops at the next cell with a value already in it, hit CTRL+Down Arrow, then repeat. I then to the same for columns F, G, etc. This yields the following:


A / B / C / D / E / F
Ind/ "Totals for" / Category / Shipments / Cust. Class / Mkt. Seg
1)12 / T...Customer Class: / Medical /$405,231 / Medical /
2)11 / T...Market Segment:/ Neuro / $93,134 / Medical /Neuro
3)10 / T...Customer: / EFG, Inc. / $15,000 / Medical /Neuro
4)9 / T...Year/Month: / 2010/08 / $6,000 / Medical /Neuro
5)8 / T...Year/Month: / 2010/09 / $5,000 / Medical /Neuro
6)7 / T...Year/Month: / 2010/012 / $4,000 / Medical /Neuro
7)6 / T...Customer Class: / Industrial /$405,231 / Industrial /Neuro
8)5 / T...Market Segment:/ Aerospace / $93,134 / Industrial /Aerospace
9)4 / T...Customer: / ABC, Inc. / $15,000 / Industrial /Aerospace
10)3 / T...Year/Month: / 2010/09 / $6,000 / Industrial /Aerospace
11)2 / T...Year/Month: / 2010/06 / $5,000 / Industrial /Aerospace
12)1 / T...Year/Month: / 2010/04 / $4,000 / Industrial /Aerospace

note that Row 7 then contains an inaccuracy (Industrial and Neuro together), but this doesn't matter because this row and those like it are removed by the final step wherein I filter column B to show only "Total for Posting Year/Month", which are then the only unique values containing all of the categories across the rows. I highlight the whole sheet while filtered, and copy into a new sheet to eliminate all the duplicates, yielding this:

A / B / C / D / E / F /G
Ind/ "Totals for" / Category / Shipments/ Cust. Class / Mkt. Seg /Cust.
1)9 / T...Year/Month:/ 2010/08 / $6,000 / Medical /Neuro /EFG Co
2)8 / T...Year/Month:/ 2010/09 / $5,000 / Medical /Neuro /EFG Co
3)7 / T...Year/Month:/ 2010/12 / $4,000 / Medical /Neuro /EFG Co
4)3 / T...Year/Month:/ 2010/09 / $6,000 / Industrial /Aerospace/ABC Co
5)2 / T...Year/Month:/ 2010/06 / $5,000 / Industrial /Aerospace /ABC Co
6)1 / T...Year/Month:/ 2010/04 / $4,000 / Industrial /Aerospace/ABC Co

Afterwards, I split the column C into separate year and month columns, but finally have the individual lines that I need, giving me something a pivot table can digest so I can make YTD comparisons, track customer, divisional and market segment growth, etc.

NOW (assuming you're still reading)... my question is this:

For really long reports, double clicking each cell, hitting control down, REPEAT...is an excruciatingly slow, monotonous task. All the other steps are relatively easy and can even be macro'ed.

Is there a way to automate this step?

I don't know enough about VBA to macro it myself, and when I try recording a Macro, my problem is non-uniform number of rows to fill. For example if one customer ships in Jan, Feb, May and Dec (4 months) and another in March, October and December (3 months), the auto fill doesn't work because when using absolute reference, it records the cell references, not that I want it to copy and paste into all blank cells between the current cell and the next cell with a value in it. Using relative reference yields the same problem only that it pastes in the number of cells I tell it to, so I can't build in variation. Plus each report I run is going to be a different length, so I can't tell it to loop X number of times (even if it did work); I need it to loop until there are simply no more values beneath it.

Alternatively, if there is no way to automate the step, does anyone have any ideas on an alternative way to transpose date organized in such a manner???

I have a 10,000+ line :eeek: report that my Sr. VP has asked for and would prefer not to give myself carpal tunnel with all the double-clicks this would otherwise entail.

GO!
 
ARRRRGH! - it works, but for some reason is stopping at row 48, when I need it to continue indefinitely until there are no more cells in that row.
 
Upvote 0

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
Do you mean, it's filling up to 48, or the loop is not moving past 48?

Step through it and make sure it's calculating the last row correctly. Try it on another column. Instead of "F", use "A".
 
Upvote 0
I must have been looking at something incorrectly, but the Macro is stopping at Row 60, not Row 48 as I originally said. Either way, I've looked at it and tried a few different options and it would appear that the macro only loops through from F2 to F60 (59 times?) and then goes to End Sub for some reason.

I don't know why, since the range was defined as (E2:E), so it should have continued on until the very last row, right?. I'm almost glad it didn't though, because this would have been over 1 million loops and it would have almost certainly have either crashed Excel, or locked up my computer for hours...

Anyway, I tried duplicating the code end on end:

Original:

Sub Monthly5()
'
' Sub Monthly5()
Dim WS As Worksheet
Dim lRow As Long
Dim aCell As Range


Set WS = Sheet1
With WS
lRow = .Cells(.Rows.Count, "F").End(xlUp).Row


For Each aCell In Range("F2:F" & lRow)


If aCell.Offset(1, 0).Value = Empty Then
aCell.Offset(1, 0).Value = aCell.Value
End If


Next
End With
End Sub


Duplicated code:

Sub Monthly5()
'
' Sub Monthly5()
Dim WS As Worksheet
Dim lRow As Long
Dim aCell As Range


Set WS = Sheet1
With WS
lRow = .Cells(.Rows.Count, "F").End(xlUp).Row


For Each aCell In Range("F2:F" & lRow)


If aCell.Offset(1, 0).Value = Empty Then
aCell.Offset(1, 0).Value = aCell.Value
End If


Next
End With
With WS
lRow = .Cells(.Rows.Count, "F").End(xlUp).Row

For Each aCell In Range("F2:F" & lRow)

If aCell.Offset(1, 0).Value = Empty Then
aCell.Offset(1, 0).Value = aCell.Value
End If

Next
End With



End Sub


But all it got me was to go to Row 61. I don't know why it worked even then, because I realized that in order to get it to work, I would have to change the range to (F60:F), and then it hit me: instead of (F2:F), I would use (F:F500), 500 being the arbitrary row number that I use in these reports since the monthly updates never contain more than 500 rows of data.

Now it works. Takes a little longer, because it has to go through 500 rows instead of 60, but it does the job and it still 1,000,000x faster than doing it manually. Worst comes to worst, I go get a snack while it runs.:)


Thanks again - this will be an enormous help in a number of reports!:biggrin:

 
Upvote 0

lRow = .Cells(.Rows.Count, "F").End(xlUp).Row


This is supposed to calculate the last row with data.

It is looking at column F. If there is another column that is full of data all the way to the end, Say, Col A, then you might try that column instead.

When you step through it, after it passes iRow, hover over IRow with mouse pointer and see what the value is. If it's different than the last row with data, then we have other problems.
 
Upvote 0

lRow = .Cells(.Rows.Count, "F").End(xlUp).Row


This is supposed to calculate the last row with data.

It is looking at column F. If there is another column that is full of data all the way to the end, Say, Col A, then you might try that column instead.

When you step through it, after it passes iRow, hover over IRow with mouse pointer and see what the value is. If it's different than the last row with data, then we have other problems.


David - I tried switching it back to the original code and changing "F":

lRow = .Cells(.Rows.Count, "A").End(xlUp).Row

Column A has data in every cell of the range in question, but I had the same problem. I decided to check where the macro was calculating the end of the range by putting a MsgBox command in to display the lRow value (MsgBox lRow) after the above-referenced line of code. When I ran it without changes, it returned "59". I did a google search on determining the end of a range and got this piece of code to test the macro you gave me.

Sub LastRowWithData_xlUp_2()


Dim lastRow As Long


lastRow = ActiveSheet.Range("A" & Rows.Count).End(xlUp).Row


MsgBox lastRow

End Sub

This macro returned "99", which is the correct end-of-range row number.

Sooooo, I checked the two macros against on another and found what I believe is the issue. There should be an "ActiveSheet" in front of ".Cells"... in the code:

lRow = ActiveSheet.Cells(.Rows.Count, "A").End(xlUp).Row

When I did this - it worked perfectly (so far).

Your thoughts?

Current Macro Code:
Sub Monthly5()

'
' Sub Monthly5()
Dim ws As Worksheet
Dim lRow As Long
Dim aCell As Range


Set ws = Sheet1
With ws
lRow = ActiveSheet.Cells(.Rows.Count, "A").End(xlUp).Row


For Each aCell In Range("F2:F" & lRow)


If aCell.Offset(1, 0).Value = Empty Then
aCell.Offset(1, 0).Value = aCell.Value
End If


Next
End With
End Sub
 
Upvote 0
That's what this bit does.
Code:
Set WS = Sheet1
    With WS

Was the data on sheet1? You could specify the sheet name.
Code:
[COLOR=#222222][FONT=Times New Roman]Set WS = ActiveWorkbook.Worksheets("Sheet1")
    With WS
        [/FONT][/COLOR]

The With is just a shorthand way to qualify a object.

So, to write out the LastRow wihtout the With.

lRow = worksheets("Sheet1").Cells(Worksheets("Sheet1").Rows.Count, "F").End(xlUp).Row
 
Upvote 0
This makes sense. Forgive me, but this is my first real foray into VBA coding and I'm sort of self-teaching, using Google searches and Excel forums.

But no - there is no Sheet1 in the workbook that I'm working with. I thought this might be part of the issue, but I didn't know the correct syntax to name the sheet that it should look on, so I tried ActiveSheet.

Oddly enough, though, when I update the code to change "ws" from "Sheet1" to "ActiveWorkbook.Worksheets("2012-2010 Data"), it calculates the last row as 2250. If I leave it as per my last post, it correctly updates the last row (in this case 99, in a test case 96).

Alternatively, if I remove the "With" statements altogether, then it calculates correctly.

Since it works as I have it, I think I'm going to stick with it as-is, especially give how much time I have in it at this point.


I really appreciate your help with all of this.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,176
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top