How to normalize a large dataset into less columns?

DevAlex

New Member
Joined
May 26, 2022
Messages
14
Office Version
  1. 2021
Platform
  1. Windows
I have the following data:

Item_ID2010_Sales2010_Costs2010_UnitsSold2011_Sales2011_Costs2011_UnitsSold2012_Sales2012_Costs2012_UnitsSold2013_Sales2013_Costs2013_UnitsSold2014_Sales2014_Costs2014_UnitsSold2015_Sales2015_Costs2015_UnitsSold
41988098​
448​
689​
410​
505​
761​
821​
822​
689​
871​
597​
837​
625​
795​
159​
688​
760​
541​
440​
85479389​
930​
930​
967​
732​
951​
429​
764​
340​
304​
272​
540​
424​
285​
298​
626​
718​
158​
740​
63981760​
785​
366​
874​
569​
818​
347​
462​
775​
759​
415​
302​
294​
134​
232​
395​
194​
708​
683​
46774076​
452​
526​
387​
885​
220​
470​
248​
563​
767​
851​
120​
490​
947​
622​
874​
742​
554​
352​
68727776​
500​
821​
204​
675​
817​
296​
214​
929​
136​
545​
604​
413​
147​
283​
917​
710​
109​
718​
13199311​
805​
219​
830​
587​
452​
731​
674​
972​
375​
612​
142​
979​
442​
595​
662​
981​
218​
563​
7026817​
971​
983​
368​
636​
644​
978​
357​
486​
376​
101​
387​
276​
636​
307​
744​
389​
705​
251​
68650619​
460​
344​
428​
397​
196​
594​
407​
490​
564​
872​
260​
217​
393​
276​
530​
532​
804​
197​
91968456​
541​
389​
423​
786​
824​
153​
419​
311​
856​
445​
710​
855​
691​
928​
770​
730​
994​
875​
6295037​
269​
475​
463​
537​
154​
621​
997​
410​
730​
315​
948​
577​
503​
706​
150​
389​
220​
509​
99957946​
144​
213​
319​
928​
631​
881​
227​
757​
758​
674​
770​
779​
484​
234​
711​
766​
592​
363​
38507206​
766​
697​
888​
344​
176​
289​
342​
819​
753​
341​
119​
289​
969​
744​
795​
998​
282​
891​
20902921​
256​
981​
942​
836​
311​
981​
772​
840​
161​
591​
562​
376​
102​
896​
355​
844​
766​
386​
93167988​
109​
388​
970​
720​
259​
220​
855​
748​
300​
392​
814​
801​
670​
637​
677​
197​
368​
981​
43881861​
662​
819​
156​
627​
649​
877​
143​
986​
498​
983​
304​
365​
378​
634​
167​
622​
492​
323​
97293992​
237​
368​
416​
386​
236​
487​
328​
629​
551​
627​
119​
903​
996​
320​
222​
779​
438​
747​
88816807​
521​
219​
928​
440​
778​
219​
583​
128​
243​
761​
523​
365​
755​
195​
449​
279​
969​
605​
15380507​
958​
204​
633​
379​
522​
609​
746​
175​
816​
227​
812​
399​
695​
105​
475​
216​
288​
631​
14420025​
718​
232​
781​
142​
680​
610​
230​
549​
787​
801​
960​
167​
597​
804​
146​
586​
732​
224​

The issue is that there is no year column in this dataset, which I want to add manually so that I can I graph the data. To do so I need to normalize the data and to have only 3 columns for each value (sales, costs, unitssold).

In the end I would like something like this:
Item_IDYearSalesCostsUnitsSold
36319622​
2010​
712​
372​
586​
36319622​
2011​
593​
123​
601​
36319622​
2012​
458​
214​
896​
36319622​
2013​
191​
163​
690​
36319622​
2014​
355​
935​
662​
36319622​
2015​
820​
498​
481​
49355938​
2010​
708​
109​
447​
49355938​
2011​
597​
714​
524​
49355938​
2012​
602​
144​
485​
49355938​
2013​
405​
633​
705​
49355938​
2014​
202​
521​
333​
49355938​
2015​
220​
190​
236​

What is the best way to go about cleaning the data to more like the bottom graph?

Thank you!
 

Excel Facts

Quick Sum
Select a range of cells. The total appears in bottom right of Excel screen. Right-click total to add Max, Min, Count, Average.
Have you looked into Power Query? With a few steps, you can transform the source table into the desired output table. Here is the M Code I used:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Item_ID"}, "Attribute", "Value"),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv), {"Attribute.1", "Attribute.2"}),
    #"Merged Columns" = Table.CombineColumns(#"Split Column by Delimiter1",{"Item_ID", "Attribute.1"},Combiner.CombineTextByDelimiter(":", QuoteStyle.None),"Merged"),
    #"Pivoted Column" = Table.Pivot(#"Merged Columns", List.Distinct(#"Merged Columns"[Attribute.2]), "Attribute.2", "Value", List.Sum),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Pivoted Column", "Merged", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Merged.1", "Merged.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter2",{{"Merged.1", "Item_ID"}, {"Merged.2", "Year"}})
in
    #"Renamed Columns"

Source table:
Book6
ABCDEFGHIJKLMNOPQRS
1Item_ID2010_Sales2010_Costs2010_UnitsSold2011_Sales2011_Costs2011_UnitsSold2012_Sales2012_Costs2012_UnitsSold2013_Sales2013_Costs2013_UnitsSold2014_Sales2014_Costs2014_UnitsSold2015_Sales2015_Costs2015_UnitsSold
241988098448689410505761821822689871597837625795159688760541440
385479389930930967732951429764340304272540424285298626718158740
463981760785366874569818347462775759415302294134232395194708683
546774076452526387885220470248563767851120490947622874742554352
668727776500821204675817296214929136545604413147283917710109718
713199311805219830587452731674972375612142979442595662981218563
87026817971983368636644978357486376101387276636307744389705251
968650619460344428397196594407490564872260217393276530532804197
1091968456541389423786824153419311856445710855691928770730994875
116295037269475463537154621997410730315948577503706150389220509
1299957946144213319928631881227757758674770779484234711766592363
1338507206766697888344176289342819753341119289969744795998282891
1420902921256981942836311981772840161591562376102896355844766386
1593167988109388970720259220855748300392814801670637677197368981
1643881861662819156627649877143986498983304365378634167622492323
1797293992237368416386236487328629551627119903996320222779438747
1888816807521219928440778219583128243761523365755195449279969605
1915380507958204633379522609746175816227812399695105475216288631
2014420025718232781142680610230549787801960167597804146586732224
Sheet1

Small excerpt of Output table:
Book6
ABCDE
1Item_IDYearSalesCostsUnitsSold
2131993112010805219830
3131993112011587452731
4131993112012674972375
5131993112013612142979
6131993112014442595662
7131993112015981218563
8144200252010718232781
9144200252011142680610
10144200252012230549787
11144200252013801960167
12144200252014597804146
13144200252015586732224
14153805072010958204633
15153805072011379522609
16153805072012746175816
17153805072013227812399
18153805072014695105475
19153805072015216288631
20209029212010256981942
21209029212011836311981
22209029212012772840161
Table1
 
Upvote 0
I have the following data:

Item_ID2010_Sales2010_Costs2010_UnitsSold2011_Sales2011_Costs2011_UnitsSold2012_Sales2012_Costs2012_UnitsSold2013_Sales2013_Costs2013_UnitsSold2014_Sales2014_Costs2014_UnitsSold2015_Sales2015_Costs2015_UnitsSold
41988098​
448​
689​
410​
505​
761​
821​
822​
689​
871​
597​
837​
625​
795​
159​
688​
760​
541​
440​
85479389​
930​
930​
967​
732​
951​
429​
764​
340​
304​
272​
540​
424​
285​
298​
626​
718​
158​
740​
63981760​
785​
366​
874​
569​
818​
347​
462​
775​
759​
415​
302​
294​
134​
232​
395​
194​
708​
683​
46774076​
452​
526​
387​
885​
220​
470​
248​
563​
767​
851​
120​
490​
947​
622​
874​
742​
554​
352​
68727776​
500​
821​
204​
675​
817​
296​
214​
929​
136​
545​
604​
413​
147​
283​
917​
710​
109​
718​
13199311​
805​
219​
830​
587​
452​
731​
674​
972​
375​
612​
142​
979​
442​
595​
662​
981​
218​
563​
7026817​
971​
983​
368​
636​
644​
978​
357​
486​
376​
101​
387​
276​
636​
307​
744​
389​
705​
251​
68650619​
460​
344​
428​
397​
196​
594​
407​
490​
564​
872​
260​
217​
393​
276​
530​
532​
804​
197​
91968456​
541​
389​
423​
786​
824​
153​
419​
311​
856​
445​
710​
855​
691​
928​
770​
730​
994​
875​
6295037​
269​
475​
463​
537​
154​
621​
997​
410​
730​
315​
948​
577​
503​
706​
150​
389​
220​
509​
99957946​
144​
213​
319​
928​
631​
881​
227​
757​
758​
674​
770​
779​
484​
234​
711​
766​
592​
363​
38507206​
766​
697​
888​
344​
176​
289​
342​
819​
753​
341​
119​
289​
969​
744​
795​
998​
282​
891​
20902921​
256​
981​
942​
836​
311​
981​
772​
840​
161​
591​
562​
376​
102​
896​
355​
844​
766​
386​
93167988​
109​
388​
970​
720​
259​
220​
855​
748​
300​
392​
814​
801​
670​
637​
677​
197​
368​
981​
43881861​
662​
819​
156​
627​
649​
877​
143​
986​
498​
983​
304​
365​
378​
634​
167​
622​
492​
323​
97293992​
237​
368​
416​
386​
236​
487​
328​
629​
551​
627​
119​
903​
996​
320​
222​
779​
438​
747​
88816807​
521​
219​
928​
440​
778​
219​
583​
128​
243​
761​
523​
365​
755​
195​
449​
279​
969​
605​
15380507​
958​
204​
633​
379​
522​
609​
746​
175​
816​
227​
812​
399​
695​
105​
475​
216​
288​
631​
14420025​
718​
232​
781​
142​
680​
610​
230​
549​
787​
801​
960​
167​
597​
804​
146​
586​
732​
224​

The issue is that there is no year column in this dataset, which I want to add manually so that I can I graph the data. To do so I need to normalize the data and to have only 3 columns for each value (sales, costs, unitssold).

In the end I would like something like this:
Item_IDYearSalesCostsUnitsSold
36319622​
2010​
712​
372​
586​
36319622​
2011​
593​
123​
601​
36319622​
2012​
458​
214​
896​
36319622​
2013​
191​
163​
690​
36319622​
2014​
355​
935​
662​
36319622​
2015​
820​
498​
481​
49355938​
2010​
708​
109​
447​
49355938​
2011​
597​
714​
524​
49355938​
2012​
602​
144​
485​
49355938​
2013​
405​
633​
705​
49355938​
2014​
202​
521​
333​
49355938​
2015​
220​
190​
236​

What is the best way to go about cleaning the data to more like the bottom graph?

Thank you!
A solution using VBA.

Remember to change the worksheet names where indicated.

VBA Code:
Public Sub subNormaliseData()
Dim intLastColumn As Integer
Dim intLastRow As Integer
Dim intNextRow As Integer
Dim i As Integer
Dim WsExisting As Worksheet
Dim WsDestination As Worksheet
Dim rngData As Range
Dim intRows As Integer

  ActiveWorkbook.Save

  ' INDICATE YOUR EXISTING DATA WORKSHEET HERE.
  Set WsExisting = Worksheets("Existing")
  
  ' INDICATE YOUR DESTINATION WORKSHEET HERE.
  Set WsDestination = Worksheets("Destination")
    
  With WsExisting
    intLastRow = .Cells(Rows.Count, 1).End(xlUp).Row
    intLastColumn = .Cells(1, Columns.Count).End(xlToLeft).Column
    intRows = .Cells(Rows.Count, 1).End(xlUp).Row - 1
    Set rngData = .Range("A2:A" & intLastRow).Resize(intLastRow - 1, intLastColumn)
  End With
    
  With WsDestination
  
    .Cells.Clear
  
    .Range("A1:E1").Value = Array("Year", "Item_ID", "Sales", "Costs", "UnitsSold")

    For i = 2 To intLastColumn Step 3
      
      intNextRow = .Cells(Rows.Count, 1).End(xlUp).Row + 1
      With .Range(.Cells(intNextRow, 1), .Cells(intNextRow + intRows - 1, 1))
        .Value = rngData.Columns(1).Value
        .Offset(0, 1).Value = Val(Left(WsExisting.Cells(1, i), 4))
        .Offset(0, 2).Resize(intRows, 3).Value = rngData.Columns(i).Resize(intRows, 3).Value
      End With
      
    Next i
  
  End With
  
  ActiveWorkbook.Save
  
End Sub
 
Upvote 0
Solution
A solution using VBA.

Remember to change the worksheet names where indicated.

VBA Code:
Public Sub subNormaliseData()
Dim intLastColumn As Integer
Dim intLastRow As Integer
Dim intNextRow As Integer
Dim i As Integer
Dim WsExisting As Worksheet
Dim WsDestination As Worksheet
Dim rngData As Range
Dim intRows As Integer

  ActiveWorkbook.Save

  ' INDICATE YOUR EXISTING DATA WORKSHEET HERE.
  Set WsExisting = Worksheets("Existing")
 
  ' INDICATE YOUR DESTINATION WORKSHEET HERE.
  Set WsDestination = Worksheets("Destination")
   
  With WsExisting
    intLastRow = .Cells(Rows.Count, 1).End(xlUp).Row
    intLastColumn = .Cells(1, Columns.Count).End(xlToLeft).Column
    intRows = .Cells(Rows.Count, 1).End(xlUp).Row - 1
    Set rngData = .Range("A2:A" & intLastRow).Resize(intLastRow - 1, intLastColumn)
  End With
   
  With WsDestination
 
    .Cells.Clear
 
    .Range("A1:E1").Value = Array("Year", "Item_ID", "Sales", "Costs", "UnitsSold")

    For i = 2 To intLastColumn Step 3
     
      intNextRow = .Cells(Rows.Count, 1).End(xlUp).Row + 1
      With .Range(.Cells(intNextRow, 1), .Cells(intNextRow + intRows - 1, 1))
        .Value = rngData.Columns(1).Value
        .Offset(0, 1).Value = Val(Left(WsExisting.Cells(1, i), 4))
        .Offset(0, 2).Resize(intRows, 3).Value = rngData.Columns(i).Resize(intRows, 3).Value
      End With
     
    Next i
 
  End With
 
  ActiveWorkbook.Save
 
End Sub
Thank you this is helpful. However, if I wanted to only track 2 columns instead of 3? For example, I want to only track Sales and Costs and do not have units sold. Would I just remove the "UnitsSold" from the array function?
 
Upvote 0
Thank you this is helpful. However, if I wanted to only track 2 columns instead of 3? For example, I want to only track Sales and Costs and do not have units sold. Would I just remove the "UnitsSold" from the array function?

Just substitute this line:
.Range("A1:E1").Value = Array("Year", "Item_ID", "Sales", "Costs", "UnitsSold")
for this line:
.Range("A1:D1").Value = Array("Year", "Item_ID", "Sales", "Costs")

Also substitute this line:
.Offset(0, 2).Resize(intRows, 3).Value = rngData.Columns(i).Resize(intRows, 3).Value
for this line:
.Offset(0, 2).Resize(intRows, 2).Value = rngData.Columns(i).Resize(intRows, 2).Value
 
Upvote 0

Forum statistics

Threads
1,223,246
Messages
6,170,988
Members
452,373
Latest member
TimReeks

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top