Transposing data from multiple csv files into one xlsx file

Rnkhch

Well-known Member
Joined
Apr 28, 2018
Messages
578
Office Version
  1. 365
Platform
  1. Windows
Hello,

I have data in many csv files (more than 500, each corresponding to one person's scores), and I need to combine the data into one file, in transposed form (i.e. rows instead of columns), for easier visualization and processing. Each file contains 5 columns with the same headings (course names), and there is a one row gap between the headings and the data.

Over the weekend, I tried to merge the data manually by creating a blank xlsx file with five sheets (each corresponding to one course). I typed the name of each file (i.e. person) in the first cell of each row and transposed the data right next to it. Considering that there are over 500 files and 5 columns in each, I need to open+select+copy+transpose more than 2500 times :) and I was only able to do this for 50 people before getting way too tired :)

Is there any way to automate this process? I would highly appreciate any insights! Thanks a lot!

P.S. Here is an example of how the data looks like:
Person1.csv
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Physics
[/TD]
[TD]Chemistry
[/TD]
[TD]Math
[/TD]
[TD]Biology
[/TD]
[TD]Statistics
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]98
[/TD]
[TD]94
[/TD]
[TD]99
[/TD]
[TD]89
[/TD]
[TD]93
[/TD]
[/TR]
[TR]
[TD]95
[/TD]
[TD]88
[/TD]
[TD]92
[/TD]
[TD]91
[/TD]
[TD]96
[/TD]
[/TR]
[TR]
[TD]etc
[/TD]
[TD]etc
[/TD]
[TD]etc
[/TD]
[TD]etc
[/TD]
[TD]etc
[/TD]
[/TR]
</tbody>[/TABLE]


And here is how I want the data to look like in the final file:
e.g. for Physics sheet:

[TABLE="class: outer_border, width: 500"]
<tbody>[TR]
[TD]Person1
[/TD]
[TD]98
[/TD]
[TD]95
[/TD]
[TD]etc
[/TD]
[/TR]
[TR]
[TD]Person2
[/TD]
[TD]78
[/TD]
[TD]81
[/TD]
[TD]etc
[/TD]
[/TR]
[TR]
[TD]etc
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
And there are 4 other sheets in the file for the other courses.
 
How many rows on average / max are there in the person csv's?
My guess is max 20
 
Upvote 0

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
On average around 20. The max is probably around 30. If you set it to 30, perhaps I can tweak the code in future if I encounter more than 30?

Thanks a lot!
 
Upvote 0
Finally the code new version
Code:
Sub collect()  'collect version 2 handling a variable number of course names
    Static fso As FileSystemObject
    
    Static dict As Dictionary  'key, item=array()   'scores per course,person
    Dim dictKey As String   'key=course,person
    
    Static dictCourses As Dictionary 'all course names encountered
        
    Const maxCourses As Integer = 10    '<== çhange if too small/big
    Dim course As String, courseColumns(1 To maxCourses) As String, _
        numberOfCourses As Integer, co As Variant
        
    Const maxScores As Integer = 30 '<== change if too small/big
            
    Dim csvFolder As Folder, csvFolderName As String, csvFile As File
    Dim content As TextStream, csvText As String, person As String
    Dim size As Integer, lines() As String, c As Integer, r As Integer, i As Integer
    Dim v() As String, delim As String, scores() As Integer, shtRowScores() As Integer
    Dim sht As Worksheet
    
    If fso Is Nothing Then Set fso = New FileSystemObject
    If dict Is Nothing Then Set dict = New Dictionary
    If dictCourses Is Nothing Then Set dictCourses = New Dictionary
    
    csvFolderName = shtControl.Range("csvFolder")
    
    If Not fso.FolderExists(csvFolderName) Then
        MsgBox "Folder " & csvFolderName & " not found"
        Exit Sub
    End If
    
    delim = shtControl.Range("seperatorChar")
    
    Set csvFolder = fso.GetFolder(csvFolderName)
    size = csvFolder.Files.Count - 1    '1 file per person, exclude .xlsx file
    
    For Each csvFile In csvFolder.Files
    If csvFile.name Like "*.csv" Then
        person = Left(csvFile.name, InStr(csvFile.name, ".") - 1)
        Set content = csvFile.OpenAsTextStream(ForReading, TristateUseDefault)
        csvText = WorksheetFunction.Substitute(content.ReadAll, vbLf, "") 'remove Lf's
        content.Close
        
        lines = Split(csvText, vbCr)
        If UBound(lines) - 2 > maxScores Then
            MsgBox "More than " & maxScores & " scores  for person " & person
            Exit Sub
        End If
        
        v = Split(lines(0), delim)  'row 1 contains course names
        'setup an item per course per person in the dictionary
        numberOfCourses = UBound(v) + 1 'first entry is index 0
        If numberOfCourses > maxCourses Then
            MsgBox "more than " & maxCourses & " courses (" & _
                    numberOfCourses & ") for " & person
            Exit Sub
        End If
        
        '----- initialize course entries for this person -----
        ReDim scores(1 To maxScores) 'empty scores array
        For c = 1 To numberOfCourses
            course = v(c - 1)
            If dictCourses.Exists(course) Then
                dictCourses(course) = dictCourses(course) + 1
            Else
                dictCourses.Add course, 1
            End If
            'setup the course/column relation for this person
            courseColumns(c) = course
            dictKey = course & "," & person
            dict(dictKey) = scores
        Next c
                
        For c = 1 To numberOfCourses 'loop through all the course columns
            course = courseColumns(c)
            ReDim scores(1 To maxScores)
            For r = 2 To UBound(lines)  'row 1 is course names, row 2 is empty
                If lines(r) = "" Then Exit For 'no more data for this course
                
                v = Split(lines(r), delim)
                scores(r - 1) = IIf(v(c - 1) = "", 0, v(c - 1))
            Next r
            dictKey = course & "," & person
            dict(dictKey) = scores
        Next c
    End If  'csvFile.Name Like "*.csv"
    Next csvFile
    
    '----- results to course sheets -----
    For Each co In dictCourses.Keys
        
        '----- set sht to course sheet -----
        Set sht = Nothing
        course = co
        For Each sht In ThisWorkbook.Worksheets
            If sht.name = course Then
                Exit For
            End If
        Next sht
        If sht Is Nothing Then
            'no sheet yet for this course; create it
            With ThisWorkbook
                Set sht = Worksheets.Add(After:=.Worksheets(.Worksheets.Count))
                sht.name = course
            End With
        End If
        
        sht.UsedRange.Clear 'erase sht
        sht.Cells.NumberFormat = "0;;"

        '----- move data to sht -----
        r = 1
        For c = 1 To dict.Count
            dictKey = dict.Keys(c - 1)  'keys index start at 0
            
            If dictKey Like course & ",*" Then 'select all persons for this course
                person = Mid(dictKey, InStr(dictKey, ",") + 1)
                sht.Cells(r, 1) = person
                scores = dict.Items(c - 1)
                
                'transform scores to 2-dimensional array
                ReDim shtRowScores(1 To 1, 1 To maxScores)
                For i = 1 To maxScores
                    shtRowScores(1, i) = scores(i)
                Next i
                
                sht.Cells(r, 2).Resize(, maxScores) = shtRowScores
                r = r + 1
            End If
        Next c
        
    Next co
End Sub 'collect2
 
Upvote 0

Forum statistics

Threads
1,224,828
Messages
6,181,217
Members
453,024
Latest member
Wingit77

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top