CSV Parsing: Stray Comma Conundrum

nigelandrewfoster

Well-known Member
Joined
May 27, 2009
Messages
747
Hiya

I'm using the following function to open a UTF-8 csv file and save it safely as an xlsx file. I use this to preserve the foreign characters from corruption (which happens when I open it directly in Excel):

Code:
Function wsUCF8_CSV_Worksheet(strPathFilename As String, Optional strDestinationFilename As String) As Worksheet ' wb is set to the opened workbook

    Dim wb As Workbook
    
    Set wb = Workbooks.Add
    Set wsUCF8_CSV_Worksheet = wb.Worksheets(1)
    
    With wsUCF8_CSV_Worksheet
        With .QueryTables.Add(Connection:="TEXT;" & strPathFilename, Destination:=.Range("A1"))
             .TextFileParseType = xlDelimited
             .TextFilePlatform = 65001
             .TextFileCommaDelimiter = True
             .Refresh
        End With
    End With
    
    Remove_Workbook_Connections wb
    
    If Not IsMissing(strDestinationFilename) Then
        MakePath "C:\Temp"
        wb.SaveAs "C:\Temp\" & strDestinationFilename, FileFormat:=xlWorkbookDefault
    End If
    
End Function

It works great ... until I encounter an exported CSV that contains commas WITHIN a field (eg. a text field), then it of course splits the line at that point. I cannot see any way around this. How CAN one parse a csv file when a field in that file contains commas AS commas, rather than as a delimiting symbol? I guess the answer is that programmers shouldn't permit fields of csv files to contain (text) commas without some kind of wrapping? I'm talking about the file exports of some sizeable companies.
 
Last edited:

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
IF you add the contents of the file to an array do you know precisely which comma might cause you issues and the number of elements that would be considered abnormal? If so then you do something like the following
Example
Code:
   For i = 0 To UBound(strData)    
        If Len(strData(i)) - Len(Replace(strData(i), ",", "")) >= 191 Then 'checks for the number of columns[-1 less than total since array starts at 0]
            
            strData(i) = ReplaceN(strData(i), ",", "", 186) '(nth comma of interest - 1 ) is 186
        
        End If
    
    Next i
'Can't remember where I found this function
Code:
Function ReplaceN(ByVal str1 As Variant, strFind As String, strReplace As String, N As Long, Optional Count As Long) As String
Dim i As Long, j As Long
Dim strM As String
strM = str1
If Count <= 0 Then Count = 1
For i = 1 To N - 1
    j = InStr(1, strM, strFind) 'char position of the first occurence
    strM = Mid(strM, j + Len(strFind), Len(strM)) 'string that omits the characters before and including j but has the length of the original string
Next i 'this loop will continue to section the string until it has done so N-1 times
If N <= 0 Then
    ReplaceN = str1
Else
    ReplaceN = Mid(str1, 1, Len(str1) - Len(strM)) & Replace(strM, strFind, strReplace, Start:=1, Count:=Count)

End If

End Function

now split the string
 
Last edited:
Upvote 0
Thanks very much for reply and input. I will let you know once I have had a chance to examine the code. I think I need to examine the troublesome file in greater detail using a different method to find out which combination of characters exactly is causing the additional CRs.
 
Upvote 0
Thanks very much for reply and input. I will let you know once I have had a chance to examine the code. I think I need to examine the troublesome file in greater detail using a different method to find out which combination of characters exactly is causing the additional CRs.
Just to be sure is it a .csv or a comma delimited .txt file?
 
Upvote 0
Thanks very much for reply and input. I will let you know once I have had a chance to examine the code. I think I need to examine the troublesome file in greater detail using a different method to find out which combination of characters exactly is causing the additional CRs.

If you don't want to open the .csv in the background and read the contents from the sheet and if the elements inside the .csv file when you use notepad that are supposed to have commas are contained within quotation marks then the following untested function when given a string should return a string with commas that aren't in between quotation marks replaced with question marks.
Code:
Function Change_Comma_Delimiter_Not_Between_Quote(Current_String As String)

Dim Current_String As String, Comma_Here As New Dictionary, Location_Key As Long, Comma_Location As Long, T As Long, _
Comma_Location_Outside_Quotation As Long, Empty_Array_Element_Count As Long


Set Comma_Here = CreateObject("Scripting.Dictionary")


Array_String = Split(Current_String, Chr(34)) 'Split text with a Quotation mark


Location_Key = 1


For X = LBound(Array_String) To UBound(Array_String)
            
        If Array_String(X) = Chr(44) Then 'if the array element is a comma


            For T = 0 To X
                
                If IsEmpty(Array_String(T)) = True Then 'if element is empty
                
                    Empty_Array_Element_Count = Empty_Array_Element_Count + 1
                
                Else
                'gets the length of each preceding element and the current one to determine the location
                Comma_Location_Outside_Quotation = Len(Array_String(T)) + Comma_Location_Outside_Quotation + Empty_Array_Element_Count
                                                                           
                End If
            
            Next T
            
            Comma_Here.Add Location_Key, Comma_Location_Outside_Quotation
            
            Location_Key = Location_Key + 1
            
        End If
    
Next X


For Each element In Comma_Here.Keys


    Current_String = Replace(Current_String, Chr(44), "?", Comma_Here.Items(element), 1)
    'replace 1 comma starting at the location indicated by the dictionary item
    
Next element


Change_Comma_Delimiter_Not_Between_Quote = Current_String




End Function
 
Last edited:
Upvote 0
I'm using the following function to open a UTF-8 csv file and save it safely as an xlsx file. I use this to preserve the foreign characters from corruption (which happens when I open it directly in Excel):

It works great ... until I encounter an exported CSV that contains commas WITHIN a field (eg. a text field), then it of course splits the line at that point. I cannot see any way around this.
I don't work with UTF-8 text myself, so I don't know if the code in my mini-blog article here...

http://www.excelfox.com/forum/showt...Ignores-Delimiters-Located-Inside-Quote-Marks

will work with that type of text or not, but it does work with "normal" text. Give it a try and let me know. Oh, I should mention, the function posted in that article returns a String array split on the specified delimiter but which ignores splitting on the specified delimiter if it is inside quote marks.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,225,073
Messages
6,182,703
Members
453,132
Latest member
nsnodgrass73

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top