600 MB text files to Access - how to do it quickly?

Jaymond Flurrie

Well-known Member
Joined
Sep 22, 2008
Messages
921
Office Version
  1. 365
Platform
  1. Windows
So the idea is that I have now split up my 6GB flatfile into ten smaller ones, they're just text files with one ID number, four 1-10 numbers and 2400 0/1s separated by tabs there. As an example data (using only four of those 2400):

ID One Two Three A B C D
1 4 5 6 0 1 0 1
12 5 6 7 1 1 0 1

I would need to get these to Access as quickly as possible and I already have VBA code to create tables with Integer-Integer-Integer-Integer-BIT-BIT-BIT-BIT fields, but what's the smartest way to get the data there?

Let's assume one of my 600MB files is in C:\fromhere1.txt

Also, would it be faster/better if I would use ACCDB instead of MDB as my databases for this kind of data?
 
I'm pretty much in the dark - just willing to try some experiments. I've virtually no experience with anything that's over a few hundred thousand rows whether its in Access or text files (probably not much even in SQL Server this large either).

I think that with text file processing it doesn't matter much whether you go through through all the rows or choose 100,000 of them "randomly", since you'd have to access the rows sequentially. Unless you can use a put/get routine with some old VB6 methods, or if there's enough consistency to read the file as a binary stream (i.e., if each line is exacty the same number of bytes). If ADO can handle 2400 columns then that may be a shortcut. But I am just thinking out loud.

I don't think I understand your filtering down process (with selections, user input, etc. works) well enough to comment on that.

Is your text file really 600MB. I filled up almost 500MB with only 100,000 rows of test data.
 
Upvote 0

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
I'm pretty much in the dark - just willing to try some experiments. I've virtually no experience with anything that's over a few hundred thousand rows whether its in Access or text files (probably not much even in SQL Server this large either).

I think that with text file processing it doesn't matter much whether you go through through all the rows or choose 100,000 of them "randomly", since you'd have to access the rows sequentially. Unless you can use a put/get routine with some old VB6 methods, or if there's enough consistency to read the file as a binary stream (i.e., if each line is exacty the same number of bytes). If ADO can handle 2400 columns then that may be a shortcut. But I am just thinking out loud.

I don't think I understand your filtering down process (with selections, user input, etc. works) well enough to comment on that.

ADO's limit seems to be 255 columns. This is pretty new for me too, even my original project of 100k rows seemed huge to me and then my employer went to made it ten times as big... not to talk about multiplying the number of columns by 80.
 
Upvote 0
Hmmm....that may be a limitation of the JET database engine, if you are using a JET provider. Wonder if there's any other providers out there you can use...? Not sure.
 
Upvote 0
Hmmm....that may be a limitation of the JET database engine, if you are using a JET provider. Wonder if there's any other providers out there you can use...? Not sure.

As far as I found with Google, that 255 is both ADO and JET problem. But the good news is that as I have nothing else than 0s and 1s in my columns beyond #4, they can be combined and processed from there.
 
Upvote 0
Here's the check-function I quickly built to check whether the row should be added to the sum or not.

Code:
Function bCheckCriteria(vTxtRow As Variant, vCriteria As Variant) As Boolean
    Dim i As Integer, j As Integer
    Dim iOne As Integer
    Dim iTwo As Integer
    Dim iThree As Integer
    Dim iCriteria(0 To 2) As Integer
    Dim bCheck(0 To 2) As Boolean
    
    Dim bOne As Boolean
    Dim bTwo As Boolean
    Dim bThree As Boolean
    
    iCriteria(0) = vTxtRow(1)
    iCriteria(1) = vTxtRow(2)
    iCriteria(2) = vTxtRow(3)
    
    bCheck(0) = False
    bCheck(1) = False
    bCheck(2) = False
    
    For j = LBound(bCheck) To UBound(bCheck)
        For i = LBound(vCriteria, 2) To UBound(vCriteria, 2)
            If vCriteria(j, i) = iCriteria(j) Then
                bCheck(j) = True
                Exit For
            End If
        Next i
        
        If Not bCheck(j) Then
            bCheckCriteria = False
            Exit Function
        End If
    Next j

    bCheckCriteria = True

End Function
 
Upvote 0
What version of Excel do you have access to?

Do you have access to anything other than Excel/Access?
 
Upvote 0
What version of Excel do you have access to?

Do you have access to anything other than Excel/Access?

I need to make this work with Excel 2007. I think I'm pretty much forced to use Excel / Access / basic txt-editors. SQL Server etc. are out of question.
 
Upvote 0
xenou, how does the join function work? I mean, when you make, say, 2500 joins, does it do like
joinedalready = ""
tobejoined = 1
joinedalready = 1
tobejoined = 1
joining = 1
joining = 11
joinedalready = 11
tobejoined = 1
joining = 1
joining = 11
joining = 111
joinedalready = 111

When it gets closer to that 2500, does it need to use 2500 internal commands to get there?

I have done this kind of "joining in loop" with an array, something like:

Code:
    ReDim strSample(0 To 50)

    strLine = vbNullString
    
    For i = 0 To iRows
        k = 0
        For j = 0 To iCols
            Select Case j
                Case 0
                    strLine = strLine & i & Chr(9)
                Case 1, 2, 3
                    strLine = strLine & Int(10 * Rnd + 1) & Chr(9)
                Case Else
                    strLine = strLine & Int(2 * Rnd) & Chr(9)
            End Select
            
            If j Mod 50 = 0 Or j = iCols Then
                k = k + 1
                strSample(k) = strLine
                strLine = vbNullString
            End If
        
        Next j
Next i

        For n = LBound(strSample) To UBound(strSample)
           strLine = strLine & strSample(n)
        Next n

So my question is, why to use my version or your version, is there some critical thing one does and one doesn't?
 
Upvote 0
Whats the domain of columns 1,2,3, and 4 respectively. I.e., what are the valid (range of) values for each of those four columns? Or, is that information relevant in any way, or can it be effectively "discarded".
 
Upvote 0
Whats the domain of columns 1,2,3, and 4 respectively. I.e., what are the valid (range of) values for each of those four columns? Or, is that information relevant in any way, or can it be effectively "discarded".

I'll PM you the details.
 
Upvote 0

Forum statistics

Threads
1,225,149
Messages
6,183,194
Members
453,151
Latest member
Lizamaison

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top