I have an xml file that is about 80MB. I constantly have to import it in excel and do filtering in it. Sadly, it takes too long to open and sometimes crash. So I have decided to manually break it down into pieces.
My steps are usually:
1-Use a software that split the files into multiple pieces. the
2-Track the starting and ending tags. The schema looks like this:
3- Since each new file will be imported independently into excel, I have to make sure the opening and closing tags are included as well as the value for each tag. Because the files are split randomly at any lines.
4- Import the files into excel and filter each of them to get the highest pfd value.
Manually I have done that but it is time consuming. I have decided to automate that process. but all I am missing at this point to is to split the xml file into smaller pieces. I tried but failed. Now I am thinking If I turn the xml file into a text file, then would that be easier to achieve my goal. At this point what I am trying to achieve is:
1- Open the original text file and copy the first 5000 lines or so to a text file
2- add the closing tags at the end of the file :
</by_a>
</pfd_mask>
</satellite_system>
3- Save it as xxx_part1.txt
4- Somehow keep in memory the value for by_a a and by_b b
3-Copy the next 5000 lines.
5- add the opening tags
<satellite_system = Intelsat>
<pfd_mask = integer_value>
then add the last known value for by_a a and by_ b b
paste the 5000 lines
then add the closing tags
</by_a>
</pfd_mask>
</satellite_system>
6-Do that until the end of the original file.
This is the code I have started with:
My steps are usually:
1-Use a software that split the files into multiple pieces. the
2-Track the starting and ending tags. The schema looks like this:
Rich (BB code):
<satellite_system = Intelsat>
<pfd_mask = integer_value>
<by_a a = 50>
<by_b b = 0>
<pfd c="0">-177.1175</pfd>
<pfd c="10">-177.3175</pfd>
</by_b>
<by_b b = 1>
<pfd c="0">-174.617</pfd>
<pfd c="10">-174.817</pfd>
</by_b>
</by_a>
</pfd_mask>
</satellite_system>
3- Since each new file will be imported independently into excel, I have to make sure the opening and closing tags are included as well as the value for each tag. Because the files are split randomly at any lines.
4- Import the files into excel and filter each of them to get the highest pfd value.
Manually I have done that but it is time consuming. I have decided to automate that process. but all I am missing at this point to is to split the xml file into smaller pieces. I tried but failed. Now I am thinking If I turn the xml file into a text file, then would that be easier to achieve my goal. At this point what I am trying to achieve is:
1- Open the original text file and copy the first 5000 lines or so to a text file
2- add the closing tags at the end of the file :
</by_a>
</pfd_mask>
</satellite_system>
3- Save it as xxx_part1.txt
4- Somehow keep in memory the value for by_a a and by_b b
3-Copy the next 5000 lines.
5- add the opening tags
<satellite_system = Intelsat>
<pfd_mask = integer_value>
then add the last known value for by_a a and by_ b b
paste the 5000 lines
then add the closing tags
</by_a>
</pfd_mask>
</satellite_system>
6-Do that until the end of the original file.
This is the code I have started with:
VBA Code:
Sub Text_file_parser()
Dim FSO As FileSystemObject
Dim txtStream As TextStream
Dim DetFile As String
Dim sFile As String
Dim New_file As Object
Dim file_path, New_filename, New_filepath As String
Dim i, j, k As Integer
Dim header, body, footer As Variant
Dim File_size As Long
DetFile = "C:\\Documents\ RDTE\test"
sFile = Dir(DetFile & "\*.txt")
file_path = DetFile & "\" & sFile
'MsgBox file_path
New_filename = "New_" & sFile
New_filepath = DetFile & "\" & New_filename
'MsgBox New_filepath
'MsgBox file_path
Set FSO = New FileSystemObject
Set txtStream = FSO.OpenTextFile(file_path, ForReading, False)
'Set the number of File
File_size = FileLen(file_path) ' remove in function
Debug.Print File_size
If Not FSO.FileExists(New_filepath) Then
Set New_file = FSO.CreateTextFile(New_filepath, False, True)
'Do Until txtStream.AtEndOfStream
i = 0
j = 0
k = 0
For i = 1 To 3
ThisLine = txtStream.ReadLine
'i = i + 1
'Debug.Print "Line " & i, ThisLine
New_file.WriteLine (ThisLine)
Next
Else
'Set txtStream = FSO.OpenTextFile(New_filepath, ForWriting, False)
Set New_file = FSO.OpenTextFile(New_filepath, ForWriting, False)
For i = 1 To 3
ThisLine = txtStream.ReadLine
'i = i + 1
'Debug.Print "Line " & i, ThisLine
New_file.WriteLine (ThisLine)
Next
' Loop
End If
Do While Not txtStream.AtEndOfStream
'FooterLine = txtStream.ReadLine
' Debug.Print FooterLine
Loop
txtStream.Close
End Sub