Hello everyone,
i would like your opinion on the following:
I need to extract data from several word documents (about 80 pages each). These documents have somewhat of a standard formatting that enables the structured extraction. Example:
Each line/paragraph in word doc will have unique row in excel. Initially the whole extraction would take about 20 mins, and after some experimentation, I managed to reduce processing time to 45-60 seconds.
this is the loop used to read the word document, and I wanted to know what could be faster than this (other loop, or other way to read/write?)
used loop to read the document
and this is how i write it into excel and structure the data:
to migrate colors in word doc to excel i have:
thank you all
i would like your opinion on the following:
I need to extract data from several word documents (about 80 pages each). These documents have somewhat of a standard formatting that enables the structured extraction. Example:
- word in upper case in document will go to column B
- words with alignment type = right with go to column D etc...
Each line/paragraph in word doc will have unique row in excel. Initially the whole extraction would take about 20 mins, and after some experimentation, I managed to reduce processing time to 45-60 seconds.
this is the loop used to read the word document, and I wanted to know what could be faster than this (other loop, or other way to read/write?)
used loop to read the document
Code:
For Each p In word_doc.Paragraphs
w_h_cor = p.Range.HighlightColorIndex
w_font_cor = p.Range.Font.ColorIndex
w_bold = p.Range.Font.Bold
w_list = p.Range.ListFormat.ListType
w_length = Len(p.Range.Text)
w_descr = p.Range.Text
w_align = p.Alignment
Call import_export(w_h_cor, w_font_cor, w_bold, w_list, w_length, w_descr, w_align, i)
i = i + 1
Next p
and this is how i write it into excel and structure the data:
Code:
Public Sub import_export(ByVal var_h_cor As Integer, _
ByVal var_font_cor As Integer, _
ByVal var_bold As Long, _
ByVal var_list As Integer, _
ByVal var_length As Long, _
ByVal var_descr As String, _
ByVal var_align As Integer, _
ByVal var_i As Long)
ReDim v_array(0 To ubound_import_export, 1 To 5, 1 To 3) ' ubound_import_export is the number of lines in doc
Dim w_tipo, w_cor As Integer
If var_length > 2 Then 'to ignore certain lines 'Or var_list <> 2
If var_descr = UCase(var_descr) Then
w_tipo = 1 'type 1
End If
If (var_bold <> -1 Or var_bold = 0) And var_descr <> UCase(var_descr) And var_list <> 2 Then
w_tipo = 3 'type 3
End If
If (var_bold = -1 Or var_bold <> 0) And var_descr <> UCase(var_descr) And var_list <> 2 And var_align <> wdAlignParagraphRight Then
w_tipo = 4 'type 4
End If
If (var_font_cor = wdRed Or var_h_cor = wdRed) Or var_font_cor = 6 Or var_h_cor = 6 Then 'Or var_font_cor = 11 Or var_h_cor = 11
w_cor = 1 'red
Else: w_cor = 3
End If
If (var_font_cor <> wdRed And var_font_cor <> wdAuto And var_font_cor <> wdBlack) Or _
var_h_cor <> wdRed And var_h_cor <> wdAuto And var_h_cor <> wdBlack Then
w_cor = 2 'green
'Else: w_cor = 3
End If
If var_align = wdAlignParagraphRight Then
w_tipo = 2 'juizo
End If
If var_list = 2 And var_align <> wdAlignParagraphRight Then
w_tipo = 5 ' type 5
End If
v_array(var_i, w_tipo, w_cor) = Trim(WorksheetFunction.Clean(var_descr))
ActiveWorkbook.Worksheets("PADRAO").Cells(var_i - cnt_null, w_tipo * 2) = v_array(var_i, w_tipo, w_cor)
Call atribui_cor(w_cor, var_i - cnt_null, w_tipo)
Else: cnt_null = cnt_null + 1
End If
End Sub
to migrate colors in word doc to excel i have:
Code:
Sub atribui_cor(ByVal var_cor As Integer, _
ByVal var_row As Long, _
ByVal var_col As Integer)
Select Case var_cor
Case 1
ActiveWorkbook.Worksheets("PADRAO").Cells(var_row, var_col * 2).Interior.Color = vbRed
Case 2
ActiveWorkbook.Worksheets("PADRAO").Cells(var_row, var_col * 2).Interior.Color = vbGreen
End Select
End Sub
thank you all