Converting Lists from Word to tag-formatted plaintext

joeyeti

New Member
Joined
Apr 11, 2007
Messages
22
Hi fellow VB knowers (I am but a learner still).

I have a question that I am struggling with. I need to convert nested lists in MS Word (numbered, bulleted, or mixed) from their original format to tag-formatted text (as used, for instance, in Wiki articles or phpBB forums). In my particular case the text needs basic HTML tags for the basic formatting, e.g. bold or italics, and the final text goes into a Drupal web system, which formats the article based on these simplified (or rather reduced) HTML tags.

The basics such as bold, italics, or headings are not the problem: I found the Word2MediaWiki macro for Word and edited it for my purposes (mainly adding pre- and post-tags to blocks with a given formatting). I have also figured out basic lists with non-nested items.

What I have trouble figuring out are the nested lists, whether numbered (mixed), bulleted (mixed), or totally mixed. The final output should look something like this:


Code:
[list=1]
<LI>List Item 1
[list=a]
<LI>Nested List Item 1
<LI>Nested List Item 2
[/list]
<LI>List Item 2
<UL>
<LI>Nested List Item 1[/list]
<LI>List Item 3
<LI>List Item 4
[/list]


The macro part for Lists that I am using (and which is working fine with normal, non-nested lists) is this:


Code:
Private Sub ConvertLists()

    Dim zoznam As List      ' "zoznam" is Slovak for "list"
    Dim para As Paragraph
    Dim i As Long

    ' Pass 1: append the closing tag to every list.
    ' The list type selects the tag; both cases use the same tag here.
    For Each zoznam In ActiveDocument.Lists
        With zoznam.Range
            If .ListFormat.ListType = wdListBullet Then
                .InsertAfter "[/list]"
            Else
                .InsertAfter "[/list]"
            End If
        End With
    Next zoznam

    ' Pass 2: prefix every list paragraph with one item marker per nesting level.
    For Each para In ActiveDocument.ListParagraphs
        With para.Range
            For i = 1 To .ListFormat.ListLevelNumber
                .InsertBefore "[*]"
            Next i
        End With
    Next para

    ' Pass 3: prepend the opening tag, then strip Word's own numbering.
    For Each zoznam In ActiveDocument.Lists
        With zoznam.Range
            If .ListFormat.ListType = wdListBullet Then
                .InsertBefore "<ul>"
            Else
                .InsertBefore "[list=1]"
            End If

            .ListFormat.RemoveNumbers
        End With
    Next zoznam

End Sub


If you can help me in any way to get nested lists working with this macro as well, I would be very grateful.

Thx!

Joe
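
One way to approach the nested case might be to look at what Word reports per paragraph rather than per list. The sketch below is only a diagnostic, not a converter: it prints each list paragraph's nesting level (ListFormat.ListLevelNumber, which the macro above already uses) next to the opening tag that level would need. OpenTagFor and DumpListStructure are hypothetical names, the tag strings are copied from the sample output above, and picture bullets and Roman numerals get no special treatment.

```vba
' Hypothetical helper: returns the opening tag for the list level that the
' given paragraph sits on. Tag strings are assumptions taken from the sample.
Private Function OpenTagFor(ByVal para As Paragraph) As String
    Dim lvl As Long

    With para.Range.ListFormat
        ' Paragraphs outside any list have no list template: return "".
        If .ListTemplate Is Nothing Then Exit Function

        lvl = .ListLevelNumber
        Select Case .ListTemplate.ListLevels(lvl).NumberStyle
            Case wdListNumberStyleBullet
                OpenTagFor = "<UL>"
            Case wdListNumberStyleLowercaseLetter, wdListNumberStyleUppercaseLetter
                OpenTagFor = "[list=a]"
            Case Else
                ' Arabic, Roman and anything else are treated as numbered.
                OpenTagFor = "[list=1]"
        End Select
    End With
End Function

' Prints "level <tab> opening tag <tab> list label" for every list paragraph.
Public Sub DumpListStructure()
    Dim para As Paragraph

    For Each para In ActiveDocument.ListParagraphs
        With para.Range.ListFormat
            Debug.Print .ListLevelNumber & vbTab & OpenTagFor(para) & vbTab & .ListString
        End With
    Next para
End Sub
```

Run DumpListStructure on a test document and check the Immediate window (Ctrl+G in the VBA editor). If the levels and tags printed there match the nesting you see in Word, the same two properties should be enough to drive the conversion.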
 

To give a clear hint:

What I have as the source (directly seen in a Word 2003 document)


  1. List Item 1
     a. Nested List Item 1
     b. Nested List Item 2
  2. List Item 2
     • Nested List Item 1
  3. List Item 3
  4. List Item 4

and what I need the text to look like (for copying into the Drupal system)

Code:
[list=1] 
<LI>List Item 1 
[list=a] 
<LI>Nested List Item 1 
<LI>Nested List Item 2 
[/list] 
<LI>List Item 2 
<UL> 
<LI>Nested List Item 1[/list] 
<LI>List Item 3 
<LI>List Item 4 
[/list]

You may say it is the same (in HTML terms) and that I should simply export the whole page to HTML and paste that (as Drupal uses HTML tags itself). BUT the macro I use for the Drupal format conversion handles other things besides lists, and I want the text to be free of HTML garbage and to contain only supported tags when I paste it into Drupal.
 
This is quite a different approach (from a guy on a different forum), and I would like to know your opinions on the questions below:

Code:
Sub listconv()

    Dim p As Long, n As Long
    Dim i As Single                 ' left indent of the previous paragraph, in points
    Dim a() As String               ' converted text, one entry per paragraph
    Dim zoznam As List
    Dim bBullet As Boolean, bPrev As Boolean, bEnd As Boolean
    Dim sType1 As String, sType As String

    For Each zoznam In ActiveDocument.Lists
        With zoznam.Range

            n = .Paragraphs.Count
            ReDim a(1 To n)         ' sized per list; ReDim also clears old entries
            i = 0

            For p = 1 To n
                bEnd = False
                'Determine what kind of list you have
                sType1 = Replace(.Paragraphs(p).Range.ListFormat.ListString, ".", "")
                sType = Left$(sType1, 1)
                If IsNumeric(sType) Then
                    sType = "1"
                ElseIf sType = "" Then
                    'not in a list
                ElseIf Asc(sType) = 63 Then
                    'Asc returns 63 ("?") for bullet characters with no ANSI equivalent
                    bBullet = True
                ElseIf sType <> "i" Then
                    sType = "a"
                End If
                'Determine indentation compared to last indentation
                If i = .Paragraphs(p).LeftIndent Then
                    a(p) = a(p) & "<LI>" & .Paragraphs(p).Range.Text
                ElseIf i > .Paragraphs(p).LeftIndent Then
                    'Indent decreased: close one nested list.
                    'bPrev (previous item was a bullet) selects the closing tag;
                    'both cases use the same tag here.
                    If bPrev Then
                        a(p) = a(p) & "[/list]"
                    Else
                        a(p) = a(p) & "[/list]"
                    End If
                    If Len(sType) > 0 Then
                        a(p) = a(p) & "<LI>" & .Paragraphs(p).Range.Text
                    End If
                    bEnd = True
                Else
                    'Indent increased: open a new list
                    If bBullet Then
                        a(p) = "<UL>"
                    Else
                        a(p) = "[list=1]" & vbNewLine
                    End If
                    a(p) = a(p) & "<LI>" & .Paragraphs(p).Range.Text
                End If
                i = .Paragraphs(p).LeftIndent
                bPrev = bBullet
                bBullet = False
            Next p

            'see if ended properly
            'After the loop p equals n + 1, so the closing tag goes into a(n).
            If Not bEnd Then
                If bPrev Then
                    a(n) = a(n) & "[/list]"
                Else
                    a(n) = a(n) & "[/list]"
                End If
            End If

            'Replace the list with the converted text
            .Delete
            For p = 1 To n
                .InsertAfter a(p)
            Next p

        End With
    Next zoznam

End Sub


1. Is there an easier way to replace the given lists with the output than storing the values in a(p) and printing them out at the end?

2. There is a problem when the next line of the list is two or more indents shallower than the previous one: the macro treats the line as only one indent shallower. I guess a counter could be used to keep track of the indentation level?

Joe
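
On question 2, a plain depth counter does seem to be enough, as long as the closing step is a loop rather than a single If. On question 1, one option is to build a single string and write it back once instead of keeping an array. The sketch below shows both on plain arrays so the logic can be checked without a document. BuildTaggedList and DemoBuildTaggedList are hypothetical names, and the tag strings are assumptions copied from the sample output earlier in the thread. It assumes the kind of list does not change between neighbouring items on the same level, and that a jump of several levels downward opens every intermediate list with the same tag.

```vba
' Hypothetical helper. levels(k) is the 1-based nesting level of item k,
' openTags(k) the tag that opens a list of that item's kind, items(k) its text.
' All three arrays are expected to share the same bounds.
Public Function BuildTaggedList(levels() As Long, openTags() As String, _
                                items() As String) As String
    Const CLOSE_TAG As String = "[/list]"   ' assumption: one closer for all kinds
    Const ITEM_TAG As String = "<LI>"
    Dim out As String
    Dim depth As Long                       ' number of lists currently open
    Dim k As Long

    For k = LBound(levels) To UBound(levels)
        ' Close one list per level dropped, however many that is.
        Do While depth > levels(k)
            out = out & CLOSE_TAG & vbCrLf
            depth = depth - 1
        Loop
        ' Open lists until the item's level is reached.
        Do While depth < levels(k)
            out = out & openTags(k) & vbCrLf
            depth = depth + 1
        Loop
        out = out & ITEM_TAG & items(k) & vbCrLf
    Next k

    ' Close whatever is still open after the last item.
    Do While depth > 0
        out = out & CLOSE_TAG & vbCrLf
        depth = depth - 1
    Loop

    BuildTaggedList = out
End Function

' Feeds the seven-item sample list from this thread through the helper.
Public Sub DemoBuildTaggedList()
    Dim levels(0 To 6) As Long
    Dim openTags(0 To 6) As String
    Dim items(0 To 6) As String
    Dim lv As Variant, tg As Variant, tx As Variant
    Dim k As Long

    lv = Array(1, 2, 2, 1, 2, 1, 1)
    tg = Array("[list=1]", "[list=a]", "[list=a]", "[list=1]", _
               "<UL>", "[list=1]", "[list=1]")
    tx = Array("List Item 1", "Nested List Item 1", "Nested List Item 2", _
               "List Item 2", "Nested List Item 1", "List Item 3", "List Item 4")

    ' LBound offset keeps this correct under Option Base 1 as well.
    For k = 0 To 6
        levels(k) = lv(LBound(lv) + k)
        openTags(k) = tg(LBound(tg) + k)
        items(k) = tx(LBound(tx) + k)
    Next k

    Debug.Print BuildTaggedList(levels, openTags, items)
End Sub
```

DemoBuildTaggedList prints the same tag sequence as the sample output, except that the closing tag of the nested bullet list lands on its own line. In a real macro the levels could come from ListFormat.ListLevelNumber instead of LeftIndent, and the item text would need its trailing paragraph mark stripped first. The returned string could then be written back in one step, for instance by assigning it to the list's Range.Text after .ListFormat.RemoveNumbers.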
 