Text analysis with VBA

  • Thread starter Thread starter Legacy 512973
  • Start date Start date
L

Legacy 512973

Guest
hello,

i need a vba macro that do text analysis under some constraints

let's take a workbook with 2 tabs: text analysis tab and Linguistic analysis tab

on the "text analysis" tab, you will enter a text on A1 cell

on the "Linguistic analysis" tab you will get statistical figures like number of letters, vowels, most used words in the text etc...

i made an example file of final result i will like to get :

there is constraints to follow that you can find and read in the 3rd tab of the workbook linked above and some purple text on "Linguistic analysis" tab

code with line comments will help much

Thank you for your help and Merry Christmas !
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
Hello,

I hope this works for you:
VBA Code:
Option Explicit
Sub test()
  Dim wordDic As Object, ltrDic As Object, doubleWordDic As Object
  Dim myString As String, ltr As String, tmp As String
  Dim words As Variant, key As Variant, doubleWords(1 To 2, 1 To 20) As Variant, letters(1 To 2, 1 To 20) As Variant
  Dim wordCount As Long, ltrCount As Long, strLen As Long, i As Integer, j As Integer, vowels As Long
 
  myString = Application.Trim(Worksheets("Text to analyze").Range("A1").Value)
  strLen = Len(myString)
  words = Split(myString, " ")
  wordCount = UBound(words)
 
  Set wordDic = CreateObject("Scripting.Dictionary")
  For i = 0 To wordCount
    For j = 1 To Len(words(i))
      ltr = Mid(words(i), j, 1)
      If UCase(ltr) <> LCase(ltr) Then
        tmp = tmp & ltr
      End If
    Next
    If Not wordDic.Exists(LCase(tmp)) Then
      wordDic.Add LCase(tmp), 1
    Else
      wordDic(LCase(tmp)) = wordDic(LCase(tmp)) + 1
    End If
    tmp = ""
  Next
 
  Set doubleWordDic = CreateObject("Scripting.Dictionary")
  For i = 0 To UBound(words) - 1
    j = i + 1
    tmp = words(i) & " " & words(j)
    If Not doubleWordDic.Exists(tmp) Then
      doubleWordDic.Add tmp, 1
    Else
      doubleWordDic(tmp) = doubleWordDic(tmp) + 1
    End If
  Next
 
  ReDim words(1 To 2, 1 To 20)
  For Each key In wordDic
    For i = 1 To 20
      If CInt(wordDic(key)) > CInt(IIf(words(2, i) = "", 0, words(2, i))) Then
        For j = 19 To i Step -1
          words(1, j + 1) = words(1, j)
          words(2, j + 1) = words(2, j)
        Next
        words(1, i) = key
        words(2, i) = wordDic(key)
        Exit For
      End If
    Next
  Next
 
 For Each key In doubleWordDic
    For i = 1 To 20
      If CInt(doubleWordDic(key)) > CInt(IIf(doubleWords(2, i) = "", 0, doubleWords(2, i))) Then
        For j = 19 To i Step -1
          doubleWords(1, j + 1) = doubleWords(1, j)
          doubleWords(2, j + 1) = doubleWords(2, j)
        Next
        doubleWords(1, i) = key
        doubleWords(2, i) = doubleWordDic(key)
        Exit For
      End If
    Next
  Next
 
  Set ltrDic = CreateObject("Scripting.Dictionary")
  For i = 1 To strLen
    ltrCount = ltrCount + 1
    ltr = LCase(Mid(myString, i, 1))
    If UCase(ltr) <> ltr Then
      Select Case True
      Case ltr Like "[âàäéèêëïîôûùaeiou]"
        vowels = vowels + 1
        If ltr Like "[âàäéèêëïîôûù]" Then
          ltr = converstSpecialCharacter(ltr)
        End If
      Case ltr Like "[ç]"
        ltr = converstSpecialCharacter(ltr)
      End Select
      If ltr Like "[a-z]" Then
        If Not ltrDic.Exists(ltr) Then
          ltrDic.Add ltr, 1
        Else
          ltrDic(ltr) = ltrDic(ltr) + 1
        End If
      End If
    End If
  Next
 
  For Each key In ltrDic
    For i = 1 To 20
      If CInt(ltrDic(key)) > CInt(IIf(letters(2, i) = "", 0, letters(2, i))) Then
        For j = 19 To i Step -1
          letters(1, j + 1) = letters(1, j)
          letters(2, j + 1) = letters(2, j)
        Next
        letters(1, i) = key
        letters(2, i) = ltrDic(key)
        Exit For
      End If
    Next
  Next

  With Worksheets("Linguistic Analysis")
    .Range("C1").Value = ltrCount
    .Range("C2").Value = vowels
    .Range("C3").Value = wordCount
    .Range("B6").Resize(2, 20).Value = letters
    .Range("B11").Resize(2, 20).Value = words
    .Range("B16").Resize(2, 20).Value = doubleWords
  End With
End Sub
Function converstSpecialCharacter(letter As String) As String
  Select Case True
  Case letter Like "[âàä]"
  converstSpecialCharacter = "a"
  Case letter Like "[éèêë]"
  converstSpecialCharacter = "e"
  Case letter Like "[ïî]"
  converstSpecialCharacter = "i"
  Case letter Like "[ô]"
  converstSpecialCharacter = "o"
  Case letter Like "[ûù]"
  converstSpecialCharacter = "u"
  Case letter Like "[Ç]"
  converstSpecialCharacter = "c"
  End Select
End Function
 
Upvote 0
Oh you have requested with comments:
VBA Code:
Option Explicit
Sub test()
  'Declare variables
  Dim wordDic As Object, ltrDic As Object, doubleWordDic As Object
  Dim myString As String, ltr As String, tmp As String
  Dim words As Variant, key As Variant, doubleWords(1 To 2, 1 To 20) As Variant, letters(1 To 2, 1 To 20) As Variant
  Dim wordCount As Long, ltrCount As Long, strLen As Long, i As Integer, j As Integer, vowels As Long
 
  myString = Application.Trim(Worksheets("Text to analyze").Range("A1").Value) 'Get the string without extra spaces
  strLen = Len(myString) 'Get the length of the string
  words = Split(myString, " ") 'Get the words to an array. Keep in mind that they will include neighbouring punctuation like "," or "."
  wordCount = UBound(words) 'Get word count
 
  Set wordDic = CreateObject("Scripting.Dictionary")
  For i = 0 To wordCount
    For j = 1 To Len(words(i))
      ltr = Mid(words(i), j, 1)
      If UCase(ltr) <> LCase(ltr) Then 'Get rid of punctuation
        tmp = tmp & ltr
      End If
    Next
   'Insert unique words to a dictionary with counts
    If Not wordDic.Exists(LCase(tmp)) Then
      wordDic.Add LCase(tmp), 1
    Else
      wordDic(LCase(tmp)) = wordDic(LCase(tmp)) + 1
    End If
    tmp = ""
  Next
 
  Set doubleWordDic = CreateObject("Scripting.Dictionary")
  'Find word couples
  For i = 0 To UBound(words) - 1
    j = i + 1
    tmp = words(i) & " " & words(j)
    'Insert word couples to a dictionary with counts
    If Not doubleWordDic.Exists(tmp) Then
      doubleWordDic.Add tmp, 1
    Else
      doubleWordDic(tmp) = doubleWordDic(tmp) + 1
    End If
  Next
 
  ReDim words(1 To 2, 1 To 20)
  'Sort words in an array according to their count (decreasing order)
  For Each key In wordDic
    For i = 1 To 20
      If CInt(wordDic(key)) > CInt(IIf(words(2, i) = "", 0, words(2, i))) Then
        For j = 19 To i Step -1
          words(1, j + 1) = words(1, j)
          words(2, j + 1) = words(2, j)
        Next
        words(1, i) = key
        words(2, i) = wordDic(key)
        Exit For
      End If
    Next
  Next

  'Sort word couples in an array according to their count (decreasing order)
  For Each key In doubleWordDic
    For i = 1 To 20
      If CInt(doubleWordDic(key)) > CInt(IIf(doubleWords(2, i) = "", 0, doubleWords(2, i))) Then
        For j = 19 To i Step -1
          doubleWords(1, j + 1) = doubleWords(1, j)
          doubleWords(2, j + 1) = doubleWords(2, j)
        Next
        doubleWords(1, i) = key
        doubleWords(2, i) = doubleWordDic(key)
        Exit For
      End If
    Next
  Next
 
  Set ltrDic = CreateObject("Scripting.Dictionary")
  For i = 1 To strLen
    ltr = LCase(Mid(myString, i, 1)) 'Get the character
    If UCase(ltr) <> ltr Then 'If character is a letter
      ltrCount = ltrCount + 1 'Add character count
      Select Case True
      Case ltr Like "[âàäéèêëïîôûùaeiou]" 'If vowel
        vowels = vowels + 1 'Add to vowels
        If ltr Like "[âàäéèêëïîôûù]" Then
          ltr = converstSpecialCharacter(ltr) 'If special charater
        End If
      Case ltr Like "[ç]" 'If special character
        ltr = converstSpecialCharacter(ltr)
      End Select
      If ltr Like "[a-z]" Then
        'Add unique letters to a dictionary with counts
        If Not ltrDic.Exists(ltr) Then
          ltrDic.Add ltr, 1
        Else
          ltrDic(ltr) = ltrDic(ltr) + 1
        End If
      End If
    End If
  Next

  'Sort letters in an array according to their count (decreasing order)
  For Each key In ltrDic
    For i = 1 To 20
      If CInt(ltrDic(key)) > CInt(IIf(letters(2, i) = "", 0, letters(2, i))) Then
        For j = 19 To i Step -1
          letters(1, j + 1) = letters(1, j)
          letters(2, j + 1) = letters(2, j)
        Next
        letters(1, i) = key
        letters(2, i) = ltrDic(key)
        Exit For
      End If
    Next
  Next
  'Write results to Analysis sheet
  With Worksheets("Linguistic Analysis")
    .Range("C1").Value = ltrCount
    .Range("C2").Value = vowels
    .Range("C3").Value = wordCount
    .Range("B6").Resize(2, 20).Value = letters
    .Range("B11").Resize(2, 20).Value = words
    .Range("B16").Resize(2, 20).Value = doubleWords
  End With
End Sub
'Funtion that returns speacial characters as English character
Function converstSpecialCharacter(letter As String) As String
  Select Case True
  Case letter Like "[âàä]"
  converstSpecialCharacter = "a"
  Case letter Like "[éèêë]"
  converstSpecialCharacter = "e"
  Case letter Like "[ïî]"
  converstSpecialCharacter = "i"
  Case letter Like "[ô]"
  converstSpecialCharacter = "o"
  Case letter Like "[ûù]"
  converstSpecialCharacter = "u"
  Case letter Like "[ç]"
  converstSpecialCharacter = "c"
  End Select
End Functions
 
Upvote 0
Hello Flashbond,

Thank you for your tremendous work and for your help !!!

i tried the macro and it works like a charm except some formatting missing words and borders and order number on the tables for most used letters, words, double words as you can see on the test file i uploaded with .xlsx extension for safety and allowing you to see what's the result on my side :


it's weird because i can read the formatting block with/end with being clearly stated on the code. Why it doesn't work ?

i appreciated the effort for comments and trying to respect the constraints and especially calling function. It helps me to understand the code and following guidelines and strict rules for my works. But there is some statements and functions i can't use :

- CreateObject("Scripting.Dictionary")
- redim
- is it possible to replace IIf() by If() in the whole code ?

for split function, i will be able to justify its use even if it's not in constraints.

maybe you have ideas to change those lines regarding listed constraints arguments.

I am suprized that you didn't use function replace in the whole code but it probably lightens the whole code.

Thank you again Flashbond and wish you a Merry Christmas !
 
Upvote 0
- CreateObject("Scripting.Dictionary")
- redim
- is it possible to replace IIf() by If() in the whole code ?

Actually i can't use split function also
I am sorry. It is quite tedious work without using the tools above. Also I am not sure which words are missing.

The code presumes the Analysis sheet is preformatred.

Maybe someone will show up with a more convenient solution.

Merry Chtistmass too :)
 
Upvote 0
ok thank you Flashbond,

That is a pity because it worked well.

For the missing words it just lacks a with/end with block i could add.
 
Upvote 0
Only corrections to my solution will be to change this line:
VBA Code:
wordCount = UBound(words)
to this:
VBA Code:
wordCount = UBound(words)+1

And also this one
VBA Code:
For i = 0 To wordCount
to this:
VBA Code:
For i = 0 To wordCount-1
 
Upvote 0
hello,

so i did all the code trying to follow strictly the constraints and it works on my side

here is the vba code :

to apply on this test file, you have just to copy-paste code above in the vba Excel coding sheets on the test file linked below and run it after you save the test file as .xlsm to allow macro :

But as you can see there is 3 macro :
first macro answer question 1 to 3 thanks to functions/end functions
the second macro answer question 4
the third macro answer question 5

can you help me to make one macro of this three ones by transforming macro 2 and macro 3 into functions ? Thus i will have only one macro button that will answer the 5 questions at once.

Thank you !
 
Upvote 0

Forum statistics

Threads
1,223,877
Messages
6,175,138
Members
452,614
Latest member
MRSWIN2709

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top