# Using keywords to match a part to a category



## Mhaddy

I have over 17,000 parts for a classic car company that I need to assign to categories based on a variety of criteria. I'd like to be able to assign keywords to the categories as seen below, and then check to see if any piece of the part name matches any one of the keywords.

For example, a part named "Grille Emblem" would be assigned category "100". Another example, a part named "Set of 4 bolts" would be assigned category "102".

Categories:


CategoryDescriptionKeywords100Accessories and Grafittiscrew, fastener, emblem
101Audiostereo, cd, player, speaker, speaker wire
102Bed & Related
wood paneling, bolts


<tbody>

</tbody>
Any thoughts on how to accomplish this?


----------



## Peter_SSs

Welcome to the MrExcel board!

1. Would it ever be possible to have a part name like "Speaker Emblem"? If so, what to do in that case?

2. Do you care if you get a formula solution or a macro solution?


----------



## Mhaddy

Thanks for the warm welcome and prompt reply @Peter_SSs! Either formula or macro would be fine.

I'm hoping that when I run into the problem you identified with #1 (because yes, it would be possible), I could look at a new set of criteria to determine what to do. For example, all parts in the "Audio" category would be < $100 cost; therefore, if a part matches keywords in more than one category, it would look at its cost and then determine if it should fall into one or the other.

If there's still a conflict, returning "INVESTIGATE" would be great and I could deal with those situations as they come.


----------



## Peter_SSs

This lists each category if more than one. Would that suit?

(Note that keyword "speaker wire" is redundant since if a part name contained those words, the part would already be picked up in category 101 by the keyword "speaker")



		Rich (BB code):
__


Sub AssignCategory()
  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  With Range("C2", Range("C" & Rows.Count).End(xlUp))
    aPatterns = Split("\b(" & Replace(Replace(Join(Application.Transpose(.Value), ")\b#\b("), ", ", ","), ",", "|") & ")\b", "#")
    aCategories = .Offset(, -2).Value
  End With
  aResults = Range("E2:F" & Range("E" & Rows.Count).End(xlUp).Row).Value
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      For i = 0 To UBound(aPatterns)
        .Pattern = aPatterns(i)
        If .Test(aResults(p, 1)) Then aResults(p, 2) = aResults(p, 2) & ", " & aCategories(i + 1, 1)
      Next i
      aResults(p, 2) = Mid(aResults(p, 2), 3)
    Next p
  End With
  Range("E2").Resize(UBound(aResults, 1), 2).Value = aResults
End Sub


Excel WorkbookABCDEF1CategoryDescriptionKeywordsPartCategory2100Accessories and Grafittiscrew, fastener, emblemGrille Emblem3101Audiostereo, cd, player, speaker, speaker wireSet of 4 bolts4102Bed & Relatedwood paneling, boltsSpeaker emblemBefore macro


Excel WorkbookABCDEF1CategoryDescriptionKeywordsPartCategory2100Accessories and Grafittiscrew, fastener, emblemGrille Emblem1003101Audiostereo, cd, player, speaker, speaker wireSet of 4 bolts1024102Bed & Relatedwood paneling, boltsSpeaker emblem100, 101After macro


----------



## Mhaddy

Oh my goodness, this is almost exactly what I've been looking for, you are amazing!

Could you help me understand what's actually happening in the macro? I'd like to make a few modifications:
1. Run multiple times (i.e., clears out all values in column F from your example)
2. My categories are listed on another tab, so I need to reference column C of the "Cats" tab and return the category value from column A of the "Cats" tab


----------



## Mhaddy

Figured it out . Thanks again for the inspiration!



		Code:
__


Sub AssignCategory()  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  'Clear aResults so we can re-run this puppy
  Worksheets("Parts").Range("B2:B5").ClearContents
  
  'Clear the contents of the new category column so this code can be re-run
  With Worksheets("Parts").Range("B2", Worksheets("Parts").Range("B" & Rows.Count).End(xlUp))
    Worksheets("Parts").Range("B2").Value = ""
  End With
  
  'Now get all of the keywords and split by commas
  With Worksheets("Cats").Range("C2", Worksheets("Cats").Range("C" & Rows.Count).End(xlUp))
    aPatterns = Split("\b(" & Replace(Replace(Join(Application.Transpose(.Value), ")\b#\b("), ", ", ","), ",", "|") & ")\b", "#")
    aCategories = .Offset(, -2).Value
  End With
  
  aResults = Worksheets("Parts").Range("A2:B" & Worksheets("Parts").Range("A" & Rows.Count).End(xlUp).Row).Value
  
  'Update row by row with new category ID
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      For i = 0 To UBound(aPatterns)
        .Pattern = aPatterns(i)
        If .Test(aResults(p, 1)) Then aResults(p, 2) = aResults(p, 2) & ", " & aCategories(i + 1, 1)
      Next i
      aResults(p, 2) = Mid(aResults(p, 2), 3)
    Next p
  End With
  
  Worksheets("Parts").Range("A2").Resize(UBound(aResults, 1), 2).Value = aResults
  
End Sub


----------



## Peter_SSs

Mhaddy said:


> Figured it out . Thanks again for the inspiration!


Always more satisfying if you can work it out yourself. 

I have suggested a couple of small tidy-ups you may want to consider.


		Rich (BB code):
__


Sub AssignCategory_v2()
  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  'Clear the contents of the new category column so this code can be re-run & setup Results array
  With Worksheets("Parts").Range("A2:B" & Worksheets("Parts").Range("A" & Rows.Count).End(xlUp).Row)
      .Columns(2).ClearContents
      aResults = .Value
  End With
  
  'Get all of the keywords and split by commas, & matching categories
  With Worksheets("Cats").Range("C2", Worksheets("Cats").Range("C" & Rows.Count).End(xlUp))
    aPatterns = Split("\b(" & Replace(Replace(Join(Application.Transpose(.Value), ")\b#\b("), ", ", ","), ",", "|") & ")\b", "#")
    aCategories = .Offset(, -2).Value
  End With
  
  'Update row by row with new category ID
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      For i = 0 To UBound(aPatterns)
        .Pattern = aPatterns(i)
        If .Test(aResults(p, 1)) Then aResults(p, 2) = aResults(p, 2) & ", " & aCategories(i + 1, 1)
      Next i
      aResults(p, 2) = Mid(aResults(p, 2), 3)
    Next p
  End With
  
  'Put Results array back on worksheet
  Worksheets("Parts").Range("A2").Resize(UBound(aResults, 1), 2).Value = aResults
  
End Sub


----------



## Mhaddy

Thank you for the improvements! Unfortunately I'm getting a "type mismatch" error - but I've copied your subroutine exactly ... I'm thinking it has to do with the keywords I've been using? I'm not able to trace it through despite several days of trying >_<. Could you have a look at my keywords attached and let me know if this is the issue? I've tried removing all non-alpha characters but to no avail.

https://dl.dropboxusercontent.com/u/695137/cats-only.xlsx


----------



## Peter_SSs

Mhaddy said:


> .. I'm getting a "type mismatch" error - ... I'm thinking it has to do with the keywords I've been using?


It is. The primary issue is that some of the cells in column C contain large amounts of text. The code is failing when a cell contains more than 255 characters. Cell C10 is the first such cell.

An additional issue that could arise now that I've seen the Keywords is that some of the Keywords contain, or could in the future, characters that have special meaning for Regular Expressions - the method that my code uses.
At the moment the only such characters that I could see are "(" and ")" but others would include, but would not be limited to: "[\^$.|?*+"

Of course the above issues were not evident from the small amount of simplified data provided to start with. 

In the modified code below I have tried to address the issue of cells containing > 255 characters and the "(" and ")" issue. I wasn't able to test very well as I didn't have a realistic 'Parts' list to test against.

Anyway, give this version a try.



		Rich (BB code):
__


Sub AssignCategory_v3()
  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  'Clear the contents of the new category column so this code can be re-run & setup Results array
  With Worksheets("Parts").Range("A2:B" & Worksheets("Parts").Range("A" & Rows.Count).End(xlUp).Row)
      .Columns(2).ClearContents
      aResults = .Value
  End With
  
  'Get all of the keywords and split by commas, & matching categories
  With Worksheets("Cats").Range("C2", Worksheets("Cats").Range("C" & Rows.Count).End(xlUp))
    aPatterns = .Value
    For i = 1 To UBound(aPatterns, 1)
      If Len(aPatterns(i, 1)) Then
        aPatterns(i, 1) = "\b(" & Replace(Replace(Replace(Replace(aPatterns(i, 1), ", ", ","), ",", "|"), "(", "\("), ")", "\)") & ")\b"
      Else
        aPatterns(i, 1) = String(10, "A")
      End If
    Next i
    aCategories = .Offset(, -2).Value
  End With
  
  'Update row by row with new category ID
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      For i = 1 To UBound(aPatterns, 1)
        .Pattern = aPatterns(i, 1)
        If .test(aResults(p, 1)) Then aResults(p, 2) = aResults(p, 2) & ", " & aCategories(i, 1)
      Next i
      aResults(p, 2) = Mid(aResults(p, 2), 3)
    Next p
  End With
  
  'Put Results array back on worksheet
  Worksheets("Parts").Range("A2").Resize(UBound(aResults, 1), 2).Value = aResults
  
End Sub


----------



## Mhaddy

I'm no stranger to acknowledging the impact of failing to provide clear, complete and concise requirements so I really must apologize for that . The code now runs and is working on (something) but does not assign any categories to any parts. I've tried removing any keywords with non-alpha characters as well as those cells with > 256 characters but to no avail. I think the problem now lies with my part names (linked below) - which I have also not given you complete requirements on hah - they're not just alpha characters! I am working on cleaning these up, but this isn't something I can get done in short order.

https://dl.dropboxusercontent.com/u/695137/parts-only.xlsx


----------



## Mhaddy

I have over 17,000 parts for a classic car company that I need to assign to categories based on a variety of criteria. I'd like to be able to assign keywords to the categories as seen below, and then check to see if any piece of the part name matches any one of the keywords.

For example, a part named "Grille Emblem" would be assigned category "100". Another example, a part named "Set of 4 bolts" would be assigned category "102".

Categories:


CategoryDescriptionKeywords100Accessories and Grafittiscrew, fastener, emblem
101Audiostereo, cd, player, speaker, speaker wire
102Bed & Related
wood paneling, bolts


<tbody>

</tbody>
Any thoughts on how to accomplish this?


----------



## Peter_SSs

Mhaddy said:


> I'm no stranger to acknowledging the impact of failing to provide clear, complete and concise requirements so I really must apologize for that . The code now runs and is working on (something) but does not assign any categories to any parts.


It is not helping that things appear to keep moving.

Until now you had not actually provided any information about the layout of the 'Parts' sheet. However, the code you 'figured out' in post #6 read the aResults array from columns A & B (A with descriptions and B to contain the new categories.
The sheet you have now provided has the descriptions in column D, not column A, so not much surprise that it isn't working. 

Also, the Parts Description column (D) has quite a lot of #VALUE! entries. They would have caused the code to error anyway even if we were pointing at the correct column. I'm assuming we would just skip those rows.

Here is another try. Be patient as that is quite a large amount of data to process.



		Rich (BB code):
__


Sub AssignCategory_v4()
  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  'Clear the contents of the new category column so this code can be re-run & setup Results array
  With Worksheets("Parts").Range("B2", Worksheets("Parts").Range("D" & Rows.Count).End(xlUp))
      .Columns(1).ClearContents
      aResults = .Value
  End With
  
  'Get all of the keywords and split by commas, & matching categories
  With Worksheets("Cats").Range("C2", Worksheets("Cats").Range("C" & Rows.Count).End(xlUp))
    aPatterns = .Value
    For i = 1 To UBound(aPatterns, 1)
      If Len(aPatterns(i, 1)) Then
        aPatterns(i, 1) = "\b(" & Replace(Replace(Replace(Replace(aPatterns(i, 1), ", ", ","), ",", "|"), "(", "\("), ")", "\)") & ")\b"
      Else
        aPatterns(i, 1) = String(10, "A")
      End If
    Next i
    aCategories = .Offset(, -2).Value
  End With
  
  'Update row by row with new category ID
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      If Not IsError(aResults(p, 3)) Then
        For i = 1 To UBound(aPatterns, 1)
          .Pattern = aPatterns(i, 1)
          If .test(aResults(p, 3)) Then aResults(p, 1) = aResults(p, 1) & ", " & aCategories(i, 1)
        Next i
        aResults(p, 1) = Mid(aResults(p, 1), 3)
      End If
    Next p
  End With
  
  'Put Results array back on worksheet
  Worksheets("Parts").Range("B2").Resize(UBound(aResults, 1), 3).Value = aResults
End Sub


----------



## Mhaddy

Before I created the thread, I was debating posting my spreadsheet so the community could see exactly what I was doing. But, I refrained and thought there was too much info and it would overwhelm and deter people from helping (I've experienced this in other communities). So I oversimplified but I did so without my coder's hat on! I'm sorry for the four revisions but good news ... it works! I'm so pleased with this, I can't thank you enough!


----------



## Peter_SSs

Mhaddy said:


> Before I created the thread, I was debating posting my spreadsheet so the community could see exactly what I was doing. But, I refrained and thought there was too much info and it would overwhelm and deter people from helping (I've experienced this in other communities). So I oversimplified but I did so without my coder's hat on! I'm sorry for the four revisions but good news ... it works! I'm so pleased with this, I can't thank you enough!


The dilemma between simplifying data to try to make the concept clear and posting full data info can be a difficult one. Glad we got there in the end.


----------



## KarlLee

Hi, 

I'm impressed by this codes, and trying to add in one additional function to it:

In case the keyword is not match, I would like to return a symbol (says ".") to the cell. I tried to insert Else command like this:



		VBA Code:
__


Sub AssignCategory_v4()
  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  'Clear the contents of the new category column so this code can be re-run & setup Results array
  With Worksheets("Parts").Range("B2", Worksheets("Parts").Range("D" & Rows.Count).End(xlUp))
      .Columns(1).ClearContents
      aResults = .Value
  End With
  
  'Get all of the keywords and split by commas, & matching categories
  With Worksheets("Cats").Range("C2", Worksheets("Cats").Range("C" & Rows.Count).End(xlUp))
    aPatterns = .Value
    For i = 1 To UBound(aPatterns, 1)
      If Len(aPatterns(i, 1)) Then
        aPatterns(i, 1) = "\b(" & Replace(Replace(Replace(Replace(aPatterns(i, 1), ", ", ","), ",", "|"), "(", "\("), ")", "\)") & ")\b"
      Else
        aPatterns(i, 1) = String(10, "A")
      End If
    Next i
    aCategories = .Offset(, -2).Value
  End With
  
  'Update row by row with new category ID
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      If Not IsError(aResults(p, 3)) Then
        For i = 1 To UBound(aPatterns, 1)
          .Pattern = aPatterns(i, 1)
   
  'New Else here
          If .test(aResults(p, 3)) Then aResults(p, 1) = aResults(p, 1) & ", " & aCategories(i, 1) Else aResults(p,1)=aResults(p,1) & "."
        Next i
        aResults(p, 1) = Mid(aResults(p, 1), 3)
      End If
    Next p
  End With
  
  'Put Results array back on worksheet
  Worksheets("Parts").Range("B2").Resize(UBound(aResults, 1), 3).Value = aResults
End Sub


The result turns out to be funny. There are more dots that I expected, and those matched and assigned categories also come with dots.
May I get your advice on how to include such function? 

Thank you in advance.


----------



## Peter_SSs

Welcome to the MrExcel board!

Perhaps you could provide a small set of sample dummy data, keywords and the expected results with XL2BB?


----------



## KarlLee

Peter_SSs said:


> Welcome to the MrExcel board!
> 
> Perhaps you could provide a small set of sample dummy data, keywords and the expected results with XL2BB?


Thank you for your reply.

I'm unable to install xl2bb as I'm still using Excel 2003. Hence I'll include screenshots for illustration.

A correction to my previous post: my trial is actually based on version 3, which is as follow:


		VBA Code:
__


Sub AssignCategory_v3Test()
  Dim aPatterns, aCategories, aResults
  Dim i As Long, p As Long
  
  'Clear the contents of the new category column so this code can be re-run & setup Results array
  With Worksheets("Parts").Range("A2:B" & Worksheets("Parts").Range("A" & Rows.Count).End(xlUp).Row)
      .Columns(2).ClearContents
      aResults = .Value
  End With
  
  'Get all of the keywords and split by commas, & matching categories
  With Worksheets("Cats").Range("C2", Worksheets("Cats").Range("C" & Rows.Count).End(xlUp))
    aPatterns = .Value
    For i = 1 To UBound(aPatterns, 1)
      If Len(aPatterns(i, 1)) Then
        aPatterns(i, 1) = "\b(" & Replace(Replace(Replace(Replace(aPatterns(i, 1), ", ", ","), ",", "|"), "(", "\("), ")", "\)") & ")\b"
      Else
        aPatterns(i, 1) = String(10, "A")
      End If
    Next i
    aCategories = .Offset(, -2).Value
  End With
  
  'Update row by row with new category ID
  With CreateObject("VBScript.RegExp")
    .Global = True
    .IgnoreCase = True
    For p = 1 To UBound(aResults, 1)
      For i = 1 To UBound(aPatterns, 1)
        .Pattern = aPatterns(i, 1)
        If .test(aResults(p, 1)) Then aResults(p, 2) = aResults(p, 2) & ", " & aCategories(i, 1) Else aResults(p, 2) = aResults(p, 2) & "."
      Next i
      aResults(p, 2) = Mid(aResults(p, 2), 3)
    Next p
  End With
  
  'Put Results array back on worksheet
  Worksheets("Parts").Range("A2").Resize(UBound(aResults, 1), 2).Value = aResults
  
End Sub


The design of Category is as follow:





And the expected result is as follow:




Since "Hammer" does not match any keywords in "Cats" tab, a dot "." is returned to cell B4.

My amendment was made here:




The result turns out to be funny:




Hope to get your help. Thank you in advance.


----------

