Power Query matching each element of a list of terms with a free text field and returning list postion or list value

GraH · May 28, 2020

Hi all,

My data is structured in 2 Excel tables: tComments and tKeyTerms. Both are loaded in Power Query and I converted the tKeyTerms to a list {tKeyTerms}.
The goal is to detect if any of the terms, which are not single words but a group of words, are used in any of the comments. When there is a match I need to extract the key term from the list.

In the real life situation there are going to be 200K comments and potentially 100 to 150 key terms. The code I have so far works fast enough for 15K rows.

Currently I'm only able to detect if there is a matching value with the following M-code, but I can't figure out how I can extract or find the position from the list of the matching value:

Code:

let
    Source = Excel.CurrentWorkbook(){[Name="tComments"]}[Content],
    ListKeyTerms = List.Buffer(tKeyTerms),
    AddCol_AddListToSource = Table.AddColumn(Source,"KeyTerms",each ListKeyTerms),
    AddColAsFx_TextContainsTerms = Table.AddColumn(AddCol_AddListToSource, "MatchKeyTerm", (C) => List.AnyTrue(List.Transform(C[KeyTerms], each Text.Contains(C[Comment], _))))
in
    AddColAsFx_TextContainsTerms

Any guidance in the right direction is highly appreciated. Do bare in mind I'm not a programmer/code writer as 99% of what I do with PQ is done using the UI. So I also welcome some notes on how the code works.

Sample data looks like this:

Book1 (version 1).xlsb

A

B

C

D

1

Comment

KeyTerms

2

er staan wat inleidende woorden voor dit zijn mijn sleuteltermen die dan weer gevolgd worden door een hoop overbodige woorden.

dit zijn mijn sleuteltermen

3

in this phase there are a few words preceding what I'm actually looking for, my personal key term of interest and then again there are words, words and more words no-one cares about.

key term of interest

4

Ceci est un texte qui ne vaut rien du tout.

Sheet1

GraH · May 28, 2020

Hi all,

After some more research and with a fresh head I finally got it. So posting the solution as reference for those interested.

Code:

let
    Source = Excel.CurrentWorkbook(){[Name="tComments"]}[Content],
    MatchTerms = Table.AddColumn(Source, "MatchTerms", each let CurrentText = [Comment] in Table.SelectRows(TblKeyTerms,each Text.Contains(CurrentText, [KeyTerms])))
in
    MatchTerms

Discovering one can use let or write a function in a formula: totally new stuff for me. Now the hard part: making sense of all this

.

For those who viewed and were looking to solve this: thanks for the effort.

Have a good day.

Power Query matching each element of a list of terms with a free text field and returning list postion or list value

GraH

Well-known Member

GraH

Well-known Member

Share this page

Power Query matching each element of a list of terms with a free text field and returning list postion or list value

GraH

Well-known Member

GraH

Well-known Member

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock