A new fuzzy lookup formula without VBA nor add-in

Excali · Jul 9, 2021

When you want to do a fuzzy lookup without VBA or addin or POWER QUERY, it might be usefull to use dynamic arrays and code a formula that test n-gram substrings to evaluate text similarities.

Hereafter is a detailed view of the computation of string similarities by n-gram substrings :

n-gram similarity.xlsx

B

C

D

E

F

G

1

find_text

within_text

n-gram

2

ELOQUENTLY

INELOQUENT

3

N

4

10

n-gram similarity

0,750

=D6/(N-n_gram+1)

5

single formula

0,750

=SOMME(1*ESTNUM(CHERCHE(STXT($B$2;SEQUENCE(NBCAR($B$2)-n_gram+1);n_gram);within_text)))/(N-n_gram+1)

6

Total

7

8

2

LOQ

1

9

3

OQU

1

10

4

QUE

1

11

5

UEN

1

12

6

ENT

1

13

7

NTL

0

14

8

TLY

0

Detailed single formula

Cell Formulas
Range	Formula
B4	B4	=LEN(B2)
F4	F4	=D6/(N-n_gram+1)
G4:G5	G4	=FORMULATEXT(F4)
F5	F5	=SUM(1*ISNUMBER(SEARCH(MID($B$2,SEQUENCE(LEN($B$2)-n_gram+1),n_gram),within_text)))/(N-n_gram+1)
D6	D6	=SUM(D7#)
B7:B14	B7	=SEQUENCE(LEN(B2)-n_gram+1)
C7:C14	C7	=MID(B2,B7#,n_gram)
D7:D14	D7	=1*ISNUMBER(SEARCH(C7#,within_text))
Dynamic array formulas.

Named Ranges
Name	Refers To	Cells
N	='Detailed single formula'!$B$4	F4:F5
n_gram	='Detailed single formula'!$D$2	B7:C7, F4:F5
within_text	='Detailed single formula'!$C$2	F5, D7

By wraping all string matching in a dynamic array formula, one can build a fuzzy lookup which return the most similar text value :

n-gram similarity.xlsx

B

C

D

1

find_text

n-gram

Matching text

2

JACKSON, Michael

2

Michael JACKSON

3

4

5

6

within_text

Simple formulae

Array formula

7

Michael LORRIE

0,4

8

Jackson FIVE

0,4

9

The Jacksons

0,4

10

Prince of Pop

0

11

Michael JACKSON

0,8

12

Jackson, Stewart

0,533333333

13

Latoya Jackson

0,4

Fuzzy Lookup

Cell Formulas
Range	Formula
D2	D2	=INDEX(within_text_list[within_text],MATCH(MAX( MMULT(1ISNUMBER(SEARCH(MID(find_text,SEQUENCE(1,LEN(find_text)-n_gram+1),n_gram),within_text_list[within_text]))/(LEN(find_text)-n_gram+1), SEQUENCE(LEN(find_text)-n_gram+1,1,1,0)) ), MMULT(1ISNUMBER(SEARCH(MID(find_text,SEQUENCE(1,LEN(find_text)-n_gram+1),n_gram),within_text_list[within_text]))/(LEN(find_text)-n_gram+1), SEQUENCE(LEN(find_text)-n_gram+1,1,1,0)),0))
D7:D13	D7	=MMULT(1*ISNUMBER(SEARCH(MID($B$2,SEQUENCE(1,LEN($B$2)-n_gram+1),n_gram),within_text_list[within_text]))/(LEN($B$2)-n_gram+1), SEQUENCE(LEN($B$2)-n_gram+1,1,1,0))
C7:C13	C7	=SUM(1*ISNUMBER(SEARCH(MID(find_text,SEQUENCE(1,LEN(find_text)-n_gram+1),n_gram),B7)))/(LEN(find_text)-n_gram+1)
Dynamic array formulas.

Named Ranges
Name	Refers To	Cells
'Fuzzy Lookup'!find_text	='Fuzzy Lookup'!$B$2	D7, D2, C7:C13
'Fuzzy Lookup'!n_gram	='Fuzzy Lookup'!$C$2	D7, D2, C7:C13

Cells with Conditional Formatting
Cell	Condition		Cell Format	Stop If True
C7:C13	Cell Value	top 1 values	text	NO
D7:D13	Cell Value	top 1 values	text	NO

Obviously, the formula could be simplified when the LET() function will be widely available

Hope this helps

Excali

Dermot · Jul 9, 2021

Thank you for sharing. That's an interesting utility.

The formulae are pretty complex, and it would be easy to make mistakes when creating them. I'm wondering if it wouldn't be preferable to write a VBA function to make life easier for users.

A new fuzzy lookup formula without VBA nor add-in

Excali

New Member

Excel Facts

Dermot

Board Regular

Similar threads

Forum statistics

Share this page

A new fuzzy lookup formula without VBA nor add-in

Excali

New Member

Excel Facts

Dermot

Board Regular

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock