Hi,
I'm trying to find whole-word matches for Unicode words in a range of cells.
My lookup words may look something like these:
"an" "tiếp" "lị" "nguyễn", etc. (they are made up of characters from the Vietnamese alphabet)
and the cells I'm searching in may look something like these:
"an" "ban, tiếp; lịn, ngạn"
In this case, word #1 is in cell 1 but not in cell 2, and word #2 is in cell 2 but not in cell 1. Words #3 and #4 are in neither cell.
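To make the expected results concrete, here is a minimal sketch of the behaviour I'm after. It is written in Python rather than VBA, purely as an illustration, because Python's `\b` is Unicode-aware for text strings. The helper names `nfc` and `has_whole_word` are made up for this example, and the NFC normalization step is my own assumption (it keeps accented letters as single precomposed characters so that `\b` treats them as letters).

```python
import re
import unicodedata

# Sample lookup words and cells from the description above.
words = ["an", "tiếp", "lị", "nguyễn"]
cells = ["an", "ban, tiếp; lịn, ngạn"]

def nfc(s):
    # Compose base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", s)

def has_whole_word(word, text):
    # \b is Unicode-aware for Python str patterns, so accented
    # Vietnamese letters count as word characters here.
    pattern = r"\b" + re.escape(nfc(word)) + r"\b"
    return re.search(pattern, nfc(text)) is not None

for w in words:
    print(w, [has_whole_word(w, c) for c in cells])
```

With the sample data, only "an" is found in cell 1 and only "tiếp" is found in cell 2, which is the result I would like to reproduce in Excel.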
Unable to find a good solution with Excel's basic functions, I turned to regular expressions. I ended up using the RegExpFind() function given here: http://www.vbaexpress.com/kb/getarticle.php?kb_id=841
There are a bunch of similar functions floating around, but they are mostly the same. I'm not too worried about the functions themselves. Instead, I'm wondering if somebody can help me with the patterns.
For example, if I look up the word "an" and only want to match whole words, I simply use "\ban\b" and the two \b word boundaries take care of it for me. However, when I move on to words with non-English characters, such as "tiếp", the pattern "\btiếp\b" does not work.
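One possible explanation, which I have not confirmed for the engine Excel uses, is that `\b` is defined only in terms of ASCII word characters (A-Z, a-z, 0-9 and `_`). The sketch below emulates such an engine in Python with the `re.ASCII` flag. It is only an emulation under that assumption, not the VBA code I'm running, and `ascii_whole_word` is a made-up helper name.

```python
import re

def ascii_whole_word(word, text):
    # re.ASCII restricts \w (and therefore \b) to [A-Za-z0-9_],
    # emulating an engine whose word boundaries ignore accented letters.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.ASCII) is not None

cell = "ban, tiếp; lịn, ngạn"

print(ascii_whole_word("an", "an"))    # plain ASCII word
print(ascii_whole_word("lị", "lị"))    # word ending in a non-ASCII letter
print(ascii_whole_word("lị", cell))    # "lị" is only part of "lịn" here
print(ascii_whole_word("tiếp", cell))  # non-ASCII letter in the middle
```

Under this assumption, a word that ends in an accented letter (such as "lị") is missed when it stands alone and is wrongly found inside "lịn", because the engine sees a boundary between "ị" and "n". Note that "tiếp" still matches in this emulation, since it starts and ends with ASCII letters, so ASCII-only boundaries may not be the whole story in my case.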
Am I missing something here? Does Excel's regular expression engine not support Unicode characters?