NeilMcCauley
New Member
- Joined
- Aug 5, 2012
- Messages
- 8
Imagine A1:A300000 contains a long list of words, like this:
[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]Column A[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Applesauce22[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]Bananas[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Antiapple[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Applesandpears[/TD]
[/TR]
</tbody>[/TABLE]
And I have a list of words I'd like to find within the above list, like this:
[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]Column B[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]apple[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]pear[/TD]
[/TR]
</tbody>[/TABLE]
The list in column B might be a couple of thousand words long. In other words, I need to see if a couple of thousand words (B1:B2000) appear within about 300,000 cells (A1:A300000).
Right now, the closest I have come is using this formula: =LOOKUP(2^15,SEARCH(B$1:B$2000,A1),B$1:B$2000). However, it only returns one match (it would find 'Apple' in A4, but not notice that A4 also contained 'Pear'). The other key facts about this challenge are that words in column A aren't delimited in any way, they are hidden within single strings of text. That seems to be fine with the formula I just described, but maybe it would affect alternative solutions.
I'm guessing that Excel is not actually the best way to do this kind of task, because of the number of computations needed. However, I'm not technical enough to use Python or R or something. I was hoping somebody might be able to suggest a formula or module that could handle this tricky task. Perhaps the end result for the above example would be something like:
[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]Column A (Long list)[/TD]
[TD]Column B (First word found)[/TD]
[TD]Column C (second word found)[/TD]
[TD]Lookup words (Words I'm looking for)[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Applesauce22[/TD]
[TD]Apple[/TD]
[TD][/TD]
[TD]Apple[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]Bananas[/TD]
[TD][/TD]
[TD][/TD]
[TD]Pear[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Antiapple[/TD]
[TD]Apple[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Applesandpears[/TD]
[TD]Apple[/TD]
[TD]Pear[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
Many thanks!
P.S. I'm using Excel 2013, on Windows 64-bit.data:image/s3,"s3://crabby-images/a0dd6/a0dd67a17ec8b6e6bcb45d7047f3d9bfe87084bb" alt="Smile :) :)"
[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]Column A[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Applesauce22[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]Bananas[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Antiapple[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Applesandpears[/TD]
[/TR]
</tbody>[/TABLE]
And I have a list of words I'd like to find within the above list, like this:
[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]Column B[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]apple[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]pear[/TD]
[/TR]
</tbody>[/TABLE]
The list in column B might be a couple of thousand words long. In other words, I need to see if a couple of thousand words (B1:B2000) appear within about 300,000 cells (A1:A300000).
Right now, the closest I have come is using this formula: =LOOKUP(2^15,SEARCH(B$1:B$2000,A1),B$1:B$2000). However, it only returns one match (it would find 'Apple' in A4, but not notice that A4 also contained 'Pear'). The other key facts about this challenge are that words in column A aren't delimited in any way, they are hidden within single strings of text. That seems to be fine with the formula I just described, but maybe it would affect alternative solutions.
I'm guessing that Excel is not actually the best way to do this kind of task, because of the number of computations needed. However, I'm not technical enough to use Python or R or something. I was hoping somebody might be able to suggest a formula or module that could handle this tricky task. Perhaps the end result for the above example would be something like:
[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]Column A (Long list)[/TD]
[TD]Column B (First word found)[/TD]
[TD]Column C (second word found)[/TD]
[TD]Lookup words (Words I'm looking for)[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Applesauce22[/TD]
[TD]Apple[/TD]
[TD][/TD]
[TD]Apple[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]Bananas[/TD]
[TD][/TD]
[TD][/TD]
[TD]Pear[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Antiapple[/TD]
[TD]Apple[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Applesandpears[/TD]
[TD]Apple[/TD]
[TD]Pear[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
Many thanks!
P.S. I'm using Excel 2013, on Windows 64-bit.
data:image/s3,"s3://crabby-images/a0dd6/a0dd67a17ec8b6e6bcb45d7047f3d9bfe87084bb" alt="Smile :) :)"