leeandoona
New Member
- Joined
- Oct 13, 2016
- Messages
- 45
Hi everyone, I've been struggling with this one for some time and I thought I'd throw it out there to see if anyone has experience of solving this type of problem? Sorry if its already been covered, I have looked and can't find anything quite like it.
The problem I have is I have large datasets that contain amongst other things a column which contains text. Many of the text entries are potential variations of what is essentially the same item. For example:
Column A
.............
Rose Scented Candle Black
Scented Candle Rose Blue
Rose Scented Candle Yellow
Scented Candle (Rose) Red
All of the above could loosely be reffered to as simply 'Rose Scented Candle'. So if I could find which items in my column were potentially the same, I could then go a step further and enter a validation column:
[TABLE="width: 380"]
<tbody>[TR]
[TD]Title[/TD]
[TD]Is Variant[/TD]
[TD]Vanilla Title[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Black[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
[TR]
[TD]Scented Candle Rose Blue[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Yellow[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
[TR]
[TD]Scented Candle (Rose) Red[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
</tbody>[/TABLE]
This would mean I could then identify a SINGLE title, e.g. 'Rose Scented Candle' and have this in an adjacent column. My final step would then be to enter into a further validation column what it's variation actually is, for example:
[TABLE="width: 476"]
<tbody>[TR]
[TD]Title[/TD]
[TD]Is Variant[/TD]
[TD]Vanilla Title[/TD]
[TD]Variant Option[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Black[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Black[/TD]
[/TR]
[TR]
[TD]Scented Candle Rose Blue[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Blue[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Yellow[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Yellow[/TD]
[/TR]
[TR]
[TD]Scented Candle (Rose) Red[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Red[/TD]
[/TR]
</tbody>[/TABLE]
The purpose of this is so I can sift through what is a very large dataset of over 20,000 lines and avoid duplicating products to appear on a website and instead populate with unique records which, where applicable, have variations in colour, size or design.
I've tried using Fuzzy lookup but this seems not to work on one column of data. I've tried using Find, Match, IF but my excel skills are limited beyond this. I know there's bound to be an easier way of doing this. Currently my only work around is to manually skip down each cell removing the 'variant' data leaving just the 'vanilla' data and then using substitute to enter the data I deleted into an adjacent column which I then use as the column for the option.
The problem is of course made worse in that there is not always a pattern to the text I'm supplied with, so the variant can appear at the start, middle or end of the text.
Help?
The problem I have is I have large datasets that contain amongst other things a column which contains text. Many of the text entries are potential variations of what is essentially the same item. For example:
Column A
.............
Rose Scented Candle Black
Scented Candle Rose Blue
Rose Scented Candle Yellow
Scented Candle (Rose) Red
All of the above could loosely be reffered to as simply 'Rose Scented Candle'. So if I could find which items in my column were potentially the same, I could then go a step further and enter a validation column:
[TABLE="width: 380"]
<tbody>[TR]
[TD]Title[/TD]
[TD]Is Variant[/TD]
[TD]Vanilla Title[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Black[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
[TR]
[TD]Scented Candle Rose Blue[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Yellow[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
[TR]
[TD]Scented Candle (Rose) Red[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[/TR]
</tbody>[/TABLE]
This would mean I could then identify a SINGLE title, e.g. 'Rose Scented Candle' and have this in an adjacent column. My final step would then be to enter into a further validation column what it's variation actually is, for example:
[TABLE="width: 476"]
<tbody>[TR]
[TD]Title[/TD]
[TD]Is Variant[/TD]
[TD]Vanilla Title[/TD]
[TD]Variant Option[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Black[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Black[/TD]
[/TR]
[TR]
[TD]Scented Candle Rose Blue[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Blue[/TD]
[/TR]
[TR]
[TD]Rose Scented Candle Yellow[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Yellow[/TD]
[/TR]
[TR]
[TD]Scented Candle (Rose) Red[/TD]
[TD]YES[/TD]
[TD]Rose Scented Candle[/TD]
[TD]Red[/TD]
[/TR]
</tbody>[/TABLE]
The purpose of this is so I can sift through what is a very large dataset of over 20,000 lines and avoid duplicating products to appear on a website and instead populate with unique records which, where applicable, have variations in colour, size or design.
I've tried using Fuzzy lookup but this seems not to work on one column of data. I've tried using Find, Match, IF but my excel skills are limited beyond this. I know there's bound to be an easier way of doing this. Currently my only work around is to manually skip down each cell removing the 'variant' data leaving just the 'vanilla' data and then using substitute to enter the data I deleted into an adjacent column which I then use as the column for the option.
The problem is of course made worse in that there is not always a pattern to the text I'm supplied with, so the variant can appear at the start, middle or end of the text.
Help?