Split out fields after specfic characters and lengths

lbanham

Board Regular
Joined
Feb 17, 2011
Messages
50
Hi
I have been trying to sovle this for a day now, and its frustated me.
I have data in a column (Name) and dependant on where the data is in the string and the number of characters, depends on what column it should sit in. I have pasted a table below showing the possible combinations and how it should split out. The fields have a set number of characters with exception of Set 4 which can be 1 -8 characters in lenth.

Set 1 & 2 are fine, using Mid/left etc that works fine. its getting Set 3 and Set4 to work for each scenario i cant get to work. I have tried various combinations of Mid/LEN/FIND etc with no joy.
Any help would be appreciated
:)

Thanks
Lynsey
[TABLE="width: 666"]
<TBODY>[TR]
[TD="width: 189, bgcolor: transparent"][/TD]
[TD="class: xl2039, width: 175, bgcolor: transparent"]6 Char
[/TD]
[TD="class: xl2039, width: 175, bgcolor: transparent"]5 Char
[/TD]
[TD="class: xl2039, width: 175, bgcolor: transparent"]5 Char
[/TD]
[TD="class: xl2039, width: 175, bgcolor: transparent"]max 8 Char
[/TD]
[/TR]
[TR]
[TD="class: xl2036, bgcolor: yellow"]Name
[/TD]
[TD="class: xl2036, bgcolor: yellow"]Set 1
[/TD]
[TD="class: xl2036, bgcolor: yellow"]Set 2
[/TD]
[TD="class: xl2037, bgcolor: yellow"]Set 3
[/TD]
[TD="class: xl2037, bgcolor: yellow"]Set 4
[/TD]
[/TR]
[TR]
[TD="class: xl2040, bgcolor: transparent"]115113.52010.1ABCD
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]115113
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]52010
[/TD]
[TD="class: xl2039, bgcolor: transparent"]1ABCD
[/TD]
[TD="class: xl2038, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl2040, bgcolor: transparent"]115113.52010.1ABCD.5TEN56
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]115113
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]52010
[/TD]
[TD="class: xl2039, bgcolor: transparent"]1ABCD
[/TD]
[TD="class: xl2041, bgcolor: transparent"]5TEN56
[/TD]
[/TR]
[TR]
[TD="class: xl2039, bgcolor: transparent"]115113.52010.5TEN5678
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]115113
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]52010
[/TD]
[TD="class: xl2039, bgcolor: transparent"][/TD]
[TD="class: xl2041, bgcolor: transparent"]5TEN5678
[/TD]
[/TR]
[TR]
[TD="class: xl2039, bgcolor: transparent"]115113.52010
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]115113
[/TD]
[TD="class: xl2035, bgcolor: transparent, align: right"]52010
[/TD]
[TD="bgcolor: transparent"][/TD]
[TD="bgcolor: transparent"][/TD]
[/TR]
</TBODY>[/TABLE]
 

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
Are these the only possible combinations and exact data? If so, you could actually write a pretty simple formula that spits out the elements of the string in one of 4 different ways with the above 4 options (I'm guessing its not that simple, but if it is, I'd probably go that route for this).

The next option would be to search and then search against the next string for the periods like a nested search/mid formula
 
Upvote 0
There is an error in my message above, Set 1 can range from 4 -6 characters. Thought this may make a difference as any reference to data length cannot be fixed.
 
Upvote 0
Hi
There are another few combinations, but I have tried various formulas but the problem is the information when it has the four sections, as it cant reference a fixed number of cells. Or if the third section of data i.e. 5TEN5678 in row 3 below, actually belongs in SET4 not, Set 3.

The actual data will change (its about 3000 rows of data). I have pasted a new table below.

Hope that makes sense
[TABLE="width: 888"]
<TBODY>[TR]
[TD][/TD]
[TD]6 Char</SPAN>[/TD]
[TD]5 Char</SPAN>[/TD]
[TD]5 Char</SPAN>[/TD]
[TD]max 8 Char</SPAN>[/TD]
[/TR]
[TR]
[TD]Name</SPAN>[/TD]
[TD]Set 1</SPAN>[/TD]
[TD]Set 2</SPAN>[/TD]
[TD]Set 3</SPAN>[/TD]
[TD]Set 4</SPAN>[/TD]
[/TR]
[TR]
[TD]115113.52010.1ABCD</SPAN>[/TD]
[TD="align: right"]115113</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD]1ABCD</SPAN>[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]115113.52010.1ABCD.5TEN56</SPAN>[/TD]
[TD="align: right"]115113</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD]1ABCD</SPAN>[/TD]
[TD]5TEN56</SPAN>[/TD]
[/TR]
[TR]
[TD]115113.52010.5TEN5678</SPAN>[/TD]
[TD="align: right"]115113</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD][/TD]
[TD]5TEN5678</SPAN>[/TD]
[/TR]
[TR]
[TD]115113.52010</SPAN>[/TD]
[TD="align: right"]115113</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]1234.52010.1ABCD</SPAN>[/TD]
[TD="align: right"]1234</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD]1ABCD</SPAN>[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]1234.52010</SPAN>[/TD]
[TD="align: right"]1234</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]1234.52010.1ABCD</SPAN>[/TD]
[TD="align: right"]1234</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD]1ABCD</SPAN>[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]1234.52010.1ABCD.5TEN56</SPAN>[/TD]
[TD="align: right"]1234</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD]1ABCD</SPAN>[/TD]
[TD]5TEN56</SPAN>[/TD]
[/TR]
[TR]
[TD]1234.52010.56238942</SPAN>[/TD]
[TD="align: right"]1234</SPAN>[/TD]
[TD="align: right"]52010</SPAN>[/TD]
[TD][/TD]
[TD]56238942</SPAN>[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</TBODY><COLGROUP><COL><COL span=4></COLGROUP>[/TABLE]
 
Upvote 0
Does set 1 always begin with a 1, set 2 always with a 5, set 3 always with a 1 and set 4 always with a 5?

There are ways to split the pieces into the right buckets, but only if you can clearly define what causes an item to be in one bucket vs another, if that makes sense...?
 
Upvote 0
No the numbers will always vary. What i was doing for Set 1 was, left of the first ".", Set 2 was 5 characters after the second decimel. Set 3 and Set 4 are then defined by if/where the decimal point is and the characters lengths. IF the number of characters after the second "." are more than 5 (and there is not a third ".") then it is Set 4 data, if it is max 5 characters after the second "." it is Set 3, if there is a third "." it should always split it to Set 3 (5 Char) and Set 4 (1-8 Char)

Thanks
 
Upvote 0
[TABLE="width: 344"]
<TBODY>[TR]
[TD]This is a copy of the formula I am currently trying - apologies for the format it wont allow attachements.



Set 3</SPAN>
[/TD]
[TD]Set 4</SPAN>[/TD]
[/TR]
[TR]
[TD]IFERROR(IF(LEN(MID(E1,FIND("^^",SUBSTITUTE(Pivot!E1,".","^^",2))+1,30))=5,MID(E1,FIND("^^",SUBSTITUTE(Pivot!E1,".","^^",2))+1,5),""),"")</SPAN>[/TD]
[TD]IFERROR(IF(H1="",MID(Pivot!E1,FIND("^^",SUBSTITUTE(Pivot!E1,".","^^",2))+1,8),MID(Pivot!E1,FIND("^^",SUBSTITUTE(Pivot!E1,".","^^",3))+1,8)),"")</SPAN>[/TD]
[/TR]
</TBODY><COLGROUP><COL><COL></COLGROUP>[/TABLE]
 
Upvote 0
I was working on a formula and got a headache :-)

Essentially you need to check in set 3 that the data exists at all (since you have some that stop at set 2), probably best accomplished by checking the total len() vs the len() of set 1 + set 2 + a period

Then you need to do a find of the next period starting at the end of the first two strings together and check the length to see if its 5, otherwise that value goes to set 4 (in set 4's own formula)..

and that's about the point I got a headache :-(

I hope its enough to get you back on track, however...
 
Upvote 0
Thanks. I eventually got it to work by using your suggsestion of calculating the differing lengths and by adding two helper columns to calculate if there was 2 or 3 "." and then the subsequent position number of these. It was just Set 3 formula that needed adjusted. It is below incase it helps anyone else FYI . helper columns are J4 (".") and K4 (Position number)

Helper Columns
Firstly calculate how many "." are in a cell - IF($E4="",$J3,LEN($E4)-LEN(SUBSTITUTE($E4,".","")))
Second is the position of the last "." - IF($E4="",$K3,FIND("^^",SUBSTITUTE(Pivot!$E4,".","^^",$J4)))

Main Formula
IF($E4="",$H3,IF($J4=1,"",IF(IF(AND((LEN($E4)-FIND("^^",SUBSTITUTE(Pivot!$E4,".","^^",$J4))=5),$J4=2),MID($E4,$K4+1,5),MID($E4,$K4-5,5))=$G4,"",IF(AND((LEN($E4)-FIND("^^",SUBSTITUTE(Pivot!$E4,".","^^",$J4))=5),$J4=2),MID($E4,$K4+1,5),MID($E4,$K4-5,5)))))

This now works for all length variations and combinations of data sets. Phew! No doubt there is a cleaner way to do this, but it works for me just now.

Thanks again
:)
 
Upvote 0

Forum statistics

Threads
1,223,236
Messages
6,170,917
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top