Help with a formula about splitting compound names and last names

RonOliver

Board Regular
Joined
Aug 30, 2022
Messages
99
Office Version
  1. 365
Platform
  1. Windows
Hi, everybody!

I got a solution for a weird problem I had when creating a tool that would split compound names in Spanish. Please, read this (Problems when splitting strings of text with compound names.) to fully grasp the gist of the problem.

I got this formula:

Excel Formula:
=IF(ISBLANK(C4)," ",IF(LEN(TRIM(C4))-LEN(SUBSTITUTE(C4," ",""))+1=2,LET(arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C4," y "," y-")," Y "," Y-")," i "," i-")," I "," I-")," dels "," dels-")," Dels "," Dels-")," de les "," de-les")," De les "," De-les")," els "," els-")," Els "," Els-")," les ","les-")," Les "," Les-")," De La "," De-La-")," De Las "," De-Las-")," De Los "," De-Los-")," Del "," Del-")," De "," De-")," El "," El-")," La "," La-")," Los "," Los-")," Las "," Las-")," de la "," de-la-")," de las "," de-las-")," de los "," de-los-")," de "," de-")," del "," del-")," el "," el-")," la "," la-")," los "," los-")," las "," las-")," "),"-"," "),last,TAKE(arr,,-2),HSTACK(last)),LET(arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C4," y "," y-")," Y "," Y-")," i "," i-")," I "," I-")," dels "," dels-")," Dels "," Dels-")," de les "," de-les")," De les "," De-les")," els "," els-")," Els "," Els-")," les ","les-")," Les "," Les-")," De La "," De-La-")," De Las "," De-Las-")," De Los "," De-Los-")," Del "," Del-")," De "," De-")," El "," El-")," La "," La-")," Los "," Los-")," Las "," Las-")," de la "," de-la-")," de las "," de-las-")," de los "," de-los-")," de "," de-")," del "," del-")," el "," el-")," la "," la-")," los "," los-")," las "," las-")," "),"-"," "),first,TEXTJOIN(" ",1,DROP(arr,,-2)),last,TAKE(arr,,-2),HSTACK(first,last))))

... which is a bit bulky, but it is basically able to separate the first and last names (we have two last names in Spanish), while also factoring in prepositions and articles (which also happen in Spanish). This way, if C4 is "Javier de la Puebla Gómez", it would be split as "Javier", "de la Puebla", "Gómez" — "Juan Antonio Yubero Martín" would return "Juan Antonio", "Yubero", "Martín" — "Maria de los Milagros Morante del Pino" would return "María de los Milagros", "Morante", "del Pino", etc. And, in the case I got a person with only one last name, that would also split it efficiently as "Mark", "Jones". This all works perfectly.

My problem now is I didn't realize that it is also customary to write names as lastname1 lastname2, firstname (for example, "Yubero Martín, Juan Antonio" and "Morante del Pino, María de los Milagros") in Spanish. I've tried tweaking the formula here and there, but I only get partial results that work only when I adapt the formula for each example individually. This might be because I am not yet entirely familiar with TEXTJOIN, DROP and HSTACK.

I need a formula that is able to detect if the name has a comma (maybe ISNUMBER(FIND(", ",C4)), I guess), and if that is the case, I would like for it to split the name as in the formula above (that is, if our name is "Ginés de la Merced, María Lucía", I need it to split the name into three columns as "Maria Lucía", "Ginés", "de la Merced" — however, if our name is presented as José María de Oliveira Martínez, it should split it into "José María", "de Oliveira", "Martínez"). Likewise, if the person has only one last name (Jones, Mark), it should be split into "Mark", "Jones". I must apologize for the long explanation. Please, ask me any questions you may have.

(P. D.: The formula is especially clunky because it has to check for every possible preposition and article combination so it splits each name and last name properly. If there is a way to make the text in the nested substitute sequence case insensitive, I would also really appreciate it. My formula looks a bit too long.)
 

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
Let me add this in case it helps:

Excel Formula:
=CONCAT(RIGHT(C4,LEN(C4)-FIND(", ",C4)-1)," ",LEFT(C4,FIND(", ",C4)-1))

... would turn the full name from lastname1 lastname2, firstname around to firstname lastname1 lastname2, which is exactly what I need, but I can't figure out the way to get whatever there is in the LET function to operate on that instead of what's in the original C4 cell.
 
Upvote 0
Nevermind, I got it.

Excel Formula:
=LET(comma,CONCAT(RIGHT(C4,LEN(C4)-FIND(", ",C4)-1)," ",LEFT(C4,FIND(", ",C4)-1)),arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(comma," y "," y-")," Y "," Y-")," i "," i-")," I "," I-")," dels "," dels-")," Dels "," Dels-")," de les "," de-les")," De les "," De-les")," els "," els-")," Els "," Els-")," les ","les-")," Les "," Les-")," De La "," De-La-")," De Las "," De-Las-")," De Los "," De-Los-")," Del "," Del-")," De "," De-")," El "," El-")," La "," La-")," Los "," Los-")," Las "," Las-")," de la "," de-la-")," de las "," de-las-")," de los "," de-los-")," de "," de-")," del "," del-")," el "," el-")," la "," la-")," los "," los-")," las "," las-")," "),"-"," "),first,TEXTJOIN(" ",1,DROP(arr,,-2)),last,TAKE(arr,,-2),HSTACK(first,last))
... does the trick. If you find a better, simpler way to do this, please let me know. :)
 
Upvote 0
What about this modification to your post #1 formula to cover either circumstance?

22 11 20.xlsm
CDEF
4Javier de la Puebla GómezJavierde la PueblaGómez
5Juan Antonio Yubero MartínJuan AntonioYuberoMartín
6Maria de los Milagros Morante del PinoMaria de los MilagrosMorantedel Pino
7Mark JonesMarkJones
8 
9Yubero Martín, Juan AntonioJuan AntonioYuberoMartín
10Morante del Pino, María de los MilagrosMaría de los MilagrosMorantedel Pino
11Jones, MarkMarkJones
Names
Cell Formulas
RangeFormula
D11:E11,D9:F10,D8,D7:E7,D4:F6D4=LET(t,TRIM(C4),nm,IF(ISNUMBER(FIND(",",t)),MID(t&" "&t,FIND(",",t)+2,LEN(t)-1),t),arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(nm," y "," y-")," Y "," Y-")," i "," i-")," I "," I-")," dels "," dels-")," Dels "," Dels-")," de les "," de-les")," De les "," De-les")," els "," els-")," Els "," Els-")," les ","les-")," Les "," Les-")," De La "," De-La-")," De Las "," De-Las-")," De Los "," De-Los-")," Del "," Del-")," De "," De-")," El "," El-")," La "," La-")," Los "," Los-")," Las "," Las-")," de la "," de-la-")," de las "," de-las-")," de los "," de-los-")," de "," de-")," del "," del-")," el "," el-")," la "," la-")," los "," los-")," las "," las-")," "),"-"," "),last,TAKE(arr,,-2),first,TEXTJOIN(" ",1,DROP(arr,,-2)),IF(nm="","",IF(LEN(nm)-LEN(SUBSTITUTE(nm," ",""))+1=2,HSTACK(last),HSTACK(first,last))))
Dynamic array formulas.
 
Upvote 0
That is indeed much, much shorter. I like how clean that looks. You guys rock! Thanks!
 
Upvote 0
You're welcome. Thanks for the follow-up. :)
Hate to bother you again, but the formula you offered is causing problem with last names (foreign or otherwise) that are hyphenated. For example, "Betsy Bakker-Nort" is now "Betsy" / "Bakker Nort". There is probably no easy way to go around this, but I have to ask. Could you please give me a hand with this?
 
Upvote 0
the formula you offered is causing problem with last names (foreign or otherwise) that are hyphenated.
To be fair, it is actually the formula that you offered that causes that result. My formula just copied what yours does in that regard. ;)

I think that the issue is that Excel cannot distinguish a "-" that hyphenates a name and a "-" that results from all those SUBSTITUTE functions. I don't have much to test with and am not familiar with Spanish name forms but see if changing the character used in the SUBSTITUTE functions helps.

RonOliver.xlsm
CDEF
4Javier de la Puebla GómezJavierde la PueblaGómez
5Juan Antonio Yubero MartínJuan AntonioYuberoMartín
6Maria de los Milagros Morante del PinoMaria de los MilagrosMorantedel Pino
7Mark JonesMarkJones
8 
9Yubero Martín, Juan AntonioJuan AntonioYuberoMartín
10Morante del Pino, María de los MilagrosMaría de los MilagrosMorantedel Pino
11Jones, MarkMarkJones
12Betsy Bakker-NortBetsyBakker-Nort
13Julie-Ann Smith-Jones-LongJulie-AnnSmith-Jones-Long
14Smith-Jones-Long, Julie-AnnJulie-AnnSmith-Jones-Long
Names
Cell Formulas
RangeFormula
D11:E14,D9:F10,D8,D7:E7,D4:F6D4=LET(t,TRIM(C4),nm,IF(ISNUMBER(FIND(",",t)),MID(t&" "&t,FIND(",",t)+2,LEN(t)-1),t),arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(nm," y "," y#")," Y "," Y#")," i "," i#")," I "," I#")," dels "," dels#")," Dels "," Dels#")," de les "," de#les")," De les "," De#les")," els "," els#")," Els "," Els#")," les ","les#")," Les "," Les#")," De La "," De#La#")," De Las "," De#Las#")," De Los "," De#Los#")," Del "," Del#")," De "," De#")," El "," El#")," La "," La#")," Los "," Los#")," Las "," Las#")," de la "," de#la#")," de las "," de#las#")," de los "," de#los#")," de "," de#")," del "," del#")," el "," el#")," la "," la#")," los "," los#")," las "," las#")," "),"#"," "),last,TAKE(arr,,-2),first,TEXTJOIN(" ",1,DROP(arr,,-2)),IF(nm="","",IF(LEN(nm)-LEN(SUBSTITUTE(nm," ",""))+1=2,HSTACK(last),HSTACK(first,last))))
Dynamic array formulas.
 
Upvote 0
Solution
To be fair, it is actually the formula that you offered that causes that result. My formula just copied what yours does in that regard. ;)

I think that the issue is that Excel cannot distinguish a "-" that hyphenates a name and a "-" that results from all those SUBSTITUTE functions. I don't have much to test with and am not familiar with Spanish name forms but see if changing the character used in the SUBSTITUTE functions helps.

RonOliver.xlsm
CDEF
4Javier de la Puebla GómezJavierde la PueblaGómez
5Juan Antonio Yubero MartínJuan AntonioYuberoMartín
6Maria de los Milagros Morante del PinoMaria de los MilagrosMorantedel Pino
7Mark JonesMarkJones
8 
9Yubero Martín, Juan AntonioJuan AntonioYuberoMartín
10Morante del Pino, María de los MilagrosMaría de los MilagrosMorantedel Pino
11Jones, MarkMarkJones
12Betsy Bakker-NortBetsyBakker-Nort
13Julie-Ann Smith-Jones-LongJulie-AnnSmith-Jones-Long
14Smith-Jones-Long, Julie-AnnJulie-AnnSmith-Jones-Long
Names
Cell Formulas
RangeFormula
D11:E14,D9:F10,D8,D7:E7,D4:F6D4=LET(t,TRIM(C4),nm,IF(ISNUMBER(FIND(",",t)),MID(t&" "&t,FIND(",",t)+2,LEN(t)-1),t),arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(nm," y "," y#")," Y "," Y#")," i "," i#")," I "," I#")," dels "," dels#")," Dels "," Dels#")," de les "," de#les")," De les "," De#les")," els "," els#")," Els "," Els#")," les ","les#")," Les "," Les#")," De La "," De#La#")," De Las "," De#Las#")," De Los "," De#Los#")," Del "," Del#")," De "," De#")," El "," El#")," La "," La#")," Los "," Los#")," Las "," Las#")," de la "," de#la#")," de las "," de#las#")," de los "," de#los#")," de "," de#")," del "," del#")," el "," el#")," la "," la#")," los "," los#")," las "," las#")," "),"#"," "),last,TAKE(arr,,-2),first,TEXTJOIN(" ",1,DROP(arr,,-2)),IF(nm="","",IF(LEN(nm)-LEN(SUBSTITUTE(nm," ",""))+1=2,HSTACK(last),HSTACK(first,last))))
Dynamic array formulas.
You're absolutely right. I was thinking about this just yesterday and I didn't realize I had used hyphens in the substitute function. Again, thank you for everything! :)
 
Upvote 0
You're welcome. Thanks for the follow-up. :)
(Should the solution post be changed?)
 
Upvote 0

Forum statistics

Threads
1,223,230
Messages
6,170,883
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top