Hi, everybody!
I got a solution for a weird problem I had when creating a tool that would split compound names in Spanish. Please, read this (Problems when splitting strings of text with compound names.) to fully grasp the gist of the problem.
I got this formula:
... which is a bit bulky, but it is basically able to separate the first and last names (we have two last names in Spanish), while also factoring in prepositions and articles (which also happen in Spanish). This way, if C4 is "Javier de la Puebla Gómez", it would be split as "Javier", "de la Puebla", "Gómez" — "Juan Antonio Yubero Martín" would return "Juan Antonio", "Yubero", "Martín" — "Maria de los Milagros Morante del Pino" would return "María de los Milagros", "Morante", "del Pino", etc. And, in the case I got a person with only one last name, that would also split it efficiently as "Mark", "Jones". This all works perfectly.
My problem now is I didn't realize that it is also customary to write names as lastname1 lastname2, firstname (for example, "Yubero Martín, Juan Antonio" and "Morante del Pino, María de los Milagros") in Spanish. I've tried tweaking the formula here and there, but I only get partial results that work only when I adapt the formula for each example individually. This might be because I am not yet entirely familiar with TEXTJOIN, DROP and HSTACK.
I need a formula that is able to detect if the name has a comma (maybe ISNUMBER(FIND(", ",C4)), I guess), and if that is the case, I would like for it to split the name as in the formula above (that is, if our name is "Ginés de la Merced, María Lucía", I need it to split the name into three columns as "Maria Lucía", "Ginés", "de la Merced" — however, if our name is presented as José María de Oliveira Martínez, it should split it into "José María", "de Oliveira", "Martínez"). Likewise, if the person has only one last name (Jones, Mark), it should be split into "Mark", "Jones". I must apologize for the long explanation. Please, ask me any questions you may have.
(P. D.: The formula is especially clunky because it has to check for every possible preposition and article combination so it splits each name and last name properly. If there is a way to make the text in the nested substitute sequence case insensitive, I would also really appreciate it. My formula looks a bit too long.)
I got a solution for a weird problem I had when creating a tool that would split compound names in Spanish. Please, read this (Problems when splitting strings of text with compound names.) to fully grasp the gist of the problem.
I got this formula:
Excel Formula:
=IF(ISBLANK(C4)," ",IF(LEN(TRIM(C4))-LEN(SUBSTITUTE(C4," ",""))+1=2,LET(arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C4," y "," y-")," Y "," Y-")," i "," i-")," I "," I-")," dels "," dels-")," Dels "," Dels-")," de les "," de-les")," De les "," De-les")," els "," els-")," Els "," Els-")," les ","les-")," Les "," Les-")," De La "," De-La-")," De Las "," De-Las-")," De Los "," De-Los-")," Del "," Del-")," De "," De-")," El "," El-")," La "," La-")," Los "," Los-")," Las "," Las-")," de la "," de-la-")," de las "," de-las-")," de los "," de-los-")," de "," de-")," del "," del-")," el "," el-")," la "," la-")," los "," los-")," las "," las-")," "),"-"," "),last,TAKE(arr,,-2),HSTACK(last)),LET(arr,SUBSTITUTE(TEXTSPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C4," y "," y-")," Y "," Y-")," i "," i-")," I "," I-")," dels "," dels-")," Dels "," Dels-")," de les "," de-les")," De les "," De-les")," els "," els-")," Els "," Els-")," les ","les-")," Les "," Les-")," De La "," De-La-")," De Las "," De-Las-")," De Los "," De-Los-")," Del "," Del-")," De "," De-")," El "," El-")," La "," La-")," Los "," Los-")," Las "," Las-")," de la "," de-la-")," de las "," de-las-")," de los "," de-los-")," de "," de-")," del "," del-")," el "," el-")," la "," la-")," los "," los-")," las "," las-")," "),"-"," "),first,TEXTJOIN(" ",1,DROP(arr,,-2)),last,TAKE(arr,,-2),HSTACK(first,last))))
... which is a bit bulky, but it is basically able to separate the first and last names (we have two last names in Spanish), while also factoring in prepositions and articles (which also happen in Spanish). This way, if C4 is "Javier de la Puebla Gómez", it would be split as "Javier", "de la Puebla", "Gómez" — "Juan Antonio Yubero Martín" would return "Juan Antonio", "Yubero", "Martín" — "Maria de los Milagros Morante del Pino" would return "María de los Milagros", "Morante", "del Pino", etc. And, in the case I got a person with only one last name, that would also split it efficiently as "Mark", "Jones". This all works perfectly.
My problem now is I didn't realize that it is also customary to write names as lastname1 lastname2, firstname (for example, "Yubero Martín, Juan Antonio" and "Morante del Pino, María de los Milagros") in Spanish. I've tried tweaking the formula here and there, but I only get partial results that work only when I adapt the formula for each example individually. This might be because I am not yet entirely familiar with TEXTJOIN, DROP and HSTACK.
I need a formula that is able to detect if the name has a comma (maybe ISNUMBER(FIND(", ",C4)), I guess), and if that is the case, I would like for it to split the name as in the formula above (that is, if our name is "Ginés de la Merced, María Lucía", I need it to split the name into three columns as "Maria Lucía", "Ginés", "de la Merced" — however, if our name is presented as José María de Oliveira Martínez, it should split it into "José María", "de Oliveira", "Martínez"). Likewise, if the person has only one last name (Jones, Mark), it should be split into "Mark", "Jones". I must apologize for the long explanation. Please, ask me any questions you may have.
(P. D.: The formula is especially clunky because it has to check for every possible preposition and article combination so it splits each name and last name properly. If there is a way to make the text in the nested substitute sequence case insensitive, I would also really appreciate it. My formula looks a bit too long.)