Hello everyone,
Big fan of the Mr. Excel forum, and first time posting. Thanks in advance for any advice!
A small nonprofit that I am helping gave me a dataset that tracks their donations and I am trying to clean it up for them so we can run analytics. However, the dates for each row (1 row per donation) are entered in a variety of numeric and text formats. Below is a sample of the data I have and how I will need it at the end. I prefer formulas, as I don't know VBA, but am willing to follow VBA instructions if provided. I am using Excel from the Microsoft 365 Pro Plus suite.
Also please note, that I have been discussing this over on the ExcelForum.com with partial success (link below). I will copy over formulas that got partway to the end goal outputs.
How to clean up and format very messy date data (mix of text and date)
Here is a sample of my data, where for our reference the first row are the Excel column letters and first column are the Excel rows.
Column
A) a unique identifier number (not shown in table below) so I can merge my changes back to my master dataset.
B) Date: my original date data
C) IsText: If the date is text (true) or numeric (false)
D) ShouldBe: The date in A, as an Excel date formatted here as DD-MM-YY for readability.
E) ShouldBe2: Any non-date text in A is extracted into this column
F) FixDate: Best attempt so far to convert the date, written in unformatted Excel date
G) FixDate2: FixDate reformatted to compare result
H) FixNonDate: Any non-date extraction attempts thus far into a new column.
I) Comments
For dates that are not complete (e.g. missing MM or DD), I don't mind replacing the missing values with 01.
Some of the original dates have date and non-date information. I need these in extracted into separate columns, as shows below. In a few cases, the solution formulas so far extract only part of the non-date text. If we cannot fully fix that formula, that is ok; I can manually find and fix these few instances.
[TABLE="class: grid, width: 1100"]
<tbody>[TR]
[TD][/TD]
[TD]B[/TD]
[TD]C[/TD]
[TD]D[/TD]
[TD]E[/TD]
[TD]F[/TD]
[TD]G[/TD]
[TD]H[/TD]
[TD]I[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Date[/TD]
[TD]IsText[/TD]
[TD]ShouldBe[/TD]
[TD]NonDate[/TD]
[TD]FixDate[/TD]
[TD]FixDate2[/TD]
[TD]FixNonDate[/TD]
[TD]Comments[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]04-10-16[/TD]
[TD]False[/TD]
[TD]10-Apr-16[/TD]
[TD][/TD]
[TD]38994[/TD]
[TD]4-Oct-16[/TD]
[TD][/TD]
[TD]FixDate2 MM & DD flipped.[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]09-16-2004[/TD]
[TD]True[/TD]
[TD]16-Sep-04[/TD]
[TD][/TD]
[TD]42614[/TD]
[TD]1-Sep-16[/TD]
[TD][/TD]
[TD]FixDate2 DD as year, YY dropped, and 1 as new DD.[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Nov. ‘02[/TD]
[TD]True[/TD]
[TD]01-Nov-02[/TD]
[TD][/TD]
[TD]37561[/TD]
[TD]1-Nov-02[/TD]
[TD][/TD]
[TD]This works.[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]Jan/Feb 2011[/TD]
[TD]True[/TD]
[TD]01-Feb-11[/TD]
[TD][/TD]
[TD]40575[/TD]
[TD]1-Feb-11[/TD]
[TD][/TD]
[TD]This works.[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD]8/12/04 (scholarship)[/TD]
[TD]True[/TD]
[TD]12-Aug-04[/TD]
[TD](scholarship)[/TD]
[TD]38329[/TD]
[TD]8-Dec-04[/TD]
[TD](scholarship)[/TD]
[TD]FixDate2 MM & DD flipped.
FixNonDate column ok.[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Paid[/TD]
[TD]True[/TD]
[TD][/TD]
[TD]Paid[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Col E to G are empty ("").
Previous solutions had these as #N/A
Paid should appear in FixNonDate with FixDate empty ("").[/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]03/27/06[/TD]
[TD]False[/TD]
[TD]27-Mar-06[/TD]
[TD][/TD]
[TD]46447[/TD]
[TD]1-Mar-27[/TD]
[TD][/TD]
[TD]FixDate2 DD as year, YY dropped,
and 1 as new DD.[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]02/22/04[/TD]
[TD]True[/TD]
[TD]22-Feb-04[/TD]
[TD][/TD]
[TD]44593[/TD]
[TD]1-Feb-22[/TD]
[TD][/TD]
[TD]FixDate2 DD as year, YY dropped,
and 1 as new DD[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD]2005[/TD]
[TD]False[/TD]
[TD]01-Jan-05[/TD]
[TD][/TD]
[TD]38353[/TD]
[TD]1-Jan-05[/TD]
[TD][/TD]
[TD]This works.[/TD]
[/TR]
[TR]
[TD]11[/TD]
[TD]Deposited in
Bratislava in 2001[/TD]
[TD]True[/TD]
[TD]01-Jan-01[/TD]
[TD][/TD]
[TD]2001[/TD]
[TD]23-Jun-05[/TD]
[TD]in Bratislava in 2001[/TD]
[TD]Found YY in Date, but entered it as the unformatted Excel date.
Partial extraction of non-date text into FixNonDate.[/TD]
[/TR]
</tbody>[/TABLE]
Thanks to AlKey and FlameRetired over on the ExcelForum.com thread. FlameRetired's formulas got me the closest to what I need:
Formula to extract and fix the date
Formula to extract the non-date text
Thanks again for the time and consideration!
Big fan of the Mr. Excel forum, and first time posting. Thanks in advance for any advice!
A small nonprofit that I am helping gave me a dataset that tracks their donations and I am trying to clean it up for them so we can run analytics. However, the dates for each row (1 row per donation) are entered in a variety of numeric and text formats. Below is a sample of the data I have and how I will need it at the end. I prefer formulas, as I don't know VBA, but am willing to follow VBA instructions if provided. I am using Excel from the Microsoft 365 Pro Plus suite.
Also please note, that I have been discussing this over on the ExcelForum.com with partial success (link below). I will copy over formulas that got partway to the end goal outputs.
How to clean up and format very messy date data (mix of text and date)
Here is a sample of my data, where for our reference the first row are the Excel column letters and first column are the Excel rows.
Column
A) a unique identifier number (not shown in table below) so I can merge my changes back to my master dataset.
B) Date: my original date data
C) IsText: If the date is text (true) or numeric (false)
D) ShouldBe: The date in A, as an Excel date formatted here as DD-MM-YY for readability.
E) ShouldBe2: Any non-date text in A is extracted into this column
F) FixDate: Best attempt so far to convert the date, written in unformatted Excel date
G) FixDate2: FixDate reformatted to compare result
H) FixNonDate: Any non-date extraction attempts thus far into a new column.
I) Comments
For dates that are not complete (e.g. missing MM or DD), I don't mind replacing the missing values with 01.
Some of the original dates have date and non-date information. I need these in extracted into separate columns, as shows below. In a few cases, the solution formulas so far extract only part of the non-date text. If we cannot fully fix that formula, that is ok; I can manually find and fix these few instances.
[TABLE="class: grid, width: 1100"]
<tbody>[TR]
[TD][/TD]
[TD]B[/TD]
[TD]C[/TD]
[TD]D[/TD]
[TD]E[/TD]
[TD]F[/TD]
[TD]G[/TD]
[TD]H[/TD]
[TD]I[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Date[/TD]
[TD]IsText[/TD]
[TD]ShouldBe[/TD]
[TD]NonDate[/TD]
[TD]FixDate[/TD]
[TD]FixDate2[/TD]
[TD]FixNonDate[/TD]
[TD]Comments[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]04-10-16[/TD]
[TD]False[/TD]
[TD]10-Apr-16[/TD]
[TD][/TD]
[TD]38994[/TD]
[TD]4-Oct-16[/TD]
[TD][/TD]
[TD]FixDate2 MM & DD flipped.[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]09-16-2004[/TD]
[TD]True[/TD]
[TD]16-Sep-04[/TD]
[TD][/TD]
[TD]42614[/TD]
[TD]1-Sep-16[/TD]
[TD][/TD]
[TD]FixDate2 DD as year, YY dropped, and 1 as new DD.[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Nov. ‘02[/TD]
[TD]True[/TD]
[TD]01-Nov-02[/TD]
[TD][/TD]
[TD]37561[/TD]
[TD]1-Nov-02[/TD]
[TD][/TD]
[TD]This works.[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]Jan/Feb 2011[/TD]
[TD]True[/TD]
[TD]01-Feb-11[/TD]
[TD][/TD]
[TD]40575[/TD]
[TD]1-Feb-11[/TD]
[TD][/TD]
[TD]This works.[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD]8/12/04 (scholarship)[/TD]
[TD]True[/TD]
[TD]12-Aug-04[/TD]
[TD](scholarship)[/TD]
[TD]38329[/TD]
[TD]8-Dec-04[/TD]
[TD](scholarship)[/TD]
[TD]FixDate2 MM & DD flipped.
FixNonDate column ok.[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Paid[/TD]
[TD]True[/TD]
[TD][/TD]
[TD]Paid[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Col E to G are empty ("").
Previous solutions had these as #N/A
Paid should appear in FixNonDate with FixDate empty ("").[/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]03/27/06[/TD]
[TD]False[/TD]
[TD]27-Mar-06[/TD]
[TD][/TD]
[TD]46447[/TD]
[TD]1-Mar-27[/TD]
[TD][/TD]
[TD]FixDate2 DD as year, YY dropped,
and 1 as new DD.[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]02/22/04[/TD]
[TD]True[/TD]
[TD]22-Feb-04[/TD]
[TD][/TD]
[TD]44593[/TD]
[TD]1-Feb-22[/TD]
[TD][/TD]
[TD]FixDate2 DD as year, YY dropped,
and 1 as new DD[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD]2005[/TD]
[TD]False[/TD]
[TD]01-Jan-05[/TD]
[TD][/TD]
[TD]38353[/TD]
[TD]1-Jan-05[/TD]
[TD][/TD]
[TD]This works.[/TD]
[/TR]
[TR]
[TD]11[/TD]
[TD]Deposited in
Bratislava in 2001[/TD]
[TD]True[/TD]
[TD]01-Jan-01[/TD]
[TD][/TD]
[TD]2001[/TD]
[TD]23-Jun-05[/TD]
[TD]in Bratislava in 2001[/TD]
[TD]Found YY in Date, but entered it as the unformatted Excel date.
Partial extraction of non-date text into FixNonDate.[/TD]
[/TR]
</tbody>[/TABLE]
Thanks to AlKey and FlameRetired over on the ExcelForum.com thread. FlameRetired's formulas got me the closest to what I need:
Formula to extract and fix the date
Code:
=IFERROR(IF(ISNUMBER(B2)*(LEN(B2)=4),DATE(B2,1,1),IF(LEN(B2)>1,IFERROR(LOOKUP(1E+306,--LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,".",""),CHAR(145),""),"//","/"),"?",""),"(",""),")",""),"$",""),ROW($1:$30))),LOOKUP(1E+306,--RIGHT(B2,ROW($1:$30)))),"")),"")
Formula to extract the non-date text
Code:
=IFERROR(IF(LEN(MID(B2,FIND(" ",B2)+1,99))>4,MID(B2,FIND(" ",B2)+1,99),""),"")
Thanks again for the time and consideration!