Summary non numeric data

GeolAssistant

New Member
Joined
May 29, 2020
Messages
2
Office Version
  1. 2013
Platform
  1. Windows
Hi Community,

I´m working on a summary sheet (by date and within a time span (between 2 dates)) and all fine with numeric values, but i can´t figure out how to aproach the columns with cells/columns containing text. For some columns i have to get all disctinct values in one cell for a given day, on others get the first and last value.

The source data can have tens of thousands rows and several entries for a single day, adicionally the text values can repeat within a day.

I could loop through the range evaluating each row to see if it is the correct date and get the value, avaluate it again to see if i already have it for that specific date, and then add it to an array, but whit the amount of data and all the times i would have to run it (1 time for each day within the time span) it could take a while, i think there´s have to be a better way.

What is the fastest/efficient way to acommplish that?

This is the source sheet:

FDespacho.png


This is what i have to accomplish:
Summary.png


The summary has one entry for day, i have to get rows B, C, D and E, and that came from sorce sheet in columns S, Q and A respectively.

Thanks in advance
 

Excel Facts

What is the fastest way to copy a formula?
If A2:A50000 contain data. Enter a formula in B2. Select B2. Double-click the Fill Handle and Excel will shoot the formula down to B50000.
Welcome to the Board! I wanted to confirm that the Excel version shown in your profile is correct...Excel 2013? I ask because that will eliminate a convenient option for combining multiple matches into one cell (I'm thinking of the TEXTJOIN function). Instead, you'll either be looking at finding IDs that match your date criteria and placing each in some helper columns that can be hidden somewhere off to the right, and then performing a concatenation of those cells, or using a VBA solution involving a User-Defined Function. Is either one of those approaches not acceptable?

Some questions about your data:
1) What is the largest number of IDs that would ever occupy one cell in the summary table...for example, for May 9, 2020, you show two BT... batch entries, and for May 10th, you show two lot ID entries. Would there ever be more than 5, 7, 10?
2) Are the sample IDs always in sequential order and do all of them begin with "GPO"?
 
Upvote 0
This is what I would do:

1. copy column Q along with the date column to a new sheet.
2. select both columns and remove duplicates on Q. This should remove 99.99% of the rows.

Now, it should be quite easy to find what batch numbers are in a day.

Do the same for column S.

For column A, copy it and the date column to a new sheet, make the two columns a table and filter by date. Now, it should be easy to find the first and last sample ID in a day.

All these can be done using VBA (though I haven't tried them).
 
Upvote 0
Welcome to the Board! I wanted to confirm that the Excel version shown in your profile is correct...Excel 2013? I ask because that will eliminate a convenient option for combining multiple matches into one cell (I'm thinking of the TEXTJOIN function). Instead, you'll either be looking at finding IDs that match your date criteria and placing each in some helper columns that can be hidden somewhere off to the right, and then performing a concatenation of those cells, or using a VBA solution involving a User-Defined Function. Is either one of those approaches not acceptable?

Some questions about your data:
1) What is the largest number of IDs that would ever occupy one cell in the summary table...for example, for May 9, 2020, you show two BT... batch entries, and for May 10th, you show two lot ID entries. Would there ever be more than 5, 7, 10?
2) Are the sample IDs always in sequential order and do all of them begin with "GPO"?
KRice thank´s for your welcome and reply,

Yes, the version we have at work is 2013 at the moment. I´m already working on VBA to summarize the numerical data like count of entries for a given day because the objetive of this work is to prevent user errors in this reports. So VBA Sub or Function is what i want to do with this kind of data (i won´t be doing the summary myself).

1) It won't be too many, in the historical data i've seen 3 or 4, but there are no restriction about it, in a exceptional case it could be more (if for any reason we couldn't dispatch the samples for a few days).
2) It should be always sequential, but then again its not a restriction i guess, and yes they always begin with "GPO".

This is what I would do:

1. copy column Q along with the date column to a new sheet.
2. select both columns and remove duplicates on Q. This should remove 99.99% of the rows.

Now, it should be quite easy to find what batch numbers are in a day.

Do the same for column S.

For column A, copy it and the date column to a new sheet, make the two columns a table and filter by date. Now, it should be easy to find the first and last sample ID in a day.

All these can be done using VBA (though I haven't tried them).

yky thanks for your reply,

I´m tying to manage all the data inside the code, because i won´t be doing the reports myself, and in addition some reports could include a couple years, so i think what you mention might not work for me.

So i guess that what i'm looking for is a way to do it with out looping in all the records with a For Next, is there a built in function for this, or what would be the optimal way to do it, so i don´t hang the computer for 10 minutes (exaggeraton). haha.

i'll do it with a For loop and see how much time it takes, i´ll keep you posted.

THANKS Guys!
 
Upvote 0
For text columns you could create a frequency statistic: which text is occurring how often?
You can count how often each character is used in all text fields of a column, too.
This way you can spot text outliers or special (unwanted) characters easily.
If you have row identifiers for a file, you can also apply change statistics over time (from one period to the next one, for example). You sort the two files by the identifier, and then you can spot which rows were added or deleted or changed (from which text to which different one).
You could finally define limit files which specify legal text values for each column. Now you could warn on illegal values.
I have done this some years ago (just one possible approach):
 
Upvote 0
@GeolAssistant, you might want to give this a try. This relies on a User-Defined Function...install this in a Module in your workbook. This provides some of the same functionality of the TEXTJOIN function that is not available in your version of Excel.

VBA Code:
Public Function TJoin(Sep As String, ParamArray TxtRng() As Variant) As String
On Error Resume Next
'Sep is the separator, set to "" if you don't want any separator. Separator must be string or single cell, not cell range
'TxtRng is the content you want to join. TxtRng can be string, single cell, cell range or array returned from an array function. Empty content will be ignored
Dim OutStr As String 'the output string
Dim i, j, k, l As Integer 'counters
Dim FinArr(), element As Variant 'the final array and a temporary element when transfering between the two arrays

'Go through each item of TxtRng(),  depending on the item type, transform and put it into FinArray()
i = 0 'the counter for TxtRng
j = 0 'the counter for FinArr
k = 0: l = 0 'the counters for the case of array from Excel array formula
Do While i < UBound(TxtRng) + 1
    If TypeName(TxtRng(i)) = "String" Then 'specified string like "t"
        ReDim Preserve FinArr(0 To j)
        FinArr(j) = "blah"
        FinArr(j) = TxtRng(i)
        j = j + 1
    ElseIf TypeName(TxtRng(i)) = "Range" Then 'single cell or range of cell like A1, A1:A2
        For Each element In TxtRng(i)
            ReDim Preserve FinArr(0 To j)
            FinArr(j) = element
            j = j + 1
        Next
    ElseIf TypeName(TxtRng(i)) = "Variant()" Then 'array returned from an Excel array formula
         For k = LBound(TxtRng(0), 1) To UBound(TxtRng(0), 1)
            For l = LBound(TxtRng(0), 2) To UBound(TxtRng(0), 2)
                ReDim Preserve FinArr(0 To j)
                FinArr(j) = TxtRng(0)(k, l)
                j = j + 1
            Next
         Next
    Else
        TJoin = CVErr(xlErrValue)
        Exit Function
    End If
i = i + 1
Loop

'Put each element of the new array into the join string
For i = LBound(FinArr) To UBound(FinArr)
    If FinArr(i) <> "" Then 'Remove this line if you want to include empty strings
    OutStr = OutStr & FinArr(i) & Sep
    End If
Next
 TJoin = Left(OutStr, Len(OutStr) - Len(Sep)) 'remove the ending separator

End Function

Putting this function to use then, I've inserted a formula inside the TJOIN function that will return an array of unique values matching the date criterion...and TJOIN then concatenates those values. I'm also assuming that the lowest and highest row numbers that match the date criterion will hold the GPO code numbers of interest in the From/To columns D & E.
Book1
ABCDEFGHIJKLMNOPQRS
1
2GPO1296525/9/2020BT-025883436
3GPO1296535/9/2020BT-025883436
4GPO1296545/9/2020BT-025883436
5GPO1296555/9/2020BT-025893436
6GPO1296565/9/2020BT-025893436
7GPO1296575/9/2020BT-025893436
8GPO1296585/9/2020BT-025893436
9GPO1296595/9/2020BT-025893436
10GPO1296605/10/2020BT-025903437
11GPO1296615/10/2020BT-025903437
12GPO1296625/10/2020BT-025903437
13GPO1296635/10/2020BT-025903438
14GPO1296645/10/2020BT-025903439
15GPO1296655/11/2020BT-025913439
16GPO1296665/11/2020BT-025923439
17GPO1296675/12/2020BT-025933440
18
19BCDE
20Identificación de las Muestras
21DateLote/ Envio IDBatch IDFromToCantidad de Muestras
225/9/20203436BT-02588, BT-02589GPO129652GPO1296598
235/10/20203437, 3438, 3439BT-02590GPO129660GPO1296645
245/11/20203439BT-02591, BT-02592GPO129665GPO1296662
255/12/20203440BT-02593GPO129667GPO1296671
Sheet3
Cell Formulas
RangeFormula
B22:B25B22=tjoin(", ",IF(FREQUENCY(IF($M$2:$M$17=$A22,MATCH(IF($M$2:$M$17=$A22,$S$2:$S$17),IF($M$2:$M$17=$A22,$S$2:$S$17),0)),ROW($M$2:$M$17)-ROW($M$1))>0,$S$2:$S$17,""))
C22:C25C22=tjoin(", ",IF(FREQUENCY(IF($M$2:$M$17=$A22,MATCH(IF($M$2:$M$17=$A22,$Q$2:$Q$17),IF($M$2:$M$17=$A22,$Q$2:$Q$17),0)),ROW($M$2:$M$17)-ROW($M$1))>0,$Q$2:$Q$17,""))
D22:D25D22=IFERROR(INDEX($A$2:$A$17,AGGREGATE(15,6,(ROW($M$2:$M$17)-ROW($M$1))/($M$2:$M$17=$A22),1)),"")
E22:E25E22=IFERROR(INDEX($A$2:$A$17,AGGREGATE(14,6,(ROW($M$2:$M$17)-ROW($M$1))/($M$2:$M$17=$A22),1)),"")
F22:F25F22=IFERROR(RIGHT(E22,LEN(E22)-3)-RIGHT(D22,LEN(D22)-3),"")+1
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.
 
Upvote 0
yky thanks for your reply,

I´m tying to manage all the data inside the code, because i won´t be doing the reports myself, and in addition some reports could include a couple years, so i think what you mention might not work for me.
I don't see why that would exclude my approach. As I said, my approach can be done using VBA. You can programmatically add a sheet (or just programmatically copy column Q and the date column to an unused area, say, column ZW and ZX, in the source sheet), programmatically copy column Q and the date column to the new sheet, programmatically remove duplicates, leaving unique batch IDs. You then use a FOR loop to extract, by date, the batch IDs. Now, once the duplicates are removed, the FOR loop would run thousands of times faster because the number of rows is now thousands of times less.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,225,122
Messages
6,182,971
Members
453,142
Latest member
Konstako

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top