Efficient formula to extract unique values from large list (> 10k records)

rubthebuddha

New Member
Joined
Nov 18, 2009
Messages
32
Does anyone have a formula for extracting unique values from a single column of a large (>10k records) list that doesn't bog down Excel?

Formulas are desired, as replacing duplicates and using macros/custom functions are beyond most of my end users.

The formula I usually use is similar to this:

=IFERROR(INDEX($B$2:$B$100,MATCH(0,COUNTIF($A$1:$A1,$B$2:$B$100),0)),"")

in which the data set is in B and the values are returned in A. It works for a few hundred rows just fine, but when I get in the thousands, Excel gets bogged down and becomes almost non-functional.
 

Excel Facts

What is the last column in Excel?
Excel columns run from A to Z, AA to AZ, AAA to XFD. The last column is XFD.
I realize it is not always an option, but a pivot table is one of the best ways to get a list of unique values
 
Upvote 0
Hi.

Certainly commencing the construction with IFERROR is not going to help matters.

To how many rows do you usually copy this formula down so as to ensure that you capture all possible returns? What governs your choice of number here? Do you, for example, choose a suitably large upper value which you are pretty sure/certain will always be sufficient? Could this sometimes result in many rows with formulas in which are returning blanks?

See here for more on the danger of using IFERROR in this way:

microsoft excel - Look up a value in a list and return ALL multiple corresponding values - Super User

Secondly, it's possible that the alternative with FREQUENCY/MATCH is more efficient than the set-up with COUNTIF that you are using, though I would think that the issue may be more related to the point I just made re IFERROR.

Regards
 
Upvote 0
in an adjacent column, a formula like

=IF(COUNTIF($A$1:A1,A1)=1,1,0)

then use it to filter for the 1's,


or by "unique", do you mean the values that only occur once?
 
Upvote 0
Does anyone have a formula for extracting unique values from a single column of a large (>10k records) list that doesn't bog down Excel?

Formulas are desired, as replacing duplicates and using macros/custom functions are beyond most of my end users.

The formula I usually use is similar to this:

=IFERROR(INDEX($B$2:$B$100,MATCH(0,COUNTIF($A$1:$A1,$B$2:$B$100),0)),"")

in which the data set is in B and the values are returned in A. It works for a few hundred rows just fine, but when I get in the thousands, Excel gets bogged down and becomes almost non-functional.

hello!
This is super fast formula
try;

Book1
BCD
1IDcount uniqueunique list
2anna5anna
3joejoe
4frankfrank
5joejulie
6annageorge
7joe
8frank
9julie
10george
11anna
12anna
13joe
14frank
15joe
16anna
17joe
18frank
6
Cell Formulas
RangeFormula
D2=IF(ROWS($D$2:D2)>$C$2,"",INDEX(rangedup,SMALL(IF(FREQUENCY(IF(rangedup<>"",MATCH(rangedup,rangedup,0)),ROW(rangedup)-ROW(B2)+1),ROW(rangedup)-ROW(B2)+1),ROWS($J$2:J2))))
C2{=SUM(IF(FREQUENCY(IF(rangedup<>"",MATCH(rangedup,rangedup,0)),ROW(rangedup)-ROW(B2)+1),1))}
Press CTRL+SHIFT+ENTER to enter array formulas.
Named Ranges
NameRefers ToCells
range='1'!$A$2:$A$22
rangedup='6'!$B$2:$B$18
 
Upvote 0
Thank You guys For Yours I LIKE
I 'm glad you like my formula but still i need to learn a lot expecially from my teacher XOR LX since I joined to its website I'm getting magic Thank You XOR LX
 
Upvote 0
Thank You guys For Yours I LIKE
I 'm glad you like my formula but still i need to learn a lot expecially from my teacher XOR LX since I joined to its website I'm getting magic Thank You XOR LX

You're welcome! :)

Just one thing, though. Are you sure that the B2 part in ROW(B2) shouldn't be B$2 (or $B$2)?

Regards

[TABLE="width: 943"]
<tbody>[TR]
[TD]Worksheet Formulas[TABLE="width: 932"]
<tbody>[TR="bgcolor: #DAE7F5"]
[TH]Cell[/TH]
[TH]Formula[/TH]
[/TR]
[TR]
[TH="bgcolor: #DAE7F5"]D2[/TH]
[TD]=IF(ROWS($D$2:D2)>$C$2,"",INDEX(rangedup,SMALL(IF(FREQUENCY(IF(rangedup<>"",MATCH(rangedup,rangedup,0)),ROW(rangedup)-ROW(B2)+1),ROW(rangedup)-ROW(B2)+1),ROWS($J$2:J2))))[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

[TABLE="width: 761"]
<tbody>[TR]
[TD]Array Formulas[TABLE="width: 750"]
<tbody>[TR="bgcolor: #DAE7F5"]
[TH]Cell[/TH]
[TH]Formula[/TH]
[/TR]
[TR]
[TH="bgcolor: #DAE7F5"]C2[/TH]
[TD]{=SUM(IF(FREQUENCY(IF(rangedup<>"",MATCH(rangedup,rangedup,0)),ROW(rangedup)-ROW(B2)+1),1))}[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
@ XOR LX Thank you for pointing out it, sometimes I' m concentrated with the hard part of the formula so I forget about cell references which are also very important tank you
Regards
 
Upvote 0
This is perhaps another reason why, despite its extra function call, I tend to favour:

ROW(rangedup)-MIN(ROW(rangedup))+1

over:

ROW(rangedup)-ROW(B2)+1

especially when working with Named Ranges.

The only reference in the first is the range rangedup, whereas the second explicitly involves a second reference (B2). Of course, you could also adapt that one to:

ROW(rangedup)-ROW(INDEX(rangedup,1))+1

if you wanted, though I personally tend to find that version a touch less intuitive than the MIN construction.

Regards
 
Upvote 0
hello!
This is super fast formula
try;
BCD
annaanna
joejoe
frankfrank
joejulie
annageorge
joe
frank
julie
george
anna
anna
joe
frank
joe
anna
joe
frank

<colgroup><col style="width: 25pxpx"><col><col><col></colgroup><thead>
</thead><tbody>
[TD="align: center"]1[/TD]
[TD="bgcolor: #FF0000"]ID[/TD]
[TD="bgcolor: #FFFF00"]count unique[/TD]
[TD="bgcolor: #FFFF00"]unique list[/TD]

[TD="align: center"]2[/TD]

[TD="align: right"]5[/TD]

[TD="align: center"]3[/TD]

[TD="align: right"][/TD]

[TD="align: center"]4[/TD]

[TD="align: right"][/TD]

[TD="align: center"]5[/TD]

[TD="align: right"][/TD]

[TD="align: center"]6[/TD]

[TD="align: right"][/TD]

[TD="align: center"]7[/TD]

[TD="align: right"][/TD]

[TD="align: center"]8[/TD]

[TD="align: right"][/TD]

[TD="align: center"]9[/TD]

[TD="align: right"][/TD]

[TD="align: center"]10[/TD]

[TD="align: right"][/TD]

[TD="align: center"]11[/TD]

[TD="align: right"][/TD]

[TD="align: center"]12[/TD]

[TD="align: right"][/TD]

[TD="align: center"]13[/TD]

[TD="align: right"][/TD]

[TD="align: center"]14[/TD]

[TD="align: right"][/TD]

[TD="align: center"]15[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]16[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]17[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]18[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

</tbody>
6

[TABLE="width: 85%"]
<tbody>[TR]
[TD]Worksheet Formulas[TABLE="width: 100%"]
<thead>[TR="bgcolor: #DAE7F5"]
[TH="width: 10px"]Cell[/TH]
[TH="align: left"]Formula[/TH]
[/TR]
</thead><tbody>[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]D2[/TH]
[TD="align: left"]=IF(ROWS($D$2:D2)>$C$2,"",INDEX(rangedup,SMALL(IF(FREQUENCY(IF(rangedup<>"",MATCH(rangedup,rangedup,0)),ROW(rangedup)-ROW(B2)+1),ROW(rangedup)-ROW(B2)+1),ROWS($J$2:J2))))[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

[TABLE="width: 85%"]
<tbody>[TR]
[TD]Array Formulas[TABLE="width: 100%"]
<thead>[TR="bgcolor: #DAE7F5"]
[TH="width: 10px"]Cell[/TH]
[TH="align: left"]Formula[/TH]
[/TR]
</thead><tbody>[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]C2[/TH]
[TD="align: left"]{=SUM(IF(FREQUENCY(IF(rangedup<>"",MATCH(rangedup,rangedup,0)),ROW(rangedup)-ROW(B2)+1),1))}[/TD]
[/TR]
</tbody>[/TABLE]
Entered with Ctrl+Shift+Enter. If entered correctly, Excel will surround with curly braces {}.
Note: Do not try and enter the {} manually yourself[/TD]
[/TR]
</tbody>[/TABLE]

[TABLE="width: 85%"]
<tbody>[TR]
[TD]Workbook Defined Names[TABLE="width: 100%"]
<thead>[TR="bgcolor: #DAE7F5"]
[TH="width: 10px"]Name[/TH]
[TH="align: left"]Refers To[/TH]
[/TR]
</thead><tbody>[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]range[/TH]
[TD="align: left"]='1'!$A$2:$A$22[/TD]
[/TR]
[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]rangedup[/TH]
[TD="align: left"]='6'!$B$2:$B$18[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

Hello All,

Sorry to resurrect an old thread, but I am desperate!

This is my first post, but I was prompted to register as an efficient formula to get unique items from a list would be a real game changer for me. Like the OP, I have a large dataset and methods I have used previously are only usable up to a certain point.

I have tried to impliment GerryZ's solution (with the ROW(B$2) modification suggested by XOR LX) however I still can't get it to work.

A few things I would like to clarify:

1) What does the '1'! and '6'! at the start of the named ranges refer to?
2) where is the named range 'range' used? I can't find it being used in any of the formulae? And moreover why does it refer to column A when that isn't populated in the table shown
3) What is the reference to $J$2:J2 ? Is this simply to reference something blank?

I made a guess to try to get this to work (although I may be stabbing in the dark): I removed the
'1'! and '6'! prefixes before the named ranges.

At the moment my sheet does show the correct number of unique records in C2, and the first unique item is shown in D2, but all the subsequent cells in column D show #NUM!

PLEASE PLEASE HELP IF YOU CAN!

Many thanks in advance,

Russell

 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,871
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top