Identify duplicate titles

dininaus

New Member
Joined
Jul 27, 2016
Messages
22
Hi All,

Could someone please assist with the below query as I couldn't figure out the way to do it.

I have two columns(A&B) with document numbers (same numbers exist in column A due to multiple versions- draft, version 1, final etc) and Titles. Some document numbers have same titles, which has to be identified (in a separate column) and also corrected by adding _a,_b and so on.. to the end of the title.

[TABLE="width: 627"]
<tbody>[TR]
[TD]Document number[/TD]
[TD]Existing Titles[/TD]
[TD]Correct Title[/TD]
[/TR]
[TR]
[TD]N-1000-05-TS-1001[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[/TR]
[TR]
[TD]N-1000-05-TS-1001[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[/TR]
[TR]
[TD]N-1000-05-TS-1001[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[/TR]
[TR]
[TD]N-1000-10-DH-003[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY_A[/TD]
[/TR]
[TR]
[TD]N-1000-10-DH-003[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY_A[/TD]
[/TR]
[TR]
[TD]N-5000-40-TP-0024_20[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY_B[/TD]
[/TR]
[TR]
[TD]N-5000-40-TP-0024_20[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY_B[/TD]
[/TR]
[TR]
[TD]N-5000-40-TP-0024_20[/TD]
[TD]TC FOR STANDARD SURVEY[/TD]
[TD]TC FOR STANDARD SURVEY_B[/TD]
[/TR]
</tbody>[/TABLE]

Thanks in advance!
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
This only works if the same document numbers are adjacent to each other

Excel 2016 (Windows) 32 bit
[Table="width:, class:head"][tr=bgcolor:#E0E0F0][th] [/th][th]
A
[/th][th]
B
[/th][th]
C
[/th][/tr]
[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
1
[/td][td]Document Number[/td][td]Title[/td][td] Formula in B2 copied down[/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
2
[/td][td]N-1000-05-TS-1001[/td][td]TC FOR STANDARDS SURVEY[/td][td] =SUBSTITUTE("TC FOR STANDARDS SURVEY"&IFERROR("_"&CHAR(SUMPRODUCT(1/COUNTIF(A$2:A2,A$2:A2&""))+63),""),"_@","")[/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
3
[/td][td]N-1000-05-TS-1001[/td][td]TC FOR STANDARDS SURVEY[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
4
[/td][td]N-1000-05-TS-1001[/td][td]TC FOR STANDARDS SURVEY[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
5
[/td][td]N-1000-10-DH-003[/td][td]TC FOR STANDARDS SURVEY_A[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
6
[/td][td]N-1000-10-DH-003[/td][td]TC FOR STANDARDS SURVEY_A[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
7
[/td][td]N-5000-40-TP-0024_20[/td][td]TC FOR STANDARDS SURVEY_B[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
8
[/td][td]N-5000-40-TP-0024_20[/td][td]TC FOR STANDARDS SURVEY_B[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
9
[/td][td]N-5000-40-TP-0024_20[/td][td]TC FOR STANDARDS SURVEY_B[/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
10
[/td][td][/td][td][/td][td][/td][/tr]

[tr=bgcolor:#FFFFFF][td=bgcolor:#E0E0F0]
11
[/td][td][/td][td][/td][td][/td][/tr]
[/table]
[Table="width:, class:grid"][tr][td]Sheet: Sheet10[/td][/tr][/table]
 
Last edited:
Upvote 0
another way to write the formula in B2

="TC FOR STANDARDS SURVEY"&SUBSTITUTE(IFERROR("_"&CHAR(SUMPRODUCT(1/COUNTIF(A$2:A2,A$2:A2&""))+63),""),"_@","")


Logic of both formulas


count unique values in column A from A2 to current row
SUMPRODUCT(1/COUNTIF(A$2:A2,A$2:A2&"")

add 63 to that count

convert that value to alpha (note - this formula only works if there are fewer than 27 unique document numbers in column A)
CHAR(SUMPRODUCT(1/COUNTIF(A$2:A2,A$2:A2&""))+63)

CHAR(64) is @
CHAR(65) - CHAR(90) are alpha characters A - Z

Remove _@ to leave the first document number with title exluding alpha suffix
SUBSTITUTE(
IFERROR("_"&CHAR(SUMPRODUCT(1/COUNTIF(A$2:A2,A$2:A2&""))+63),""),"_@","")
 
Last edited:
Upvote 0
Thanks Yongle, I think I didn't explain the requirement well. I have around 600000+ document numbers with different titles. Multiple document numbers exist in a column because of multiple versions of same document. Title's have to be unique for document numbers, if any title duplicate exists in column B we decided to add _a _b_c and so on to make the title unique for that number. Thank you. Please see the result after trying your formula
YRSCG9z


https://ibb.co/YRSCG9z
 
Last edited:
Upvote 0
In that case, your solution is likely to require array formula (not one of my strengths :-()
 
Upvote 0

Forum statistics

Threads
1,224,822
Messages
6,181,165
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top