Formula to remove duplicate co-ordinate pairs

eejwo · Nov 29, 2018

I have a list of x,y co-ordinate pairs (x1,y1 x2,y2) that describe the two end points of vertices for adjoining polygonal cells.

However within this list I have duplicates, as adjoining cells share vertices. Where the co-ordinate pairs are direct duplicates, obviously they are very easy to remove, but the x,y pairs can also be reversed and describe the same vertex. e.g. Vertex "x" for cell "x" maybe described in the list by x1,y1 x2,y2, whilst vertex "x" for cell "y" (i.e. the same vertex) might be in the list as x2,y2 x1,y1, therefore appearing as unique in the list but describing the same vertex.

I want to be left with a list of co-ordinate pairs that don't contain any duplicate vertices.

I can easily identify which pairs have duplicates by reversing the order of the x,y pairs and comparing to the original. But once identified, does someone have a method for removing and leaving unique values?

Thanks

AlanY · Nov 29, 2018

could you post a small sample of the data here?

eejwo · Nov 29, 2018

Hi,

Some example data to hopefully make the problem clearer:

[TABLE="width: 500"]
<tbody>[TR]
[TD]x1[/TD]
[TD]y1[/TD]
[TD]x2[/TD]
[TD]y2[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]2[/TD]
[TD]3[/TD]
[TD]4[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]4[/TD]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]6[/TD]
[TD]7[/TD]
[TD]8[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]6[/TD]
[TD]7[/TD]
[TD]8[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]3[/TD]
[TD]5[/TD]
[TD]7[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]4[/TD]
[TD]6[/TD]
[TD]8[/TD]
[/TR]
</tbody>[/TABLE]

I can easily remove row 4, which is a direct duplicate of row 3. However rows 1 & 2 also describe the same vertex, so would like to be able to remove the duplicate here also.

AlanY · Nov 29, 2018

this will pick up the matched co-ords to the other pair, adjust ranges to suit

Book1

A

B

C

D

E

F

1

x1

y1

x2

y2

2

1

2

3

4

TRUE

3

4

1

2

TRUE

4

5

6

7

8

5

6

7

8

6

1

3

5

7

2

4

6

8

Sheet5

Cell Formulas
Range	Formula
F2	{=IFERROR(AND(MATCH(A2&"/"&B2,C$2:C$100&"/"&D$2:D$100,0),MATCH(C2&"/"&D2,A$2:A$100&"/"&B$2:B$100,0)),"")}
Press CTRL+SHIFT+ENTER to enter array formulas.

eejwo · Nov 29, 2018

Thanks, this looks to correctly identify the matched pairs.

How would you then go about removing the duplicate pairs. i.e. remove row 2 or 3 in your example above. Is it possible to somehow mark and keep just the first occurrence?

I have ~ 35,000 rows of data with ~ 10,000 duplicates.

AlanY · Nov 29, 2018

this will mark up the first match then you can use a filter to get them

Book1

A

B

C

D

E

F

1

x1

y1

x2

y2

2

1

2

3

4

3

4

1

2

4

5

6

7

8

TRUE

5

6

7

8

6

1

3

5

7

2

4

6

8

3

4

7

8

Sheet5

Cell Formulas
Range	Formula
F4	{=IFERROR(AND(MATCH(A4&"/"&B4,A5:$A$10&"/"&B5:$B$10,0)=MATCH(C4&"/"&D4,$C5:C$10&"/"&$D5:D$10,0)),"")}
Press CTRL+SHIFT+ENTER to enter array formulas.

AlanY · Nov 29, 2018

Cols F and G find the paired co-ord, and Col H combined them together

Book1

A

B

C

D

E

F

G

H

1

x1

y1

x2

y2

2

1

2

3

4

TRUE

3

4

1

2

FALSE

4

5

6

7

8

TRUE

5

6

7

8

FALSE

6

1

3

5

7

FALSE

7

2

4

6

8

FALSE

8

3

4

7

8

FALSE

Sheet5

Cell Formulas
Range	Formula
F2	{=IFERROR(AND(MATCH(A2&"/"&B2,A3:$A$10&"/"&B3:$B$10,0)=MATCH(C2&"/"&D2,$C3:C$10&"/"&$D3:D$10,0)),"")}
G2	{=IFERROR(AND(MATCH(A2&"/"&B2,C2:C10&"/"&D2:D10,0)=MATCH(C2&"/"&D2,A2:A10&"/"&B2:B10,0)),"")}
H2	{=OR(IFERROR(AND(MATCH(A2&"/"&B2,A3:$A$10&"/"&B3:$B$10,0)=MATCH(C2&"/"&D2,$C3:C$10&"/"&$D3:D$10,0)),"")=TRUE,IFERROR(AND(MATCH(A2&"/"&B2,C2:C10&"/"&D2:D10,0)=MATCH(C2&"/"&D2,A2:A10&"/"&B2:B10,0)),"")=TRUE)}
Press CTRL+SHIFT+ENTER to enter array formulas.

eejwo · Nov 29, 2018

I can't quite see how this works. Well, I see why your formula give the result they do...

However, the only reason that cell G3 isn't reading TRUE is because the lookup array in the MATCH formula isn't absolute? The lookup array needs to be absolute e.g. "$C$2:$C$10&"/"&$D$2:$D$10" otherwise the only reason it's not reading true is because the matched pair isn't within the lookup array?

I'm glad this problem isn't as simple as I feared it might be...

AlanY · Nov 29, 2018

yes, you're right. by using absolute you can pick up the both sets

eejwo · Nov 29, 2018

Sorry, I'm confused. Perhaps my example data was too simplified.

I don't want to pick up both sets. But if I use your suggested formulae over 30,000 rows of data, where the matching pairs of data are not necessarily in adjacent rows, then not having the absolute reference in the MATCH lookup array will mean I miss potential matches.

Formula to remove duplicate co-ordinate pairs

eejwo

New Member

AlanY

Well-known Member

eejwo

New Member

AlanY

Well-known Member

eejwo

New Member

AlanY

Well-known Member

AlanY

Well-known Member

eejwo

New Member

AlanY

Well-known Member

eejwo

New Member

Similar threads

Share this page

Formula to remove duplicate co-ordinate pairs

New Member

Well-known Member

New Member

Well-known Member

New Member

Well-known Member

Well-known Member

New Member

Well-known Member

New Member

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock