Skip to content
Advertisement

Is there a way to pair entrys by rows to give pairs such that there is maximum overlap by columns?

Sorry, for the poor phrasing. I am struggling to find a good way to describe this problem, so I will go with an example. Say I have a bunch of people that I need to pair up together based on similarities between rows. If my data looked like something below, how would I go about pairing people up by the number of times a name matches in each row. For example, Name 1 and Name 4 match twice but so do Name 1 and Name 2 so returning either sets of pairs is fine as shows below.


Data:

|Name 1 |Name 2 |Name 4
|Name 3 |Name 4 |
|Name 2 |Name 4 |
|Name 1 |Name 2 |Name 3
|Name 1 |Name 4 |
|Name 2 |

Result:

|Name 1 |Name 2 |
|Name 3 |Name 4 |

or

|Name 1 |Name 4 |
|Name 2 |Name 3 |

I have tried looking at “group by” but since the names aren’t perfectly aligned in each column, it didn’t yield any useful results. I have also tried looking online to no avail as well. So, any input would be greatly appreciated. And do let me know if any further clarifications are required. Thank you.


As a side note: I am doing this analysis in Google Sheets using its modified query function set but the same SQL query formatting.

Advertisement

Answer

for 3 columns + everyone with everyone but only from the same row

=ARRAYFORMULA(QUERY(SPLIT(INDEX(QUERY(TRANSPOSE(QUERY(TRANSPOSE("♥"&QUERY(
 IF(INDEX(SPLIT(QUERY(IFNA(
 {A1:A&"♦"&B1:B; B1:B&"♦"&C1:C; A1:A&"♦"&C1:C}),
 "where Col1 is not null"), "♦"),,1)>INDEX(SPLIT(QUERY(IFNA(
 {A1:A&"♦"&B1:B; B1:B&"♦"&C1:C; A1:A&"♦"&C1:C}),
 "where Col1 is not null"), "♦"),,2), {INDEX(SPLIT(QUERY(IFNA(
 {A1:A&"♦"&B1:B; B1:B&"♦"&C1:C; A1:A&"♦"&C1:C}),
 "where Col1 is not null"), "♦"),,2), INDEX(SPLIT(QUERY(IFNA(
 {A1:A&"♦"&B1:B; B1:B&"♦"&C1:C; A1:A&"♦"&C1:C}),
 "where Col1 is not null"), "♦"),,1)}, SPLIT(QUERY(IFNA(
 {A1:A&"♦"&B1:B; B1:B&"♦"&C1:C; A1:A&"♦"&C1:C}),
 "where Col1 is not null"), "♦")), "where Col1<>Col2")),,9^9)), 
 "select Col1,count(Col1) 
  group by Col1 
  order by count(Col1) desc 
  label count(Col1)''"),,1), "♥"), "where Col1 <>' '"))

enter image description here

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement