Skip to content
Advertisement

Select values from one column which share a value in another column

Here is the problem. The database describes symmetrical and transferable relations between objects (genes from different species). If a gene X from species 1 is in relation to gene Y from species 2, and gene Y from species 2 is in relation to gene Z in species 3, then gene X from species 1 is in relation to gene Z in species 3.

Here is an example table:

Now, here is what I want to do. Given species 1 and 3 (both in column species2), find all genes in column gene2 for which there is a common value in column gene1.

Basically, this is the output I would like to have:

… for each pair that satisfies this condition.

Note that each gene id is uniquely present in only one species.

Explanation: column species1 contains a handful of “hub” species (like humans). So if I want to find a rat gene matching a mouse gene Card9, there are two possibilities: (i) there is no match in human genome, and in this case there will be a rat match in column gene1, or the mouse gene will be in gene1 and the rat gene in column gene2; (ii) there is a match in human (or some other) genome, in which case I need first to find the match in the human genome in column gene1, and then find the match to that gene in the rat genome in column gene2.

Clearly, I imagine that I would be able to do it outside of SQL:

  1. Select all genes in column gene2 from species 1
  2. Select all genes in column gene2 from species 3
  3. For each gene from species 1, find matching genes in column gene1 for which column species2 is 3.

However, I am sure that there is a clever SQL statement that could do that.

The database has about 5 millions of rows.

Advertisement

Answer

You can do it with a self join:

See the demo.
Another way to do it with aggregation:

See the demo.
Results:

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement