Here is the problem. The database describes symmetrical and transferable relations between objects (genes from different species). If a gene X from species 1 is in relation to gene Y from species 2, and gene Y from species 2 is in relation to gene Z in species 3, then gene X from species 1 is in relation to gene Z in species 3.
Here is an example table:
species1 gene1 species2 gene2 2 Y 1 X 2 Y 3 Z
Now, here is what I want to do. Given species 1 and 3 (both in column species2), find all genes in column gene2 for which there is a common value in column gene1.
Basically, this is the output I would like to have:
X Z
… for each pair that satisfies this condition.
Note that each gene id is uniquely present in only one species.
Explanation: column species1 contains a handful of “hub” species (like humans). So if I want to find a rat gene matching a mouse gene Card9, there are two possibilities: (i) there is no match in human genome, and in this case there will be a rat match in column gene1, or the mouse gene will be in gene1 and the rat gene in column gene2; (ii) there is a match in human (or some other) genome, in which case I need first to find the match in the human genome in column gene1, and then find the match to that gene in the rat genome in column gene2.
Clearly, I imagine that I would be able to do it outside of SQL:
- Select all genes in column gene2 from species 1
- Select all genes in column gene2 from species 3
- For each gene from species 1, find matching genes in column gene1 for which column species2 is 3.
However, I am sure that there is a clever SQL statement that could do that.
The database has about 5 millions of rows.
Advertisement
Answer
You can do it with a self join:
select t1.gene2 result1, t2.gene2 result2 from tablename t1 inner join tablename t2 on t2.gene1 = t1.gene1 where t1.species2 = 1 and t2.species2 = 3
See the demo.
Another way to do it with aggregation:
select min(gene2) result1, max(gene2) result2 from tablename where species2 in (1, 3) group by gene1 having result1 <> result2;
See the demo.
Results:
| result1 | result2 | | ------- | ------- | | X | Z |