I have an SQLite table called wcvp
constructed from a csv file downloaded from the World Check List of Vascular Plants (see https://wcvp.science.kew.org/ and http://sftp.kew.org/pub/data-repositories/WCVP/). When I run this query:
sqlite> SELECT kew_id, genus, species, infraspecies FROM wcvp WHERE genus = 'Quercus' AND species = 'robur' AND taxonomic_status = 'Accepted';
I get this result:
kew_id | genus | species | infraspecies |
---|---|---|---|
304293-2 | Quercus | robur | |
77189540-1 | Quercus | robur | broteroana |
77189379-1 | Quercus | robur | brutia |
77189383-1 | Quercus | robur | imeretina |
60459295-2 | Quercus | robur | pedunculiflo |
77171868-1 | Quercus | robur | robur |
I want to add a column to the table (which has hundreds of thousands of rows in it) called number_of_infraspecies
which would look like this:
kew_id | genus | species | infraspecies | number_of_infraspecies |
---|---|---|---|---|
304293-2 | Quercus | robur | 5 | |
77189540-1 | Quercus | robur | broteroana | NULL |
77189379-1 | Quercus | robur | brutia | NULL |
77189383-1 | Quercus | robur | imeretina | NULL |
60459295-2 | Quercus | robur | pedunculiflo | NULL |
77171868-1 | Quercus | robur | robur | NULL |
Alternatively I could construct a new table with two columns: kew_id
as a foreign key and number_of_infraspecies
as the other column.
Whichever approach I take I can only think of a procedure which would result in making a separate query for every row of the wcvp
table or at least those rows which do not have a value in the infraspecies column (AND taxonomic_status = ‘Accepted’).
Is there a way of doing it with one or just a few queries?
Advertisement
Answer
Create a VIEW
that returns the column number_of_infraspecies
:
CREATE VIEW my_view AS SELECT kew_id, genus, species, infraspecies, CASE WHEN infraspecies IS NULL THEN COUNT(infraspecies) OVER (PARTITION BY genus, species) END number_of_infraspecies FROM wcvp WHERE taxonomic_status = 'Accepted';
and then select from that VIEW
specific genus
and species
:
SELECT kew_id, genus, species, infraspecies, number_of_infraspecies FROM my_view WHERE genus = 'Quercus' AND species = 'robur';
See the demo.