I have the following code in SQL: I've been trying to rewrite it in python like so: but I keep getting a generic error message. What am I doing wrong? EDIT: added the error message Answer IIUC, you could try the following: The equivalent of SELECT DISTINCT col is drop_duplicates(col) and the equivalent of SELECT col, count(*) is value_counts(col).

Convert SQL commands to Python

I have the following code in SQL:

CREATE OR REPLACE TABLE
  `table2` AS
SELECT nmc_lvl, count(*) AS total_nb
FROM (
  SELECT DISTINCT nmc_ray as nmc_lvl, cli_id, date
  FROM `table1`
)
GROUP BY nmc_lvl

​x
 
CREATE OR REPLACE TABLE  `table2` ASSELECT nmc_lvl, count(*) AS total_nbFROM (  SELECT DISTINCT nmc_ray as nmc_lvl, cli_id, date  FROM `table1`)GROUP BY nmc_lvl​

I’ve been trying to rewrite it in python like so:

table2 = table1['cli_id', 'date', 'nmc_ray'].unique()
table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()

 
table2 = table1['cli_id', 'date', 'nmc_ray'].unique()table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()​

but I keep getting a generic error message. What am I doing wrong?

EDIT: added the error message

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_7989/1297335147.py in <module>
----> 1 table2 = table1['cli_id', 'date', 'nmc_ray'].unique()
      2 table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()

/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: ('cli_id', 'date', 'nmc_ray')

 
KeyError                                  Traceback (most recent call last)/tmp/ipykernel_7989/1297335147.py in <module>----> 1 table2 = table1['cli_id', 'date', 'nmc_ray'].unique()      2 table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()​/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)   3456             if self.columns.nlevels > 1:   3457                 return self._getitem_multilevel(key)-> 3458             indexer = self.columns.get_loc(key)   3459             if is_integer(indexer):   3460                 indexer = [indexer]​/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)   3361                 return self._engine.get_loc(casted_key)   3362             except KeyError as err:-> 3363                 raise KeyError(key) from err   3364    3365         if is_scalar(key) and isna(key) and not self.hasnans:​KeyError: ('cli_id', 'date', 'nmc_ray')​

Answer

IIUC, you could try the following:

table2 = (table1.drop_duplicates(subset=['nmc_ray', 'cli_id', 'date'])[['nmc_ray','cli_id','date']]
          .rename(columns={'nmc_ray':'nmc_lvl'})
          .value_counts('nmc_lvl').reset_index(name='total_nb'))

 
table2 = (table1.drop_duplicates(subset=['nmc_ray', 'cli_id', 'date'])[['nmc_ray','cli_id','date']]          .rename(columns={'nmc_ray':'nmc_lvl'})          .value_counts('nmc_lvl').reset_index(name='total_nb'))​

The equivalent of SELECT DISTINCT col is drop_duplicates(col) and the equivalent of SELECT col, count(*) is value_counts(col).

Advertisement

Answer