I have the following code in SQL:
CREATE OR REPLACE TABLE `table2` AS SELECT nmc_lvl, count(*) AS total_nb FROM ( SELECT DISTINCT nmc_ray as nmc_lvl, cli_id, date FROM `table1` ) GROUP BY nmc_lvl
I’ve been trying to rewrite it in python like so:
table2 = table1['cli_id', 'date', 'nmc_ray'].unique() table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count()
but I keep getting a generic error message. What am I doing wrong?
EDIT: added the error message
KeyError Traceback (most recent call last) /tmp/ipykernel_7989/1297335147.py in <module> ----> 1 table2 = table1['cli_id', 'date', 'nmc_ray'].unique() 2 table2 = table2.groupby('nmc_ray')['cli_id', 'date'].count() /opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key) 3456 if self.columns.nlevels > 1: 3457 return self._getitem_multilevel(key) -> 3458 indexer = self.columns.get_loc(key) 3459 if is_integer(indexer): 3460 indexer = [indexer] /opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 3361 return self._engine.get_loc(casted_key) 3362 except KeyError as err: -> 3363 raise KeyError(key) from err 3364 3365 if is_scalar(key) and isna(key) and not self.hasnans: KeyError: ('cli_id', 'date', 'nmc_ray')
Advertisement
Answer
IIUC, you could try the following:
table2 = (table1.drop_duplicates(subset=['nmc_ray', 'cli_id', 'date'])[['nmc_ray','cli_id','date']] .rename(columns={'nmc_ray':'nmc_lvl'}) .value_counts('nmc_lvl').reset_index(name='total_nb'))
The equivalent of SELECT DISTINCT col
is drop_duplicates(col)
and the equivalent of SELECT col, count(*)
is value_counts(col)
.