Convert SAS PROC SQL to Python (pandas)

Question

I am rewriting some code from SAS to Python using the pandas library. I have the following code and no idea what to do with it. Can you help me? It is too complicated for me to get right on my own. I've changed the column names to protect sensitive data. This is the SAS code: This is my attempt in Pand…

Accepted Answer

First, using SELECT * in an aggregate GROUP BY query is not valid standard SQL. SAS PROC SQL allows it by remerging the summary statistics back onto every detail row, which can yield unexpected results. Usually, the SELECT list should be limited to the columns in the GROUP BY clause plus aggregate expressions.

With that said, aggregate SQL queries can generally be translated to pandas with groupby().agg() operations, with WHERE conditions (filter before aggregation) or HAVING conditions (filter after aggregation) handled using either .loc or .query().

SQL

```sql
SELECT col1, col2, col3,
       MIN(col1) AS min_col1,
       AVG(col2) AS mean_col2,
       MAX(col3) AS max_col3,
       COUNT(*)  AS count_obs
FROM mydata
GROUP BY col1, col2, col3
HAVING col1 = MIN(col1)
```

Pandas

General

```python
agg_data = (mydata.groupby(["col1", "col2", "col3"], as_index=False)
                  .agg(min_col1=("col1", "min"),
                       mean_col2=("col2", "mean"),
                       max_col3=("col3", "max"),
                       count_obs=("col1", "count"))
                  .query("col1 == min_col1")      # HAVING: filter after aggregation
           )
```

Specific

```python
opk_do_inf_4 = (mydata.groupby(["kat_opk", "kod_ow", "kod_sw", "nr_ks", "nr_ks_pr",
                                "nazwa_zabiegu_icd_9", "nazwa_zabiegu"],
                               as_index=False)
                      .agg(opk_do_inf=("kat_opk", "min"),
                           ilsc_opk_do_kosztu_infr=("nr_ks", "count"))
                      .query("kat_opk == opk_do_inf")
               )
```
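To make the WHERE/HAVING mapping concrete, here is a minimal sketch on hypothetical toy data (the columns `grp`, `val` and `flag` are illustrative, not from the question). The first query filters rows with .loc before grouping (WHERE) and filters groups with .query() afterwards (HAVING). The second one is an assumption about what a SAS `SELECT *` query with a HAVING on a group minimum is meant to do: it emulates the SAS remerge with groupby().transform(), which broadcasts each group's statistic back onto its detail rows.

```python
import pandas as pd

# Hypothetical toy data standing in for `mydata`.
mydata = pd.DataFrame({
    "grp":  ["a", "a", "a", "b", "c"],
    "val":  [3, 1, 1, 5, 7],
    "flag": [1, 1, 0, 1, 1],
})

# SQL:  SELECT grp, MIN(val) AS min_val, COUNT(*) AS n
#       FROM mydata WHERE flag = 1
#       GROUP BY grp HAVING COUNT(*) > 1
grouped = (mydata.loc[mydata["flag"] == 1]      # WHERE: filter rows first
                 .groupby("grp", as_index=False)
                 .agg(min_val=("val", "min"),
                      n=("val", "size"))        # "size" counts rows, like COUNT(*)
                 .query("n > 1"))               # HAVING: filter groups afterwards

# SAS-style remerge (assumed intent):
#       SELECT *, MIN(val) AS min_val FROM mydata
#       GROUP BY grp HAVING val = MIN(val)
# transform("min") returns a Series aligned to the original rows,
# so every detail row is kept and compared with its group's minimum.
remerged = (mydata.assign(min_val=mydata.groupby("grp")["val"].transform("min"))
                  .query("val == min_val"))

print(grouped)
print(remerged)
```

Note that the remerged version keeps every row tied at the group minimum (both `a` rows with `val == 1`), which is usually what the SAS idiom is after; a plain groupby().agg() would instead collapse each group to a single row.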

Answer