I would like to index a DataFrame
(aaxx_df
) according to the values of 2 others (val1_df
for the columns and val2_df
for the rows).
>>> val1_df = pd.DataFrame({"A": ["aa", "bb"], "B": ["bb", "aa"]})
A B 0 aa bb 1 bb aa
>>> val2_df = pd.DataFrame({"A": ["xx", "yy"], "B": ["zz", "yy"]})
A B 0 xx zz 1 yy yy
>>> aaxx_df = pd.DataFrame( ... {"aa": [10, 20, 30], "bb": [-10, -20, -30]}, ... index=["xx", "yy", "zz"] ... )
aa bb xx 10 -10 yy 20 -20 zz 30 -30
>>> comb(val1_df, val2_df, aaxx_df)
A B 0 10.0 -30.0 1 -20.0 20.0
I put below a solution that works for my problem, but I guess, there must be some much cleaner solutions, possibly via SQL (it seems to me to be very similar to a relational database problem).
def comb(df1, df2, link_df): comb_df = df1.copy() for val in link_df: tmp_df = df2.copy() tmp_df = tmp_df.applymap(lambda x: link_df.loc[x, val]) comb_df = comb_df.replace({val: np.nan}).combine_first(tmp_df) return comb_df
Advertisement
Answer
If you want to index aaxx_df using the values from val2_df (rows) and val1_df (cols), use lookup:
vals = aaxx_df.lookup(val2_df.values.flatten(), val1_df.values.flatten()) res = pd.DataFrame(data=vals.reshape(val1_df.shape), columns=val1_df.columns) print(res)
Output
A B 0 10 -30 1 -20 20
Since Dec 26, 2020, with pandas 1.2, the lookup
method is deprecated. Here is a minimal version as a function:
def lookup(df, row_labels, col_labels) -> np.ndarray: """ Label-based "fancy indexing" function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair. Parameters ---------- row_labels : sequence The row labels to use for lookup. col_labels : sequence The column labels to use for lookup. Returns ------- numpy.ndarray The found values. """ row_idx = df.index.get_indexer(row_labels) col_idx = df.columns.get_indexer(col_labels) flat_index = row_idx * len(df.columns) + col_idx return df.values.flat[flat_index]
The code becomes thus:
vals = lookup(aaxx_df, val2_df.values.flatten(), val1_df.values.flatten()) res = pd.DataFrame(data=vals.reshape(val1_df.shape), columns=val1_df.columns) print(res)