Skip to content
Advertisement

Better solution to index a DataFrame according to the values of 2 others

I would like to index a DataFrame (aaxx_df) according to the values of 2 others (val1_df for the columns and val2_df for the rows).

>>> val1_df = pd.DataFrame({"A": ["aa", "bb"], "B": ["bb", "aa"]})
    A   B
0  aa  bb
1  bb  aa
>>> val2_df = pd.DataFrame({"A": ["xx", "yy"], "B": ["zz", "yy"]})
    A   B
0  xx  zz
1  yy  yy
>>> aaxx_df = pd.DataFrame(
...     {"aa": [10, 20, 30], "bb": [-10, -20, -30]}, 
...     index=["xx", "yy", "zz"]
... )
    aa  bb
xx  10 -10
yy  20 -20
zz  30 -30
>>> comb(val1_df, val2_df, aaxx_df)
      A     B
0  10.0 -30.0
1 -20.0  20.0

I put below a solution that works for my problem, but I guess, there must be some much cleaner solutions, possibly via SQL (it seems to me to be very similar to a relational database problem).

def comb(df1, df2, link_df):

    comb_df = df1.copy()
    
    for val in link_df:
        tmp_df = df2.copy()
        tmp_df = tmp_df.applymap(lambda x: link_df.loc[x, val])
        comb_df = comb_df.replace({val: np.nan}).combine_first(tmp_df)

    return comb_df

Advertisement

Answer

If you want to index aaxx_df using the values from val2_df (rows) and val1_df (cols), use lookup:

vals = aaxx_df.lookup(val2_df.values.flatten(), val1_df.values.flatten())
    
res = pd.DataFrame(data=vals.reshape(val1_df.shape), columns=val1_df.columns)
print(res)

Output

    A   B
0  10 -30
1 -20  20

Since Dec 26, 2020, with pandas 1.2, the lookup method is deprecated. Here is a minimal version as a function:

def lookup(df, row_labels, col_labels) -> np.ndarray:
    """
    Label-based "fancy indexing" function for DataFrame.
    Given equal-length arrays of row and column labels, return an
    array of the values corresponding to each (row, col) pair.

    Parameters
    ----------
    row_labels : sequence
        The row labels to use for lookup.
    col_labels : sequence
        The column labels to use for lookup.

    Returns
    -------
    numpy.ndarray
        The found values.
    """
    row_idx = df.index.get_indexer(row_labels)
    col_idx = df.columns.get_indexer(col_labels)
    flat_index = row_idx * len(df.columns) + col_idx
    return df.values.flat[flat_index]

The code becomes thus:

vals = lookup(aaxx_df, val2_df.values.flatten(), val1_df.values.flatten())
    
res = pd.DataFrame(data=vals.reshape(val1_df.shape), columns=val1_df.columns)
print(res)
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement