I have data represented in this fashion (not sure I properly call it an adjacency list, but that’s the best description I can think of):
+--------+-------+ | Level | ID | +========+=======+ | Parent | A-123 | +--------+-------+ | Child | B-123 | +--------+-------+ | Child | B-456 | +--------+-------+ | Child | B-789 | +--------+-------+ | Parent | A-456 | +--------+-------+ | Child | C-123 | +--------+-------+ | Child | C-456 | +--------+-------+ | Child | C-789 | +--------+-------+
I need to transform it to this format:
+--------+-------+ | Parent | Child | +========+=======+ | A-123 | B-123 | +--------+-------+ | A-123 | B-456 | +--------+-------+ | A-123 | B-789 | +--------+-------+ | A-456 | C-123 | +--------+-------+ | A-456 | C-456 | +--------+-------+ | A-456 | C-789 | +--------+-------+
I think this can be done with an SQL window function (I would use sqlite
for simplicity) and I’m sure pandas
has some tricks in its bag as well, but in any case, I’m not very familiar with these more advanced techniques.
Any idea?
Advertisement
Answer
Pandas solution – replace rows after Parent
to missing values and forward filling ID
, then remove rows with Parent
and rename columns:
m = df['Level'].eq('Parent') df['Level'] = df['ID'].where(m).ffill() df = df[~m].rename(columns={'Level':'Parent','ID':'Child'}) print (df) Parent Child 1 A123 B123 2 A123 B456 3 A123 B789 5 A456 C123 6 A456 C456 7 A456 C789