Skip to content
Advertisement

merging tables with different structures

I have two tables where I want to find the outer join based on a Ticker variable.

In Table I, I have only one Ticker for each entity (fund), but in table II, I may have multiple records (multiple Ticker) for each “FundID”. The goal is to count the unique funds.

I want to have table III, which is the combination of both tables I and II at the fund level. If a Ticker exists in both tables, that is considered as one record in table III. If a Ticker exists in table II but not in table I, I want it in table III.

However, I don’t want another Ticker from the same fund (“FundID”). In that case, we may pick one representative Ticker at random from each fund group (again, they will have the same “FundID”), and thus we won’t have multiple Ticker for one specific fund.

Table 1:

Ticker
A
B
C
D

Table II:

Ticker FundID
A 1
AA 1
AB 1
B 2
BB 2
E 3
EB 3
EC 3

Table III(combined):

Ticker
A
B
C
D
E

Advertisement

Answer

You can first filter df2 for rows where for each FundID, none of their corresponding “Ticker” is in df1['Ticket']. Then among these FundIDs, sample one Ticker for each FundID and concatenate this to df1:

sub_df2 = df2[~df2['Ticker'].isin(df1['Ticket']).groupby(df2['FundID']).cummax()]
out = pd.concat((df1, sub_df2.groupby('FundID')['Ticker'].sample(n=1).to_frame().rename(columns={'Ticker':'Ticket'})))

Output:

  Ticket
0      A
1      B
2      C
3      D
5      E
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement