Skip to content
Advertisement

Conditional Join Between SQL Table and Pandas Dataframe

I am working on an application and one step of the processing involves joining drug NDC’s from claim data with a table that contains the NDC along with a variety of other information including unit price, multisource code, and more.

The table of claims data looks something like this:

It was read into a Pandas dataframe as follows:

The SQL table of drug pricing information has the following fields

So I need to be able to join them on NDC. I have tried reading in the SQL table as a Pandas dataframe and using that with merge or join to combine them together, but I cannot figure out how to get the appropriate conditions set up so that it joins only the NDC from the SQL table that has a date that is immediately before the date filled in the dataframe.
I also tried merge_asof as it seemed to provide the right paramters and it was setup as follow:

However, it appears to only be returning nan’s for the right frame.

Any advice would be greatly appreciated.

Advertisement

Answer

Hi the answer was to use merge_asof. It was not working previously because I had not correctly casted the datetime. Relying on the infer_datetime_format paramter was causing the problem.

The final merge code was: merged_table = pd.merge_asof(claims, drugs, on=’DATE’, by=’NDC’)

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement