Tag: dataframe

How to avoid a self-join in Spark Scala

I have a DataFrame called product_relationship_current and I’m doing a self-join to retrieve a new DataFrame like the one below: First I’m giving it an alias so I can treat it as two different DataFrames: And then I’m doing a self-join to get a new DataFrame: But I’m looking for another way to do that without doing a self-join, so I don’t

What am I getting wrong in this SQL query?

Write a query that retrieves only a ranked list of the most prolific days in October 2020, with prolific measured as the number of posts per day. Your query should return those days in a single-column table (column name post_day) in the format YYYY-MM-DD. This is my table: This is my query: The problem is that I’m only getting one result, not

Fill NA and update columns from another dataframe

I want to conditionally fill missing values and update existing values from another dataframe. Specifically, I want to fill the missing entries and update the data in the column values of dataframe smalldf. The condition is that the value in column B (largedf) falls within the range given by columns Range_FROM and Range_TO (smalldf). Always choose the minimum records in largedf to

Merging tables with different structures

I have two tables that I want to outer-join on a Ticker variable. In Table I, I have only one Ticker for each entity (fund), but in Table II, I may have multiple records (multiple Tickers) for each “FundID”. The goal is to count the unique funds. I want to have Table III, which is the