Tag: dataframe

How to avoid a self-join in Spark Scala

apache-spark dataframe scala self-join sql

I have a DataFrame called product_relationship_current and I’m doing a self-join to retrieve a new DataFrame like below: First I’m giving it an alias so I can treat them as two different DataFrames: And then I’m doing a self-join to get a new DataFrame: But I’m looking for another way to do that without doing a self-join, so I don’t

Error: it says legacy_id is an invalid identifier

dataframe pandas snowflake-cloud-data-platform sql streamlit

If I try * it fetches all the data, but when I mention any of the column names it says it’s invalid. Answer: Most likely it was quoted during table creation and should be accessed as such: In Python: Double-quoted Identifiers: If an object is created using a double-quoted identifier, when referenced in a query or any other SQL statement,
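As a minimal sketch of the answer's point: Snowflake resolves unquoted identifiers to uppercase, so a column created as "legacy_id" in double quotes has to be quoted the same way in every query. The helper `quote_identifier` and the table name `MY_TABLE` below are hypothetical, and the assumption that the column was created in lowercase is a guess based on the question title.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# Hypothetical table name; the column name comes from the question title.
table = "MY_TABLE"
column = "legacy_id"

# Build the query text with the column quoted exactly as it was created.
query = f"SELECT {quote_identifier(column)} FROM {table}"
print(query)
```

The resulting string can then be passed to whatever query function the application already uses, for example `pandas.read_sql`.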

What am I getting wrong in this SQL query?

data-extraction dataframe mysql sql

Write a query that retrieves only a ranked list of the most prolific days in October 2020, prolific measured in number of posts per day. Your query should return those days in a single-column table (column name post_day) in the format YYYY-MM-DD. This is my table: This is my query: The problem is that I’m only getting one result, not
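The asker's table and query are not shown in this excerpt, so the actual cause of the single-row result is unknown. One way to get a single row is to aggregate without a GROUP BY, and the sketch below shows the grouped form of such a query. It uses Python's built-in SQLite as a stand-in for MySQL, and the table `posts` and column `created_at` are hypothetical names.

```python
import sqlite3

# In-memory database with a hypothetical posts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO posts (created_at) VALUES (?)",
    [
        ("2020-10-01 09:00:00",),
        ("2020-10-02 10:00:00",),
        ("2020-10-02 11:30:00",),
        ("2020-10-02 18:45:00",),
        ("2020-10-03 08:15:00",),
        ("2020-10-03 12:00:00",),
        ("2020-11-01 07:00:00",),  # outside October 2020, filtered out
    ],
)

# One row per day, ranked by number of posts; ties are broken by date.
query = """
    SELECT DATE(created_at) AS post_day
    FROM posts
    WHERE created_at >= '2020-10-01' AND created_at < '2020-11-01'
    GROUP BY DATE(created_at)
    ORDER BY COUNT(*) DESC, post_day
"""
post_days = [row[0] for row in conn.execute(query)]
print(post_days)
```

Only `post_day` is selected, so the result is a single-column table; the count is used for ordering but never returned.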

Fill NA and update columns from another dataframe

dataframe pandas python sql

I want to conditionally fill missing values and update existing values from another dataframe. I want to fill the missing data and update the data in column values of dataframe smalldf. The condition is that the value in column B (largedf) is within the range of columns Range_FROM and Range_TO (smalldf). Always choose the minimum records in (largedf) to

Adding counts from one dataframe to another dataframe on corresponding row

dataframe pandas python sql

I would like to count the number of records in dataframe2 and add the count to the corresponding rows in dataframe1.
The first one (df1):
Road  RoadNo  Count
A     1       0
A     2       0
B     1       0
B     2       0
The second one (df2):
Road  RoadNo
A     1
A     1
A     1
A     2
A     2
B     1
The expected
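One possible approach, sketched with the df1 and df2 values given in the question: count the rows of df2 per (Road, RoadNo) pair, then left-merge the counts onto df1. The expected output is cut off in the excerpt, so this assumes Count should hold the number of matching df2 rows, with 0 where there is no match.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"Road": ["A", "A", "B", "B"], "RoadNo": [1, 2, 1, 2], "Count": [0, 0, 0, 0]}
)
df2 = pd.DataFrame(
    {"Road": ["A", "A", "A", "A", "A", "B"], "RoadNo": [1, 1, 1, 2, 2, 1]}
)

# Number of df2 rows for each (Road, RoadNo) pair.
counts = df2.groupby(["Road", "RoadNo"]).size().reset_index(name="Count")

# Replace the placeholder Count column in df1 with the real counts.
# The left merge keeps every df1 row; pairs absent from df2 get 0.
result = (
    df1.drop(columns="Count")
    .merge(counts, on=["Road", "RoadNo"], how="left")
    .fillna({"Count": 0})
    .astype({"Count": int})
)
print(result)
```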

Oracle SQL find columns with different values

database dataframe oracle sql validation

I have two tables, A and B, both with several million rows and around one hundred columns. I want to find which columns have different observations without needing to list the names of all the columns. For example, suppose column ID is the primary key in both tables. And that table A is while table B is The result

Find total IDs between two dates that satisfies a condition

data-manipulation dataframe python r sql

I have a dataset PosNeg like this. I need to find the count of IDs that have a pattern like P N P P or N P N N P N, that is, at least one N (negative) between two Ps (positive). If this pattern occurs at least once, then count that ID. Date is always in ascending order.
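A minimal sketch of one way to check this condition: join each ID's results into a string in date order and search it for a P, followed by one or more N, followed by a P. The dataset is not shown in the excerpt, so the rows and the (ID, date, result) layout below are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rows: (ID, date, result), where result is "P" or "N".
rows = [
    (1, "2021-01-01", "P"), (1, "2021-01-02", "N"),
    (1, "2021-01-03", "P"), (1, "2021-01-04", "P"),
    (2, "2021-01-01", "N"), (2, "2021-01-02", "P"), (2, "2021-01-03", "N"),
    (2, "2021-01-04", "N"), (2, "2021-01-05", "P"), (2, "2021-01-06", "N"),
    (3, "2021-01-01", "P"), (3, "2021-01-02", "P"), (3, "2021-01-03", "N"),
    (4, "2021-01-01", "N"), (4, "2021-01-02", "N"), (4, "2021-01-03", "P"),
]

# Build one result string per ID, ordered by date.
sequences = defaultdict(str)
for id_, _date, result in sorted(rows, key=lambda r: (r[0], r[1])):
    sequences[id_] += result

# A P, then at least one N, then another P.
pattern = re.compile(r"PN+P")
matching_ids = [id_ for id_, seq in sequences.items() if pattern.search(seq)]
count = len(matching_ids)
print(matching_ids, count)
```

Here IDs 1 (PNPP) and 2 (NPNNPN) match, while IDs 3 (PPN) and 4 (NNP) do not, because neither has an N enclosed between two Ps.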

sql – how to join on a column that is less than another join key

apache-spark apache-spark-sql dataframe sql

I have two tables as below. What I’m trying to do is to join A and B based on date and id, to get the value from B. The problem is, I want to join using add_month(A.Date, -1) = B.month (find the data in table B from one month earlier). If that’s not available, I want to join using two

merging tables with different structures

dataframe join pandas python sql

I have two tables where I want to find the outer join based on a Ticker variable. In Table I, I have only one Ticker for each entity (fund), but in table II, I may have multiple records (multiple Ticker) for each “FundID”. The goal is to count the unique funds. I want to have table III, which is the

SQL retrieval: Empty Dataframe in IDLE or Visual Studio Code but populated Dataframe in Jupyter Notes

dataframe jupyter-notebook pandas sql visual-studio

I am not a good Python coder (beginner), so apologies if the code isn’t up to a Pythonista’s snuff! Bit of a weird situation and I cannot figure this out. I have been racking my brains trying to figure it out but can’t seem to. I am sure it’s a really simple fix I am overlooking… The