How can I write an SQL query as a template in PySpark?

I want to write a function that takes a column name, a dataframe containing that column, and a query template as arguments, and returns the result of running the query on that column. Something like: func_sql(df_tbl, 'age', 'select count(distinct {col}) from df_tbl'). Here, {col} should be replaced with 'age', and the output should be the result of the query run on 'age', i.e.
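The substitution step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the asker's or any answerer's code: it uses the standard-library sqlite3 module as a stand-in for Spark SQL so that it runs anywhere, the helper name render_query and the sample rows are hypothetical, and func_sql takes a connection instead of a dataframe. In PySpark the same idea would presumably mean registering the dataframe with createOrReplaceTempView("df_tbl") and passing the rendered string to spark.sql.

```python
import re
import sqlite3

# Only plain identifiers are accepted, so the template cannot be used
# to inject arbitrary SQL through the column argument.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_query(template, col):
    """Fill the {col} placeholder after checking it is a plain identifier."""
    if not _IDENTIFIER.fullmatch(col):
        raise ValueError(f"unsafe column name: {col!r}")
    return template.format(col=col)


def func_sql(conn, col, template):
    """Run the rendered query and return all result rows.

    Assumption: conn is a sqlite3 connection standing in for the
    dataframe/Spark session of the original question.
    """
    return conn.execute(render_query(template, col)).fetchall()


# Hypothetical sample data: three people, two distinct ages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df_tbl (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO df_tbl VALUES (?, ?)",
    [("a", 30), ("b", 30), ("c", 41)],
)

result = func_sql(conn, "age", "select count(distinct {col}) from df_tbl")
print(result)  # [(2,)]
```

Validating the column name before formatting is a design choice worth keeping in any engine, since str.format pastes the value into the SQL text unescaped.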

TSQL – New column value based on other columns with highest level of match

So I’ve got a mapping table with the following information: And another fact table with the following info: Ideally I would add CCode to the fact table, taken from the mapping row with the highest-level match on Number (first) and FCODE (second). So for example, the first record (Number: 0123456789, FCode: 12345) should result in the value CCode 6 being added in the

Incorrect syntax near '<' in SQL Server Scalar Functions

Here is the requirement: find all teachers whose FirstName length is less than 5 and whose FirstName and LastName share the same first 3 characters. I tried this query (Scalar Function): To call function: But when I execute the first query, it shows the error: Incorrect syntax near '<'. Can anyone help me with this? Answer Just use a normal

Performing division with PostgreSQL / json

I have written a simple query against a table in a Postgres database that contains a column “manifest” of type json. Each cell contains a very long value and I am extracting the numerical value for “size”. I need to create a new column (perhaps call it “size in MB”), and perform division against size. Specifically, I need to take

How to not include duplicates in SQL with inner join?

I’m trying to list the customer’s name, last name, email, phone number, address, and the title of the show they are going to. I am not supposed to list duplicate customer names, but unfortunately, if one customer is seeing different shows, their name appears twice. I am still getting duplicates despite using DISTINCT and GROUP BY. What should I

Find rows that have the same value and select the newer one

I have a table that looks like this:

serialNr  sensorNr   ModifyDate
1234      12EE56423  2022-04-06
4567      12EE56423  2018-06-12
6789      AD3FF0C44  2018-03-08
9101      AD3FF0C44  2019-06-07

From rows with the same sensorNr, I only want to select the one with the newer ModifyDate, so the result should look like this:

serialNr  sensorNr   ModifyDate
1234      12EE56423  2022-04-06
9101      AD3FF0C44  2019-06-07

How can I achieve that? Answer
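One common way to get this result is to join the table against a per-sensorNr MAX(ModifyDate) subquery. The sketch below is not the answer from the original thread (which is cut off here); it loads the sample rows above into an in-memory SQLite database so that it is runnable. The table name sensors is an assumption, as is storing the dates as ISO text, which sorts correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: the table is called "sensors"; the question does not name it.
conn.execute(
    "CREATE TABLE sensors (serialNr INTEGER, sensorNr TEXT, ModifyDate TEXT)"
)
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?)",
    [
        (1234, "12EE56423", "2022-04-06"),
        (4567, "12EE56423", "2018-06-12"),
        (6789, "AD3FF0C44", "2018-03-08"),
        (9101, "AD3FF0C44", "2019-06-07"),
    ],
)

# For each sensorNr, find the newest ModifyDate, then keep only the rows
# that match that date.
query = """
SELECT s.serialNr, s.sensorNr, s.ModifyDate
FROM sensors AS s
JOIN (
    SELECT sensorNr, MAX(ModifyDate) AS newest
    FROM sensors
    GROUP BY sensorNr
) AS m
  ON m.sensorNr = s.sensorNr AND m.newest = s.ModifyDate
ORDER BY s.serialNr
"""

latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
# (1234, '12EE56423', '2022-04-06')
# (9101, 'AD3FF0C44', '2019-06-07')
```

If two rows for the same sensorNr share the newest date, this query returns both; a ROW_NUMBER() window function with a tie-breaking column would be one way to force a single row per sensor.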

Conversion Failed in a CASE expression

I have a column of varchar type that contains dates and ‘#’: I am trying to convert the dates to the standard date format (YYYY-MM-DD) and leave the ‘#’ as it is whenever it occurs. Here is my code: The outcome column is also of varchar(10) type (same as the original column). I expected to get # whenever

Exclude blank column while Joining two SQL server tables

I have a set of vehicle parts stored in two tables as shown below: Source Table1: Vehicle_ID Part1 Part2 Part3 Part4 Part5 1 10 20 30 2 10 20 3 10 Source Table2: Vehicle_ID Part6 Part7 Part8 Part9 Part10 1 40 2 30 50 60 3 30 Required Table like below: Vehicle_ID Part1 Part2 Part3 Part4 Part5 1 10 20 30