Tag: pyspark

Daily forecast on a PySpark dataframe

I have the following dataframe in PySpark:

DT_BORD_REF: Date column for the month
REF_DATE: A reference date for the current day, separating past and future
PROD_ID: Product ID
COMPANY_CODE: Company ID
CUSTOMER_CODE: Customer ID
MTD_WD: Month-to-date count of working days (Date = DT_BORD_REF)
QUANTITY: Number of items sold
QTE_MTD: Number of items month to date for DT_BORD_REF < REF_DATE

Converting query from SQL to pyspark

I am trying to convert the following SQL query into PySpark: The code I have in PySpark right now is this: However, this simply returns the number of rows in the "data" dataframe, and I know this isn't correct. I am very new to PySpark. Can anyone help me solve this?

Answer: You need to collect the result into

How to add a ranking to a pyspark dataframe

I have a PySpark dataframe with 2 columns: id and count. I want to add a ranking to this by descending count, so the highest count has rank 1, the second highest rank 2, etc. testDF = spark.createDataFrame([("DJS232", 437232)], ["id", "count"]) I first tried using and this worked, ish. It had monotonically increasing id numbers but the jump from the first
