So, I have a DataFrame of this type:
+-------------------+
| date|
+-------------------+
|2020-05-10 22:40:51|
|2020-05-10 23:05:25|
|2020-05-10 22:49:42|
|2020-05-10 23:16:06|
|2020-05-10 22:33:25|
+-------------------+
And I want to add columns containing, for each row, the day, week, month and year counted from a given starting date (specified simply as a year, e.g. 2020 for 2020-01-01). At first I thought of using something like this:
import pyspark.sql.functions as F

dataframe = df.withColumn('year', F.year('date') - initial_date) \
              .withColumn('month', F.month('date') + F.col('year')*12) \
              .withColumn('week', F.weekofyear('date') + F.col('year')*52) \
              .withColumn('day', F.dayofyear('date') + F.col('year')*365)
Unfortunately this doesn't work correctly (except for year and month), since my dataset spans several years and years differ in length: some have 53 weeks rather than 52, and some have 366 days rather than 365. I know I could almost certainly do it with a UDF, but I'd like to keep that as a last resort, since my dataset will be quite big and I'd rather not sacrifice performance.
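To see the issue concretely, here is a minimal sketch in plain Python (the helper name naive_day_index is just for illustration): with 2020 as the start, the fixed 365-day multiplier maps two different dates to the same index, because 2020 actually has 366 days.

from datetime import date

initial_date = 2020

def naive_day_index(d):
    # day-of-year plus 365 per elapsed year, as in the formula above
    return d.timetuple().tm_yday + (d.year - initial_date) * 365

print(naive_day_index(date(2020, 12, 31)))  # 366
print(naive_day_index(date(2021, 1, 1)))    # also 366 -- a collision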
Answer
To calculate the day difference, you can use datediff, and from that you can get the week difference by dividing the number of days by 7 and rounding down to an integer:
import pyspark.sql.functions as F

initial_date = 2020

df2 = df.withColumn(
    'year',
    F.year('date') - initial_date
).withColumn(
    'month',
    F.month('date') + F.col('year')*12
    # or you can use
    # F.months_between('date', F.lit('%s-01-01' % initial_date)).cast('int')
).withColumn(
    'day',
    # days elapsed since the start date; handles leap years correctly
    F.datediff('date', F.lit('%s-01-01' % initial_date))
).withColumn(
    'week',
    # whole weeks elapsed, i.e. floor(day / 7)
    (F.col('day') / 7).cast('int')
)
df2.show()
+-------------------+----+-----+---+----+
| date|year|month|day|week|
+-------------------+----+-----+---+----+
|2020-05-10 22:40:51| 0| 5|130| 18|
|2020-05-10 23:05:25| 0| 5|130| 18|
|2020-05-10 22:49:42| 0| 5|130| 18|
|2020-05-10 23:16:06| 0| 5|130| 18|
|2020-05-10 22:33:25| 0| 5|130| 18|
+-------------------+----+-----+---+----+
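As a quick sanity check (a sketch, assuming a SparkSession named spark is available), the datediff-based index keeps counting correctly across the 2020-to-2021 leap-year boundary, exactly where the fixed-multiplier approach collides:

check = spark.createDataFrame(
    [('2020-12-31 12:00:00',), ('2021-01-01 12:00:00',)], ['date']
).withColumn('day', F.datediff('date', F.lit('2020-01-01')))
check.show()
+-------------------+---+
|               date|day|
+-------------------+---+
|2020-12-31 12:00:00|365|
|2021-01-01 12:00:00|366|
+-------------------+---+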