Skip to content
Advertisement

How to get the COUNT of emails for each id in Scala

I use this query in SQL to get return how many user_id’s have more than one email. How would I write this same query against a users DataFrame in Scala? also how would I be able to return to exact emails for each user_id

Advertisement

Answer

Let’s assume that you have a dataframe of users. In spark, one could create a sample of such a dataframe like this:

Now, the logic would be very similar as the one you have in SQL:

Then you can add a .drop("count") or a .select("user_id") to only keep users.

Note that there is no having clause in spark. Once you have called agg to aggregate your dataframe by user, you have a regular dataframe on which you can call any transformation function, such as a filter on the count column here.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement