Skip to content
Advertisement

SQL query to replace multiple occurrences with the most frequent ones

I have a products table, which has the product description in two languages, one in English and one in an alternate language.
Let’s say:

Product_Desc Product_Desc_Alt
A A1
A A2
A A1
A A3
B B1
B B2
B B2
C C1

If I do a GROUP BY statement, there are multiple alternate language occurrences for the same product. So, let’s say:

Product_Desc CNT
A 3
B 2
C 1

I would like to replace the Product_Desc_Alt occurrences with the most frequent ones, so for example I would like the output to be:

Product_Desc Product_Desc_Alt
A A1
A A1
A A1
A A1
B B2
B B2
B B2
C C1

Obviously, if a product has only one alter lang description, just keep that one.

There may be lots of ways to do that, but I can’t think of one.

I am using Azure Databricks so this could also happen with PySpark, but I am interested in doing this the SQL way.
Thanks a lot!

Advertisement

Answer

Is this what you mean?

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement