How to merge rows using SQL only?

I can use neither PySpark nor Scala; I can only write SQL code. I have a table with two columns, item_id and name:

item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5

I want a result where the names for each item_id are concatenated with a dash:

item_id     names
1           name1-name2-name3
2           name4-name5

How do I create such a table with Spark SQL?

Answer

The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R or SQL), you can usually work out the other variants from it.
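
For instance, the aggregation needed here can be written in the Scala DataFrame API as the minimal spark-shell sketch below (the sample data from the question is built inline, so the snippet is self-contained); the SQL translation follows it.

import org.apache.spark.sql.functions.{array_join, collect_list}
import spark.implicits._

// Build a DataFrame matching the question's sample data.
val df = Seq((1, "name1"), (1, "name2"), (1, "name3"), (2, "name4"), (2, "name5"))
  .toDF("item_id", "name")

// collect_list gathers the names per item_id; array_join glues them with '-'.
df.groupBy("item_id")
  .agg(array_join(collect_list("name"), "-").as("names"))
  .show()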

The following SQL statement seems to do what you ask for:

SELECT item_id, array_join(collect_list(name), '-') as names 
FROM tableName
GROUP BY item_id

In spark-shell, with the table registered under the name so, it gives the following result:

scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show
+-------+-----------------+
|item_id|            names|
+-------+-----------------+
|      1|name1-name2-name3|
|      2|      name4-name5|
+-------+-----------------+
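
As a side note, the array_join-over-collect_list pair can be collapsed into a single call with concat_ws, which in Spark SQL accepts array<string> arguments as well as plain strings; the variant below should give the same result:

scala> sql("select item_id, concat_ws('-', collect_list(name)) as names from so group by item_id").show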