I can use neither PySpark nor Scala; I can only write SQL code. I have a table with two columns, item_id and name.
item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5
I want to generate a result where the names belonging to each item_id are concatenated with a '-' separator:

item_id  names
1        name1-name2-name3
2        name4-name5

How do I create such a table with Spark SQL?
Answer
The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R or SQL), you can fairly easily translate it into the other variants.
The following SQL statement does what you ask for:
-- collect all names for each item_id into an array, then join them with '-'
SELECT item_id, array_join(collect_list(name), '-') AS names
FROM tableName
GROUP BY item_id
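One caveat: collect_list gives no ordering guarantee once data has been shuffled, so the names may come out in any order. If you need a deterministic result, a variant that sorts the collected array first could look like the sketch below (concat_ws also accepts an array of strings directly, so it can replace array_join here):

-- sort the collected names so the output order is deterministic
SELECT item_id,
       concat_ws('-', sort_array(collect_list(name))) AS names
FROM tableName
GROUP BY item_id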
In spark-shell it gives the following result:
scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show
+-------+-----------------+
|item_id| names|
+-------+-----------------+
| 1|name1-name2-name3|
| 2| name4-name5|
+-------+-----------------+
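For completeness, the sample table (registered as so in the spark-shell session above) can be recreated with plain SQL as a temporary view; a minimal sketch using Spark's inline VALUES syntax:

-- register the sample data as a temporary view named 'so'
CREATE OR REPLACE TEMPORARY VIEW so AS
SELECT * FROM VALUES
  (1, 'name1'), (1, 'name2'), (1, 'name3'),
  (2, 'name4'), (2, 'name5')
AS t(item_id, name)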