I can use neither PySpark nor Scala; I can only write SQL. I have a table with two columns, item_id and name:
item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5
I want to generate a result where all the names for each item_id are concatenated:
item_id  names
1        name1-name2-name3
2        name4-name5
How do I create such a table with Spark SQL?
Answer
The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R or SQL), you can usually translate it into the other variants.
The following SQL statement seems to do what you ask for:
SELECT item_id, array_join(collect_list(name), '-') AS names
FROM tableName
GROUP BY item_id
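If you want to reproduce this end to end, here is a minimal setup sketch; the view name tableName and the sample rows are assumptions taken from the question, not an existing table:

-- Minimal setup sketch: a temporary view holding the sample
-- data from the question (view name tableName is hypothetical)
CREATE OR REPLACE TEMPORARY VIEW tableName AS
SELECT * FROM VALUES
  (1, 'name1'), (1, 'name2'), (1, 'name3'),
  (2, 'name4'), (2, 'name5')
AS t(item_id, name);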
In spark-shell it gives the following result:
scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show +-------+-----------------+ |item_id| names| +-------+-----------------+ | 1|name1-name2-name3| | 2| name4-name5| +-------+-----------------+