
Tag: apache-spark-sql

Extract time from Date Time as a separate column

My table looks like this:

DateTime             ID
2010-12-01 08:26:00  34
2010-12-01 09:41:00  42

I want to extract the time from DateTime, create a third column from it, and then group it with frequency counts. Is there a way to do this in SQL? I'm using Apache Spark with inline SQL. I have achieved the equivalent using Spark functions…
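The excerpt cuts off before the answer, so here is a minimal runnable sketch of the idea using Python's stdlib sqlite3 (an illustration only — the table name and sample rows are assumptions; in Spark SQL the time part would typically be extracted with date_format(DateTime, 'HH:mm:ss') rather than sqlite's time()):

```python
import sqlite3

# Hypothetical rows mirroring the table in the question,
# plus one extra row so the frequency counts are non-trivial.
rows = [
    ("2010-12-01 08:26:00", 34),
    ("2010-12-01 09:41:00", 42),
    ("2010-12-02 08:26:00", 7),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (DateTime TEXT, ID INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Extract the time part as its own column, then group by it to get
# frequency counts. In Spark SQL the analogous expression would be
# date_format(DateTime, 'HH:mm:ss').
result = con.execute(
    "SELECT time(DateTime) AS TimeOnly, COUNT(*) AS freq "
    "FROM t GROUP BY TimeOnly ORDER BY freq DESC"
).fetchall()
print(result)  # [('08:26:00', 2), ('09:41:00', 1)]
```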

Escaped single quote ignored in SELECT clause

Not sure why the escaped single quote doesn't appear in the SQL output. I initially tried this in a Jupyter notebook, but reproduced it in the PySpark shell below. The output shows Bobs home instead of Bob's home. Answer: Use a backslash, rather than another single quote, to escape the single quote. Alternatively, you can surround the string with double quotes, so that you…
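The backslash fix in the answer is Spark-specific. As a runnable illustration of quote escaping in general, here is a sketch with Python's stdlib sqlite3, which follows the standard SQL rule of doubling the quote (the Spark-only forms are shown in comments, not executed):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Standard SQL escapes a single quote inside a string literal by
# doubling it: '' becomes a literal '.
row = con.execute("SELECT 'Bob''s home'").fetchone()
print(row[0])  # Bob's home

# Spark SQL additionally accepts a backslash escape, or a
# double-quoted string literal, e.g.:
#   SELECT 'Bob\'s home'
#   SELECT "Bob's home"
```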

Change null to empty array in databricks SQL?

I have a JSON column whose value is sometimes entirely null in an Azure Databricks table. The full process to get to JSON_TABLE is: read the parquet, infer the schema of the JSON column, convert the column from a JSON string to a deeply nested structure, and explode any arrays within it. I am working in SQL with Python-defined UDFs (json_exists() checks the schema to…
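One common Spark SQL approach for this is coalesce(col, array()), which substitutes an empty array whenever the column is null. As a rough stdlib-Python sketch of that coalescing idea (the sample values are assumptions, and this is an analogy, not Databricks code):

```python
import json

# Hypothetical JSON column values: sometimes an array, sometimes null,
# and sometimes the cell itself is missing.
raw_values = ['[1, 2, 3]', 'null', None]

def as_array(raw):
    """Parse a JSON string and coalesce null (or a missing value) to [].

    Mirrors the effect of coalesce(col, array()) in Spark SQL.
    """
    if raw is None:
        return []
    parsed = json.loads(raw)
    return parsed if parsed is not None else []

print([as_array(v) for v in raw_values])  # [[1, 2, 3], [], []]
```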

How to use where clause referencing a column when querying a JSON object in another column in SQL

I have the following sales table with a nested JSON object:

sale_id                               sale_date   identities
41acdd9c-2e86-4e84-9064-28a98aadf834  2017-05-13  {"SaleIdentifiers": [{"type": "ROM", "Classifier": "CORNXP21RTN"}]}

To query the Classifier I do the following: This gives me the result:

Classifier
CORNXP21RTN

How would I go about using the sale_date column in a WHERE clause? For instance, this shows me a list of the classifiers in…
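The shape of the query — explode the SaleIdentifiers array, then filter on the sibling sale_date column — can be sketched in plain Python with the stdlib json module (the second row is a made-up example added so the filter has something to exclude; this illustrates the logic, not the actual Spark SQL answer):

```python
import json

# Hypothetical rows mirroring the sales table in the question.
sales = [
    ("41acdd9c-2e86-4e84-9064-28a98aadf834", "2017-05-13",
     '{"SaleIdentifiers": [{"type": "ROM", "Classifier": "CORNXP21RTN"}]}'),
    ("00000000-0000-0000-0000-000000000000", "2018-01-02",
     '{"SaleIdentifiers": [{"type": "ROM", "Classifier": "OTHER123"}]}'),
]

# Keep only rows matching the sale_date, then flatten each row's
# SaleIdentifiers array — the same shape as exploding the array and
# adding WHERE sale_date = '2017-05-13' in SQL.
classifiers = [
    ident["Classifier"]
    for sale_id, sale_date, identities in sales
    if sale_date == "2017-05-13"
    for ident in json.loads(identities)["SaleIdentifiers"]
]
print(classifiers)  # ['CORNXP21RTN']
```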
