I have string data in the form 2020-10-21 12:49:27.090 and I want to cast it as a timestamp. When I do this: select cast(column_name as timestamp) as column_name from table_name all of the milliseconds are dropped, like this: 2020-10-21 12:49:27 I also tried this: select cast(date_format(column_name, 'yyyy-MM-dd HH:mm:ss.SSS') as timestamp) as column_name from table_name and the same problem persists, it drops the
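Hive's TIMESTAMP type can hold fractional seconds, so one possibility worth ruling out is that the milliseconds survive the cast and are only being hidden by the client that displays the result. A minimal check, assuming the same column_name and table_name as in the question, is to format the cast value back to text with an explicit millisecond pattern (date_format is available from Hive 1.2.0). This is a diagnostic sketch, not a confirmed fix.

```sql
-- Diagnostic sketch: column_name / table_name are the placeholders from the question.
SELECT
  column_name                                                             AS raw_text,
  CAST(column_name AS TIMESTAMP)                                          AS ts,
  -- Render the timestamp back to a STRING with an explicit .SSS pattern,
  -- so the fractional part is visible regardless of the client's display.
  date_format(CAST(column_name AS TIMESTAMP), 'yyyy-MM-dd HH:mm:ss.SSS')  AS ts_text
FROM table_name
LIMIT 10;
```

If ts_text shows .090 while ts does not, the value itself is intact and only the display is truncated. Note that ts_text is a STRING, so it is only suitable for output, not for further timestamp arithmetic.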
Tag: hive
Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns
I have a dataset of hotel bookings. date_in has the format "yyyy-MM-dd". I need to select the top 10 most visited hotels by month. I get the following error: Error: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line
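The error message says each windowing group must depend only on input columns. One common way to satisfy that is to compute the monthly visit counts in an inner query first and apply the window function to those precomputed counts in an outer query. The sketch below assumes a hypothetical table bookings with columns hotel_id and date_in (a "yyyy-MM-dd" string); adjust the names to your schema.

```sql
-- Sketch: hypothetical table bookings(hotel_id, date_in STRING 'yyyy-MM-dd').
SELECT month, hotel_id, visits
FROM (
  -- Step 2: rank hotels within each month using the precomputed counts.
  SELECT month, hotel_id, visits,
         ROW_NUMBER() OVER (PARTITION BY month ORDER BY visits DESC) AS rn
  FROM (
    -- Step 1: plain aggregation, no window functions yet.
    SELECT substr(date_in, 1, 7) AS month,   -- 'yyyy-MM'
           hotel_id,
           COUNT(*) AS visits
    FROM bookings
    GROUP BY substr(date_in, 1, 7), hotel_id
  ) monthly
) ranked
WHERE rn <= 10;
```

ROW_NUMBER returns exactly 10 rows per month even when counts tie; swapping in RANK would keep all tied hotels at the cut-off instead.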
How to UPDATE a value in hive table?
I have a flag column in a Hive table that I want to update after some processing. I tried the query below in both Hive and Impala, but it didn't work, and the error said it needs to be a Kudu table …
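In Hive, UPDATE statements only work on transactional (ACID) tables. For an ordinary table, a common workaround is to rewrite the table with INSERT OVERWRITE and change the flag in the SELECT. The sketch below assumes a hypothetical non-partitioned table my_table(id, payload, flag) and a hypothetical table processed_ids that lists each processed id once; the SELECT list must match my_table's column order.

```sql
-- Sketch: my_table(id, payload, flag) and processed_ids(id) are hypothetical.
-- Every row is rewritten; only flag changes for ids found in processed_ids.
INSERT OVERWRITE TABLE my_table
SELECT
  t.id,
  t.payload,
  CASE WHEN p.id IS NOT NULL THEN 'Y' ELSE t.flag END AS flag
FROM my_table t
LEFT JOIN processed_ids p      -- assumes p.id is unique, otherwise rows duplicate
  ON t.id = p.id;
```

This rewrites the whole table, which is acceptable for small or occasional updates. For frequent row-level changes, converting the table to a transactional ORC table so that UPDATE is allowed is the more natural route.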
In Hive, how to read through NULL / empty tags present within an XML using explode(XPATH(..)) function?
In the Hive query below, I need to read the null / empty "string" tags from the XML content as well. Currently only the non-null "string" tags are included in the XPATH() list….
Hive: group by calculated column
I need to execute a query like select myUsualField, SOME_FUNCTION(myAnotherField) as myUnusualField from MYTABLE group by myUsualField, myUnusualField In Hive this query fails: it cannot find field …
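Hive resolves GROUP BY before the SELECT aliases exist, so the alias myUnusualField is not visible there. A minimal sketch, reusing the question's placeholder names (SOME_FUNCTION stands for whatever function is actually applied), is to repeat the expression itself in the GROUP BY clause:

```sql
-- Group by the expression, not its alias.
SELECT myUsualField,
       SOME_FUNCTION(myAnotherField) AS myUnusualField
FROM MYTABLE
GROUP BY myUsualField, SOME_FUNCTION(myAnotherField);
```

An equivalent alternative is to compute the expression in a subquery and group by the resulting column in the outer query, which avoids repeating long expressions.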
How to count all rows in raw data file using Hive?
I am reading some raw input which looks something like this: Note the first two rows are "good" rows and the last two rows are "bad" rows since they are missing some data. Here is the snippet of my Hive query which reads this raw data into a read-only external table: I need to get the count of ALL
AWS Athena custom data format?
I'd like to query my app logs on S3 with AWS Athena, but I'm having trouble creating the table/specifying the data format. This is how the log lines look: 2020-12-09T18:08:48.789Z {"reqid":…
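Since each line is an ISO-8601 timestamp, a space, and then a JSON object, one option is to split the line with RegexSerDe into two STRING columns and parse the JSON half at query time. The sketch below assumes that layout for every line; the table name app_logs and the S3 location are hypothetical placeholders.

```sql
-- Sketch: app_logs and the bucket path are hypothetical.
CREATE EXTERNAL TABLE app_logs (
  log_time STRING,   -- e.g. 2020-12-09T18:08:48.789Z
  payload  STRING    -- the JSON object that follows the timestamp
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- group 1: everything up to the first space; group 2: the rest of the line
  'input.regex' = '^(\\S+) (.*)$'
)
LOCATION 's3://example-bucket/app-logs/';

-- Extract fields from the JSON part when querying.
SELECT
  from_iso8601_timestamp(log_time)        AS ts,
  json_extract_scalar(payload, '$.reqid') AS reqid
FROM app_logs
LIMIT 10;
```

Keeping the JSON as a raw string makes the table tolerant of varying keys; if the keys are fixed, they could instead be exposed as typed columns in a view over app_logs.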
Performance difference with Where condition in subquery/cte
Is there a performance difference between applying the WHERE condition to a subquery data source and applying it to the joined statement? Let's say I have two Hive tables A and B which are both partitioned on the field date. Is that query's performance the same as the following? Answer The
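Hive's optimizer can often push such filters down to the table scans, but the reliable way to check for a specific query is to compare what Hive plans to read. EXPLAIN DEPENDENCY lists the input tables and partitions, so partition pruning shows up directly. In the sketch below, the join key id, the column val, and the date literal are hypothetical; date is quoted with backticks because it is a reserved word in newer Hive versions.

```sql
-- Form 1: filter inside the subqueries.
EXPLAIN DEPENDENCY
SELECT a.id, b.val
FROM (SELECT * FROM A WHERE `date` = '2020-01-01') a
JOIN (SELECT * FROM B WHERE `date` = '2020-01-01') b
  ON a.id = b.id;

-- Form 2: filter after the join.
EXPLAIN DEPENDENCY
SELECT a.id, b.val
FROM A a
JOIN B b
  ON a.id = b.id
WHERE a.`date` = '2020-01-01'
  AND b.`date` = '2020-01-01';
```

If both forms report the same input_partitions, the filter was pushed down in both cases and the scan cost should be comparable; running plain EXPLAIN on each shows the rest of the plan.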
Selecting most recent rows in a SQL query
I want to join two tables, selecting the most recent rows for an ID value present in table 1, i.e., for each ID value in table 1, only return the most recently added row for that ID value. For example, table 1 looks something like this: So if the same ID value is found twice in this table, only return
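A common pattern for "latest row per ID" is to number the rows within each ID by recency and keep only the first one before joining. The sketch below assumes hypothetical columns in table 1 (id, value, added_at, where added_at records when the row was added) and that table 2 shares the id column; the names are illustrative only.

```sql
-- Sketch: table1(id, value, added_at) and table2(id, ...) are hypothetical.
SELECT t2.*, latest.value, latest.added_at
FROM (
  -- rn = 1 marks the most recently added row for each id.
  SELECT id, value, added_at,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY added_at DESC) AS rn
  FROM table1
) latest
JOIN table2 t2
  ON t2.id = latest.id
WHERE latest.rn = 1;
```

Filtering on rn = 1 before the join keeps exactly one table 1 row per ID, so the join cannot multiply rows because of older duplicates.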
Exclude records with certain values in Qubole
Using Qubole, I have Table A (columns in json parsed…) I need to select only IDs which have Recommendation GOOD but Decision BAD. Therefore the output should be 3. I tried: Answer Use analytic functions. Demo: Result: