
How to change the data type of a csv column

Using Databricks with SQL, I have to import my csv dataset into a table and analyse the data. My problem is that after importing the csv dataset, all columns are of String type, but some of them need to be Numeric. How can I solve this?

How can I define the column types of a csv file? I tried converting the file to xlsx and setting the numeric types there, but then it’s not possible to convert it back to csv (or I don’t know how).

Thanks for helping

PS: Databricks accepts only csv files, not xlsx or similar.


Answer

  1. If you are using Databricks on Azure, when you select “Create table with UI” there are options to choose a data type for each column, as in Screenshot A below (a code equivalent that declares the types explicitly is sketched after the screenshot).

  2. If you are importing the table with Python Spark code, there is an option, inferSchema, that you can set. If it is set to “true”, all columns that contain only numeric values will be given appropriate numeric data types, as in the snippet below.

    # location and type of the uploaded file
    file_location = "/FileStore/shared_uploads/xxx/dbo_project.csv"
    file_type = "csv"

    # CSV options
    infer_schema = "true"
    first_row_is_header = "false"
    delimiter = ","

    # read the csv into a DataFrame, letting Spark infer the column types
    df = spark.read.format(file_type) \
        .option("inferSchema", infer_schema) \
        .option("header", first_row_is_header) \
        .option("sep", delimiter) \
        .load(file_location)
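Once the DataFrame is loaded, you can check which types Spark inferred and register the DataFrame as a temporary view so the dataset can be analysed with SQL. A minimal sketch (the view name dbo_project is just an example):

    # show the inferred data type of each column
    df.printSchema()

    # register the DataFrame as a temporary view so it can be queried with SQL
    df.createOrReplaceTempView("dbo_project")
    spark.sql("SELECT COUNT(*) FROM dbo_project").show()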

Screenshot A: “Create table with UI” in Databricks
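If you would rather declare the column types yourself than rely on the UI or on schema inference, Spark’s CSV reader also accepts an explicit schema. A minimal sketch, assuming hypothetical column names id and amount; replace them with the columns of your own csv:

    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # hypothetical schema; adjust the names and types to match your csv
    schema = StructType([
        StructField("id", StringType(), True),
        StructField("amount", DoubleType(), True),
    ])

    # explicit types instead of inferSchema
    df = spark.read.format("csv") \
        .option("header", "false") \
        .option("sep", ",") \
        .schema(schema) \
        .load("/FileStore/shared_uploads/xxx/dbo_project.csv")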
