Is there any way to change data type inside struct column in a table?
Example:
emp_details: struct
emp_id: integer
emp_name: string
If emp_details is a struct-type column in a table, containing emp_id and emp_name, how can I change emp_id to string?
Yes, you can. Explicitly cast the column and rebuild emp_details from the casted column. Once you have the desired dataframe, you can overwrite the table in Databricks to store it with the new schema.
This should look something like this:
from pyspark.sql.functions import col, struct

# For code readability, let's first create the correctly casted column
casted_df = original_df.withColumn("casted_emp_id", col("emp_details.emp_id").cast("string"))
# Rebuild the struct from the renamed casted column and the original nested emp_name
final_df = casted_df.select(struct(col("casted_emp_id").alias("emp_id"), col("emp_details.emp_name").alias("emp_name")).alias("emp_details"))
# Finally, overwrite the table (assuming a Delta table in Databricks; table name is illustrative)
final_df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable("my_table")
Related
I have a PostgreSQL table that has a Date column of type TEXT.
Values look like this: 2019-07-19 00:00
I want to either cast this column to type DATE so I can query based on latest values, etc., or create a new column of type DATE and cast into there (so I have both for the future). I would appreciate any advice on both options!
Hope this isn't a dupe, but I haven't found any answers on SO.
Some context: I will need to add more data later on to the table that only has the TEXT column, which is why I want to keep the original, but I'm open to suggestions.
You can alter the column type with the simple command:
alter table my_table alter my_col type date using my_col::date
This seems to be the best solution as maintaining duplicate columns of different types is a potential source of future trouble.
Note that all values in the column have to be null or be recognizable by Postgres as a date, otherwise the conversion will fail.
However, if you insist on creating a new column, add it and then populate it with an UPDATE:
alter table my_table add my_date_col date;
update my_table
set my_date_col = my_col::date;
Is there a way to ignore values with missing columns when using INSERT INTO in PostgreSQL?
For example:
INSERT INTO tblExample(col_Exist1, col_Exist2, col_NotExist) VALUES ('Val1', 'Val2', 'Val3')
I want to insert a new row containing values Val1 and Val2, but ignore Val3 since its column does not exist, so the result would be:
# | col_Exist1 | col_Exist2
-----------------------------
1 | Val1 | Val2
I see that there is an INSERT ... ON CONFLICT DO NOTHING construct, but this seems to apply to an entire row only, not a single value.
For context: I realise this may not be best practice, but my application uses dynamically created queries based on properties from documents. The properties can vary, but there are lots of columns, so defining them explicitly is painful. Instead, I'm using a 'template' document to define them and, hopefully, I can just ignore properties from other documents that don't exist in the template document.
Thanks in advance.
EDIT: I've figured out a workaround for now - I'm just querying the table to get the list of columns - if the column name exists, add the property to the new INSERT INTO query. The original question still stands.
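That column-filtering workaround can be sketched in plain Python. All names here are illustrative, and the query construction is shown without a database connection; in practice the set of existing columns would come from a one-time query against information_schema.columns, and the %s placeholders match psycopg2-style parameter binding.

```python
# Sketch: keep only the document properties whose keys are real columns,
# then build a parameterized INSERT from the survivors.

def build_insert(table, existing_columns, properties):
    """Return (sql, params) using only properties that match existing columns."""
    cols = [c for c in properties if c in existing_columns]
    placeholders = ", ".join(["%s"] * len(cols))
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    params = [properties[c] for c in cols]
    return sql, params

sql, params = build_insert(
    "tblExample",
    {"col_Exist1", "col_Exist2"},  # discovered once from information_schema
    {"col_Exist1": "Val1", "col_Exist2": "Val2", "col_NotExist": "Val3"},
)
# col_NotExist is silently dropped from both the column list and the params
print(sql)     # INSERT INTO tblExample (col_Exist1, col_Exist2) VALUES (%s, %s)
print(params)  # ['Val1', 'Val2']
```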
What about moving the document's data to JSON?
Create one table with the following fields:
Table: Documents
id: uuid4
name: varchar or text
data: json type, per https://www.postgresql.org/docs/devel/datatype-json.html
After this trick you can store any dynamic data you'd like.
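A self-contained sketch of that pattern, using Python's built-in sqlite3 so it runs anywhere; in Postgres you would declare data as json or jsonb and query it with operators like ->>. The table and document contents are made up for illustration.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# "data" holds the serialized document; no per-property columns needed
conn.execute("CREATE TABLE Documents (id TEXT PRIMARY KEY, name TEXT, data TEXT)")

# Any property set fits, regardless of what a "template" document defines
doc = {"col_Exist1": "Val1", "col_Exist2": "Val2", "col_NotExist": "Val3"}
conn.execute(
    "INSERT INTO Documents (id, name, data) VALUES (?, ?, ?)",
    (str(uuid.uuid4()), "example.doc", json.dumps(doc)),
)

# Read a property back out by deserializing the stored JSON
row = conn.execute(
    "SELECT data FROM Documents WHERE name = ?", ("example.doc",)
).fetchone()
print(json.loads(row[0])["col_NotExist"])  # Val3
```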
I have multiple CSV files in a folder, like employee.csv, student.csv, etc., all with headers.
I also have tables for all the files (both the header and the table column names are the same).
employee.csv
id|name|is_active
1|raja|1
2|arun|0
student.csv
id|name
1|raja
2|arun
Table Structure:
employee:
id INT, name VARCHAR, is_active BIT
student:
id INT, name VARCHAR
Now I'm trying to run a copy activity over all the files using a ForEach activity.
The student table copied successfully, but the employee table was not copied; an error is thrown while reading the employee.csv file.
Error Message:
{"Code":27001,"Message":"ErrorCode=TypeConversionInvalidHexLength,Exception occurred when converting value '0' for column name 'is_active' from type 'String' (precision:, scale:) to type 'ByteArray' (precision:0, scale:0). Additional info: ","EventType":0,"Category":5,"Data":{},"MsgId":null,"ExceptionType":"Microsoft.DataTransfer.Common.Shared.PluginRuntimeException","Source":null,"StackTrace":"","InnerEventInfos":[]}
Use a Data Flow activity.
In the data flow, select the Source.
After this, add a Derived Column transformation and change the datatype of the is_active column from BIT to String.
For example, if a Salary column has a string datatype, you can change it to integer the same way.
To modify the datatype, use the expression builder; here you can use toString(is_active).
This way you can change the datatype before the sink.
In the last step, set the Sink to PostgreSQL and run the pipeline.
I have a table with an INTEGER Column which has NOT NULL constraint and a DEFAULT value = 0;
I need to copy data from a series of csv files.
In some of these files this column is an empty string.
So far, I have set the NULL parameter in the COPY command to a non-existent value so that an empty string is not converted to NULL, but now I get an error saying that an empty string is an invalid value for the INTEGER column.
I would like to use COPY command because of its speed, but maybe it is not possible.
The file contains no header. All columns in the file have their counterparts in the table.
Is there a way to specify that:
an empty string is zero, or
if there is an empty string, the default column value is used?
You could create a view on the table that does not contain the column and create an INSTEAD OF INSERT trigger on it. When you COPY data into that view, the default value will be used for the table. I don't know whether the performance will be good enough.
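A sketch of that setup, with one variation: since the asker's files do contain the problem column, the view below keeps it but exposes it as text, so the CSV column count still matches and the trigger can map empty strings to the default. All object names are hypothetical, and the default value 0 is hardcoded in the trigger rather than read from the catalog.

```sql
-- Hypothetical target table; int_col is the NOT NULL DEFAULT 0 column
CREATE TABLE my_table (
    id      integer,
    int_col integer NOT NULL DEFAULT 0
);

-- The view exposes int_col as text so COPY accepts the raw file value
CREATE VIEW my_table_load AS
    SELECT id, int_col::text AS int_col FROM my_table;

CREATE FUNCTION my_table_load_ins() RETURNS trigger AS $$
BEGIN
    -- '' becomes NULL via nullif, which coalesce replaces with the default 0
    INSERT INTO my_table (id, int_col)
    VALUES (NEW.id, coalesce(nullif(NEW.int_col, '')::integer, 0));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_load_ins_trg
    INSTEAD OF INSERT ON my_table_load
    FOR EACH ROW EXECUTE FUNCTION my_table_load_ins();

-- COPY targets the view instead of the table
COPY my_table_load FROM '/path/to/data.csv' WITH (FORMAT csv);
```

On PostgreSQL versions before 11, the trigger would use EXECUTE PROCEDURE instead of EXECUTE FUNCTION.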
Is there a way to rename a table column such that all references to that column in existing functions are automatically updated?
e.g. Doing this
ALTER TABLE public.person RENAME COLUMN name TO firstname;
would automatically change a reference like the following in any function:
return query
select * from person where name is null;
Since function bodies are just strings, there is no way to automatically change references to columns in function bodies when you rename a column.
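There is no automatic rewrite, but before renaming you can at least locate the functions whose stored source mentions the old column name. This is a plain substring search over pg_proc.prosrc, so expect false positives (comments, unrelated identifiers):

```sql
-- Find functions in the public schema whose body mentions the old name
SELECT n.nspname, p.proname
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
  AND p.prosrc ILIKE '%name%';  -- substitute the old column name
```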