I have created six tables: A, B, C, D, E and F, where A is the parent table. I want to create a trigger that inserts data into table F after data has been successfully inserted into tables A, B, C, D and E. The row in F should contain values from table A (so the trigger should effectively be defined on table A).
In addition, one column in table F is json, so I need to combine three columns from table A (which are character varying) and insert them into the json field of table F.
The data in the json should appear as column name : value pairs.
Please suggest the right approach.
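One possible approach is an AFTER INSERT row trigger on table A that builds the json value with json_build_object, which produces exactly the column name : value pairs you describe. A minimal sketch, assuming A has a key column id and three character varying columns col1, col2, col3, and that F has columns a_id and payload (all of these names are placeholders, not taken from the question):

CREATE OR REPLACE FUNCTION a_to_f() RETURNS trigger AS $$
BEGIN
    -- json_build_object yields {"col1": "...", "col2": "...", "col3": "..."}
    INSERT INTO f (a_id, payload)
    VALUES (NEW.id,
            json_build_object('col1', NEW.col1,
                              'col2', NEW.col2,
                              'col3', NEW.col3));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_a_to_f
AFTER INSERT ON a
FOR EACH ROW EXECUTE FUNCTION a_to_f();

Note that EXECUTE FUNCTION requires PostgreSQL 11 or later (older versions use EXECUTE PROCEDURE), and that a plain trigger on A fires as soon as the row in A is inserted, regardless of whether B, C, D and E have been populated yet.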
I have a dynamic SQL query where I need to select two columns (say A and B) from a table. The result set should be produced only if column B has at least one non-zero value; if there is no non-zero value in column B, the result set should be empty.
It should be that simple, just as you wrote the rule:
select a, b
from the_table
where exists (select from the_table where b <> 0);
I have table A and table B. Every five minutes I sync table A into table B using
insert into B (col_1) select col_1 from A on conflict (col_1) do update set col_1 = excluded.col_1;
This works great to keep B up to date with A. Except it's possible that A has a record, then it gets inserted into B, and then deleted from A.
How can I ensure B does not contain any records that are not in A?
It would be better practice to use logical replication: https://www.postgresql.org/docs/current/logical-replication.html. With your current setup you could run:
delete from b where not exists (select 1 from a where a.col_1 = b.col_1);
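Putting the upsert and the cleanup together, a minimal sketch of the five-minute sync, assuming col_1 is the key and B has a unique constraint on it (DO NOTHING is equivalent here because only col_1 is copied, so there is nothing left to update on conflict):

begin;

-- upsert rows from A into B (requires a unique constraint on B.col_1)
insert into b (col_1)
select col_1 from a
on conflict (col_1) do nothing;

-- remove rows from B that no longer exist in A
delete from b
where not exists (select 1 from a where a.col_1 = b.col_1);

commit;

Running both statements in one transaction keeps readers of B from seeing a state where stale rows have been deleted but new rows have not yet arrived.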
How to insert data into a Delta table with a changing schema in Databricks.
In Databricks Scala, I'm exploding a Map column and loading it into a delta table. I have a predefined schema of the delta table.
Let's say the schema has 4 columns A, B, C, D.
So, on day 1 I'm loading my dataframe with 4 columns into the delta table using the code below.
loadfinaldf.write.format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .insertInto("table")
The columns in the dataframe change every day. For instance, on day 2 two new columns E and F are added and there is no C column. Now I have 5 columns A, B, D, E, F in the dataframe. When I load this data into the delta table, columns E and F should be dynamically created in the table schema, the corresponding data should load into these two columns, and column C should be populated as NULL. I assumed that spark.conf.set("spark.databricks.delta.schema.autoMerge","true") would do the job, but I'm unable to achieve this.
My approach:
I was thinking of listing the predefined delta table schema and the dataframe schema and comparing the two before loading the data into the delta table.
Can you use some Python logic?
result = pd.concat([df1, df2], axis=1, join="inner")
Then, push your dataframe into a dynamically created SQL table?
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
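Alternatively, Delta Lake has built-in schema evolution for append writes. A minimal Scala sketch, assuming the target is a Delta table and that writing with saveAsTable (or a path) instead of insertInto is acceptable, since insertInto matches columns by position and does not add new ones; note that the documented config name carries an .enabled suffix:

// enable automatic schema merging for the session (optional)
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

// per-write schema evolution: new columns (e.g. E, F) are added to the
// table schema, and columns missing from the dataframe (e.g. C) read back as NULL
loadfinaldf.write
  .format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .saveAsTable("table")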
Let's say I have four tables A, B, C and D.
All of these have exactly the same 4 columns:
last_modified_by
last_modified_time
active
inactive_date
So in order to avoid code duplication, I did:
CREATE TABLE X(
last_modified_by,
last_modified_time,
active,
inactive_date);
Now A, B, C and D will be something like:
CREATE TABLE A (
...,
...,
) INHERITS (X);
Now I want to partition table A by the field active. So I would do:
CREATE TABLE A (
...,
...,
) INHERITS (X) PARTITION BY LIST (active);
But this fails with the error: cannot create partitioned table as inheritance child
So how should I do this?
Don't use inheritance just to avoid code duplication. You may have unpleasant surprises, e.g. if you want to change the data type of a column for only one table.
Besides, it won't work together with partitioning, because both use the same technology under the hood.
To avoid code duplication, you can use
CREATE TABLE b (LIKE a);
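For example, a minimal sketch combining LIKE with partitioning (the column types and the extra columns id and name are assumptions, not taken from the question):

CREATE TABLE x (
    last_modified_by   text,
    last_modified_time timestamptz,
    active             boolean,
    inactive_date      date
);

-- a copies the column definitions from x and adds its own columns;
-- it can be partitioned because it is not an inheritance child
CREATE TABLE a (
    LIKE x INCLUDING DEFAULTS,
    id   bigint,
    name text
) PARTITION BY LIST (active);

CREATE TABLE a_active   PARTITION OF a FOR VALUES IN (true);
CREATE TABLE a_inactive PARTITION OF a FOR VALUES IN (false);

Unlike INHERITS, LIKE only copies the definitions at creation time, so later changes to x do not propagate, which is usually what you want when the tables merely share a column layout.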
I have two partitioned tables. Table A is my main table and Table B is full of columns that are exact copies of some of the columns in Table A. However, there is one column in Table B that has data I need, because the matching column in Table A is full of nulls.
I would like to get rid of Table B completely, since most of it is redundant, and update the matching column in Table A with the data from the one column in Table B.
Visually,
Table A:              Table B:
a  b     c   d        a  b    d
__________________    ______________
1  null  11  A        1  joe  A
2  null  22  B        2  bob  B
3  null  33  C        3  sal  C
I want to fill the b column in Table A with the values from the b column in Table B, and then I no longer need Table B and can delete it. I will have to do this repeatedly since these two tables are given to me daily from two separate sources.
I cannot key these tables, since they are both partitioned.
I have tried:
update columnb:(exec columnb from TableB) from TableA;
but I get a `length error.
Suggestions on how to approach this in any manner are appreciated.
To replace a column in memory you would do the following.
t1:([]a:1 2 3;b:0N)
a b
---
1
2
3
t2:([]c:`aa`bb`cc;b:5 6 7)
c b
----
aa 5
bb 6
cc 7
t1,'t2
a b c
------
1 5 aa
2 6 bb
3 7 cc
If you are getting length errors then the column vectors do not have the same count, and the following would solve it. The obvious problem with this solution is that it will start to repeat data if t2 has a lower row count than t1. You will have to find out why that is.
t1,'count[t1]#t2
Now for partitions, you will use the amend function to change the b column of the partitioned table, table A, at date 2007.02.23 (or whatever date your partition is). This loads the b column of tableB into memory to perform the amend. You must perform the amend for each partition.
#[`:2007.02.23/tableA/;`b;:;count[tableA]#exec b from select b from tableB where date=2007.02.23]