Which SQL function can I use to merge datasets? - merge

I keep getting an error when attempting to merge datasets. Is there an inner function that I could use to avoid errors in the future?
Attempted to use merge info

Related

We have an issue running our Dataprep pipeline using joins of reference datasets

It seems that flows using the union of a reference dataset fail, whereas the Dataflow console shows a successful execution. Our flow is based on a reference dataset union. When the union is enabled in a recipe, the output fails without any explicit message (even though the processing looks fine in Dataflow).
I tried reducing the data to a single row in each branch before the union, but it still fails.

How to perform a merge operation on dataset/dataframe in sparksql - Python

In SQL, you can perform MERGE operation on a table and insert/update data into it. Example: merge in snowflake
Are we able to do something similar in pyspark.sql?
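To make the matched/not-matched semantics of a MERGE concrete, here is a minimal sketch. It is not PySpark: it uses Python's built-in sqlite3 module as a stand-in, and the table names (target, source), the columns and the helper merge_updates are all hypothetical. It only shows the two halves a MERGE combines, an update for keys that already exist and an insert for keys that do not.

```python
import sqlite3


def merge_updates(target_rows, source_rows):
    """Emulate MERGE semantics: update matched keys, insert unmatched ones.

    Hypothetical helper for illustration; tables and columns are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO target (id, value) VALUES (?, ?)", target_rows)
    conn.executemany("INSERT INTO source (id, value) VALUES (?, ?)", source_rows)

    # "WHEN MATCHED THEN UPDATE": overwrite rows whose key exists in source.
    conn.execute(
        """
        UPDATE target
        SET value = (SELECT s.value FROM source AS s WHERE s.id = target.id)
        WHERE id IN (SELECT id FROM source)
        """
    )
    # "WHEN NOT MATCHED THEN INSERT": add rows whose key is missing in target.
    conn.execute(
        """
        INSERT INTO target (id, value)
        SELECT s.id, s.value
        FROM source AS s
        WHERE s.id NOT IN (SELECT id FROM target)
        """
    )
    conn.commit()

    rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
    conn.close()
    return rows
```

For example, merging a source of [(2, "B"), (3, "c")] into a target of [(1, "a"), (2, "b")] leaves key 1 untouched, updates key 2 and inserts key 3.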

Why is it not possible to use COMMIT and ROLLBACK in a PostgreSQL procedure?

I am creating procedures on a PostgreSQL database. I read that it is not possible to use ROLLBACK inside these procedures.
Why?
Is it possible to use commit?
I guess that this is related to the ACID properties, but what if we have two insert operations in a procedure? If the second fails, does the second one get rolled back?
Thank you.
Postgres' overview gives a hint by explaining how its functions differ from traditional stored procedures:
Functions created with PL/pgSQL can be used anywhere that built-in functions could be used. For example, it is possible to create complex conditional computation functions and later use them to define operators or use them in index expressions.
This makes it awkward to support transactions within functions for every possible situation. From the docs:
Functions and trigger procedures are always executed within a transaction established by an outer query... However, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without affecting the outer transaction.
If you were to have two INSERTs in a function, the first can be wrapped in an EXCEPTION block to catch any errors and decide if the second should be executed.
You are correct. You cannot roll back transactions that were started prior to the procedure from within the procedure. In addition, a transaction cannot be started within the procedure either. You would receive this error:
ERROR: cannot begin/end transactions in PL/pgSQL
SQL state: 0A000
Hint: Use a BEGIN block with an EXCEPTION clause instead.
As this error states, and as Matt mentioned, you could use an exception block to essentially perform a rollback. From the help:
When an error is caught by an EXCEPTION clause, the local variables of the PL/pgSQL function remain as they were when the error occurred, but all changes to persistent database state within the block are rolled back.
Alternatively, you could call the procedure from within a transaction, and roll that back if necessary.
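The subtransaction behaviour described in the answers above can be sketched outside PostgreSQL. The example below is an assumption-laden stand-in: it uses SQLite savepoints through Python's sqlite3 module rather than a PL/pgSQL EXCEPTION block, and the table items and the function run_two_inserts are hypothetical. It shows work done before the block surviving while everything inside the failed block is undone.

```python
import sqlite3


def run_two_inserts():
    """Show a savepoint acting as a subtransaction inside an outer transaction.

    Stand-in for a PL/pgSQL BEGIN ... EXCEPTION block; names are hypothetical.
    """
    # isolation_level=None: BEGIN/COMMIT/SAVEPOINT are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    conn.execute("BEGIN")  # outer transaction, like the caller's transaction
    conn.execute("INSERT INTO items (id, name) VALUES (1, 'first')")

    conn.execute("SAVEPOINT inner_block")  # start of the "subtransaction"
    try:
        conn.execute("INSERT INTO items (id, name) VALUES (2, 'second')")
        # Fails with a primary key violation because id 1 already exists.
        conn.execute("INSERT INTO items (id, name) VALUES (1, 'duplicate')")
        conn.execute("RELEASE SAVEPOINT inner_block")
    except sqlite3.IntegrityError:
        # Undo only the changes made since the savepoint, then discard it.
        conn.execute("ROLLBACK TO SAVEPOINT inner_block")
        conn.execute("RELEASE SAVEPOINT inner_block")

    conn.execute("COMMIT")  # the outer transaction still commits
    rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    conn.close()
    return rows
```

Calling run_two_inserts() returns only the row inserted before the savepoint: the row with id 2 was written inside the failed block, so it is rolled back together with the failing statement.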

Subtransactions in functions that can commit

I read that I can use a BEGIN-EXCEPTION block to have a subtransaction in a FUNCTION that can be rolled back. But why is it not possible to commit this subtransaction?
How can I circumvent the "all-or-nothing" transaction behavior of functions written in PL/pgSQL? Is it possible to have the function make commits using subtransactions while the outer transaction could be rolled back?
You can't circumvent it at the time of writing (PG 9.3).
Or more precisely, not directly. You can mimic autonomous subtransactions by using dblink, but be wary that doing so is a can of worms: what's supposed to happen, for instance, if your outer transaction is rolled back?
For background and references to discussions related to the topic on the PG-Hackers list, see:
http://wiki.postgresql.org/wiki/Autonomous_subtransactions
http://www.postgresql.org/message-id/20111218082812.GA14355@leggeri.gi.lan
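The dblink idea, and the can of worms it opens, can be illustrated with a rough sketch. This does not use dblink or PostgreSQL: two independent in-memory SQLite connections play the roles of the outer session and the second connection, and the tables orders and audit_log and the function outer_rollback_with_side_commit are hypothetical. The point is only that work committed on a second connection is not undone when the outer transaction rolls back.

```python
import sqlite3


def outer_rollback_with_side_commit():
    """Commit on a second connection while the outer transaction rolls back.

    The second connection stands in for a dblink connection; unlike dblink,
    the two connections here are separate in-memory databases (an assumption
    made to keep the sketch self-contained).
    """
    main = sqlite3.connect(":memory:", isolation_level=None)
    side = sqlite3.connect(":memory:", isolation_level=None)
    main.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    side.execute("CREATE TABLE audit_log (message TEXT)")

    main.execute("BEGIN")  # outer transaction
    main.execute("INSERT INTO orders (id) VALUES (1)")

    # "Autonomous" work: its own transaction on its own connection.
    side.execute("BEGIN")
    side.execute("INSERT INTO audit_log (message) VALUES ('order 1 created')")
    side.execute("COMMIT")

    main.execute("ROLLBACK")  # the outer transaction is abandoned

    order_count = main.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    log_rows = side.execute("SELECT message FROM audit_log").fetchall()
    main.close()
    side.close()
    return order_count, log_rows
```

After the call, the order is gone but the log entry claiming it was created remains, which is exactly the inconsistency to think through before committing from inside a function.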

SQL. How to exclude a statement from Actual Exec Plan?

SQL 2008. I am running a sproc from SQL Studio, and when I try to get the Actual Execution Plan it blows up tempdb.
I narrowed the problem down to a call to a scalar function which is applied to each row of a 700K-row table.
I deduced that SQL is trying to create 700K exec plans for that function and writes all the data to tempdb, which has 3 GB of free space.
I don't really need to see the plan for that function.
Can I explicitly exclude a statement from generation of the exec plan?
You can't exclude it from an execution plan, other than by removing the call from the query.
It does, however, sound like a prime candidate to switch from a scalar UDF to an inline table UDF. Scalar UDFs can be a big cause of poor performance because they are run once per row in a query.
Have a read through this article which contains an example to demonstrate.
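The "once per row" cost can be seen in a small sketch. It is not SQL Server: it registers a Python function as a scalar UDF in SQLite via sqlite3 and counts the invocations, then computes the same result with the expression written directly in the query. The table prices, the function with_tax and the 1.2 factor are all hypothetical. In SQL Server the analogous fix is the inline table-valued function, whose body is expanded into the calling query instead of being invoked per row.

```python
import sqlite3


def count_udf_calls(n_rows):
    """Compare a per-row scalar UDF with the same logic written inline.

    Returns (number of UDF invocations, total via UDF, total via inline SQL).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (amount REAL)")
    conn.executemany(
        "INSERT INTO prices (amount) VALUES (?)",
        [(float(i),) for i in range(n_rows)],
    )

    calls = 0

    def with_tax(amount):
        # Scalar UDF: the engine calls back into this function for each row.
        nonlocal calls
        calls += 1
        return amount * 1.2

    conn.create_function("with_tax", 1, with_tax)

    # One function invocation per row of the table.
    udf_total = conn.execute("SELECT SUM(with_tax(amount)) FROM prices").fetchone()[0]
    # Same computation expressed as part of the query itself.
    inline_total = conn.execute("SELECT SUM(amount * 1.2) FROM prices").fetchone()[0]

    conn.close()
    return calls, udf_total, inline_total
```

With n_rows set to 1000 the counter reports 1000 invocations, one per row, while the inline form produces the same total without any per-row function call.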