how to dump data into a temporary table (without actually creating the temporary table) from an external table in a Hive script at run time - hiveql

In SQL stored procedures, we have the option of creating a temporary table "#temp" whose structure is the same as that of the table it refers to. Here we don't explicitly create and specify the structure of the "#temp" table.
Is there a similar option in an HQL Hive script to create a temp table at run time without explicitly defining its structure, so that I can dump data into the temp table and use it? The code below shows an example of a #temp table in SQL.
SELECT name, age, gender
INTO #MaleStudents
FROM student
WHERE gender = 'Male'

Hive has the concept of temporary tables, which are local to a user's session. These tables behave just like any other table, and can be created using CTAS commands too. Hive automatically deletes all temporary tables at the end of the Hive session in which they are created.
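For example, the #MaleStudents query from the question could be written as a Hive CTAS statement; a minimal sketch, reusing the table and column names from the question:
CREATE TEMPORARY TABLE male_students AS
SELECT name, age, gender
FROM student
WHERE gender = 'Male';
The structure of male_students is inferred from the SELECT, and the table is dropped automatically when the session ends.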
Read more about them in the Hive documentation and on DWGEEK.

You can also create a simple temporary table and perform any operation on it.
Once you are done with your work and log out of your session, it will be deleted automatically.
The syntax for a temporary table is:
CREATE TEMPORARY TABLE TABLE_NAME_HERE (key string, value string)
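Populating it then works like any other table; a short sketch against the student table from the question (the column types are assumptions):
CREATE TEMPORARY TABLE tmp_students (name string, age int, gender string);
INSERT INTO TABLE tmp_students
SELECT name, age, gender FROM student WHERE gender = 'Male';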

Related

SQL DB2: Declare Temporary Table Add Column

I have declared a temporary table successfully.
DECLARE GLOBAL TEMPORARY TABLE SESSION.MY_TEMP_TABLE
LIKE MYTABLES.AN_EXISTING_TABLE
INCLUDING IDENTITY
ON COMMIT PRESERVE ROWS
WITH REPLACE;
I then use the following to merge two tables and output this into my temporary table:
INSERT INTO SESSION.MY_TEMP_TABLE
SELECT a.*
FROM (SELECT * FROM MYTABLES.TABLE_A) as a
LEFT JOIN
(SELECT * FROM MYTABLES.TABLE_B) as b
ON a.KEY=b.KEY;
All of the above works.
ISSUE: I now want to merge in two new variables from a further table (MYTABLES.TABLE_C); however, it will not let me, because I declared the temporary table with a certain number of columns and I am now trying to add further columns. From some searching it seems ALTER TABLE will not work with declared temporary tables. Any help, please?
Session tables (DGTT) need to be declared with all the required columns, as you cannot use ALTER TABLE to add additional columns to a session table.
A way around this limitation is to use session tables in a different manner, specifically to create a new session table on demand with whatever additional columns you need (possibly also including the data from other tables). This can be very fast when you use the NOT LOGGED option. It also works well if your session table uses DISTRIBUTE BY HASH on environments that support that feature.
Here is an example that shows 3 session tables, the third of which has all columns from the first two tables:
declare global temporary table session.m1 like emp including identity on commit preserve rows with replace not logged;
declare global temporary table session.m2 like org including identity on commit preserve rows with replace not logged;
declare global temporary table session.m3 as (select * from session.m1, session.m2) with data with replace not logged;
If you do not want to populate the session table at the time of declaration, you can use DEFINITION ONLY instead of WITH DATA (or use WITH NO DATA) and populate the table later via INSERT or MERGE.
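A sketch of that variant, following the pattern of the example above:
declare global temporary table session.m4 as (select * from session.m1, session.m2) definition only with replace not logged;
insert into session.m4 select * from session.m1, session.m2;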

Best practices for performing a table swap in Redshift

We're in the process of running a handful of hourly scripts on our Redshift cluster which build summary tables for data consumers. After assembling a staging table, the script then runs a transaction that drops the existing table and replaces it with the staging table, like so:
BEGIN;
DROP TABLE IF EXISTS public.data_facts;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
COMMIT;
The problem with this operation is that long-running analysis queries will place an AccessShareLock on public.data_facts, preventing it from being dropped and thrashing our ETL cycle. I'm thinking a better solution would be one which renames the existing table, as such:
ALTER TABLE public.data_facts RENAME TO data_facts_old;
ALTER TABLE public.data_facts_stage RENAME TO data_facts;
DROP TABLE public.data_facts_old;
However, this approach presupposes that 1) public.data_facts exists, and 2) public.data_facts_old does not exist.
Do you know if there's a way to conduct this operation safely in SQL, without relying on application logic? (e.g. something like ALTER TABLE IF EXISTS).
I haven't tried it, but looking at the documentation for CREATE VIEW, it seems this can be done with late-binding views.
The main idea would be a view public.data_facts that users interact with. Behind the scenes, you can load new data and then swap the view to “point” to the new table.
Bootstrap
-- load data into public.data_facts_v0
CREATE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v0 WITH NO SCHEMA BINDING;
Update
-- load data into public.data_facts_v1
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
The WITH NO SCHEMA BINDING means the view will be late-binding. “A late-binding view doesn't check the underlying database objects, such as tables and other views, until the view is queried.” This means the update can even introduce a table with renamed columns or a completely new structure.
Notes:
It might be a good idea to wrap the swap operations in a transaction to make sure we don't drop the previous table if the view swap fails.
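A sketch of that, reusing the statements above (Redshift allows this DDL inside a transaction):
BEGIN;
CREATE OR REPLACE VIEW public.data_facts AS
SELECT * FROM public.data_facts_v1 WITH NO SCHEMA BINDING;
DROP TABLE public.data_facts_v0;
COMMIT;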
You can add a new load_time column (defined as timestamp ENCODE runlength DEFAULT getdate()) to your target table, and make your ETL do this:
INSERT INTO public.data_facts
SELECT * FROM public.data_facts_staging;
DELETE FROM public.data_facts
WHERE load_time<(select max(load_time) from public.data_facts);
DROP TABLE public.data_facts_staging;
Note: public.data_facts_staging should have exactly the same structure as public.data_facts, except that the last column of public.data_facts is load_time, so that on insert it will be populated with the current timestamp.
The only implication is that it requires extra disk space for a moment between inserting the new rows and deleting the old ones, and load_time always has to be the last column. You also have to vacuum the table every time you do this.
Another good thing about this is that if your ETL fails and the staging table is empty, or there is no staging table, you won't lose your data. In the pure SQL scenario of swapping tables with DDL, you're not protected from dropping the target table when the staging table is missing. In the suggested scenario, if no new rows are inserted, the DELETE statement deletes nothing (there are no rows with load_time less than the max), so the worst case is just having the old version of the data.
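For reference, a sketch of such a target table definition (the business columns are hypothetical; load_time must stay last):
CREATE TABLE public.data_facts (
    id        bigint,
    value     varchar(64),
    load_time timestamp DEFAULT getdate() ENCODE runlength
);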
P.S. There is a command that, instead of INSERT ... SELECT ..., just moves the data from the staging table to the target table (ALTER TABLE ... APPEND FROM ...), but I believe it requires the same type of lock as ALTER TABLE, so I don't suggest it.

Can two temporary tables with the same name exist in separate queries

I was wondering if it is possible to have two temp tables with the same name in two separate queries without them conflicting when called upon later in the queries.
Query 1: Create Temp Table Tmp1 as ...
Query 2: Create Temp Table Tmp1 as ...
Query 1: Do something with Tmp1 ...
I am wondering if PostgreSQL distinguishes between those two tables, maybe by addressing them as Query1.Tmp1 and Query2.Tmp1.
Each connection to the database gets its own special temporary schema name, and temp tables are created in that schema. So there will not be any conflict between concurrent queries from separate connections, even if the tables have the same names. See https://dba.stackexchange.com/a/5237 for more info.
The PostgreSQL documentation for creating tables states:
Temporary tables exist in a special schema, so a schema name cannot be given when creating a temporary table.
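A minimal illustration, run from two separate connections (the pg_temp_N schema names vary per session):
-- session 1
CREATE TEMP TABLE tmp1 AS SELECT 1 AS x;  -- lands in this session's temp schema, e.g. pg_temp_3
-- session 2, concurrently
CREATE TEMP TABLE tmp1 AS SELECT 2 AS x;  -- lands in a different temp schema, e.g. pg_temp_4
-- each session sees only its own tmp1
SELECT * FROM tmp1;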

How can I rename a table / move it to a different schema in SQL DB2?

I am trying to rename a table in db2 like so
rename table schema1.mytable to schema2.mytable
but getting the following error message:
the name "mytable" has the wrong number of qualifiers.. SQLCODE=-108,SQLSTATE=42601
what is the problem here.... I am using the exact syntax from IBM publib documentation.
You cannot change the schema of a given object; you have to recreate it.
There are several ways to do that:
If you have only one table, you can export and import/load the table. If you use the IXF format, the DDL will be included in the generated file. If you use another format, the table has to be created first.
You can recreate the table by using:
Create table schema2.mytable like schema1.mytable
You can extract the DDL with the db2look tool (see the sketch at the end of this answer)
If you are changing the schema name for a schema given, you can use ADMIN_COPY_SCHEMA
These last two options only create the table structure, and you still need to import the data. After having created the table, you can insert the data in different ways:
Inserting directly:
insert into schema2.mytable select * from schema1.mytable
Via load from cursor (see the sketch after this list)
Via a load or import from file (the file exported in the previous step)
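A sketch of the load-from-cursor variant, run from the DB2 command line processor within a single connection (names reuse those from the question):
db2 "declare c1 cursor for select * from schema1.mytable"
db2 "load from c1 of cursor insert into schema2.mytable"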
The problem is the foreign key relations, because they have to be recreated.
Finally, you can create an alias. It is easier, and you do not have to deal with relations.
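For reference, a db2look extraction as mentioned above might look like this (the database name and output file are hypothetical):
db2look -d MYDB -z schema1 -t mytable -e -o mytable.ddl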
You can easily rename a table with this statement:
RENAME TABLE SCHEMA.TABLENAME TO NEWTABLENAME;
You're not renaming a table in the provided example; you're trying to move it to a different schema, which is not the same thing. Look into the db2move tool for this.
If you want to rename a table within the same schema, you can do it like this:
RENAME TABLE schema.table_name TO "new_table_name";
Otherwise, you can use tools like DBeaver to rename or copy tables in a db2 db.
What if you leave it as is and create an alias with the new name and schema?
Renaming a table means renaming it within the same schema. To "rename" a table into another schema, DB2 instead offers aliases; completing the command with the names from the question:
db2 "create alias schema2.mytable for schema1.mytable"

Postgres and temporary tables

I have a stored procedure that uses a temp table and explicitly drops it when done.
What happens when the same procedure is run at the same time by 2 different sessions? The sessions run as a single user (webapp).
Will one session interfere with the temp table, and the data inside it, of the other session?
I'm using Postgres 9.0.
In PostgreSQL temporary tables are unique to each session, so there is no problem.
If I am not totally wrong, the result would be one temp table per session running your procedure, so the two temp tables would not disturb each other.
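A minimal sketch of that pattern in PL/pgSQL (the function and table names are hypothetical):
-- each concurrent session gets its own private copy of work_items
CREATE OR REPLACE FUNCTION process_data() RETURNS integer AS $$
DECLARE
    n integer;
BEGIN
    CREATE TEMP TABLE work_items(id integer);
    INSERT INTO work_items SELECT generate_series(1, 10);
    SELECT count(*) INTO n FROM work_items;
    DROP TABLE work_items;  -- explicit drop, as in the question
    RETURN n;
END;
$$ LANGUAGE plpgsql;
Two sessions calling process_data() at the same time each operate on their own work_items in their own temporary schema.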