Airflow export schema only from PostgreSQL to BigQuery

In Airflow, we can export databases such as PostgreSQL, MySQL, etc. to GCS. The transfer operators have an option called schema file, where the schema of the source table is exported as a JSON file that we can then use to create the table in BigQuery.
But unfortunately, we can only export the schema file together with a query such as select * from table; (or we can reduce the rows with select * from table limit 1). Either way, it uploads both the data file and the schema file.
Is there a way to export only the schema file without data?

You can use INFORMATION_SCHEMA to pull the schema/metadata/columns from your table.
For example:
SELECT
  *
FROM
  `bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.COLUMNS
WHERE
  table_name = "population_by_zip_2010";
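The same approach works on the PostgreSQL side if you want to build the schema file yourself without pulling any rows; a minimal sketch against the standard information_schema (my_table is a placeholder for the source table):
SELECT
  column_name,
  data_type,
  is_nullable
FROM
  information_schema.columns
WHERE
  table_name = 'my_table'
ORDER BY
  ordinal_position;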

Related

Unload data from snowflake into Postgres?

I want to unload data from a Snowflake table into a Postgres database. The Snowflake documentation does not show an option for unloading to a relational database.
Is there currently a way to unload the data from Snowflake to Postgres?
Any help is appreciated.
Snowflake only has connectivity to cloud storage. It can't connect to any other database directly.
If the table is small to medium size, you can use the Snowflake Web GUI:
Query the data: SELECT * FROM my_table;
Press the download button in the Results pane (next to the Copy button) and export as TSV or CSV
Import the file into Postgres (I don't know the details of this step)
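For larger tables, a rough sketch of the same export-CSV-then-import idea using a Snowflake stage (the stage name and paths are assumptions, and the download plus the Postgres import still happen outside Snowflake):
-- Snowflake: unload the table to a named stage as uncompressed CSV
COPY INTO @my_unload_stage/my_table/
  FROM my_table
  FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE)
  HEADER = TRUE;
-- download the files (for example with SnowSQL's GET command), then on Postgres:
-- COPY my_table FROM '/path/to/downloaded_file.csv' WITH (FORMAT csv, HEADER true);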

sqoop export of hive orc table

I have a Hive table in ORC format, populated by a pyspark dataframe_writer.
I need to export this table to Oracle. I am having issues exporting the table because Sqoop could not parse the ORC file format.
Are there any special considerations or parameters that need to be specified with the sqoop command for exporting a Hive ORC table?
A simple Google query points to a blog post titled quite explicitly...
How to Sqoop Export a Hive ORC table to a Oracle Database?
And there is also this SO post, titled...
Reading ORC files and putting into RDBMS?
So it appears that you did not do any research.
By the way, did you consider using Spark to send the data directly into an Oracle staging table, via JDBC, without the intermediate ORC dump?
I just worked on the same Sqoop export from ORC to Oracle. Make sure your ORC table is pre-created with the correct datatypes, matching those in the dataframe. Keeping the columns in the same order will also ease the Sqoop export. If you have tried any command, please post it.

How to export data including large objects from Postgres and later import the exported data to Greenplum

I don't want to use pg_dump to export the data into an SQL script, since feeding it to the Greenplum cluster is too slow when I have a large amount of data to import. So it seems using Greenplum's gpfdist is preferred. Is there any way I can do this?
Or, as an alternative, can I export a particular Postgres table's data into a CSV-format file containing the large objects of that table?
pg_dump will create a file that uses "COPY" to load the data back into a database. When loading into Greenplum, that load goes through the master server, which becomes a bottleneck for very large loads. Yes, the preferred method is to use gpfdist, but you can most certainly use COPY to load data into Greenplum. It won't load at the 10+ TB per hour rate that gpfdist can achieve, but it can still achieve 1 to 2 TB per hour.
Another alternative is to have gpfdist execute a program to get the data. It would execute the SELECT statement against PostgreSQL and make the result available to an external table in Greenplum. I created a wrapper for this process called "gplink". You can check it out here: http://www.pivotalguru.com/?page_id=982
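The general idea behind such a wrapper can be sketched as a Greenplum external web table that shells out to psql on the source database (this is only an illustration, not gplink itself; the host, database, table and column names are assumptions):
-- runs on the Greenplum side; each SELECT from ext_src re-executes the command
CREATE EXTERNAL WEB TABLE ext_src (id int, payload text)
EXECUTE 'psql -h pg_host -d sourcedb -c "COPY (SELECT id, payload FROM my_source_table) TO STDOUT WITH CSV"'
ON MASTER
FORMAT 'CSV';
INSERT INTO my_target_table SELECT * FROM ext_src;  -- my_target_table is an assumed pre-created table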
According to the Greenplum reference:
The simplest data loading method is the SQL INSERT statement...
You can use the COPY command to load the data into a table when the data
is in external text files...
You can use a pair of Greenplum utilities, gpfdist and gpload, to load external data into tables...
Nevertheless, if you want to use CSV to import the data, you can generate a CSV containing the large object data by joining your table against pg_largeobject. E.g.:
b=# create table lo (n text,p oid);
CREATE TABLE
b=# insert into lo values('wheel',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# copy (select lo.*, pg_largeobject.pageno, pg_largeobject.data from lo join pg_largeobject on lo.p = loid) to '/tmp/lo.csv' WITH (format csv, header);
COPY 20
The generated /tmp/lo.csv will contain the name, the OID, the page number, and the bytea data in CSV format.
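If you prefer to feed that CSV to Greenplum through gpfdist rather than COPY, a minimal sketch (the host, port and target table name are assumptions; start gpfdist on the host serving the file, e.g. gpfdist -d /tmp -p 8081):
CREATE EXTERNAL TABLE ext_lo (n text, p oid, pageno int, data bytea)
LOCATION ('gpfdist://etl_host:8081/lo.csv')
FORMAT 'CSV' (HEADER);
INSERT INTO lo_target SELECT * FROM ext_lo;  -- lo_target is an assumed pre-created table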

how can I rename a table / move it to a different schema in DB2 SQL?

I am trying to rename a table in DB2 like so:
rename table schema1.mytable to schema2.mytable
but getting the following error message:
the name "mytable" has the wrong number of qualifiers.. SQLCODE=-108,SQLSTATE=42601
What is the problem here? I am using the exact syntax from the IBM publib documentation.
You cannot change the schema of a given object. You have to recreate it.
There are several ways to do that:
If you have only one table, you can export and import/load the table. If you use the IXF format, the DDL will be included in the generated file. If you use another format, the table has to be created first.
You can recreate the table by using:
Create table schema2.mytable like schema1.mytable
You can extract the DDL with the db2look tool
If you are changing the schema name for a whole schema, you can use ADMIN_COPY_SCHEMA.
These last two options only create the table structure, and you still need to import the data. After having created the table, you can insert the data in different ways:
Inserting directly
insert into schema2.mytable select * from schema1.mytable
Via load from cursor
Via a Load or import from file (The file exported in the previous step)
The problem is the foreign key relations, because they have to be recreated.
Finally, you can create an alias. It is easier, and you do not have to deal with relations.
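Putting those pieces together, a minimal sketch of the recreate-and-copy route using the names from the question (indexes, constraints and privileges are not carried over; db2look covers those):
CREATE TABLE schema2.mytable LIKE schema1.mytable;
INSERT INTO schema2.mytable SELECT * FROM schema1.mytable;
-- once the copy is verified:
DROP TABLE schema1.mytable;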
You can easily rename a table with this statement:
RENAME TABLE SCHEMA.TABLENAME TO NEWTABLENAME;
You're not renaming the table in the provided example, you're trying to move it to a different schema, which is not the same thing. Look into the db2move tool for this.
If you want to rename a table within the same schema, you can do it like this:
RENAME TABLE schema.table_name TO "new_table_name";
Otherwise, you can use tools like DBeaver to rename or copy tables in a DB2 database.
What if you leave it as is and create an alias with the new name and schema?
Renaming a table means renaming it within the same schema. To make the table available under another schema, DB2 uses an ALIAS:
db2 create alias schema2.mytable for schema1.mytable

dump subset of table

I want to dump a subset of a table in my Postgres database. Is there a way to dump a SELECT statement without creating a view?
I need to copy a part of the table to another Postgres database.
Use COPY to dump it directly to disk.
Example (from the fine manual) using a SELECT:
COPY
(SELECT * FROM country WHERE country_name LIKE 'A%')
TO '/usr1/proj/bray/sql/a_list_countries.copy';
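To get that subset into the other Postgres database, the dump can be loaded back with COPY ... FROM; a minimal follow-up sketch (country_subset is an assumed target table whose columns must match the SELECT above):
-- on the target database
CREATE TABLE country_subset (LIKE country);   -- assumes the same table definition exists there
COPY country_subset FROM '/usr1/proj/bray/sql/a_list_countries.copy';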