How to copy Hive table data from production to stage environment using DBVisualizer or DBeaver

I have a transactional table on a Hive production server. It contains almost 10M rows. I want to copy this table's data to the Hive stage server to test some scenarios. We are using DBVisualizer and DBeaver to connect to Hive.
Please let me know if there is any way to copy data across servers using either of these two tools.
Also, please advise whether the same process is suitable for copying huge volumes (over 100M rows).

Related

Is there any way to get a backup of a Postgres DB in a Docker container without generating a SQL file?

I am new to Docker and I want to take a backup of my Postgres database running in Docker. All the solutions I have seen suggest generating a SQL dump script and restoring the DB by running that script, but I don't want to do this. Is it possible to back up and restore by migrating the binary files of the DB?
You can build a Postgres image from the plain, empty Postgres base image. In the Dockerfile you add a SQL script that runs on DB initialization (docker-entrypoint-initdb.d). The SQL script contains a dblink to your backed-up DB and commands such as create table my_table as select * from my_table#remotedb. After docker build you have an image with a backup of your original database tables.
I do something similar with Oracle, with more complexity (copying only a subset of the original database, preserving indexes, etc.). The Oracle Docker image differs from the PG one in some properties, but I believe the rough idea is applicable. It has been some time since I worked with PG, so I won't advise you on how to migrate the binary files (though I believe that would be possible too).
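For illustration, here is a minimal sketch of the kind of SQL such an init script could contain, wrapped in a psycopg2 call so it can be tried against a throwaway Postgres before baking the .sql file into the image. The connection strings, table name, and column list are made-up placeholders, not details from the question.

```python
import psycopg2

# The SQL a docker-entrypoint-initdb.d script could run. All names below
# (hosts, databases, table, columns) are hypothetical placeholders.
INIT_SQL = """
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE TABLE my_table AS
SELECT *
FROM dblink('host=prod-db dbname=sourcedb user=readonly password=secret',
            'SELECT id, payload FROM my_table')
     AS t(id integer, payload text);  -- column list must match the source table
"""

# Run the same statements against a local throwaway Postgres to verify them
# before adding the script to the image.
with psycopg2.connect("host=localhost dbname=backupdb user=postgres password=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(INIT_SQL)
```

In the actual image you would simply drop the SQL into docker-entrypoint-initdb.d as a .sql file; the Python wrapper here is only for testing the statements.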

loading one table from RDS / postgres into Redshift

We have a Redshift cluster that needs one table from one of our RDS/Postgres databases. I'm not quite sure of the best way to export that data and bring it in, or what the exact steps should be.
Piecing together various blogs and articles, the consensus appears to be to use pg_dump to copy the table to a CSV file, then copy it to an S3 bucket, and from there use the Redshift COPY command to bring it into a new table. That's my high-level understanding, but I'm not sure what the command-line switches or the actual details should be. Is anyone doing this currently, and if so, is what I have above the 'recommended' way to do a one-off import into Redshift?
It appears that you want to:
Export from Amazon RDS PostgreSQL
Import into Amazon Redshift
From Exporting data from an RDS for PostgreSQL DB instance to Amazon S3 - Amazon Relational Database Service:
You can query data from an RDS for PostgreSQL DB instance and export it directly into files stored in an Amazon S3 bucket. To do this, you use the aws_s3 PostgreSQL extension that Amazon RDS provides.
This will save a CSV file into Amazon S3.
You can then use the Amazon Redshift COPY command to load this CSV file into an existing Redshift table.
You will need some way to orchestrate these operations: run a command against the RDS database, wait for it to finish, then run a command against the Redshift database. This could be done via a Python script that connects to each database in turn (e.g. via psycopg2) and runs the appropriate command.
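As a rough sketch, such a script might look like the following. The bucket, region, table names, IAM role, and endpoints are placeholders, and it assumes the aws_s3 extension has already been installed on the RDS instance.

```python
import psycopg2

# Step 1: export the table from RDS PostgreSQL directly to S3 as CSV.
EXPORT_SQL = """
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT * FROM my_table',
    aws_commons.create_s3_uri('my-bucket', 'exports/my_table.csv', 'us-east-1'),
    options := 'format csv'
);
"""

# Step 2: load the resulting CSV into an existing Redshift table.
COPY_SQL = """
COPY my_table
FROM 's3://my-bucket/exports/my_table.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
CSV;
"""

# Run the export against RDS, then the COPY against Redshift.
with psycopg2.connect("host=my-rds-endpoint dbname=sourcedb user=exporter password=CHANGEME") as rds:
    with rds.cursor() as cur:
        cur.execute(EXPORT_SQL)

with psycopg2.connect("host=my-redshift-endpoint port=5439 dbname=dw user=loader password=CHANGEME") as rs:
    with rs.cursor() as cur:
        cur.execute(COPY_SQL)
```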

Backing up redshift database

I want to back up the entire Redshift cluster so that I can use it with other databases like MySQL or Hadoop in the future.
I was looking this up, and creating a manual snapshot seems to be an option, but I guess that won't work across different database engines.
So what would be the detailed steps to back up the entire Redshift cluster?
Cluster backups can be done via the AWS console; however, these can only be restored to another Redshift cluster.
Because Redshift differs from Postgres in many ways, it will be impossible or at least tricky to use standard tools like pg_dump and pg_restore.
I think that your best option is to:
1. Extract the DDL from the Redshift tables that you wish to create elsewhere; most IDEs have a simple way to do this.
2. Modify the DDL to work with your target database (e.g. Postgres will be easy, MySQL harder).
3. Copy the contents of the Redshift database to S3, one table at a time, using the UNLOAD command (see the sketch after this list).
4. Import the data that you unloaded in step 3 into your target tables.
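As a sketch of step 3, something like the following could drive the UNLOAD command from Python; the table list, bucket, and IAM role are placeholders.

```python
import psycopg2

TABLES = ["customers", "orders", "line_items"]  # hypothetical table list

# Unload each table to S3 as CSV so it can be loaded into another engine later.
with psycopg2.connect("host=my-cluster.redshift.amazonaws.com port=5439 dbname=dw user=admin password=CHANGEME") as conn:
    with conn.cursor() as cur:
        for table in TABLES:
            cur.execute(f"""
                UNLOAD ('SELECT * FROM {table}')
                TO 's3://my-backup-bucket/redshift-export/{table}_'
                IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
                CSV HEADER
                PARALLEL OFF;
            """)
```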

How do I restore a database dump to a Citus cluster?

While restoring a (pg_dump-produced) database dump, I get the following error:
Cannot execute COPY FROM on a distributed table on master node
How can I work around this?
COPY support was added in Citus 5.1, which was released in May 2016 and is available in the official PostgreSQL Linux package repositories (PGDG).
Are you trying to load data from pg_dump output? Creating distributed tables is slightly different from creating regular tables, and requires picking a distribution (partition) column and a partitioning method. Take a look at the docs for more information on both.
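To sketch the difference: create and distribute the table on the coordinator first, then load the rows, rather than replaying the dump as-is. The table definition, distribution column, and file name below are placeholders.

```python
import psycopg2

with psycopg2.connect("host=citus-coordinator dbname=mydb user=postgres password=CHANGEME") as conn:
    with conn.cursor() as cur:
        # 1. Create the table and tell Citus how to distribute it.
        cur.execute("""
            CREATE TABLE events (
                event_id  bigint,
                tenant_id int,
                payload   jsonb
            );
        """)
        cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

        # 2. Load the rows from the dump (here assumed to be a local COPY-format file).
        with open("events.copy") as f:
            cur.copy_expert("COPY events FROM STDIN", f)
```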

Backup specific tables in AWS RDS Postgres Instance

I have two databases on Amazon RDS, both Postgres: Database 1 and Database 2.
I need to restore an instance from a snapshot of Database 1 for my Staging environment. (Database 2 is my current Staging DB).
However, I want the data from a few of the tables in Database 2 to overwrite the tables in the newly restored snapshot. What is the best way to do this?
When restoring RDS from a Snapshot, a new database instance is created. If you only wish to copy a portion of the snapshot:
Restore the snapshot to a new (temporary) database
Connect to the new database and dump the desired tables using pg_dump
Connect to your staging server and restore the tables using pg_restore (most probably deleting any matching existing tables first)
Delete the temporary database
pg_dump actually outputs SQL commands that are then used to recreate tables and restore data. Look at the content of a dump to understand how the restore process actually works.
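A small sketch of steps 2 and 3, assuming placeholder host names and table names; credentials would come from a .pgpass file or the PGPASSWORD environment variable, and pg_dump/pg_restore must be on PATH.

```python
import subprocess

TABLES = ["accounts", "invoices"]  # hypothetical tables to carry over
table_flags = [arg for t in TABLES for arg in ("--table", t)]

# Step 2: dump only the chosen tables from the temporary (restored-snapshot) instance.
subprocess.run(
    ["pg_dump", "--host", "temp-restored.rds.amazonaws.com", "--username", "postgres",
     "--format", "custom", "--file", "tables.dump", *table_flags, "mydb"],
    check=True,
)

# Step 3: restore them into the staging instance, dropping the old copies first.
subprocess.run(
    ["pg_restore", "--host", "staging.rds.amazonaws.com", "--username", "postgres",
     "--dbname", "mydb", "--clean", "--if-exists", *table_flags, "tables.dump"],
    check=True,
)
```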
I hope this still helps someone else.
My team and I faced a similar issue: we also had two Postgres databases and only needed to back up some tables from db1 to db2.
What we did was use an AWS Lambda function written in Python that connected to both databases and checked whether db1.table1 had the same data as db2.table1; if not, the Lambda wrote the missing data from db1.table1 into db2.table1. We went with Lambda because we wanted to automate the process, since the main DB (let's say db1) is constantly being updated. It also let us back up only the tables we cared about (say, 3 tables out of 10) instead of the whole database.
Note: you may want to do these writes through temporary tables to avoid issues with any constraints on your tables.
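A much-simplified sketch of that Lambda follows; the table name, key column, and endpoints are placeholders, and a real version would need psycopg2 packaged as a Lambda layer plus batched inserts for large tables.

```python
import psycopg2

def lambda_handler(event, context):
    # Connect to the source (db1) and target (db2) databases.
    src = psycopg2.connect("host=db1-endpoint dbname=app user=sync password=CHANGEME")
    dst = psycopg2.connect("host=db2-endpoint dbname=app user=sync password=CHANGEME")
    try:
        with src.cursor() as s, dst.cursor() as d:
            # Collect the keys already present in the target table.
            d.execute("SELECT id FROM table1")
            existing = {row[0] for row in d.fetchall()}

            # Find rows in the source that are missing from the target.
            s.execute("SELECT id, payload FROM table1")
            missing = [row for row in s.fetchall() if row[0] not in existing]

            # Write the missing rows into the target table.
            for row in missing:
                d.execute("INSERT INTO table1 (id, payload) VALUES (%s, %s)", row)
        dst.commit()
        return {"copied_rows": len(missing)}
    finally:
        src.close()
        dst.close()
```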