gcloud Export to Google Storage Bucket from Cloud SQL instance - google-cloud-sql

Running this command:
gcloud sql instances export myinstance gs://my_bucket_name/filename.csv -d "mydatabase" -t "mytable"
Giving me the following error:
ERROR: (gcloud.sql.instances.import) ERROR_RDBMS
I have manually run console uploads to the bucket, which go fine. I am able to log in to the SQL instance and run queries, which makes me think that there are no permission issues. Has anybody ever seen this type of error and know a way around it?
Note: I have googled for possible causes, and most of them point to either SQL or bucket permission issues.

Never mind. I figured out that I need to make an OAuth connection (using the JSON token generated from the gcloud API/credentials section) to the instance before interacting with it.
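For anyone hitting the same thing, a minimal sketch of that authentication step from the CLI, assuming a service-account JSON key downloaded from the Credentials page (the key path and all names are placeholders):
# authenticate gcloud with the downloaded JSON key, then re-run the export
gcloud auth activate-service-account --key-file=/path/to/credentials.json
gcloud sql instances export myinstance gs://my_bucket_name/filename.csv -d "mydatabase" -t "mytable"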

Related

Copy data from Postgres DB (GCP Project A) to another Postgres DB (GCP Project B)

I would be happy to get your help/feedback regarding this data load.
Goal:
Load source data from a Postgres database located in GCP project A into another Postgres database located in GCP project B.
Challenge:
Get a connection (I have an IAM account with sufficient rights to run a COPY TO / COPY FROM command) to the Postgres DB in GCP Project A and copy the table either to a CSV or to a dump that can then be inserted into the other Postgres DB in GCP Project B.
How do I connect to the database with this IAM email account (e.g. if I create a key, where should I store the JSON key file, and would that approach even be feasible)?
Other ways I've researched include using psycopg2 (so I could use the function cursor.copy_expert, which doesn't need any superuser rights or Postgres user credentials, to copy the data), but I didn't succeed in connecting to the database with psycopg2 due to challenges with the Cloud SQL proxy.
Another idea was to use pg_dump or gcloud sql export csv.
I would be curious whether some of you have faced a similar challenge, how you solved it, and what might be the best way/practice.
You can try out the Database Migration Service. You can set up a continuous migration configuration and use Cloud SQL for PostgreSQL.
Hello, after a lot of searching I've come to these solutions:
If you need a continuous copy, you need to use the Database Migration Service; check this documentation.
If you need a one-shot copy:
you can restore your instance; see the bottom of the page in this documentation
you can create a bucket, back up your instance to it, then import it from the other project (sketched below)
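A rough sketch of the bucket route using SQL export/import (instance, project, bucket and database names are placeholders; the source instance's service account needs write access on the bucket and the target instance's service account needs read access, which can be granted with gsutil iam ch):
# in project A: export the source database to a shared bucket
gcloud sql export sql source-instance gs://my-transfer-bucket/mydb.sql.gz --database=mydb --project=project-a
# in project B: import the dump into the target instance
gcloud sql import sql target-instance gs://my-transfer-bucket/mydb.sql.gz --database=mydb --project=project-b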

How to import sql file in Google SQL with binary mode enabled?

I have a database that is giving this error:
ASCII '\0' appeared in the statement, but this is not allowed unless option --binary-mode is enabled and mysql is run in non-interactive mode. Set --binary-mode to 1 if ASCII '\0' is expected.
I'm importing the database through the console with gcloud sql import sql mydb gs://my-path/mydb.sql --database=mydb, but I don't see any flag for binary mode in the documentation. Is it possible at all?
Optional: is there a way to set this flag when importing through MySQL Workbench? I haven't seen anything about it there either, but maybe I'm missing some setting or something. If there is a way to set that flag, then I can import my database through MySQL Workbench.
Thank you.
Depending on where the source database is hosted, on Cloud SQL or in an on-premises environment, the proper flags need to be set during the export so that the dump file is compatible with the target database.
Since you would like to import a file that has been exported from an on-premises environment, mysqldump is the suggested way to perform the export.
First, create a dump file as suggested in the documentation. Make sure to pay attention to the following 2 points:
Do not export customer-created MySQL users; this will cause the import to the new instance to fail. Instead, manually create the MySQL users you need on the new instance.
Make sure that you have configured the appropriate flags so that the dump file contains all the details you need, e.g. triggers, stored procedures, etc. (see the example sketch below).
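For the second point, a dump command along the lines of what the Cloud SQL documentation suggests (host, user and database names are placeholders; adjust the flag list to your needs). As a side note, --hex-blob writes binary values as hex, which should also avoid the raw '\0' bytes that triggered the --binary-mode error in the first place:
# export from the on-premises server; pick the flags that match what you need in the dump
mysqldump --databases mydatabase -h source-host -u admin -p \
  --hex-blob --single-transaction --set-gtid-purged=OFF \
  --routines --triggers --events \
  --default-character-set=utf8mb4 | gzip > mydatabase.sql.gz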
Then, create a Cloud Storage Bucket and upload the dump file to the bucket.
Before proceeding with the import, grant the Storage Object Admin role to the service account of the target Cloud SQL instance. You may do that with the following command:
gsutil iam ch serviceAccount:[SERVICE-ACCOUNT]:objectAdmin gs://[BUCKET-NAME]
You may locate the aforementioned Service Account in the Cloud SQL instance Overview, or by running the following command:
gcloud sql instances describe [INSTANCE_NAME]
The service account will be listed in the serviceAccountEmailAddress field.
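If you prefer to script that step, the field can be pulled out directly and fed to gsutil (instance and bucket names are placeholders):
# grab the instance's service account and grant it access to the bucket in one go
SA=$(gcloud sql instances describe [INSTANCE_NAME] --format='value(serviceAccountEmailAddress)')
gsutil iam ch serviceAccount:${SA}:objectAdmin gs://[BUCKET-NAME]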
Now you are able to do the import from the Console, using the gcloud command, or via the REST API.
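With gcloud, the import itself is a one-liner along these lines (names are placeholders):
# import the uploaded dump from the bucket into the target database
gcloud sql import sql [INSTANCE_NAME] gs://[BUCKET-NAME]/[DUMP_FILE] --database=[DATABASE_NAME]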
More details in Google documentation
Best Practices for importing/exporting data

Heroku Postgres app works on local machine but not on Heroku

I built and deployed a Node.js Postgres app to Heroku and cannot get to any of my endpoints via the Heroku site except the root GET route. Curiously, when I run heroku local web, ALL my endpoints behave exactly as they should. I can successfully perform CRUD on the app running via heroku local web. However, when I try, for instance, to create a user using the Heroku URL, it returns an empty error message. Yet, when I check the associated database I find that the user was indeed created. Other than returning an empty error message when I try to either create a user or sign one in, the app correctly responds with the different errors I programmed. For example, when I tweak my login details or try to register the same user I earlier tried to register, it correctly says the user already exists. Still, when I try to log in that same existing user I get a blank error message. Note that I created both the Heroku PostgreSQL database and my local PostgreSQL database from exactly the same queries. Please, can you help me through this bottleneck? I am using Postman to test my APIs.
Test to sign in user on Heroku app running on the local machine: success!
Same exact test with Heroku URL: cryptic error.
OK, so after a lot of researching and fiddling around I discovered the solution: I had not added the keys from my .env file to Heroku as config vars, found under the Settings tab of the Heroku dashboard. Manually adding my environment variables resolved the matter. Now my app is working both on my local machine and via the Heroku URL.
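For anyone who prefers the CLI over the dashboard, the same config vars can also be set with heroku config:set; a small sketch with made-up variable and app names:
# set the variables from your .env as Heroku config vars, then confirm they are present
heroku config:set JWT_SECRET=changeme DB_PASSWORD=changeme --app my-heroku-app
heroku config --app my-heroku-app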

Can't connect to Cloud SQL via SQL Proxy on Dataproc

I am trying to access Cloud SQL from Dataproc via Cloud SQL Proxy (without using Hive)
After much tinkering based on instructions here:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/cloud-sql-proxy
I got to the point where at least the cluster gets created with no errors and the proxy seems to be installed. However, my Java Spark jobs can't connect to the cluster with this error:
Exception in thread "main" java.sql.SQLException: Access denied for user 'root'@'localhost' (using password: NO)
I deliberately created an instance with NO user password, but it doesn't work for instances with the password either.
What I find strange is that when I access the same database from my local computer, also using a locally running Cloud SQL Proxy, everything works well, but when I try to force a similar error by deliberately submitting the wrong password, I get a similar, but DIFFERENT error, like this:
Exception in thread "main" java.sql.SQLException: Access denied for user 'root'@'cloudsqlproxy~217.138.38.242' (using password: YES)
So, in the Dataproc error, it says root@localhost, whereas in my local proxy the error says root@cloudsqlproxy~IP address. Why is it doing this? It's exactly the same code running in both places. It seems like it's trying to connect to something local within the Dataproc master machine?
What further confirms this is that I don't see this error logged on the server side when the attempt fails on Dataproc, but the error IS logged when I force the failure from local machine. So the Dataproc's proxy doesn't seem to be pointing at the SQL Server?
I created the cluster with the following options:
--scopes sql-admin \
--initialization-actions gs://bucket_name/cloud-sql-proxy.sh \
--metadata 'enable-cloud-sql-hive-metastore=false' \
--metadata 'additional-cloud-sql-instances=project_id:europe-west2:sql_instance_id' \
And the connection string that I specify inside the Spark code is like this:
jdbc:mysql://127.0.0.1:3306/database_name
Thanks for your help!
Update:
Based on the below suggestion, I modified my connection string to be as follows:
"jdbc:mysql://google/DATABASE_NAME?cloudSqlInstance=INSTANCE_NAME&socketFactory=com.google.cloud.sql.mysql.SocketFactory&useSSL=false&user=root"
However, in this case I get the following error:
Exception in thread "main" java.sql.SQLNonTransientConnectionException: Cannot connect to MySQL server on google:3,306.
Make sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:108)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:95)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:87)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:61)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:71)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:458)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:230)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:226)
How/where is it supposed to get the driver for 'google'? Also, note that it seems to mangle the default port 3306 and shows it as 3,306. (I tried supplying the port explicitly, but that didn't help...)
I followed the instructions in the tutorial you shared, and both the Cloud SQL instance and the Dataproc cluster were created. The validation process was also carried out:
$ gcloud dataproc jobs submit pyspark --cluster githubtest pyspark_metastore_test.py
Job [63d2e1ef8c9f45ae818c135c775dcf93] submitted.
Waiting for job output...
18/08/22 17:21:51 INFO org.spark_project.jetty.util.log: Logging initialized @3074ms
...
Successfully found table table_mdhw in Cloud SQL Hive metastore
18/08/22 17:22:53 INFO org.spark_project.jetty.server.AbstractConnector: Stopped Spark@5061d2ce{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Job [63d2e1ef8c9f45ae818c135c775dcf93] finished successfully.
I only got the same error as yours when I set a different password for root. Could you update the root password and try the following command again from the master node?
mysql -u root -h 127.0.0.1 -p
In my environment, the command above connects successfully. If that works, please check this link for further steps to connect your Java application. Authentication and the mysql-connector-java connector are required as additional steps.
Hope it helps!
I ran into the same issues, with the exact same symptoms (Access Denied on localhost instead of cloudsqlproxy~*, and google:3,306).
SSH-ing in and looking at /var/log/cloud-sql-proxy/cloud-sql-proxy.log, I saw that cloud-sql-proxy wasn't actually starting; port 3306 was apparently already in use for some reason. I added =tcp:3307 to the end of the instance connection name in additional-cloud-sql-instances, and I was up and running.
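Concretely, that means the metadata flag from the question ends up looking something like this (connection name still a placeholder), and the JDBC URL then has to point at 3307 instead of 3306:
--metadata 'additional-cloud-sql-instances=project_id:europe-west2:sql_instance_id=tcp:3307'
jdbc:mysql://127.0.0.1:3307/database_name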
I never managed to get the SocketFactory URIs working. If changing the port doesn't work, others elsewhere have suggested using VPC.

How to load data from S3 to PostgreSQL RDS

I need to load data from S3 into Postgres RDS (around 50-100 GB). I don't have the option to use AWS Data Pipeline, and I am looking for something similar to using the COPY command to load data from S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer tried to use the S3-to-Postgres-RDS functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set up an EC2 instance with psql installed (see below near the end of the post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql \copy command to import the files
This last part is really, really important. If you use the SQL COPY command, the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql \copy command you apparently can do anything. I have confirmed this to be the case and have started my uploads successfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
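A rough sketch of those last two steps, with bucket, host, table and file names as placeholders (run from the EC2 instance):
# pull the CSV down from S3, then load it with psql's client-side \copy
aws s3 cp s3://my-bucket/my_data.csv /tmp/my_data.csv
psql "host=mydb.abc123.us-east-1.rds.amazonaws.com dbname=mydb user=myuser" \
  -c "\copy my_table FROM '/tmp/my_data.csv' WITH (FORMAT csv, HEADER true)"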
Caveat emptor: the post below was all the original work I had done trying to get this implemented. I don't want to bury the lede: despite multiple efforts (including what can only be described as pathetic tech support from AWS), I don't believe that this feature is ready for prime time. Despite a very simple, easy-to-replicate test environment, AWS has not provided an effective way to keep the copy statement from failing as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises support for all of the same data formats as the Postgres COPY command
It currently only appears to support a single file at a time (i.e. no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
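To make those key points concrete, here is an untested sketch of the extension set-up and a single-file import (endpoint, bucket, table, region and credential values are all placeholders; check the AWS docs linked above for the authoritative function signatures):
psql "host=mydb.abc123.us-east-1.rds.amazonaws.com dbname=mydb user=myuser" <<'SQL'
-- installing aws_s3 pulls in aws_commons as well
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
-- import a single CSV from S3 into an existing table
SELECT aws_s3.table_import_from_s3(
  'my_table',                                  -- target table
  '',                                          -- column list ('' = all columns)
  '(FORMAT csv, HEADER true)',                 -- COPY options
  aws_commons.create_s3_uri('my-bucket', 'my_data.csv', 'us-east-1'),
  aws_commons.create_aws_credentials('MY_ACCESS_KEY', 'MY_SECRET_KEY', '')
);
SQL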
I did not find a way to download just psql, so I had to bring a full Postgres install down to my Mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated, it is the quickest way to get psql.
Update: I decided that having psql on my Mac was a security hole (port forwarding, etc.). I found that there is a simple Postgres install available for Amazon Linux 2 under the Amazon Linux Extras rubric. The install command is fairly simple on your Amazon Linux instance:
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use; however, it is important to keep in mind that any instructions to psql itself are prefixed with a \. Documentation on psql can be found here. I recommend going through it at least once before executing the AWS-recommended scripts.
If you run tight security and access to your RDS instances is seriously restricted (as I do), don't forget to open up the ports from your EC2 instance running Postgres to your RDS instance.
If your preference is a GUI, then you can try pgAdmin 4. It is the AWS-recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product, it seems that version 4 may not be the most stable of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
Update:
Another option is to mount S3 and use a direct path to the CSV with the COPY command. I'm not sure if it will handle 100 GB effectively, but it's worth trying. Here is a list of software options.
Yet another option would be "parsing" the S3 file part by part with something like what is described here, writing it to a file, and using COPY from a named pipe, as described here.
And the most obvious option, just downloading the file to local storage and using COPY, I don't cover at all.
Also worth mentioning is s3_fdw (status: unstable). The README is very laconic, but I assume you could create a foreign table pointing to an S3 file, which means you could then load the data into another relation...