Is it possible to export results from Google Cloud SQL using SQuirreL SQL?

I'm looking for the most efficient way to get results out of a Google Cloud SQL instance.
I know from the FAQ that the INTO OUTFILE command isn't supported, and the only official way of getting data out of Cloud SQL is by exporting your entire instance into a Cloud Storage bucket.
Should SQuirreL SQL be able to save the results of a query? It has a "Store results of SQL in file" script, but when I try that it gives me an "Invalid parameter for rows" error. I'm not sure whether that is an issue with SQuirreL or a consequence of the Cloud SQL limitations.

You mentioned the "most efficient way" - do you need to do bulk export? In that case, exporting to a cloud storage bucket is the most efficient way.
If you just want to dump a query result to a file, and the number of rows is small, you can use the command line tool and pipe the result to a file (https://developers.google.com/cloud-sql/docs/commandline). SQuirreL SQL should be able to do the same thing. I'm not sure about that particular error you got.
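For example, assuming the instance is reachable with a standard MySQL client (the host, user, database, and table names below are placeholders), a query result can be piped to a file like this:

# hypothetical connection details; -e runs the query non-interactively and the
# redirect writes the tab-separated result (with a header row) to results.tsv
mysql --host=<instance-ip> --user=<user> -p <database> \
  -e "SELECT id, name FROM customers LIMIT 1000;" > results.tsv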

Related

ADF Mapping Dataflow Temp Table issue inside SP call

I have a mapping dataflow inside a ForEach activity which I'm using to copy several tables into ADLS; in the dataflow's source activity, I call a stored procedure from my Synapse environment. In the SP, I create a small temp table to store some values which I later use when processing a query.
When I run the pipeline, I get an error on the mapping dataflow: "SQLServerException: 111212; Operation cannot be performed within a transaction." If I remove the temp table and just do a simple select * from a small table, it returns the data fine; it's only after I bring back the temp table that I get the issue.
Have you guys ever seen this before, and is there a way around this?
If you go through the official MS docs, this error is very well documented.
Failed with an error: "SQLServerException: 111212; Operation cannot be performed within a transaction."
Symptoms
When you use the Azure SQL Database as a sink in the data flow to preview data, debug/trigger run and do other activities, you may find your job fails with following error message:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sink': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: 111212;Operation cannot be performed within a transaction.","Details":"at Sink 'sink': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: 111212;Operation cannot be performed within a transaction."}
Cause
The error "111212;Operation cannot be performed within a transaction." only occurs in the Synapse dedicated SQL pool. But you
mistakenly use the Azure SQL Database as the connector instead.
Recommendation
Confirm whether your SQL Database is a Synapse dedicated SQL pool. If so, use Azure Synapse Analytics as the connector, as shown in the picture in the MS docs.
So after running some tests around this issue, it seems that Mapping Dataflows do not like temp tables in the stored procedure I call as the source.
The way I ended up fixing this was to use a CTE instead of a temp table, which, believe it or not, runs a bit faster than the temp table did.
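As a rough sketch of that kind of rewrite (the tables, columns, and logic here are made up, not my actual stored procedure):

-- before (sketch): temp table built inside the stored procedure
CREATE TABLE #lookup_vals (id INT, val VARCHAR(50));
INSERT INTO #lookup_vals (id, val)
SELECT id, val FROM dbo.config_table WHERE is_active = 1;
SELECT s.* FROM dbo.source_table s JOIN #lookup_vals l ON s.id = l.id;

-- after (sketch): the same values expressed as a CTE, no temp table
WITH lookup_vals AS (
    SELECT id, val FROM dbo.config_table WHERE is_active = 1
)
SELECT s.* FROM dbo.source_table s JOIN lookup_vals l ON s.id = l.id;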
@KarthikBhyresh I looked at that article before, but it wasn't an issue with the sink: I was using a Synapse linked service (LS) as my source and Data Lake Storage as my sink, so I knew from the beginning that this did not apply to my issue, even though it was the same error number.

Insight into Redshift Spectrum query error

I'm trying to use Redshift Spectrum to query data in S3. The data has been crawled by Glue, I've run successful data profile jobs on the files with DataBrew (so I know Glue has read it correctly), and I can see the correct tables in the query editor after creating the schema. But when I try to run simple queries I get one of two errors: if it's a small file, I get "ERROR: Parsed manifest is not a valid JSON object...."; if it's a large file, I get "ERROR: Manifest too large Detail:...". I suspect Spectrum is either looking for a manifest or treating the file in the query as one, but I have no idea why or how to address it. I've followed the documentation as rigorously as possible, and I've replicated the process via a screen share with an AWS tech support rep who is also stumped.
Discovered the issue: the error was happening because I had more than one type of file (i.e., files of differing layouts) in the same S3 folder. There may be other ways to solve the problem, but isolating one type of file per S3 folder solved it and allowed Redshift Spectrum to successfully execute queries against my file(s).
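For illustration, the layout that works is one S3 prefix per file layout, whether the table is defined by the Glue crawler or by hand; the bucket, schema, and column names below are hypothetical:

-- s3://my-bucket/sales/ contains only files with the sales layout
CREATE EXTERNAL TABLE spectrum_schema.sales (
    sale_id   INT,
    sale_date DATE,
    amount    DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://my-bucket/sales/';

SELECT COUNT(*) FROM spectrum_schema.sales;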

streaming PostgreSQL tables into Google BigQuery

I would like to automatically stream data from an external PostgreSQL database into a Google Cloud Platform BigQuery database in my GCP account. So far, I have seen that one can query external databases (MySQL or PostgreSQL) with the EXTERNAL_QUERY() function, e.g.:
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries
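For reference, a federated query looks roughly like this (the connection ID and table name are placeholders, not from my actual setup):

SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-postgres-connection',
  'SELECT id, name, created_at FROM my_table;'
);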
But for that to work, the database has to be in GCP Cloud SQL. I tried to see what options there are for streaming from the external PostgreSQL into a Cloud SQL PostgreSQL database, but I could only find information about replicating it as a one-time copy, not streaming:
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external
The reason I want this streaming into BigQuery is that I am using Google Data Studio to create reports from the external PostgreSQL, which works great, but GDS can only accept SQL query parameters if the data source is a Google BigQuery database. E.g., if we have a table with 1M entries and we want a Google Data Studio parameter to be supplied by the user, the query turns into:
SELECT * from table WHERE id=#parameter;
which means that the query will be faster, and won't hit the 100K records limit in Google Data Studio.
What's the best way of creating a connection between an external PostgreSQL (read-only access) and Google BigQuery so that when querying via BigQuery, one gets the same live results as querying the external PostgreSQL?
Perhaps you missed the options stated in the Google Cloud user guide?
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external#setup-replication
Notice in this section, it says:
"When you set up your replication settings, you can also decide whether the Cloud SQL replica should stay in-sync with the source database server after the initial import is complete. A replica that should stay in-sync is online. A replica that is only updated once, is offline."
I suspect online mode is what you are looking for.
What you are looking for will require some architecture design based on your needs, plus some coding. There isn't a feature to automatically sync your PostgreSQL database with BigQuery (apart from the EXTERNAL_QUERY() functionality, which has some limitations: one connection per database, performance, the total number of connections, etc.).
In case you are not looking for the data in real time, what you can do is, with Airflow for instance, have a DAG connect to all your DBs once per day (using KubernetesPodOperator, for instance), extract the data (from the past day) and load it into BQ. A typical ETL process, but in this case more EL(T). You can run this process more often if you cannot wait one day for the previous day's data.
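If you go the daily batch route, one way to make each load idempotent (just a sketch, with made-up project, dataset, and column names) is to land the day's extract in a staging table and MERGE it into the main table:

MERGE `my-project.my_dataset.orders` t
USING `my-project.my_dataset.orders_staging` s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET status = s.status, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, status, updated_at) VALUES (s.id, s.status, s.updated_at);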
On the other hand, if streaming is what you are looking for, then I can think of a Dataflow job. I guess you can connect using a JDBC connector.
In addition, depending on how your pipeline is structured, it might be easier to implement (but harder to maintain) if, at the same moment you write to your PostgreSQL DB, you also stream your data into BigQuery.
Not sure if you have tried this already, but instead of adding a parameter, if you add a dropdown filter based on a dimension, Data Studio will push that down to the underlying Postgres db in this form:
SELECT * from table WHERE id=$filter_value;
This should achieve the same results you want without going through BigQuery.

How to pass variable in Load command from IBM Object Storage file to Cloud DB2

I am using the command below to load an Object Storage file into the DB2 table NLU_TEMP_2:
CALL SYSPROC.ADMIN_CMD('load from "S3::s3.jp-tok.objectstorage.softlayer.net::<s3-access-key-id>::<s3-secret-access-key>::nlu-test::practice_nlu.csv"
OF DEL modified by codepage=1208 coldel0x09 method P (2) WARNINGCOUNT 1000
MESSAGES ON SERVER INSERT into DASH12811.NLU_TEMP_2(nlu)');
The command above inserts the 2nd column from the Object Storage file into the nlu column of DASH12811.NLU_TEMP_2.
I want to insert a request_id from a variable as an additional column, i.e. load into DASH12811.NLU_TEMP_2(request_id, nlu).
I read in some article that statement concentrator literals can be used to pass a value dynamically. Please let us know if anyone has an idea of how to use it.
Note: I would be using this query in DB2, not DB2 Warehouse, and external tables won't work in DB2.
LOAD does not have any ability to include extra values that are not part of the load file. You can try to do things with columns that are generated by default in Db2, but it is not a good solution.
Really, you need to wait until Db2 on Cloud supports external tables.
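As a rough, untested sketch of the default-column idea (plus a simpler backfill UPDATE as an alternative); the request id value here is hypothetical, and whether LOAD applies column defaults for columns omitted from the column list should be verified on your Db2 version:

-- option 1 (sketch): give request_id a default before the load, keep the LOAD listing only (nlu)
ALTER TABLE DASH12811.NLU_TEMP_2 ALTER COLUMN request_id SET DEFAULT 'REQ-12345';

-- option 2 (sketch): load only nlu as before, then backfill the new column afterwards
UPDATE DASH12811.NLU_TEMP_2 SET request_id = 'REQ-12345' WHERE request_id IS NULL;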

writing SQL queries in db2 database

I'm new to DB2/AS400. I know how to write SQL queries to insert/update in a database, but I'm not sure how to do the same thing in DB2/AS400. Can anybody guide me on how to write SQL insert queries and stored procedures in a DB2 database?
STRSQL, as suggested in another answer, would work fine, but it is a green screen (5250 emulation) option; if you are not familiar with that environment, I would recommend the 'Run SQL Script' function from either IBM i Access Client Solutions or IBM i Navigator, whichever is available to you.
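For example, in Run SQL Script you could run statements like these (the library, table, column, and procedure names are made up for illustration):

-- simple insert and update
INSERT INTO MYLIB.CUSTOMER (CUST_ID, CUST_NAME) VALUES (1, 'ACME');
UPDATE MYLIB.CUSTOMER SET CUST_NAME = 'ACME LTD' WHERE CUST_ID = 1;

-- a minimal SQL stored procedure
CREATE OR REPLACE PROCEDURE MYLIB.UPD_CUSTOMER (IN P_ID INTEGER, IN P_NAME VARCHAR(50))
LANGUAGE SQL
BEGIN
  UPDATE MYLIB.CUSTOMER SET CUST_NAME = P_NAME WHERE CUST_ID = P_ID;
END;

-- and call it
CALL MYLIB.UPD_CUSTOMER(1, 'ACME LTD');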
On the AS400 command line, issue the following command:
STRSQL
Start Here, and this is typical IBM documentation: it leaves you guessing "why am I here?"