Is there any way to have the query continue running even if the original calling client has shut down?
I have an ETL server with 64 cores.
I want to run a COPY command after I process many files per day.
COPY takes a really long time, and the ETL server should only exist while it has more files to process; it's a waste of money to keep it running just to wait on SQL COPY commands.
I could send a ready status to SQS and have it picked up by a nano server, which can wait on SQL commands to finish all day without worry.
But it would probably be better if I could just submit the SQL to the Redshift server and have it work on it asynchronously.
Commands like psql -c "query..." will block until the query is finished, and if the psql process gets interrupted, the query is cancelled and rolled back. Is there a way to send an asynchronous query that does not rely on the calling client staying online for the query to complete?
Yes, look at EXTERNAL FUNCTION in Redshift.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_FUNCTION.html
It's a Lambda UDF that executes synchronously from SQL but can hand off the ingestion of all your files asynchronously.
CREATE EXTERNAL FUNCTION ingest_files()
RETURNS VARCHAR
STABLE
LAMBDA 'ingest_files'
IAM_ROLE 'arn:aws:iam::123456789012:role/Redshift-Ingest-Test';
ingest_files is synchronous. It identifies the list of files to ingest and passes it to another, asynchronous Lambda function.
You can execute it like this:
select ingest_files();
ingest_files
------------
10 files submitted.
(1 row)
Now, inside ingest_files, you kick off another Lambda function, invoke_ingest_files:
#ingest_files.py
import json
import boto3

lambda_client = boto3.client('lambda')
...
payload = [..file names...]  # the list of files identified for ingestion
# InvocationType='Event' is an asynchronous invoke: the call returns as soon
# as the event is queued, so this UDF can return immediately while the second
# Lambda does the actual ingestion work.
response = lambda_client.invoke(
    InvocationType='Event',
    FunctionName='invoke_ingest_files',
    Payload=json.dumps(payload)
)
...
Look into Redshift Data API as it might meet your needs - https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html
You submit the query to the Data API, it runs the query, and you can poll for the result as you wish. It may not fit your solution exactly, but it is an option to consider.
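For illustration, a minimal sketch of that flow with boto3 (the cluster identifier, database, user, and S3 path below are placeholders): execute_statement returns immediately with a statement Id, the COPY keeps running inside Redshift even if the submitting process exits, and describe_statement can be polled later from any machine, or not at all.

import time
import boto3

client = boto3.client('redshift-data')

# Submit the COPY. This call returns as soon as the statement is accepted,
# so the ETL server can shut down right after.
resp = client.execute_statement(
    ClusterIdentifier='my-cluster',   # placeholder
    Database='dev',                   # placeholder
    DbUser='etl_user',                # placeholder
    Sql="COPY my_table FROM 's3://my-bucket/prefix/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/Redshift-Ingest-Test' "
        "FORMAT AS PARQUET;",
)
statement_id = resp['Id']

# Optional: poll for completion from anywhere (a nano instance, a Lambda, ...).
while True:
    status = client.describe_statement(Id=statement_id)['Status']
    if status in ('FINISHED', 'FAILED', 'ABORTED'):
        print(status)
        break
    time.sleep(30)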
Related
I've had a strange bug pop up -- when I write to a partitioned table, then immediately after do a select on that same table, I get an error like:
./2018.04.23/ngbarx/cunadj. OS reports: Operation not permitted.
This error does not appear if, after writing the table, I wait a few seconds. To me this points towards some kind of caching situation, where q responds before an operation is complete, but as far as I know everything I am doing should be synchronous.
I would love to understand what I am doing wrong / what is causing this error exactly / which commands are executing asynchronously.
The horrible details:
Am writing from Python connected to q synchronously using the qpython3 package
The q session is launched with slaves i.e. -s 4
To write the partitioned table, I am using the unofficial function .Q.dcfgnt which can be found here
I write to a q session that was initialized with a database directory as is usual when dealing with partitioned tables
After writing the table with .Q.dcfgnt, but before doing the select, I also run .Q.chk`:/db/; system"l /db/"; .Q.cn table, in that order, just to be sure the table is up and ready to use in the q session. These might be overkill and in the wrong order, but I believe they are all synchronous calls; please correct me if I am wrong.
The trigger for the error is a 10#select from table; I understand why this is a bad idea to do in general on a partitioned table, but from my understanding it shouldn't be causing the particular error that I am getting.
In order to check whether a new version of the database (in staging) reacts the same way as (or better than) the production database, I would like to capture all queries executed on the production server and replay them on the staging database.
Is there a tool that does this job?
What would be interesting is the ability to compare execution times during the replay and highlight the queries that run slower.
Otherwise, I thought I would capture queries by setting log_min_duration_statement to 0 (so that all queries are logged in the PostgreSQL logfile), and then parse the file to extract and replay the queries on the other server. Is there a better way to do it?
(The current database version is PostgreSQL 9.6, but I'm interested even if it only works in a later version, for next time.)
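For what it's worth, here is a minimal sketch of that parse-and-replay idea, assuming log_min_duration_statement = 0, single-line statements, and a log format that leaves the "duration: ... ms  statement: ..." part intact; the regex, DSN, and output format are assumptions, and multi-line queries would need a smarter parser.

import re
import time
import psycopg2

# Matches log lines like:
#   ... LOG:  duration: 12.345 ms  statement: SELECT ...
LOG_RE = re.compile(r"duration: (?P<ms>[\d.]+) ms\s+statement: (?P<sql>.+)")

def replay(logfile, staging_dsn="host=staging dbname=app"):  # placeholder DSN
    conn = psycopg2.connect(staging_dsn)
    conn.autocommit = True
    cur = conn.cursor()
    with open(logfile) as f:
        for line in f:
            m = LOG_RE.search(line)
            if not m:
                continue
            prod_ms = float(m.group("ms"))
            sql = m.group("sql")
            start = time.monotonic()
            try:
                cur.execute(sql)
            except Exception as exc:
                print(f"FAILED  {sql[:80]}  ({exc})")
                continue
            staging_ms = (time.monotonic() - start) * 1000
            # Highlight queries that ran slower on staging than on production.
            if staging_ms > prod_ms:
                print(f"SLOWER  {prod_ms:.1f} ms -> {staging_ms:.1f} ms  {sql[:80]}")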
I currently need to execute a stored procedure just before I load data into a table.
I've tried a stored procedure activity, but there is still a delay (around 10 seconds) before the copy starts, and that interferes with other processes we have running.
Is there a faster way? I also looked at Azure Functions, but I don't think it should need to be that complicated.
The only way I can think of to run it immediately before the actual copy is the pre-copy script on the sink tab of the copy activity.
Any query you write there will be run before inserting the data, so if your database is PostgreSQL (as you tagged the question) you may write:
CALL functionName()
If it were SQL Server:
EXEC functionName
Hope this helps!
I'm writing a program to run mass calculations and output the results into PostgreSQL.
My platform is Windows Server 2008 with PostgreSQL 10. My program is written in C.
The results are produced group by group, and finishing each group creates an extra thread to write the output.
Since the output threads are created one by one, it is possible that two or more SQL commands will be issued simultaneously, or that the previous one is still being processed when a new thread calls the function.
So my questions are:
(1) What would happen if one thread is in the middle of SQL processing and another thread calls PQexec(PGconn *conn, const char *query)? Would they affect each other?
(2) What if I use a different PGconn for each thread? Would that speed things up?
If you try to call PQexec on a connection that is in the process of executing an SQL statement, you would cause a protocol violation. That just doesn't work.
Processing could certainly be made faster if you use several database connections in parallel; concurrent transactions are something that PostgreSQL is designed for.
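Regardless of client library, the pattern is one connection per writer thread, since a single connection can only process one command at a time. A rough sketch of that pattern (shown here with psycopg2 in Python rather than libpq; the table, columns, and DSN are made up):

import threading
import psycopg2

DSN = "dbname=output"  # hypothetical connection string

def write_group(group_id, values):
    # Each thread opens its own connection; sharing one connection
    # between threads would serialize (or break) the commands.
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO results (group_id, value) VALUES (%s, %s)",
                [(group_id, v) for v in values],
            )

threads = [
    threading.Thread(target=write_group, args=(g, vals))
    for g, vals in enumerate([[1.0, 2.0], [3.0, 4.0]])
]
for t in threads:
    t.start()
for t in threads:
    t.join()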
Does anyone know how I can set up an insert trigger so that when I perform an insert from my application, the data gets inserted and Postgres returns, even before the trigger finishes executing?
There is no built-in support for this; you will have to hack something up. Options include:
Write the trigger in C, Perl, or Python and have it launch a separate process to do the things you want. This can get tricky and possibly slightly dangerous to your database system, and it only works if the things you want to do are outside of the database.
Write a lightweight trigger function that only records an entry into a log or task table, and have a separate job or daemon that looks into that table on its own schedule and executes things from there. That's more or less how Slony works.
The question is: why do you need it? Triggers should be fast. If you need to do something complicated, write a trigger that sends a notification to a daemon that does the complex part, for example using the LISTEN/NOTIFY feature of PostgreSQL.
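As a rough illustration of that LISTEN/NOTIFY approach, here is a minimal daemon sketch using psycopg2; the channel name, DSN, and the "slow work" are assumptions, and the trigger itself would simply run NOTIFY (or pg_notify) on that channel after the insert.

import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN row_inserted;")  # channel name is an assumption

while True:
    # Block until the connection's socket is readable (or 60 s pass),
    # then drain any NOTIFY messages that arrived.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # The slow part of the work happens here, outside the INSERT's
        # transaction, so the application's insert returns immediately.
        print("got NOTIFY:", notify.channel, notify.payload)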