I currently need to execute a stored procedure just before I load data into a table.
I've tried a Stored Procedure activity, but there is still a delay (around 10 seconds) before the copy starts, and that interferes with other processes we have running.
Is there a faster way? I also looked at Azure Functions, but I don't think it should need to be that complicated.
The only way I can think of to run it immediately before the actual copy is the pre-copy script on the sink tab of the copy activity.
Any query you write there is executed right before the data is inserted, so if your database is Postgres (as you tagged the question) you can write:
CALL functionName();
(or SELECT functionName(); if it is a plain function rather than a procedure)
If it were SQL Server:
EXEC functionName
Hope this helped!!
I have a stored procedure on Postgres which processes a large amount of data and takes a good while to complete.
In my application there is a chance that two processes or schedulers will run this procedure at the same time. I want to know whether there is a built-in mechanism in the database to allow only one instance of this procedure to run at a time, at the DB level.
I searched the internet, but didn't find anything concrete.
There is nothing built in to define a procedure (or function) so that concurrent execution is prevented.
But you can use advisory locks to achieve something like that.
At the very beginning of the procedure, you can add something like:
perform pg_advisory_lock(987654321);
which will then wait to get the lock. If a second session invokes the procedure, it will have to wait.
Make sure you release the lock at the end of the procedure using pg_advisory_unlock(), because session-level advisory locks are not released when the transaction is committed.
If you use advisory locks elsewhere, make sure you pick a key that is not used anywhere else.
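A minimal sketch of what that could look like (assuming a PL/pgSQL procedure; the name process_large_data and the key 987654321 are just placeholders):

CREATE OR REPLACE PROCEDURE process_large_data()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Blocks here until no other session holds the lock with this key
    PERFORM pg_advisory_lock(987654321);

    -- ... the long-running processing goes here ...

    -- Session-level advisory locks survive COMMIT, so release explicitly
    PERFORM pg_advisory_unlock(987654321);
END;
$$;

If the processing can fail with an error, it may be safer to use pg_advisory_xact_lock() instead, since transaction-level advisory locks are released automatically when the transaction ends.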
Any way to have the query still continue running even if the original calling client has shut down?
I have an ETL server with 64 cores.
I want to run a COPY command after I process many files per day.
COPY takes a really long time, and the ETL server should only exist while it has more files to process; it's a waste of money to keep it running just to wait on SQL COPY commands.
I could send a ready status to SQS and have it be picked up by a nano server, which can wait on SQL commands to finish all day without worry.
But it would probably be better if I could just submit the SQL to the Redshift cluster and have it work on it asynchronously.
Commands like psql -c "query..." block until the query is finished, and if the psql process gets interrupted, the query is cancelled and rolled back. Is there a way to send an asynchronous query that does not depend on the calling client staying online for the query to complete?
Yes, look at EXTERNAL FUNCTION in Redshift.
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_FUNCTION.html
It's a Lambda UDF: the UDF itself is executed synchronously, but it can hand the actual ingestion of your files off to another Lambda that runs asynchronously.
CREATE EXTERNAL FUNCTION ingest_files()
RETURNS VARCHAR
STABLE
LAMBDA 'ingest_files'
IAM_ROLE 'arn:aws:iam::123456789012:role/Redshift-Ingest-Test';
ingest_files is synchronous. It identifies the list of files to ingest and passes it to another, asynchronous Lambda function.
You can execute it like this:
select ingest_files();
ingest_files
------------
10 files submitted.
(1 row)
Now inside ingest_files you kick off another Lambda function, invoke_ingest_files:
#ingest_files.py
import json
import boto3

lambda_client = boto3.client('lambda')
...
payload = [..file names...]
# InvocationType='Event' makes the call asynchronous: invoke() returns immediately
response = lambda_client.invoke(
    InvocationType='Event',
    FunctionName='invoke_ingest_files',
    Payload=json.dumps(payload)
)
...
Look into the Redshift Data API, as it might meet your needs - https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html
You submit the query to the Data API, it runs the query for you, and you can poll for the result whenever you wish. It may not fit your solution, but it's an option to consider.
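A rough sketch of what that looks like with boto3 (the cluster, database, role, and table names are made up; for Redshift Serverless you would pass WorkgroupName instead of ClusterIdentifier):

import boto3

client = boto3.client('redshift-data')

# Submit the COPY; this call returns immediately with a statement Id
resp = client.execute_statement(
    ClusterIdentifier='my-cluster',   # made-up identifier
    Database='dev',
    DbUser='etl_user',
    Sql="COPY my_table FROM 's3://my-bucket/prefix/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRole' FORMAT AS PARQUET;"
)

# The query keeps running on Redshift even if this client exits.
# Any process can poll for the status later:
status = client.describe_statement(Id=resp['Id'])['Status']
print(status)  # SUBMITTED / STARTED / FINISHED / FAILED / ...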
I am new to databases and I need this for a project. My problem is as follows: I have 3 scripts that write to a Postgres DB and another script that runs updates on it. So far I haven't had any issues with that. However, now I also need to read that data at the same time; specifically, I need to read the last 1 minute of data from that DB while the writes are happening, and I have a separate script for that. But when I run this script, I can't see any of the writes from the scripts that are supposed to be writing. Any suggestions?
Chances are your other scripts haven't COMMITted their data yet, which means that their writes aren't visible to your queries yet.
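For example, if the writers use psycopg2, each connection runs inside a transaction by default and nothing is visible to other sessions until it is committed. A minimal sketch (the table and connection details are made up):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # made-up connection string
cur = conn.cursor()
cur.execute("INSERT INTO readings (ts, value) VALUES (now(), %s)", (42,))
conn.commit()  # without this, the reading script will never see the new row
# (alternatively, set conn.autocommit = True before executing)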
Similar to SQL script to create insert script, I need to generate a list of INSERT statements from a table to load into another database (SQLite), like a dump command does. I do this for sync purposes.
I have limitations because this will run on a cloud server without access to the filesystem, so I need to do this in the DB (I can do it in the app server; I'm asking whether it is possible to do it directly in the DB).
In the app server, I load a datatable, walk its field names and data types, and build an INSERT... I wonder if there is a way to do the same in the DB...
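For illustration, this is roughly what I'm hoping is possible directly in Postgres (just a sketch; the customer table and its columns are made up):

-- Build one INSERT statement per row of a hypothetical "customer" table;
-- quote_nullable() quotes text values and turns NULLs into the literal NULL.
SELECT 'INSERT INTO customer (id, name) VALUES ('
       || id || ', ' || quote_nullable(name) || ');'
FROM customer;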
I am not entirely sure whether this helps, but you can use a simple ETL tool like Pentaho Kettle. I used it once for a similar task and it did not take me more than 10 minutes. You can also schedule the jobs. I am not sure whether it is supported at the database level.
Thanks,
Shankar
Is it possible to write a stored procedure or trigger that will be executed automatically inside the database at a particular time, without any call from the application? If yes, could anybody give me an example or a link to some resource where I can read how to do that?
Check out pgAgent. If that doesn't work for you, there's always cron in Unix/Linux and the Task Scheduler service in Windows.
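For example, with cron a single crontab entry is enough (a sketch; the database and function names are made up, and it assumes psql authentication is already set up, e.g. via .pgpass):

# Run my_nightly_job() in database mydb every day at 02:00
0 2 * * * psql -d mydb -c "SELECT my_nightly_job();"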
I don't think there's anything built-in, but you might want to check out
pgjobs or pgAgent.
You can use stored procedures. A stored procedure is a set of statements, which gives the programmer ease and flexibility because a stored procedure is easier to execute than reissuing a number of individual SQL statements that perform the same database operations. With a stored procedure, less information needs to be sent between the server and the client.
You can visit these links:
Postgres Procedures
Best way to use stored Procedures