dbt restart with defer state picks only last run and ignores previous aborts

Currently we need to run two independent sets of models at the same time. I am looking into restartability for each set independently of the other, as they will be monitored by separate teams. If both model sets abort, the restart picks up only the last one from run_results.json.
Here is my current solution to run both in parallel:
dbt run --select +modelset2 --target-path .\target\modelset2
dbt run --select +modelset1 --target-path .\target\modelset1
and for restartability:
dbt run --select result:error --defer --state .\target\modelset2
dbt run --select result:error --defer --state .\target\modelset1
Is this the correct approach?

Related

Does Airflow clear Tasks remove data from the database also?

We have a use case where we have to populate fresh data in our DB. There is already old data present from a successful DAG run in our DB. Now we need to delete the old data and re-run the task.
Airflow already provides a command to clear a selection:
airflow clear -dx occupancy_reports.* -t building -s 2022-04-01 -e 2022-04-30
Will running this also delete the data from the Database and then populate fresh data ?
I guess you meant: airflow **tasks** clear ...
It only clears the set of task instances, as if they never ran (it is not a rollback). Clearing will not delete anything from your database; the task itself must remove the old data when it re-runs.
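So if the re-run must replace old rows, the task has to do the deleting itself. A minimal sketch of an idempotent load in SQL, assuming hypothetical occupancy_reports and staging_occupancy_reports tables with a report_date column (names are illustrative, not from the question):
BEGIN;
-- Remove the rows for the interval being re-run (table/column names are hypothetical)
DELETE FROM occupancy_reports
WHERE report_date >= DATE '2022-04-01' AND report_date <= DATE '2022-04-30';
-- Reload them from the source in the same transaction, so a failed
-- attempt leaves the old data untouched
INSERT INTO occupancy_reports (report_date, building, occupancy)
SELECT report_date, building, occupancy
FROM staging_occupancy_reports
WHERE report_date >= DATE '2022-04-01' AND report_date <= DATE '2022-04-30';
COMMIT;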

Ecto.reset inside docker container does not work

I am building a RESTful API with Phoenix and PostgreSQL and run the Phoenix app and the database in separate docker containers using docker-compose. If I attach a shell to the Phoenix container and type mix ecto.reset to reset the database, I get the following error:
** (Mix) The database for Home.Repo couldn't be dropped: ERROR 55006 (object_in_use):
database "home" is being accessed by other users
There are 10 other sessions using the database.
I already tried this:
REVOKE CONNECT ON DATABASE dbname FROM PUBLIC, username;
SELECT
pg_terminate_backend(pid)
FROM
pg_stat_activity
WHERE
pid <> pg_backend_pid()
AND datname = 'database_name';
What am I doing wrong?
You shouldn't stop only the Cowboy web server in this case: many other Phoenix dependencies hold database connections too, and most likely a supervision tree will restart the process if you find it and kill it yourself. So, as mentioned before, you should stop your whole Phoenix application with Application.stop(:your_app) inside iex, then do ecto.reset and start it again.
That said, you shouldn't need to use ecto.reset within a running application. ecto.reset simply runs "ecto.drop", "ecto.create", and "ecto.migrate". Check whether the drop is really necessary, as that is the step that's impossible to do while an application still relies on the database.
If you're resetting for testing purposes, for example, then you should also check Ecto.Adapters.SQL.Sandbox, which wraps each test in its own transaction that is rolled back afterwards:
https://hexdocs.pm/ecto/Ecto.Adapters.SQL.Sandbox.html#content

How to restore/rewind my PostgreSQL database

We do nightly full backups of our db, and I then use that dump to create my own dev db. The creation of the dev db takes roughly 10 minutes, so it's scheduled by cron every morning before I get to work. So I can now work with an almost-live db.
But when I'm testing things it would sometimes be convenient to roll back the full db, or just some specific tables, to the initial backup. Of course I could do the full recreation of the dev db, but that would make me wait another 10 minutes before I could run the tests again.
So is there an easy way to restore/rewind the database/table to a specific point in time or from a dump?
I have tried to use pg_restore like this to restore specific tables:
pg_restore -d my-dev-db -n stuff -t tableA -t tableB latest-live-db.dump
I have tried options like -c and --data-only as well. But there seem to be several issues here that I did not foresee:
The old data is not automatically removed when the restored data is copied back.
There are several foreign-key constraints that make this impossible (correct me if I'm wrong) without explicitly dropping the FKs before the restore and adding them back again.
PK sequences getting out of order does not concern me at all at this point, but that might be an issue as well.
Edit: more things I tested/looked into:
pg_basebackup
A more brute force alternative to pg_basebackup is to stop the db-server, copy the db-files, then start the db-server.
Both of the alternatives above fail because I have several local databases running in the same cluster and that sums up to a lot of data on disk. There is no way to separate the databases this way! So the file copy action here will not give me any speed gain.
I'm assuming you are asking about a database, not a cluster. The first thing that comes to my mind is to restore the backup to 2 different dbs, one with the dev_db name and the other with another name like dev_db_backup. Then when you need a fresh db, drop dev_db and rename dev_db_backup to dev_db with:
drop database if exists dev_db;
alter database dev_db_backup rename to dev_db;
After that, restore the backup to dev_db_backup again so you have another source to rename from. This could all be done by a script, so the dropping, renaming and restoring would be automated. As dropping and renaming are instantaneous, just start the script and the rename is done without having to wait for the new restore.
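A minimal sketch of the SQL half of that script, assuming you run it while connected to some other database (e.g. postgres), since you cannot drop or rename a database you are connected to:
-- Swap in the spare copy (run from a maintenance connection, not from dev_db)
DROP DATABASE IF EXISTS dev_db;
ALTER DATABASE dev_db_backup RENAME TO dev_db;
-- Recreate the spare so the nightly dump can be restored into it again
CREATE DATABASE dev_db_backup;
-- (the actual restore into dev_db_backup is then done with pg_restore, outside SQL)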
If you commonly need repeated restores at intervals of less than 10 minutes, I think you can try to do what you are doing inside a transaction:
begin;
-- alter the db
-- test the alterations
commit; -- or ...
-- rollback;

PostgreSQL How to check if my function is still running

Just a quick question.
I have one heavy function in PostgreSQL 9.3. How can I check whether the function is still running after several hours, and how can I run a function in the background in psql? (My connection is unstable from time to time.)
Thanks
For long running functions, it can be useful to have them RAISE LOG or RAISE NOTICE from time to time, indicating progress. If they're looping over millions of records, you might emit a log message every few thousand records.
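A minimal PL/pgSQL sketch of that pattern; the function, table, and interval are hypothetical:
CREATE OR REPLACE FUNCTION process_big_table() RETURNS void AS $$
DECLARE
    rec  record;
    done bigint := 0;
BEGIN
    FOR rec IN SELECT id FROM big_table LOOP
        -- ... heavy per-row work goes here ...
        done := done + 1;
        IF done % 10000 = 0 THEN
            -- visible in psql; use RAISE LOG instead to write to the server log
            RAISE NOTICE 'processed % rows', done;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;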
Some people also (ab)use a SEQUENCE, where they get the nextval of the sequence in their function, and then directly read the sequence value to check progress. This is crude but effective. I prefer logging whenever possible.
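A sketch of the sequence trick with hypothetical names; it works because sequence increments are never rolled back or hidden by the function's transaction, so another session can see them immediately:
CREATE SEQUENCE progress_seq;
-- Inside the long-running function, once per processed row:
--     PERFORM nextval('progress_seq');
-- Meanwhile, from another session, check how far it has got:
SELECT last_value FROM progress_seq;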
To deal with disconnects, run psql on the remote side over ssh rather than connecting to the server directly over the PostgreSQL protocol. As Christian suggests, use screen so the remote psql doesn't get killed when the ssh session dies.
Alternately, you can use the traditional unix command nohup, which is available everywhere:
nohup psql -f the_script.sql </dev/null &
which will run psql in the background, writing all output and errors to a file named nohup.out.
You may also find that if you enable TCP keepalives, you don't lose remote connections anyway.
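For example, the server-side keepalive settings can be tightened; the values below are illustrative, and on 9.3 you would edit postgresql.conf directly, since ALTER SYSTEM only arrived in 9.4:
ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- seconds of idle before the first probe
ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between unanswered probes
ALTER SYSTEM SET tcp_keepalives_count = 5;      -- unanswered probes before the connection drops
SELECT pg_reload_conf();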
pg_stat_activity is a good place to check whether your function is still running. Also use screen or tmux on the server to ensure that the session survives a reconnect.
1. Log in to the psql console:
$ psql -U user -d database
2. Issue a \x command to format the results.
3. SELECT * FROM pg_stat_activity;
4. Scroll until you see your function name in the list. It should have an active status.
5. Check whether there are any locks blocking the tables your function relies on, using:
SELECT
    k_locks.pid AS pid_blocking,
    k_activity.usename AS user_blocking,
    k_activity.query AS query_blocking,
    locks.pid AS pid_blocked,
    activity.usename AS user_blocked,
    activity.query AS query_blocked,
    to_char(age(now(), activity.query_start), 'HH24h:MIm:SSs') AS blocking_age
FROM pg_catalog.pg_locks locks
JOIN pg_catalog.pg_stat_activity activity ON locks.pid = activity.pid
JOIN pg_catalog.pg_locks k_locks ON
    locks.locktype = k_locks.locktype
    AND locks.database IS NOT DISTINCT FROM k_locks.database
    AND locks.relation IS NOT DISTINCT FROM k_locks.relation
    AND locks.page IS NOT DISTINCT FROM k_locks.page
    AND locks.tuple IS NOT DISTINCT FROM k_locks.tuple
    AND locks.virtualxid IS NOT DISTINCT FROM k_locks.virtualxid
    AND locks.transactionid IS NOT DISTINCT FROM k_locks.transactionid
    AND locks.classid IS NOT DISTINCT FROM k_locks.classid
    AND locks.objid IS NOT DISTINCT FROM k_locks.objid
    AND locks.objsubid IS NOT DISTINCT FROM k_locks.objsubid
    AND locks.pid <> k_locks.pid
JOIN pg_catalog.pg_stat_activity k_activity ON k_locks.pid = k_activity.pid
WHERE k_locks.granted
    AND NOT locks.granted
ORDER BY activity.query_start;

Postgres: how to start a procedure right after database start?

I have dozens of unlogged tables, and the docs say that an unlogged table is automatically truncated after a crash or unclean shutdown.
Based on that, I need to check some tables after the database starts to see if they are "empty" and do something about it.
So in short, I need to execute a procedure right after the database has started.
What is the best way to do it?
PS: I'm running Postgres 9.1 on Ubuntu 12.04 server.
There is no such feature available (at time of writing, latest version was PostgreSQL 9.2). Your only options are:
Start a script from the PostgreSQL init script that polls the database and, when the DB is ready, locks the tables and populates them;
Modify the startup script to use pg_ctl start -w and invoke your script as soon as pg_ctl returns; this has the same race condition but avoids the need to poll;
Teach your application to run a test whenever it opens a new pooled connection to detect this condition, lock the tables, and populate them (a sketch of such a check follows this list); or
Don't use unlogged tables for this task if your application can't cope with them being empty when it opens a new connection.
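A minimal sketch of the check from the third option, with a hypothetical unlogged table and rebuild source; the lock keeps two pooled connections from repopulating at the same time:
DO $$
BEGIN
    -- EXCLUSIVE mode still allows reads, but serializes concurrent repopulators
    LOCK TABLE my_unlogged_cache IN EXCLUSIVE MODE;
    IF NOT EXISTS (SELECT 1 FROM my_unlogged_cache) THEN
        INSERT INTO my_unlogged_cache
        SELECT * FROM my_cache_source;  -- hypothetical table the cache is rebuilt from
    END IF;
END;
$$;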
There's been discussion of connect-time hooks on pgsql-hackers but no viable implementation has been posted and merged.
It's possible you could do something like this with PostgreSQL bgworkers, but it'd be a LOT harder than simply polling the DB from a script.
Postgres now has pg_isready for determining if the database is ready.
https://www.postgresql.org/docs/11/app-pg-isready.html