Is there an equivalent of pg_backend_pid in Cassandra? - postgresql

I am starting to use Cassandra and I need to work with several sessions without creating different roles. I am trying to implement an audit log that records the session ID with each modification. I had already implemented this in PostgreSQL, which is how I learned about triggers, and I am now adapting the idea to Cassandra's triggers. So far I can't find a way to identify a CQL session/connection that doesn't involve an external process, but that rules out using triggers.

Cassandra can enable or disable tracing with the TRACING command, which creates traces for every query in that session. A more practical approach is nodetool settraceprobability, which lets you set the fraction of queries that get traced.
All of those traces are kept in a separate keyspace; in 3.x this is system_traces, and the trace rows are stored with a time to live (TTL) of 24 hours.
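If you are connecting through the DataStax Java driver (3.x assumed here), tracing can also be turned on per statement and the resulting trace session id read back from the execution info, which is about as close as you get to tying work to a particular client session. A rough sketch (contact point, keyspace, table and values are made up):

import com.datastax.driver.core.*;

public class TracingExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        // Enable tracing on this one statement; the trace rows end up in system_traces.
        Statement stmt = new SimpleStatement("UPDATE users SET name = 'x' WHERE id = 1")
                .enableTracing();
        ResultSet rs = session.execute(stmt);

        // Read the trace back; its id is the key into system_traces.sessions / events.
        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        System.out.println("trace session id: " + trace.getTraceId());

        cluster.close();
    }
}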

Related

How to set up multi-tenancy using row level security on Postgres with knex

I am architecting a database where I expect to have thousands of tenants, with some data shared between tenants. I am currently planning on using Postgres with row level security for tenant isolation. I am also using knex and Objection.js to model the database in node.js.
Most of the tutorials I have seen look like this, where you create a separate knex connection per tenant. However, I've run into a problem on my development machine: after I create ~100 connections, I receive this error: "remaining connection slots are reserved for non-replication superuser connections".
I'm investigating a few possible solutions/work-arounds, but I was wondering if anyone has been able to make this setup work the way I'm intending. Thanks!
Perhaps one solution might be to cache a limited number of connections, and destroy the oldest cached connection when the limit is reached. See this code as an example.
That code should probably be improved, however, to use a Map as the knexCache instead of an object, since a Map remembers the insertion order.
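The original answer and its linked code are about knex in node.js; purely to illustrate the eviction pattern (a bounded, insertion-ordered cache that destroys the oldest entry once the limit is reached), here is a rough sketch in Java using LinkedHashMap, whose removeEldestEntry hook does exactly that. The class name and limit are made up; in the knex case the value would be a knex instance and the cleanup call would be knex.destroy().

import java.util.LinkedHashMap;
import java.util.Map;

// Bounded cache keyed by tenant id; evicts (and should clean up) the oldest entry.
public class TenantConnectionCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    public TenantConnectionCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // LinkedHashMap keeps insertion order, so 'eldest' is the least recently added entry.
        // Destroy/close the evicted connection here before letting the map drop it.
        return size() > maxEntries;
    }
}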

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, taking a couple of hours and requires dozens of scripts, so I don't know if that influences how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS instance (RDS). I have automated backups enabled (these are different to RDS snapshots which are user initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to see the difference between RDS snapshots. But in the past we tested several solutions for a similar problem, so maybe you can take some inspiration from them.
The obvious solution is of course an auditing system. That lets you see in a relatively simple way what was changed, down to individual column values depending on the granularity of your auditing. Of course there is an impact on your application from the audit triggers and the queries against the audit tables.
Another possibility: for tables with primary keys you can store the primary key together with the xmin and ctid hidden system columns (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update and compare them with the values after the update. This way you can identify only which rows were changed / inserted / deleted, not which columns changed.
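A rough sketch of that idea with plain JDBC (the table my_table, its primary key id, and the connection details are made up; it needs the PostgreSQL JDBC driver on the classpath):

import java.sql.*;

public class RowChangeSnapshot {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement st = conn.createStatement()) {

            // Before the nightly update: remember the primary key plus the xmin/ctid
            // system columns for every row.
            st.execute("DROP TABLE IF EXISTS my_table_snapshot");
            st.execute("CREATE TABLE my_table_snapshot AS "
                     + "SELECT id, xmin::text AS xmin_before, ctid::text AS ctid_before "
                     + "FROM my_table");

            // ... the nightly update runs here ...

            // After the update: rows whose xmin or ctid changed were modified.
            // (A full outer join on id would additionally reveal inserts and deletes.)
            ResultSet rs = st.executeQuery(
                "SELECT t.id FROM my_table t "
              + "JOIN my_table_snapshot s ON s.id = t.id "
              + "WHERE t.xmin::text <> s.xmin_before OR t.ctid::text <> s.ctid_before");
            while (rs.next()) {
                System.out.println("changed row id: " + rs.getLong("id"));
            }
        }
    }
}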
You can also set up a streaming replica with replication slots (and, to be on the safe side, WAL archiving as well). Then stop replication on the replica before the updates and compare the data afterwards using dblink selects. But these queries can be very heavy.

How to write integration tests depending on Druid?

I am coding an application that generates reports from a Druid database. My integration tests need to read data from that database.
My current approach involves creating synthetic data for each of the tests. However, I am unable to remove the created data from the database (be it by removing entries or by completely dropping the schema). I tried this, but I am still getting data back after disabling the segment and firing the kill task.
I think that either I am completely wrong with my approach or there is a way to delete information from the database that I haven't been able to find.
You can do this with the two approaches below.
Approach 1:
Disable the segment (used=0)
Fire a kill task for that segment (see the sketch after this answer)
Have the load and drop rules in place
Refer: http://druid.io/docs/latest/ingestion/tasks.html (look for destroying segments)
Approach 2 (prefer this for integration tests before setting up production):
Stop the coordinator node and delete all entries in the druid_segments table in the metadata store
Stop the historical node and delete everything inside the directory pointed to by druid.segmentCache.locations on the historical node
Start the coordinator and historical nodes again
Remember that this will delete everything from the Druid cluster.
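For Approach 1, the kill task is submitted to the overlord over HTTP. A rough sketch (overlord address, datasource and interval are made up, and the segments in that interval must already have been marked unused for the kill task to remove them):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SubmitKillTask {
    public static void main(String[] args) throws Exception {
        // Overlord default port is 8090; adjust host/port, dataSource and interval.
        URL url = new URL("http://localhost:8090/druid/indexer/v1/task");
        String killTask = "{\"type\":\"kill\",\"dataSource\":\"test_data\","
                        + "\"interval\":\"2018-01-01/2018-02-01\"}";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(killTask.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("overlord responded with HTTP " + conn.getResponseCode());
    }
}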
In the end I worked around the issue by inserting data in Druid with ids specific to each unit test and querying for that.
Not very elegant since now one malicious test can (potentially) mess with the results of another test.

Quartz JDBC Job Store - Maintenance/Cleanup

I am currently in the process of setting up Quartz in a load-balanced environment using the JDBC job store, and I am wondering how everyone manages the Quartz job store DB.
For me, Quartz (2.2.0) will be deployed as part of a versioned application, with multiple versions potentially existing on the same server at the same time. I am using the notation XXScheduler_v1 to ensure multiple schedulers play nicely together. My code is working fine, with the Quartz tables being populated with the triggers/jobs/etc. as appropriate.
One thing I have noticed though is that there seems to be no database cleanup that occurs when the application is undeployed. What I mean is that the Job/Scheduler data seems to stay in the quartz database even though there is no longer a scheduler active.
This is less than ideal and I can imagine with my model the database would get larger than it needed to be with time. Am I missing how to hook-up some clean-up processes? Or does quartz expect us to do the db cleanup manually?
Cheers!
I ran into this issue once, and here is what I did to rectify it. It should work, but in case it does not, you will have a backup of the tables, so you have nothing to lose by trying.
Take an SQL dump of the following tables using the method mentioned at: Taking backup of single table
a) QRTZ_CRON_TRIGGERS
b) QRTZ_SIMPLE_TRIGGERS
c) QRTZ_TRIGGERS
d) QRTZ_JOB_DETAILS
Delete the data from the above tables in this sequence:
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_DETAILS;
Restart your app, which will then freshly insert all of the deleted tasks and related entries into the above tables (provided your app has its scheduling logic right).
This is like starting your app with all of the tasks being scheduled for the first time, so keep in mind that the tasks will behave as if they were freshly inserted.
NOTE: If this does not work, restore the backup you took of the tables and try to debug more closely. So far, I have not seen this method fail.
Quartz definitely does not do any DB cleanup when you undeploy the application or shut down the scheduler. You would have to build some cleanup code that runs during application shutdown (i.e. some sort of StartupServlet or context listener that does the cleanup in its destroy()/contextDestroyed() lifecycle event).
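A rough sketch of such a listener, assuming the versioned name is also used as the job group (adjust to however your jobs are actually keyed); it deletes only this application's jobs rather than calling scheduler.clear(), which would wipe every scheduler sharing the job store:

import java.util.ArrayList;

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;
import org.quartz.impl.matchers.GroupMatcher;

public class QuartzCleanupListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // nothing to do at startup for this sketch
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        try {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            // Delete only the jobs registered under this application's group so other
            // schedulers sharing the same job store tables are left untouched.
            ArrayList<JobKey> keys = new ArrayList<>(
                    scheduler.getJobKeys(GroupMatcher.jobGroupEquals("XXScheduler_v1")));
            scheduler.deleteJobs(keys);
            scheduler.shutdown(true);
        } catch (Exception e) {
            // log and swallow: cleanup failures should not block undeployment
            e.printStackTrace();
        }
    }
}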
You're not missing anything.
However, these Quartz tables aren't different from any application DB objects you use in your data model. You add an Employees table and in a later version you don't need it anymore. Who's responsible for deleting the old table? Only you. If you have a DBA you might roll it onto the DBA ;).
This kind of maintenance would typically be done using an uninstall script / wizard, upgrade script / wizard, or during the first startup of the application in its new version.
On a side note, typically different applications use different databases, or different schemas for the least, thus reducing inter-dependencies.
To clean the Quartz Scheduler's internal data you need a bit more SQL:
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_DETAILS;
delete from QRTZ_FIRED_TRIGGERS;
delete from QRTZ_LOCKS;
delete from QRTZ_SCHEDULER_STATE;

What happens to my DataSet in case of unexpected failure

I know this has been asked here, but my question is slightly different. Since the DataSet was designed with the disconnected principle in mind, what feature was provided to handle unexpected termination of the application, say a power failure, a Windows hang, or a system exception leading to a restart? Say the user has entered some 100 rows and they have been modified in the DataSet alone; usually the database is only updated when the application closes or at timed intervals.
In the old days, when programming with VB 6.0, all interaction took place directly with the database, so each successful transaction committed itself automatically. How can that be done using DataSets?
DataSets are not for direct access to the database; they are a disconnected model only. There was never any intent that they be able to recover from machine failures.
If you want to work live against the database you need to use DataReaders and issue DbCommands against the database live for changes. This of course will increase your load on the database server though.
You have to balance the two for most applications. If you know a user just entered vital data as a new row, execute an insert command to the database, and put a copy in your local cached DataSet. Then your local queries can run against the disconnected data, and inserts are stored immediately.
A DataSet can be serialized very easily, so you could implement your own regular backup to disk by using serialization of the DataSet to the filesystem. This will give you some protection, but you will have to write your own code to check for any data that your application may have saved to disk previously and so on...
You could also ignore DataSets and use SqlDataReaders and SqlCommands for the same sort of 'direct access to the database' you are describing.