MongoDB Backups: Expire Data from Collections by Setting TTL

I have read MongoDB's official guide on Expire Data from Collections by Setting TTL. I have set everything up and everything is running like clockwork.
One of the reasons I enabled the TTL is that one of the product's requirements is to auto-delete documents in a specific collection. The TTL handles that quite well. However, I have no idea whether the data expiration will also carry over to the MongoDB backups. The data is also supposed to be automatically deleted from the backups: in case a backup gets leaked or restored, the expired data shouldn't be in it.
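For reference, the setup the guide describes boils down to a single TTL index. A minimal sketch with pymongo; the database, collection, field name, and the 3600-second lifetime are all illustrative:

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["sessions"]  # illustrative names

# Documents expire roughly 3600 seconds after the value in "createdAt".
# The TTL monitor runs about once a minute, so deletion is not instant.
coll.create_index("createdAt", expireAfterSeconds=3600)

coll.insert_one({"user": "alice", "createdAt": datetime.now(timezone.utc)})
```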

A backup contains the data that was present in the database at the time the backup was taken.
Once a backup is made, it is just a bunch of data that sits somewhere without being touched. Documents that have been deleted since the backup was taken are still in it (arguably this is the point of a backup to begin with).
If you want to expire data from backups, the usual solution is to delete backups older than a certain age.
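A minimal sketch of such a retention policy, assuming the backups are archive files in a local directory (the path, file pattern, and 30-day window are made up):

```python
import time
from pathlib import Path

BACKUP_DIR = Path("/var/backups/mongodb")  # illustrative location
MAX_AGE = 30 * 24 * 3600                   # keep 30 days of backups

cutoff = time.time() - MAX_AGE
for dump in BACKUP_DIR.glob("*.archive"):
    # For write-once backup files, the modification time is a reasonable
    # proxy for when the backup was taken.
    if dump.stat().st_mtime < cutoff:
        dump.unlink()
```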

As mentioned by #D.SM, data is not deleted from backups. One solution could be to encrypt your data, e.g. with Client-Side Field Level Encryption.
Use a new encryption key for your data every day. When the data is due to expire, drop the corresponding encryption key from your key storage. With this, the data becomes unusable even if somebody restores it from an old backup.
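A minimal sketch of that key-per-day idea (sometimes called crypto-shredding). This uses the `cryptography` package for illustration rather than Client-Side Field Level Encryption, and the in-memory dict stands in for a real key store or KMS:

```python
from datetime import date

from cryptography.fernet import Fernet

keys = {}  # stand-in for a real key store / KMS

def key_for(day: date) -> Fernet:
    # One symmetric key per calendar day.
    keys.setdefault(day.isoformat(), Fernet.generate_key())
    return Fernet(keys[day.isoformat()])

# Encrypt a sensitive field before storing the document.
token = key_for(date.today()).encrypt(b"sensitive value")

# When that day's data should expire, drop the key. Every copy of the
# ciphertext, including copies inside old backups, becomes unreadable.
del keys[date.today().isoformat()]
```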

Related

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, taking a couple of hours and requires dozens of scripts, so I don't know if that influences how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS instance (RDS). I have automated backups enabled (these are different to RDS snapshots which are user initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to diff RDS snapshots. But in the past we tested several solutions to a similar problem; maybe you can take some inspiration from them.
The obvious solution is, of course, an auditing system. That way you can see in a relatively simple way what was changed, down to column values depending on the granularity of your auditing. Of course, there is an impact on your application from the audit triggers and the queries into the audit tables.
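As an illustration, a generic row-level audit trigger might look roughly like this; everything here (table names, the jsonb audit format, the psycopg2 connection string) is made up, and `EXECUTE PROCEDURE` is the spelling that also works on older PostgreSQL versions (`EXECUTE FUNCTION` on 11+):

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS audit_log (
    id         bigserial PRIMARY KEY,
    table_name text        NOT NULL,
    action     text        NOT NULL,
    old_row    jsonb,
    new_row    jsonb,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION audit_trigger_fn() RETURNS trigger AS $body$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, action, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, action, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    ELSE  -- DELETE
        INSERT INTO audit_log (table_name, action, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$body$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_audit
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE audit_trigger_fn();
"""

with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```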
Another possibility: for tables with primary keys, you can store the primary key values together with the hidden system columns 'xmin' and 'ctid' (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update and compare them with the values afterwards. This way you can identify changed / inserted / deleted rows, but not which columns changed.
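A sketch of that approach with psycopg2; the table, its `id` primary key, and the connection string are illustrative:

```python
import psycopg2

QUERY = "SELECT id, xmin::text, ctid::text FROM my_table"

def snapshot(cur):
    cur.execute(QUERY)
    return {pk: rest for pk, *rest in cur.fetchall()}

conn = psycopg2.connect("dbname=mydb")
conn.autocommit = True  # each snapshot sees the latest committed data
cur = conn.cursor()

before = snapshot(cur)
input("Run the nightly update, then press Enter...")  # stand-in for the update
after = snapshot(cur)

inserted = after.keys() - before.keys()
deleted = before.keys() - after.keys()
# xmin/ctid differ for rows rewritten by an UPDATE; as noted above, this
# does not tell you which columns changed.
updated = {pk for pk in before.keys() & after.keys() if before[pk] != after[pk]}
```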
You can set up a streaming replica with replication slots (and, to be on the safe side, WAL archiving as well). Then stop replication on the replica before the updates and compare the data afterwards using dblink selects. But these queries can be very heavy.
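For the comparison itself, a query along these lines (run on the paused replica, which still holds the pre-update state) pulls the primary's rows through dblink and diffs them with set operations; the connection strings, table, and columns are illustrative, and the dblink extension must be installed:

```python
import psycopg2

# Rows as they were before the update that are no longer present on the
# primary, i.e. rows that were changed or deleted. EXCEPT compares whole rows.
SQL = """
SELECT id, name, updated_at FROM my_table
EXCEPT
SELECT * FROM dblink('host=primary dbname=mydb',
                     'SELECT id, name, updated_at FROM my_table')
         AS remote(id bigint, name text, updated_at timestamptz)
"""

with psycopg2.connect("dbname=mydb host=replica") as conn, conn.cursor() as cur:
    cur.execute(SQL)
    changed_or_deleted = cur.fetchall()
```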

Application event logging for statistics

I have an app in production and working. It is hosted on Heroku and uses Postgres 9.3 in the cloud. There are two databases: a master and a read-only follower. There are tables like Users, Likes, Followings, Subscriptions and so on. We need to store a complete log of events like userCreated, userDeleted, userLikedSomething, userUnlikedSomething, userFollowedSomeone, userUnfollowedSomeone and so on. Later on we have to prepare statistical reports/charts about current and historical data. The main problem is that when a user is deleted, the row is simply removed from the db, so we can't report on users that were deleted because they are no longer stored anywhere. The same applies to likes/unlikes, follows/unfollows and so on. There are a few things I don't know how to handle properly:
If we store events in the same database with foreign keys to the user profiles, then the historical data will change, because each event will be "linked" to the current user profile, which changes over time.
If we store events in a separate Postgres database (just for logs, to offload the main database), then joining the events with the actual user profiles would require cross-db joins (dblink), which I guess might be slow (I have never used this feature before). Anyway, this won't solve the problem from point 1.
I thought about using a different type of database for storing the logs, maybe MongoDB; as I remember, MongoDB is more "write-heavy" than Postgres (which is more "read-heavy"?), so it might be more suitable for storing logs/events. However, I would then have to store user profiles in two databases (and even a profile snapshot per event, to solve point 1).
I know this is a very general question, but maybe there is some kind of standard approach or a special database type for storing such data?
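One common way to address point 1, along the lines of the snapshot-per-event idea the question already mentions, is an append-only events table with no foreign key to the profile: the interesting parts of the profile are copied into the event itself, so later profile changes or deletions cannot rewrite history. A minimal sketch with psycopg2 (all names are illustrative; `json` rather than `jsonb` since the question mentions Postgres 9.3):

```python
import json

import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS events (
    id         bigserial   PRIMARY KEY,
    event_type text        NOT NULL,  -- e.g. 'userDeleted'
    user_id    bigint      NOT NULL,  -- deliberately no FK: the user row may go away
    profile    json        NOT NULL,  -- snapshot of the profile at event time
    created_at timestamptz NOT NULL DEFAULT now()
);
"""

def log_event(cur, event_type, user_id, profile_snapshot):
    # The snapshot travels with the event, so deleting the users row later
    # does not lose the data the historical reports need.
    cur.execute(
        "INSERT INTO events (event_type, user_id, profile) VALUES (%s, %s, %s)",
        (event_type, user_id, json.dumps(profile_snapshot)),
    )

with psycopg2.connect("dbname=mydb") as conn, conn.cursor() as cur:
    cur.execute(DDL)
    log_event(cur, "userDeleted", 42, {"name": "alice", "followers": 10})
```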

Restore mongodb data if collection or database was dropped

I need to know: is there any possibility of restoring the data in a collection or database if it was dropped?
The OS, by default (or in the case of Windows: in any case), will not let you restore deleted data; you would need a third-party tool that can read the raw disk sectors. It is also worth noting that while a database drop deletes the underlying files, a collection drop does not; the collection's data is nulled out instead.
Dropping a collection should therefore make it near impossible to retrieve the data, since the disk space it used has been overwritten (essentially a single zero pass).
So the files may be recoverable after a database drop, but even that is questionable.

Memcache lifetime

I read an article about using memcache to cache, with a certain lifetime, the passwords stored in a MySQL database. But when a user's password is updated in the database, memcache keeps serving the old data; only after the lifetime runs out is the data fetched from the database again, and only then do you get the newest value.
Is there any other way to get the newest updated data immediately?
Usually the framework you use in your app lets you set up rules, such as:
How long to keep data in memcache / when to flush a record from cache
How / in what circumstances to go to the db
One approach, of course, is to have the routine that updates the password in the db also expire the relevant record in memcache.
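A minimal sketch of that approach, assuming pymemcache and a DB-API MySQL connection (the key scheme, table, and one-hour TTL are made up):

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def get_password(db, user_id):
    key = f"user:{user_id}:password"
    value = cache.get(key)
    if value is None:  # cache miss: fall through to MySQL
        with db.cursor() as cur:
            cur.execute("SELECT password FROM users WHERE id = %s", (user_id,))
            value = cur.fetchone()[0]
        cache.set(key, value, expire=3600)  # re-cache with a one-hour lifetime
    return value

def update_password(db, user_id, new_hash):
    with db.cursor() as cur:
        cur.execute("UPDATE users SET password = %s WHERE id = %s",
                    (new_hash, user_id))
    db.commit()
    # Expire the cached copy immediately instead of waiting for the TTL,
    # so the next get_password() call fetches the fresh value.
    cache.delete(f"user:{user_id}:password")
```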

Memcache Delete Also Deletes Database?

I'm working on client-server software that uses memcached.
If I want to delete content from my database that is held within memcached, which of the following is usually required to achieve this objective?
A - delete from database AND delete from memcached
B - delete from memcached (which will automatically delete from the database)
Thanks
Option A is what you would want.
Memcache and your database are completely separate, and it is up to you to make them reflect one another.
For example, if you insert into your DB you must also insert into memcache, and if you delete from your DB you must also delete from memcache.
In most of today's frameworks this is abstracted away. However, if you are doing it manually, you must do both to keep the data consistent.
Edit: by delete I mean invalidate
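In code, option A is two explicit steps; a minimal sketch with pymemcache and a DB-API connection (names are made up):

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def delete_content(db, content_id):
    # Step 1: the database is the source of truth, so delete there first.
    with db.cursor() as cur:
        cur.execute("DELETE FROM content WHERE id = %s", (content_id,))
    db.commit()
    # Step 2: invalidate the cache entry separately; memcached never
    # propagates anything back to the database on its own.
    cache.delete(f"content:{content_id}")
```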