I'm working on client-server software that uses memcached.
If I want to delete content from my database that is held within memcached, which of the following is usually required to achieve this objective?
A - delete from database AND delete from memcached
B - delete from memcached (which will automatically delete from the database)
Thanks
Option A is what you would want.
Memcache and your Database are completely separate and it is up to you to make them both reflect one another.
For example, if you insert into your DB you must also insert into memcache. If you delete from your DB you must also delete from memcache.
In most of today's frameworks this is abstracted out. However, if you are doing it manually then you must do both for consistent data.
Edit: by delete I mean invalidate
Related
I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated test, we restore the database back to "clean" state at the start of every test. We do this by comparing the "diff" between the working copy of the database with the clean copy (table by table). Then copying over any records that have changed. Or deleting any records that have been added. So far this strategy seems to be the best way to go about for us because per test, not a lot of data is changed, and the size of the database is not very big.
Now I'm looking for a way to essentially do the same thing but with PostgreSQL. I'm considering doing the exact same thing with PostgreSQL. But before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.
On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) are particularly not interested in re-inventing the wheel in terms of checking for diffs via your own code, you should try creating a new DB (per run) via templates feature of createdb command (or CREATE DATABASE statement) in PostgreSQL.
So for e.g.
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snaptshotDB;
Pros:
In theory, always exact same DB by design (no custom logic)
Replication is a file-transfer (not DB restore). So far less time taken (i.e. doesn't run SQL again, doesn't recreate indexes / restore tables etc.)
Cons:
Takes 2x the disk space (although template could be on a low performance NFS etc)
For my specific situation. I decided to go back to the original solution. Which is to compare the "working" copy of the database with "clean" copy of the database.
There are 3 types of changes.
For INSERT records - find max(id) from clean table and delete any record on working table that has higher ID
For UPDATE or DELETE records - find all records in clean table EXCEPT records found in working table. Then UPSERT those records into working table.
We have a requirement that says we should have a copy of all the items that were in our system at one point. The most simple way to explain it would be replication but ignoring the delete statement (INSERT and UPDATE are ok)
Is this possible ? or maybe the better question would be what is the best approach to tackle this kind of problem?
Make a copy/replica of current database and use triggers via dblink from current database to the replica. Use after insert and after update trigger to insert and update data in replica.
So whenever a row insertion/updation take place in current database it will directly reflect to replica.
I'm not sure that I understand the question completely, but I'll try to help:
First (opposite to #Sunit) - I suggest avoiding triggers. Triggers are introducing additional overhead and impacting performance.
The solution I would use (and I'm actually using in few of my projects with similar demands) - don't use DELETE at all. Instead you can add bit (boolean) column called "Deleted", set its default value to 0 (false), and instead of deleting the row you update this field to 1 (true). You'll also need to change your other queries (SELECT) to include something like "WHERE Deleted = 0".
Another option is to continue using DELETE as usual, and to allow deleting records from both primary and replica, but configure WAL archiving, and store WAL archives in some shared directory. This will allow you to moment-in-time recovery, meaning that you'll be able to restore another PostgreSQL instance to state of your cluster in any moment in time (i.e. before the deletion). This way you'll have a trace of deleted records, but pretty complicated procedure to reach the records. Depending on how often deleted records will be checked in the future (maybe they are not checked at all, but simply kept for just-in-case tracking) this approach my also help.
I have a web application using EntityFramework and an Azure SQL Database. I would like to know if deleting a row in the database removes the information permanently or simply marks is as deleted but it still be accessed if needed?
db.MyTable.Remove(objectInstance);
db.SaveChanges();
Is this someting that can be configured or do I need to implement this feature myself adding a deleted attribute?
The reason I want this is to be able to perform analytics including objects that might have been already deleted
EF has nothing to do with this actually. Whether records are deleted permanently or not is actually up to the RDBMS. EF is an ORM for the RDBMS.
Options IMO:
You manage the records marked as deleted using an extra column
You can move the deleted records to another table or file whichever is convenient for you to run analytics on. That way your queries will have to touch less number of records and be faster.
You can go through the log files and execute the INSERTs again to get the deleted records.
Hope my suggestions help you in right direction.
I am SysAdmin for a couple of large online shops and I'm researching Memcached as a possible caching solution.
The most accessed queries are the ones which make up the dynamic product pages, so it would make sense to cache these. Staff regularly use an update program to update the tables with new prices. As I understand, if I used Memcached the changes would only be apparent after the cache expires and not after my program has updated.
In the docs, I can see "Memcache::flush" which flushes ALL existing items, but is there a way to flush an individual object?
You can see in docs that there is delete command that removes one item. Also there is a set to add or replace one item.
The most important part is to have a solid naming scheme on your keys. Presumably you have a cms type page to update/insert rows in your database (mysql?). Just ensure that you delete the memcache record whenever you do an update in mysql and you'll be fine.
I'd like to create a read-only snapshot of a database at the end of each day, and keep them around for a couple of months.
I'd then like to be able to run queries against a specific (named) snapshot.
Is this possible to achieve elegantly and with minimal resource usage (the database only changes very slowly, but has a few GBs of data - so almost all data is common to all snapshots).
The usual way to create a snapshot in PostgreSQL is to use pg_dump/pg_restore.
A much quicker method is to simply use CREATE DATABASE to clone your database.
CREATE DATABASE my_copy_db TEMPLATE my_production_db;
which will be much faster than a dump/restore. The only drawback to this solution is that the source database must not have any open connections.
The copy will not be read-only by default, but you could simply revoke the respective privileges from the users to ensure that