Persisting cached data after removal/eviction using Memcache API compatible store - nosql

This question specifically pertains to Couchbase, but I believe it would apply to anything with the memcached API.
Let's say I am creating a client/server chat application, and on my server I am storing chat session information for each user in a data bucket. After the chat session is over, I will remove the session object from the data bucket, but at the same time I also want to persist it to a permanent NoSQL datastore for reporting and analytics purposes. I also want session objects to be persisted upon cache eviction, when sessions time out, etc.
Is there some sort of "best practice" (or even a feature of Couchbase that I am missing) that enables me to do this efficiently while maintaining the best possible performance of my in-memory caching system?

Using Couchbase Server 2.0, you could set up two buckets (or two separate clusters if you want to separate physical resources). On the session cluster, you'd store JSON documents (the value in the key/value pair), perhaps like the following:
{
    "sessionId" : "some-guid",
    "users" : [ "user1", "user2" ],
    "chatData" : [ "message1", "message2" ],
    "isActive" : true,
    "timestamp" : [2012, 8, 6, 11, 57, 0]
}
You could then write a Map/Reduce view in the session database that gives you a list of all expired items. (Note: the example below with the meta argument requires a recent build of Couchbase Server 2.0 - not the DP4.)
function(doc, meta) {
    if (doc.sessionId && !doc.isActive) {
        emit(meta.id, null);
    }
}
Then, using whichever Couchbase client library you prefer, you could have a task to query the view, get the items and move them into the analytics cluster (or bucket). So in C# this would look something like:
var view = sessionClient.GetView("sessions", "all_inactive");
foreach (var item in view)
{
    var doc = sessionClient.Get(item.ItemId);
    analyticsClient.Store(StoreMode.Add, item.ItemId, doc);
    sessionClient.Remove(item.ItemId);
}
If, instead, you wanted to use an explicit timestamp or expiry, your view could index based on the timestamp:
function(doc) {
    if (doc.sessionId && !doc.isActive) {
        emit(doc.timestamp, null);
    }
}
Your task could then query the view by including a startkey to return all documents that have not been touched in x days.
var view = sessionClient.GetView("sessions", "all_inactive").StartKey(new int[] { DateTime.Now.Year, DateTime.Now.Month, DateTime.Now.Day - 1 });
foreach (var item in view)
{
    var doc = sessionClient.Get(item.ItemId);
    analyticsClient.Store(StoreMode.Add, item.ItemId, doc);
    sessionClient.Remove(item.ItemId);
}
Check out http://www.couchbase.com/couchbase-server/next for more info on Couchbase Server 2.0, and if you need any clarification on this approach, just let me know on this thread.
-- John

CouchDB storage is (eventually) persistent and has no built-in expiry mechanism, so whatever you store in it will remain stored until you remove it - it's not like Memcached, where you can set a timeout for stored data.
So if you are storing sessions in CouchDB, you will have to remove them on your own when they expire. And since that is not an automated mechanism but something you do yourself, there is no reason you can't save the data wherever you want at the same time.
TBH I see no advantage of using a persistent NoSQL store over SQL for session storage (and vice versa) - the performance of both will be IO bound. A memory-only key store or a hybrid solution is a whole different story.
As for your problem: move the data in your app's session expiry/session close mechanism, and/or run a cron job that periodically checks the session storage for expired sessions and moves the data.
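For illustration, a minimal sketch of such a periodic mover. The sessionStore and archiveStore clients here are placeholders (not a real library); assume they expose query, insert, and remove:

// Hypothetical sketch: periodically archive expired sessions.
// sessionStore/archiveStore are placeholder clients, not a real API.
var ONE_DAY_MS = 24 * 60 * 60 * 1000;

function archiveExpiredSessions() {
  var cutoff = Date.now() - ONE_DAY_MS;
  // Find sessions not touched since the cutoff (however your store queries).
  sessionStore.query({ lastTouched: { $lt: cutoff } }, function (err, sessions) {
    if (err) return;
    sessions.forEach(function (session) {
      // Write to the permanent store first, then remove from the cache,
      // so a crash in between leaves a duplicate rather than a loss.
      archiveStore.insert(session, function (err) {
        if (!err) sessionStore.remove(session.id);
      });
    });
  });
}

// Run every 5 minutes (or schedule via cron and run once per invocation).
setInterval(archiveExpiredSessions, 5 * 60 * 1000);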

Related

Meteor - using synchronised non-persistent / in-memory MongoDB on the server

In a Meteor app, having real-time reactive updates between all connected clients is achieved by writing to collections and publishing and subscribing to the right data. In the normal case this also means database writes.
But what if I would like to sync particular data which does not need to be persistent, and I would like to save the overhead of writing to the database? Is it possible to use mini-mongo or other in-memory caching on the server while still preserving DDP synchronisation to all clients?
Example
In my app I have multiple collapsed threads and I want to show which users currently have a particular thread expanded:
Viewed by: Mike, Johny, Steven ...
I can store the information in the threads collection or make a separate viewers collection and publish the information to the clients. But there is actually no point in making this information persistent and having the overhead of database writes.
I am confused by the collections documentation, which states:
OPTIONS
connection Object
The server connection that will manage this collection. Uses the default connection if not specified. Pass the return value of calling DDP.connect to specify a different server. Pass null to specify no connection.
and
... when you pass a name, here’s what happens:
...
On the client (and on the server if you specify a connection), a Minimongo instance is created.
But if I create a new collection and pass the options object with connection: null
// Creates a new Mongo collection and exports it
export const Presentations = new Mongo.Collection('presentations', {connection: null});

/**
 * Publications
 */
if (Meteor.isServer) {
  // This code only runs on the server
  Meteor.publish(PRESENTATION_BY_MAP_ID, (mapId) => {
    check(mapId, nonEmptyString);
    return Presentations.find({ matchingMapId: mapId });
  });
}
no data is being published to the clients.
TLDR: it's not possible.
There is no magic in Meteor that allows data to be synced between clients without that data passing through the MongoDB database. The whole sync process through publications and subscriptions is triggered by MongoDB writes. Hence, if you don't write to the database, you cannot sync data between clients (using the native pub/sub system available in Meteor).
After countless hours of trying everything possible, I found a way to do what I wanted:
export const Presentations = new Mongo.Collection('presentations', Meteor.isServer ? {connection: null} : {});
I checked MongoDB and no presentations collection is being created. Also, on every server restart the collection is empty. There is a small downside on the client: even when collectionHandle.ready() is truthy, findOne() first returns undefined and the data is synced afterwards.
I don't know if this is the right/preferable way, but it was the only one working for me so far. I tried to leave {connection: null} in the client code, but wasn't able to achieve any sync even though I implemented the added/changed/removed methods.
Sadly, I wasn't able to get any further help, even in the Meteor forum here and here.

Change MongoDB Collection from local to server-side on running Meteor App

According to the Meteor Docs, there are 'server-side', 'client-side' and 'local' collections. Is there a way to change the 'status' (i.e. whether a collection is server-side, client-side or local) in a running app?
Use case: a web application where users can register and log in. They can store sensitive data. Depending on the user's personal preference, he should be able to choose whether that data is stored locally or on the server (a general decision - not case by case).
Current approach: it works fine if I either instantiate the collection locally, CollectionName = new Mongo.Collection(null);, or server-side, CollectionName = new Mongo.Collection('collectionName');.
But I can't think of an approach that would let the user change the collection's status.
Is there a way to do this?
Or is a workaround needed (e.g. create both a local and a server-side collection, and just decide which one to use for insert/update/find - which would mean a lot of duplicate code)?
Edit: To make things clear: I want the user to be able to choose whether his data is stored in a collection which is synced with the server or in a collection without any syncing.
No, you can't change the type of a collection on a running app.
I think you are confused about what these terms mean. "Client-side" collections aren't permanently stored in localstorage. It just means it's a collection that's in the browser's memory. Just as "server-side" collections are those that reside in the server's memory. The difference is not how it's defined, but where the code runs. Most collections have a client-side and a server-side counterpart, and they are kept synchronized via pub/sub. Server-side collections are also synchronized with MongoDB (using the oplog).
Local collections can live in both places, but "local" means they aren't synchronized with anything.
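To make the distinction concrete, here is a minimal sketch of the two constructor forms (the collection names are just examples):

import { Mongo } from 'meteor/mongo';

// Named collection: the server side is backed by MongoDB and the client
// side is a Minimongo cache kept in sync via pub/sub.
export const Notes = new Mongo.Collection('notes');

// Local collection: pass null as the name. It is an in-memory scratchpad
// on whichever side creates it and is synchronized with nothing.
export const Scratch = new Mongo.Collection(null);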
I probably don't fully understand what you are trying to do, but local collections do not persist data.
If you pass null as the name, then you’re creating a local collection. It’s not synchronized anywhere; it’s just a local scratchpad that supports Mongo-style find, insert, update, and remove operations. (On both the client and the server, this scratchpad is implemented using Minimongo.)
This means any data added to them on the client will be blown away when the user closes their browser (unless you are also using one of the local collection persist meteor packages) and any data added to them on the server will be blown away when the meteor app is restarted. So I don't think you really want to use local collections.
Instead, I would use a regular collection (where a name is passed to the constructor) and either the standard allow or deny options (not really recommended anymore...but still a valid approach) or Meteor methods (the preferred approach) to control who can change data and what data is allowed to change.
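As a rough sketch of the methods approach (the collection, method name, and ownership field are made up for illustration):

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { check } from 'meteor/check';

const UserData = new Mongo.Collection('userData');

Meteor.methods({
  // Clients call this instead of writing to the collection directly,
  // so every change passes through server-side validation.
  'userData.update'(docId, changes) {
    check(docId, String);
    check(changes, Object);
    const doc = UserData.findOne(docId);
    // Only the owner may modify their own document.
    if (!doc || doc.user_id !== this.userId) {
      throw new Meteor.Error('not-authorized');
    }
    UserData.update(docId, { $set: changes });
  }
});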
Or, another option could be to pass your publication function a list of fields that the user wishes to see on the client for that given session. To do this you define a new publication that receives a displayFields argument, which you then use as the field specifier in your collection's .find().
Meteor.publish("userData", function (userId, displayFields) {
// validate the structure and contents of displayFields
// retrieve the data but only use the fields that the user requested
return UserData.find({user_id: userId}, {fields: displayFields});
});
Then on the client side you would subscribe to this and pass in the fields the user wishes to make visible on the client.
var displayFields = {
  firstname: 1,
  lastname: 1
  // ... (a Mongo field specifier cannot mix inclusion and exclusion, except for _id)
};
// The publication expects both arguments: the user id and the field specifier.
this.subscribe("userData", Meteor.userId(), displayFields);

CQRS and Event Sourcing coupled with Relational Database Design

Let me start by saying, I do not have real world experience with CQRS and that is the basis for this question.
Background:
I am building a system where a key new requirement is allowing admins to "playback" user actions (admins want to be able to step through every action that has happened in the system to any particular point). The caveats are: the company already has reports generated off of their current SQL db that they will not change (at least not in parallel with this new requirement), so the store of record will be SQL. I do not have access to SQL's Change Data Capture, and creating a bunch of history tables with triggers would be incredibly difficult to maintain, so I'd like to avoid that if at all possible. Lastly, there are potentially (not currently) a lot of data entry points that go through a versioning lifecycle that will result in changes to the SQL db (adding/removing fields), so if I tried to implement change tracking in SQL, I'd have to maintain the tables that handled the older versions of the data (a nightmare).
Potential Solution
I am thinking about using NoSQL (Azure DocumentDB) to handle data storage (writes) and then have command handlers handle updating the current SQL (Azure SQL) with the relevant data to be queried (reads). That way the audit trail is created and that idea of "playing back" can be handled while also not disturbing the current back end functionality that is provided.
This approach would handle the requirement and satisfy the caveats. I wouldn't use CQRS for the entire app, just for the pieces that need this "playback" functionality. I know that I would have to mitigate failure points along the path Client -> Write to DocumentDB -> Respond to user with success/fail -> Write to SQL on successful DocumentDB write, but my novice CQRS eyes can't see a reason why this isn't a great way to handle this.
Any advice would be greatly appreciated.
This article explains the CQRS pattern and provides an example of a CQRS implementation; please refer to it.
I am thinking about using NoSQL (Azure DocumentDB) to handle data storage (writes) and then have command handlers handle updating the current SQL (Azure SQL) with the relevant data to be queried (reads).
Here is my suggestion: when a user performs a write operation to update a record, we could always insert a new version rather than update in place, so that admins can audit the user's operations afterwards. For example, if a user wants to update a record, we could insert the updated entity with a property that indicates whether the current operation has been audited by admins, instead of directly updating the record.
Original data in the document:
{
  "version1_data": {
    "data": {
      "id": "1",
      "name": "jack",
      "age": 28
    },
    "isaudit": true
  }
}
For updating the age field, we could insert an entity with the updated information instead of updating the original data directly:
{
  "version1_data": {
    "data": {
      "id": "1",
      "name": "jack",
      "age": 28
    },
    "isaudit": true
  },
  "version2_data": {
    "data": {
      "id": "1",
      "name": "jack",
      "age": 29
    },
    "isaudit": false
  }
}
The admin could then check the current document to audit the user's operations and determine whether the updates should be written to the SQL database.
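A minimal sketch of how a new version might be appended before it is audited. The helper name and document shape follow the example above; this is plain JavaScript, not a DocumentDB API:

// Hypothetical helper: add a new, not-yet-audited version to the document.
function appendVersion(doc, newData) {
  // Versions are named version1_data, version2_data, ... as above.
  var next = Object.keys(doc).length + 1;
  doc['version' + next + '_data'] = {
    data: newData,
    isaudit: false  // an admin flips this to true once the change is approved
  };
  return doc;
}

// Usage: append "age updated to 29" as version 2.
var doc = { version1_data: { data: { id: '1', name: 'jack', age: 28 }, isaudit: true } };
appendVersion(doc, { id: '1', name: 'jack', age: 29 });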
One potential way to think about this is creating a transaction object that has a unique id and represents the work that needs to be done. The transaction in this case would be writing an object to the document db or writing an object to the sql db. It could contain the in-memory object to be written and the destination db (doc db, sql, etc.) connection parameters.
Once you define your transaction, you would need to adjust your workflow for a proper CQRS. Instead of the client writing to the doc db directly and waiting on the result of that call, let the client create a transaction with a unique id - which could be something like DateTime tick counts or an incremental transaction id, for instance - and then write this transaction to a message queue like Azure Queue or Service Bus. Once the transaction is written to the queue, return success to the user at that point. Create worker roles that read the transaction messages from this queue and process them, writing the objects to the doc db. That is, not overwriting the same entity in the doc db, but just writing the transaction with the unique incremental id to the doc db for that particular entity. You could also use Azure Table Storage for that, afaik.
After successfully writing the doc db transaction, the same worker role could write this transaction to a different message queue, which would be processed by its own set of worker roles that update the entity in the sql db. If anything goes wrong in the interim, keep an error table, record failures in it, and query and retry them later on.
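A rough sketch of the shape of that flow. The queue, docDb, and sqlDb objects are deliberately generic stand-ins for the real Azure SDK clients (none of these names are actual Azure APIs):

// Hypothetical sketch of the two-stage transaction pipeline described above.

// 1. Client: enqueue a transaction and return success immediately.
function submitChange(entityId, payload) {
  var txn = {
    txnId: Date.now(),           // e.g. tick count or incremental id
    entityId: entityId,
    payload: payload
  };
  queue.send('doc-writes', txn); // fire-and-forget from the client's view
  return { accepted: true, txnId: txn.txnId };
}

// 2. Worker role A: append the transaction to the doc db (never overwrite),
//    then hand it off to the second queue for the SQL projection.
function docDbWorker(txn) {
  docDb.insert({ id: txn.entityId + ':' + txn.txnId, body: txn });
  queue.send('sql-writes', txn);
}

// 3. Worker role B: update the current-state SQL row used for reads/reports.
function sqlDbWorker(txn) {
  try {
    sqlDb.upsert(txn.entityId, txn.payload);
  } catch (err) {
    // Record the failure so it can be queried and retried later.
    sqlDb.insertError({ txnId: txn.txnId, error: String(err) });
  }
}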

Add an extra field to collection only on the client side and not affecting server side

Is there a way to add or inject an additional or new field into a collection, and then be able to access that newly inserted field on the client side and display it in a template, without affecting the server side? I don't want to compromise the APIs or the database, since it's simply a count of something, like when a customer has a total of 8 deliveries.
I was doing this code where I'm subscribing to a collection and then trying to update the collection on the client side, but obviously I have no rights to update it:
Meteor.subscribe("all_customers", function () {
Customer.find({}).forEach(function(data) {
var delivery_count = Delivery.find({ customerID: data._id }).count();
Customer.update( { _id: data._id } , { $push: { deliveries: delivery_count } } );
});
}),
And then I tried this one, where I manipulate the documents by inserting a new key-value pair, but nothing is displayed at all:
Meteor.subscribe("all_customers", function () {
Customer.find({}).forEach(function(data) {
var delivery_count = Delivery.find({ customerID: data._id }).count();
data.deliveries = delivery_count;
});
}),
My main objective is basically to be able to sort by the delivery count in the EasySearch package without compromising the APIs or the database, but the package only sorts on existing fields. Or is there a way to do this with the package alone?
Forgive me if this question may sound dumb but I hope someone could help.
It's completely possible; for that you can use the low-level publish API described here: https://docs.meteor.com/api/pubsub.html
You have 2 solutions, one standard, the other hacky:
You follow the documentation and transform your document before sending it to the client. Your server data is not affected and you can enhance the data (even do joins) from your publication. It's cool, but you can't push it too far, as it might collide and conflict with another publication for the same collection on the same page. The solution is well explained here: How to publish a view/transform of a collection in Meteor?
You can use a client-only collection holding custom objects with only the data you need. You still have a server-side observer on the source collection, but the client just receives ready-made objects to display, without any calculation. You need more code, but this way you don't mess with classic server-client collections. It's a bit overkill, so ping me if you want a detailed example; a rough sketch follows below.
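For illustration, a minimal sketch of that second approach, reusing the Customer and Delivery collections from the question (the publication name "customer_stats" and the field names are made up):

// Server: publish into a client-only pseudo-collection "customer_stats",
// using the low-level added/removed API instead of returning a cursor.
Meteor.publish("customer_stats", function () {
  var self = this;
  var handle = Customer.find({}).observeChanges({
    added: function (id) {
      var count = Delivery.find({ customerID: id }).count();
      self.added("customer_stats", id, { deliveries: count });
    },
    removed: function (id) {
      self.removed("customer_stats", id);
    }
    // A complete version would also observe Delivery and call self.changed(...)
    // whenever a customer's delivery count changes.
  });
  self.ready();
  self.onStop(function () { handle.stop(); });
});

// Client: a collection with the same name receives the published documents.
var CustomerStats = new Mongo.Collection("customer_stats");
Meteor.subscribe("customer_stats");
// CustomerStats.find({}, { sort: { deliveries: -1 } }) now sorts by count.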

Atomic get and delete in memcached?

Is there a way to do atomic get-and-delete in memcached?
In other words, I want to get the value for a key if it exists and delete it immediately, so this value can be read once and only once.
I think this pseudocode might work, but note the caveat postscript:
# When setting:
SET key-0 value
SET key-ns 0
# When getting:
ns = INCR key-ns
GET key-{ns - 1}
Constraint: I have millions of keys that could be accessed millions of times, and only a small percentage will have a value set at any given time. I don't want to have to update an atomic counter for every key with every get access request as above.
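For concreteness, the pseudocode above might look like this with the memcached npm client (set/incr/get are real calls in that library; the key-naming scheme is just the one from the pseudocode):

var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

// Store the value under key-0 and reset the per-key counter.
function setOnce(key, value, ttl, callback) {
  memcache.set(key + '-0', value, ttl, function () {
    memcache.set(key + '-ns', 0, ttl, callback);
  });
}

// incr is atomic, so exactly one caller sees ns === 1 and wins key-0.
function getOnce(key, callback) {
  memcache.incr(key + '-ns', 1, function (err, ns) {
    if (err || ns === false) return callback(null); // counter missing
    memcache.get(key + '-' + (ns - 1), callback);    // miss for everyone but the first
  });
}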
The canonical, yet generic, answer to your question is: a lock-free hash table with a relaxed memory model.
The more relaxed your memory model is, the more you get out of a good lock-free design; it's a way to get more performance out of the same chipset.
Here is a talk about that. I don't think it's possible to cover hash tables and lock-free programming in a single post, and I'm not even going to try.
You cannot do this with memcached in a single command, since there is no API that supports exactly what you're asking for. What I would do to get the behavior you're looking for is implement some sort of marking scheme to signify whether another client has read the data. For example, you could create a JSON document as follows:
{
  "data": "value",
  "used": false
}
When you get the item, check whether it has already been used by another client by examining the used field. If it hasn't been used, then set the value using the CAS id you got from the gets command, making sure that the document is updated to reflect the fact that a client has already accessed this key.
If the set operation fails because the cas is invalid then this means that another client has obtained this item and already updated it in memcached to signify that it has been used. In this case you just cancel whatever you were doing with the item and move on.
If the set operation succeeds then this means your client is the sole owner of this data. You can now delete it from memcached and do whatever processing on it you like.
Note that when doing the set I would also add an expiration time of about 5 seconds. This way, if your application crashes, your documents will clean themselves up if you don't finish the entire process of deleting them.
To put some code to the answer from @mikewied, I think the basic gist is... (using Node.js):
var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

var getOnce = function(key, callback) {
  // gets is the check-and-set get (vs regular get)
  memcache.gets(key, function(err, data) {
    if (!data) {
      // Cache miss, nothing to see here.
      callback(null);
    } else {
      var yourData = data[key];
      // Do a check-and-set to remove the data from the cache.
      // This sets the value to null *only* if no one else already did.
      memcache.cas(key, null /* new data */, data.cas, 10, function(err) {
        if (err) {
          // Check-and-set failed! (Here we'll treat it like a cache miss)
          yourData = null;
        }
        callback(yourData);
      });
    }
  });
};
I'm not an expert on Memcached and so I may be wrong. My answer is from reading the documentation and my experience using Memcached.
IMO this is not possible with memcached's current implementation.
To demonstrate why this is not currently possible, here is a simple example of the race condition:
two processes start at the same time
both execute a get/delete at the same time
memcached replies to both get commands at the same time
done (the desired result was for the first get/delete to execute atomically and the second get/delete to fail; instead memcached did get, get, delete, fail-to-delete)
to get an atomic get/delete would require:
a new command for memcached that is atomic; let's call it get_delete
some sort of synchronization lock method across all the memcached clients to ensure both the get and delete commands are executed while the lock is held
so all clients would grab the synchronization lock whenever they need to enter the critical section (i.e. get, delete), then release the lock after the critical section
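One way to approximate such a lock on the client side today is to abuse memcached's add command as a best-effort mutex, since add fails if the key already exists. A rough sketch with the memcached npm client (add/get/del are real calls in that library, but this is still not a true atomic get_delete):

var Memcached = require('memcached');
var memcache = new Memcached('localhost:11211');

function lockedGetDelete(key, callback) {
  var lockKey = key + ':lock';
  // add() only succeeds if the key does not exist yet, so it acts as a
  // lock: the first client wins, everyone else gets an error back.
  memcache.add(lockKey, 1, 5 /* lock auto-expires after 5s */, function (err) {
    if (err) return callback(null); // someone else holds the lock
    memcache.get(key, function (err, value) {
      // Delete the value, then release the lock.
      memcache.del(key, function () {
        memcache.del(lockKey, function () {
          callback(value === undefined ? null : value);
        });
      });
    });
  });
}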