Is there any mechanism to get informed about the last activity performed on a Mnesia database, in particular its timestamp?
My case is not restricted to write activities; I also need to know the last read activity.
Thank you very much.
Mnesia is not a general-purpose database; it's more like a library. So no, there is no such functionality, but you can implement it yourself. Just create a proxy module for mnesia and update the timestamp each time any function from your module gets called. Also note that, since Erlang is concurrent and parallel, different functions might be called simultaneously. So, before updating the timestamp, make sure its value is greater than the one that's already stored.
We would like to be able to read state inside a command use case.
We could get the state from the event store for the specific aggregate, but what about querying aggregates by field (not id), or performing more complicated queries that aren't a good fit for the event store?
The approach we were considering was to use our read model for those cases as well, not only for query use cases.
This might be inconsistent, so a solution could be to have the latest version of the aggregate stored in both write/read models, in order to be able to tell if the state is correct or stale.
Does this make sense, and if it does, when we need to get state by id should we use the event store or the read model?
If you want the absolute latest state of an event-sourced aggregate, you're going to have to read the latest snapshot (assuming that you are snapshotting) and then replay events since that snapshot from the event store. You can be aggressive about snapshotting (conceivably even saving a snapshot after every command), but you're giving away some write performance to make the read faster.
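To make that concrete, here is a minimal sketch of that load path (not from the original answer; snapshotStore, eventStore, and the applyEvent reducer are hypothetical interfaces used only for illustration):

    // Illustration only: load an event-sourced aggregate by replaying the
    // events recorded after the latest snapshot. All names are placeholders.
    async function loadAggregate(snapshotStore, eventStore, aggregateId, applyEvent) {
      const snapshot = await snapshotStore.latest(aggregateId);   // may be null
      let state = snapshot ? snapshot.state : null;
      let version = snapshot ? snapshot.version : 0;

      // Only the events newer than the snapshot need to be replayed.
      const events = await eventStore.eventsSince(aggregateId, version);
      for (const event of events) {
        state = applyEvent(state, event);
        version = event.version;
      }
      return { state, version };
    }

The more aggressively you snapshot, the shorter the replay loop, which is exactly the read/write trade-off described above.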
Updating the read model directly is conceivably possible, though that level of coupling is something that should be considered very carefully. Note also that you will very likely need some sort of two-phase commit to ensure that the read model is only updated when the write model is updated and vice versa. I strongly suggest considering why you're using CQRS/ES in this project, because you are quite possibly undermining that reason by doing this sort of thing.
In general, if you need a query for processing a particular command, it's likely that query will generally be the same, i.e. you don't need free-form query support. In that case, you can often have a read model that's tuned for exactly that query and which only cares about events which could affect that query: often a fairly small subset of the events. The finer-grained the read model, the easier it is to keep in sync (if it ignores 99% of events, for instance, it can't really fall that far behind).
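As a sketch of what such a narrow read model can look like (event names and the "open orders per customer" query are invented for the example, not taken from the question):

    // A projection tuned for exactly one query; every other event type is ignored.
    const relevant = new Set(['OrderPlaced', 'OrderCancelled', 'OrderShipped']);
    const openOrdersByCustomer = new Map();   // the entire read model for this query

    function project(event) {
      if (!relevant.has(event.type)) return;  // the other 99% of events don't matter here
      const count = openOrdersByCustomer.get(event.customerId) || 0;
      const delta = event.type === 'OrderPlaced' ? 1 : -1;
      openOrdersByCustomer.set(event.customerId, count + delta);
    }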
Needing to make complex queries as part of command processing could also be a sign that your aggregate boundaries aren't right and could do with a re-examination.
Does this make sense
Maybe. Let's start with
This might be inconsistent
Yup, they might be. So what?
We typically respond to a query by sending an unlocked copy of the answer. In other words, it's possible that the actual information in the write model will change after this response is dispatched but before the response arrives at its destination. The client will be looking at a copy of the answer taken from the past.
So we might reasonably ask how much better it is to get information no more than one minute old compared to information no more than five minutes old. If the difference in value is pennies, then you should probably deploy the five minute version. If the difference is millions of dollars, then you're in a good position to negotiate a real budget to solve the problem.
For processing a command in our own write model, that kind of inconsistency isn't usually acceptable or wise. But neither of the two common answers requires keeping the read and write models synchronized. The most common answer is to just work with the write model alone. The less common answer is to grab a snapshot out of a cache, and then apply any additional events to it to bring it up to date. The latter approach is "just" a performance optimization (first rule: don't.)
The variation that trips everyone up is trying to process a command somewhere else, enforcing a consistency rule on our data here. Once again, you need a really clear picture of how valuable the consistency is to the business. If it's really important, that may be a signal that the information in question shouldn't be split into two different piles - you may be working with the wrong underlying data model.
Possibly useful references
Pat Helland Data on the Outside Versus Data on the Inside
Udi Dahan Race Conditions Don't Exist
Our domain model deals with sales invoices, each of which has a unique, automatically generated number. When creating an invoice, our SalesInvoiceService retrieves a number from a SalesInvoiceNumberGenerator, creates a SalesInvoice using this number and a few other objects (seller, buyer, issue date, etc.) and stores it through the SalesInvoiceRepository. Since we are using MongoDB as our database, our MongoDbSalesInvoiceNumberGenerator uses a findAndModify command with $inc 1 on a given InvoicePolicies.nextSalesInvoiceNumber to generate this unique number, similar to what we would do with an Oracle sequence.
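Concretely, the generator boils down to something like this (simplified shell syntax; the policy document's _id value is a placeholder):

    // Atomically increment and return the next invoice number
    var policy = db.InvoicePolicies.findAndModify({
        query: { _id: "default" },                      // assumed policy document id
        update: { $inc: { nextSalesInvoiceNumber: 1 } },
        new: true                                       // return the incremented document
    });
    var invoiceNumber = policy.nextSalesInvoiceNumber;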
This works in normal situations. However, when invoice creation fails because of a broken business rule (e.g. an invalid issue date), an exception is thrown and our InvoicePolicies.nextSalesInvoiceNumber has already been incremented. Obviously, since there is no transaction managing this unit of work, this increment is not rolled back, so we end up with lost invoice numbers. We do offer a manual compensation mechanism to the user, but we would like to avoid this sort of situation in the first place.
How would you deal with this situation? And no, switching to another database is not an option :)
Thanks!
TL;DR: What you want is strict serializability, but you probably won't get it unless you give up concurrency completely (then you even get linearizability, theoretically). Gap-free is easy, but making sure that today's invoice doesn't get a lower number than yesterday's is practically impossible.
This is tricky, or at least, very expensive. That is also true for any other data store, because you'll have to limit the concurrency of the application to guarantee it. Think of an auto-increasing stamp that is passed around in an office, but some office workers lose letters. Tricky... But you can reduce the likelihood.
Generating sequences without gaps is hard when contention is high, and very hard in a distributed system. Keeping a lock for the entire time the invoice is generated is usually not an option, though that would be easy. So let's try that:
Easiest way out: Use a singleton background worker, i.e. a single-threaded process that runs on a single machine. Have it explicitly check whether the current number is really present in the invoice collection. Because it's single-threaded on a single machine, it can't have race conditions. Done, via limiting concurrency.
When allowing concurrency, things get messy:
It might be best to use something like a two-phase commit protocol. Essentially, make the entire invoice creation process a long-running transaction, and store the pending transactions explicitly, i.e. store all numbers that haven't been used yet, but reserved.
Then track the completion status of each and every transaction. If a transaction hasn't finished after some timeout, consider that number available again. It's hard enough to add that to the counter code, but it's possible (check if a timed out transaction is present, otherwise get a new counter value).
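A rough, untested sketch of that reservation idea (collection and field names are invented, and it assumes a recent Node.js driver where findOneAndUpdate resolves to the matched document or null; older versions wrap it in { value }):

    // Numbers are handed out as "pending"; a pending reservation older than
    // timeoutMs is treated as abandoned and handed out again.
    async function reserveInvoiceNumber(db, timeoutMs) {
      const reservations = db.collection('invoiceNumberReservations');

      // 1. Try to take over an abandoned (timed-out) reservation first.
      const stale = await reservations.findOneAndUpdate(
        { status: 'pending', reservedAt: { $lt: new Date(Date.now() - timeoutMs) } },
        { $set: { reservedAt: new Date() } }
      );
      if (stale) return stale.number;

      // 2. Otherwise draw a fresh number and record it as pending.
      const counter = await db.collection('counters').findOneAndUpdate(
        { _id: 'salesInvoiceNumber' },
        { $inc: { seq: 1 } },
        { returnDocument: 'after' }            // assumes the counter document is seeded
      );
      await reservations.insertOne({
        number: counter.seq,
        status: 'pending',
        reservedAt: new Date(),
      });
      return counter.seq;
    }

    // After the invoice is stored successfully, the caller marks the reservation used:
    //   reservations.updateOne({ number: n }, { $set: { status: 'used' } })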
There are several possible errors, but they can all be resolved. This is better explained in the link and on the net. Generally, getting the implementation right is hard though.
The timeout poses a problem, however, because you need to hard-code an assumption about the time it takes for invoices to be generated. That can be awkward close to day/month/year barriers, since you'll want to avoid creating invoice 12345 in 2015 and 12344 in 2014.
Even this won't guarantee gap-free numbers within limited time intervals: if no further request comes in that could reuse the gap's number before the current year ends, you're facing a problem.
I wonder if something like findAndModify combined with the new Transactions API could be used to achieve this, while also avoiding gaps if run within a transaction (rough sketch below). I haven't personally tried it, and my project isn't far enough along yet to worry about the billing system, but I would love to be able to use the same database for everything to make things a bit easier to operate.
One problem I'd expect is a write bottleneck, but this should only take a few milliseconds, and you could probably use a different counter for every jurisdiction or store, like real-life stores do. The cash register number could be part of it too; in the digital world the "cash register" could be the transaction-processing server the request went to (say, if you used microservices), so you could load-balance round-robin between them. That's assuming it uses a per-document lock, which, from my understanding, it possibly does.
The main time I'd worry about this bottleneck is with a very popular store, around Black Friday when there's a huge spike, or when generating recurring invoices.
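Roughly what I have in mind, as an untested sketch (collection names and validateBusinessRules() are placeholders, the counter document is assumed to already exist, and a replica set is required for multi-document transactions):

    // Increment the counter and insert the invoice in one transaction, so a
    // failed business rule aborts both and no number is consumed.
    async function createInvoice(client, invoiceData) {
      const session = client.startSession();
      try {
        await session.withTransaction(async () => {
          const db = client.db('billing');
          const counter = await db.collection('counters').findOneAndUpdate(
            { _id: 'salesInvoiceNumber' },
            { $inc: { seq: 1 } },
            { returnDocument: 'after', session }   // recent drivers return the document directly
          );
          validateBusinessRules(invoiceData);      // throwing here aborts the whole transaction
          await db.collection('salesInvoices').insertOne(
            { ...invoiceData, number: counter.seq },
            { session }
          );
        });
      } finally {
        await session.endSession();
      }
    }

Note that holding the counter document inside the transaction effectively serializes invoice creation, which is exactly the write bottleneck mentioned above.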
We have multiple processes which read one database table, get an available record, and work with it. It works fine.
When there is no record in the table, each process waits 5 seconds and reads it again.
So a record could sit idle in the table for up to 5 seconds, which is not good.
What would be the recommended solution to eliminate this waiting and proceed immediately when a record is created? One option could be a trigger that does something when a record is created, but that solution requires knowledge of the worker processes in order to deliver the record to one of the idle ones.
It seems the ideal solution would be for each process to block on a SQL read against something, and when a record is created, one of the waiting processes receives it while the others continue to wait.
Does Oracle 10 provide such or similar mechanism?
Look at Database Change Notification in 10g, which has since been renamed Continuous Query Notification.
I normally like to include an example, but it's hard to find a 10g instance these days, and even a short example requires a lot of code. The process looks complicated enough that you might be better off using triggers as you suggested and living with the tight coupling.
I am preparing a small app that will aggregate data on users of my website (via socket.io). I want to insert all the data into my MongoDB every hour.
What is the best way to do that? setInterval(60000) seems a little bit lame :)
You can use cron, for example, and run your node.js app as a scheduled job.
EDIT:
In cases where the program has to run continuously, setTimeout is probably one of the few possible choices (and it is quite simple to implement). Otherwise you can offload your data to some temporary storage system, for example Redis, and then regularly run another node.js program to move the data; however, this introduces a dependency on another DB system and increases complexity, depending on your scenario. Redis can also serve as a failsafe in this setup, in case your main node.js app terminates unexpectedly and would otherwise lose part or all of your data batch.
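A small sketch of that Redis-buffer variant (the library choice, key names, and collection names are assumptions for illustration, not part of the original answer):

    // The socket.io app pushes raw events onto a Redis list; a separate job
    // (cron or a long-running worker) drains the list into MongoDB every hour.
    const Redis = require('ioredis');
    const { MongoClient } = require('mongodb');

    const redis = new Redis();

    // Called from the socket.io handlers: cheap, and survives an app crash.
    async function buffer(event) {
      await redis.rpush('pending_events', JSON.stringify(event));
    }

    // Run hourly.
    async function flush(mongoUrl) {
      const client = await MongoClient.connect(mongoUrl);
      try {
        const raw = await redis.lrange('pending_events', 0, -1);
        if (raw.length === 0) return;
        await client.db('analytics').collection('events')
          .insertMany(raw.map(s => JSON.parse(s)));
        await redis.ltrim('pending_events', raw.length, -1);  // drop only what we copied
      } finally {
        await client.close();
      }
    }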
You should aggregate in real time, not once per hour.
I'd take a look at this presentation by BuddyMedia to see how they are doing real time aggregation down to the minute. I am using an adapted version of this approach for my realtime metrics and it works wonderfully.
http://www.slideshare.net/pstokes2/social-analytics-with-mongodb
Why not just hit the server with a curl request that triggers the database write? You can put the command on an hourly cron job and listen on a local port.
You could have Mongo store the last time you copied your data, and each time any request comes in, check how long it's been since you last copied.
Or you could try setInterval(checkRestore, 60000) for once-a-minute checks. checkRestore() would query the server to see if the last-updated time is more than an hour old. There are a few ways to do that.
An easy way to store the date is to just store the value of Date.now() (https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Date) and then check for something like db.logs.find({lastUpdate:{$lt:Date.now()-3600000}}) (see the sketch below).
I think I confused a few different solutions there, but hopefully something like that will work!
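Putting it together, a minimal sketch of the once-a-minute check (copyData(), the meta collection, and the already-connected db handle are placeholders):

    const ONE_HOUR = 60 * 60 * 1000;

    async function checkRestore(db) {
      const meta = await db.collection('meta').findOne({ _id: 'aggregation' });
      const lastUpdate = meta ? meta.lastUpdate : 0;

      if (Date.now() - lastUpdate >= ONE_HOUR) {
        await copyData(db);                            // your hourly aggregation step
        await db.collection('meta').updateOne(
          { _id: 'aggregation' },
          { $set: { lastUpdate: Date.now() } },
          { upsert: true }
        );
      }
    }

    setInterval(() => checkRestore(db).catch(console.error), 60000);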
If you're using Node, a nice CRON-like tool to use is Forever. It uses the same CRON patterns to handle repetition of jobs.
I have a collection of objects, let's say they are "posts," and those objects can be modified. I'd like to display a list on the client side that updates dynamically. So on the client side, if doing this via polling, the client would invoke an API like:
getPostsChangedSince(serial)
where serial could be a monotonically increasing number, probably a timestamp. The client gets back a list of posts that have changed since that time, stores a new latest-serial, and next time the client polls it requests changes since that latest serial.
I think the basic idea is the same in this question (which is about ASP.NET): How to implement "get latests changed items" with ADO.NET Data Services?
I'm trying to find the best way to implement this in MongoDB.
I like the idea of using the time for the serial, since it automatically works at least mostly correctly even if there are multiple app servers. The serial would be stored in each post object, and updated whenever the object is modified.
The timestamp-based serial could be implemented as:
a Date (I think this is stored as a 64-bit milliseconds since epoch?)
a Timestamp http://www.mongodb.org/display/DOCS/Timestamp+Data+Type
something "by hand" e.g. store milliseconds as a number
Some nice features to have in a solution would include:
ensure that creating then immediately updating an object within the OS timer resolution will still increment the serial despite it being the same time
even better would be a guaranteed monotonic increase globally for all objects, not just a guarantee that changing a given object will bump the serial on that object (absent this, getPostsChangedSince() calls probably need a fuzz backward in time, to avoid missing changes - at the price of getting some changes twice)
mongodb-side timestamps might be nice because getting the time in the app creates a gap between when you get the time, and when the new object is saved and available in queries
update using findAndModify() with a query including the old serial, so "conflicts" (two changes at once) will throw an error allowing the app to retry
I realize some of the corner cases here are a little bit "academic" and can likely be fudged around in real life.
My approach so far is:
use the Date type for the serial
when modifying an object, get the current time, and if it matches the object's old serial, add 1 millisecond (yes this breaks if you make two modifications quickly without re-fetching from mongodb, but that seems OK)
use findAndModify(), but based on https://jira.mongodb.org/browse/JAVA-276 there may not be a way to detect if it ends up not finding anything to modify (i.e. the second change is silently ignored in case of conflict)
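For illustration, the compare-and-set update I have in mind looks roughly like this (sketched with the Node.js driver rather than the Java one; the claim that a recent driver resolves to null when nothing matched is my assumption about current driver behaviour, and the serial handling is simplified):

    // Include the old serial in the query, so a concurrent change makes the
    // update match nothing; a null result is the conflict signal.
    async function updatePost(db, postId, oldSerial, changes) {
      const updated = await db.collection('posts').findOneAndUpdate(
        { _id: postId, serial: oldSerial },              // only update if unchanged
        { $set: { ...changes, serial: new Date() } },    // bump the serial
        { returnDocument: 'after' }
      );
      if (!updated) {
        throw new Error('Conflict: post was modified concurrently, retry');
      }
      return updated;
    }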
Questions:
I feel like I should use Timestamp instead; true? Any downsides?
if you had a mongo cluster, might time in milliseconds be more unique and correct than Timestamp's time in seconds plus a number, while with one mongod Timestamp is more unique?
is there a way to detect whether findAndModify() updated anything?
any general advice / experiences with this problem? how would you do it?
Have you considered "externalizing" the serial number generator? Time with MongoDB precision is good, but can become difficult to synchronize when multiple machines are involved. One option is to use memcached or something similar that is memory-based, extremely fast, and whose operations can be serialized (memcached has a CAS operation).
So what you would do is store a "seed" in memcached under a key, say counter.
Every time the app needs to do an insert, it gets the next number from memcached and increments the counter.
On second thought, you could even do away with memcached and just use a single-row (sorry, single-document) collection that holds the counter. Getting the counter and incrementing it will be an extremely fast operation, mimicking memcached.
And then naturally, you can index the data appropriately. However, I suspect this would result in a very unbalanced (right-heavy) index. Depending on the situation, it might be worthwhile to explore the use of a capped collection: when you insert data into your main collection, also insert it into the capped collection, and read data from that collection.
You could continue to use your regular collection, as you do now, and after each update additionally insert the ID of the post into a special TTL collection. See http://docs.mongodb.org/manual/tutorial/expire-data/ for more info on using such a collection. Mongo will take care of all timing issues, you don't need to worry about serial numbers, and you can very quickly access time based lists of objects by their IDs.
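A quick sketch of the moving parts (the recent_changes collection name and the one-hour TTL are just examples, not requirements):

    // One-time setup: documents expire ~3600 seconds after `changedAt`.
    async function setup(db) {
      await db.collection('recent_changes').createIndex(
        { changedAt: 1 },
        { expireAfterSeconds: 3600 }
      );
    }

    // Call this alongside every post update.
    async function recordChange(db, postId) {
      await db.collection('recent_changes').insertOne({
        postId,
        changedAt: new Date(),
      });
    }

    // The client then asks "what changed since X" with a simple range query:
    //   db.collection('recent_changes').find({ changedAt: { $gt: lastSeen } })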
Caveat:
use the blocking form of findAndModify, to ensure the changes have really been processed:
Blocking/Safe Writes
Unless you specify the "new" parameter as true the write operation will not block, and will not return an error (if there is one). If you do want the "new" document returned then the operation will wait until the write is done to return the new document, or an error.
For a "safe" (blocking) write operation you must call getLastError (if not using "new").