Most of the common use cases I've encountered for delayed messages use short-lived delays of seconds, minutes, or hours for things such as rate limiting and de-duplication.
The question is: what special considerations apply to messages that are delayed much longer, say for days or even weeks? I'm most interested in actual problems someone has encountered or heard of, but theoretical concerns are welcome beyond the obvious ones such as requiring persistent messages. Would the opportunity for deserialization errors be more pronounced? Are there operational concerns to be weighed?
I would say that the following are valid use cases for Early Media when used for IVRs:
Filtering
Disconnecting incoming calls based on certain criteria (such as callers from a specific area code), or playing a message before hanging up on after-hours callers.
Rate limiting
For example, limiting the number of simultaneous callers, where the excess calls are either disconnected or placed in a queue. (I'm not sure the queue example is possible without answering the call, though.)
Are there more?
Some links that were helpful:
https://www.dialogic.com/webhelp/csp1010/8.4.1_ipn3/sip_software_chap_-_early_media.htm
https://wiki.asterisk.org/wiki/display/AST/Early+Media+and+the+Progress+Application
https://freeswitch.org/confluence/display/FREESWITCH/Early+Media
3) Legal: in some countries it is illegal for a call center using a paid line (where the caller pays per minute) to accept a call without sending it straight to an agent. So you can't accept the call and then play the caller waiting music for 15 minutes (making them pay for the privilege of waiting as well).
Result: you don't accept the call. However, this creates a new problem: if the caller only hears the ring-back tone for those 15 minutes, he/she will assume no one will answer and hang up.
Using early media, you can give them the traditional "your call is very important to us, please hold on" type of experience without accepting the call and without starting to charge money. Of course, this also depends somewhat on how much the provider is willing to tolerate, as this can also affect their income (depending on their own business model).
4) Comfort: you may not be aware of this, but the sound you hear as the caller when the other side is ringing (the ring-back tone) is not universally the same throughout the world. A company with a global number may wish to use early media to provide a ring-back tone more familiar to you, depending on where you are calling from. It was always a bit of a niche concern, but some target audiences are statistically more likely to hang up if they hear an unfamiliar sound. 15-20 years ago this might have been a concern to some, but in the era of smartphones and internet calls, I doubt anyone really worries about it anymore.
I am trying to figure out how to use a background thread to execute a command every 4 hours.
I have never created anything like this before, so I have only been reading about it so far. One of the things I have read is this:
"Threads tie up physical memory and critical system resources"
So in that case, would it be a bad idea to have a thread that checks the time and then executes my method? Or is there a better option? I have read about GCD (Grand Central Dispatch), but I am not sure if it's applicable, as I think it's more for concurrent requests, not something that repeats over and over checking the time.
Or, finally, is there something I have completely missed where you can execute a request every 4 hours?
Any help would be greatly appreciated.
There is a maximum time background processes are allowed to run (10 minutes), which would make your approach difficult. Your next best bet is to calculate the time of the next event and save the timestamp somewhere. Then, if the app is executed at or after that event, it can carry out whatever action you want.
This might help:
http://www.audacious-software.com/2011/01/ios-background-processing-limits/
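For example, here is a minimal sketch of that timestamp approach in modern Swift (the key and function names are just placeholders):

```swift
import Foundation

let nextRunKey = "nextRunDate"  // hypothetical UserDefaults key

// Persist the next due time, then check it whenever the app runs,
// e.g. on launch or on returning to the foreground.
func runFourHourlyTaskIfDue() {
    let defaults = UserDefaults.standard
    let nextRun = defaults.object(forKey: nextRunKey) as? Date ?? .distantPast

    if Date() >= nextRun {
        performFourHourlyTask()  // the command you want to run every 4 hours
        defaults.set(Date().addingTimeInterval(4 * 60 * 60), forKey: nextRunKey)
    }
}

func performFourHourlyTask() {
    // ... your actual work ...
}
```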
I think it would be good to make use of a timestamp and post a notification for when the time reaches four hours from now.
Multithreading is not a good means to do this, because essentially you would be running a loop for four hours, eating clock cycles. Thanks to the magic of operating systems this would not eat up an entire core or anything silly like that, but it would be continuously computed if it were allowed to run. That would be a vast waste of resources, so it is not allowed.
GCD was not really meant for this kind of thing either. It was meant to allow for concurrency to smooth out UI interaction and to complete tasks more efficiently; a 4-hour loop would be inefficient. Think of concurrency as a tool for something like being able to interact with a table while its content is being loaded or changed. GCD blocks make this very easy when used correctly. GCD and other multithreading facilities give you tools to do calculations in the background, interact with databases, and handle requests without ever affecting the user's experience. Many people who are much smarter than me have written extensively on what multithreading/multitasking is and what it is good for.
In a way, posting a notification for a time would be a method of multitasking without the nastiness of constantly executing blocks through GCD to wait out the 4-hour period. It is possible to do it with GCD: you could execute a block that monitors the time for less than the maximum length of a thread's lifetime, then, when that block's execution is over, dispatch it again until the desired time is reached. But this is a bad way of doing it. Post a notification to the notification center; it's easy and will accomplish your goal without your having to deal with the complexity of multithreading yourself.
You can post a notification request observing for a time change and it will fire its note; however, this requires your application to be active or in the background. I cannot guarantee the OS won't kill your application, but if it is nice and quiet with a small memory footprint in the "background" state, its notification center request will remain active and function as intended.
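For illustration, here is a minimal sketch using today's UserNotifications framework (the era-appropriate API was UILocalNotification, but the idea is the same). Note that a local notification alerts the user rather than silently running your code:

```swift
import UserNotifications

// Ask the system, not a thread in your app, to fire every 4 hours.
func scheduleFourHourlyReminder() {
    let center = UNUserNotificationCenter.current()
    center.requestAuthorization(options: [.alert, .sound]) { granted, _ in
        guard granted else { return }

        let content = UNMutableNotificationContent()
        content.title = "Time to run the 4-hour task"

        // Repeating time-interval trigger; fires roughly every 4 hours
        // whether or not the app is running.
        let trigger = UNTimeIntervalNotificationTrigger(timeInterval: 4 * 60 * 60,
                                                        repeats: true)
        let request = UNNotificationRequest(identifier: "four-hour-task",
                                            content: content,
                                            trigger: trigger)
        center.add(request)
    }
}
```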
Event sourcing is pitched as a bonus for a number of things, e.g. event history / audit trail, complete and consistent view regeneration, etc. Sounds great. I am a fan. But those are read-side implementation details, and you could accomplish the same by moving the event store completely to the read side as another subscriber... so why not?
Here are some thoughts:
The views/denormalizers themselves don't care about an event store. They just handle events from the domain.
Moving the event store to the read side still gives you event history / audit
You can still regenerate your views from the event store. Except now it need not be a write-model leak. Give it read-model citizenship!
Here is what seems to be one technical argument for keeping it on the write side, from Greg Young at http://codebetter.com/gregyoung/2010/02/20/why-use-event-sourcing/:
There are however some issues that exist with using something that is storing a snapshot of current state. The largest issue revolves around the fact that you have introduced two models to your data. You have an event model and a model representing current state.
The thing I find interesting about this is the term "snapshot", which more recently has become a distinguished term in event sourcing as well. Introducing an event store on the write side adds some overhead to loading aggregates. You can debate just how much overhead, but it's apparently a perceived or anticipated problem, since there is now the concept of loading aggregates from a snapshot and all events since the snapshot. So now we have... two models again. And not only that, but the snapshotting suggestions I've seen are intended to be implemented as an infrastructure leak, with a background process going over your entire data store to keep things performant.
And after a snapshot is taken, events before the snapshot become 100% useless from the write perspective, except... to rebuild the read side! That seems wrong.
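For concreteness, here is a minimal sketch of that snapshot-plus-events load path (all of the types and store protocols below are hypothetical):

```swift
import Foundation

struct Event { let delta: Int }
struct AccountState {
    var balance = 0
    mutating func apply(_ e: Event) { balance += e.delta }
}
struct Snapshot { let version: Int; let state: AccountState }

// Hypothetical stores; the names and shapes are illustrative only.
protocol SnapshotStore { func latest(for id: UUID) -> Snapshot? }
protocol EventStore { func events(for id: UUID, after version: Int) -> [Event] }

// The "two models again" load path: start from the snapshot model,
// then replay the event model on top of it.
func load(_ id: UUID, snapshots: SnapshotStore, events: EventStore) -> AccountState {
    let snap = snapshots.latest(for: id)
    var state = snap?.state ?? AccountState()
    for e in events.events(for: id, after: snap?.version ?? 0) {
        state.apply(e)
    }
    return state
}
```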
Another performance-related topic: file storage. Sometimes we need to attach large binary files to entities. Conceptually, sometimes these are associated with entities, but sometimes they ARE the entities. Putting these in the event store means you have to physically load that data each and every time you load the entity. That's bad enough, but imagine several or hundreds of these in a large aggregate. Every answer I have seen to this is to basically bite the bullet and pass a URI to the file. That is a cop-out, and it undermines the distributed system.
Then there's maintenance. Rebuilding views requires a process involving the event store. So now every view maintenance task you ever write further binds your write model into using the event store... forever.
Isn't the whole point of CQRS that the use cases around the read model and write model are fundamentally incompatible? So why should we put read-model stuff on the write side, sacrificing flexibility and performance, and coupling them back up again? Why spend the time?
So all in all, I am confused. From where I sit, in all respects the event store makes more sense as a read-model detail. You still achieve the many benefits of keeping an event store, but you don't over-abstract write-side persistence, with the possible loss of flexibility and performance that entails. And you don't couple your read/write sides back up through leaky abstractions and maintenance tasks.
So could someone please explain to me one or more compelling reasons to keep it on the write side? Or alternatively, why it should NOT go on the read side as a maintenance/reporting concern? Again, I'm not questioning the usefulness of the store. Just where it should go :)
This is a long-dead question that someone pointed me to. There are quite a few reasons why it's better to store events on the write side.
From my understanding, the architecture you are talking about is a very common one that I see... fail. We will store our domain model in a relational database, then put out events. You add the twist of saving the events on the read side in an event store. This will likely lead to a mess.
The first issue you will run into is in the publishing of your events. What happens when I save to the database and then publish to, say, MSMQ, and I die in the middle? So DTC gets introduced between them. This is a huge thing to bring in; distributed transactions should be avoided like the plague. It is also quite inefficient, as I am probably making the data durable twice (once to the queue, once to the database). This will limit system throughput by a lot (DTC benchmarks of 200-300 messages/second are common; with events alone, 20-30k/second is common).
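A minimal sketch of that failure window (the types here are hypothetical stand-ins):

```swift
import Foundation

struct Order { let id: UUID }
struct OrderPlaced { let id: UUID }

// Hypothetical stand-ins for the relational store and the message queue.
protocol Database { func save(_ order: Order) throws }
protocol MessageQueue { func publish(_ event: OrderPlaced) throws }

// The dual-write hazard: two independent durable writes with no shared
// transaction. If the process dies between them, the database and the
// event stream disagree, and nothing ever detects it.
func placeOrder(_ order: Order, db: Database, queue: MessageQueue) throws {
    try db.save(order)                            // durable write #1
    // <-- crash here: the row exists, but subscribers never hear about it
    try queue.publish(OrderPlaced(id: order.id))  // durable write #2
}
```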
Some people work around the need for DTC by putting a table in their database that holds the events and operates as a queue. This avoids the need for DTC; however, it still runs into the next issue.
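Sketched out, again with hypothetical types, that events-table-as-queue (outbox) variant looks something like this:

```swift
import Foundation

struct Order { let id: UUID }
struct OrderPlaced { let id: UUID }

// Hypothetical transactional API; names are illustrative only.
protocol Transaction {
    func save(_ order: Order) throws
    func insertOutboxRow(_ event: OrderPlaced) throws
}
protocol TransactionalDatabase {
    func inTransaction(_ body: (Transaction) throws -> Void) throws
}

// The event row commits in the SAME local transaction as the state
// change, so no distributed transaction is needed.
func placeOrder(_ order: Order, db: TransactionalDatabase) throws {
    try db.inTransaction { tx in
        try tx.save(order)                                 // state change
        try tx.insertOutboxRow(OrderPlaced(id: order.id))  // event, same commit
    }
    // A separate relay then polls the outbox table, publishes each row to
    // the real queue, and marks it sent (at-least-once delivery).
}
```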
What happens when you have a bug? I know you would never write buggy code, but what about one of the juniors/maintenance developers working with the project later? As an example, what happens when the change to the domain object and the event raised do not match? Say you set State on your domain object to "LA" (hardcoded) but you properly set State on the event to cmd.State ("CT").
How will you detect that such errors are occurring? The biggest problem with what is being discussed is that there are now two sources of "truth": the database on the write side and the event stream coming out. There is no way to prove that they are equivalent. This will cause all sorts of weird bugs down the line.
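A sketch of that bug (hypothetical types, mirroring the LA/CT example):

```swift
import Foundation

struct ChangeAddress { let state: String }
struct AddressChanged { let state: String }

final class Customer {
    private(set) var state = ""
    private(set) var pendingEvents: [AddressChanged] = []

    // Current state and the raised event diverge here, and with two
    // sources of truth nothing ever proves them equivalent.
    func handle(_ cmd: ChangeAddress) {
        state = "LA"                                            // hardcoded by mistake
        pendingEvents.append(AddressChanged(state: cmd.state))  // correctly says "CT"
    }
}
// If the events were the write-side source of truth, `state` would be
// rebuilt by applying AddressChanged, so the divergence could not persist.
```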
I think this is really an excellent question. Treating your aggregate as a sequence of events is useful in its own right on the write side, making command retries and the like easier. But I agree that it seems upsetting to go to the work of creating your events, and then have to make yet another model of your object for persistence if you need this snapshotting performance improvement.
A system where your aggregates only stored snapshots, but sent events to the read-model for projection into read models would I think be called "CQRS", just not "Event Sourcing". If you kept the events around for re-projection, I guess you'd have a system that was very much both.
But then wouldn't you have three definitions? One for persisting your aggregates, one for communicating state changes, and any number more for answering queries?
In such a system it would be tempting to start answering queries by loading your aggregates and asking them questions directly. While this isn't forbidden by any means, it does tend to start causing those aggregates to accrete functionality they might not otherwise need, not to mention complicating threading and transactions.
One reason for having the event store on the write side might be to resolve concurrency issues before events become "facts" and get distributed/dispatched, e.g. through optimistic locking on committing to event streams. That way, on the write side you can make sure that concurrent "commits" to the same event stream (aggregate) are resolved: one of them gets through, and the other has to resolve the conflict in a smart way, by comparing events or by propagating the conflict to the client, thus rejecting the command.
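A minimal sketch of what that optimistic locking might look like (a toy in-memory store; a real one would also need to be thread-safe and durable):

```swift
import Foundation

struct RecordedEvent { let version: Int; let payload: Data }
struct ConcurrencyConflict: Error {}

// An append succeeds only if the stream is still at the version the
// writer read, so conflicting concurrent commits are caught here,
// before the events are dispatched as "facts".
final class EventStore {
    private var streams: [String: [RecordedEvent]] = [:]

    func append(_ payloads: [Data], to stream: String, expectedVersion: Int) throws {
        var events = streams[stream] ?? []
        guard events.count == expectedVersion else {
            throw ConcurrencyConflict()  // caller retries, merges, or rejects the command
        }
        for payload in payloads {
            events.append(RecordedEvent(version: events.count + 1, payload: payload))
        }
        streams[stream] = events
    }
}
```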
I'm quite impressed by what Lift 2.0 brings to the table with Actors and StatefulSnippets, etc., but I'm a little worried about the memory overhead of these things. My question is twofold:
How does Lift determine when to garbage collect state objects?
What does the memory footprint of a page request look like?
If a web crawler dances across the footprint of the site, is it going to open up enough state objects to drown a modest VPS (512MB)? The question is very obviously application-dependent, but I'm curious whether anyone has any real-world figures they can throw at me.
Lift stores state information in a session, so once the session is destroyed the state associated with that session goes away.
Within the session, Lift tracks each page that state is allocated for (e.g., the mapping between an Ajax button in the browser and a function on the server) and uses a heartbeat from the browser. Functions for pages that have not seen the heartbeat in 10 minutes are unreferenced so that the JVM can garbage-collect them. All of this is tunable, so you can change the heartbeat frequency, function lifespan, etc., but in practice the defaults work quite well.
In terms of session explosion, yeah... that's a minor issue. Popular sites (including http://demo.liftweb.net/ ) experience it. The example code (see http://github.com/lift/lift/tree/master/examples/example/ ) detects sessions that were created by a single request and then abandoned, and expires those early. I'm running demo.liftweb.net with 256MB of heap (that'd fit in a 512MB VPS), and occasionally the session count rises over 1,000, but sessions from search-engine traffic are quickly tamped down.
I think the question about memory footprint was once answered somewhere on the mailing list, but I can’t find it at the moment.
Garbage collection is done after some idle time. There is, however, an example on the wiki which uses some better heuristics to kill off sessions spawned by web crawlers.
Of course, for your own project it makes sense to check memory consumption with something like VisualVM while spawning a couple of sessions yourself.
I have been working at entering and sustaining periods of flow while working, and while researching the concept I came across this site, which addressed the idea of sustaining flow in short bursts. The technique has you set a timer for 48 minutes, during which you focus purely on your work; when the timer runs out, you spend 12 minutes doing whatever you like.
However, in the paragraph directly above that statement is a quote from Peopleware saying that it takes at least 15 uninterrupted minutes to enter a state of flow.
Reading that point, the 48-minute technique seemed counter-intuitive to me, since every 48 minutes you are "breaking" your flow, and once you start up again you have to spend 15 minutes getting back into it, so you really only get (at a maximum) 33 minutes of "flow time". Obviously these quantities aren't rigid, but you get the idea.
My question is to those who have tried a timing technique similar to the one described. As I see it, the only justification for this technique is that it possibly reduces the amount of time it takes to re-enter that period of flow. Can anyone who has used this technique provide some clarification?
I've tried the Pomodoro Technique for a week or so. I didn't focus too much on the details of maintaining check lists of completed and interrupted sprints, but rather I started the day with the tasks that I wanted to do on that day, and went through them as best as I could using the 25-minute sprints and five-minute breaks.
The Pomodoro Technique suggests using the breaks to relax, stretch your legs, grab a coffee, etc. I noticed that activities like those, which don't require a total focus shift and can be done on automatic, didn't break the flow, but rather helped me cleanse my mind and refocus my attention. It's as if you're allowing yourself to step away from the trees to survey the forest for a bit before diving back in.
I dropped Pomodoro after that short trial period because I found that in my case I got the most benefit from simply sitting down in the morning and writing down what I wanted to do on that day. Once the decision is made and the work is started I found that I fall into a natural rhythm of sorts. I work until I'm bored, frustrated, or interrupted, then I take a break and continue. I didn't find that time boxing my sprints added much value.
YMMV, but it's worth at least giving it a try.
A timer would be terrible for me, because it might go off and interrupt my train of thought at a critical moment.
For me, maximizing flow would begin by (1) first preventing forthcoming distractions (i.e., making sure phone calls and other interrupts are disabled, and that anything scheduled, like a meeting, is sufficiently far off that there's no possibility of running into it), then (2) working on the problem at hand as long as I can, all the while trying to recognize when I'm not making progress.
If I think I'm making progress, no matter how slowly, I just keep going as long as I can. But if I appear to be stuck or going in circles, I stop and do something else instead, even if that something else is starting on a different problem. I let my subconscious dwell on the original problem, and once I feel fresh or curious enough to restart, I do.