Event store and optimistic concurrency - cqrs

In the "Building an event storage" section of his document on CQRS, Greg Young checks for optimistic concurrency when writing events to the event store. I do not really get why he makes that check. Can anyone explain it to me with a concrete example?

Event stores are supposed to be persistent, in the sense that once you write an event, it will be visible to every subsequent read. So every action in the database should be an append. A useful mental model is to think of a singly linked list.
If a database is going to support more than one thread of execution having write access, then you are faced with the "lost update" problem. Drawn as linked lists, this might look like:
Thread(1) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(2) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(1) set(/x, [ ... <- 69726c3e <- 709726c3 <- /x.tail])
Thread(2) set(/x, [ ... <- 69726c3e <- 83b97195 <- /x.tail])
The history written by thread(2) doesn't include event:709726c3 recorded by thread(1). Thus "lost update".
In a generic database, you typically manage this with transactions: some magic under the covers keeps track of all of your data dependencies, and if the preconditions don't hold when you try to commit your transaction, all of your work is rejected.
But event stores don't need all of the degrees of freedom that support the general case: edits to events stored in the database are forbidden, as is changing the dependencies between events.
The only mutable part of the change, and the only place where we overwrite an old value with a new value, is when we change /x.tail
Thread(1) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(2) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(1) set(/x, [ ... <- 69726c3e <- 709726c3 <- /x.tail])
Thread(2) set(/x, [ ... <- 69726c3e <- 83b97195 <- /x.tail])
The problem here is simply that Thread(2) thought 69726c3e <- /x.tail was still true, and replaced it with a value that lost event 709726c3. If we change our write from a set to a compare-and-set...
Thread(1) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(2) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(1) compare-and-set(/x, 69726c3e <- /x.tail, [ ... <- 69726c3e <- 709726c3 <- /x.tail])
Thread(2) compare-and-set(/x, 69726c3e <- /x.tail, [ ... <- 69726c3e <- 83b97195 <- /x.tail]) // FAILS
then the data store can detect the conflict and reject the invalid write.
Of course, if the data store sees the actions of the threads in a different order, then the command that fails could change:
Thread(1) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(2) [... <- 69726c3e <- /x.tail] = get(/x)
Thread(2) compare-and-set(/x, 69726c3e <- /x.tail, [ ... <- 69726c3e <- 83b97195 <- /x.tail])
Thread(1) compare-and-set(/x, 69726c3e <- /x.tail, [ ... <- 69726c3e <- 709726c3 <- /x.tail]) // FAILS
Put more simply, where set gives us "last writer wins" semantics, compare-and-set gives us "first writer wins", which eliminates the lost update concern.
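The compare-and-set interleaving above can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: `EventStream`, `get` and `compareAndSet` are hypothetical names for an in-memory stand-in, not the API of any real event store, and the history is modelled as an immutable list with the newest event first.

```scala
import java.util.concurrent.atomic.AtomicReference

object CasEventStream {
  // Immutable history, newest event first. The event ids are the ones used above.
  type History = List[String]

  // Hypothetical in-memory stream: the tail pointer is the only mutable cell.
  final class EventStream(initial: History) {
    private val tail = new AtomicReference[History](initial)

    def get(): History = tail.get()

    // Appends `event` only if the tail is still the exact history the writer read.
    // AtomicReference compares by reference, which is enough here because
    // every successful write installs a new list object.
    def compareAndSet(expected: History, event: String): Boolean =
      tail.compareAndSet(expected, event :: expected)
  }

  // Replays the interleaving above: both writers read, then both try to write.
  def demo(): (Boolean, Boolean, History) = {
    val x = new EventStream(List("69726c3e"))
    val seenBy1 = x.get()
    val seenBy2 = x.get()
    val first = x.compareAndSet(seenBy1, "709726c3")  // succeeds
    val second = x.compareAndSet(seenBy2, "83b97195") // fails: the tail has moved
    (first, second, x.get())
  }
}
```

Calling `CasEventStream.demo()` returns `(true, false, List("709726c3", "69726c3e"))`: the first writer wins and the second write is rejected rather than silently dropping an event.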

TL;DR: This concurrency check is needed because which events are emitted depends on the previous events. So, if other events are emitted concurrently by another process, the decision must be made again.
An event store is used like this:
1. The old events are loaded from the Eventstream (a partition in the Eventstore that contains all the events that were generated by one Aggregate instance).
2. The old events are processed/applied by the Aggregate that owns them, in the order they were generated.
3. The Aggregate, based on the internal state that was built from those events, decides to emit some new events.
4. These new events are appended to the Eventstream.
So, step 3 depends on the events that were generated before this command is executed.
If some events generated in parallel by another process are appended to the same Eventstream, then the decision was based on a false premise and must be retaken by repeating from step 1.
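The four steps, including the retry, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `EventStream` class, the use of plain `Int` deltas as events, and the toy "balance" state are hypothetical, and the concurrent writer is simulated inside `decide` so that the conflict is deterministic.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object RetryingAppend {
  // Hypothetical in-memory stream of Int events (newest first), guarded by compare-and-set.
  final class EventStream(initial: List[Int]) {
    private val tail = new AtomicReference[List[Int]](initial)
    def load(): List[Int] = tail.get()
    // Rejects the append if the stream changed since `expected` was loaded.
    def append(expected: List[Int], newEvents: List[Int]): Boolean =
      tail.compareAndSet(expected, newEvents.reverse ::: expected)
  }

  // Load, rebuild state, decide, append; start over if the append is rejected.
  @tailrec
  def handle(stream: EventStream, decide: Int => List[Int]): Unit = {
    val history = stream.load()    // step 1
    val balance = history.sum      // step 2: toy state, the sum of all deltas
    val newEvents = decide(balance) // step 3
    if (stream.append(history, newEvents)) () // step 4
    else handle(stream, decide)               // premise was stale: repeat from step 1
  }

  // Withdraw 70 only if the balance allows it, while "another process" withdraws 70 first.
  def demo(): List[Int] = {
    val stream = new EventStream(List(100))
    var interfered = false
    handle(stream, balance => {
      if (!interfered) { // simulate a concurrent writer between load and append
        interfered = true
        stream.append(stream.load(), List(-70))
      }
      if (balance >= 70) List(-70) else Nil
    })
    stream.load()
  }
}
```

`RetryingAppend.demo()` returns `List(-70, 100)`. The first decision (withdraw 70 from a balance of 100) is rejected because the stream moved on, and the retried decision sees a balance of 30 and emits nothing. Without the check, the stream would end up with two withdrawals and a negative balance.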

ZIO : How to compute only once?

I am using ZIO: https://github.com/zio/zio
in my build.sbt:
"dev.zio" %% "zio" % "1.0.0-RC9"
No matter what I tried, my results are always being computed each time I need them:
val t = Task {
println(s"Compute")
12
}
val r = unsafeRun(for {
tt1 <- t
tt2 <- t
} yield {
tt1 + tt2
})
println(r)
For this example, the log looks like:
Compute
Compute
24
I tried with Promise:
val p = for {
p <- Promise.make[Nothing, Int]
_ <- p.succeed {
println(s"Compute - P")
48
}
r <- p.await
} yield {
r
}
val r = unsafeRun(for {
tt1 <- p
tt2 <- p
} yield {
tt1 + tt2
})
And I get the same issue:
Compute - P
Compute - P
96
I tried with
val p = for {
p <- Promise.make[Nothing, Int]
_ <- p.succeed(48)
r <- p.await
} yield {
println(s"Compute - P")
r
}
first, thinking that maybe the pipeline is executed but the value is not recomputed, but it does not work either.
I would like to be able to compute my values asynchronously and reuse them.
I looked at "How do I make a Scalaz ZIO lazy?" but it does not work for me either.
ZIO has memoize, which should do essentially what you want. I don't have a way to test it just now, but it should work something like:
for {
memoized <- t.memoize
tt1 <- memoized
tt2 <- memoized
} yield tt1 + tt2
Note that unless the second and third lines of your real code have some branching that might result in the Task never getting called, or getting called only once, this yields the same answer and side effects as the much simpler:
t map { tt => tt + tt }
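To see why using t twice computes twice while a memoized effect computes once, here is a toy model in plain Scala. It is not ZIO and does not use any ZIO API: `Effect` is a hypothetical stand-in that treats an effect as a description which is re-run every time it is used, and its `memoize` only mimics the idea described above.

```scala
object DescribeVsRun {
  // Hypothetical stand-in for a Task: a description of a computation, run on demand.
  final class Effect[A](val run: () => A) {
    def map[B](f: A => B): Effect[B] = new Effect(() => f(run()))
    def flatMap[B](f: A => Effect[B]): Effect[B] = new Effect(() => f(run()).run())
    // Produces an effect that runs the original at most once and caches the result.
    def memoize: Effect[Effect[A]] = new Effect(() => {
      lazy val cached = run()
      new Effect(() => cached)
    })
  }

  // Returns (result, runs) for the plain version, then (result, runs) for the memoized one.
  def demo(): (Int, Int, Int, Int) = {
    var runs = 0
    val t = new Effect(() => { runs += 1; 12 })

    // Using the description twice runs the computation twice.
    val twice = for { a <- t; b <- t } yield a + b
    val r1 = twice.run()
    val runsForTwice = runs

    // Using the memoized description twice runs the computation once.
    val once = for { m <- t.memoize; a <- m; b <- m } yield a + b
    val r2 = once.run()
    (r1, runsForTwice, r2, runs - runsForTwice)
  }
}
```

`DescribeVsRun.demo()` returns `(24, 2, 24, 1)`: both versions produce 24, but the memoized one only runs the underlying computation once.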
Does computing the results have side effects? If it doesn't you can just use a regular old lazy val, perhaps lifted into ZIO.
lazy val results = computeResults()
val resultsIO = ZIO.succeedLazy(results)
If it does have side effects, you can't really cache the results because that wouldn't be referentially transparent, which is the whole point of ZIO.
What you'll probably have to do is flatMap on your compute Task and write the rest of your program which needs the result of that computation inside that call to flatMap, threading the result value as a parameter through your function calls where necessary.
val compute = Task {
println(s"Compute")
12
}
compute.flatMap { result =>
// the rest of your program
}

Possible bug in the function hclust() of R-Project

Hi friends, here is what I observed. I don't know what the problem is.
When I am making clusters with the hclust function, the labels of the object that it creates are lost if the way I subset the data frame is "incorrect".
This is the data frame.
set.seed(1234)
x <- rnorm(12,mean=rep(1:3,each=4),sd=0.2)
y <- rnorm(12,mean=rep(c(1,2,1),each=4),sd=0.2)
z <- as.factor(sample(c("A","B"),12,replace=T))
df <- data.frame(x=x,y=y,z=z)
plot(df$x,df$y,col=z,pch=19,cex=2)
This chunk of code returns NULL for the labels.
df1 <- df[c("x","y")]
d <- dist(df1)
cluster <- hclust(d)
cluster$labels #NULL
This chunk of code returns NULL as well.
df2 <- df[,1:2]
d <- dist(df2)
cluster <- hclust(d)
cluster$labels #NULL
This chunk of code does not return NULL.
df3 <- df[1:12,1:2]
d <- dist(df3)
cluster <- hclust(d)
cluster$labels #Character Vector
This is a problem for me because I have some code that uses this information.
As you can see, the data frames are identical.
identical(df1, df2) #True
identical(df1, df3) #True
identical(df2, df3) #True

For comprehension - execute futures in order

If I have the following for comprehension, futures will be executed in order: f1, f2, f3:
val f = for {
r1 <- f1
r2 <- f2(r1)
r3 <- f3(r2)
} yield r3
For this one however, all the futures are started at the same time:
val f = for {
r1 <- f1
r2 <- f2
r3 <- f3
} yield ...
How can I enforce the order? (I want this order of execution: f1, f2, f3.)
It does matter what f1, f2, f3 are: a future will start executing as soon as it is created. In your first case, f2(r1) must be a function returning a future, so the future begins executing when the function is called, which happens when r1 becomes available.
If the second case is the same (f2 is a function), then the behavior will be the same as in the first case: your futures will be executed sequentially, one after the other.
But if you create the futures outside the for, and just assign them to variables f1, f2, f3, then by the time you get inside the comprehension, they are already running.
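A small sketch of the sequential case, assuming futures that are created only when the for comprehension reaches them. The `task` helper and the in-memory log are hypothetical and exist only to make the start and end order observable.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureOrdering {
  private var log = Vector.empty[String]
  private def record(entry: String): Unit = synchronized { log = log :+ entry }

  // Hypothetical task: notes when it starts and when it ends.
  private def task(name: String): Future[Unit] = Future {
    record(s"$name start")
    Thread.sleep(20)
    record(s"$name end")
  }

  def sequential(): Vector[String] = {
    synchronized { log = Vector.empty }
    // def, not val: each future is created (and so started) only when
    // the for comprehension evaluates it, after the previous one completed.
    def f1 = task("f1")
    def f2 = task("f2")
    def f3 = task("f3")
    val all = for { _ <- f1; _ <- f2; _ <- f3 } yield ()
    Await.result(all, 5.seconds)
    synchronized { log }
  }
}
```

`FutureOrdering.sequential()` returns the entries in strict order: "f1 start", "f1 end", "f2 start", "f2 end", "f3 start", "f3 end". Changing the three `def`s to `val`s would start all three futures immediately, so the start and end entries could interleave.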
Futures are eager constructs; that is, once created, you cannot dictate when they get processed. If the Future already exists when you attempt to use it in a for-comprehension, you've already lost the ability to sequence its execution order.
If you want to enforce ordering on a method that accepts Future arguments then you'll need to wrap the evaluation in a thunk:
def foo(ft: => Future[Thing], f2: => Future[Thing]): Future[Other] = for{
r1 <- ft
r2 <- f2
} yield something(r1, r2)
If, on the other hand, you want to define the Future within a method body, then instead of val use a def:
def foo() = {
  // def defers creation: each Future starts only when the for comprehension reaches it
  def f1 = Future{ code here... }
  def f2 = Future{ code here... }
  for{
    r1 <- f1
    r2 <- f2
  } yield something(r1, r2)
}
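Futures that already exist when the for comprehension starts run in parallel; that is the default behavior. It works well when a few tasks are processed in parallel without any blocking.
But if you want to preserve the processing order, you have two ways:
1. Send the result of the first task to the second, as in your example
2. Use the andThen operator
import scala.collection.mutable
import scala.util.Success

val allposts = mutable.Set[String]()
Future {
  session.getRecentPosts
} andThen {
  // andThen receives a Try, so match on Success to get at the posts
  case Success(posts) => allposts ++= posts
} andThen {
  // runs only after the previous andThen callback has finished
  case _ =>
    clearAll()
    for (post <- allposts) render(post)
}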

Scala error "value map is not a member of Double"

Explanation: I accepted gzm0's answer because it rocked!
@Eduardo did come in with a comment suggesting:
(for (i <- 10 to 20; j = runTest(i)) yield i -> j).toMap
which also lets me run the build. He just never posted an answer, and @gzm0's answer was conceptually AWESOME, so I accepted it.
Once I get this other issue relating to "can't call constructor" figured out, I will be able to test these by actually running the program LOL
Question: I have an error in this expression. Specifically, how do I fix it, and more generally, what am I missing about FP or Scala to make this mistake?
timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
I am writing my first Gradle/Scala project for an Analysis of Algorithms assignment. Scala is not part of the assignment, so I am not using a homework tag. Except for my work with Spark in Java, I am brand new to functional programming, and I am sure that's the problem.
Here's a snippet; the full .scala file is on GitHub. Is that OK, or should I post the full program here if I get flailed? :)
val powersList = List(10 to 20)
// create a map to keep track of power of two and resulting timings
var timingsMap: Map[Integer, Double] = Map()
// call the runTest function once for each power of two we want, from powersList,
// assign timingsMap the power of 2 value and results of runTest for that tree size
timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
The error is:
/home/jim/workspace/Scala/RedBlackTree4150/src/main/scala/Main.scala:36: value map is not a member of Double
timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
What I think I am doing in that timingsMap = ... line is get all the elements of powersList mapped onto i for each iteration of the loop, and the return value for runTest(i) mapped onto j for each iteration, and then taking all those pairs and putting them into timingsMap. Is it the way I am trying to use i in the loop to call runTest(i) that causes the problem?
runTest looks like this:
def runTest(powerOfTwo: Range): Double = {
// create the tree here
var tree = new TreeMap[Int, Double]
// we only care to create a tree with integer keys, not what the value is
for (x <- powerOfTwo) {
// set next entry in map to key, random number
tree += (x -> math.random)
}
stopWatchInst.start()
// now go through and look up all the values in the tree,
for (x <- powerOfTwo) {
// get the value, don't take the time/overhead to store it in a var, as we don't need it
tree.get(x)
}
// stop watch check time report and return time
stopWatchInst.stop()
val totalTime = stopWatchInst.elapsed(TimeUnit.MILLISECONDS)
loggerInst.info("run for 2 to the power of " + powerOfTwo + " took " + totalTime + " ms")
return totalTime
}
Note: I've had a suggestion to change the <- to = in j <- in this line: timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
Another suggestion didn't like using yield at all and suggested replacing it with (10 to 20).map...
The strange part is the existing code does not show an error in the IntelliJ editor, only breaks when I run it. The suggestions all give type mismatch errors in the IDE. I am really trying to figure out conceptually what I am doing wrong, thanks for any help! (and of course I need to get it to work!)
After trying gzm0's answer I am going down the same road: my code as presented doesn't show any type mismatches until I use gradle run, whereas when I make the suggested changes it starts to give me errors right in the IDE. But keep them coming! Here's the latest error, based on gzm0's answer:
/home/jim/workspace/Scala/RedBlackTree4150/src/main/scala/Main.scala:37: type mismatch;
found : List[(scala.collection.immutable.Range.Inclusive, Double)]
required: Map[Integer,Double]
timingsMap = for (i <- powersList) yield i -> runTest(i)
You want:
for (i <- powersList)
yield i -> runTest(i)
The result of runTest is not a list, so you can't use it as a generator in the for expression. The reason you get a somewhat strange error message is how your for is desugared:
for (i <- powersList; j <- runTest(i)) yield i -> j
// turns into
powersList.flatMap { i => runTest(i).map { j => i -> j } }
However, the result of runTest(i) is a Double, which doesn't have a map method. So looking at the desugaring, the error message actually makes sense.
Note that my point about runTest's result not being a list is not really correct: Anything that has a map method that will allow the above statement (i.e. taking some kind of lambda) will do. However, this is beyond the scope of this answer.
We now have successfully created a list of tuples (since powersList is a List, the result of the for loop is a List as well). However, we want a Map. Luckily, you can call toMap on Lists that contain tuples to convert them into a Map:
val tups = for (i <- powersList) yield i -> runTest(i)
timingsMap = tups.toMap
As an aside: if you really want to keep the j inside the for-loop, you can use the equals sign instead of the left arrow:
for (i <- powersList; j = runTest(i)) yield i -> j
This will turn into:
powersList.map { i =>
val j = runTest(i)
i -> j
}
Which is what you want.
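Both working forms can be checked side by side with a small self-contained sketch. The `runTest` below is a hypothetical stand-in that takes an Int and returns a made-up Double; it is not the timing function from the question, and `powers` is a plain list of Ints rather than the question's powersList.

```scala
object ForDesugar {
  // Hypothetical stand-in for the question's runTest: any function returning a Double.
  def runTest(power: Int): Double = power * 1.5

  val powers: List[Int] = (10 to 12).toList

  // The for comprehension with `=` binds j as a plain value, so no map is called on a Double.
  val viaFor: Map[Int, Double] =
    (for (i <- powers; j = runTest(i)) yield i -> j).toMap

  // The explicit map version, which produces the same pairs.
  val viaMap: Map[Int, Double] =
    powers.map { i =>
      val j = runTest(i)
      i -> j
    }.toMap
}
```

`ForDesugar.viaFor` and `ForDesugar.viaMap` are equal, and both are a `Map[Int, Double]` rather than a List of tuples because of the final `toMap`.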

How to select just one first or last record compliant to a where clause with ScalaQuery?

Having the following query template to select all:
val q = for {
a <- Parameters[Int]
b <- Parameters[Int]
t <- T if t.a == a && t.b == b
_ <- Query.orderBy(t.c, t.d)
} yield t
I need to modify it to select the very first record (with minimum c, and minimum d for this c) or the very last record (with maximum c, and maximum d for this c) of those matching the where condition. I'd strongly prefer that no records other than the first/last be selected, as there are hundreds of thousands of them...
There's a potential danger here in how the OP's query is currently constructed. Run as is, getting the first or last result of a 100K-row result set is not terribly efficient (such a large result is unlikely, yes, but the point is that the query places no limit on the number of rows returned).
With straight SQL you would never do such a thing; instead you would tack on a LIMIT 1
In ScalaQuery, LIMIT = take(n), so add take(1) to get a single record returned from the query itself
val q = (for {
a <- Parameters[Int]
b <- Parameters[Int]
t <- T if t.a == a && t.b == b
_ <- Query.orderBy(t.c, t.d)
} yield t) take(1)
q.firstOption
There is a method firstOption defined on the Invoker trait, and by some magic it is available on the Query class. So maybe you can try it like this:
val q = for {
a <- Parameters[Int]
b <- Parameters[Int]
t <- T if t.a == a && t.b == b
_ <- Query.orderBy(t.c, t.d)
} yield t
q.firstOption