NServiceBus Saga Not Found

I have VS 2015, NServiceBus 4.6.0.0, and RavenDB 2.5.
I deployed a new release of my software, and one saga is experiencing saga-not-found errors on at least two messages associated with the saga. When we roll back the release, the messages are reprocessed without error.
I have used some C# code to access the saga data, and that works fine. There are 600+ sagas in our production RavenDB. So I believe that rules out RavenDB as the problem and eliminates any changes to the saga object tree as contributing to this issue.
So it looks like we are back at NServiceBus. I am at a loss as to what would cause these saga-not-found errors.
Any help would be greatly appreciated.

Related

Is it possible to scale Axon Framework without Axon Server Enterprise

Is it possible to scale Axon Framework without Axon Server Enterprise? I'm interested in creating a prototype CQRS app with Axon, but the final, deployable system has to be free from licensing fees. If Axon Framework can't be scaled to half a dozen nodes using free software, then I should probably look elsewhere.
If Axon Framework turns out not to be a good choice for the system, what would you recommend? Would building something around Apache Pulsar be a sensible alternative?
I think I have good news for you then.
You can utilize Axon Framework perfectly fine without Axon Server Enterprise.
Firstly, you can use the Axon Server Standard edition, which is completely free, and you can check out the code too if you want.
If you prefer to get infrastructure back in your own hands, you can also select different approaches to distributing the CommandBus and the EventBus/EventStore.
For the CommandBus, the framework provides the DistributedCommandBus, for which two implementations are in place:
JGroups
Spring Cloud
I'd argue option 2 is the better fit for distributing your commands, as it gives you the freedom to choose whichever Spring Cloud Discovery Service implementation you desire. That should give you the handles to work "free of licenses" in that area.
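To give a feel for how little wiring the Spring Cloud route can take: with the Axon Spring Cloud extension on the classpath, the Spring Boot auto-configuration can switch the CommandBus to a DistributedCommandBus via properties. The property names below are a sketch from memory of the extension's configuration and should be verified against the Reference Guide for your version:

```properties
# Assumed properties for the Axon Spring Cloud extension (verify against
# the Reference Guide). Enables the DistributedCommandBus, routed via
# your Spring Cloud Discovery Service (Eureka, Consul, ...):
axon.distributed.enabled=true
# Relative share of the command load this node takes on:
axon.distributed.load-factor=100
```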
For distributing your Events, you can broadly look at two approaches:
Share the database, aka your EventStore, among all instances
Use an event message bus to distribute your event messages
If you want instances of your nodes to be able to Event Source the Command Model, you are inclined to use option 1. This is required as Axon Framework requires a dedicated EventStore to be able to source the Command Models from history.
When you just want to connect other applications to your Event Stream, option 2 would be a good fit. Again, the framework has two options in this area:
AMQP
Kafka
The only thing I'd like to point out on this part additionally is that the Kafka extension is still in a release candidate state. It is being worked on actively at the moment though.
All these extensions and their options should be clearly stated in the Reference Guide, so I'd definitely check this documentation out if you are gonna start an application.
There is one thing you cannot distribute though: the QueryBus.
There is an outstanding issue to resolve this, for which someone put the effort in to provide a PR.
After seeing how Axon Server Standard edition does it though, he intentionally closed the PR (with the following comment) as it didn't seem feasible to him to maintain such a tool at this stage.
So, can you use Axon Framework without Axon Server Enterprise?
Well, I think you can. :-)
Mind you though, although you'd be winning on not having a license fee if you don't use Axon Server Enterprise, it doesn't mean running in production is going to be free.
You'd be taking back quite a bit of infrastructure setup and operations time to get this going.
Hope this gives you plenty of feedback, @ahoffer!

What are the concerns of using Crate.io as a primary datastore?

If I am correct, crate (crate.io) is backed by Elasticsearch (Lucene). Weren't there a few articles a month ago that said that ES lost some writes under heavy load? Are there any other concerns?
You are right, Crate is backed by Elasticsearch. We think that the people at Elasticsearch are doing a great job of improving data consistency. A good read is http://www.elasticsearch.org/blog/resiliency-elasticsearch/, which gives a pretty good overview of the efforts towards reliability. We at Crate are confident that this storage engine is safe to use as a primary store. We also see that issues in this area are being actively worked on by the Lucene and Elasticsearch community.
I am currently evaluating Crate.io as a primary datastore for work. As the above answer is vague and unspecific, maybe it's time for an update on this question. There is a December 2016 keynote presentation on YouTube from the Jepsen author Kyle Kingsbury, who investigated Crate.io in connection with some resiliency problems in Elasticsearch. The first 8 minutes are introduction; the Crate.io part runs from 23:50 till 31:10.
For those of you who don't want to watch the full video, here is a short summary.
First, the test setup. They set up databases and a random pattern of clients issuing random queries. They also deliberately introduced problems for the databases, like network partitioning. Second, the results. According to Kingsbury, there are two issues with ES resiliency, and both of them carry over to Crate.io. Let's get to the details...
Dirty reads
The first one - ES #20031 - is that ES may produce dirty reads, divergence, and lost updates if network partitions occur. As of now - December 2017 - this issue is still open. In my opinion, the same problems could occur if a node becomes unresponsive under extremely heavy load, for example during extensive querying, reindexing, or garbage collection.
Lost updates
According to Kingsbury, there is another problem ("Can promote stale binaries") with ES that causes updates to get completely lost when network partitioning occurs. It has been tagged as #20384, and there is a fix of sorts, which Kingsbury summarizes as "partial". So, ES may still lose data on writes.
What does ES say?
On the official site of ES about resiliency, only one of the two problems - #20384 - is mentioned. It has been marked as solved in the version 5.0 release notes, although the official site says that there is only a partial fix.
What does Crate.io say?
On the Crate.io documentation on resiliency, there is a list of known problems with Crate.io resiliency. The ES bug #20384 is commented as partially fixed and still causing an open problem. The ES bug #20031 is not mentioned. However, there is a paragraph about an issue with networking partition which Crate.io marked as fixed - so the official page is kind of inconclusive here.
Conclusion
Kingsbury concluded in December 2016 that Crate.io should not be used as the primary data store. It could, of course, be used as a replica of your primary data to benefit from the time-series database features that Crate.io offers. He also suggests that for machine data, where 5% data loss is not a severe problem, Crate.io is a viable option as a primary store.
It is my impression that some bugs Kingsbury reported may have been fixed but not all.

Adding SQL Azure Error codes that are retried

We're using v5.1.1212 of the NuGet package for Enterprise Library Transient Fault Handling along with Entity Framework 5 in our .NET 4.5.1 application. Overall it works well for us with SQL Azure; however, I would like to add a couple more SQL Azure error codes to the list that are considered retriable. Is that possible?
We see enough -1, -2 and 10054 errors from SQL Azure that I am comfortable retrying them. I realize the general guidance is not to retry the -2 errors (not sure about -1 and 10054), but given the quantities we see them in, I feel it would benefit our app. Any idea how I might do this?
You can define a custom detection strategy.
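A minimal sketch of such a strategy is below. It wraps the block's built-in `SqlDatabaseTransientErrorDetectionStrategy` and additionally treats the codes from the question as transient; the class name and the backoff values are illustrative, not a recommendation:

```csharp
using System;
using System.Data.SqlClient;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Hypothetical strategy: defer to the built-in SQL Azure detection,
// then additionally treat -1, -2 and 10054 as transient.
public class ExtendedSqlAzureStrategy : ITransientErrorDetectionStrategy
{
    private static readonly int[] ExtraCodes = { -1, -2, 10054 };

    private readonly SqlDatabaseTransientErrorDetectionStrategy inner =
        new SqlDatabaseTransientErrorDetectionStrategy();

    public bool IsTransient(Exception ex)
    {
        if (inner.IsTransient(ex)) return true;
        var sqlEx = ex as SqlException;
        return sqlEx != null && Array.IndexOf(ExtraCodes, sqlEx.Number) >= 0;
    }
}

// Usage sketch: build a retry policy from the custom strategy and
// run the EF work through it.
// var policy = new RetryPolicy(new ExtendedSqlAzureStrategy(),
//     new ExponentialBackoff(5, TimeSpan.FromSeconds(1),
//         TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(2)));
// policy.ExecuteAction(() => context.SaveChanges());
```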
Alternatively, post your suggested codes to the Issue tracker on https://topaz.codeplex.com/ and someone from the project team or the community (yes, the project now accepts community contributions) will update the block.

Workflow 4 tracking records viewer

There was an application called 'WorkflowMonitor' that was included with the samples kit for Workflow 3, which gave you a visual playback of previously run workflows.
The tracking records that app works against appear to be a different shape from those in Workflow 4. Is there a similar viewer that anyone knows of that can give me insight into previously run workflows in Workflow 4?
I am really just looking for the best way to interpret the data; the Workflow Monitor would have been perfect, but appears to be incompatible now.
Thanks,
Dave.
I know next to nothing about WF3 but, based on your request, you might want to start by downloading this WCF/WF Examples package.
Take a look at WF\Application\VisualWorkflowTracking solution to see a visual tracking system in action.
See also the concept of Workflow Tracking Participants on WF4, on these links:
Workflow Tracking and Tracing
Tracking Participants in .NET 4 Beta 1
A small introduction from the first link:
Windows Workflow tracking is a .NET Framework version 4 feature
designed to provide visibility into workflow execution. It provides a
tracking infrastructure to track the execution of a workflow instance.
The WF tracking infrastructure transparently instruments a workflow to
emit records reflecting key events during the execution. This
functionality is available by default for any .NET Framework 4
workflow.
The examples package contains a bunch of example code about tracking on WF_WCF_Samples\WF\Basic\Tracking folder.
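To give a feel for the mechanism those samples build on: a WF4 tracking participant is just a class deriving from System.Activities.Tracking.TrackingParticipant that you attach to a workflow as an extension. The sketch below (class name illustrative) simply dumps every record to the console; the linked samples apply the same idea with richer viewers and persistence:

```csharp
using System;
using System.Activities;
using System.Activities.Tracking;

// Minimal sketch of a WF4 tracking participant: receives every
// tracking record the runtime emits and writes it to the console.
class ConsoleTrackingParticipant : TrackingParticipant
{
    protected override void Track(TrackingRecord record, TimeSpan timeout)
    {
        Console.WriteLine("{0}: {1}", record.EventTime, record);
    }
}

// Usage sketch: register it before running the workflow.
// var app = new WorkflowApplication(myActivity);
// app.Extensions.Add(new ConsoleTrackingParticipant());
// app.Run();
```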
@Jota's answer is a good one; you should look at those examples. The visual tracking example is kind of a mess though. A few of us have done some different variations that separate the running of the workflow from the viewing of the tracking data.
http://geekswithblogs.net/JoshReuben/archive/2011/06/07/workflow-4.0.1-statemachine—distributed-tracking-visualization.aspx
https://github.com/PeteGoo/Workflow-Service-Tracking-Viewer
and my own version with signalR but based on the visual tracking sample
http://panmanphil.wordpress.com/2012/11/05/slides-and-sample-from-the-chippewa-valley-code-camp/
Looks like you have some reading on your hands.

Please confirm: Is Windows Workflow Foundation a good horse to be backing right now?

We are in the process of selecting a workflow solution for a company that uses Microsoft products end to end. Given the news on WF4, in that it seems to be essentially a rewrite of previous versions, is it a wise move to back the current version or should we be looking elsewhere?
Ie - is the current version so bad that we would not be wise to try and use it?
Having just launched a project using .NET 3.5 and Workflow, I'd say that the current release of WF is good enough to use and run with. It has helped us get a product out quickly (we have the usual feature creep and requirements changing weekly). However, I have a list of complaints about it:
The workflow designer will drive you insane because it is so slow (in certain circumstances) and re-arranges your state machines as it sees fit.
There is no built-in upgrade strategy for keeping your old workflows running once you do a bug-fix release. If you are going to use WF, think carefully about how to do upgrades early.
Integrating with WCF (the Send and Receive activities) hides the WorkflowRuntime from you; this makes it very difficult to understand what is going on under the hood.
It's not easy to unit test workflows. There are ideas out there, but none seemed particularly easy when we started this: WorkFlow Unit Testing
I like the ideas and potential of Workflow based development, however I am not in a hurry to repeat this experience and would probably stick without it for long running processes. One place I would use it again would be in a short, complicated process (like a rules engine for working out prices).
Maybe it is a little late for you, but now that WF 4.0 is released in beta, other people asking the same question can consider backing the 4.0 horse instead of the 3.5 horse.
This goes some way to fixing the following problems:
The workflow designer will drive you insane because it is so slow (in certain circumstances) and re-arranges your state machines as it sees fit.
[Designer perf improved]
It's not easy to unit test them. There are ideas out there but none seemed particularly easy when we started this: WorkFlow Unit Testing
[I think it's a little easier now; some of the introduction-to-workflow samples include plenty of unit testing]
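For a sense of what the WF4 unit-testing story looks like in practice: WorkflowInvoker runs an activity synchronously in-process, which makes plain unit tests possible. A small sketch, where the activity and its values are made up purely for illustration:

```csharp
using System.Activities;

// Hypothetical activity under test: adds two integers.
public class AddActivity : CodeActivity<int>
{
    public InArgument<int> X { get; set; }
    public InArgument<int> Y { get; set; }

    protected override int Execute(CodeActivityContext context)
    {
        return X.Get(context) + Y.Get(context);
    }
}

// In a unit test, WorkflowInvoker executes the activity synchronously:
// int result = WorkflowInvoker.Invoke(new AddActivity
// {
//     X = new InArgument<int>(2),
//     Y = new InArgument<int>(3)
// });
// Assert.AreEqual(5, result);
```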
My understanding is that Microsoft will provide backwards compatibility and/or a migration strategy to the new WF, so I would guess that you are safe to use it. However, I have heard from other developers in my organization that the current version of WF is extremely painful to use. If you have the budget (and depending on the complexity of your workflows), you may want to consider K2: http://www.k2.com/en/index.aspx
I, as a workflow developer, think that the current version is painful to use. This is not surprising, as this is v1.0 software from Microsoft :)
I think you should first consider your expectations of a workflow product. Do you have a well-defined list of expectations from WF? Actually, I am curious what such a list would contain. Maybe we can help in more detail on each topic.
I don't know why people have such negative impressions of WF. Sure, it has its drawbacks, but I thought it was pretty useful. The one major issue I have with it is the lack of support for upgrading existing workflows (bullet #2 in gbanfill's list).
Another point in favor of the current version is that "Dublin" (Microsoft's new App Server) will be built on WCF & WF .NET 4.0 but will gladly host 3.5 WFs. So you will be able to migrate to that without a rewrite.
Just a quick note to mention that Visual Studio 2010 CTP contains a new updated WF designer as part of the Oslo objective.