Framework recommendation needed on .NET or Java for volunteer computing (internet nodes)

I want client machines on the internet that subscribe to my server to donate their idle CPU cycles (like SETI@home).
They would take jobs (work units) from the server to process, and send the results back to the server (this is the simplest description). The framework I need should let me define a job/task; the rest, like communication, job execution/tracking, and client binary updates, should be managed by the framework.
I evaluated Alchemi.NET a bit, but it's not actively maintained and seems half-baked.
BOINC has an API in C, but I want a .NET or Java framework.
I am looking at Manjrasoft's Aneka, but it seems to work only for LAN clouds.
There must be some frameworks like this available. I need expert recommendations!

I'm hardly an expert, but I do have a little experience with distributed computing using MPI (with C). What you're describing does not sound like grid computing so much as a master/slave system: your server is the master and it directs all the clients (slaves).
I know very little about .NET programming, so I'll speak in general terms.
There are a lot of web frameworks out there, and most probably have the facilities you'll need. Your client will be able to upload files with the content it has gathered (or it could just use HTTP GET/POST), and because you don't have to worry about UI issues you can probably handle everything through one action (assuming an action-based web framework). The server can then return a response via JSON or XML, which the client uses to decide what to do next. JSON is the right choice if the system is very simple, and probably a good choice for prototyping.
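As a rough sketch of that polling loop (the endpoints and the JSON shape are made up for illustration; WebClient is just the simplest .NET HTTP client for the job):

    using System;
    using System.Net;

    class WorkerClient
    {
        static void Main()
        {
            using (var http = new WebClient())
            {
                // Ask the server for the next work unit. The response could be
                // JSON such as { "jobId": 42, "inputUrl": "http://example.com/data/42" }.
                string job = http.DownloadString("http://example.com/jobs/next");

                string result = Process(job);

                // Post the result back through the same action-style endpoint.
                http.UploadString("http://example.com/jobs/42/result", result);
            }
        }

        // Placeholder for the actual number crunching on the work unit.
        static string Process(string job)
        {
            return job;
        }
    }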
I would treat the application upgrade as a separate issue (although it should be a simple matter for the server to return upgrade information to the client).

BOINC is the framework that most naturally meets your volunteer computing requirements, and it's stable and highly scalable -- I'd make sure you've considered it completely before ruling it out.
If you need to deliver something on a short deadline, then I'd consider working up a simple supervisor (or scheduler) / worker pattern, just to get the application off the ground. The supervisor would be responsible for chunking up the data and making it available over HTTP. Workers (your client app) would connect to a supervisor server, download a chunk of work, complete the chunk, and upload the results to the supervisor.
The main trick is to get the state machine thrashed out properly, so that you can accurately track what state each work chunk is in. I'd have the supervisor persist that state in a database in the background.
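One possible state machine for a work chunk (the states here are my assumption; adjust them to your protocol -- the supervisor would persist one of these per chunk):

    // Lifecycle of a single work chunk, as tracked by the supervisor.
    enum ChunkState
    {
        Pending,    // created, not yet handed to a worker
        Dispatched, // downloaded by a worker
        Completed,  // results uploaded and accepted
        Failed,     // worker reported an error
        TimedOut    // no result within the deadline; eligible for re-dispatch
    }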
Does your first release need to be internal, or is it for public consumption?
While that's working, I'd get started on getting up to speed with BOINC and planning a migration.

My recommendation
Work distribution:
Have a receiver of requests that places messages on a message queue, like RabbitMQ.
Have a host of workers listening on the same queue, taking work from it and acking it when done.
When done, send a message on another queue containing a URI to a known location, such as your network drive; the URI points at the parsed data.
The receiver listens for these "completed" messages and fetches the data from the URI.
Done.
RabbitMQ comes with great CLR APIs.
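A minimal worker sketch using the RabbitMQ.Client package (queue names and the host are assumptions; the point is the ack-after-processing pattern from the list above):

    using System;
    using System.Text;
    using RabbitMQ.Client;
    using RabbitMQ.Client.Events;

    class Worker
    {
        static void Main()
        {
            var factory = new ConnectionFactory { HostName = "localhost" };
            using (var connection = factory.CreateConnection())
            using (var channel = connection.CreateModel())
            {
                channel.QueueDeclare("work", durable: true, exclusive: false,
                                     autoDelete: false, arguments: null);
                channel.QueueDeclare("completed", durable: true, exclusive: false,
                                     autoDelete: false, arguments: null);

                var consumer = new EventingBasicConsumer(channel);
                consumer.Received += (sender, ea) =>
                {
                    string job = Encoding.UTF8.GetString(ea.Body.ToArray());
                    // ... do the work, write the output to the shared location ...

                    // Ack only after the work is done, so a crash re-queues the job.
                    channel.BasicAck(ea.DeliveryTag, multiple: false);

                    // Tell the receiver where the parsed data lives.
                    var done = Encoding.UTF8.GetBytes(@"\\server\share\result-1");
                    channel.BasicPublish(exchange: "", routingKey: "completed",
                                         basicProperties: null, body: done);
                };
                channel.BasicConsume("work", autoAck: false, consumer);

                Console.ReadLine(); // keep the worker alive
            }
        }
    }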
The same reasoning works well with Microsoft Azure and its AppFabric Queue. A plus is that it scales extremely well.
Hot Versioning
http://topshelf-project.com/
Topshelf gives you a folder where you can drop binaries, which are then run. It manages versioning of these as well as running them as Windows services.
Deployment
You can deploy the binaries with robocopy/xcopy and "net use Q: \\server\share pwd", then "net use Q: /delete".
Continuous Integration
TeamCity
After working with MSBuild extensively, I would recommend scripting it with psake and running the build with PowerShell. If you get advanced with PowerShell, you also have WinRM available to you from your build scripts, which is really smooth.
Use the git/subversion commit number as the x in a 0.0.0.x version number, and you will have automatic versioning that is "shared" across "Debug"/"Production" builds.
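For example, the build script could stamp that revision into a generated AssemblyInfo fragment (the number here is hypothetical):

    using System.Reflection;

    // Written by the build script; 1234 is the commit/revision number.
    [assembly: AssemblyVersion("0.0.0.1234")]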
The Azure way
Work distribution:
Same as above but with AppFabric Queue instead of RabbitMQ.
Hot Versioning
By swapping the "Staging" and "Production" instances around, you avoid downtime.
Deployment
You can either tap into the Azure Tools for Visual Studio's MSBuild tasks, as can be read about here, or you could use the PowerShell Azure snap-ins with a similar setup as above for continuous integration.
Continuous Integration
Same as above.

How about .NET's ClickOnce installer to manage auto-updated client binaries?
http://msdn.microsoft.com/en-us/library/t71a733d.aspx
I'm not sure of a "jobs framework" per se, but perhaps Microsoft's Sync Framework could support rolling your own job syncing with clients?
http://msdn.microsoft.com/en-us/sync/default

Related

Handling third-party API requests in End-to-End testing

I want to test my REST API with end-to-end tests. As I understand it, the difference from integration tests is that we don't do in-memory system configuration, but use a real test DB and real network requests.
But I can't understand how to handle third-party API requests (like the GitHub or Bitbucket APIs).
Is it normal practice to create a fake GitHub account with fake data that would be fetched by my tests?
And what to do about access tokens? Not all services are public, and even public services can fail with rate limits.
Is it normal practice to create a fake GitHub account with fake data that would be fetched by my tests?
Yes. The purpose of an E2E test (vs. an integration test) is to verify that the full system works with all the real system components in place, both the ones you control and the ones you don't. This can be hard to set up and a pain to maintain, but many of those pain points will expose real potential issues in your production service. How your service responds to that instability is itself a feature to be tested: does your system crash and burn, or does it gracefully present an error message and support good retry handling?
This also nets you a type of coverage that mocks cannot provide: if the third-party API you're using is naughty and introduces some sort of breaking change, your E2E tests will catch it. This is a decent reason to run your E2E suite continually, not just during deploys.
The next level of this sort of testing is chaos engineering where not only do you test your production systems, but you purposefully introduce faults (yes, into prod) in order to ensure that your service can really handle the pressure.
And what to do about access tokens? Not all services are public, and even public services can fail with rate limits.
Your staging environment should be configured with separate sandbox accounts for external services. I'm not sure what you mean by "not all services are public" but just strive to keep your staging environment (or test users on prod) as identical to a real prod user as possible. For services that don't support multiple access tokens, you can get creative and try to clearly delineate your test data within their system.
Rate limits can be annoying, but if you're getting close enough that your tests push you over the limit, then you should be pursuing a strategy to address that anyway (negotiating with the service, getting multiple accounts, ...).
Running your tests against 3rd party services can result in slow and flaky tests when the service is down or when network latency triggers certain testing timeouts. Not to mention you run the risk of triggering API rate limits depending on the 3rd party service you're hitting. Your tests should ideally be deterministic, not failing randomly, and not needing conditional logic to handle errors within a particular test case. If you expect to need to handle errors, then there should be a specific test case to cover those errors that runs in every build, not waiting for non-deterministic failures to come in from the 3rd party.
One argument people will make is that your tests should notify you if the 3rd party API breaks for one reason or another. Generally speaking, though, most major 3rd party APIs are extremely stable and are unlikely to make breaking changes. Even if it does happen, this is an awkward and confusing way to find out that the API is broken, and in all likelihood, your tests aren't going to be the first place you hear it from. More likely your customers and your production error tracker will notify you. If you want to track when these services change or go down, it makes sense to have a regular production check of some sort to verify it.
As for how to write tests around these situations, that's a little more tricky. There are tools such as VCR in Ruby which work well for stubbing out your language's internet connections and allowing you to stub out, record, and customize responses (there's a list of similar implementations in other languages further down in their readme). That doesn't work for when your browser connects to those resources in automated end-to-end tests, though. There are tools that proxy your browser's web connection such as Puffing Billy in Ruby, but it's a pretty involved process to set up, including managing security certificates. This seems pretty brittle and hard to debug when something isn't working quite right.
Your best bet for writing tests that are deterministic and maintainable may be to fake out the service in test mode. thoughtbot has a pretty decent video on this and here's a high-level article from CircleCI. Essentially, you swap in an adapter in test mode that stands in for your 3rd party service integration. Maybe what you can do on your local machine is make it possible to optionally use the real service or the adapter via an environment variable in order to verify that the tests run the same against both. You could also set up a daily build to run against the real thing so that it would verify that the tests still work alright without introducing a lot of flakiness to your more frequent builds. One issue I've run into, though, is that even if I set up a test account on that 3rd party service, the results will change over time as I add or modify information for the sake of testing new functionality, such as adding new repos, modifying issues, etc. It requires additional consideration for maintaining your test account as a set of fixtures for all of your tests.
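The adapter idea is language-agnostic; here's a minimal C# sketch of it (all names are illustrative, and the real adapter's HTTP call is omitted):

    using System;
    using System.Collections.Generic;

    // The interface the application already codes against.
    public interface IRepoService
    {
        IReadOnlyList<string> ListRepos(string user);
    }

    // Fake adapter: deterministic canned data for tests.
    public sealed class FakeRepoService : IRepoService
    {
        public IReadOnlyList<string> ListRepos(string user)
        {
            return new[] { "fixture-repo-1", "fixture-repo-2" };
        }
    }

    // Real adapter: would make the live HTTP call to the GitHub API.
    public sealed class GitHubRepoService : IRepoService
    {
        public IReadOnlyList<string> ListRepos(string user)
        {
            throw new NotImplementedException("real call to api.github.com");
        }
    }

    public static class RepoServiceFactory
    {
        // An environment variable picks the implementation, so the same suite
        // runs against the fake by default and the real service in a daily build.
        public static IRepoService Create()
        {
            return Environment.GetEnvironmentVariable("USE_REAL_GITHUB") == "1"
                ? (IRepoService)new GitHubRepoService()
                : new FakeRepoService();
        }
    }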
One additional tool I've come across that may be helpful is the likes of ngrok-tunnel (Ruby again). This is only relevant in cases where you need the 3rd party service to contact your app, since they can't send requests across the web to localhost:3000. If you've configured some sort of webhooks, services like this can make testing a lot more straightforward.

How do I correctly set up a publisher-subscriber architecture using MassTransit with MSMQ?

How do I correctly set up a publisher-subscriber architecture with multiple subscribers (which all receive a published message) using MassTransit and MSMQ?
Note that I do not want to use the MSMQ multicast feature as it is a bit flaky and relies on PGM (which has some restrictions of its own).
I have read this, this and this, but still cannot figure out how I should set up
the subscription service,
a publisher,
and a couple of subscribers.
In particular, the sbc.UseSubscriptionService("uri") used in many examples is now obsolete (I am using MassTransit 2.7). The obsolete message says "The extension method on UseMsmq should be used instead", but I cannot find such a method.
How are we supposed to set up the subscription service?
The Distributor sample on the MassTransit GitHub page gets closest to what I want, but it sets up consuming subscribers.
Please point me to, or provide, an example of how to set up a publisher-subscriber architecture where multiple subscribers are possible.
Thanks for your time.
To use the UseSubscriptionService extension method you need to import the MSMQ configuration namespace.
Imports MassTransit.Transports.Msmq.Configuration
You can now write this (VB.Net):
    sbc.UseMsmq(Sub(c)
                    c.UseSubscriptionService(ConfigurationManager.AppSettings("MassTransit_SubscriptionService"))
                End Sub)
That's for configuring the service, but in order to make everything work correctly (using MSMQ) you need to have the MassTransit Runtime Service running. Get the latest version from the GitHub MassTransit Runtime Services repository; you'll need that one instead of the binaries on the MassTransit website, which are outdated.
Once you have downloaded the source, you should run SetupSQLServer.sql first.
Next is adjusting the config file to point to your database and use the correct credentials.
You should now run this program (as a console during dev, but best installed as a Windows service in non-dev environments). Make sure it is running before starting your publishers/subscribers, since they depend on this 'service'.
I have just now achieved a fully functional setup and experienced my moment of bliss. I intend to do a full write-up of all my steps, but I hope this helps a bit already.
This article explains the setup of the Runtime Service in a little more detail.
I think "sets up consuming subscribers" is the give away as to the source of your difficulty - all subscribers are "consuming subscribers".
The best way to think about MassTransit is in terms of fan-out: MassTransit maintains routes to all consumers interested in a certain type of message. You set up one or more consumers at one or more endpoints, and MassTransit makes sure a copy of the message gets to each consumer.
The Distributor is actually a specialized case where this is intentionally not true, and not one you should be looking at unless you're interested in load balancing.
Here's the relevant docs link: http://docs.masstransit-project.com/en/master/overview/publishing.html#plain-msmq
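A rough C# sketch of that fan-out with MassTransit 2.x on MSMQ (queue URIs and the message type are illustrative; each subscriber gets its own queue, and the subscription service handles the routing):

    using System;
    using MassTransit;
    using MassTransit.Transports.Msmq.Configuration;

    public class JobCompleted
    {
        public string JobId { get; set; }
    }

    class Program
    {
        static void Main()
        {
            // One subscriber of potentially many; each runs on its own queue.
            var subscriber = ServiceBusFactory.New(sbc =>
            {
                sbc.UseMsmq(c => c.UseSubscriptionService("msmq://localhost/mt_subscriptions"));
                sbc.ReceiveFrom("msmq://localhost/subscriber_a");
                sbc.Subscribe(s => s.Handler<JobCompleted>(
                    msg => Console.WriteLine("Got job " + msg.JobId)));
            });

            // The publisher knows nothing about the subscribers; every
            // subscribed endpoint receives its own copy of the message.
            var publisher = ServiceBusFactory.New(sbc =>
            {
                sbc.UseMsmq(c => c.UseSubscriptionService("msmq://localhost/mt_subscriptions"));
                sbc.ReceiveFrom("msmq://localhost/publisher");
            });

            publisher.Publish(new JobCompleted { JobId = "42" });

            Console.ReadLine();
            subscriber.Dispose();
            publisher.Dispose();
        }
    }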

Biztalk vs MSMQ

We need to send XML messages between a point-of-sale (POS) system and a Java web service (outside of our network). The messages contain very sensitive data. The messaging has to be secure, transactional, and highly available (24/7) with failover. The solution requires the development of a broker that does the following:
Poll messages from the POS system (3 types of messages)
do some transformation on the messages
forward part of the message to the Java web service
store part of the message in a database
notify the POS system of the result
Based on these somewhat simplified requirements, do you believe that BizTalk would be overkill? Would MSMQ/WCF do the trick here?
Thank you for your help
Amine
IMO, if you have the ability to receive and deliver messages asynchronously, then MSMQ (or another message-oriented middleware) would be an obvious choice for reliable, transactional transport, irrespective of the rest of the solution. MSMQ's journalling can also be used for audit and debugging purposes (but you will need a strategy for archiving the journal).
For the polling, routing, mapping/broker and auditing requirements you then have the choice of BizTalk, other ESB and EAI products, or a DIY solution.
As you've suggested, it is difficult to justify the cost and learning curve of BizTalk for a single message-exchange scenario such as this -- you could probably knock up a .NET Windows service (e.g. using WCF, Workflow Foundation, transaction scopes, some XSLT for mapping and a data access layer) in a few days.
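For the DIY route, a hedged sketch of the broker's receive side (the queue path and processing steps are assumptions; the queue must have been created as transactional):

    using System;
    using System.Messaging;
    using System.Transactions;

    class PosBroker
    {
        static void Main()
        {
            using (var queue = new MessageQueue(@".\private$\pos_messages"))
            {
                queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

                // A failure at any step rolls the receive back, so the
                // message returns to the queue instead of being lost.
                using (var scope = new TransactionScope())
                {
                    Message msg = queue.Receive(MessageQueueTransactionType.Automatic);
                    string body = (string)msg.Body;

                    // 1. transform the message
                    // 2. forward part of it to the Java web service
                    // 3. store part of it in the database
                    // 4. notify the POS system of the result

                    // Only now is the message permanently removed from the queue.
                    scope.Complete();
                }
            }
        }
    }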
However, if this isn't a one-off integration scenario and the need for additional integration arises (more applications to integrate, more services, additional listeners, different communications technologies etc), then it would be advisable for your company to take a long term view on EAI and ESB technologies. IMO the main challenge in integration isn't the initial development work, but is instead the ongoing operational management requirements - e.g. security, auditing, failover, monitoring, handling of bad messages and other exceptions - where products such as BizTalk are really worth the outlay.
Do you want to, and have the bandwidth to, develop, monitor, and maintain your own custom solution? If you don't mind doing that, then going the route of a custom .NET-based MSMQ/WCF solution might work well.
BizTalk will also cover all of the requirements you have listed. There is a learning curve, but it is certainly not insurmountable. The initial ramp-up may be lengthier than for a custom-code solution, but there are considerable benefits, particularly the benefit of having all your requirements reliably met:
secure
transactional
reliable (messages aren't lost)
highly available (24/7)
failover
adapter architecture (includes polling adapters)
transformations
working with external web services
returning correlated responses back to the source system (i.e., orchestrating the end-to-end process)
use a broker (you specifically listed this, and BizTalk is a broker; custom MSMQ and WCF means using no broker)
If BizTalk needs to poll the POS system, then you do not need to worry about using MSMQ. BizTalk can handle transferring messages reliably (they're persisted to SQL Server, while MSMQ persists messages to disk).
Note too that the only way to make MSMQ highly available is to cluster it. So either way you'll need to cluster something.
A BizTalk solution will be easier to maintain over time, particularly if you just want to update your transformations. With versioning you can do so in a way that doesn't require downtime. It'll be tough to update a custom solution without downtime.
Some people have had difficulty in the past with monitoring BizTalk for failed messages, but I have found it to be easier, especially with a tool like SCOM or BizTalk 360, than trying to monitor message queues, which often requires even more custom work to monitor. Just make sure to include monitoring in your cost estimates for the life of your solution.
If you do need auditing, then BizTalk also has you covered. MSMQ Journaling will keep a copy of each message for you, but without significant transaction details and with no out-of-the-box way to search through or archive the data.
Building your own .NET client code to work with a Java web service will likely take a good bit of work regardless of which way you go. With BizTalk that means running a wizard against the endpoint or against the WSDL. With WCF it means doing everything by hand or with the assistance of the svcutil tool.
You should go with MSMQ transport either way.
If you use MSMQ from .NET you should know its limitation: a 4 MB maximum message size.
BizTalk, on the other hand, has an MSMQ adapter which overcomes this limitation (if a second BizTalk server listens on the other side of the channel). On top of that, BizTalk gives you features like easily configurable message tracking and visual transformation maps. It can be set up in a cluster too (Enterprise edition only).
But the question is whether you can afford (or want to pay for) BizTalk licenses and the hardware for its servers (it's slower than a custom .NET solution).

Where to host a queue in a Rackspace cloud deployment

I am working on an application that consists of an ASP.NET MVC front-end which calls a bunch of web services, which in turn call SQL Server. Actions on the front-end can lead to a very large number of jobs needing execution, which I want to queue somewhere.
Due to the expected load profile of the application, it makes sense to use a scalable infrastructure like the Rackspace cloud. Now I am wondering where it would be best to queue the jobs. Queueing them on the front-end servers means the number of front-end servers can only be scaled back down once the queues are processed, which is a waste of resources: once the peak load on the front-end is over, we want to scale it down and scale up the machines that process the queue items.
If we queue them on the database server, we add load to a single machine which in the current setup is already the most likely bottleneck. How would you design this?
You should read up on event sourcing and CQRS -- in particular Greg Young's 6-hour presentation (http://www.viddler.com/v/dc528842). Its aim is to alleviate the burden of these sorts of issues in a tried and tested way.
hth

Does anyone have any architectural information for MSMQ?

I'm trying to decide if MSMQ is the right tool for communication between our application and a third-party web service we are currently communicating with directly. We're looking to decouple these so that if the service goes down, life can still go on as normal.
I can't find anything outside the usual MS fluff about it being the greatest thing and solving all your problems. It would be really useful to find information that sits somewhere between marketing fluff and the API: an architecture diagram of the components, how they integrate with each other, and, more importantly, how I integrate with them.
It's probably that I'm just looking for the information in the wrong places, so if someone could point me in the right direction, I'd appreciate it.
A typical MSMQ architecture would be composed of three parts:
Message Queue - This would be on one of your servers. You would have to install the MSMQ bits and create your queue.
Client - Your client would insert messages into the queue. I'm assuming you're using .NET. If so, most of what you want is going to be located in the System.Messaging namespace.
Windows Service - This would also run on a server, probably the same server as your queue. Its job would be to watch the queue, process messages as they come in, handle making sure the external service is available, and probably do some logging.
Here's an article that should go into a little more detail and give you some code samples.
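As a rough sketch of parts 2 and 3, here is a client sending to the queue and the watcher loop a Windows service would run (the queue path is an assumption):

    using System;
    using System.Messaging;

    class MsmqDemo
    {
        const string Path = @".\private$\orders";

        static void Main()
        {
            if (!MessageQueue.Exists(Path))
                MessageQueue.Create(Path);

            using (var queue = new MessageQueue(Path))
            {
                queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

                // Service side: process each message as it arrives,
                // then wait for the next one.
                queue.ReceiveCompleted += (s, e) =>
                {
                    Console.WriteLine("Processing: " + e.Message.Body);
                    // ... call the external service, log the outcome, etc. ...
                    queue.BeginReceive();
                };
                queue.BeginReceive();

                // Client side: drop a message into the queue.
                queue.Send("<order id=\"42\" />", "New order");

                Console.ReadLine();
            }
        }
    }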
MSMQ is an implementation of a message queue, as are WebSphere MQ and a bunch of other systems. When looking into the concepts and high-level architecture, I would suggest reading up on message queues and how they are applied in disconnected scenarios. I can highly recommend Patterns of Enterprise Application Architecture. For specific examples on MSMQ, check out Pro MSMQ: Microsoft Message Queue Programming; it doesn't contain a lot of special information, but it organizes it a lot better than most resources available on the internet. The Hello World with MSMQ article would give you a nice overview of what it entails, and it's easily executed on a development system.
If you are calling a remote web service from your application, it makes sense to use a queue to decouple your application's processing from the remote system. By abstracting the communication through messaging, and having a gateway service responsible for communication with the web service, you isolate your application from the latency of the web service and build fault tolerance into the design by reducing request/response usage inside your application (since messaging is asynchronous by default, you deal with that up front).
There are frameworks for .NET that can make this much easier (such as MassTransit or NServiceBus).
You can also check out SOA Patterns (by Arnon Rotem-Gal-Oz, Manning Press, in MEAP) and Enterprise Integration Patterns (Hohpe, Woolf); the latter is an essential read for anyone building a message-based system.