BizTalk - how do I set up MSMQ load balancing and high availability?

From what I understand, in order to achieve MSMQ load balancing, one must use a technology such as NLB.
And in order to achieve MSMQ high availability, one must cluster the related BizTalk host (and hence the underlying servers have to be in a cluster themselves).
Yet, according to Microsoft documentation, NLB and Failover Clustering are not compatible. See this link for reference: http://support.microsoft.com/kb/235305
Can anyone please explain to me how MSMQ load balancing and high availability can be achieved?
thank you in advance,
M

I've edited my original answer because, on reflection, I think I was talking nonsense.
I don't believe that it is possible to achieve both load balancing and high availability in a BizTalk transactional scenario. Have a look at the section "Migration considerations for moving from MSMQ/T to MSMQ adapter in BizTalk 2006" on the following site http://blogs.msdn.com/eldarm/
To summarise that post, there are a couple of scenarios:
High Availability (Non-transactional)
You simply have MSMQ on more than one BizTalk server behind NLB.
High Availability (Transactional)
For this you need a clustered MSMQ host, which means that you can't do any sort of load balancing on a single queue.
One possible halfway solution is to create two MSMQ adapters, on different clustered hosts, each handling different queues. Doesn't sound too nice to me though.
A key point is understanding the reasons why you would want transactional, clustered behaviour - you need this for ordered delivery and to ensure no duplicates.
In general I wouldn't go to the trouble of load balancing MSMQ - BizTalk itself is load balanced once messages have reached the MessageBox database. While it is true that you will see asymmetric load due to the queue processing happening on one machine, in the overall context of your BizTalk environment this should not be significant.
Again, it is worth remembering that you are clustering MSMQ for reasons beyond simple high availability:
MSMQ adapter receive handler - MSMQ does not support remote transactional reads; only local transactional reads are supported. The MSMQ adapter receive handler must run in a host instance that is local to the clustered MSMQ service in order to complete local transactional reads with the MSMQ adapter.
That quote is from MSDN.
I hope this edited answer helps - I don't think it was what you were after, maybe I'm wrong and you'll find a workable solution for NLB and transactional MSMQ, but the more I think about it the more it seems that the two scenarios are not compatible.
A final thought is that you could try posting a similar question on Server Fault - you get a few BizTalk devs on Stack Overflow, including at least two MVPs, but at least where I work, this is the sort of question I'd be passing on to my networking team.

Related

Distributed systems with large number of different types of jobs

I want to create a distributed system that can support around 10,000 different types of jobs. One single machine can host only 500 such jobs, as each job needs some data to be pre-loaded into memory, which can't be kept in a cache. Each job must have redundancy for availability.
I have explored open-source libraries like ZooKeeper and Hadoop, but none solves my problem.
The easiest solution that I can think of is to maintain a map of each job type to its hosting machine. But how can I support dynamic allocation of job types across my fleet? And how do I handle machine failures, to make sure that each job type is available on at least one machine at any point in time?
Based on the answers you gave in the comments, I propose going with an MQ-based (message queue) architecture. What I propose in this answer is to:
Get the input from users and push it into a distributed message queue. This means setting up a message queue (such as ActiveMQ or RabbitMQ) on several servers. The MQ technology helps you replicate the input requests for fault tolerance. It also gives you a fully asynchronous end-to-end system.
After preparing this MQ layer, you can set up your computing layer. Some computing servers (~20 servers in your case) read requests from the message queue and start a job for each request. Because the MQ is distributed, you get a good level of load balancing across the computing servers. In addition, each server can run as many jobs as you want (~500 in your case) based on the requests it reads from the MQ.
Regarding failures, a computing server should only remove (acknowledge) a message from the MQ once the job is completed. If a server crashes mid-job, the message is still in the MQ and another server can pick it up. If the job saves state somewhere or updates something, you will then need to handle duplicate runs (make the work idempotent), as sketched below.
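As an illustration, here is a minimal sketch of that acknowledge-on-completion pattern using the RabbitMQ .NET client; the queue name "jobs" and the RunJob helper are hypothetical placeholders, not something from your question:

```csharp
// Minimal sketch: a worker that acks a message only after the job completes.
// Assumptions: RabbitMQ .NET client 6.x, a broker on localhost, queue "jobs".
using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class Worker
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();

        // Durable queue so messages survive a broker restart.
        channel.QueueDeclare(queue: "jobs", durable: true,
                             exclusive: false, autoDelete: false);

        // Hand each worker at most one unacknowledged message at a time.
        channel.BasicQos(prefetchSize: 0, prefetchCount: 1, global: false);

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (sender, ea) =>
        {
            var job = Encoding.UTF8.GetString(ea.Body.ToArray());
            RunJob(job);                                        // do the work first...
            channel.BasicAck(ea.DeliveryTag, multiple: false);  // ...ack only on success
        };

        // autoAck: false means a crash before BasicAck requeues the message.
        channel.BasicConsume(queue: "jobs", autoAck: false, consumer: consumer);
        Console.ReadLine();
    }

    static void RunJob(string job) { /* hypothetical job execution */ }
}
```

The key detail is autoAck: false combined with BasicQos: if a worker dies mid-job, the broker redelivers the message to another worker, which is exactly the failure behaviour described above.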
The good point about this approach is that it is very scalable. If in the future you have more jobs to handle, you can add a computing server and connect it to the MQ, and process more requests without any change to the system. In addition, MQ features like priority-based queuing help you prioritize requests and process them based on job type.
P.S. Your question does not provide many details about the type and parameters of the system. This is a draft solution that I can propose. If you provide more details, maybe the community can help you more.

Microservice, amqp and service registry / discovery

I'm studying microservices architecture and I'm wondering about something.
I'm quite okay with using (back-end) service discovery to make requests to REST-based microservices. I need to know where the service is (or at least the front of the server cluster) to make requests, so it makes sense to be able to discover an ip:port in that case.
But I was wondering what the aim of using a service registry / discovery could be when dealing with AMQP only (with no HTTP calls possible)?
I mean, using AMQP is just like "I need that, and I expect somebody to answer me"; I don't have to know which server sent me back the response.
So what is the aim of using a service registry / discovery with AMQP-based microservices?
Thanks for your help
AMQP (any MOM, actually) provides a way for processes to communicate without having to worry about actual IP addresses, communication security, routing, and other concerns. That does not necessarily mean that any process can trust, or even has any information about, the processes it communicates with.
Message queues solve half of the problem: how to reach the remote service. But they do not solve the other half: which service is the right one for me. In other words, which service:
has the resources I need
can be trusted (is hosted on a reliable server, has a satisfactory service implementation, is located in a country where the local laws are compatible with your requirements, etc)
charges what you want to pay (although people rarely discuss cost when it comes to microservices)
will be there during the whole time window needed to process your request -- keep in mind that servers are becoming more and more volatile. Some servers are actually containers that can last for only a couple of minutes.
Those two problems are almost completely independent. To solve the second kind of problem, Grid computing has resource brokers. There is also resource allocation to make sure that the last item above is correctly managed.
There are some alternative strategies such as multicasting the intention to use a service and waiting for replies with offers. You may have reverse auction in such a case, for instance.
In short, the rule of thumb is that if you do not have a priori knowledge about which service you are going to use (hardcoded or in some configuration file), your agent will have to negotiate, which includes dynamic service discovery.

design high volume MSMQ

We have many communication servers sending data packets. We would like to store these data packets coming from these server programs in MSMQ until an updater processes them. Data loss has been a concern: we would like to not lose any data packet coming from these server programs, and we want an efficient and performant solution.
What will be the best design approach?
Well, there are two basic things you need to do to get started. First, you'll want to modify the default installation to move the storage location to a drive that is mirrored and/or is not the same as the one that the operating system boots from on that server. Also you'll want to ensure there is enough space there to hold messages as they are queued, depending on the volume you're contemplating. This article covers that.
Second, you'll want to use transactions and journaling to ensure reliability. This is both a programming and infrastructure issue, so you can start by looking at this article, and then following up with a general guide on how to program against MSMQ correctly. This for example is a good starting point if you've never used MSMQ, although it's fairly basic. If you're going to use MSMQ as a binding/transport for WCF then you have the plumbing part pretty much covered; it's just a matter of configuring your services to handle the volume and traffic you think you're going to see.
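To make the transactional part concrete, here is a hedged sketch of a transactional send and receive with System.Messaging; the queue path .\private$\packets is invented for illustration:

```csharp
// Sketch: transactional MSMQ send/receive so messages are never lost mid-flight.
// Assumption: the queue path below is an example, not from the question.
using System;
using System.Messaging;

class TransactionalQueueDemo
{
    static void Main()
    {
        const string path = @".\private$\packets";

        // Create the queue as transactional; its messages are persisted to disk.
        if (!MessageQueue.Exists(path))
            MessageQueue.Create(path, transactional: true);

        using var queue = new MessageQueue(path);
        queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

        // Send inside an MSMQ internal transaction.
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            queue.Send("data packet 1", tx);
            tx.Commit(); // the message only becomes visible once committed
        }

        // Receive inside a transaction; Abort() would put the message back.
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            var msg = queue.Receive(TimeSpan.FromSeconds(5), tx);
            Console.WriteLine(msg.Body);
            tx.Commit();
        }
    }
}
```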
We have many communication servers sending data packets.
When storing 'data packets', I would recommend writing [Serializable] .NET objects to WCF, mainly because WCF can read/write them transparently to MSMQ. This will be easier to work with; but if your data packets are, say, TCP/IP or binary packets, you will need to turn on 'Ordering' to ensure they go into the queue in the exact order they were placed.
MSMQ also has sessions, so if you want to group items together this is possible. WCF does not make this guarantee, so you will need to write custom code for it, but it is only a case of assigning a unique ID to each message in a particular session, as sketched below.
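As a rough illustration of that custom grouping code, you could stamp each message's Label with a session ID plus a sequence number; the "sessionId:sequence" format here is my own convention, not an MSMQ feature:

```csharp
// Sketch: tagging System.Messaging messages so a reader can group/order them.
// The envelope format "sessionId:sequence" is an assumption for illustration.
using System;
using System.Messaging;

static class SessionTagging
{
    public static void SendBatch(MessageQueue queue, string[] packets)
    {
        string sessionId = Guid.NewGuid().ToString();
        int seq = 0;
        foreach (var packet in packets)
        {
            var msg = new Message(packet)
            {
                // Label carries "sessionId:sequence" so the reader can reorder.
                Label = $"{sessionId}:{seq++}"
            };
            // Single = each send is its own transaction (transactional queue).
            queue.Send(msg, MessageQueueTransactionType.Single);
        }
    }
}
```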
Data loss has been a concern and we would like to not lose any data packet coming from these server programs
MSMQ can persist the data to disk, so if a server goes down, its queue is preserved. MSMQ can hold the queue in memory, which is more efficient but crashes/restarts will not retain the queue information.
and want an efficient (good performance) solution
MSMQ is fairly performant. The persistence to disk has a small overhead, but only due to the disk write. If performance includes multi-threading, MSMQ does not offer this feature, as the queue is sequential and must be processed in order. But this is typical of queue technologies.
MSMQ also has a 4 MB maximum message size, so keep in mind what you want to send across the network.
The only other thing is that MSMQ is not massively scalable. Its primary goal is guaranteed delivery. If you post millions of packets, they will get to their destination, but MSMQ does have a finite ability to push the messages to other machines. It operates a ThreadPool-like system, so it will not scale if this is also a requirement.
I have also added info to the #msmq-wcf wiki with a basic example of writing data.

BizTalk vs MSMQ

We need to send XML messages between a point-of-sale system and a Java web service (outside of our network). The messages contain very sensitive data. The messaging has to be secure, transactional, and highly available (24/7) with failover. The solution requires the development of a broker that does the following:
Poll messages from the POS system (3 types of messages)
Do some transformation on the messages
Forward part of the message to the Java web service
Store part of the message in a database
Notify the POS system of the result
Based on these somewhat simplified requirements, do you believe that BizTalk would be overkill? Would MSMQ/WCF do the trick here?
Thank you for your help
Amine
IMO if you have the ability to receive and deliver messages asynchronously, then MSMQ (or other Message Oriented Middleware) would be an obvious choice for reliable, transactional transport, irrespective of the rest of the solution. MSMQ's journalling can also be used for audit and debugging purposes (but you will need a strategy for archiving the journal).
For the Polling, Routing, Mapping / Broker and Auditing requirements you then have the choice of BizTalk, other ESB and EAI products, or a DIY solution.
As you've suggested, it is difficult to justify the cost and learning curve of BizTalk on a single message exchange scenario such as this - you could probably knock up a .NET Windows Service (e.g. using WCF, Workflow Foundation, Transaction Scopes, some XSLT for mapping and a data access layer) in a few days.
However, if this isn't a one-off integration scenario and the need for additional integration arises (more applications to integrate, more services, additional listeners, different communications technologies etc), then it would be advisable for your company to take a long term view on EAI and ESB technologies. IMO the main challenge in integration isn't the initial development work, but is instead the ongoing operational management requirements - e.g. security, auditing, failover, monitoring, handling of bad messages and other exceptions - where products such as BizTalk are really worth the outlay.
Do you want to, and have the bandwidth to, develop, monitor, and maintain your own custom solution? If you don't mind doing that, then going the route of a custom .NET-based MSMQ/WCF solution might work well.
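For a flavour of what that custom route involves, here is a hedged sketch of a one-way WCF service reading from a durable, transactional MSMQ queue via NetMsmqBinding; the contract, service, and queue name are all invented for illustration:

```csharp
// Sketch: self-hosted WCF service over MSMQ (NetMsmqBinding).
// Assumptions: the "packets" queue exists as a transactional private queue,
// and IPacketService/PacketService are hypothetical names.
using System;
using System.ServiceModel;

[ServiceContract]
public interface IPacketService
{
    // Queued (MSMQ) operations must be one-way.
    [OperationContract(IsOneWay = true)]
    void SubmitPacket(string xml);
}

public class PacketService : IPacketService
{
    public void SubmitPacket(string xml) => Console.WriteLine("Got: " + xml);
}

class Host
{
    static void Main()
    {
        var binding = new NetMsmqBinding(NetMsmqSecurityMode.None)
        {
            Durable = true,    // persist messages to disk
            ExactlyOnce = true // requires a transactional queue
        };

        var host = new ServiceHost(typeof(PacketService));
        host.AddServiceEndpoint(typeof(IPacketService), binding,
            "net.msmq://localhost/private/packets");
        host.Open();
        Console.WriteLine("Listening...");
        Console.ReadLine();
        host.Close();
    }
}
```

Security is switched off here purely to keep the sketch short; for the sensitive data described in the question you would of course configure transport security and certificates.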
BizTalk will also cover all of the requirements you have listed. There is a learning curve but it is certainly not insurmountable. The initial ramp-up may be lengthier than would a custom-code solution, but there are considerable benefits, particularly the benefit of having all your requirements reliably met:
secure
transactional
reliable (messages aren't lost)
highly available (24/7)
failover
adapter architecture (includes polling adapters)
transformations
working with external web services
returning correlated responses back to the source system (i.e., orchestrating the end-to-end process)
use a broker (you specifically listed this, and BizTalk is a broker; custom MSMQ and WCF means using no broker)
If BizTalk needs to poll the POS system, then you do not need to worry about using MSMQ. BizTalk can handle transferring messages reliably (they're persisted to SQL Server, while MSMQ persists messages to disk).
Note too that the only way to make MSMQ highly available is to cluster it. So either way you'll need to cluster something.
A BizTalk solution will be easier to maintain over time, particularly if you just want to update your transformations. With versioning you can do so in a way that doesn't require downtime. It'll be tough to update a custom solution without downtime.
Some people have had difficulty in the past with monitoring BizTalk for failed messages, but I have found it to be easier, especially with a tool like SCOM or BizTalk 360, than trying to monitor message queues, which often requires even more custom work to monitor. Just make sure to include monitoring in your cost estimates for the life of your solution.
If you do need auditing, then BizTalk also has you covered. MSMQ Journaling will keep a copy of each message for you, but without significant transaction details and with no out-of-the-box way to search through or archive the data.
Building your own .NET client code to work with a Java web service will likely take a good bit of work regardless of which way you go. With BizTalk that means running a wizard against the endpoint or against the WSDL. With WCF it means doing everything by hand or with the assistance of the svcutil tool.
You should go with MSMQ as the transport either way.
If you use MSMQ from .NET you should know its limitation: a 4 MB maximum message size.
BizTalk, on the other hand, has an MSMQ adapter which overcomes this limitation (if a second BizTalk server listens on the other side of the channel). On top of that, BizTalk gives you features like easily configurable message tracking and visual transformation maps. It can be set up in a cluster too (Enterprise edition only).
But the question is whether you can (or want to) afford BizTalk licenses and the hardware for its servers (it's slower than a custom .NET solution).

Heartbeat Protocols/Algorithms or best practices

Recently I've added some load-balancing capabilities to a piece of software that I wrote. It is a networked application that does some data crunching based on input coming from a SQL database. Since the crunching can be pretty intensive I've added the capability to have multiple instances of this application running on different servers to split the load but as it is now the load balancing is a manual act. A user must specify which instances take which portion of the input domain.
I would like to take that to the next level and program the instances to automatically negotiate the dividing up of the input data and to recognize if one of them "disappears" (has crashed or has been powered down) so that the remaining instances can take on the failed instance's workload.
In order to implement this I'm considering using a simple heartbeat protocol between the instances to determine who's online and who isn't and while this is not terribly complicated I'd like to know if there are any established heartbeat network protocols (based on UDP, TCP or both).
Obviously this happens a lot in the networking world with clustering, fail-over and high-availability technologies so I guess in the end I'd like to know if maybe there are any established protocols or algorithms that I should be aware of or implement.
EDIT
It seems, based on the answers, that either there are no well-established heartbeat protocols or that nobody knows about them (which would imply that they aren't so well established after all), in which case I'm just going to roll my own.
While none of the answers offered what I was looking for specifically, I'm going to vote for Matt Davis's answer since it was the closest, and he pointed out the good idea of using multicast.
Thank you all for your time~
Distributed Interactive Simulation (DIS), which is defined under IEEE Standard 1278, uses a default heartbeat of 5 seconds via UDP broadcast. A DIS heartbeat is essentially an Entity State PDU, which fully defines the state, including the position, of the given entity. Due to its application within the simulation community, DIS also uses a concept referred to as dead reckoning to provide higher-frequency heartbeats when the actual position, for example, is outside a given threshold of its predicted position.
In your case, a DIS Entity State PDU would be overkill. I only mention it to make note of the fact that heartbeats can vary in frequency depending on the circumstances. I don't know that you'd need something like this for the application you described, but you never know.
For heartbeats, use UDP, not TCP. A heartbeat is, by nature, a connectionless contrivance, so it goes that UDP (connectionless) is more relevant here than TCP (connection-oriented).
The thing to keep in mind about UDP broadcasts is that a broadcast message is confined to the broadcast domain. In short, if you have computers that are separated by a layer 3 device, e.g., a router, then broadcasts are not going to work because the router will not transmit broadcast messages from one broadcast domain to another. In this case, I would recommend using multicast, since it will span the broadcast domains, provided the time-to-live (TTL) value is set high enough. It's also a more automated approach than directed unicast, which would require the sender to know the IP address of the receiver in order to send the message.
Broadcast a heartbeat every t using UDP; if you haven't heard from a machine in more than k*t, then it's assumed down. Be careful that the aggregate bandwidth used isn't a drain on resources. You can use IP broadcast addresses, or keep a list of specific IPs you're doing work for.
Make sure the heartbeat includes a "reboot count" as well as "machine ID" so that you know previous server state isn't around.
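As a rough sketch (not an established protocol), the scheme above could look like the following, with t = 2 seconds, k = 3, and an arbitrary port; the "machineId|rebootCount" payload format is my own assumption:

```csharp
// Sketch: UDP broadcast heartbeat; a peer is presumed down after k*t silence.
// Port 9876 and the payload layout are assumptions, not a standard.
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Heartbeat
{
    const int Port = 9876;
    static readonly TimeSpan Interval = TimeSpan.FromSeconds(2); // t
    static readonly TimeSpan Timeout = TimeSpan.FromSeconds(6);  // k*t, k = 3

    static readonly ConcurrentDictionary<string, DateTime> lastSeen =
        new ConcurrentDictionary<string, DateTime>();

    static void Main(string[] args)
    {
        string machineId = Environment.MachineName;
        int rebootCount = args.Length > 0 ? int.Parse(args[0]) : 0; // persisted elsewhere

        new Thread(Listen) { IsBackground = true }.Start();

        using var sender = new UdpClient { EnableBroadcast = true };
        var broadcast = new IPEndPoint(IPAddress.Broadcast, Port);

        while (true)
        {
            byte[] payload = Encoding.UTF8.GetBytes($"{machineId}|{rebootCount}");
            sender.Send(payload, payload.Length, broadcast);

            // Reap peers that have missed k heartbeats in a row.
            foreach (var peer in lastSeen)
                if (DateTime.UtcNow - peer.Value > Timeout)
                    Console.WriteLine($"{peer.Key} is presumed down");

            Thread.Sleep(Interval);
        }
    }

    static void Listen()
    {
        using var listener = new UdpClient(Port);
        var any = new IPEndPoint(IPAddress.Any, 0);
        while (true)
        {
            string msg = Encoding.UTF8.GetString(listener.Receive(ref any));
            string machine = msg.Split('|')[0];
            lastSeen[machine] = DateTime.UtcNow; // a changed reboot count could reset state
        }
    }
}
```

Swapping broadcast for multicast, as suggested in another answer, mostly means having the listener call JoinMulticastGroup on a chosen group address and sending to that address instead.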
I'd recommend using MapReduce if it fits. It would save a lot of work.
I'm not sure this will answer the question, but you might be interested in the way WebLogic Server clustering works under the hood. From the book Mastering BEA WebLogic Server:
[...] WebLogic Server clustering provides a loose coupling of the servers in the cluster. Each server in the cluster is independent and does not rely on any other server for any fundamental operations. Even if contact with every other server is lost, each server will continue to run and be able to process the requests it receives. Each server in the cluster maintains its own list of other servers in the cluster through periodic heartbeat messages. Every 10 seconds, each server sends a heartbeat message to the other servers in the cluster to let them know it is still alive. Heartbeat messages are sent using IP multicast technology built into the JVM, making this mechanism efficient and scalable as the number of servers in the cluster gets large. Each server receives these heartbeat messages from other servers and uses them to maintain its current cluster membership list. If a server misses receiving three heartbeat messages in a row from any other server, it takes that server out of its membership list until it receives another heartbeat message from that server. This heartbeat technology allows servers to be dynamically added and dropped from the cluster with no impact on the existing servers’ configurations.
Cisco content switches are a hardware solution for this problem. They implement a virtual IP address as a front end to multiple real servers, whose real IP addresses are known to the switch. The switch periodically sends HTTP HEAD requests to the web servers, to verify they are still running (which the switch software calls a "keepalive", although this doesn't keep the server itself alive). The Cisco switch accepts traffic on the virtual IP and forwards it to the actual web servers, using configurable load balancing such as round-robin, or user-defined load balancing.
These switches retail in the $3-10K range, although my business partner picked one up on eBay for about $300 a year ago. If you can afford one, they do represent a proven hardware solution to the question of how to have a service spread transparently across multiple servers. Red Hat includes built-in virtual server support, so you could implement your own content switch using a cheap Red Hat box. Google for "virtual ip address" and "cisco content router" for more information.
In addition to trying hardware load-balancers, you can also try a free-open-source load-balancing software application such as HAProxy, available for Linux and the BSDs.