I'm using MSMQ to transfer a byte array.
The formatter is a BinaryMessageFormatter.
The destination queue is a private one, and I'm using direct TCP communication.
The destination machine is in a different LAN; I access it by its external IP address.
There's a firewall which routes incoming TCP traffic to the actual destination machine (port forwarding).
So the system's architecture looks like this:
[source machine] --> [destination firewall] --> [destination machine]
I've been using the system for a few months, and everything worked fine.
Recently we experienced a firewall failure.
Apparently, this caused data corruption:
While the queue is transactional, and the message, according to MSMQ, was successfully delivered to the destination machine, the message's content was corrupted.
That is, the code for reading the message from the queue raised an exception:
```csharp
try
{
    var message = queue.Receive(readTimeout, transaction);
    if (message != null)
    {
        data = (byte[]) message.Body; // this line raises an exception
        return true;
    }
}
catch (Exception e)
{
    Logger.Error("error reading queue", e);
}
```
I had to delete a few messages from the top of the queue, and then the system got back to normal.
My question:
Assuming that it was only the firewall's failure that caused this, and knowing it's a transactional queue, how come the message was considered to be delivered?
Doesn't MSMQ perform some kind of checksum to ensure data integrity on transactional queues?
It currently looks like MSMQ entirely "trusts" the TCP layer when it comes to data integrity and completeness.
This is just a guess, but I think the Distributed Transaction Coordinator (DTC) might be having trouble here. As the name implies, the DTC is responsible for handling transactions that involve multiple systems in a network.
What I can imagine in your scenario is that the transactions cannot be correctly managed across your different LANs.
My suggestion is to check the configuration of the DTC instances involved in both networks for options that enable communication between them, and/or to research whether the firewall needs to be set up to allow DTC communication.
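Independently of the DTC question, the thread does not establish whether MSMQ verifies the message body itself. If you want the receiver to at least detect a corrupted body explicitly, one option is an application-level checksum. The sketch below is in Java rather than the question's C#, and the class name and framing layout (an 8-byte CRC32 followed by the payload) are assumptions for illustration. Note that with BinaryMessageFormatter, corruption of the serialized envelope will still surface as an exception when reading message.Body; the checksum only catches corruption that deserializes cleanly.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper: frames a payload as [8-byte CRC32][payload bytes].
final class ChecksummedPayload {
    private ChecksummedPayload() {}

    private static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Sender side: prepend the checksum before putting the bytes in the message body.
    static byte[] wrap(byte[] payload) {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(crc32(payload))
                .put(payload)
                .array();
    }

    // Receiver side: verify the checksum and return the original payload.
    static byte[] unwrap(byte[] framed) {
        if (framed.length < Long.BYTES) {
            throw new IllegalArgumentException("message too short to contain a checksum");
        }
        ByteBuffer buf = ByteBuffer.wrap(framed);
        long expected = buf.getLong();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        if (crc32(payload) != expected) {
            throw new IllegalStateException("checksum mismatch: message body is corrupted");
        }
        return payload;
    }
}
```

CRC32 catches accidental corruption such as flipped or truncated bytes; it does not protect against deliberate tampering, which would need a cryptographic MAC.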
I am trying to implement an event-driven architecture to handle distributed transactions. Each service has its own database and uses Kafka to send messages to inform other microservices about the operations.
An example:
```
Order service -------> | Kafka | -------> Payment Service
      |                                          |
Orders MariaDB DB                      Payment MariaDB Database
```
The Order service receives an order request. It has to store the new order in its DB and publish a message so that the Payment service knows it has to charge for the item:
```java
private OrderBusiness orderBusiness;

@PostMapping
public Order createOrder(@RequestBody Order order) {
    logger.debug("createOrder()");
    // a. Save the order in the DB
    orderBusiness.createOrder(order);
    // b. Publish to the topic so that the Payment service charges for the item
    try {
        orderSource.output().send(MessageBuilder.withPayload(order).build());
    } catch (Exception e) {
        logger.error("Failed to publish order", e);
    }
    return order;
}
```
These are my doubts:
Steps a.- (save in Order DB) and b.- (publish the message) should be performed in a transaction, atomically. How can I achieve that?
This is related to the previous one: I send the message with: orderSource.output().send(MessageBuilder.withPayload(order).build()); This operation is asynchronous and ALWAYS returns true, no matter whether the Kafka broker is down. How can I know that the message has reached the Kafka broker?
Steps a.- (save in Order DB) and b.- (publish the message) should be performed in a transaction, atomically. How can I achieve that?
When this was written, Kafka did not support transactions (and thus no commit or rollback), which you would need to synchronize something like this. So, in short: you can't do what you want to do. This was due to change with KIP-98, which has since shipped in Kafka 0.11. However, even with transactions in Kafka, an atomic transaction across two systems is very hard to achieve: transactional support in Kafka improves on everything that follows, but it still does not entirely solve your issue. For that, you would need to look into implementing some form of two-phase commit across your systems.
You can get somewhat close by configuring producer properties, but in the end you will have to choose between at-least-once and at-most-once delivery for one of your systems (MariaDB or Kafka).
Let's start with what you can do in Kafka to ensure delivery of a message, and further down we'll dive into your options for the overall process flow and what their consequences are.
Guaranteed delivery
You can configure how many brokers have to confirm receipt of your messages before the request returns to you with the acks parameter: setting it to all tells the broker to wait until all in-sync replicas have acknowledged your message before answering. This is still not a 100% guarantee that your message will not be lost, since at that point it has only been written to the page cache, and there are theoretical scenarios in which a broker fails before the data is persisted to disk and the message is lost. But this is as good a guarantee as you are going to get.
You can further reduce the risk of data loss by lowering the interval at which brokers force an fsync to disk (flush.messages and/or flush.ms), but be aware that these values can bring heavy performance penalties with them.
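For reference, the settings above could be collected as plain properties like this. It is a sketch: the helper class, the bootstrap address and the flush values are placeholders, while acks (producer) and flush.messages / flush.ms (topic level) are the standard Kafka configuration keys. The producer properties would be passed to new KafkaProducer<>(props); the topic overrides would be applied when creating or altering the topic.

```java
import java.util.Properties;

// Hypothetical helper gathering the durability settings discussed above.
final class DurabilitySettings {
    private DurabilitySettings() {}

    // Producer config: wait until all in-sync replicas have the record.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    // Topic-level overrides: fsync after this many messages or this many ms,
    // whichever comes first. Placeholder values; both can hurt throughput badly.
    static Properties topicFlushOverrides() {
        Properties props = new Properties();
        props.setProperty("flush.messages", "1000");
        props.setProperty("flush.ms", "1000");
        return props;
    }
}
```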
In addition to these settings, you need to wait for your Kafka producer to return the response to your request and check whether an exception occurred. This ties into the second part of your question, so I will go into it further down.
If the response is clean, you can be as sure as possible that your data got to Kafka and start worrying about MariaDB.
Everything we have covered so far only addresses how to ensure that Kafka got your messages, but you also need to write data into MariaDB, and this can fail as well, which would make it necessary to recall a message you potentially already sent to Kafka - and this you can't do.
So basically you need to choose one system in which you are better able to deal with duplicates/missing values (depending on whether or not you resend partial failures) and that will influence the order you do things in.
Option 1
In this option you initialize a transaction in MariaDB, then send the message to Kafka, wait for a response and if the send was successful you commit the transaction in MariaDB. Should sending to Kafka fail, you can rollback your transaction in MariaDB and everything is dandy.
If however, sending to Kafka is successful and your commit to MariaDB fails for some reason, then there is no way of getting back the message from Kafka. So you will either be missing a message in MariaDB or have a duplicate message in Kafka, if you resend everything later on.
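The control flow of Option 1 can be sketched as follows. OrderTransaction and OrderEventPublisher are hypothetical interfaces standing in for an open MariaDB transaction (e.g. via JDBC) and a Kafka send that blocks until the broker responds; they are not part of any library.

```java
// Hypothetical abstraction over an open MariaDB transaction.
interface OrderTransaction {
    void insertOrder(String orderId, String payload);
    void commit();   // may throw at runtime, e.g. if the DB connection drops
    void rollback();
}

// Hypothetical abstraction over a Kafka send that blocks until the broker
// has acknowledged the record, and throws if it did not.
interface OrderEventPublisher {
    void sendAndWait(String orderId, String payload) throws Exception;
}

final class OrderPlacement {
    private OrderPlacement() {}

    // Option 1: open DB transaction -> send to Kafka and wait -> commit only on success.
    static boolean placeOrder(OrderTransaction tx, OrderEventPublisher publisher,
                              String orderId, String payload) {
        tx.insertOrder(orderId, payload);
        try {
            publisher.sendAndWait(orderId, payload);
        } catch (Exception sendFailure) {
            // Treat the send as failed and undo the DB write.
            // (After a timeout the record may still have reached Kafka.)
            tx.rollback();
            return false;
        }
        // If commit() fails here, the event is already in Kafka and cannot be
        // recalled: the "missing row or duplicate message" window described above.
        tx.commit();
        return true;
    }
}
```

The comment before commit() marks exactly the window described above; the sketch does not close it, it only makes the ordering explicit.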
Option 2
This is pretty much just the other way around, but you are probably better able to delete a message that was written in MariaDB, depending on your data model.
Of course, you can mitigate both approaches by keeping track of failed sends and retrying just those later on, but all of that is more of a band-aid on the bigger issue.
Personally, I'd go with approach 1, since the chance of a commit failing should be somewhat smaller than that of the send itself failing, and I'd implement some sort of dupe check on the other side of Kafka.
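A dupe check on the consuming side could look like the sketch below, assuming each event carries a unique order ID. The class and its in-memory set are hypothetical and only illustrate the idea; a real Payment service would keep the processed IDs in its own database, ideally updated in the same transaction as the charge, so the check survives restarts.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical consumer-side dupe check for the Payment service.
// The in-memory set is only for illustration; a real service would persist it.
final class DeduplicatingHandler {
    private final Set<String> processedOrderIds = new HashSet<>();
    private final Consumer<String> chargeForOrder;

    DeduplicatingHandler(Consumer<String> chargeForOrder) {
        this.chargeForOrder = chargeForOrder;
    }

    // Returns true if the event was processed, false if it was a duplicate.
    boolean handle(String orderId) {
        if (processedOrderIds.contains(orderId)) {
            return false; // already charged for this order: ignore the redelivered event
        }
        chargeForOrder.accept(orderId);
        // Record the ID only after the charge succeeded, so a failed charge can be retried.
        processedOrderIds.add(orderId);
        return true;
    }
}
```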
This is related to the previous one: I send the message with:
orderSource.output().send(MessageBuilder.withPayload(order).build());
This operation is asynchronous and ALWAYS returns true, no matter whether the Kafka broker is down. How can I know that the message has reached the Kafka broker?
First off, I'll admit I am unfamiliar with Spring, so this may not be of use to you, but the following code snippet illustrates one way of checking produce responses for exceptions.
By calling flush you block until all sends have finished (and either failed or succeeded) and then check the results.
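```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Producer<String, String> producer = new KafkaProducer<>(myConfig);
// Callbacks run on the producer's I/O thread, so collect failures in a thread-safe list
final List<Exception> exceptionList = Collections.synchronizedList(new ArrayList<>());
for (MessageType message : messages) {
    producer.send(new ProducerRecord<>("myTopic", message.getKey(), message.getValue()),
            (metadata, exception) -> {
                if (exception != null) {
                    exceptionList.add(exception);
                }
            });
}
// Block until every record sent above has completed (succeeded or failed)
producer.flush();
if (!exceptionList.isEmpty()) {
    // do stuff, e.g. retry the failed sends or roll back the DB transaction
}
```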
I think the proper way to implement Event Sourcing is to have Kafka filled directly from events pushed by a plugin that reads from the RDBMS binlog, e.g. using Confluent's Bottled Water (https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/) or the more active Debezium (http://debezium.io/). Consuming microservices can then listen to those events, consume them, and act on their respective databases, becoming eventually consistent with the RDBMS database.
Have a look at my full answer here for a guideline:
https://stackoverflow.com/a/43607887/986160
I have read the following on MSDN about the accept function:
https://msdn.microsoft.com/pl-pl/library/windows/desktop/ms737526(v=vs.85).aspx
When using the accept function, realize that the function may return before connection establishment has traversed the entire distance between sender and receiver. This is because the accept function returns as soon as it receives a CONNECT ACK message; in ATM, a CONNECT ACK message is returned by the next switch in the path as soon as a CONNECT message is processed (rather than the CONNECT ACK being sent by the end node to which the connection is ultimately established). As such, applications should realize that if data is sent immediately following receipt of a CONNECT ACK message, data loss is possible, since the connection may not have been established all the way between sender and receiver.
Could someone explain it in more detail? What does it have to do with SYN and SYN-ACK? What's the problem here? When can such data loss happen, and how can it be prevented?
You're omitting an important paragraph on that page, right before your quote:
The following are important issues associated with connection setup, and must be considered when using Asynchronous Transfer Mode (ATM) with Windows Sockets 2
That is, it is only applicable when you use things like AF_ATM and SOCKADDR_ATM. It is not relevant to TCP, which you seem to imply with:
What does it have to do with SYN and SYN-ACK?
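To make the TCP side concrete: by default, the operating system completes the TCP three-way handshake (SYN, SYN-ACK, ACK) itself and queues the connection in the listen backlog, and accept() only hands out connections that are already fully established. The single-threaded Java sketch below (a hypothetical demo class on the loopback interface) connects and sends data before accept() is called at all, and the server still receives everything.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: with TCP, the handshake is completed by the OS before accept() runs.
final class TcpAcceptDemo {
    private TcpAcceptDemo() {}

    static String sendBeforeAccept(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; a backlog of 1 is enough for one client.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             // The constructor returns once the handshake is done; accept() has not run yet.
             Socket client = new Socket(loopback, server.getLocalPort())) {
            OutputStream out = client.getOutputStream();
            out.write(message.getBytes(StandardCharsets.UTF_8)); // sent "immediately"
            out.flush();
            client.shutdownOutput(); // FIN: tells the server there is no more data
            // Only now is the already-established connection taken from the backlog.
            try (Socket accepted = server.accept()) {
                byte[] received = accepted.getInputStream().readAllBytes();
                return new String(received, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only demonstrates the loopback case, but it shows why the ATM caveat does not carry over: the connection the server accepts was already established before accept() was called.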
I have created an NServiceBus Distributor and Worker, running on separate machines. When I run the worker, it successfully sends a message to the Distributor (and I can see it processed through the Storage queue), but for some reason an output queue called 'DIRECT=TCP:xx.xx.xx.xx\PRIVATE$\order_queue$' is created on the Distributor, when the queue should be called 'DIRECT=OS:WORKERDNSNAME\private$\myqueue'.
Does anyone know why the order_queue$ is being created?
Shamelessly copied directly from an old post at pg2e.blogspot.co.uk:
Transactional queues over HTTP from private networks
When sending messages to a transactional queue over HTTP/S from a server without a public IP address, the ACK messages may have a hard time reaching their destination. This is due to the same cause as in this post (basically, NAT causing a mismatch with the message destination address). By default the receipts are sent to the sending computer's name, which of course will not work unless both parties reside on the same network. To fix this, you have to map the receipts to the public address of the sender. This is done by creating an XML file (of any name) in C:\WINDOWS\system32\msmq\mapping with the following content:
```xml
<StreamReceiptSetup xmlns="msmq-streamreceipt-mapping.xml">
  <setup>
    <LogicalAddress>http://msmq.domain.com/*</LogicalAddress>
    <StreamReceiptURL>http://[ADDRESS_TO_SENDER]/msmq/Private$/order_queue$</StreamReceiptURL>
  </setup>
  <default>http://xxx.xx.xxx.xx/msmq/Private$/order_queue$</default>
</StreamReceiptSetup>
```
Explanation: All messages sent to any queue at msmq.domain.com will have their receipts sent to the given StreamReceiptURL. The order_queue$ queue is used to handle transactional control messages.
I suspect later versions of MSMQ or NServiceBus handle creating this queue automatically without you having to create the XML file yourself.
I have a setup with two Erlang nodes on the same physical machine, and I want to be able to send large files between the nodes.
From the symptoms I see, it looks like there is only one TCP connection between the nodes, and sending the large binary across it stops all other traffic. Is this the case?
And, more interestingly, is there a way of making the VM use several connections between the nodes?
Yes, you only get one connection, according to the manual:
The handshake will continue, but A is informed that B has another ongoing connection attempt that will be shut down (simultaneous connect where A's name is greater than B's name, compared literally).
I'm not sure what "large" means in the question, but generally speaking (and IMHO), it might be good to set up a separate TCP port to handle the payloads, and only use standard Erlang messages as a signaling method: to announce a new incoming payload and to negotiate anything needed (ports, setting up a listener, etc.).
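The idea of a separate channel for payloads can be sketched as follows. This is in Java rather than Erlang (in Erlang you would use gen_tcp for the dedicated socket), the class is hypothetical, and the "control message" is reduced to the receiver's port number, which in the real setup would travel as an ordinary Erlang message over the distribution connection. The bulk bytes then go over their own TCP connection instead of blocking other inter-node traffic.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch: bulk data over a dedicated TCP connection, negotiated out of band.
final class SideChannelTransfer {
    private SideChannelTransfer() {}

    static byte[] transfer(byte[] payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Receiver: open a dedicated listener on a free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            listener.setSoTimeout(10_000); // don't wait forever if the sender never connects
            // "Signal": in the Erlang setup this port would be sent as a small message
            // over the normal node-to-node connection.
            int advertisedPort = listener.getLocalPort();

            // Sender: connect to the advertised port and stream the payload.
            Thread sender = new Thread(() -> {
                try (Socket socket = new Socket(loopback, advertisedPort);
                     OutputStream out = socket.getOutputStream()) {
                    out.write(payload);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            // Receiver: read until the sender closes the connection.
            try (Socket connection = listener.accept()) {
                byte[] received = connection.getInputStream().readAllBytes();
                sender.join();
                return received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the sender", e);
        }
    }
}
```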
By the way, there's an interesting thread on the same subject, and you might try tuning the net_* variables to see if they help with the issues.
Hope it helps!
It is not recommended to send large messages between Erlang nodes:
http://learnyousomeerlang.com/distribunomicon
Refer to the "Bandwidth is infinite" section. I would recommend using something else, like GFS, so that you don't lose the distribution feature of Erlang.
I understand that in HornetQ you can do live-backup pair clustering. I also noticed from the documentation that you can load balance between two or more nodes in a cluster. Are those the only two possible topologies? How would you implement a clustered queue pattern?
Thanks!
Let me answer this using two terminologies. First, in terms of HornetQ core queues:
When you create a cluster connection, you set an address that is used to load balance HornetQ addresses and core queues (including their direct translation into JMS queues and JMS topics) for the addresses that start with the cluster connection's base address (usually jms).
When a core queue is load balanced, its messages are distributed among the different nodes; that is, each node gets one message at a time.
When you have more than one queue on the same address, every queue in the cluster receives the messages. If one of these queues exists on more than one node, the previous rule (each message being load balanced) also applies.
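The two rules above can be summarized with a simplified model. This is not HornetQ code or its API: the class is hypothetical, it ignores consumers and the forward-when-no-consumers setting, and it only shows the fan-out to every queue on the address plus the one-message-at-a-time (round-robin) distribution of a queue that exists on several nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of message distribution for one clustered address.
// Not HornetQ code: consumers and forward-when-no-consumers are ignored.
final class ClusterModel {
    // queue name -> nodes hosting an instance of that queue (insertion order kept)
    private final Map<String, List<String>> queueToNodes = new LinkedHashMap<>();
    // queue name -> index of the node that gets the next message
    private final Map<String, Integer> nextNode = new HashMap<>();

    void bindQueue(String queue, String... nodes) {
        if (nodes.length == 0) {
            throw new IllegalArgumentException("a queue must exist on at least one node");
        }
        queueToNodes.put(queue, List.of(nodes));
        nextNode.put(queue, 0);
    }

    // Sends one message to the address and returns "queue@node" for each delivery.
    List<String> send() {
        List<String> deliveries = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : queueToNodes.entrySet()) {
            String queue = entry.getKey();
            List<String> nodes = entry.getValue();
            int i = nextNode.get(queue);
            deliveries.add(queue + "@" + nodes.get(i));   // every queue gets a copy...
            nextNode.put(queue, (i + 1) % nodes.size());  // ...but only one node per queue
        }
        return deliveries;
    }
}
```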
In JMS terms:
Topic subscriptions receive all the messages sent to the topic. If a topic subscription name/ID is present on more than one node (say, the same clientID and subscriptionName on different nodes), the messages will be load balanced between them.
Queues are load balanced across all the existing instances of the queue.
Note that there is a forward-when-no-consumers setting, meaning that you may not get a message if you don't have a consumer. You can use it to configure this behavior as well.
How would you implement a clustered queue pattern?
Tips for EAP 6.1/HornetQ 2.3 to implement a distributed queue/topic:
Read the official docs for your version, e.g. for 2.3: https://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/clusters.html
Note that the old clustered=true setting is deprecated in 2.3+; defining the cluster connection is enough. Check that the internal core bridges are created automatically.
Take the full-ha configuration as a baseline, or make sure you have JGroups properly set up. This post goes deep into the subject: https://developer.jboss.org/thread/253574
Without it, no errors are shown, the core bridge connection is established... but messages are not being distributed, again no errors or warnings at all...
Make sure the security domain, security realms, users, passwords, and roles are properly set.
E.g. I confused the domain ID ('other') with the realm ID ('ApplicationRealm') and got auth errors, but the errors were generic, so I wasted time checking users, passwords, roles... until I eventually found out.
Debug by enabling debug logging (logger.org.hornetq.level=DEBUG).