There are two major FIX engines:
open source: QuickFix (http://www.quickfixengine.org/)
commercial: Fix Antenna (http://www.b2bits.com/trading_solutions/fix_engines/fix_engine_cpp.html)
What are the pros and cons of each? I know that Fix Antenna is faster, but what else?
Is the QuickFix project alive? The changelog shows that the last commit was on 2010-04-06 06:22; does that mean the project is dead?
As DumbCoder says, there are a lot more than 2 major FIX engines. Cameron tends to be used by a number of investment banks, and Rapid Addition also has a highly regarded FIX engine; there are plenty more. QuickFix is very popular, is relied on by lots of individuals and trading businesses, and can be an excellent choice unless you are very latency sensitive.
There is also always the option of writing your own; it depends on your use case. If you aren't hugely latency sensitive (caring about microseconds), then QuickFix is probably your best bet. If you do care about every microsecond and want more predictable latency per message processed, then QuickFix will not work for you, and you either want a low-latency commercial solution (such as Rapid Addition Cheetah) or you want to write your own, optimised for your usage scenario. Note that writing your own to beat the performance of a commercial solution will take some time and is no easy feat, as many of the commercial engines are now highly performant.
Having undertaken extensive benchmarking of QuickFix, Antenna, Onix, Rapid Addition and Cameron, we opted for Onix (www.onixs.biz).
We benchmarked Java, C++ and .NET solutions and have gone live with C++ on RHEL5.
I ran the performance test code included in QuickFix C++ and got the results below. As far as I can tell, this looks excellent, and I am running a commodity home desktop, not a high-end trading server used by big shops. The build was done with VS 2012 at full optimization.
G:\projects\quickfix\test\release\pt>pt.exe -p 15000 -c 1000000
Converting integers to strings:
num: 1000000, seconds: 0.016, num_per_second: 6.25e+007
Converting strings to integers:
num: 1000000, seconds: 0, num_per_second: 1.#INF
Converting doubles to strings:
num: 1000000, seconds: 0.5, num_per_second: 2e+006
Converting strings to doubles:
num: 1000000, seconds: 0.219, num_per_second: 4.56621e+006
Creating Heartbeat messages:
num: 1000000, seconds: 0.75, num_per_second: 1.33333e+006
Identifying message types:
num: 1000000, seconds: 0.062, num_per_second: 1.6129e+007
Serializing Heartbeat messages to strings:
num: 1000000, seconds: 0.516, num_per_second: 1.93798e+006
Serializing Heartbeat messages from strings:
num: 1000000, seconds: 1.094, num_per_second: 914077
Creating NewOrderSingle messages:
num: 1000000, seconds: 2.312, num_per_second: 432526
Serializing NewOrderSingle messages to strings:
num: 1000000, seconds: 0.75, num_per_second: 1.33333e+006
Serializing NewOrderSingle messages from strings:
num: 1000000, seconds: 3.188, num_per_second: 313676
Creating QuoteRequest messages:
num: 1000000, seconds: 41.547, num_per_second: 24069.1
Serializing QuoteRequest messages to strings:
num: 1000000, seconds: 3.734, num_per_second: 267809
Serializing QuoteRequest messages from strings:
num: 1000000, seconds: 26.672, num_per_second: 37492.5
Reading fields from QuoteRequest message:
num: 1000000, seconds: 15.89, num_per_second: 62932.7
Storing NewOrderSingle messages:
num: 1000000, seconds: 3.485, num_per_second: 286944
Validating NewOrderSingle messages with no data dictionary:
num: 1000000, seconds: 0.11, num_per_second: 9.09091e+006
Validating NewOrderSingle messages with data dictionary:
G:\projects\quickfix\test\release\pt>
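For context, the "Creating NewOrderSingle messages" step above exercises code along these lines. This is only a sketch based on the standard QuickFix C++ generated-message API, with made-up field values:

#include "quickfix/fix42/NewOrderSingle.h"

// Sketch of what the "Creating NewOrderSingle messages" step measures;
// field values here are invented for illustration.
FIX42::NewOrderSingle makeOrder()
{
    FIX42::NewOrderSingle order(
        FIX::ClOrdID("12345"),
        FIX::HandlInst('1'),
        FIX::Symbol("AAPL"),
        FIX::Side(FIX::Side_BUY),
        FIX::TransactTime(),
        FIX::OrdType(FIX::OrdType_MARKET));
    order.set(FIX::OrderQty(100));
    return order;
}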
Not necessarily the 2 major FIX engines. Cameron is another widely used FIX engine, and many financial companies develop their own. There isn't a list of pros and cons for the FIX engines per se, as most of them aren't built the same way and other factors creep in during development. The only way is to evaluate them yourself for your specific needs.
The QuickFix project is pretty much alive. There are commercial companies that provide support for QuickFix; it is probably mentioned on the QuickFix website. And you have the source code, so there is nothing stopping you from tinkering with it yourself.
I have implemented a stream processing app that makes some calculations and transformations and sends the result to an output topic.
After that, I read from that topic and want to suppress the results for 35 seconds, like a timer, meaning that all the output records from that suppress will be sent to a specific "timeout" topic.
The simplified code looks like this:
inputStream
    .suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(35), Suppressed.BufferConfig.unbounded()))
    .toStream()
    .peek((key, value) -> LOGGER.warn("incidence with Key: {} timeout -> Time = {}", key, 35))
    .filterNot((key, value) -> value.isDisconnection());
The problem I have here is that suppress holds the records for an arbitrary time, not the specified 35 seconds.
For more information: I'm using event time, extracted in the earlier process described at the beginning, and records are arriving every second.
Thanks
Update
This is an input record example:
rowtime: 4/8/20 8:26:33 AM UTC, key: 34527882, value: {"incidenceId":"34527882","installationId":"18434","disconnection":false,"timeout":false, "creationDate":"1270801593"}
I ran into a similar issue some time ago; the reason suppress holds the records for an arbitrary time is that the suppress operator uses what is called stream time instead of the intuitive wall-clock time.
As of now, untilTimeLimit only supports stream time, which limits its usefulness. Work is underway to add a wall-clock time option, expanding this feature into a general rate control mechanism.
The important aspects of time for Suppress are:
The timestamp of each record: This is also known as event time.
What time is “now”? Stream processing systems have two main choices here:
The intuitive “now” is wall-clock time, the time you would see if you look at a clock on the wall while your program is running.
There is also stream time, which is essentially the maximum timestamp your program has observed in the stream so far. If you’ve been polling a topic, and you’ve seen records with timestamps 10, 11, 12, 11, then the current stream time is 12.
Reference: Kafka Streams’ Take on Watermarks and Triggers
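A minimal, language-agnostic sketch of that definition, using the timestamps from the example above (the names are mine, not Kafka's):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <limits>
#include <vector>

// "Stream time" is just the running maximum of observed record timestamps.
int main()
{
    std::vector<int64_t> timestamps = {10, 11, 12, 11};
    int64_t streamTime = std::numeric_limits<int64_t>::min();
    for (int64_t ts : timestamps) {
        streamTime = std::max(streamTime, ts);
        std::cout << "record ts=" << ts << "  stream time=" << streamTime << "\n";
    }
    // Ends with stream time 12: the late record (11) never moves stream
    // time backwards, which is why suppress() can appear to hold records
    // for "an arbitrary time" when no new records push stream time forward.
}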
I'm using Entity Framework Core 2.2, and I decided to follow a blog's suggestion and enable retry on failure:
services.AddDbContext<MyDbContext>(options =>
    options.UseSqlServer(
        Configurations["ConnectionString"],
        sqlServerOptionsAction: sqlOptions =>
        {
            sqlOptions.EnableRetryOnFailure(
                maxRetryCount: 10,
                maxRetryDelay: TimeSpan.FromSeconds(5),
                errorNumbersToAdd: null);
        }));
My question is: what is the maxRetryDelay argument for?
I would expect it to be the delay between retries, but the name implies it's the maximum time. Does that mean my 10 retries could be 1 second apart, and not 5 seconds apart as I desire?
The delay between retries is randomized up to the value specified by maxRetryDelay.
This is done to avoid multiple retries occurring at the same time and overwhelming a server. Imagine, for example, 10K requests to a web service failing due to a network issue and all retrying at the same time after 15 seconds: the database server would get a sudden wave of 10K queries.
By randomizing the delay, retries are spread across time and across clients.
The delay for each retry is calculated by ExecutionStrategy.GetNextDelay. The source shows it's a random exponential backoff.
The default SqlServerRetryingExecutionStrategy uses that implementation; a custom retry strategy could use a different one.
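The idea, though not EF Core's exact code, looks roughly like this; the function name and constants below are illustrative only:

#include <algorithm>
#include <chrono>
#include <cmath>
#include <random>

// Sketch of randomized exponential backoff in the spirit of
// ExecutionStrategy.GetNextDelay; constants are made up.
std::chrono::milliseconds nextDelay(int attempt, std::chrono::milliseconds maxDelay)
{
    static std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> jitter(0.0, 1.0);

    // Exponential growth: ~1 s, 2 s, 4 s, ... before the cap.
    double baseMs = 1000.0 * std::pow(2.0, attempt);

    // Randomize so clients that failed together don't retry in lockstep.
    auto delay = std::chrono::milliseconds(
        static_cast<long long>(jitter(rng) * baseMs));

    return std::min(delay, maxDelay);  // maxRetryDelay caps the result
}

So maxRetryDelay is a ceiling on each randomized delay, not the fixed gap between retries.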
So far I thought they were the same, since bytes are made of bits, and both sides need to know the byte size and endianness of the other side and transform the stream accordingly. However, Wikipedia says that a byte stream != a bit stream (https://en.wikipedia.org/wiki/Byte_stream) and that bit streams are specifically used in video coding (https://en.wikipedia.org/wiki/Bitstream_format). In RFC 107 (https://www.rfc-editor.org/rfc/rfc107) they discuss these two things and describe "Two separate kinds of inefficiency arose from bit streams." My questions are:
what's the real difference between a byte stream and a bit stream?
how does a bit stream work, if it's different from a byte stream? How does the receiving side know how many bits to process at a given time?
why is a bit stream better than a byte stream in some cases?
This is a pretty broad question, so I'll have to give the 10,000-foot view. Bit streams are common in two distinct usages:
very low-level: it is the fundamental way lots of hardware operates. The best examples are the data stream that comes off a hard disk or an optical disk, or the data sent across a transmission line, like a USB cable or the coax cable or telephone line through which you received this post. The RFC you found applies here.
high-level: they are common in data compression, where a variable number of bits per token allows packing data tighter. Huffman coding is the most basic way to compress. The video encoding subjects you found apply here.
what's the real difference between byte stream and bit stream?
Byte streams are highly compatible with computers, which are byte-oriented devices and the ones you'll almost always encounter in programming. Bit streams are much more low-level; only systems integration engineers ever worry about them. While the payload of a bit stream is often the bytes that a computer is interested in, more overhead is typically required to ensure that the receiver can properly interpret the data: there are usually more bits on the wire than strictly necessary to encode the payload bytes. Extra bits are needed to ensure that the receiver is properly synchronized and can detect, and perhaps correct, bit errors. NRZ encoding is very common.
The RFC is quite archaeological; in 1971 they were still hammering out the basics of getting computers to talk to each other. Back then they were still close to the transmission-line behavior, a bit stream, and many computers did not yet agree on 8 bits in a byte. They were fretting over the cost of converting bits to local bytes on very anemic hardware and the need to pack as many bits into a message as possible.
How does a receiving side know how many bits to process at a given time?
The protocol determines that, as that RFC does. In the case of a variable-length bit encoding, the bit values themselves determine it, as in Huffman coding.
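As a toy illustration of the bit values delimiting symbols, here is a sketch decoding a made-up three-symbol prefix code (0 -> 'a', 10 -> 'b', 11 -> 'c'); no real codec uses exactly this table:

#include <cstddef>
#include <iostream>
#include <string>

int main()
{
    // Because no codeword is a prefix of another, the reader needs no
    // length markers: the bits themselves say where each symbol ends.
    const std::string bits = "0101100";  // assumed well-formed input
    std::size_t i = 0;
    while (i < bits.size()) {
        if (bits[i] == '0')          { std::cout << 'a'; i += 1; }
        else if (bits[i + 1] == '0') { std::cout << 'b'; i += 2; }
        else                         { std::cout << 'c'; i += 2; }
    }
    std::cout << "\n";  // prints "abcaa"
}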
why is bit stream better than byte stream in some cases?
Covered already, I think: because it is a better match for its purpose, either because the hardware is bit-oriented or because variable bit-length coding is useful.
A bit is a single 1 or 0 in computer code, also known as a binary digit.
Bit streams are at work beneath the transport protocols computers use to exchange messages, such as the transmission control protocol, or TCP. The World Wide Web and e-mail services, among others, rely on TCP to send information in an orderly fashion. Sending through the bit stream ensures the pieces arrive in the proper order and the message isn't corrupted during delivery, which could make it unreadable. So a bit stream sends one bit after another.
Eight bits make up a byte, and a byte stream transmits these eight-bit packets from computer to computer.
The packets are decoded upon arrival so the computer can interpret them. Thus a byte stream is a special case of bits sent together as a group in sequential order. For a byte stream to be most effective, it flows through a dedicated and reliable path, sometimes referred to as a pipe or pipeline.
When it comes to sending a byte stream over a computer network, a reliable bi-directional transport layer protocol, such as the transmission control protocol (TCP) used on the Internet, is required; these are referred to as byte stream protocols. Other serial data protocols used with certain types of hardware components, such as the universal asynchronous receiver/transmitter (UART) technique, also use a byte stream for communication. In this case the byte, or character, is packaged up in a frame on the transmitting end, where an extra start bit and some optional checking bits are attached, and then separated back out of the frame on the receiving end. This technique is sometimes referred to as a byte-oriented protocol.
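A rough sketch of that UART-style framing (8N1: one start bit, eight data bits sent least-significant-first, one stop bit, no parity):

#include <cstdint>
#include <iostream>
#include <vector>

// Wrap one byte in a UART 8N1 frame: 10 bits on the wire per 8-bit byte.
std::vector<int> frameByte(uint8_t byte)
{
    std::vector<int> bits;
    bits.push_back(0);                    // start bit (line pulled low)
    for (int i = 0; i < 8; ++i)
        bits.push_back((byte >> i) & 1);  // data bits, LSB first
    bits.push_back(1);                    // stop bit (line back to idle high)
    return bits;
}

int main()
{
    for (int b : frameByte(0x41)) std::cout << b;  // 'A' -> 0100000101
    std::cout << "\n";
}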
Taking an everyday example: suppose you have a lot of matchsticks to send. You could send them one stick after the other, one at a time, or you could pack a few of them into matchboxes and send those, one matchbox after the other in sequence. The first is like a bit stream and the latter like a byte stream.
Thus it all depends on what the hardware wants or is best suited for. If your hand is small and you can't accept matchboxes but you still want matchsticks, then you take them one at a time; otherwise take the box. Byte streams are also better in the sense that not every bit needs to be checked individually: data can be sent in batches of 8, and if anything fails, the entire 8 bits can be resent.
To add to the other good answers here:
A byte stream is a type of bit stream. A byte stream describes the bits as meaningful "packages" that are 8 bits wide.
Certain (especially low-level) streams may be agnostic of meaning in each 8-bit sequence; it would be a poor description to call these "byte streams".
Similar to how every Honda Civic is a car, but not every car is a Honda Civic...
I'm adding peer-to-peer Bluetooth using GameKit to an iPhone shoot-em-up, so speed is vital. I'm sending about 40 messages a second each way, most of them with the faster GKSendDataUnreliable, all serialized with NSCoding. In testing between a 3G and a 3GS, this is slowing the 3G down a lot more than I'd like. I'm wondering where I should concentrate my efforts to speed it up.
How much slower is GKSendDataReliable? For the few packets that have to get there, would it be faster to send with GKSendDataUnreliable and have the peer send an acknowledgement, so I can resend if I don't get the ack within, say, 100ms?
How much faster would it be to create the NSData instance using a regular C array rather than archiving with the NSCoding protocol? Is this serialization process (for about a dozen floats) just as slow as you'd expect from an object creation/deallocation overhead, or is something particularly slow happening?
I heard that (for example) sending four separate sets of data is much, much slower than sending one piece of data four times the size. Would I make a significant saving by combining into one packet pieces of data that wouldn't always go together, whenever they happen to occur at the same time?
Are there any other bluetooth performance secrets I've missed?
Thanks for your help.
I'm not a Bluetooth expert, but in general a reliable send takes about 1.5x as long as an unreliable one. I would avoid trying to send an ACK back using an unreliable method, because then you're going to have to put in all kinds of ridiculous logic to detect whether the ACK failed to arrive, which will slow you down much more than just using a reliable send.
Sending data has high latency, which means that sending 4 small packets is going to take more time than sending 1 packet with a 4x-sized payload. Any time you can increase the payload size to make fewer sends, you will get a performance benefit.
If you know the size and shape of the data that you are sending and receiving, you can also squeeze out some performance by sending byte arrays or arrays of numbers rather than using NSCoding, because NSCoding will consume some time to serialize and deserialize (a step you can skip if you're just sending arrays), and the amount of data you send will be slightly larger with NSCoder than with a raw array.
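For example, a dozen floats can be copied straight into a byte buffer and handed to NSData on the way out. A minimal C-style sketch, assuming both peers are iPhones so endianness and float layout match (on iOS you'd wrap the bytes with [NSData dataWithBytes:length:]):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack raw floats into bytes with no per-field encoding overhead.
std::vector<uint8_t> pack(const float* values, std::size_t count)
{
    std::vector<uint8_t> buf(count * sizeof(float));
    std::memcpy(buf.data(), values, buf.size());
    return buf;
}

// Reverse the copy on the receiving side.
void unpack(const std::vector<uint8_t>& buf, float* out, std::size_t count)
{
    std::memcpy(out, buf.data(), count * sizeof(float));
}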
I am trying to implement a real-time application which involves IPC across different modules. The modules are doing some data-intensive processing. I am using a message queue (ActiveMQ) as the backbone for IPC in the prototype, which is easy (considering I am a total IPC newbie), but it's very, very slow.
Here is my situation:
I have isolated the IPC part so that I could change it other ways in future.
I have 3 weeks to implement another faster version. ;-(
IPC should be fast, but also comparatively easy to pick up
I have been looking into different IPC approaches: sockets, pipes, shared memory. However, I have no experience with IPC, and there is definitely no way I can afford to fail this demo in 3 weeks... Which IPC mechanism would be the safest to start with?
Thanks.
Lily
You'll get the best results with a shared-memory solution.
Recently I did the same IPC benchmarking, and I think my results will be useful to anyone who wants to compare IPC performance.
Pipe benchmark:
Message size: 128
Message count: 1000000
Total duration: 27367.454 ms
Average duration: 27.319 us
Minimum duration: 5.888 us
Maximum duration: 15763.712 us
Standard deviation: 26.664 us
Message rate: 36539 msg/s
FIFOs (named pipes) benchmark:
Message size: 128
Message count: 1000000
Total duration: 38100.093 ms
Average duration: 38.025 us
Minimum duration: 6.656 us
Maximum duration: 27415.040 us
Standard deviation: 91.614 us
Message rate: 26246 msg/s
Message Queues benchmark:
Message size: 128
Message count: 1000000
Total duration: 14723.159 ms
Average duration: 14.675 us
Minimum duration: 3.840 us
Maximum duration: 17437.184 us
Standard deviation: 53.615 us
Message rate: 67920 msg/s
Shared Memory benchmark:
Message size: 128
Message count: 1000000
Total duration: 261.650 ms
Average duration: 0.238 us
Minimum duration: 0.000 us
Maximum duration: 10092.032 us
Standard deviation: 22.095 us
Message rate: 3821893 msg/s
TCP sockets benchmark:
Message size: 128
Message count: 1000000
Total duration: 44477.257 ms
Average duration: 44.391 us
Minimum duration: 11.520 us
Maximum duration: 15863.296 us
Standard deviation: 44.905 us
Message rate: 22483 msg/s
Unix domain sockets benchmark:
Message size: 128
Message count: 1000000
Total duration: 24579.846 ms
Average duration: 24.531 us
Minimum duration: 2.560 us
Maximum duration: 15932.928 us
Standard deviation: 37.854 us
Message rate: 40683 msg/s
ZeroMQ benchmark:
Message size: 128
Message count: 1000000
Total duration: 64872.327 ms
Average duration: 64.808 us
Minimum duration: 23.552 us
Maximum duration: 16443.392 us
Standard deviation: 133.483 us
Message rate: 15414 msg/s
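Given how decisively shared memory wins above, here is a minimal POSIX shared-memory sketch. The segment name and size are made up, and there is no synchronization shown; a real channel would pair this with a semaphore or a lock-free ring buffer (link with -lrt on older glibc):

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const char* name = "/demo_shm";                   // hypothetical name
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);  // create the segment
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 128) != 0) { perror("ftruncate"); return 1; }

    void* p = mmap(nullptr, 128, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    // Writer and reader would normally be separate processes mapping the
    // same name; both sides are shown in one process here for brevity.
    std::strcpy(static_cast<char*>(p), "hello");
    std::cout << static_cast<char*>(p) << "\n";

    munmap(p, 128);
    close(fd);
    shm_unlink(name);                                 // remove the segment
    return 0;
}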
Been facing a similar question myself.
I've found the following pages helpful - IPC performance: Named Pipe vs Socket (in particular) and Sockets vs named pipes for local IPC on Windows?.
It sounds like the consensus is that shared memory is the way to go if you're really concerned about performance, but if the current system you have is a message queue it might be a rather... different structure. A socket and/or named pipe might be easier to implement, and if either meets your specs then you're done there.
On Windows, you can use WM_COPYDATA, a special kind of shared-memory-based IPC. This is an old but simple technique: "Process A" sends a message, which contains a pointer to some data in its memory, and waits until "Process B" processes (sorry) the message, e.g. creates a local copy of the data. This method is pretty fast and works on the Windows 8 Developer Preview, too (see my benchmark). Any kind of data can be transported this way, by serializing it on the sender and deserializing it on the receiver side. It is also simple to implement sender and receiver message queues to make the communication asynchronous.
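A minimal sketch of the sending side (the window handles are assumed to have been obtained elsewhere, e.g. via FindWindow):

#include <windows.h>
#include <cstring>

// Send a NUL-terminated string to another process via WM_COPYDATA.
void sendText(HWND hwndTarget, HWND hwndSender, const char* text)
{
    COPYDATASTRUCT cds;
    cds.dwData = 1;                                   // app-defined tag
    cds.cbData = static_cast<DWORD>(std::strlen(text) + 1);
    cds.lpData = const_cast<char*>(text);

    // SendMessage blocks until the receiver's window procedure returns,
    // which is what makes sharing the pointed-to memory safe.
    SendMessage(hwndTarget, WM_COPYDATA,
                reinterpret_cast<WPARAM>(hwndSender),
                reinterpret_cast<LPARAM>(&cds));
}

// Receiver side, inside its window procedure:
//   case WM_COPYDATA: {
//       auto* cds = reinterpret_cast<COPYDATASTRUCT*>(lParam);
//       // copy cds->cbData bytes out of cds->lpData before returning
//       return TRUE;
//   }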
You may check out this blog post https://publicwork.wordpress.com/2016/07/17/endurox-vs-zeromq/
Basically it compares Enduro/X, which is built on POSIX queues (kernel IPC queues), with ZeroMQ, which can deliver messages simultaneously over several different transport classes, incl. tcp:// (network sockets), ipc://, inproc://, pgm:// and epgm:// for multicast.
From the charts you can see that at some point, with larger data packets, Enduro/X running on queues wins over sockets.
Both systems run well at ~400,000 messages per second, but with 5 KB messages, kernel queues do better.
(image source: https://publicwork.wordpress.com/2016/07/17/endurox-vs-zeromq/)
UPDATE:
Another update, in answer to the comment below: I reran the test with ZeroMQ on ipc:// too; see the picture:
As we can see, ZeroMQ's ipc:// is better, but in some ranges Enduro/X shows better results, and then ZeroMQ takes over again.
Thus I would say that the choice of IPC mechanism depends on the work you plan to do.
Note that ZeroMQ's ipc:// runs over UNIX domain sockets, while Enduro/X runs on POSIX queues.