how high frequency trading system connects to exchange - fix-protocol

I'm trying to study high-frequency trading systems. What mechanism do HFT systems use to connect to the exchange, and what is the procedure? Does the connection have to go through a broker, or is it direct access? If it's direct access, what connection information do I need?
Thanks in advance for your answers.

Understand that there are two different "connections" in an HFT engine. The first is the connection to a market data source. The second is to a clearing resource. As mentioned in kpavlov's answer, a very expensive COLO (co-location) is needed to get as close to the data source/target as possible. Depending on their nominal latency these COLO resources cost thousands of dollars per month.
With both connections, your trading engine must be certified by the provider (ICE, CME, etc.) to comply with their requirements. With CME the certification process is automated; with ICE it involves human review. In either case, certification requires that your software demonstrate conformance to standards and freedom from undesirable network side effects.
You must also subscribe to your data source(s) and clearing service. Neither is inexpensive, and pricing varies over a pretty wide range. During the subscription process you'll gain access to the service provider's technical data specification(s), a critical part of designing your trading engine. Using old data that you find on the Internet for design purposes is a recipe for problems later. Subscription also gets you access to the provider's test sites. It is on these test sites that you test and debug your engine.
After you think your engine is ready for deployment, you begin connecting to the data/clearing production servers. This connection will get you into a place of shadows: port roulette. Not every port at the provider's network edge has the same latency. Here you'll learn that you can have the shortest latency yet seldom have orders filled first. Traditional load balancing does little to help this, and CME has begun deploying FPGA-based systems to ensure correct temporal sequencing of inbound orders, but that rollout is still in its early stages.
Once you're running, you then get to learn that mistakes can be very expensive. If you place an order prior to a market pre-open event, the order is automatically rejected. Do it too often and the clearing provider will charge you a very stiff penalty. Other things can also get you penalized or even kicked off the service, for example if your systems are determined to be implementing strategies that block others from access.
All the major exchanges' websites have links to public data and educational resources to help you decide if HFT is "for you" and how to go about it.

It usually requires approval from the exchange to grant access from outside. They protect their servers with firewalls, so your server/network needs to be authorized for access.
A special certification procedure with a technician (by phone) is usually required before they authorize you.
Most liquidity providers use the FIX protocol or custom APIs. You may consider starting to implement your connector with QuickFix, but it may become a bottleneck later, when your traffic grows.
The information you need for FIX access is:
Server IP
Server port
FIX protocol credentials:
Other fields
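
To make the list above a bit more concrete, here is a minimal sketch of how a FIX Logon message (MsgType `A`) is framed. The FIX version (4.4), the `SenderCompID`/`TargetCompID` values and the heartbeat interval are assumptions for illustration; the real values come from your provider's specification, and a library such as QuickFix normally does this framing for you.

```python
SOH = "\x01"  # FIX field delimiter (ASCII 0x01)


def fix_checksum(data: str) -> str:
    """Sum of all bytes modulo 256, rendered as three digits (tag 10)."""
    return f"{sum(data.encode('ascii')) % 256:03d}"


def build_logon(sender: str, target: str, seq_num: int,
                sending_time: str, heartbeat_s: int = 30) -> str:
    """Build a FIX 4.4 Logon message. All argument values are placeholders."""
    body_fields = [
        ("35", "A"),               # MsgType = Logon
        ("49", sender),            # SenderCompID (assigned by the provider)
        ("56", target),            # TargetCompID (assigned by the provider)
        ("34", str(seq_num)),      # MsgSeqNum
        ("52", sending_time),      # SendingTime, UTC
        ("98", "0"),               # EncryptMethod = none
        ("108", str(heartbeat_s)), # HeartBtInt in seconds
    ]
    body = "".join(f"{tag}={value}{SOH}" for tag, value in body_fields)
    # BodyLength (tag 9) counts the bytes between its own delimiter
    # and the start of the checksum field.
    header = f"8=FIX.4.4{SOH}9={len(body)}{SOH}"
    return header + body + f"10={fix_checksum(header + body)}{SOH}"


if __name__ == "__main__":
    # Hypothetical identifiers; print with '|' so the delimiters are visible.
    msg = build_logon("CLIENT1", "EXCHANGE", 1, "20240101-00:00:00.000")
    print(msg.replace(SOH, "|"))
```

The resulting bytes would be written to a TCP socket opened against the server IP and port from the list; any additional fields the provider requires would go into `body_fields` in the same way.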


Real-life scenarios where anyone would choose availability over consistency (who would be interested in stale data?)

I was trying to wrap my brain around the CAP theorem. I understand that network partitions can occur (eventually leaving the nodes in the cluster unable to sync up with the WRITE operations happening on the other nodes).
In this case, either the cluster could stay up and the load balancer in front of it could route a request to any of the nodes. After a WRITE operation on one of the nodes, the other nodes that can't sync that data still have STALE data, and any subsequent READS to these nodes will serve STALE data.
[So we are losing CONSISTENCY as we choose AVAILABILITY (i.e., we have chosen to let the cluster give STALE responses back.)]
Or we could SHUT DOWN the cluster whenever a network partition occurs! (Thereby losing AVAILABILITY, as we don't want to hamper consistency among the nodes.)
There are 2 things I would like to know:
In reality, when would anyone choose to be AVAILABLE and trade off CONSISTENCY? Who on this earth (practically) would be interested in STALE data?
Please help me understand by listing more than one scenario.
In case we would like to choose CONSISTENCY over AVAILABILITY,
the cluster is down. Who on earth (real-life scenarios) would practically accept designing their system to be DOWN in order to preserve CONSISTENCY?
Please list some scenarios.
Won't the majority of us look for high availability no matter what? What are our options? Please enlighten.
If I send you a message on FB and you send one to me, I'd prefer to see the messages in an incorrect order (a message sent at 2pm shown before a message sent at 1pm) rather than not see them at all (an example of AVAILABILITY of messages preferred over read-after-write CONSISTENCY of messages). Another example: if I gather website metrics, I'd rather skip or drop some signals than force my users to wait for a page load while my consistent transaction is stuck.
Keep in mind that consistency doesn't mean STALE data, also data can be inconsistent in different ways(
Financial transactions are a classic example of data that requires consistency over availability. As a bank, I'd rather decline a user's request for a money transfer than accept it and lose the customer's money because the DB is down.
I'd like to point out that the CAP theorem is a high-level concept. There are a lot of ways you can interpret the terms consistency, availability or even partition tolerance, and different businesses have different requirements. Software engineering as a whole, and distributed systems engineering in particular, is about making trade-offs.
An example where you may choose Availability over Consistency is collaborative editing (e.g. Google Docs). It may be perfectly acceptable (and in fact desirable) to allow users to make local modifications to the documents and deal with conflict resolution once the network is restored.
A bank ATM is an example where you'd choose Consistency over Availability. Once the ATM is disconnected from the network, you would not want to allow withdrawals (thus, no Availability). Or you could pick partial Availability and allow deposits or read-only access to your bank statements.
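
The partial-availability ATM can be sketched roughly as follows. This is only an illustration of the trade-off: the `Atm` class, its fields and the way deposits are reconciled are all made up for the example, not how any real bank does it.

```python
class Atm:
    """Toy ATM: refuses withdrawals while partitioned, but still accepts deposits."""

    def __init__(self, balance: int):
        self.balance = balance       # last balance confirmed by the bank
        self.online = True           # False while cut off from the network
        self.pending_deposits = []   # deposits accepted during the partition

    def withdraw(self, amount: int) -> int:
        # Consistency over availability: without the network we cannot
        # know the true balance, so we refuse instead of guessing.
        if not self.online:
            raise ConnectionError("withdrawals are disabled while offline")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance

    def deposit(self, amount: int) -> None:
        # Deposits can only increase the balance, so accepting them
        # offline is safe; they are applied once the network is back.
        if self.online:
            self.balance += amount
        else:
            self.pending_deposits.append(amount)

    def reconnect(self) -> None:
        self.online = True
        self.balance += sum(self.pending_deposits)
        self.pending_deposits.clear()
```

The design choice is per operation rather than per system: the operation that could cause harm with stale data (withdrawal) gives up availability, while the harmless one (deposit) stays available.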

How can I improve response time if the remote server is physically located very far away

I want to know how to physically set up servers in this situation.
Let's assume that my service operates in the USA.
My business is quite successful, so I want to expand it into Asia.
But I don't want a localized service, so I just set up some API servers in Asia that provide the service by calling the API located at headquarters; my main components are still in the USA.
The problem is that my API server in Asia needs to call the headquarters API in the USA, and the response is often slow because of the physical distance.
How can I overcome this?
My plan is to use a CDN for static content, but I have no idea how to improve the API response time problem that originates from physical distance.
If this is a stupid question, please bear with me; I'm quite new to architecture.
Also, how can I set up database replication in this situation?
If I set up a replica in Asia that replicates from the USA, I think replication performance will be quite poor because of the physical distance.
How do Amazon and other global services set this up?
Replication performance can be quite poor. It is important to understand how much of your data is changing so that you can estimate the bandwidth required and understand whether your replication can keep up.
Amazon and other global services deal with this via a combination of replication, edge-caching (CDN), and other methodologies that bring the data closer to the consumer.
As a first step, you also might want to look at just making your API more coarse-grained. The fewer calls you have to make, the higher the performance (as the problem is likely latency, not bandwidth). See if you can batch things up instead of handling them one-at-a-time.
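
A back-of-the-envelope calculation shows why batching helps when latency dominates. The 200 ms round trip and 5 ms of server work per item are assumed figures for illustration, not measurements.

```python
def sequential_ms(items: int, rtt_ms: float, work_ms: float) -> float:
    """One API call per item: every item pays a full round trip."""
    return items * (rtt_ms + work_ms)


def batched_ms(items: int, rtt_ms: float, work_ms: float) -> float:
    """One coarse-grained call: a single round trip plus the work for all items."""
    return rtt_ms + items * work_ms


if __name__ == "__main__":
    # Assumed numbers: 20 items, 200 ms round trip, 5 ms of work per item.
    print(sequential_ms(20, 200, 5))  # 4100
    print(batched_ms(20, 200, 5))     # 300
```

With these assumed numbers the same work drops from about 4 seconds to about 0.3 seconds, and almost all of the saving comes from paying the round trip once instead of twenty times.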
You also can look critically at caching. Instead of making your read-only API calls all the time, introduce some cache-control headers to specify the acceptable age of your requests. A lot of data is very static, things like user data, departments, product-info etc... Some of this data can leverage caching layers to become much more performant.
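
On the Asia side, the effect of an "acceptable age" can be sketched with a small time-based cache in front of the remote call. `TtlCache` and the `fetch` callback are hypothetical names; in practice the maximum age would come from the cache-control headers of the headquarters API rather than being hard-coded.

```python
import time


class TtlCache:
    """Cache results of a slow remote call for up to max_age_s seconds."""

    def __init__(self, fetch, max_age_s: float, clock=time.monotonic):
        self.fetch = fetch          # function that does the slow remote call
        self.max_age_s = max_age_s  # acceptable age of a cached value
        self.clock = clock          # injectable so it can be tested
        self.entries = {}           # key -> (stored_at, value)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.max_age_s:
            return entry[1]         # still fresh: no cross-ocean call
        value = self.fetch(key)     # missing or too old: go to the origin
        self.entries[key] = (now, value)
        return value
```

For mostly static data such as product info, a max age of even a few minutes means that most reads never leave the local region.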
If you want to use AWS and host your main components in a specific region, you could host them yourself on EC2 instances (as the origin server) in the region of your choice and use CloudFront (CDN) to serve the content globally. AWS uses its own high-speed backbone network to reduce latency between geographically distant locations by reducing the number of network hops.
From a caching standpoint, as Rob rightly said, CloudFront applies different caching mechanisms for hot and warm objects (edge caching, regional caching). The origin servers can also send minimum and maximum expiration times in HTTP headers to define the caching TTL.
If, however, you don't want to take advantage of the high-speed backbone network, you should design your endpoints and functionality with latency as a constraint, use appropriate TTLs for cached objects, and define a caching strategy that fits the read/write ratio of your application.

OPC UA server historical access support

We are looking for a solution for collecting data from different SCADA systems. It seems that OPC UA is a good approach for that. Data collection will be done by a single system from multiple SCADA systems over the internet (https). So, we are planning to develop an OPC UA client that can connect to multiple OPC UA servers. Data will be collected at a given interval. The system should be able to cope if the connection between client and server is lost for a period of time. In that case, I assume we need to get the data by looking into historical data. Hence, we need a server that supports HA (Historical Access).
Are there any servers supporting this or do we need to develop our own server implementation?
Or is there a better approach than the one described above?
Any help or hints on this would be appreciated.
How long would you expect the connection to be down for?
While leaning on HA is certainly one way to handle this, I think you'll have a hard time finding any products on the market right now that are actually implementing HA.
Luckily, you can probably deal with this scenario without HA. If you create your subscriptions with a generous lifetime, and create your monitored items with a queue size that, based on the sampling interval, can hold enough data changes to span that lifetime, then upon reconnecting you should receive all of the data changes that happened while the connection between client and server was lost.
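
As a rough sizing sketch: the queue has to hold one entry per sample for the longest outage you want to survive, and the subscription lifetime (lifetime count times publishing interval) has to cover the same period. The function names are made up for this example, and the numbers are worst-case requests; a server may revise both values downward, so check what it actually grants.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def required_queue_size(outage_ms: int, sampling_interval_ms: int) -> int:
    """Worst case: the value changes on every sample during the outage."""
    return ceil_div(outage_ms, sampling_interval_ms)


def required_lifetime_count(outage_ms: int, publishing_interval_ms: int) -> int:
    """Publishing intervals the subscription must survive without a client."""
    return ceil_div(outage_ms, publishing_interval_ms)


if __name__ == "__main__":
    one_hour_ms = 60 * 60 * 1000
    # Assumed settings: 1 s sampling interval, 500 ms publishing interval.
    print(required_queue_size(one_hour_ms, 1000))     # 3600
    print(required_lifetime_count(one_hour_ms, 500))  # 7200
```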
If the connection is expected to be down for days/weeks/months, then this won't work without support for durable subscriptions, introduced in UA 1.03, but then again you're limited to finding a server that supports durable subscriptions. (Durable subscriptions are basically just a way to make the lifetime of target subscriptions much longer than normally allowed and to instruct the server to persist these subscriptions to disk and restore them after, e.g., a server restart.)

Store-and-forward failover solution for ServiceStack web services

I am developing a customer account system for a chain of recycling centers in the Northwest US. One of our key features is that our customers can set up accounts that are credited with their bottle deposit refunds, instead of always disbursing cash. Customers can also drop off bags of recyclables that are processed on-site and credited. Each center runs near capacity and can physically process cans and bottles when offline, so we don't have a lot of leeway for IT infrastructure to shut down everything when the Internet goes out.
Basically, I've been asked to develop a customer account system that will allow credits from a retail center to be posted to accounts, even if telecommunications with our central server break down for a period of hours. This will allow the center to keep processing and crediting customers when the pipes get clogged. Certain transactions, like withdrawals, do NOT need to occur in this situation, since we can't accurately get the customer's current balance.
We are a 100% Windows shop, and the IT manager and network admin don't want to get near anything *nix. Each retail center has an on-premise dedicated Windows Server, so that seems like a logical place to start.
I'm a huge fan of ServiceStack, and the RESTful message-based paradigm seems like it might work. I'd create a "Credit" message and send it to the local server. A message broker there would log the request and attempt to forward that message to the central server, where it is processed. If the central server were down, I would rely on the MQ's reliable messaging protocol to hold on to the message until telecommunications are restored. The overall anticipated volume is 100s to low 1,000s of messages out of each center, so low by modern computing standards.
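
The store-and-forward idea in the paragraph above can be sketched independently of any particular MQ product. The example below is in Python with SQLite purely for illustration (it is not ServiceStack code), and the `Outbox` class and the `send` callback are hypothetical names: credits are written to a durable local table first and removed only after the central server has accepted them.

```python
import json
import sqlite3


class Outbox:
    """Durable local queue: persist first, forward when the link is up."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, message: dict) -> None:
        # The credit is safe on local disk before anything is sent.
        self.db.execute(
            "INSERT INTO outbox (body) VALUES (?)", (json.dumps(message),)
        )
        self.db.commit()

    def forward(self, send) -> int:
        """Try to deliver queued messages in order; stop at the first failure."""
        sent = 0
        rows = self.db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                send(json.loads(body))  # e.g. an HTTP POST to the central server
            except ConnectionError:
                break                   # link is down: keep the rest for later
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

This gives at-least-once delivery: if the process dies between a successful `send` and the `DELETE`, the message is sent again on the next run, so the central server would need to ignore duplicates, for example by a unique message id.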
The Redis MQ Client / Server for ServiceStack looks interesting, but since the Windows Redis server is explicitly labeled "prototype" and "not production quality", there is a 0% chance of being able to leverage it.
So, ultimately the questions are:
Is a reliable messaging system the right type of solution for this problem? Are there other approaches I should consider?
Are there alternatives to Redis that play well with ServiceStack? Is there a "production quality" NoSQL server replacement I can use on Windows?
I've looked briefly at RabbitMQ. Might that be an option? My Googling doesn't show any active integration between it and ServiceStack, so I'm leery of writing something from the ground up.
Ideally the overhead of my solution is low enough that we can perform a synchronous update and return a "current balance" receipt to a customer if everything is working well. Is this realistic?
A production solution for running Redis on Windows is to run redis-server inside a Linux VM on Windows with Vagrant.
There is currently a feature request to add more MQ options to ServiceStack. RabbitMQ is expected to be the next MQ adapter to be supported.
As a follow-up, MS Open Tech has released a "production-ready" native implementation of Redis 2.8.9. GitHub link.

Horizontal scalability for distributed apps, how to achieve that?

I would like to disregard web applications here, because to scale them horizontally, i.e. to use multiple server instances together, it is "sufficient" to just duplicate the server software across the machines and use a sort of router that forwards requests to the "less busy" server machine.
But what if my server application allows users to interact with each other in real time?
If the response to the request of a certain client X depends on the context of a client Y whose connection is managed by another machine, then "inter-machine" communication is needed.
I'd like to know the kind of "design solutions" that people have used in such cases.
For example, the people at Facebook must have already encountered this situation when enabling the chat feature of their social app.
Thank you in advance for any advice.
One solution to achieve that is to use distributed caches like memcache (Facebook also uses that approach).
All the information that is needed on all nodes is then stored in that cache (and in a database if it needs to be permanent), so all nodes can access that information (with very small latency between the nodes).
You should consider some solutions that provide transparent horizontal database scalability and guarantee ACID semantics. There are many solutions that offer this at various levels. The people at Facebook that you reference have solved the problem by accepting eventual consistency, but your question leads me to believe that you can't accept eventual consistency.
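
A minimal sketch of that idea for a chat-like case: each node records in the shared cache which node holds a user's connection, so any node can look up where to route a message. The `SharedCache` class is an in-process stand-in for a real distributed cache such as memcached, and `ChatNode`, the key format and the direct method call between nodes are all assumptions for illustration.

```python
class SharedCache:
    """Stand-in for a distributed cache reachable from every node."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ChatNode:
    """One server instance; it holds the connections of some of the users."""

    def __init__(self, name: str, cache: SharedCache, nodes: dict):
        self.name = name
        self.cache = cache
        self.nodes = nodes   # name -> node, stands in for the network
        self.inboxes = {}    # user -> messages delivered by this node
        nodes[name] = self

    def connect(self, user: str) -> None:
        self.inboxes[user] = []
        # Publish where this user's connection lives.
        self.cache.set(f"user-node:{user}", self.name)

    def send(self, sender: str, recipient: str, text: str) -> bool:
        node_name = self.cache.get(f"user-node:{recipient}")
        if node_name is None:
            return False     # recipient is not connected anywhere
        # In a real system this would be a network call to the other machine.
        self.nodes[node_name].deliver(recipient, f"{sender}: {text}")
        return True

    def deliver(self, user: str, message: str) -> None:
        self.inboxes[user].append(message)
```

Client X on one node can then reach client Y on another without either node knowing the full user list; only the small routing entry is shared.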