How does Google Load Balancer maintain routing consistency? - deployment

I have two Google Cloud instances running in different regions (West and East). Each instance has its own database. I am using Google Load Balancer to route traffic based on the client's IP address (this is what Google Load Balancer does internally for Network Load Balancing).
For example, Bob sends a request from the East region, and GLB routes it to the East region node only. Similarly, Dave sends a request from the West region, and GLB routes it to the West region node.
Scenarios:
1. Bob just signed up, and a new entry was added to the East region database.
2. Bob tries to fetch his profile, but the request somehow went to the West region (Bob is now using a VPN), where there is no information about him.
Is there a way I can customize GLB? If so, I could solve this problem by applying consistent hashing at the load balancer (using the userId as the hash key), which would ensure that requests coming from Bob always go to the East region.

You can use the HTTP Load Balancer options Session Affinity and Client IP Affinity.
There are subtle issues with any method, so read the documentation carefully. The biggest issue is clients behind a NAT (airport, hotel, Starbucks, etc.): their public IP address is the same for every client behind the NAT, so with Client IP based affinity all of their traffic will go to the same backend. I recommend using cookies.
Session affinity uses the client IP or a generated cookie to make traffic decisions, which keeps traffic routed to the same backend.
Session Affinity
Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address.
Using Client IP Affinity
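As an illustration of the cookie-based option, session affinity can be enabled on an existing backend service with the gcloud CLI (the service name below is a placeholder; check the current docs for your load balancer type):

```shell
# Sketch: switch an HTTP(S) LB backend service to cookie-based affinity.
# "my-backend-service" is a placeholder name.
gcloud compute backend-services update my-backend-service \
    --global \
    --session-affinity=GENERATED_COOKIE \
    --affinity-cookie-ttl=3600   # seconds the affinity cookie stays valid
```

With GENERATED_COOKIE, the load balancer sets a cookie on the first response and uses it to pin subsequent requests to the same backend, which sidesteps the shared-NAT problem of Client IP affinity.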

Related

What is with the recommendation to use a `www` domain on GitHub Pages?

If you look at the documentation for GitHub Pages, it has a pretty strong recommendation to use a www subdomain for your custom domain site hosted on GitHub Pages.
From here: https://help.github.com/en/articles/about-supported-custom-domains#www-subdomains
A www subdomain is the most commonly used type of subdomain, in which the www stands for World Wide Web. For example, www.example.com is a www subdomain because it contains the subdomain part www.
We strongly recommend that you use a www subdomain for these reasons:
It gives your GitHub Pages site the benefit of our Content Delivery Network.
It is more stable because it is not affected by changes to the IP addresses of GitHub's servers.
Pages will load significantly faster because Denial of Service attack protection can be implemented more efficiently.
Does this mean that if I do not use a www subdomain I will not get the benefits of a CDN or DDoS attack protection?
What is the technical reason why there is a difference between a www and non-www domain here?
The difference lies in how you point the site to GitHub's servers in DNS.
The simplest use of DNS is to point a domain name, at whatever level, at an IP address, using an A record. The same address will be used by all users, and can be changed only by the owner of the "zone" where the A record was added - in this case you, configuring the zone of your custom domain.
A smarter way is to alias a particular domain name to a different zone - in this case one managed by GitHub - using a CNAME record. The owners of that zone can then change the IP address as needed, and can even give different answers to different users based on their location (which is where the CDN reference comes from).
The key restriction, however, is that you can't use a CNAME at the root (apex) of a zone. See this Server Fault question for the technical details.
If you own "example.com", you can point an A record for the root of that domain at one GitHub IP address (or a few, selected essentially at random by visitors), but you will give GitHub more freedom to route the traffic if you instead point a CNAME for a subdomain, like "www.example.com", at GitHub.
Some DNS providers offer tools to work around this limitation by adding a fake record for the root of the domain that looks like a CNAME, but may be labelled differently (e.g. ANAME, ALIAS). When asked for the root A record, the DNS provider looks up the actual A records for that domain and returns those. This is useful for records which change over time, but because the address is being looked up by your DNS provider not the actual visitor, it may still prevent GitHub serving the best address for that visitor.
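In zone-file terms, the two setups look roughly like this (the IP address and hostnames are illustrative only; check GitHub's documentation for the current Pages addresses):

```
; Apex: a fixed A record that you maintain yourself (example IP)
example.com.      3600  IN  A      185.199.108.153
; Subdomain: a CNAME that delegates the answer to GitHub
www.example.com.  3600  IN  CNAME  username.github.io.
```

With the CNAME in place, GitHub's own DNS decides which address to return for each visitor, which is what enables the CDN and DDoS-mitigation behavior described in their documentation.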
DNS does not provide a reliable mechanism for aliasing apex/root records (e.g. example.com), but it does for subdomains (CNAME). In practice this means that while you can point an A record at a single IP address corresponding to a node on GitHub's infrastructure, GitHub isn't able to route DNS lookups for your apex record to other IP addresses that are closer to the request (CDN) or use DNS to mitigate the effects of a (D)DoS attack.
Some DNS providers do offer synthetic records (ALIAS, ANAME) that mimic the behavior of a CNAME record on apex domains (e.g. DNSimple), but they're not widely available, they introduce additional complexity and latency, and they don't give GitHub the same geographic routing opportunities.

Logging incoming requests at a Pod in Kubernetes

In my cluster, I am interested in service time at the Pod level (the time difference between request arrival and departure), so I would like to log incoming and outgoing requests (IP address or request ID, plus an event timestamp).
I am aware of the ability to define custom logs using containers (from Logging Architecture), but I'm looking for a more dynamic solution with smaller overhead (ideally exposed over the REST API).

Google Cloud Storage static IP address

Is it possible to obtain a static IP address for a Google Cloud Storage bucket for use with DNS? I wish to host it at mydomain.com, and because I also have e-mail at mydomain.com, I cannot use a DNS CNAME record -- I need to use an IP address and a DNS A record.
You can, but doing so requires using Google Cloud Load Balancer: https://cloud.google.com/compute/docs/load-balancing/http/adding-a-backend-bucket-to-content-based-load-balancing. The upside of this approach is that it comes with a number of useful features, like mapping a collection of buckets and compute resources to a single domain, as well as the static IP address you want. The downside is that there's additional cost and complexity.
I recommend just using a subdomain and a CNAME record, if you don't need any of the other features.
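If you do go the load-balancer route, the core steps can be sketched with the gcloud CLI (all resource names below are placeholders; the full setup also needs a URL map, target proxy, and forwarding rule as shown in the linked docs):

```shell
# Sketch: reserve a global static IP and wrap the bucket as an LB backend.
# "my-static-ip", "my-backend-bucket", and "my-bucket" are placeholder names.
gcloud compute addresses create my-static-ip --global
gcloud compute backend-buckets create my-backend-bucket \
    --gcs-bucket-name=my-bucket
```

The reserved address can then be used in your DNS A record, while the backend bucket serves the content behind it.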
Check the Google documentation; you can manage static IP addresses on the instance page, in the Networking section.

What DNS type should I use to make my REST services available through my domain?

I have some REST services available on an IP+port address. Now I want to configure a DNS entry to make them available through my domain. I've tried a masked redirect, but once I do, I can't access the REST services through the redirected address. What type of DNS entry should I use?
For ordinary records, DNS works only at the IP address level: its sole concern is mapping domain names to IP addresses, and there is no way to specify a port number.
If you have a server located at 12.34.56.78, you can use an A record to point to it, but the port cannot be expressed in that record.
Edited to add
While RFC 2782, "A DNS RR for specifying the location of services (DNS SRV)", does provide a method to specify port numbers via SRV records, the proposal to use SRV resolution for HTTP was ultimately allowed to expire and was never renewed.
Specifically the proposal was rejected because it could break too many things in the HTTP layer.
A message was posted to the IETF message boards explaining the decision.
I was proposing it, but after long discussions in the maillist I've
understood that mandating DNS SRV in WS clients would break too much
assumptions in HTTP world (which commonly just sees above HTTP layer
and not below).
The existence of HTTP proxies is also a big handicap since those
proxies should be upgraded/modified in order to perform DNS SRV
resolution just in case the HTTP request is a WebSocket handshake.
This last argument is enough to not mandate SRV resolution.
(copied from another answer)
There actually is a mechanism called DNS Service Discovery, originally specified in RFC 2052 (obsoleted by RFC 2782). It allows autodiscovery of services through special SRV (type 33) DNS entries, specifying ports and weights (i.e. preferences) for named services. There were some considerations for extending this to HTTP URIs, but the respective drafts were ultimately allowed to expire before they could reach RFC status. Some of the reasons are mentioned in section 2 of the latter draft.
While SRV records are seeing active usage in other protocols, HTTP client support for this is quite rare. So if you want to provide your service through a dedicated, non-standard port, your best bet is to specify it in the URL as specified in RFC 3986, section 3.
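For reference, an SRV record (per RFC 2782) carries the port alongside a priority and weight; a hypothetical entry for an API service might look like this (all names are illustrative):

```
; _service._proto.name   TTL   class  SRV  priority  weight  port  target
_api._tcp.example.com.   3600  IN     SRV  10        5       8443  host1.example.com.
```

Since HTTP clients generally ignore SRV records, putting the explicit port in the URL (https://example.com:8443/) remains the practical option.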

Tracking requests paths to servers behind a load balancer

Suppose we have two servers A and B behind some load balancer that distributes requests between them somehow. What is the best practice for tracking which server processed a request? Suppose we have a REST API with one endpoint, GET /ping. Is it a good idea to include the host information in response headers, for example?
What we usually do is configure the LB to include a header only if the client requested it.
When you craft your /ping query, also add a header known only to you, like "X-Debug-Me: true". When this header is present, either your LB or your server can insert its real hostname into whatever response header you want.
Baptiste
are you attempting to track this at the LB or at the origin/API servers?
shouldn't the host information already be in the header? is the LB acting as a reverse-proxy and replacing the requesting hostname with its own hostname?
i would agree with #baptiste that if you need to track this type of information a custom header is the best way to do it.
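As a concrete illustration of the custom-header suggestion, here is a minimal HAProxy-style sketch (backend and server names, and the header name, are assumptions) in which every response is tagged with the name of the server that handled it; gating this on a client-supplied header like "X-Debug-Me" is also possible, depending on your LB's capabilities:

```
backend api_servers
    server web-a 10.0.0.1:8080 check
    server web-b 10.0.0.2:8080 check
    # %s is the log-format variable for the name of the serving server
    http-response set-header X-Served-By %s
```

The client can then inspect the X-Served-By response header on GET /ping to see which backend answered.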