How are dead nodes handled in AWS OpenSearch?

I'm trying to understand the right approach to connecting to AWS OpenSearch (a single cluster with multiple data nodes).
To my understanding, since the data nodes sit behind a load balancer (according to this and other AWS docs: https://aws.amazon.com/blogs/database/set-access-control-for-amazon-elasticsearch-service/), we cannot use:
var pool = new StaticConnectionPool(nodes);
and we probably should not use CloudConnectionPool either, as it was originally dedicated to Elastic Cloud and was presumably left in the OpenSearch client by mistake?
Hence we use SingleNodeConnectionPool, and it works, but I've noticed several exceptions indicating that the node had DeadUntil set to a date one hour in the future. Is that expected behavior, given that from the client's perspective this is the only node it knows about?
What is the correct way to connect to an AWS OpenSearch domain that has multiple nodes, and should I be concerned about the DeadUntil property?
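For what it's worth, here is a minimal sketch of the same single-endpoint setup using the Python client (opensearch-py) rather than the .NET client from the question: the client is pointed at the one domain endpoint and AWS distributes requests across the data nodes behind it. The endpoint and credentials below are placeholders, not values from the question.

# Minimal sketch with opensearch-py; the .NET setup in the question is analogous.
from opensearchpy import OpenSearch

host = "my-domain.eu-west-1.es.amazonaws.com"  # hypothetical AWS OpenSearch domain endpoint

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],  # single endpoint; AWS load-balances across the data nodes
    http_auth=("master-user", "master-password"),  # placeholder; use whatever auth your domain is configured for
    use_ssl=True,
    verify_certs=True,
)

print(client.info())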

I can't find a CloudWatch metric in the Grafana UI query editor/builder

I'm trying to create a Grafana dashboard that will reflect my AWS RDS cluster metrics.
For simplicity I've chosen CloudWatch as a data source, and it works well for showing the 'direct' metrics from the RDS cluster.
The problem is that we've switched to RDS Proxy due to the high number of connections we are required to support.
Now I'm adjusting my dashboard to reflect a few metrics that are missing; the most important is the number of actual connections, which the AWS CloudWatch console presents with this query:
SELECT AVG(DatabaseConnections)
FROM SCHEMA("AWS/RDS", ProxyName,Target,TargetGroup)
WHERE Target = 'db:my-db-1'
AND ProxyName = 'my-db-rds-proxy'
AND TargetGroup = 'default'
The problem is that I can't find it anywhere in the Grafana CloudWatch query editor.
The only metric with "connections" is the standard DatabaseConnections, which represents the 'direct' connections to the RDS cluster, not the connections to the RDS Proxy.
Any ideas?
The UI editor is generated from a hardcoded list of metrics, which may not contain all metrics and dimensions (especially if they were added recently), so in that case the UI doesn't offer them in the select box.
But that is not a problem, because that select box is not a standard select box. It is an input where you can write your own metric and dimension names: just click there, type what you need, and hit Enter to add it (the same applies to dimensions).
Pro tip: don't use the UI query builder (that's for beginners); switch to Code and write your queries directly (the UI builder generates that query under the hood anyway).
It would be nice if you created a Grafana PR to add the metrics and dimensions missing from the UI builder to metrics.go.
So, for whoever ends up here: you should use ClientConnections with ProxyName as the dimension (which I hadn't set initially).
I was using an old Grafana version (7.3.5), which didn't have it built in.
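For reference, the corresponding Metrics Insights query (written in Code mode) would look roughly like the following, reusing the proxy name from the question; the exact dimension sets available in your account may differ:
SELECT AVG(ClientConnections)
FROM SCHEMA("AWS/RDS", ProxyName)
WHERE ProxyName = 'my-db-rds-proxy'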

Manually spawn stateful pod instances

I'm working on a project where I need to spawn 1 instance per user (customer).
I figured it makes sense to create some sort of manager to handle that and host it somewhere. Kubernetes seems like a good choice since it can be hosted virtually anywhere and it will automate a lot of things (e.g. ensuring instances keep running on failure).
All entities are in Python and have a corresponding Flask API.
                         InstanceManager             Instance (user1)
                          .-----------.               .--------.
POST /instances/user3 --> |           | ------------- |        |---vol1
                          |           |               '--------'
                          |           | -----.
                          '-----------'       \       Instance (user2)
                                                \     .--------.
                                                 '--- |        |---vol2
                                                      '--------'
Now I can't seem to figure out how to translate this into Kubernetes.
My thinking:
Instance is a StatefulSet since I want the data to be maintained through restarts.
InstanceManager is a Service with a database attached to track user-to-instance IP mappings (for health checks, etc.).
I'm pretty lost on how to make InstanceManager spawn a new instance on an incoming POST request. I did a lot of digging (Operators, namespaces, etc.), but nothing seems straightforward. Namely, I don't even seem to be able to do that via kubectl. Am I completely misunderstanding how Kubernetes works?
I've made some progress and thought I'd share.
Essentially, you need to interact with the Kubernetes REST API directly instead of applying a static YAML file or using kubectl, ideally through one of the numerous client libraries out there.
In our case there's two options:
Create a namespace per user and then a service in that namespace
Create a new service with a unique name for each user
The first approach seems more sensible since using namespaces gives a lot of other benefits (network control, resource allocation, etc.).
The service itself can point to a StatefulSet or a pod, depending on the situation.
There's another gotcha (and possibly more): namespaces, pod names, etc. all need to conform to RFC 1123. So for namespaces you can't simply use email addresses or even Base64; you'll need something like user-100 and a mapping table back to the actual user.
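To make that concrete, here is a rough sketch using the official Python client (the kubernetes package), creating a per-user namespace plus a Service and a single-replica StatefulSet. The image name and storage size are placeholders, and error handling (e.g. "namespace already exists") is omitted.

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster

def provision_instance(user_id: int) -> None:
    """Create an RFC 1123-safe namespace for the user, plus a Service and a 1-replica StatefulSet."""
    ns = f"user-{user_id}"  # e.g. user-100; map back to the real user in your own database
    core = client.CoreV1Api()
    apps = client.AppsV1Api()

    # Namespace per user
    core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))

    # Stable network identity for the instance
    core.create_namespaced_service(ns, client.V1Service(
        metadata=client.V1ObjectMeta(name="instance"),
        spec=client.V1ServiceSpec(
            selector={"app": "instance"},
            ports=[client.V1ServicePort(port=80, target_port=5000)],
        ),
    ))

    # One pod with its own persistent volume, restarted by Kubernetes on failure
    apps.create_namespaced_stateful_set(ns, client.V1StatefulSet(
        metadata=client.V1ObjectMeta(name="instance"),
        spec=client.V1StatefulSetSpec(
            service_name="instance",
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "instance"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "instance"}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="instance",
                    image="registry.example.com/instance:latest",  # hypothetical Flask app image
                    ports=[client.V1ContainerPort(container_port=5000)],
                )]),
            ),
            volume_claim_templates=[client.V1PersistentVolumeClaim(
                metadata=client.V1ObjectMeta(name="data"),
                spec=client.V1PersistentVolumeClaimSpec(
                    access_modes=["ReadWriteOnce"],
                    resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
                ),
            )],
        ),
    ))

The InstanceManager's Flask handler for the incoming POST would then just call provision_instance and record the user-to-namespace mapping in its database.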

Query node-label topology from Yarn via REST API [MapR 6.1/Hadoop-2.7]

There are Java and CLI interfaces to query the YARN ResourceManager for node-to-node-label (and inverse) mappings. Is there a way to do this via the REST API as well?
An initial search of the RM API revealed only node-label-based job submission as an option.
Sadly, that is actually broken in MapR Hadoop (6.1 as of 6/6/19), so my code has to work around it by implementing the correct scheduling itself. This works (barely; more broken APIs here as well) using the YarnClient Java API.
But since I want to schedule jobs against different ResourceManagers at the same time, behind firewalls, the REST API is the most compelling option, and the YarnClient API's RPC backend can't easily be carried across them.
My current worst-case solution would be to parse the YARN-WebUI in some way.
The only solution I found so far:
Request /ws/v1/cluster/nodes - this gets you all nodes.
FlatMap/distinct over each node's nodeLabels if you just need the list of node labels, or filter by nodeLabel if you need all nodes for a given label.
This does mean that you always have to query all nodes and then sort/filter/group by node labels, which is a lot of client-side work. But apparently there's no GetNodesToLabel or even GetClusterNodeLabels to help us out.
I assume getLabelsToNodes is just a client-side implementation, as the protocol doesn't define that API, so it's out of the question for REST unless it gets implemented in the web service.
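A rough sketch of that client-side workaround in Python; the ResourceManager address is a placeholder, and the per-node nodeLabels field is assumed to appear in the JSON as described above.

import requests

RM = "http://resourcemanager.example.com:8088"  # placeholder ResourceManager address

def _all_nodes():
    """Fetch every node from /ws/v1/cluster/nodes."""
    resp = requests.get(f"{RM}/ws/v1/cluster/nodes")
    resp.raise_for_status()
    return resp.json().get("nodes", {}).get("node", [])

def cluster_node_labels():
    """FlatMap/distinct over each node's nodeLabels."""
    return {label for node in _all_nodes() for label in node.get("nodeLabels", [])}

def nodes_for_label(label):
    """Filter the full node list down to the nodes carrying a given label."""
    return [node["id"] for node in _all_nodes() if label in node.get("nodeLabels", [])]

nodes_for_label can then feed whatever label-aware scheduling you implement on top of the broken node-label-based submission.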

Dedicate a node to a stream - Security rules

Can anyone let me know how to show a stream only on a specific node?
I have a 2-node cluster, and I would like to dedicate RIM01 specifically to Stream1 and RIM02 to Stream2, meaning any request to those streams, or to apps in those streams, should go to their respective nodes.
So if I go to RIM01, Stream2 should be hidden, etc.
Central node
RIM02 -- Repository + Engine
RIM03 -- Repository + Engine + Scheduler
I tried a lot of security rules, like:
Filter: ServerNodeConfiguration_*, Stream_*
(node.#NodeUse="dev") and (node.#NodeType=stream.#StreamType and !resource.stream.Empty())
or
Filter: ServerNodeConfiguration_*, Stream_*
((resource.resourcetype = "Nodes" and resource.name="RIM01")) and ((resource.name="test"))
but none of them work :/
Thanks
So, at present, load balancing in Qlik Sense applies to Apps, not Streams. Load Balancing routes apps to servers, whereas security rules govern stream visibility. And, unfortunately, there is not a clean mechanism to use node meta-data in security rules. All in all, there isn't a solution for hiding a stream on a given server.
I have the same issue. You can designate apps as only readable on a single node, so depending on how your users' stream rights are configured, some users may see an empty stream on the node where the app cannot be accessed.
There's some interesting stuff happening with the multi-cloud capability, where the concept of streams becomes collections, which gives a lot more flexibility around this type of thing. Alas, the QEFE capability has only just arrived with June 2018, and access is limited to certain use cases/customers.

Scala phantom dsl hosts

Taken from
http://outworkers.com/blog/post/a-series-on-phantom-part-1-getting-started-with-phantom
I'm trying to connect to a Cassandra cluster that has multiple nodes like this:
object Defaults {
  val hosts = Seq("Cassnode1.company.com", "Cassnode2.company.com", "Cassnode3.company.com")
  val Connector = ContactPoints(hosts).keySpace("whatever")
}
If for some reason, one of the nodes does not exist, I get:
Caused by: java.lang.IllegalArgumentException: Cassnode3.company.com: unknown error
If I remove this node from the hosts Seq, everything works fine.
I am using phantom-dsl version 1.28.12, and I was wondering whether this is the expected behavior, as I assumed that whenever one of the listed hosts does not exist or is unavailable, the application would use the remaining ones.
Is there a way to test connectivity to the nodes before passing the list to the ContactPoints?
Thanks!
There isn't; the whole point of ContactPoints, which just leverages the underlying ClusterBuilder, is to deal with that kind of thing for you. You can also pass in an error-handling function to deal with some of the issues, which may make things easier.
My gut feeling is that the rest of the nodes have some kind of IP mapping in your /etc/hosts, but you are missing the one for Cassnode3.company.com. Remember, they all need to resolve to an IP address; otherwise they are no good to the ClusterBuilder.
I would strongly recommend upgrading to a 2.1.3 version of phantom, but in this particular case the culprit is almost certainly your local dev setup, which just needs an IP mapping for that third hostname.
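For example, an /etc/hosts block along these lines (the addresses are placeholders) would let all three contact points resolve:
# /etc/hosts -- hypothetical addresses, for illustration only
10.0.0.1    Cassnode1.company.com
10.0.0.2    Cassnode2.company.com
10.0.0.3    Cassnode3.company.com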