Dedicate a node to a stream - Security rules - qliksense

Can anyone let me know how to show a stream only in a specific node
i have a 2 nodes cluster.. and i would like to dedicate RIM01 specific to Stream1. RIM02 to Steam2. Meaning any request to that streams or apps in that stream should go to there nodes
So, if a go to RIM01 the Stream2 should be hidden etc...
Central node
RIM02 -- Repository + Engine
RIM03 -- Repository + Engine + Scheduler
i tried lot of security rules like
Filter : ServerNodeConfiguration_,Stream_
(node.#NodeUse="dev") and (node.#NodeType=stream.#StreamType and !resource.stream.Empty())
or
Filter : ServerNodeConfiguration_,Stream_
((resource.resourcetype = "Nodes" and resource.name="RIM01")) and ((resource.name="test"))
but none of them work :/
Thanks

So, at present, load balancing in Qlik Sense applies to Apps, not Streams. Load Balancing routes apps to servers, whereas security rules govern stream visibility. And, unfortunately, there is not a clean mechanism to use node meta-data in security rules. All in all, there isn't a solution for hiding a stream on a given server.

I have the same issue, you can designate the apps are only readable on single node, so depending on how your user stream rights are configured some users may see an empty stream on the node where the app cannot be accessed.
There's some interesting stuff happening with the multi cloud capability where the concept of streams is now collections, which gives lots more flexibility around this type of thing. Alas QEFE capability is only just come with June 2018, and access is limited to certain use cases / customers.

Related

How the dead nodes are handled in AWS OpenSearch?

Trying to understand what is the right approach to connect to AWS OpenSearch (single cluster, multiple data nodes).
To my understanding, as long as data nodes are behind the load balancer (according to this and other AWS docs: https://aws.amazon.com/blogs/database/set-access-control-for-amazon-elasticsearch-service/), we can not use:
var pool = new StaticConnectionPool(nodes);
and we probably should not use CloudConnectionPool - as originally it was dedicated to elastic search cloud and was left in open search client by mistake?
Hence we use SingleNodeConnectionPool and it works, but I've noticed several exceptions, which indicated that node had DeadUntil set to date one hour in advance - so I was wondering if that is expected behavior, as from client's perspective that is the only node it knows about?
What is correct way to connect to AWS OpenSearch that has multiple nodes and should I be concerned about DeadUntil property?

How to effectively use Worker, WorkflowClient

Product Use Case - Our product has a typical use case where we will be having n no of users. Each user will have n no of workflows and each workflow can be run at any time(n of time).
I hope this is a typical use case of any workflow product.
can I use a domain to differentiate users (I mean to say that creating a domain per user)?
Can I create one WorkflowClient per user to serve all his workflow executions? Or for each request should I need to create one WorkflowClient? which one is a recommended approach?
What is the recommended approach in creating Worker objects to poll task list?
Please don't mistake me If I have asked anything meaningless
can I use a domain to differentiate users (I mean to say that creating a domain per user)?
Yes, especially when these users are working in different teams or product, using different domain will avoid workflowName/IDs conflicting each others, and also assign independent number of quotas for managing traffic.
Can I create one WorkflowClient per user to serve all his workflow executions? Or for each request should I need to create one WorkflowClient? which one is a recommended approach?
Use one WorkflowClient for each domain, but let all WorkflowClients on the same instance share the same TChannelService to save the TCP connection.
I would start with a single namespace (domain) for all users. Unless your users directly operate their workflow implementations it doesn't buy you much to use multiple namespaces.

Query node-label topology from Yarn via REST API [MapR 6.1/Hadoop-2.7]

There is a Java and CLI-interface to query Yarn RM for node-to-nodelabel (and inverse) mappings. Is there a way to do this via the REST-API as well?
An initial RM-API search revealed only node-label based job submissions as an option.
Sadly that is actually broken in MapR-Hadoop (6.1 as of 6/6/19), so my code has to work around that, by implementing the correct scheduling itself. This works (barely - more broken APIs here as well) using the YarnClient Java API.
But as I want to schedule jobs against different resource managers at the same time, behind firewalls, the REST-API is the most compelling option to achieve this, and the YarnClient API's RPC backend can't be easily transported.
My current worst-case solution would be to parse the YARN-WebUI in some way.
The only solution I found so far:
Request /ws/v1/cluster/nodes - this gets you all nodes.
FlatMap/Distinct on each node's nodeLabels, if you need just the list of node labels. Filter by nodeLabel, if you need all nodes for a specified label.
This does mean, that you always have to query all nodes, then sort/filter/arrange by NodeLabels, which is a lot of client-side magic. But apparently there's no GetNodesToLabel or even GetClusterNodeLabels to help us out.
I assume getLabelsToNodes is just a client-side implementation, as the protocol doesn't define the API, so that's right out the window for REST, unless implemented in the WebService.

is it possible to copyObject from one cloud object storage instance to another. The buckets are in different regions

I would like to use the node sdk to implement a backup and restore mechanism between 2 instances of Cloud Object Storage. I have added a service ID to the instances and added a permissions for the service id to access the buckets present in the instance i want to write to. The buckets will be in different regions. I have tried a variety of endpoints both legacy and non-legacy private and public to achieve this but i usually get Access Denied.
Is what I am trying to do possible with the sdk? if so can someone point me in the right direction?
var config = {
"apiKeyId": "xxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxx",
"endpoint": "s3.eu-gb.objectstorage.softlayer.net",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxx:xxxxxxxxxxx::",
"iam_apikey_name": "auto-generated-apikey-xxxxxxxxxxxxxxxxxxxxxx",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/0xxxxxxxxxxxxxxxxxxxx::serviceid:ServiceIdxxxxxxxxxxxxxxxxxxxxxx",
"serviceInstanceId": "crn:v1:bluemix:public:cloud-object-storage:global:a/xxxxxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxx::",
"ibmAuthEndpoint": "iam.cloud.ibm.com/oidc/token"
}
This should work as long as you are able to properly grant the requesting user access to be able to read the source of the put-copy, so long as you are not using KeyProtect based keys.
So the breakdown here is a bit confusing due to some unintuitive terminology.
A service instance is a collection of buckets. The primary reason for having multiple instances of COS is to have more granularity in your billing, as you'll get a separate line item for each instance. The term is a bit misleading, however, because COS is a true multi-tenant system - you aren't actually provisioning an instance of COS, you're provisioning a sort of sub-account within the existing system.
A bucket is used to segment your data into different storage locations or storage classes. Other behavior, like CORS, archiving, or retention, acts on the bucket level as well. You don't want to segment something that you expect to scale (like customer data) across separate buckets, as there's a limit of ~1k buckets in an instance. IBM Cloud IAM treats buckets as 'resources' and are subject to IAM policies.
Instead, data that doesn't need to be segregated by location or class, and that you expect to be subject to the same CORS, lifecycle, retention, or IAM policies can be separated by prefix. This means a bunch of similar objects share a path, like foo/bar and foo/bas have the same prefix foo/. This helps with listing and organization but doesn't provide granular access control or any other sort of policy-esque functionality.
Now, to your question, the answer is both yes and no. If the buckets are in the same instance then no problem. Bucket names are unique, so as long as there isn't any secondary managed encryption (eg Key Protect) there's no problem copying across buckets, even if they span regions. Keep in mind, however, that large objects will take time to copy, and COS's strong consistency might lead to situations where the operation may not return a response until it's completed. Copying across instances is not currently supported.

How to merge/consolidate responses from multiple RESTful microservices?

Let's say there are two (or more) RESTful microservices serving JSON. Service (A) stores user information (name, login, password, etc) and service (B) stores messages to/from that user (e.g. sender_id, subject, body, rcpt_ids).
Service (A) on /profile/{user_id} may respond with:
{id: 1, name:'Bob'}
{id: 2, name:'Alice'}
{id: 3, name:'Sue'}
and so on
Service (B) responding at /user/{user_id}/messages returns a list of messages destined for that {user_id} like so:
{id: 1, subj:'Hey', body:'Lorem ipsum', sender_id: 2, rcpt_ids: [1,3]},
{id: 2, subj:'Test', body:'blah blah', sender_id: 3, rcpt_ids: [1]}
How does the client application consuming these services handle putting the message listing together such that names are shown instead of sender/rcpt ids?
Method 1: Pull the list of messages, then start pulling profile info for each id listed in sender_id and rcpt_ids? That may require 100's of requests and could take a while. Rather naive and inefficient and may not scale with complex apps???
Method 2: Pull the list of messages, extract all user ids and make bulk request for all relevant users separately... this assumes such service endpoint exists. There is still delay between getting message listing, extracting user ids, sending request for bulk user info, and then awaiting for bulk user info response.
Ideally I want to serve out a complete response set in one go (messages and user info). My research brings me to merging of responses at service layer... a.k.a. Method 3: API Gateway technique.
But how does one even implement this?
I can obtain list of messages, extract user ids, make a call behind the scenes and obtain users data, merge result sets, then serve this final result up... This works ok with 2 services behind the scenes... But what if the message listing depends on more services... What if I needed to query multiple services behind the scenes, further parse responses of these, query more services based on secondary (tertiary?) results, and then finally merge... where does this madness stop? How does this affect response times?
And I've now effectively created another "client" that combines all microservice responses into one mega-response... which is no different that Method 1 above... except at server level.
Is that how it's done in the "real world"? Any insights? Are there any open source projects that are built on such API Gateway architecture I could examine?
The solution which we used for such problem was denormalization of data and events for updating.
Basically, a microservice has a subset of data it requires from other microservices beforehand so that it doesn't have to call them at run time. This data is managed through events. Other microservices when updated, fire an event with id as a context which can be consumed by any microservice which have any interest in it. This way the data remain in sync (of course it requires some form of failure mechanism for events). This seems lots of work but helps us with any future decisions regarding consolidation of data from different microservices. Our microservice will always have all data available locally for it process any request without synchronous dependency on other services
In your case i.e. for showing names with a message, you can keep an extra property for names in Service(B). So whenever a name update in Service(A) it will fire an update event with id for the updated name. The Service(B) then gets consumes the event, fetches relevant data from Service(A) and updates its database. This way even if Service(A) is down Service(B) will function, albeit with some stale data which will eventually be consistent when Service(A) comes up and you will always have some name to be shown on UI.
https://enterprisecraftsmanship.com/2017/07/05/how-to-request-information-from-multiple-microservices/
You might want to perform response aggregation strategies on your API gateway. I've written an article on how to perform this on ASP.net Core and Ocelot, but there should be a counter-part for other API gateway technologies:
https://www.pogsdotnet.com/2018/09/api-gateway-response-aggregation-with.html
You need to write another service called Aggregator which will internally call both services and get the response and merge/filter them and return the desired result. This can be easily achieved in non-blocking using Mono/Flux in Spring Reactive.
An API Gateway often does API composition.
But this is typical engineering problem where you have microservices which is implementing databases per service pattern.
The API Composition and Command Query Responsibility Segregation (CQRS) pattern are useful ways to implement queries .
Ideally I want to serve out a complete response set in one go
(messages and user info).
The problem you've described is what Facebook realized years ago in which they decided to tackle that by creating an open source specification called GraphQL.
But how does one even implement this?
It is already implemented in various popular programming languages and maybe you can give it a try in the programming language of your choice.