Updating AWS Elasticsearch cluster settings - aws-elasticsearch

By default, Elasticsearch allows a maximum of 500 open scroll contexts, but I need to increase this limit. Updating "search.max_open_scroll_context" works without problems on my local machine, but AWS Elasticsearch does not allow the change.
When I try to update the setting using the answer given in the thread configure-search-max-open-scroll-context, the response is: {"Message":"Your request: '/_cluster/settings' payload is not allowed."} Does anyone have a solution for AWS Elasticsearch, or has anyone faced a similar problem?

AWS ES restricts this setting on the customer side.
You need to reach out to the AWS Support team for this. Let them know the value of "search.max_open_scroll_context" that you are looking for and they will update it from the backend.

Here is the link to the operations that AWS supports on Elasticsearch.
AWS currently doesn't support updating "search.max_open_scroll_context". You can contact AWS Support to have the scroll context limit increased. Alternatively, you can use the Search After API (search_after) instead of scroll.
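A minimal sketch of the Search After alternative, in case it helps. It assumes the index has a sortable field plus a unique tiebreaker field; the names "timestamp" and "doc_id" are hypothetical, and `search` is a hypothetical callable that sends the body to your index's `_search` endpoint and returns the parsed JSON response. No scroll context is kept open on the cluster, because each request carries the sort values of the last hit from the previous page.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical sort: any sortable field plus a unique tiebreaker field.
DEFAULT_SORT = [{"timestamp": "asc"}, {"doc_id": "asc"}]


def build_search_after_body(
    sort: List[Dict[str, str]],
    page_size: int,
    after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a _search request body that pages with search_after."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        "sort": sort,
    }
    if after is not None:
        # Sort values of the last hit of the previous page.
        body["search_after"] = after
    return body


def iterate_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    sort: List[Dict[str, str]],
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, one page at a time, without opening a scroll."""
    after: Optional[List[Any]] = None
    while True:
        response = search(build_search_after_body(sort, page_size, after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit of a sorted search carries its sort values in "sort".
        after = hits[-1]["sort"]
```

With a real client, `search` could be a small wrapper around an HTTP POST to the `_search` endpoint of the index. Unlike scroll, this does not read from a frozen snapshot, so documents indexed while you are paging may or may not show up in the results.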

Google Cloud Spanner real time Change Data Capture to PubSub/Kafka through Cloud Data Fusion or Others

I would like to achieve a real time change data capture (log-based preferred) pipeline from Google Cloud Spanner to PubSub/Kafka for my downstream real time applications. Could you please let me know if there is a great and cost-effective way to achieve that? I will appreciate any advice and recommendations.
In addition, I noticed that Google's Cloud Data Fusion can replicate in real time from MySQL/PostgreSQL to Cloud Spanner, but I did not find a way to go from Cloud Spanner to PubSub/Kafka in real time.
I also found two other approaches, listed here for any comments or suggestions:
Use Debezium, a log-based change data capture Kafka connector: https://cloud.google.com/architecture/capturing-change-logs-with-debezium#deploying_debezium_on_gke_on_google_cloud
Create a polling service (which may miss some data) to poll data from Cloud Spanner: https://cloud.google.com/architecture/deploying-event-sourced-systems-with-cloud-spanner
If you have any suggestion or comment on this, I will be really grateful.
There's an open-source implementation of a polling service for Cloud Spanner that can also automatically push changes to PubSub here: https://github.com/cloudspannerecosystem/spanner-change-watcher
It is however not log-based. It has some inherent limitations:
It can miss updates if the same record is updated twice within the polling interval. In that case, only the last value will be reported.
It only supports soft deletes.
You could have a look at the samples to see if it is something that might suit your needs at least to some degree: https://github.com/cloudspannerecosystem/spanner-change-watcher/tree/master/samples
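To illustrate the first limitation, here is a small in-memory sketch of timestamp-based polling. It is not code from spanner-change-watcher and does not touch Spanner; `FakeTable`, `Poller` and the integer `commit_ts` are hypothetical stand-ins for a table with a commit timestamp column and a poller that remembers the last timestamp it has seen.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Row:
    value: str
    commit_ts: int  # stand-in for a commit timestamp column


class FakeTable:
    """In-memory stand-in for a table that keeps only the latest row version."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}
        self.clock = 0

    def upsert(self, key: str, value: str) -> None:
        self.clock += 1
        # The previous version of the row is overwritten, not kept.
        self.rows[key] = Row(value, self.clock)

    def changed_since(self, ts: int) -> List[Tuple[str, Row]]:
        changed = [(k, r) for k, r in self.rows.items() if r.commit_ts > ts]
        return sorted(changed, key=lambda kr: kr[1].commit_ts)


class Poller:
    """Reports rows whose commit timestamp is newer than the last one seen."""

    def __init__(self, table: FakeTable) -> None:
        self.table = table
        self.last_seen = 0

    def poll(self) -> List[Tuple[str, str]]:
        changes = self.table.changed_since(self.last_seen)
        if changes:
            self.last_seen = changes[-1][1].commit_ts
        return [(key, row.value) for key, row in changes]


table = FakeTable()
poller = Poller(table)
table.upsert("a", "v1")
table.upsert("a", "v2")  # second update within the same polling interval
print(poller.poll())     # only the latest value of "a" is reported
```

The intermediate value "v1" is never reported because the poller only sees the current state of the row. The same reasoning suggests why only soft deletes are supported: a hard-deleted row leaves nothing behind for a timestamp query to find.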
Cloud Spanner has a new feature called Change Streams that would allow building a downstream pipeline from Spanner to PubSub/Kafka.
At this time, there's not a pre-packaged Spanner to PubSub/Kafka connector.
Currently, the way to read change streams is either to use the SpannerIO Apache Beam connector, which lets you build the pipeline with Dataflow, or to query the API directly.
Disclaimer: I'm a Developer Advocate that works with the Cloud Spanner team.

How to fetch already rotated logs in Kubernetes?

I tried to fetch already rotated logs on the node using the --since-time parameter.
Can anybody suggest a command or mechanism to fetch already rotated logs in Kubernetes?
You can't. Kubernetes does not store logs for you; it just provides an API to access what's on disk. For long-term storage, look at things like Loki, ElasticSearch, Splunk, SumoLogic, etc.

Integrating MongoDB cloud with AWS autoscaling

I am absolutely not buying MongoDB Atlas :) I want to make the best I can with MongoDB cloud. Today with MongoDB cloud, when I spin up an AWS instance the automation takes care of joining my MongoDB cloud account w/ the API and group key I've embedded in my system image. What I want to do now is take it to the next level and have that instance register itself as an additional replica of an -existing- replica set and sync up. I can't be the first person to want to do this but I'm coming up empty with Google. Can anyone point me to a gist, blog, rant or other example of how to do this?

Google Cloud SQL Postgres - randomly slow queries from Google Compute / Kubernetes

I've been testing Google Cloud SQL with PostgreSQL, but random queries take ~3s instead of a few ms.
Troubleshooting I did:
The queries themselves aren't the problem: rerunning the same query works fine.
Indexes are properly set. The database is also very, very small, so this shouldn't happen even without any index.
The Kubernetes container connects to the database through SQL Proxy (I followed this https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine). It is not the problem though, as I tried connecting directly to the database and had the same issue.
I set net.ipv4.tcp_keepalive_time to 60 to make sure the connections weren't dropping.
I also have a pool of connections that are never disconnected, to make sure it wasn't caused by that.
When I run queries directly through my local PostgreSQL client, I never have the problem.
I don't have this issue when developing locally and connecting to my local database either.
What I'm getting at is: I feel there's some weird connection/link issue between my Google Compute instances and my Google SQL instance that I can't seem to figure out.
Any idea?
Edit:
I also noticed these logs in my Cloud SQL instance every 30s:
ERROR: recovery is not in progress
HINT: Recovery control functions can only be executed during recovery.
STATEMENT: SELECT pg_is_xlog_replay_paused(), current_timestamp
That's an interesting problem you are facing. My knowledge of Kubernetes isn't that great, but I do have a general understanding, so let's see if I can provide some suggestions.
To start with, the API that you linked to in your question does mention that it is still in beta, so I believe there may still be performance issues to patch.
Secondly, from what I understand, Kubernetes is a great tool for handling stateless workloads, so handling data where state is required for queries could be a slow operation. This article (although not entirely related) does explain some of the pitfalls of Kubernetes (not all the questions are relevant).
Thirdly, could you explain your use case a little bit? Do you really need to use Kubernetes, or would another tool like a powerful Compute Engine instance or a Dataflow job resolve the issue? Are you making your database queries through a programming language or an application call?
Thanks, and do let me know!

High Availability AEM Author

I’ve been working with AEM for over a year now and lately I’ve been trying to move into a high availability setup for author.
My problem is that whenever I spin up a server, add sites, and spin up another server, the data doesn’t persist to the new instance. I know why this doesn’t work in the traditional setup (the repository is stored locally on the file system). However, I’ve attempted using the S3 backend, and it results in the same problem: the data doesn’t persist onto the new instance.
I’ve read about using MongoMK (https://helpx.adobe.com/experience-manager/6-3/sites/deploying/using/recommended-deploys.html), i.e. MongoDB as a store, but they also recommend using S3 as the backend.
My question is: does anyone have any experience with multiple AEM author instances sharing the same data and node stores? If so, do you have any suggestions as to how to get this working, or resources where I can read about this?
After further research it seems the only option for backend clustering is to use MongoDB. My attempts to use MongoDB as a backend for AEM have failed. When I attempt to use the crx3 and crx3mongo run modes, it looks like AEM hangs after opening a connection to MongoDB. I have verified that nothing is getting placed into the DB: show dbs returns 0.000GB for the corresponding database.