Use the same set of Quartz tables for both clustered and non-clustered schedulers - quartz-scheduler

I have a requirement where some of my Quartz jobs should run in a clustered way (only one node out of three should run the job) and some jobs should run in a non-clustered way (all three nodes should run the job).
My question is: can I use the same set of tables in one data source for both of these requirements?
Here is what I plan to do to achieve this:
Two quartz.properties files, one for the clustered scheduler and one for the non-clustered scheduler.
Both scheduler instances will start at application startup.
The jobs configured under the non-clustered scheduler will be saved in the jobs table with the scheduler name NON_CLST_SCHE, i.e. in the same table but under a different scheduler name.
Is this the right way to use Quartz? Will we face any data corruption problems?
The Quartz documentation (http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-11.html) says:
Never fire-up a non-clustered instance against the same set of tables that any other instance is running against. You may get serious data corruption, and will definitely experience erratic behavior.
If that warning applies here, what is the way out for my requirement?
Any help is much appreciated, thanks in advance!

I think your approach is fine (assuming that every non-clustered scheduler that accesses the same database has a unique scheduler name).
In my opinion the warning refers to the case where you have multiple non-clustered instances with the same scheduler name running against the same database. A scheduler can only see jobs, triggers and so on in a JDBC job store if the scheduler name associated with that job (or trigger, ...) matches its own.
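For illustration, here is a minimal sketch of what the two configuration files could look like. The property keys are standard Quartz JDBC-JobStore settings; the scheduler names, data source name and table prefix are placeholders to adapt to your setup.

```properties
# clustered-quartz.properties - one scheduler shared by all three nodes,
# so each clustered job fires on only one node.
org.quartz.scheduler.instanceName = CLST_SCHE
org.quartz.scheduler.instanceId = AUTO
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000

# non-clustered-quartz.properties - every node runs its own copy of these jobs.
# Give each node a DIFFERENT instanceName (e.g. NON_CLST_SCHE_NODE1/2/3),
# otherwise you are back in the "multiple non-clustered instances with the same
# scheduler name" situation the documentation warns about.
org.quartz.scheduler.instanceName = NON_CLST_SCHE_NODE1
org.quartz.scheduler.instanceId = AUTO
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.isClustered = false
```

Both schedulers point at the same QRTZ_ tables; the rows stay separate because every Quartz 2.x table is keyed by SCHED_NAME.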

Related

Citus: can I add one more replica for a distributed table?

I have a distributed table, but it only has one replica, and a single replica does not give me HA. I want to add one more replica to the table. Can I? How do I do it?
I have searched the online help docs but didn't find any solution.
Replicas in Citus are not an HA solution. For HA you will need to set up Postgres tooling for every member in your cluster to stream WAL to another node. Citus specializes in distributed queries and separates that problem from HA by relying on proven technology available in the Postgres ecosystem.
If you want to scale out reads, adding a replica can help. However, adding replicas has a significant impact on write throughput. Before adding replicas, please test thoroughly that your database can handle your expected load. And again: if HA is your goal, don't add Citus replicas; instead, apply Postgres HA solutions to every worker and coordinator.
For the reasons above, increasing the replica count of an already distributed table is not an operation Citus provides out of the box. The easiest approach is to create a new table and use an INSERT INTO ... SELECT statement to reinsert the data into a table distributed with the shard count and replication factor your application needs.
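As a rough sketch of that approach (table, column and connection names are made up, and the exact settings may differ between Citus versions, so treat this as illustrative only):

```python
import psycopg2

# Connect to the Citus coordinator (placeholder connection string).
conn = psycopg2.connect("host=coordinator dbname=app user=app")
conn.autocommit = True
cur = conn.cursor()

# Shard layout for the NEW table: keep the shard count (assumed 32 here),
# but ask for two placements of every shard instead of one.
cur.execute("SET citus.shard_count = 32;")
cur.execute("SET citus.shard_replication_factor = 2;")

# Create an empty copy of the schema and distribute it on the same column.
cur.execute("CREATE TABLE events_new (LIKE events INCLUDING ALL);")
cur.execute("SELECT create_distributed_table('events_new', 'tenant_id');")

# Reinsert the existing data; Citus routes the rows to the new shard placements.
cur.execute("INSERT INTO events_new SELECT * FROM events;")

cur.close()
conn.close()
```

After verifying the copy, you would swap the tables (rename or drop the old one) during a maintenance window, keeping in mind the write-throughput caveats above.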

Handle replication lag in data pipeline

We have a pipeline of tasks that run in sequence. Each task consumes data from the database, manipulates it, and writes the result back to the same database.
We are using AWS RDS Aurora, and in order to spread the load, the “reading phase” of each task is done against a read replica.
Under high load we reach a replication lag of 10-15 seconds. This means that by the time the next task consumes data, it gets wrong/missing data points.
We know this is not the “right” way to design such a pipeline, and that it contradicts the idiom “Do not communicate by sharing memory; instead, share memory by communicating”.
Since it’s too much effort to change the design now, we came up with an alternative solution:
Create a service that checks the replication lag and exposes it to all tasks. If the value is greater than x, the task falls back to reading from the RDS master node.
This is not optimal, and I would like to hear other solutions to work around this issue.
It is worth mentioning that we are using Celery (and Python) to construct this workflow, and each task is unaware of the tasks that ran before it.
There will always be data which is inserted into the database but not yet visible, either because it wasn't committed yet, it was committed after your snapshot was started, or due to replication lag. The only real solution is to make your tasks robust to this inevitability.
Create a service that checks the replication lag and exposes it to all tasks. If the value is greater than x, the task falls back to reading from the RDS master node.
You want to shed load from the master until the first sign of trouble, then you want to suddenly dump all the load back onto it?
Create a service that checks the replication lag and exposes it to all tasks. If the value is greater than x, the task falls back to reading from the RDS master node.
Depending on the cause of your replication lag, this might make things worse by further increasing the load on the master node.
If your pipeline allows it, you could instead wait in Task A, after the write, until the data has propagated to the read replica.
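If you do go with the lag check (or with waiting in the producing task, as suggested above), a rough sketch of the helper could look like the following. It reads the CloudWatch AuroraReplicaLag metric via boto3; the region, instance identifier, endpoints and thresholds are placeholders, not anything from the original setup.

```python
import time
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # placeholder region

def replica_lag_seconds(instance_id: str) -> float:
    """Latest AuroraReplicaLag datapoint for one replica, converted from ms to seconds."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="AuroraReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=now - timedelta(minutes=5),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] / 1000.0 if points else 0.0

# Option 1 (the question's idea): pick the endpoint per task based on current lag.
def reader_endpoint(max_lag: float = 5.0) -> str:
    if replica_lag_seconds("my-aurora-replica-1") > max_lag:             # placeholder instance id
        return "my-cluster.cluster-XXXX.eu-west-1.rds.amazonaws.com"     # writer endpoint (placeholder)
    return "my-cluster.cluster-ro-XXXX.eu-west-1.rds.amazonaws.com"      # reader endpoint (placeholder)

# Option 2 (this answer's suggestion): after writing, wait until the replica catches up,
# bounded so a stuck replica cannot stall the pipeline forever.
def wait_for_replica(max_wait: float = 30.0, max_lag: float = 1.0) -> None:
    deadline = time.time() + max_wait
    while time.time() < deadline and replica_lag_seconds("my-aurora-replica-1") > max_lag:
        time.sleep(1)
```

Note that option 1 still has the problem the first answer points out: the moment lag crosses the threshold, every task piles onto the writer at once.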

Dynamic number of replicas in a Kubernetes cron-job

I've been looking for days for a way to set up a cron-job with a dynamic number of jobs.
I've read all these solutions and it seems that, in order to initialise a dynamic number of jobs, I need to do it manually with a script and a job template, but I need it to be automatic.
A bit of context:
I have a database / message queue / whatever that can store "items"
I would like to start a job (so a single replica of a container) every 5 minutes to process each item
So, let's say there is a Kafka topic / a DB table / a folder containing 5 records / rows / files; I would like Kubernetes to start 5 replicas of the job (with the cron-job) automatically. If 5 minutes later there are 2 items, Kubernetes should start just 2 replicas.
The most feasible solution seems to be to use a static number of pods and make them process multiple items, but I feel there is a better way to accomplish this while keeping it inside Kubernetes, which I can't figure out due to my lack of experience. 🤔
What would you do to solve this problem?
P.S. Sorry for my English.
There are two ways I can think of:
Using a CronJob that is parallelised (1 work-item/pod or 1+ work-items/pod). This is what you're trying to achieve. Somewhat.
Using a data processing application. This I believe is the recommended approach.
Why and Why Not CronJobs
For (1), there are a few things I would like to mention. There is no upside to having multiple Job/CronJob items when you are trying to perform the same operation from all of them. You think you are getting parallelism, but not really; you are only increasing management overhead. If your workload grows too large (which it will), there will be too many Job objects in the cluster and the API server will be slowed down drastically.
Job and CronJob items are meant for stand-alone work items that need to be performed regularly; they are house-keeping tasks. So selecting CronJobs for data processing is a very bad idea. Even if you run a parallelized set of pods (as provided here and here in the docs, like you mentioned), it is best to have a single Job that handles all the pods working on the same work-item. So you should not be thinking of "scaling Jobs" in those terms; instead, think of scaling Pods. If you really want to move ahead with the Job and CronJob mechanisms, go ahead: the message-queue based design is your best bet, but you will have to reinvent a lot of wheels to get it to work (read below why that is the case).
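To make the message-queue direction a bit more concrete, here is a rough sketch (using the official Kubernetes Python client; the queue-depth lookup, image and namespace are placeholders) of a tiny dispatcher you could run every 5 minutes: instead of creating N CronJobs, it creates a single Job whose parallelism matches the number of pending items, so the scaling happens at the Pod level.

```python
import time

from kubernetes import client, config

def pending_items() -> int:
    """Placeholder: return the current queue depth (Kafka topic, DB table, folder, ...)."""
    return 5

def launch_batch(namespace: str = "default") -> None:
    config.load_incluster_config()  # or config.load_kube_config() outside the cluster
    n = pending_items()
    if n == 0:
        return

    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=f"worker-{int(time.time())}"),
        spec=client.V1JobSpec(
            parallelism=n,   # one pod per pending item
            completions=n,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="worker",
                            image="registry.example.com/worker:latest",  # placeholder image
                            # each worker pod pulls exactly one item from the queue and exits
                        )
                    ],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)

if __name__ == "__main__":
    launch_batch()
```

Each worker pod is expected to pull one item from the queue and exit, so completion tracking stays with the single Job object rather than with many separate Jobs.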
Recommended Solution
For (2), I only say this since I see you are trying to perform data processing, and doing this with a one-off mechanism like a Job is not a good idea (Jobs are basically stateless; they perform an operation that can simply be repeated without any repercussions). Say you start a pod and it fails processing: how will the other pods know that this item was not processed successfully? What if the pod dies? The Job cannot keep track of the items in your data store, since it is not aware of the nature of the work you're performing. Therefore, it is natural to pursue a solution whose components are specifically designed for data processing.
You will want to look into a system that understands the nature of your data: how to keep track of which queue items have finished successfully, how to restart a new Pod with the same input item as the Pod that just crashed, and so on. This is a lot of application/use-case specific functionality that is best served by an operator, or by a CustomResource and a controller. And since this is not a new problem, there are plenty of solutions out there that can do this well for you.
The best course of action would be to have such a system in place, deployed as a Deployment with auto-scaling enabled; you will get real parallelism that is also well suited to batch data processing.
And remember, when we talk about scaling in Kubernetes, it is always the Pods that scale, not containers, not Deployments, not Services. Always Pods. That is because at the bottom of the chain there is always a Pod somewhere doing the work, whether it is owned by a Job, a Deployment, a DaemonSet, backing a Service, or whatever. And it is a bad idea to have multiple application containers in a Pod, for many reasons (side-car and adapter patterns are just helpers; they don't run the application).
Perhaps this blog that discusses data processing in Kubernetes can help.

kdb+ replication of RDB and HDB

I am trying to figure out how to implement/configure replication of the RDB and HDB in kdb+.
How do I configure the RDB and HDB to have two instances with the same data on different hosts?
The simplest approach would be to have the tickerplant log location and HDB location cross-mounted and accessible from both hosts. Then the RDB instance on the second host just has to replay the tickerplant log as usual, and subscribe to the same tickerplant - but the important thing is to not have the second RDB do an end-of-day writedown. Its .u.end should just clear the data from memory.
The second HDB wouldn't have any special conditions that I can think of; however, in order for that instance to automatically refresh/reload (and pick up the newest date slice), the original RDB that does the writedown would also need to trigger the refresh/reload of this second HDB.
Here are a couple of other approaches you could try. Since kdb+ does not have a replication management system, there is a chance of data loss or out-of-order data on replicas, so you will have to think about how to handle those situations.
One easy solution is to maintain a sequence number for each update. If a replica detects a missing or out-of-order sequence number, it can ask the primary to resend the data for those sequences (a sketch of this bookkeeping follows the list of approaches below).
Chained tickerplant: you could try a setup similar to the chained tickerplant style. The secondary tickerplant subscribes to the primary, and the secondary stack then runs like a normal setup.
Secondary RDB/TP subscribes to primary RDB: the secondary RDB or TP gets its updates from the primary RDB. The secondary HDB works as normal.
Separate process for replication management: if you want multiple replicas, having all of them connect to the primary would increase the load on the primary, and managing data-loss scenarios would be difficult.
Instead, you could create a separate process that all replicas subscribe to. The primary TP/RDB sends data to this manager service, which then takes care of the replicas. This way you also keep all replica-related handling logic in one place.
Another benefit of this approach is that it does not require changes to the primary TP/RDB services/logic to handle data issues; the manager service handles everything.
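The sequence-number bookkeeping mentioned above is language-agnostic; purely as an illustration (in Python rather than q, with `apply` and `request_replay` standing in for the real TP/RDB IPC calls), a replica or the manager process could track it roughly like this:

```python
from typing import Any, Callable

class ReplicaFeed:
    """Sketch of per-publisher sequence tracking on a replica (or in the manager process).
    `apply` writes an update locally; `request_replay` asks the primary to resend a range
    of sequence numbers. Both are placeholders for the real plumbing."""

    def __init__(self, apply: Callable[[Any], None],
                 request_replay: Callable[[int, int], None]) -> None:
        self.apply = apply
        self.request_replay = request_replay
        self.last_seq = 0   # highest sequence number applied so far
        self.pending = {}   # out-of-order updates buffered by sequence number

    def on_update(self, seq: int, data: Any) -> None:
        if seq <= self.last_seq:
            return                                   # duplicate, already applied
        if seq > self.last_seq + 1:
            self.pending[seq] = data                 # buffer and ask the primary for the gap
            self.request_replay(self.last_seq + 1, seq - 1)
            return
        self.apply(data)                             # in-order update
        self.last_seq = seq
        # drain any buffered updates that are now contiguous
        while self.last_seq + 1 in self.pending:
            self.last_seq += 1
            self.apply(self.pending.pop(self.last_seq))
```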

In Oracle RAC, will an application be faster if a subset of the code uses a separate Oracle service to the same database?

For example, I have an application that does a lot of audit-trail writing. Lots. It slows things down. If I create a separate service on my Oracle RAC just for audit CRUD, would that help speed things up in my application?
In other words, I point most of the application to the main service listening on my RAC via SCAN, and I point the subset of my application that does the audit-trail data manipulation to a separate service that connects to the same schema as the main service.
As with anything else, it depends. You'd need to be a lot more specific about your application, what services you'd define, your workloads, your goals, etc. Realistically, you'd need to test it in your environment to know for sure.
A separate service could allow you to segregate the workload of one application (the one writing the audit trail) from the workload of other applications by having different sets of nodes in the cluster running each service (under normal operation). That can help ensure that the higher priority application (presumably not writing the audit trail) has a set amount of hardware to handle its workload even if the lower priority thread is running at full throttle. Of course, since all the nodes are sharing the same disk, if the bottleneck is disk I/O, that segregation of workload may not accomplish much.
Separating the services onto different sets of nodes can also affect how often a particular service gets blocks from the local node's buffer cache rather than requesting them from another node and waiting for them to be shipped over the interconnect. It's quite possible that an application that is constantly writing to log tables ends up spending quite a bit of time waiting for a small number of hot blocks (such as the right-most block in the primary key index for the log table) to be shipped back and forth between nodes. If all the audit records are written on just one node (or on a smaller number of nodes), that hot block will always be available in the local buffer cache. On the other hand, if writing the audit trail involves querying the database to get information about a change, separating the workload may mean that blocks that were in the local cache (because they were just changed) now get shipped across the interconnect, and you could end up hurting performance.
Separating the services even if they're running on the same set of nodes may also be useful if you plan on managing them differently. For example, you can configure Oracle Resource Manager rules to give priority to sessions that use one service over another. That can be a more fine-grained way to allocate resources to different workloads than running the services on different nodes. But it can also add more overhead.
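Mechanically, the split the question describes is mostly a connection-string change: the audit code uses a different service name against the same schema, and the DBA decides which nodes run that service and how Resource Manager treats it. A hedged sketch, with python-oracledb as an arbitrary example driver and all names, hosts and credentials as placeholders:

```python
import oracledb

# Main OLTP workload goes to the primary service, registered on its own set of nodes.
main_pool = oracledb.create_pool(
    user="app", password="secret",
    dsn="rac-scan.example.com:1521/APP_MAIN_SVC",   # placeholder SCAN address + service name
    min=2, max=10, increment=1,
)

# Audit-trail writes go to a separate service, which the DBA can place on different
# nodes and/or prioritise differently with Resource Manager.
audit_pool = oracledb.create_pool(
    user="app", password="secret",
    dsn="rac-scan.example.com:1521/APP_AUDIT_SVC",  # same schema, different service
    min=1, max=5, increment=1,
)

def write_audit_row(action: str) -> None:
    # All audit writes use the audit pool, so the hot audit blocks tend to stay
    # in the buffer cache of whichever node(s) run that service.
    with audit_pool.acquire() as conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO audit_trail (action, created_at) VALUES (:1, SYSTIMESTAMP)",
                [action],
            )
        conn.commit()
```

Whether this actually speeds things up still depends on the bottleneck, as described above; the code change itself is small, which makes it cheap to test in your environment.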