Vert.x setup for a high-load server

I have a server with 8 cores.
That server receives 3,000 incoming requests per second.
What settings should I use for the best performance?
new DeploymentOptions().setInstances(???);
Do I need 8 instances because my server has 8 cores? The default is 1.
Or is it better to use .setWorker(true).setWorkerPoolSize(???)?
Or should I just leave these settings alone? And if I do need to set workerPoolSize, what value should it be? The default is 20.
I can't benchmark this in production; I need to predict the right values in advance.
Thanks a lot for any answer!
An additional question:
Is the following the right way to do async work in a Vert.x pipeline?
A pipeline step calls a function that returns a Future, and that Future's onComplete handler calls ctx.next() - roughly like the sketch below.
If not, how can I do async work in the middle of the pipeline?
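To make that concrete, here is a rough sketch of what I mean (fetchUser, the route, and the "user" key are made-up names, and I'm assuming the Vert.x 4 Future API):
Router router = Router.router(vertx);
router.route("/api/*").handler(ctx -> {
    // async step in the middle of the pipeline: call a service that returns a Future
    fetchUser(ctx.request().getParam("id")).onComplete(ar -> {
        if (ar.succeeded()) {
            ctx.put("user", ar.result()); // make the result available to later handlers
            ctx.next();                   // continue to the next handler in the pipeline
        } else {
            ctx.fail(ar.cause());         // abort the pipeline with an error
        }
    });
});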

You should have 16 instances for 8 cores.
Are you using the executeBlocking functionality? If so, you should set the worker pool size to 128.
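For example, a minimal sketch (the verticle class name is just a placeholder):
DeploymentOptions options = new DeploymentOptions()
    .setInstances(16)          // 2 x number of cores
    .setWorkerPoolSize(128);   // sized for executeBlocking work, per the advice above
vertx.deployVerticle("com.example.MyVerticle", options);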
As for the additional question about doing async work in the middle of the pipeline: as posed, the question doesn't make sense. You should read the docs before putting a service that serves 3k rps into production.

Related

Best practices for Informatica Webservice workflow

I have created an Informatica web service workflow which takes 1 parameter as input. A Web Service Provider source definition is used for this, and the mapping is a one-way type.
The workflow works fine when the parameter is passed. But when the same workflow is triggered from Informatica PowerCenter directly (in which case no parameters are passed), the mapping that contains the Web Service Provider source definition takes 3 minutes to complete (it gives a timeout-based commit point in the log).
Is it good practice to run the web service workflow from PowerCenter directly? And is there a way to improve its performance when triggered from PowerCenter directly?
Note: I am trying to use one workflow for both - 1) passing the parameter from the web, and 2) scheduling the workflow in Informatica.
Answers to your questions below.
Is it a good practice to run the webservice workflow from power center directly?
Of course it depends on the requirement - whether you need to extract data from the web service automatically or not. If you pass the parameter using some session, then I don't see much of an issue here, and your session completes within time.
So you can create a new session/command task/shell script to create a parameter file, and then use it in the original session so it is passed on to the web service.
In a complex scenario you may have to pass multiple values; in such a case I would recommend using a parent workflow that calls the original workflow multiple times, changing the parameter before each call.
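As an illustration, a minimal PowerCenter parameter file for such a session might look like the following (the folder, workflow, session, and parameter names are placeholders for whatever your repository actually uses):
[MyFolder.WF:wf_webservice_call.ST:s_m_webservice_call]
$$InputParam=12345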
Is there a way to improve its performance when triggered from power center directly?
It really depends on a few factors.
The web service - make sure you are using the correct input and output columns. Most of the time web services are sensitive to outside calls, and you need to choose optimized columns to extract data for better performance. You can work with the web service admin to find the correct columns.
If the Informatica flow is complex, then depending on the bottleneck transformation(s) (source, target, expression, lookup, aggregator, sorter), you can investigate and take action.
For a lookup, you can add a new filter to exclude unwanted data, remove unwanted columns, etc.
For an aggregator, you can add a sorter before it to improve performance.
... and so on.

AWS Lambda throttled queue processing

I plan to fetch a list of records from a web service which limits the number of requests that can be made within a certain time frame.
My idea was to setup a simple pipeline like this:
List of URLs -> Lambda Function to fetch JSON -> S3
The part I'm not sure about is how to feed in the list of URLs in rate/time-limited blocks, e.g. take 5 URLs and spawn 5 Lambda functions every second.
Ideally I'd like to start this by uploading/sending/queueing the list once and then just let it do its thing on its own until it has processed the queue completely.
Splitting the problem into two parts:
Trigger: Lambda supports a wide variety of triggers. Look for "Using AWS Lambda to process AWS events" in the Lambda FAQs.
I personally would go with DynamoDB, but S3 comes in a close second.
There might be other options using other streams like Kinesis, but these seem simpler by far.
Throttling: you can set limits on the number of concurrent Lambda instances.
So e.g. if you go with DDB:
You dump all your URLs into a table, one row per URL.
Each insert creates a stream event, one per row.
Each event triggers one Lambda invocation.
The number of parallel Lambda executions/instances is limited by configuration.
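A rough sketch of what that Lambda handler could look like, assuming a DynamoDB Streams trigger and that each row stores the URL in a string attribute named "url" (the bucket name and key scheme are placeholders):
import urllib.request
import boto3

s3 = boto3.client("s3")
BUCKET = "my-results-bucket"  # placeholder bucket name

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # only react to newly inserted rows
        url = record["dynamodb"]["NewImage"]["url"]["S"]
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
        # store the fetched JSON under a key derived from the URL
        key = url.split("//", 1)[-1].replace("/", "_") + ".json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=body)
The "limited by configuration" part would then be the function's reserved concurrency setting, which caps how many of these run in parallel.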

Adding tasks to celery

Background:
I am using Celery to build a scheduling system that crawls websites on a daily basis. We are crawling about 1 million URLs (approximately) daily, so it's becoming difficult to handle and manage things at the micro level. Celery is a tool we thought could handle the current system much better than it is handled now.
Problem:
I have 1,000 URLs for a domain. What I am thinking of doing is dividing the 1,000 URLs into n equal chunks and then, for each chunk, creating a task and scheduling it with Celery. To do this, I am not able to create (register) the tasks dynamically. I also need to ensure a politeness policy here. How can I create tasks on the fly in Celery? There is no documentation for this.
Am I going in the right direction to solve this?
What do you mean by creating tasks on the fly?
You write a task that crawls the website and call it like this:
crawl_website.delay(url='http://example.com')
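A minimal sketch of how the pieces could fit together (the broker URL, chunk size, and crawl logic are placeholders):
from celery import Celery

app = Celery("crawler", broker="redis://localhost:6379/0")  # placeholder broker

@app.task
def crawl_website(url):
    # fetch and process the page here
    print("crawling %s" % url)

def schedule_domain(urls, chunk_size=100):
    # one delayed task per URL; workers pull them off the queue as they become free,
    # so there is no need to register new task types dynamically per chunk
    for i in range(0, len(urls), chunk_size):
        for url in urls[i:i + chunk_size]:
            crawl_website.delay(url=url)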

Avoid builders running at the same time in Buildbot

I need to get data from a server, and this takes time; usually 30 min or more.
I have a builder that gets data from this server; I would like no other builders on this slave to run while this builder is still running. Once it is done, the other builders can run concurrently, respecting my settings for the maximum number of concurrent builds.
How do I achieve this? I was looking at locks, but the manual does not have a clear example showing how to set up a builder that blocks all the others until it is done.
Does anyone have an example that I can use to set up my configuration, showing where every piece goes? Thanks.
A lock example would be:
import sys

from buildbot import locks
from buildbot.config import BuilderConfig

# maxCount only limits 'counting' holders; an 'exclusive' holder blocks everything else
build_lock = locks.SlaveLock("slave_builds", maxCount=sys.maxint)

# The long-running builder takes the lock exclusively, so nothing else runs meanwhile.
c['builders'].append(BuilderConfig(name='exclusive_builder',
                                   slavenames=['my_slave'],
                                   factory=my_factory,
                                   locks=[build_lock.access('exclusive')]))

# All other builders take the lock in counting mode, so they can still run together.
c['builders'].append(BuilderConfig(name='other_builder',
                                   slavenames=['my_slave'],
                                   factory=my_other_factory,
                                   locks=[build_lock.access('counting')]))
A build holding this lock in exclusive mode will have exclusive access to the slave.
There are more complex, reasonably well-explained examples in the Interlocks section of the documentation.

WF performance with 20,000 new persisted workflow instances each month

Windows Workflow Foundation has a problem: it is slow when persisting WF instances.
I'm planning a project whose business layer will be based on WF-exposed WCF services. The project will have 20,000 new workflow instances created each month, and each instance could take up to 2 months to finish.
I was led to believe that, given WF's slowness when doing persistence, my problem would be unattainable for performance reasons.
I have the following questions:
Is this true? Will my performance be crap with that load (given WF persistence speed limitations)?
How can I solve the problem?
We currently have two possible solutions:
1. Each new business process request (e.g. "Give me a new driver's license") will be a new WF instance, and the number of persistence operations will be limited by forwarding all status request operations to saved state values in a separate database.
2. Have only a small number of workflow instances up at any given time, with hardly any persistence at all (only in case of system crashes etc.), by breaking each workflow step into a separate workflow and having that workflow handle every business process request instance in the system that is currently at that step (e.g. I'm submitting my driver's license request form, which is step one... we have 100 cases of that, and my step-one workflow will handle every case simultaneously).
I'm very interested in a solution to this problem. If you want to discuss it, please feel free to mail me at nstjelja#gmail.com
The number of hydrated, executing workflows will be determined by environmental factors: memory, server throughput, etc. Persistence issues really only come into play if you are loading and unloading workflows all the time, i.e. in real(ish) time; in that case workflow may not be the best solution.
In my current project we also use WF with persistence. We don't have quite the same volume (perhaps ~2000 instances/month), and they usually do not take as long to complete (they are normally done within 5 minutes, in some cases a few days). We did decide to split up the main workflow into two parts, where the normal waiting state would be. I can't say that I have noticed any performance difference in the system due to this, but it did simplify it, since our system sometimes had problems matching incoming signals to the correct workflow instance (that was an issue in our code, not in WF).
I think that if I were to start a new project based on WF I would rather go for smaller workflows that are invoked in sequence, than to have big workflows handling the full process.
To be honest I am still investigating the performance characteristics of workflow foundation.
However, if it helps, I have heard the WF team has made many performance improvements in the new release of WF 4.
Here are a couple of links that might help (if you haven't seen them already):
A Developer's Introduction to Windows Workflow Foundation (WF) in .NET 4 (discusses performance improvements)
Performance Characteristics of Windows Workflow Foundation (applies to WF 3.0)
WF on 3.5 had a performance problem. WF4 does not - 20,000 WF instances per month is nothing. If you were talking per minute, I'd be worried.