Reactor Flux: how to publish in parallel (reactive-programming)

Given this example:
Flux.range(0, 10)
    .publishOn(Schedulers.parallel())
    .subscribeOn(Schedulers.parallel())
    .doOnNext(i -> System.out.println(i + " " + Thread.currentThread().getName()))
    .subscribe();
I thought it would publish all items in parallel on different threads, but it actually processes them sequentially on a single thread:
0 parallel-2
1 parallel-2
2 parallel-2
3 parallel-2
4 parallel-2
5 parallel-2
6 parallel-2
7 parallel-2
8 parallel-2
9 parallel-2
I know we can use .parallel().runOn(Schedulers.parallel()), but I want to know how to achieve this with publishOn and subscribeOn. Otherwise, do subscribeOn and publishOn always execute on a single thread? If so, what is the purpose of Schedulers.parallel()?
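publishOn and subscribeOn only choose which Scheduler the single sequence runs on; a Flux is still processed as one sequential stream, so onNext signals are delivered one at a time (hence the single parallel-N thread). To fan work out across threads with these operators you have to split the stream into inner publishers and subscribe each of them on the scheduler, typically via flatMap. A minimal sketch (the class name and the final sleep exist only for the demo):

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class ParallelFlatMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Flux.range(0, 10)
            // each value becomes its own inner Mono, subscribed on the parallel scheduler,
            // so work for different values can run concurrently on different threads
            .flatMap(i -> Mono.fromCallable(() -> {
                        System.out.println(i + " " + Thread.currentThread().getName());
                        return i;
                    })
                    .subscribeOn(Schedulers.parallel()))
            .subscribe();

        Thread.sleep(1000); // keep the JVM alive long enough to see the output
    }
}

Schedulers.parallel() is still what supplies the worker threads here: the inner subscriptions are distributed over its parallel-N workers, much like .parallel().runOn(Schedulers.parallel()) does behind the scenes.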

Related

How do we schedule an Azure Data Factory pipeline only for the first 5 days of every month?

How do we schedule a trigger in Azure Data Factory so that the pipeline runs only on the first 5 days of every month?
Month  Days
Jan    1 2 3 4 5
Feb    1 2 3 4 5
...
Dec    1 2 3 4 5
Create a new trigger and select Schedule as the trigger type.
Under Recurrence, set it to run every 1 month.
Under Advanced recurrence options, select the days of the month on which to execute the pipeline (the equivalent trigger JSON is sketched below).
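If you prefer to define the trigger as JSON (for example to deploy it from Git or an ARM template), a rough sketch of the same schedule follows; the trigger name, start time, and pipeline reference are placeholders to adapt:

{
  "name": "RunFirst5DaysOfMonth",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Month",
        "interval": 1,
        "startTime": "2021-01-01T00:00:00Z",
        "timeZone": "UTC",
        "schedule": {
          "monthDays": [1, 2, 3, 4, 5],
          "hours": [0],
          "minutes": [0]
        }
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "MyPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}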

Start with 10 users and add 50 users every 5 min (Gatling, Scala)

I am a beginner in Gatling. I want to execute a performance scenario with the following expectations:
Start with 10 users and add 50 users every 5 minutes. This means:
0 min: 10 users (at once)
5 - 10 min: 60 users
10 - 15 min: 110 users
15 - 20 min: 160 users
Below is my simulation setup:
setUp(
  Scenarios.inject(
    nothingFor(5 seconds),
    atOnceUsers(Environment.atOnceusersCount.toInt),
    rampUsers(Environment.rampUsersCount.toInt) during (Environment.durationForRampusers.toInt seconds)
  )
)
where rampUsersCount = 2
and durationForRampusers = 10 seconds.
I want to understand how to increase the rampUsers count gradually.
Use either incrementUsersPerSec or incrementConcurrentUsers, depending on whether you want an open or closed workload model; see the official documentation. A sketch of the closed-model version follows.
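For the profile described above (10 concurrent users, +50 every 5 minutes), a closed-model sketch might look like this; Scenarios is the scenario defined earlier, and the level semantics are worth double-checking against the docs for your Gatling version:

import scala.concurrent.duration._ // usually already in scope in a Gatling simulation

setUp(
  Scenarios.inject(
    // intended levels: 10, 60, 110 and 160 concurrent users,
    // each level held for 5 minutes
    incrementConcurrentUsers(50)
      .times(4)
      .eachLevelLasting(5.minutes)
      .startingFrom(10)
  )
)

If "users" in your test actually means an arrival rate rather than a number of concurrent users, use the open-model counterpart, incrementUsersPerSec, with the same times/eachLevelLasting/startingFrom shape.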

Using partitions (window functions) in combination with aggregations in MongoDB

In MongoDB I have documents with a call name ("call") and a creation timestamp ("created"); the actual call names are crossed out for confidentiality.
Now I need to build a query that returns results grouped by the name of the call; for each type of call I need the number of calls by month, day and hour. Also, in this query I need to restrict the results to a range between two dates (including time).
In SQL Server this is done using window functions (partitions) in combination with aggregations, but how can I do the same in Mongo?
I am using MongoDB Compass as my Mongo client.
I need to obtain something like the table below:
call name    month    day  hour  #ByMonth  #ByDay  #ByHour
GetEmployee  January    1    14        10       6        1
GetEmployee  January    1    18        10       6        5
GetEmployee  January    3    12        10       4        4
GetEmployee  March      5    20         8       8        8
GetEmployee  April     12    17        45      35       35
GetEmployee  April     20    10        45      10       10
For example, for the GetEmployee call the distribution is as below:
10 calls done in January
8 calls done in March
45 calls done in April
For January, the 10 calls are distributed as below:
6 calls done on 1st January (these 6 calls are distributed as 1 call at 14h and 5 calls at 18h)
4 calls done on 3rd January (these 4 calls are all done at 12h)
and so on for the rest of the months.
For example, in SQL Server, if I have below table:
processName initDateTime
processA 2020-06-15 13:31:15.330
processB 2020-06-20 10:00:30.000
processA 2020-06-20 13:31:15.330
...
and so on
The SQL query is:
select
    processName,
    month(initDateTime),
    day(initDateTime),
    datepart(hour, initDateTime),
    sum(count(*)) over (partition by processName, year(initDateTime), month(initDateTime)) byMonth,
    sum(count(*)) over (partition by processName, year(initDateTime), month(initDateTime), day(initDateTime)) byDay,
    count(*) byHour
from mytable
group by
    processName,
    year(initDateTime),
    month(initDateTime),
    day(initDateTime),
    datepart(hour, initDateTime)
So how do I do the same in Mongo? The processName and initDateTime fields above correspond to the "call" and "created" attributes, respectively, in MongoDB.
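One way to sketch this in the aggregation pipeline (runnable in the shell or in Compass's aggregation builder) is to count per hour first and then roll the counts up with nested $group stages; the collection name and the date bounds below are placeholders, and $month/$dayOfMonth/$hour return numbers rather than month names:

db.calls.aggregate([
  // restrict to the requested date range (placeholder bounds)
  { $match: { created: { $gte: ISODate("2020-01-01T00:00:00Z"),
                         $lt:  ISODate("2021-01-01T00:00:00Z") } } },
  // one row per (call, year, month, day, hour) -> byHour, like the SQL GROUP BY
  { $group: {
      _id: { call: "$call",
             year: { $year: "$created" }, month: { $month: "$created" },
             day: { $dayOfMonth: "$created" }, hour: { $hour: "$created" } },
      byHour: { $sum: 1 } } },
  // roll up to day level -> byDay, keeping the hourly rows
  { $group: {
      _id: { call: "$_id.call", year: "$_id.year", month: "$_id.month", day: "$_id.day" },
      byDay: { $sum: "$byHour" },
      hours: { $push: { hour: "$_id.hour", byHour: "$byHour" } } } },
  // roll up to month level -> byMonth, keeping the daily rows
  { $group: {
      _id: { call: "$_id.call", year: "$_id.year", month: "$_id.month" },
      byMonth: { $sum: "$byDay" },
      days: { $push: { day: "$_id.day", byDay: "$byDay", hours: "$hours" } } } },
  // flatten back to one row per (call, month, day, hour)
  { $unwind: "$days" },
  { $unwind: "$days.hours" },
  { $project: { _id: 0,
      call: "$_id.call", year: "$_id.year", month: "$_id.month",
      day: "$days.day", hour: "$days.hours.hour",
      byMonth: 1, byDay: "$days.byDay", byHour: "$days.hours.byHour" } }
])

On MongoDB 5.0+ the per-partition sums can instead be written with $setWindowFields after the first $group, which maps more directly onto the SQL SUM(COUNT(*)) OVER (PARTITION BY ...) idiom.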

Is it possible to aggregate MongoDB data by "clumps" of timestamps?

Can MongoDB's aggregate pipeline or MapReduce functionality query for many documents, each with timestamps, and group them on timestamps that are within a set range of each other, essentially "clumping" them logically?
For example, if I have six documents with the following timestamps:
2017-10-15 11:25:00
2017-10-15 11:28:00
2017-10-15 14:59:00
2017-10-15 15:01:00
2017-10-15 15:06:00
2017-10-15 15:13:00
And my goal is to group them into the following documents:
2017-10-15 11:28:00 (2 events)
2017-10-15 15:06:00 (3 events)
2017-10-15 15:13:00 (1 event)
What I'm logically doing is saying "Group all documents timestamped within 5 minutes of each other".
It's important to note that I'm not just saying "group documents by every five minutes", because if that were the case (assuming the five minute groups started on the hour) documents 3 and 4 would probably not be in the same group.
The other tricky thing is handling documents 3 and 4 in relation to 5. There is more than 5 minutes between documents 3 and 5, but since there is less than 5 minutes between 3 and 4, and between 4 and 5, they'd all get grouped.
(I think it would maybe be OK to get this working without that caveat, where the group had 5 as separate from 3 and 4, because the group starts at 3 and goes for a set "5 minutes", excluding 5. But it would be awesome if the group could be extended like that.)
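On MongoDB 5.0+ one way to sketch this is a "session window" built from window functions: look at the previous timestamp with $shift, flag a new clump whenever the gap exceeds 5 minutes, turn those flags into a running clump id with a cumulative $sum, and then group. The collection and field names here are assumptions:

db.events.aggregate([
  // previous event's timestamp, in timestamp order (null for the first event)
  { $setWindowFields: {
      sortBy: { ts: 1 },
      output: { prevTs: { $shift: { output: "$ts", by: -1 } } } } },
  // start a new clump on the first event, or when the gap to the previous one exceeds 5 minutes
  { $set: {
      newClump: { $cond: [
        { $or: [ { $eq: ["$prevTs", null] },
                 { $gt: [ { $subtract: ["$ts", "$prevTs"] }, 5 * 60 * 1000 ] } ] },
        1, 0 ] } } },
  // running sum of the flags = clump id
  { $setWindowFields: {
      sortBy: { ts: 1 },
      output: { clumpId: { $sum: "$newClump",
                           window: { documents: ["unbounded", "current"] } } } } },
  // one document per clump, labelled with its latest timestamp, plus the event count
  { $group: { _id: "$clumpId", lastTs: { $max: "$ts" }, events: { $sum: 1 } } },
  { $sort: { lastTs: 1 } }
])

Because each document is compared to its immediate predecessor, this reproduces the chaining behaviour described above: documents 3, 4 and 5 land in the same clump even though 3 and 5 are more than 5 minutes apart.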

pandas: unwanted format result from groupby... how do I get groupby().sum() to give a tabular structure?

I've searched through the pandas docs and unfortunately, I could not find the answer.
Essentially, after some data wrangling, I have the dataframe
    ticker_id           close_date           sector  sector_index
0           1  2014-02-28 00:00:00   Consumer Goods     31.106653
1           1  2014-02-27 00:00:00   Consumer Goods     30.951213
2           2  2014-02-28 00:00:00   Consumer Goods     19.846387
3           2  2014-02-27 00:00:00   Consumer Goods     19.671747
4           3  2014-02-28 00:00:00   Consumer Goods   1208.552000
5           3  2014-02-27 00:00:00   Consumer Goods   1193.352000
6           4  2014-02-28 00:00:00   Consumer Goods      9.893989
7           4  2014-02-27 00:00:00   Consumer Goods      9.857385
8           5  2014-02-28 00:00:00   Consumer Goods     52.196757
9           5  2014-02-27 00:00:00   Consumer Goods     53.101520
10          6  2014-02-28 00:00:00         Services      5.449554
11          6  2014-02-27 00:00:00         Services      5.440019
12          7  2014-02-28 00:00:00  Basic Materials   4149.237000
13          7  2014-02-27 00:00:00  Basic Materials   4130.704000
And I ran groupby
df_all2 = df_all.groupby(['close_date','sector']).sum()
print df_all2
And the outcome is this
                             ticker_id  sector_index
close_date sector
2014-02-27 Basic Materials           7   4130.704000
           Consumer Goods           15   1306.933865
           Services                  6      5.440019
2014-02-28 Basic Materials           7   4149.237000
           Consumer Goods           15   1321.595786
           Services                  6      5.449554
But in this form, I can't upload it to MySQL properly. So in order to upload it properly, I need to do this and a few other things:
data2 = list(tuple(x) for x in df_all2.values)
but data2 contains meaningless garbage.
To make a long story short, how can I get groupby to give me the following outcome (where the close_date values are all filled in and the column headings are tabular)?
close_date  sector           ticker_id  sector_index
2014-02-27  Basic Materials          7   4130.704000
2014-02-27  Consumer Goods          15   1306.933865
2014-02-27  Services                 6      5.440019
2014-02-28  Basic Materials          7   4149.237000
2014-02-28  Consumer Goods          15   1321.595786
2014-02-28  Services                 6      5.449554
Also, to help the community, how should I modify the title so that other pandas users facing this issue can find your solution, too? I really appreciate your help.
You have to reset_index on a MultiIndex before using to_sql*:
In [11]: df.groupby(['close_date','sector']).sum().reset_index()
Out[11]:
close_date sector ticker_id sector_index
0 2014-02-27 Basic Materials 7 4130.704000
1 2014-02-27 Consumer Goods 15 1306.933865
2 2014-02-27 Services 6 5.440019
3 2014-02-28 Basic Materials 7 4149.237000
4 2014-02-28 Consumer Goods 15 1321.595786
5 2014-02-28 Services 6 5.449554
Alternatively you can use as_index=False in the groupby:
In [12]: df.groupby(['close_date','sector'], as_index=False).sum()
Out[12]:
close_date sector ticker_id sector_index
0 2014-02-27 Basic Materials 7 4130.704000
1 2014-02-27 Consumer Goods 15 1306.933865
2 2014-02-27 Services 6 5.440019
3 2014-02-28 Basic Materials 7 4149.237000
4 2014-02-28 Consumer Goods 15 1321.595786
5 2014-02-28 Services 6 5.449554
*Note: this should be fixed from pandas 0.14 onwards, i.e. you should be able to save a MultiIndex to SQL.
See How to insert pandas dataframe via mysqldb into database?.
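For completeness, a minimal sketch of the MySQL upload using a modern pandas + SQLAlchemy setup (the connection string and table name are placeholders; df_all is the dataframe from the question):

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; adjust driver, credentials, host and database
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# flat, tabular result: one row per (close_date, sector)
df_all2 = df_all.groupby(['close_date', 'sector'], as_index=False).sum()

# write straight to MySQL; no manual conversion to tuples needed
df_all2.to_sql('sector_summary', engine, if_exists='append', index=False)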