Difference between Supply method act vs tap - publish-subscribe

In Raku documentation about the Supply method act (vs tap)
https://docs.raku.org/type/Supply#method_act it is stated that:
the given code is guaranteed to be executed by only one thread at a
time
My understanding is that a thread must finish with the specific code object before another thread has to run it.
If that is the case, I stumbled upon a different behavior when I tried to implement the feature. Take a look at the following code snippet, where 2 "acts" are created and run in different threads:
#!/usr/bin/env perl6
say 'Main runs in [ thread : ', +$*THREAD, ' ]';
my $b = 1;
sub actor {
print " Tap_$*tap : $^a ", now;
$*tap < 2 ??
do {
say " - Sleep 0.1";
sleep 0.1
}
!!
do {
say " - Sleep 0.2";
sleep 0.2;
}
$b++;
say " Tap_$*tap +1 to \$b $b ", now;
}
my $supply = supply {
for 1..100 -> $i {
say "For Tap_$*tap [ \$i = $i ] => About to emit : $b ", now;
emit $b;
say "For Tap_$*tap [ \$i = $i ] => Emitted : $b ", now;
done if $b > 5
}
}
start {
my $*tap = 1;
once say "Tap_$*tap runs in [ thread : {+$*THREAD} ]";
$supply.act: &actor
}
start {
my $*tap = 2;
once say "Tap_$*tap runs in [ thread : {+$*THREAD} ]";
$supply.act: &actor
}
sleep 1;
and the result is the following (with added time gaps and comments):
1 Main runs in [ thread : 1 ] - Main thread
2 Tap_1 runs in [ thread : 4 ] - Tap 1 thread
3 For Tap_1 [ $i = 1 ] => About to emit : 1 Instant:1603354571.198187 - Supply thread [for tap 1]
4 Tap_1 : 1 Instant:1603354571.203074 - Sleep 0.1 - Tap 1 thread
5 Tap_2 runs in [ thread : 6 ] - Tap 2 thread
6 For Tap_2 [ $i = 1 ] => About to emit : 1 Instant:1603354571.213826 - Supply thread [for tap 2]
7 Tap_2 : 1 Instant:1603354571.213826 - Sleep 0.2 - Tap 2 thread
8
9 -----------------------------------------------------------------------------------> Time +0.1 seconds
10
11 Tap_1 +1 to $b 2 Instant:1603354571.305723 - Tap 1 thread
12 For Tap_1 [ $i = 1 ] => Emitted : 2 Instant:1603354571.305723 - Supply thread [for tap 1]
13 For Tap_1 [ $i = 2 ] => About to emit : 2 Instant:1603354571.30768 - Supply thread [for tap 1]
14 Tap_1 : 2 Instant:1603354571.30768 - Sleep 0.1 - Tap 1 thread
15
16 -----------------------------------------------------------------------------------> Time +0.1 seconds
17
18 Tap_1 +1 to $b 3 Instant:1603354571.410354 - Tap 1 thread
19 For Tap_1 [ $i = 2 ] => Emitted : 4 Instant:1603354571.425018 - Supply thread [for tap 1]
20 Tap_2 +1 to $b 4 Instant:1603354571.425018 - Tap 2 thread
21 For Tap_1 [ $i = 3 ] => About to emit : 4 Instant:1603354571.425018 - Supply thread [for tap 1]
22 For Tap_2 [ $i = 1 ] => Emitted : 4 Instant:1603354571.425995 - Supply thread [for tap 2]
23 Tap_1 : 4 Instant:1603354571.425995 - Sleep 0.1 - Tap 1 thread
24 For Tap_2 [ $i = 2 ] => About to emit : 4 Instant:1603354571.425995 - Supply thread [for tap 2]
25 Tap_2 : 4 Instant:1603354571.426973 - Sleep 0.2 - Tap 2 thread
26
27 -----------------------------------------------------------------------------------> Time +0.1 seconds
28
29 Tap_1 +1 to $b 5 Instant:1603354571.528079 - Tap 1 thread
30 For Tap_1 [ $i = 3 ] => Emitted : 5 Instant:1603354571.52906 - Supply thread [for tap 1]
31 For Tap_1 [ $i = 4 ] => About to emit : 5 Instant:1603354571.52906 - Supply thread [for tap 1]
32 Tap_1 : 5 Instant:1603354571.53004 - Sleep 0.1 - Tap 1 thread
33
34 -----------------------------------------------------------------------------------> Time +0.1 seconds
35
36 Tap_2 +1 to $b 6 Instant:1603354571.62859 - Tap 2 thread
37 For Tap_2 [ $i = 2 ] => Emitted : 6 Instant:1603354571.62859 - Supply thread [for tap 2]
38 Tap_1 +1 to $b 7 Instant:1603354571.631512 - Tap 1 thread
39 For Tap_1 [ $i = 4 ] => Emitted : 7 Instant:1603354571.631512 - Supply thread [for tap 2]
One can easily observe that the code object (subroutine &actor) is running concurrently in 2 threads (for example see output lines 4 & 7).
Can somebody clarify my misunderstanding about the matter?

There's very rarely any difference between tap and act in everyday use of Raku, because almost every Supply that you encounter is a serial supply. A serial supply is one that already enforces the protocol that a value will not be emitted until the previous one has been processed. The implementation of act is:
method act(Supply:D: &actor, *%others) {
self.sanitize.tap(&actor, |%others)
}
Where sanitize enforces the serial emission of values and in addition makes sure that events follow the grammar emit* [done | quit]. Since these properties are usually highly desirable anyway, every built-in way to obtain a Supply provides them, with the exception of being able to create a Supplier and call unsanitized-supply on it. (Historical note: a very early prototype did not enforce these properties so widely, creating more of a need for a method doing what act does. While the need for it diminished as the design involved into what was eventually shipped in the first language release, it got to keep its nice short name.)
The misunderstanding arises from expecting the serialization of events to be per source, whereas in reality it is per subscription. Consider this example:
my $timer = Supply.interval(1);
$timer.tap: { say "A: {now}" };
$timer.tap: { say "B: {now}" };
sleep 5;
Which produces output like this:
A: Instant:1603364746.02766
B: Instant:1603364746.031255
A: Instant:1603364747.025255
B: Instant:1603364747.028305
A: Instant:1603364748.025584
B: Instant:1603364748.029797
A: Instant:1603364749.026596
B: Instant:1603364749.029643
A: Instant:1603364750.027881
B: Instant:1603364750.030851
A: Instant:1603364751.030137
There is one source of events, but we establish two subscriptions to it. Each subscription enforces the serial rule, so if we do this:
my $timer = Supply.interval(1);
$timer.tap: { sleep 1.5; say "A: {now}" };
$timer.tap: { sleep 1.5; say "B: {now}" };
sleep 5;
Then we observe the following output:
A: Instant:1603364909.442341
B: Instant:1603364909.481506
A: Instant:1603364910.950359
B: Instant:1603364910.982771
A: Instant:1603364912.451916
B: Instant:1603364912.485064
Showing that each subscription is getting one event at a time, but merely sharing an (on-demand) source doesn't create any shared backpressure.
Since the concurrency control is associated with the subscription, it is irrelevant if the same closure clone is passed to tap/act. Enforcing concurrency control across multiple subscriptions is the realm of supply/react/whenever. For example this:
my $timer = Supply.interval(1);
react {
whenever $timer {
sleep 1.5;
say "A: {now}"
}
whenever $timer {
sleep 1.5;
say "B: {now}"
}
}
Gives output like this:
A: Instant:1603365363.872672
B: Instant:1603365365.379991
A: Instant:1603365366.882114
B: Instant:1603365368.383392
A: Instant:1603365369.884608
B: Instant:1603365371.386087
Where each event is 1.5s apart, because of the concurrency control implied by the react block.

Related

Mojolicious: long subroutine have no effect on my variable

I have some trouble with very long subroutine with hypnotoad.
I need to run 1 minute subroutine (hardware connected requirements).
Firsly, i discover this behavior with:
my $var = 0;
Mojo::IOLoop->recurring(60 => sub {
$log->debug("starting loop: var: $var");
if ($var == 0) {
#...
#make some long (30 to 50 seconds) communication with external hardware
sleep 50; #fake reproduction of this long communication
$var = 1;
}
$log->debug("ending loop: var: $var");
})
Log:
14:13:45 2018 [debug] starting loop: var: 0
14:14:26 2018 [debug] ending loop: var: 1 #duration: 41 seconds
14:14:26 2018 [debug] starting loop: var: 0
14:15:08 2018 [debug] ending loop: var: 1 #duration: 42 seconds
14:15:08 2018 [debug] starting loop: var: 0
14:15:49 2018 [debug] ending loop: var: 1 #duration: 42 seconds
14:15:50 2018 [debug] starting loop: var: 0
14:16:31 2018 [debug] ending loop: var: 1 #duration: 41 seconds
...
3 problems:
1) Where do these 42 seconds come from? (yes, i know, 42 seconds is the answer of the universe...)
2) Why the IOLoop recuring loses his pace?
3) Why my variable is setted to 1, and just one second after, the if get a variable equal to 0?
When looped job needs 20 seconds or 25 seconds, no problem.
When looped job needs 60 secondes and used with morbo, no problem.
When looped job needs more than 40 seconds and used with hypnotoad (1 worker), this is the behavior explained here.
If i increase the "not needed" time (e.g. 120 seconds IOLoop for 60 seconds jobs, the behaviour is always the same.
It's not a problem about IOLoop, i can reproduce the same behavior outside loop.
I suspect a problem with worker killed and heart-beat, but i have no log about that.

Does powershell's parallel foreach use at most 5 thread?

The throttlelimit parameter of foreach -parallel can control how many processes are used when executing the script. But I can't have more than 5 processes even if I set throttlelimit greater than 5.
The script is executed in multiple powershell processes. So I check PID inside the script. And then group the PIDs so that I can know how many processes are used to execute the script.
function GetPID() {
$PID
}
workflow TestWorkflow {
param($throttlelimit)
foreach -parallel -throttlelimit $throttlelimit ($i in 1..100) {
GetPID
}
}
foreach ($i in 1..8) {
$pids = TestWorkflow -throttlelimit $i
$measure = $pids | group | Measure-Object
$measure.Count
}
The output is
1
2
3
4
5
5
5
5
For $i less or equal to 5, I have $i processes. But for $i greater than 5, I have only 5 processes. Is there any way to increase the number of processes when executing the script?
Edit: In response to #SomeShinyObject's answer, I added another test case. It's a modification of the example given by #SomeShinyObject. I added a function S, which does nothing but sleeping for 10 seconds.
function S($n) {
$s = Get-Date
$s.ToString("HH:mm:ss.ffff") + " start sleep " + $n
sleep 10
$e = Get-Date
$e.ToString("HH:mm:ss.ffff") + " end sleep " + $n + " diff " + (($e-$s).TotalMilliseconds)
}
Workflow Throttle-Me {
[cmdletbinding()]
param (
[int]$ThrottleLimit = 10,
[int]$CollectionLimit = 10
)
foreach -parallel -throttlelimit $ThrottleLimit ($n in 1..$CollectionLimit){
$s = Get-Date
$s.ToString("HH:mm:ss.ffff") + " start " + $n
S $n
$e = Get-Date
$e.ToString("HH:mm:ss.ffff") + " end " + $n + " diff " + (($e-$s).TotalMilliseconds)
}
}
Throttle-Me -ThrottleLimit 10 -CollectionLimit 20
And here's the output. I grouped the output by time (number of seconds), and did a little bit reorder within each group to make it clearer. It's pretty obvious that function S are called 5 by 5, although I set throttlelimit to 10 (first we have start sleep 6..10, 10 seconds later, we have start sleep 1..5, and 10 seconds later start sleep 11..15, and 10 seconds later start sleep 16..20).
03:40:29.7147 start 10
03:40:29.7304 start 9
03:40:29.7304 start 8
03:40:29.7459 start 7
03:40:29.7459 start 6
03:40:29.7616 start 5
03:40:29.7772 start 4
03:40:29.7772 start 3
03:40:29.7928 start 2
03:40:29.7928 start 1
03:40:35.3067 start sleep 7
03:40:35.3067 start sleep 8
03:40:35.3067 start sleep 9
03:40:35.3692 start sleep 10
03:40:35.4629 start sleep 6
03:40:45.3292 end sleep 7 diff 10022.5353
03:40:45.3292 end sleep 8 diff 10022.5353
03:40:45.3292 end sleep 9 diff 10022.5353
03:40:45.3761 end sleep 10 diff 10006.8765
03:40:45.4855 end sleep 6 diff 10022.5243
03:40:45.3605 end 9 diff 15630.1005
03:40:45.3917 end 7 diff 15645.7465
03:40:45.3917 end 8 diff 15661.3313
03:40:45.4229 end 10 diff 15708.2274
03:40:45.5323 end 6 diff 15786.3969
03:40:45.4386 start sleep 5
03:40:45.4542 start sleep 4
03:40:45.4542 start sleep 3
03:40:45.4698 start sleep 2
03:40:45.5636 start sleep 1
03:40:45.4698 start 11
03:40:45.4855 start 12
03:40:45.5011 start 13
03:40:45.5167 start 14
03:40:45.6105 start 15
03:40:55.4596 end sleep 3 diff 10005.4374
03:40:55.4596 end sleep 4 diff 10005.4374
03:40:55.4596 end sleep 5 diff 10021.0426
03:40:55.4752 end sleep 2 diff 10005.3992
03:40:55.5690 end sleep 1 diff 10005.3784
03:40:55.4752 end 3 diff 25698.0221
03:40:55.4909 end 5 diff 25729.302
03:40:55.5065 end 4 diff 25729.2523
03:40:55.5221 end 2 diff 25729.2559
03:40:55.6159 end 1 diff 25823.032
03:40:55.5534 start sleep 11
03:40:55.5534 start sleep 12
03:40:55.5690 start sleep 13
03:40:55.5846 start sleep 14
03:40:55.6784 start sleep 15
03:40:55.6002 start 16
03:40:55.6002 start 17
03:40:55.6159 start 18
03:40:55.6326 start 19
03:40:55.7096 start 20
03:41:05.5692 end sleep 11 diff 10015.8226
03:41:05.5692 end sleep 12 diff 10015.8226
03:41:05.5848 end sleep 13 diff 10015.8108
03:41:05.6004 end sleep 14 diff 10015.8128
03:41:05.6942 end sleep 15 diff 10015.8205
03:41:05.5848 end 12 diff 20099.3218
03:41:05.6004 end 11 diff 20130.5719
03:41:05.6161 end 13 diff 20114.9729
03:41:05.6473 end 14 diff 20130.5962
03:41:05.6942 end 15 diff 20083.7506
03:41:05.6317 start sleep 17
03:41:05.6317 start sleep 16
03:41:05.6473 start sleep 18
03:41:05.6629 start sleep 19
03:41:05.7411 start sleep 20
03:41:15.6320 end sleep 16 diff 10000.3608
03:41:15.6320 end sleep 17 diff 10000.3608
03:41:15.6477 end sleep 18 diff 10000.3727
03:41:15.6633 end sleep 19 diff 10000.3709
03:41:15.7414 end sleep 20 diff 10000.3546
03:41:15.6477 end 16 diff 20047.4375
03:41:15.6477 end 17 diff 20047.4375
03:41:15.6633 end 18 diff 20047.4295
03:41:15.7101 end 19 diff 20077.5169
03:41:15.7414 end 20 diff 20031.7909
I believe your test is a little flawed. According to this TechNet article, $PID is not an available variable in a Script Workflow.
A better test with an explanation can be found on this site
Basically, the ThrottleLimit is set to a theoretical [Int32]::MaxValue. The poster designed a better test to check parallel processing also (slightly modified):
Workflow Throttle-Me {
[cmdletbinding()]
param (
[int]$ThrottleLimit = 10,
[int]$CollectionLimit = 10
)
foreach -parallel -throttlelimit $ThrottleLimit ($n in 1..$CollectionLimit){
"Working on $n"
"{0:hh}:{0:mm}:{0:ss}" -f (Get-Date)
}
}
I can't speak for that specific cmdlet , but as you found PowerShell only uses one thread per PID. The common ways of getting around this are PSJobs and Runspaces. Runspaces are more difficult than PSJobs, but they also have significantly better performance, which makes them the popular choice. The downside is that your entire code has to resolve around them which means an entire rewrite in most cases.
Unfortunately as much as Microsoft pushes PowerShell, it is not made for performance and speed. But it does as good as it can while tied to .NET

MongoDB benchmarking inserts

I am trying to benchmark MongoDB with the JS harness. I am trying to do inserts. The example given in mongo website.
However, I am trying an insert operation, which works totally fine, but gives out wrong queries/sec.
ops = [{op: "insert", ns: "benchmark.bench", safe: false, doc: {"a": 1}}]
The above works fine. Then, I have run the following in mongo shell:
for ( x = 1; x<=128; x*=2){
res = benchRun( { parallel : x ,
seconds : 5 ,
ops : ops
} )
print( "threads: " + x + "\t queries/sec: " + res.query )
}
It gives out:
threads: 1 queries/sec: 0
threads: 2 queries/sec: 0
threads: 4 queries/sec: 0
threads: 8 queries/sec: 0
threads: 16 queries/sec: 0
threads: 32 queries/sec: 1.4
threads: 64 queries/sec: 0
threads: 128 queries/sec: 0
I dont understand why the queries/sec is 0 and not a single doc has been inserted. Is this right was testing performance for inserts?
Answering because I just encountered a similar problem.
Try replacing your print statement with printjson(res).
You will see that res has the following fields:
{
"note" : "values per second",
"errCount" : NumberLong(0),
"trapped" : "error: not implemented",
"insertLatencyAverageMicros" : 8.173300153139357,
"totalOps" : NumberLong(130600),
"totalOps/s" : 25366.173139864142,
"findOne" : 0,
"insert" : 25366.173139864142,
"delete" : 0,
"update" : 0,
"query" : 0,
"command" : 0
}
As you can see, the query count is 0, hence when you print res.query it gives 0. To get the number of insert operations per second you would want to print res.insert. I believe res.query corresponds to the "find" operation.

I dont understand this little perl code (if ...)

Can someone explain me this short pearl code?
$batstr2 = "empty" if( $status2 & 4 );
What say the if statement ?
Already answered many times, for the case if you don't know what is the Bitwise And, here is a small example:
perl -e 'print "dec\t bin\t&4\n";printf "%d\t%8b\t%-8b\n", $_, $_, ($_ & 4) for (0..8);'
prints:
dec bin &4
0 0 0
1 1 0
2 10 0
3 11 0
4 100 100
5 101 100
6 110 100
7 111 100
8 1000 0
as you can see, when the 3rb bit from right is 1 - the $num & 4 is true.
That's using the if as a statement modifier. It's roughly the same as
if ($status & 4) {
$batstr2 = "empty";
}
and exactly the same as
($status & 4) and ($batstr2 = "empty");
a variety of constructs can be used as statement modifiers, including: if, unless, while, until, for, when. These modifiers can't be stacked (foo() if $bar for #baz won't work), you are limited for one modifer per simple statement.
That's a bitwise and - http://perldoc.perl.org/perlop.html#Bitwise-And . $status2 is being used as a bit mask and it sets $batstr2 to 'empty' if the bit is set.
It sets $batstr2 to "empty" if the 3rd least significant bit of $status2 is set - it is a logical AND mask.

Perl: Devel::Gladiator module and memory management

I have a perl script that needs to run in the background constantly. It consists of several .pm module files and a main .pl file. What the program does is to periodically gather some data, do some computation, and finally update the result recorded in a file.
All the critical data structures are declared in the .pl file with our, and there's no package variable declared in any .pm file.
I used the function arena_table() in the Devel::Gladiator module to produce some information about the arena in the main loop, and found that the SVs of type SCALAR and GLOB are increasing slowly, resulting in a gradual increase in the memory usage.
The output of arena_table (I reformat them, omitting the title. after a long enough period, only the first two number is increasing):
2013-05-17#11:24:34 36235 3924 3661 3642 3376 2401 720 201 27 23 18 13 13 10 2 2 1 1 1 1 1 1 1 1 1 1
After running for some time:
2013-05-17#12:05:10 50702 46169 36910 4151 3995 3924 2401 720 274 201 26 23 18 13 13 10 2 2 1 1 1 1 1 1 1 1 1
The main loop is something like:
our %hash1 = ();
our %hash2 = ();
# and some more package variables ...
# all are hashes
do {
my $nowtime = time();
collect_data($nowtime);
if (calculate() == 1) {
update();
}
sleep 1;
get_mem_objects(); # calls arena_table()
} while (1);
Except get_mem_objects, other functions will operate on the global hashes declared by our. In update, the program will do some log rotation, the code is like:
sub rotate_history() {
my $i = $main::HISTORY{'count'};
if ($i == $main::CONFIG{'times'}{'total'}) {
for ($i--; $i >= 1; $i--) {
$main::HISTORY{'data'}{$i} = dclone($main::HISTORY{'data'}{$i-1});
}
} else {
for (; $i >= 1; $i--) {
$main::HISTORY{'data'}{$i} = dclone($main::HISTORY{'data'}{$i-1});
}
}
$main::HISTORY{'data'}{'0'} = dclone(\%main::COLLECT);
if ($main::HISTORY{'count'} < $main::CONFIG{'times'}{'total'}) {
$main::HISTORY{'count'}++;
}
}
If I comment the calls to this function, in the final report given by Devel::Gladiator, only the SVs of type SCALAR is increasing, the number of GLOBs will finally enter a stable state. I doubt the dclone may cause the problem here.
My questions are,
what exactly does the information given by that module mean? The statements in the perldoc is a little vague for a perl newbie like me.
And, what are the common skills to lower the memory usage of long-running perl scripts?
I know that package variables are stored in the arena, but how about the lexical variables? How are the memory consumed by them managed?