Graphite - show "time spent" chart in an intuitive way - Grafana

I have a metric that collects the "time spent" on each task. A task might be triggered every 5-60 minutes, so the collected value is always the last updated value.
e.g., 533, 533, 533, 533, 533, 533, 46, 46, 999, 999, 999 (ms)
As the collected data points above show, there are three executions: 533, 46, and 999.
But without applying any function, the chart renders them as three horizontal lines, which isn't very intuitive. Is there a way to show only the first value when the following values are identical?

I think I figured out how to configure the query:
You can use the derivative function, which shows the change from the previous value, and then filter out zeros to make the chart more readable.
For example:
removeBelowValue(derivative(your_time_spent_query), 1)
(Before and after screenshots omitted.)
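To see the effect concretely, here is a rough Python sketch (illustration only, not Graphite itself) of what derivative plus removeBelowValue does to the sample series above:

# derivative(): change from the previous value (the first point has no
# predecessor). removeBelowValue(..., 1) then drops everything below 1,
# so the flat repeats (and negative changes) disappear.
series = [533, 533, 533, 533, 533, 533, 46, 46, 999, 999, 999]
deriv = [b - a for a, b in zip(series, series[1:])]
spikes = [v for v in deriv if v >= 1]
print(deriv)   # [0, 0, 0, 0, 0, -487, 0, 953, 0, 0]
print(spikes)  # [953]

Note that removeBelowValue also hides negative changes, so only increases show up as spikes.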

Related

How to best print output to command window during a loop in Matlab?

I have a loop which iterates through a list of ID numbers paired with a given stress value. The code works fine, except there is no guarantee that the lists have identical lengths. I currently have an if isempty(stress_value) check with a continue statement for when an ID number doesn't have a corresponding stress value. All of this takes place in a for id = 1:num_ids loop.
I am now trying to print this id value (class 'double') to the command line if it doesn't have an assigned stress value, i.e. when the isempty check is true, before continuing out of the loop. As an example, if I set num_ids equal to 101, but the list I'm iterating through only has ID values 1-100, I want to output this 101st ID to the command line.
I've tried printing as an error like this:
error(['The following ID does not have an assigned stress value: ',id])
However, when I try this in the command window, id simply prints as e, which I don't quite understand. When I run it in the script, nothing is printed to the command window.
I have also tried simply adding a display command for the id to the loop as follows, but when I run the code nothing shows up again:
disp(id)
Sorry for the simple question, but I have not found an effective way to do this yet, and would appreciate your feedback!
Check out fprintf. You can format your output exactly the way you want.
for id = 1:num_ids
    % do something
    if isempty(stress_value)
        fprintf('The following ID does not have an assigned stress value: %d\n', id)
        continue
    end
    % do something
end
The error function will stop code execution. Also, the reason your id printed as e is that concatenating a double into a character array converts it element-wise: char(101) is 'e'. Use num2str(id) or sprintf to convert the number to text first.
The disp function only prints the value of the variable, without printing the variable name.

Textscan from end of line

I am trying to read a very large file. Each line contains some integers, but the number of integers is not known; I just want to extract the last n items. I couldn't find the right syntax for doing this.
Example:
lineA='10 200 300 400 500';
lineB='300 400 500 550';
pA=textscan(lineA,'%u %u %u');
pB=textscan(lineB,'%u %u %u');
The results should be:
pA={[300]} {[400]} {[500]}
pB={[400]} {[500]} {[550]}
Currently I have no way to know the size of each line, and I want to avoid needing to. In this example I just read individual lines, but in my actual script I read a file with 10e6 lines using the syntax textscan(fid,format,10e6).
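For what it's worth, here is a minimal Python sketch of the "last n items" idea, assuming whitespace-separated integers (illustration only; the question itself is about MATLAB's textscan):

# Take the last n integers from a line with an unknown number of fields.
def last_n_ints(line, n):
    return [int(tok) for tok in line.split()[-n:]]

print(last_n_ints("10 200 300 400 500", 3))  # [300, 400, 500]
print(last_n_ints("300 400 500 550", 3))     # [400, 500, 550]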

kdb - how does the 'where' function work

I want to understand what is happening with the below statement:
sum(til 100) where 000011110000b
That line evaluates to 22, and I can't figure out why. sum(til 100) is 4950. where 000011110000b returns the list 4 5 6 7. The kdb reference page doesn't seem to explain this use case. Why does the above line evaluate to 22?
Also, why does the line below result in an error?
4950 where 000011110000b
In q, square brackets rather than parentheses are used for function-call arguments. So the above is being interpreted as:
sum[(til 100) where 000011110000b]
And you can probably see now why that evaluates to 22: it is the sum of the 5th, 6th, 7th, and 8th values of the list til 100 (i.e. 0..99), because where turns the boolean list 000011110000b into the index list 4 5 6 7, which then indexes into til 100, and 4+5+6+7 = 22:
q)til[100] where 000011110000b
4 5 6 7
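For readers who don't know q, here is a rough Python analogue (illustration only, not full q semantics) of the where-then-index step:

# q's `where` on a boolean list returns the indices of the 1 bits;
# that index list is then used to index into `til 100`.
bits = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]    # q: 000011110000b
idx = [i for i, b in enumerate(bits) if b]     # q: where -> 4 5 6 7
data = list(range(100))                        # q: til 100
print([data[i] for i in idx])                  # [4, 5, 6, 7]
print(sum(data[i] for i in idx))               # 22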
As you will have noticed, you can omit square brackets when calling functions, but when doing so you need to be sure the expression will be parsed/interpreted as intended. In general, q code is evaluated right to left, so in this case (til 100) where 000011110000b was evaluated first, and the result was passed to the sum function.
Regarding 4950 where 000011110000b: again, thinking right to left, the interpreter is trying to pass the result of where 000011110000b to the function 4950. But 4950 isn't a named function in kdb; this is instead the format used for IPC, i.e. executing commands on other kdb processes over TCP. So you are telling kdb to use the OS file descriptor number 4950 (which almost certainly won't correspond to a valid TCP connection) to run the command where 000011110000b. See the "message formats" section here for more details:
http://code.kx.com/q/tutorials/startingq/ipc/

Why is there a collision between John Smith and Sandra Dee in these hash table examples?

An extremely popular example when introducing hash tables or hash functions is the phone-book example with John Smith and the others.
My question is why is there a collision between John Smith and Sandra Dee?
Looking at this example
http://commons.wikimedia.org/wiki/File:Hash_table_5_0_1_1_1_1_0_SP.svg
I guessed the hash might be (521+1234) mod 256, but that gives 219, which is way off from the 152 shown. I understand this is meant to demonstrate a collision, but why is there one in the first place? What's the formula inside the hash function?
Edit: There's also another example where they both map to 2 instead:
http://en.wikipedia.org/wiki/Hash_function#mediaviewer/File:Hash_table_4_1_1_0_0_1_0_LL.svg
There is no hash function in these examples; they are just contrived, and the hashes are totally made up.
The source code that generated your first example is here.
If you look inside of choose_keys_and_hashes you'll see the line:
tb.key_hsh = [ 152, 001, 254, 154, 153 ]
So the hashes are just stored in an array. That is followed by the lines:
if op.collisions:
    # Make "Sandra Dee" collide with "John Smith":
    tb.key_hsh[3] = tb.key_hsh[0]
So the "collision" is totally fake. The second example seems to be generated from the same script with nkeys = 4.
Faking it is much easier than finding inputs and coming up with a hash function that gives you the desired output.
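If you want to see why a real function behaves differently, here is a toy Python hash (my own illustration, not the one behind the diagrams) under which the two names do not collide at all:

# Toy hash for illustration only: sum of character codes mod table size.
def naive_hash(s, table_size=256):
    return sum(ord(c) for c in s) % table_size

print(naive_hash("John Smith"))   # 180
print(naive_hash("Sandra Dee"))   # 135 -- no collision under this toy hash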

Perl: Programming Efficiency when computing correlation coefficients for a large set of data

EDIT: Link should work now, sorry for the trouble
I have a text file that looks like this:
Name, Test 1, Test 2, Test 3, Test 4, Test 5
Bob, 86, 83, 86, 80, 23
Alice, 38, 90, 100, 53, 32
Jill, 49, 53, 63, 43, 23
I am writing a program that, given this text file, will generate a Pearson correlation coefficient table in which entry (x,y) is the correlation between person x and person y:
Name,Bob,Alice,Jill
Bob, 1, 0.567088412588577, 0.899798494392584
Alice, 0.567088412588577, 1, 0.812425393004088
Jill, 0.899798494392584, 0.812425393004088, 1
My program works, except that the data set I am feeding it has 82 columns and, more importantly, 54000 rows. When I run my program right now, it is incredibly slow and I get an out-of-memory error. Is there a way to, first of all, remove any possibility of an out-of-memory error, and maybe make the program run a little more efficiently? The code is here: code.
Thanks for your help,
Jack
Edit: In case anyone else is trying to do large-scale computation: convert your data into HDF5 format. This is what I ended up doing to solve this issue.
You're going to have to do at least 54000^2 * 82 calculations and comparisons (on the order of 2x10^11), so of course it's going to take a lot of time. Are you holding everything in memory? That will be pretty large too. It will be slower, but it might use less memory if you keep the users in a database and calculate one user against all the others, then go on to the next and do it against all the others, instead of building one massive array or hash; a sketch of this idea follows.
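As a rough Python sketch of that one-against-the-rest idea (illustration only; the file layout and the corr callable are placeholders, not the question's actual code):

# Re-scan the input for each user so only two rows of scores are in
# memory at a time, instead of one massive array or hash.
def rows(path):
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            name, *scores = line.split(",")
            yield name.strip(), [float(s) for s in scores]

def pairwise(path, corr):
    for i, (name_x, x) in enumerate(rows(path)):
        for j, (name_y, y) in enumerate(rows(path)):
            if j > i:  # each pair once; trades repeated I/O for low memory
                yield name_x, name_y, corr(x, y)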
Have a look at Tie::File to deal with the high memory usage of having your input and output files stored in memory.
Have you searched CPAN? My own search yielded another method, gsl_stats_correlation, for computing Pearson's correlation. It is in Math::GSL::Statistics, which binds to the GNU Scientific Library.
gsl_stats_correlation($data1, $stride1, $data2, $stride2, $n) - This function efficiently computes the Pearson correlation coefficient between the array references $data1 and $data2, which must both be of the same length $n:

r = \frac{\operatorname{cov}(x, y)}{\hat\sigma_x \hat\sigma_y} = \frac{\frac{1}{n-1} \sum_i (x_i - \hat{x})(y_i - \hat{y})}{\sqrt{\frac{1}{n-1} \sum_i (x_i - \hat{x})^2}\, \sqrt{\frac{1}{n-1} \sum_i (y_i - \hat{y})^2}}
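To make the formula concrete, here is a small Python sketch (illustration only; the question's code is Perl, and the helper name pearson_r is mine) that computes Pearson's r directly from that definition:

import math

# Compute Pearson's r straight from the definition above.
# The 1/(n-1) factors in the numerator and denominator cancel out.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

bob = [86, 83, 86, 80, 23]
alice = [38, 90, 100, 53, 32]
print(pearson_r(bob, alice))  # ~0.567088, matching the Bob/Alice entry above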
You may want to look at PDL:
PDL ("Perl Data Language") gives
standard Perl the ability to compactly
store and speedily manipulate the
large N-dimensional data arrays which
are the bread and butter of scientific
computing
.
Essentially, Paul Tomblin has given you the answer: it's a lot of calculation, so it will take a long time; it's a lot of data, so it will take a lot of memory.
However, there may be one gotcha: If you use perl 5.10.0, your list assignments at the start of each method may be victims of a subtle performance bug in that version of perl (cf. perlmonks thread).
A couple of minor points:
The printout may actually slow down the program somewhat, depending on where it goes.
There is no need to reopen the output file for each line! Just do something like this:
open FILE, ">", "file.txt" or die $!;
print FILE "Name, ", join(", ", 0 .. $#{$correlations[0]} + 1), "\n";
my $rowno = 1;
foreach my $row (@correlations) {   # open once, write every row
    print FILE "$rowno, " . join(", ", @$row) . "\n";
    $rowno++;
}
close FILE;
Finally, while I do use Perl whenever I can, with a program and data set such as you describe, the simplest route might be to use C++ with its iostreams (which make parsing easy enough) for this task.
Note that all of this is just minor optimization. There's no algorithmic gain.
I don't know enough about what you are trying to do to give good advice about implementation, but you might look at Statistics::LSNoHistory, which claims to have a method pearson_r that returns Pearson's r correlation coefficient.
Further to the comment above about PDL, here is code showing how to calculate the correlation table quite efficiently, even for very big datasets:
use PDL;        # core PDL, for random()
use PDL::Stats; # this useful module can be downloaded from CPAN
my $data = random(82, 5400);     # your data should replace this
my $table = $data->corr_table(); # that's all, really
You might need to set $PDL::BIGPDL = 1; at the top of your script and make sure you run this on a machine with A LOT of memory. The computation itself is reasonably fast; an 82 x 5400 table took only a few seconds on my laptop.