What is the pseudocode translation for the ema function in KDB? - kdb

I'm trying to understand the underlying function for ema.
When I call it, it returns source in k.
q)ema
k){(*y)(1f-x)\x*y}
q)ema[0.2;til 5]
0 0.2 0.56 1.048 1.6384
So far, so good.
When I try to call it directly, it doesn't seem to like the *. What am I doing wrong?
q){(*y)(1f-x)\x*y}[0.2;til 5]
'*
[0] {(*y)(1f-x)\x*y}[0.2;til 5]
^
How does one describe this k phrase in English/pseudocode?

Mark beat me to the exact answer I would give!
One thing to add to his answer, this ema function is using the special shorthand form documented here: https://code.kx.com/q/ref/accumulators/#alternative-syntax
Thus,
(*y)(1f-x)\x*y
is equivalent to
{z+x*y}\[first 0 1 2 3 4;1f-0.2;0.2*0 1 2 3 4]
which may be a little easier to follow as:
{(0.2*z)+x*y}\[0;0.8;0 1 2 3 4]
or simply
{(0.2*y)+x*0.8}\[0;0 1 2 3 4]
aka 0.8 times previous sum plus 0.2 times new value
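For readers who don't speak k, the same recurrence can be sketched in Python. This is only an illustrative translation, not how kdb+ implements it; the name ema and its argument order mirror the q call, and the first value is used as the seed just like (*y) in the k source.

```python
def ema(alpha, values):
    """Exponential moving average seeded with the first value."""
    values = list(values)
    if not values:
        return []
    out = [float(values[0])]  # seed: first element, like (*y) in the k source
    for v in values[1:]:
        # (1 - alpha) times the previous result plus alpha times the new value
        out.append((1.0 - alpha) * out[-1] + alpha * v)
    return out


result = ema(0.2, range(5))
print(result)
```

With alpha = 0.2 this is the "0.8 times previous sum plus 0.2 times new value" rule described above, and it should reproduce the values from the q session up to floating-point rounding.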

You can run k code by prepending with k) in the command prompt
q)k){(*y)(1f-x)\x*y}[0.2;0 1 2 3 4]
0 0.2 0.56 1.048 1.6384
Or by passing a string into "k". This allows you to do it in the middle of a statement. You will have to escape backslashes
q)"k" "{(*y)(1f-x)\\x*y}[0.2;0 1 2 3 4]"
0 0.2 0.56 1.048 1.6384

To add to Mark and Terry's answers: your code fails because q does not recognise * as anything other than a dyadic (two-argument) function, whereas the k interpreter will see it as monadic if it is in the right context (which it is in ema).
The .q namespace contains the mapping of k function to q keywords so, in future, if you run into similar operators in k expressions you should be able to look them up in this namespace. For example:
q).q?(*:) // Search for the monadic form of '*'
`first
Here you can see that monadic * is indeed equivalent to first.
This works for other functions:
q).q?(>:)
`hclose
q).q?(^:)
`null
q).q?(-:)
`neg
q).q?(=:)
`group
This is not guaranteed to work in all cases, and more complex expressions may fail. But it is a good first reference point.

Related

replace zero values with previous non-zero values

I need a fast way in Matlab to do something like this (I am dealing with huge vectors, so a normal loop takes forever!):
from a vector like
[0 0 2 3 0 0 0 5 0 0 7 0]
I need to get this:
[NaN NaN 2 3 3 3 3 5 5 5 7 7]
Basically, each zero value is replaced with the value of the previous non-zero one. The first are NaN because there is no previous non-zero element
in the vector.
Try this, not sure about speed though. Got to run so explanation will have to come later if you need it:
interp1(1:nnz(A), A(A ~= 0), cumsum(A ~= 0), 'NearestNeighbor')
Try this (it uses the cummax function, introduced in R2014b):
i1 = x==0;
i2 = cummax((1:numel(x)).*~i1);
x(i1&i2) = x(i2(i1&i2));
x(~i2) = NaN;
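The running-maximum idea in the snippet above is not specific to Matlab. Here is a rough Python sketch of the same trick using only the standard library; the function name fill_zeros_forward is made up for illustration, and nothing here says anything about Matlab performance.

```python
from itertools import accumulate


def fill_zeros_forward(x):
    # 1-based position of each non-zero element, 0 where the element is zero.
    marked = [i + 1 if v != 0 else 0 for i, v in enumerate(x)]
    # Running maximum: at each position, the position of the latest non-zero.
    last_nonzero = list(accumulate(marked, max))
    # 0 means no non-zero value has been seen yet, so emit NaN there.
    return [x[j - 1] if j else float("nan") for j in last_nonzero]


filled = fill_zeros_forward([0, 0, 2, 3, 0, 0, 0, 5, 0, 0, 7, 0])
print(filled)
```

The leading zeros become NaN because the running maximum is still 0 at those positions, which is the same role x(~i2) = NaN plays in the Matlab version.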
Just for reference, here are some similar/identical functions from exchange central and/or SO columns.
nearestpoint ,
try knnimpute function.
Or best of all, a function designed to do exactly your task:
repnan (obviously, first replace your zero values with NaN)
I had a similar problem once, and decided that the most effective way to deal with it is to write a mex file. The C++ loop is extremely trivial. Once you figure out how to work with the mex interface, it will be very easy.

Get range of elements in KDB using variables

Why can't I use a variable inside array ranges in KDB?
test:1 2 3 4 5
This example won't work:
pos:3;
test[1 pos]
but this way it will work
test[1 3]
As you can see, when you use test[1 3], (1 3) is a list. So indexing with a variable also requires the index to be a list.
q) list1:1 3
q) test[list1]
So you have to use:
q)n:3
q)list1:(1;n)
q)test[list1]
q)test[(1;n)] / alternate way
For a detailed explanation of why a semicolon alone doesn't work and why the brackets '()' are required, check my answer to this post:
kdb/q: how to reshape a list into nRows, where nRows is a variable
To understand what you're asking, consider:
1 2 3 7
That is a simple list of integers. Now consider:
a 2 3
Where a is a vector. The above indexes into a. Easy. Now say you want to have that 2 3 list as a variable
b:2 3
a b //works!
You are specifically asking about how to get a range from a list, this is covered in How to get range of elements in a list in KDB?
In that answer, use variables to create your index list and use the result to index into a
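As a loose analogy only (Python, not q), the same idea of building the index list from variables first and then indexing with it looks like this. The names test and pos come from the question; indices, picked and ranged are made up here.

```python
test = [1, 2, 3, 4, 5]
pos = 3

# Build the index list from variables, then index with it.
indices = [1, pos]
picked = [test[i] for i in indices]

# A contiguous range between two variable bounds (inclusive of pos).
start = 1
ranged = test[start:pos + 1]

print(picked, ranged)
```

Both Python and q count from 0, so picked holds the same two elements that test[1 3] returns in q.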

`accumarray` makes anomalous calls to its function argument

Short version:
The function passed as the fourth argument to accumarray sometimes gets called with arguments that are not consistent with the specifications encoded in the first argument to accumarray.
As a result, functions used as arguments to accumarray must test for what are, in effect, anomalous conditions.
The question is: how can a 1-expression anonymous function test for such anomalous conditions? And more generally: how can one write anonymous functions that are robust to accumarray's undocumented behavior?
Full version:
The code below is a drastically distilled version of a problem that ate up most of my workday today.
First some definitions:
idxs = [1:3 1:3 1:3]';
vals0 = [1 4 6 3 5 7 6 Inf 2]';
vals1 = [1 Inf 6 3 5 7 6 4 2]';
anon = @(x) max(x(~isinf(x)));
Note vals1 is obtained from vals0 by swapping elements 2 and 8. The "anonymous" function anon computes the maximum among the non-infinite elements of its input.
Given these definitions, the two calls below
accumarray(idxs, vals0, [], anon)
accumarray(idxs, vals1, [], anon)
which differ only in their second argument (vals0 vs vals1), should produce identical results, since the difference between vals0 and vals1 affects only the ordering of the values in the argument to one of the calls to anon, and the result of this function is insensitive to the ordering of elements in its argument.
As it turns out the first of these two expressions evaluates normally and produces the right result1:
>> accumarray(idxs, vals0, [], anon)
ans =
6
5
7
The second one, however, fails with:
>> accumarray(idxs, vals1, [], anon)
Error using accumarray
The function '@(x)max(x(~isinf(x)))' returned a non-scalar value.
To troubleshoot this problem, all I could come up with2 was to write a separate function (in its own file, of course, "the MATLAB way")
function out = kluge(x)
global ncalls;
ncalls = ncalls + 1;
y = ~isinf(x);
if any(y)
out = max(x(y));
else
{ncalls x}
out = NaN;
end
end
...and ran the following:
>> global ncalls;
>> ncalls = int8(0); accumarray(idxs, vals0, [], @kluge)
ans =
6
5
7
>> ncalls = int8(0); accumarray(idxs, vals1, [], @kluge)
ans =
[2] [Inf]
ans =
6
5
7
As one can see from the output of the last call to accumarray above, the argument to the second call to the kluge callback was the array [Inf]. This tells me beyond any doubt that accumarray is not behaving as documented3 (since idxs specifies no arrays of length 1 to be passed to accumarray's function argument).
In fact, from this and other tests I determined that, contrary to what I expected, the function passed to accumarray is called more than max(idxs) (= 3) times; in the expressions involving kluge above it's called 5 times.
The problem here is that if one cannot rely on how accumarray's function argument will actually be called, then the only way to make this function argument robust is to include in it a lot of extra code to perform the necessary checks. This almost certainly will require that the function have multiple statements, which rules out anonymous functions. (E.g. the function kluge above is more robust than anon, but I don't know how to fit it into an anonymous function.) Not being able to use anonymous functions with accumarray greatly reduces its utility.
So my question is:
how to specify anonymous functions that can be robust arguments to accumarray?
1 I have removed blank lines from MATLAB's typical over-padding in all the MATLAB output shown in this post.
2 I welcome comments with any other troubleshooting suggestions you may have; troubleshooting this problem was a lot harder than it should be.
3 In particular, see items number 1 through 5 right after the line "The function processes the input as follows:".
Short answer
The fourth input argument of accumarray, anon in this case, must return a scalar for any input.
Long answer (and discussion about index sorting)
Consider the output when the indexes are sorted:
>> [idxsSorted,sortInds] = sort(idxs)
>> accumarray(idxsSorted, vals0(sortInds), [], anon)
ans =
6
5
7
>> accumarray(idxsSorted, vals1(sortInds), [], anon)
ans =
6
5
7
Now, all the documentation has to say about this is the following:
If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.
How does this relate to the trouble with anon? It is a clue, as this forces anon to be called for the complete set of values for a given idx rather than a subset/subarray, as Luis Mendo suggested.
Consider how accumarray would work for a non-sorted list of indexes and values:
>> [idxs vals0 vals1]
ans =
1 1 1
2 4 Inf
3 6 6
1 3 3
2 5 5
3 7 7
1 6 6
2 Inf 4
3 2 2
For both vals0 and vals1, the Inf belongs to the set where idxs equals 2. Since idxs is not sorted, it does not process all values for idxs=2 in one shot, at first. The actual algorithm (implementation) is opaque, but it seems to start by assuming that idxs is sorted, processing each single-valued block of the first argument. This is verifiable by putting a breakpoint in fun, the function referenced by the fourth input argument. When it encounters a 1 in idxs for the second time, it seems to start over, but with subsequent calls to fun containing all the values for a given index. Presumably accumarray calls some implementation of unique to fully segment idxs (incidentally, order is not preserved). As kjo suggests, this is the point where accumarray actually processes the inputs as described in the documentation, following steps 1-5 here ("Find out how many unique indices there are..."). As a result, it crashes for vals1, when anon(Inf) is called, but not for vals0, which instead calls anon(4) on the first try.
However, even if it followed those steps exactly on the first go, it would not necessarily be robust if a complete subarray of values contained just Infs (consider that anon([Inf Inf Inf]) returns an empty matrix too). It is a requirement, although an understated one, that fun must return a scalar. What is not clear from the documentation is that it must return a scalar, for any inputs, not just what is expected based on the high-level description of the algorithm.
Workaround:
anon = @(x) max([x(~isinf(x));-Inf]);
The documentation does not say that anon is called only with the whole set1 of vals corresponding to each value of idx as its input. As seen in your example, it does get called with subsets thereof.
So the way to make anon robust seems to be: make sure it gives a scalar output when its input is any subset of vals (or maybe just any subset of each set with same-idx value). In your case, anon(inf) does not return a scalar.
1 It's actually an array, of course, but I think it's easier to describe this in terms of sets (and subsets).
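To make the "must return a scalar for any subset" requirement concrete, here is a small Python sketch. It does not reproduce accumarray's internal call pattern; accumulate_by_index is a hypothetical stand-in that simply groups values by index, and max_finite is a made-up name for a reducer in the spirit of the workaround above, which always has at least one element to take the maximum of.

```python
import math
from collections import defaultdict


def max_finite(values):
    # Appending -inf guarantees a scalar even when every value is infinite,
    # e.g. when the reducer is handed the single-element subset [inf].
    return max([v for v in values if not math.isinf(v)] + [-math.inf])


def accumulate_by_index(idxs, vals, fun):
    # Hypothetical stand-in for accumarray: group vals by idx, then reduce.
    groups = defaultdict(list)
    for i, v in zip(idxs, vals):
        groups[i].append(v)
    return [fun(groups[i]) for i in sorted(groups)]


idxs = [1, 2, 3] * 3
vals0 = [1, 4, 6, 3, 5, 7, 6, math.inf, 2]
vals1 = [1, math.inf, 6, 3, 5, 7, 6, 4, 2]

out0 = accumulate_by_index(idxs, vals0, max_finite)
out1 = accumulate_by_index(idxs, vals1, max_finite)
print(out0, out1, max_finite([math.inf]))
```

The point of the sketch is the last call: a reducer that is safe on [inf] alone stays safe no matter which subsets it is called with.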

How are the columns and rows counted in pascal function in Functional Programming Principles in Scala at coursera?

I'm learning Scala while going through the Coursera course Functional Programming Principles in Scala.
The first exercise says:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
The numbers at the edge of the triangle are all 1, and each number
inside the triangle is the sum of the two numbers above it. Write a
function that computes the elements of Pascal’s triangle by means of a
recursive process.
Do this exercise by implementing the pascal function in Main.scala,
which takes a column c and a row r, counting from 0 and returns the
number at that spot in the triangle. For example, pascal(0,2)=1,
pascal(1,2)=2 and pascal(1,3)=3.
At the start, I understand, as he refers to the 'numbers' we are all familiar with, but then he goes on to use the term "elements." What does he mean by this? What does he want me to compute?
I assumed that he got bored with the word "number" and thought, after defining the names of the numbers in the triangle as 'numbers' he just wanted to use something new, thus "element," but no matter how I count I cannot get the references to work.
I cannot even really understand the term 'column' seeing as the numbers are not vertically above each other.
Can you please explain how he gets pascal(1,3) == 3?
You're thinking about columns a bit wrong. By "xth column," he means the "xth entry in a given row."
So, if you are looking at the function pascal(c,r), you would want to figure out what the cth number is in the rth row.
So, for example:
pascal(1,2) corresponds to the second entry in the 3rd row
1
1 1
1 *2* 1
pascal(1,3) wants you to look at the second entry in the 4th row.
1
1 1
1 2 1
1 *3* 3 1
Just count from the left. (0,2) is the leftmost number in the row
1 2 1
so (1,3) would be the second number in
1 3 3 1
You can simply make the triangle "rectangle", and everything will become apparent:
cols-> 0 1 2 3 4
row-0 1
row-1 1 1
row-2 1 2 1
row-3 1 3 3 1
row-4 1 4 6 4 1
And you were right in that the triangle's "elements" are made of numbers, though there's a subtle difference, but insubstantial in this case.
P.S. I would personally advise using the course forum for such questions:
It will avoid controversial issues on the honor code.
Your course fellows will have a quicker understanding of the problem at hand
They will have access to material which is not available to those not undertaking the course
It will help to build up a sense of membership amongst the course students, and give you all a chance to create new, possibly fruitful, relationships
What you're asking is against the Coursera Honor Code: https://www.coursera.org/maestro/auth/normal/tos.php#honorcode
http://www.aiqus.com/questions/41299/coursera-cheating-scala-course
I loved solving this exercise.
My thought process was the following:
Understanding that the problem is a literal description of the binomial coefficient. https://en.wikipedia.org/wiki/Binomial_coefficient
Understanding that the ask is a literal plug into the formula (row!) / ((col!) * ((row - col)!)) and the formula is right there in the wiki page
The only thing still missing is implementing a tail recursive function of factorial
Bonus. if you use the extension method as such
extension (int: Int) {
def ! = factorialTailRec(int)
}
// you get to write
(r.!) / ((c.!) * ((r - c).!))
You get to write almost the identical mathematical formula. And at that moment I realised the similarities between doing maths and programming. And I cried a little with the beauty of it.
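For comparison, the same factorial-based formula can be sketched in Python. The names factorial_acc and pascal are illustrative only. Note that Python does not eliminate tail calls, so the accumulator style below only imitates the shape of the Scala tail-recursive version and is suitable for small inputs.

```python
def factorial_acc(n, acc=1):
    # Accumulator-style recursion; the recursive call is in tail position,
    # but Python still grows the stack, so keep n small.
    if n <= 1:
        return acc
    return factorial_acc(n - 1, acc * n)


def pascal(c, r):
    # Entry c of row r, both counted from 0: r! / (c! * (r - c)!)
    return factorial_acc(r) // (factorial_acc(c) * factorial_acc(r - c))


print(pascal(0, 2), pascal(1, 2), pascal(1, 3))
```

The three calls are the examples quoted from the exercise text, so they are a quick check that columns and rows are being counted the way the assignment intends.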

Why does crossvalind fail?

I am using the crossvalind function on a very small data set... However, I observe that it gives me incorrect results. Is this supposed to happen?
I have Matlab R2012a and here is my output
crossvalind('KFold',1:1:11,5)
ans =
2
5
1
3
2
1
5
3
5
1
5
Notice the absence of set 4. Is this a bug? I expected at least 2 elements per set, but it gives me 0 in one... and it happens a lot; that is, the values are not uniformly distributed across the sets.
The help for crossvalind says that the form you are using is: crossvalind(METHOD, GROUP, ...). In this case, GROUP is, e.g., the class labels of your data. So 1:11 as the second argument is confusing here, because it suggests no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.
I tried doing:
numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))
and it reliably gave 5 as a result, which is what I would expect; my example would correspond to a two-class problem (I would guess that, as a general rule, you'd want something like numel(unique(group)) <= numel(group) / folds) - my hypothesis would be that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other, with a difference between fold sizes of no more than 1 - but I haven't looked in the code to verify this.
It is possible that you mean to do:
crossvalind('KFold', 11, 5);
which would compute 5 folds for 11 data points - this doesn't attempt to do anything clever with labels, so you would be sure that there will be K folds.
However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross validation, which you could do with:
crossvalind('LeaveMOut', 11, 1);
although a better method would be:
for leave_out=1:11
fold_number = (1:11) ~= leave_out;
% code here; where fold_number is 0, this is the leave-one-out example.
% fold_number = 1 means that the example is in the main fold.
end
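The manual leave-one-out loop above can be sketched in Python as well. This is only an illustration of the masking idea, not a replacement for crossvalind; leave_one_out_masks is a made-up helper name.

```python
def leave_one_out_masks(n):
    # One mask per held-out example: True marks the examples in the main
    # fold, False marks the single left-out example.
    return [[i != leave_out for i in range(n)] for leave_out in range(n)]


masks = leave_one_out_masks(11)
for leave_out, mask in enumerate(masks):
    train_idx = [i for i, keep in enumerate(mask) if keep]
    # Fit on train_idx and evaluate on leave_out here.
    assert leave_out not in train_idx
```

Each of the 11 data points is held out exactly once, so no fold can end up empty the way fold 4 did in the question.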