How does the scan operator help to identify indices where a rule holds for 2 observations in a row? - kdb

A boolean vector has been created based on some rule, and we need to identify the indices where the rule holds for 2 observations in a row. The following code does that:
indices:0101001101b
runs:{0 x\x}"f"$;
where 2=runs indices
Could you please help me understand how the scan operator is used in the definition of the runs function? Appreciate your help.

It's using this special shorthand commonly used in calculating exponential moving averages: https://code.kx.com/q/ref/accumulators/#alternative-syntax
So {0 x\x} is equivalent to:
q){z+x*y}\[0;indices;indices]
0 1 0 1 0 0 1 2 0 1
What this is doing is essentially using the booleans as an on/off switch (via the boolean multiplication) for the rolling sum. It keeps adding (z+) until it hits a false boolean (0b), at which point the rolling sum resets to zero.
In English: nextValue + [currentValue (starting at 0) * nextValue]
When nextValue is 1, 1 gets added. When nextValue is 0 the result is zero (resetting the rolling sum).
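For readers who do not know q, the same accumulation can be sketched in Python. This is only an illustrative translation of {z+x*y}\[0;indices;indices]; the names runs, run_lengths and hits are made up for the sketch, and itertools.accumulate stands in for the scan.

```python
from itertools import accumulate

def runs(flags):
    """Return the length of the current run of 1s at each position."""
    # acc is the running count so far; flag is the next boolean (0 or 1).
    # flag + acc * flag extends the run when flag is 1 and resets it to 0 otherwise.
    scanned = accumulate(flags, lambda acc, flag: flag + acc * flag, initial=0)
    return list(scanned)[1:]  # drop the initial 0 seed, as the q scan does

indices = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
run_lengths = runs(indices)                                # [0, 1, 0, 1, 0, 0, 1, 2, 0, 1]
hits = [i for i, n in enumerate(run_lengths) if n == 2]    # [7]
```

The last line plays the role of where 2=runs indices.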
Something like this can achieve the same thing, though it is no easier to read at a glance (and it uses two scans instead of one):
q){s-maxs not[x]*s:sums x}indices
0 1 0 1 0 0 1 2 0 1i

Terry has answered your question about how runs works.
Comparing adjacent items is common. You might prefer to use the prior keyword. Certainly easier to see what it is doing.
q)where (and) prior indices
,7
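As a rough Python sketch of the same idea, each element can be paired with its predecessor and the two combined with a logical and. The helper name and_prior is hypothetical, and treating the first element's missing predecessor as false is an assumption of this sketch, not a statement about how q's prior handles the first item.

```python
def and_prior(flags):
    """Pair each flag with the one before it and AND the two together."""
    # Assumption: the first element has no predecessor, so it is paired with 0.
    previous = [0] + list(flags[:-1])
    return [int(bool(cur and prev)) for cur, prev in zip(flags, previous)]

indices = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
hits = [i for i, flag in enumerate(and_prior(indices)) if flag]  # [7]
```

In this sketch a run of three true values yields two hits (the second and third positions), whereas testing the run length for equality with 2 yields only one, so the two approaches agree only when runs are at most two long.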

Related

read arrays in Simulink

I need some help with the following issue. I have 5 different voltage values that change at every time step (every tick). I need to sort them, and once they are sorted I want to go to another matrix (like the one below) and read a specific column from it, for each predefined state (a timing scheme that I am designing). The selection changes at every state/time step. How can I do this?
The matrix looks like this (and could be larger):
0 0 0 1 1 1...
0 1 1 0 0 1...
1 0 1 0 1 0...
1 1 0 1 0 0...
.. .. .. .. .. ..
Thanks, Henry
I am not sure I understood it correctly. So I will edit my answer after you make your question a bit more clear.
I see two separate things:
Reading 5 voltage values which change at each step. You want to sort these values. To do this you can use MATLAB's sort function. It is really easy to use and you can look at it here.
This is the part I didn't understand well. After sorting the voltage readings what do you want to do with the matrix ? If you want to access just a specific column of the matrix and save it in a variable you can do it in this way. Let's assume you have a matrix A which is N x N, if you want to access the 10th column of the matrix and store it in a variable called column10 you will do something like: column10 = A(:,10)
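The two steps above (sorting the readings, then pulling one column out of a matrix) can be sketched in Python as follows. The voltage values are invented for illustration, the matrix is the fragment shown in the question, and the helper column is hypothetical; it uses 1-based column numbers to mirror A(:,k).

```python
# Hypothetical voltage readings for one time step.
voltages = [3.3, 1.2, 5.0, 0.7, 2.4]
sorted_voltages = sorted(voltages)

# The matrix fragment from the question, stored as a list of rows.
A = [
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
]

def column(matrix, k):
    """Return column k of the matrix, counting columns from 1 like A(:,k)."""
    return [row[k - 1] for row in matrix]

column3 = column(A, 3)
```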
I hope this will help you but let me know if this is what you wanted and I will edit my answer according to it.
Fab.

replace zero values with previous non-zero values

I need a fast way in Matlab to do something like this (I am dealing with huge vectors, so a normal loop takes forever!):
from a vector like
[0 0 2 3 0 0 0 5 0 0 7 0]
I need to get this:
[NaN NaN 2 3 3 3 3 5 5 5 7 7]
Basically, each zero value is replaced with the value of the previous non-zero one. The first are NaN because there is no previous non-zero element in the vector.
Try this, not sure about speed though. Got to run so explanation will have to come later if you need it:
interp1(1:nnz(A), A(A ~= 0), cumsum(A ~= 0), 'NearestNeighbor')
Try this (it uses the cummax function, introduced in R2014b):
i1 = x==0;
i2 = cummax((1:numel(x)).*~i1);
x(i1&i2) = x(i2(i1&i2));
x(~i2) = NaN;
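The index trick behind the running-maximum approach can be sketched in plain Python. This is only meant to show the logic, not to be fast on huge vectors; the function name fill_zeros_with_previous is made up, and itertools.accumulate with max plays the role of cummax.

```python
from itertools import accumulate
import math

def fill_zeros_with_previous(x):
    """Replace each zero with the most recent non-zero value (NaN if there is none)."""
    # 1-based position of every non-zero element, 0 where the element is zero.
    positions = [i + 1 if v != 0 else 0 for i, v in enumerate(x)]
    # Running maximum: position of the latest non-zero element seen so far (0 if none yet).
    last_nonzero = list(accumulate(positions, max))
    return [x[p - 1] if p else math.nan for p in last_nonzero]

result = fill_zeros_with_previous([0, 0, 2, 3, 0, 0, 0, 5, 0, 0, 7, 0])
# The first two entries are NaN, the rest are 2 3 3 3 3 5 5 5 7 7.
```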
Just for reference, here are some similar or identical functions from File Exchange and/or SO:
nearestpoint
the knnimpute function
Or, best of all, a function designed to do exactly this task:
repnan (obviously, first replace your zero values with NaN)
I had a similar problem once, and decided that the most effective way to deal with it was to write a mex file. The C++ loop is trivial. Once you figure out how to work with the mex interface, it will be very easy.

How are the columns and rows counted in pascal function in Functional Programming Principles in Scala at coursera?

I'm learning Scala while going through the Coursera course Functional Programming Principles in Scala.
The first exercise says:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
The numbers at the edge of the triangle are all 1, and each number
inside the triangle is the sum of the two numbers above it. Write a
function that computes the elements of Pascal’s triangle by means of a
recursive process.
Do this exercise by implementing the pascal function in Main.scala,
which takes a column c and a row r, counting from 0 and returns the
number at that spot in the triangle. For example, pascal(0,2)=1,
pascal(1,2)=2 and pascal(1,3)=3.
At the start, I understand, as he refers to the 'numbers' we are all familiar with, but then he goes on to use the term "elements." What does he mean by this? What does he want me to compute?
I assumed that he got bored with the word "number" and thought, after defining the names of the numbers in the triangle as 'numbers' he just wanted to use something new, thus "element," but no matter how I count I cannot get the references to work.
I cannot even really understand the term 'column' seeing as the numbers are not vertically above each other.
Can you please explain how he gets pascal(1,3) == 3?
You're thinking about columns a bit wrong. By "xth column," he means the "xth entry in a given row."
So, if you are looking at the function pascal(c,r), you would want to figure out what the cth number is in the rth row.
So, for example:
pascal(1,2) corresponds to the second entry in the 3rd row
1
1 1
1 *2* 1
pascal(1,3) wants you to look at the second entry in the 4th row.
1
1 1
1 2 1
1 *3* 3 1
Just count from the left. (0,2) is the leftmost number in the row
1 2 1
so (1,3) would be the second number in
1 3 3 1
You can simply make the triangle "rectangle", and everything will become apparent:
cols->  0  1  2  3  4
row-0   1
row-1   1  1
row-2   1  2  1
row-3   1  3  3  1
row-4   1  4  6  4  1
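To make the column/row convention concrete, here is a small Python sketch of the recursive definition from the exercise text (edges are 1, every inner number is the sum of the two numbers above it). It is written in Python only to illustrate the indexing; it is not the course's reference solution.

```python
def pascal(c, r):
    """Number at column c of row r in Pascal's triangle, both counted from 0."""
    # The first and last entry of every row is 1.
    if c == 0 or c == r:
        return 1
    # An inner entry is the sum of the two entries above it in the previous row.
    return pascal(c - 1, r - 1) + pascal(c, r - 1)

row_4 = [pascal(c, 4) for c in range(5)]  # [1, 4, 6, 4, 1]
```

With this convention pascal(1, 3) picks the entry under column 1 in row-3, which is 3.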
And you were right that the triangle's "elements" are just numbers; there is a subtle difference between the two terms, but it is insubstantial in this case.
P.S. I would personally advise using the course forum for such questions:
It will avoid controversial issues with the honor code.
Your course fellows will have a quicker understanding of the problem at hand.
They will have access to material which is not available to those not taking the course.
It will help to build a sense of membership amongst the course students, and give you all a chance to create new, possibly fruitful, relationships.
What you're asking is against the Coursera Honor Code: https://www.coursera.org/maestro/auth/normal/tos.php#honorcode
http://www.aiqus.com/questions/41299/coursera-cheating-scala-course
I loved solving this exercise.
My thought process was the following:
Understanding that the problem is a literal description of the binomial coefficient. https://en.wikipedia.org/wiki/Binomial_coefficient
Understanding that the ask is a literal plug into the formula row! / (col! * (row - col)!), and the formula is right there on the wiki page
Now the only thing that is missing is implementing a tail-recursive factorial function
Bonus. if you use the extension method as such
extension (int: Int) {
  def ! = factorialTailRec(int)
}
// you get to write
(r.!) / ((c.!) * ((r - c).!))
You get to write almost the identical mathematical formula. And at that moment I realised the similarities between doing maths and programming. And I cried a little with the beauty of it.
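The closed-form route can be sketched in Python as well. The accumulator-style factorial below mirrors the shape of a tail-recursive definition (Python itself does not optimise tail calls), and the names factorial and pascal_closed_form are chosen just for this sketch.

```python
def factorial(n, acc=1):
    """Accumulator-style factorial: the running product is carried in acc."""
    return acc if n <= 1 else factorial(n - 1, acc * n)

def pascal_closed_form(c, r):
    """Binomial coefficient r! / (c! * (r - c)!) for 0 <= c <= r."""
    # Integer division is exact here because the quotient is always an integer.
    return factorial(r) // (factorial(c) * factorial(r - c))

row_4 = [pascal_closed_form(c, 4) for c in range(5)]  # [1, 4, 6, 4, 1]
```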

Subselecting matrix and use logical selection (matlab)

I have a line of code in matlab for which i am selecting a subset of a matrix:
A(3:5,1:3);
Now i want to adapt this line, to only select rows for which all three values are larger than zero:
(A(3:5,1:3) > 0);
But apparently i am not doing this right. How do i select part of the matrix, and also make sure that only the rows (for which all three values are) larger than zero are selected?
EDIT: To clarify: lets say that i have a matrix of coordinates called A, that looks like this:
Matrix A [5,3]
3 4 0
0 1 0
0 3 1
0 0 0
4 8 7
Now i want to select only part [3:5,1:3], and of that part i only want to select row 3 and 5. How do i do that?
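The expression:
B = A(3:5,1:3); B(sum(B,2)~=0,:)
will return only the rows of A(3:5,1:3) that have a non-zero row sum. The sub-matrix is extracted first so that the logical index sum(B,2)~=0 refers to rows of B rather than rows of A.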
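The selection logic (take rows 3 to 5, then keep only the rows that pass a test) can be sketched in Python with the matrix from the question. The variable names are illustrative; both the non-zero row-sum test and the stricter "all values larger than zero" test from the question are shown, since they give different answers on this data.

```python
A = [
    [3, 4, 0],
    [0, 1, 0],
    [0, 3, 1],
    [0, 0, 0],
    [4, 8, 7],
]

# Rows 3 to 5 in MATLAB's 1-based numbering are A[2:5] here; all three columns are kept.
sub = [row[0:3] for row in A[2:5]]

# Keep rows whose sum is not zero: this keeps original rows 3 and 5.
nonzero_sum_rows = [row for row in sub if sum(row) != 0]

# Stricter test: keep rows in which every value is larger than zero.
all_positive_rows = [row for row in sub if all(v > 0 for v in row)]
```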
If you had posted syntactically correct Matlab it would have been easier for me to cut and paste your test data into my Matlab session.
I'm modelling this answer off of A(find( A > 0 ))
distances = pdist(find( pdist(medoidContainer(i,1:3)) > 0 ));
This will give you a vector of values in the distances variable. The reason the pdist(medoidContainer(i,1:3) > 0) does not work is because it first, finds the indices specified by i,1:3 in medoidContainer. Then it finds the indices in medoidContainer(i,1:3) that are greater than 0. However, since medoidContainer(i,1:3) and pdist now likely have different dimensions, the comparison does not give the right indexes.

Linspace vs range

I was wondering what is better style / more efficient:
x = linspace(-1, 1, 100);
or
x = -1:0.01:1;
As Oli Charlesworth mentioned, in linspace you divide the interval [a,b] into N points, whereas with the : form, you step-out from a with a specified step size (default 1) till you reach b.
One thing to keep in mind is that linspace always includes both end points, whereas the : form includes the second end point only if the step size lands exactly on it at the last step; otherwise it falls short. Example:
0:3:10
ans =
0 3 6 9
That said, when I use the two approaches depends on what I need to do. If all I need to do is sample an interval with a fixed number of points (and I don't care about the step-size), I use linspace.
In many cases, I don't care if it doesn't fall on the last point, e.g., when working with polar co-ordinates, I don't need the last point, as 2*pi is the same as 0. There, I use 0:0.01:2*pi.
As always, use the one that best suits your purposes, and that best expresses your intentions. So use linspace when you know the number of points; use : when you know the spacing.
[Incidentally, your two examples are not equivalent; the second one will give you 201 points.]
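The difference between "fixed number of points" and "fixed step" can be sketched in plain Python. The helpers linspace and colon below are simplified stand-ins written for this example; they are not MATLAB's actual algorithms, and the small tolerance used when counting steps is an assumption of the sketch.

```python
import math

def linspace(a, b, n):
    """n evenly spaced points from a to b, both end points included (n >= 2)."""
    step = (b - a) / (n - 1)
    points = [a + i * step for i in range(n - 1)]
    points.append(b)  # force the last point to be exactly b
    return points

def colon(a, step, b):
    """Points a, a+step, a+2*step, ... that do not go past b (step > 0)."""
    # The tolerance guards against (b - a) / step landing just below a whole number.
    count = math.floor((b - a) / step + 1e-10) + 1
    return [a + i * step for i in range(count)]

short = colon(0, 3, 10)                 # [0, 3, 6, 9]: stops before 10
n_colon = len(colon(-1, 0.01, 1))       # 201 points
n_linspace = len(linspace(-1, 1, 100))  # 100 points
```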
As Oli already pointed out, it's usually easiest to use linspace when you know the number of points you want and the colon operator when you know the spacing you want between elements.
However, it should be noted that the two will often not give you exactly the same results. As noted here and here, the two approaches use slightly different methods to calculate the vector elements (here's an archived description of how the colon operator works). That's why these two vectors aren't equal:
>> a = 0:0.1:1;
>> b = linspace(0,1,11);
>> a-b
ans =
  1.0e-016 *
  Columns 1 through 8
         0         0         0    0.5551         0         0         0         0
  Columns 9 through 11
         0         0         0
This is a typical side-effect of how floating-point numbers are represented. Certain numbers can't be exactly represented (like 0.1) and performing the same calculation in different ways (i.e. changing the order of mathematical operations) can lead to ever so slightly different results, as shown in the above example. These differences are usually on the order of the floating-point precision, and can often be ignored, but you should always be aware that they exist.
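The same kind of discrepancy is easy to reproduce in any language that uses IEEE double precision. The Python sketch below reaches 0.3 in two different ways; it illustrates the general floating-point effect and makes no claim about how the colon operator or linspace compute their elements internally.

```python
import math

via_multiplication = 3 * 0.1   # three steps of size 0.1
literal = 0.3                  # the value written directly

exactly_equal = via_multiplication == literal              # False
difference = abs(via_multiplication - literal)             # tiny, below 1e-16
close_enough = math.isclose(via_multiplication, literal)   # True
```

Comparing with a tolerance, as math.isclose does here, is the usual way to test such values for equality.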