How are the columns and rows counted in pascal function in Functional Programming Principles in Scala at coursera? - scala

I'm learning Scala while going through the Coursera course Functional Programming Principles in Scala.
The first exercise says:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
The numbers at the edge of the triangle are all 1, and each number
inside the triangle is the sum of the two numbers above it. Write a
function that computes the elements of Pascal’s triangle by means of a
recursive process.
Do this exercise by implementing the pascal function in Main.scala,
which takes a column c and a row r, counting from 0 and returns the
number at that spot in the triangle. For example, pascal(0,2)=1,
pascal(1,2)=2 and pascal(1,3)=3.
At the start, I understand, as he refers to the 'numbers' we are all familiar with, but then he goes on to use the term "elements." What does he mean by this? What does he want me to compute?
I assumed that he got bored with the word "number" and thought, after defining the names of the numbers in the triangle as 'numbers' he just wanted to use something new, thus "element," but no matter how I count I cannot get the references to work.
I cannot even really understand the term 'column' seeing as the numbers are not vertically above each other.
Can you please explain how he gets pascal(1,3) == 3?

You're thinking about columns a bit wrong. By "xth column," he means the "xth entry in a given row."
So, if you are looking at the function pascal(c,r), you would want to figure out what the cth number is in the rth row.
So, for example:
pascal(1,2) corresponds to the second entry in the 3rd row
1
1 1
1 *2* 1
pascal(1,3) wants you to look at the second entry in the 4th row.
1
1 1
1 2 1
1 *3* 3 1
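To make the indexing concrete, here is a minimal Python sketch that stores the triangle from the exercise as literal rows and looks up an entry by column and row, both counted from 0. The names TRIANGLE and entry are made up for illustration. It only shows how the indices are read; it is not the recursive solution the exercise asks for.

```python
# The triangle from the exercise, stored as one list per row (row 0 first).
TRIANGLE = [
    [1],
    [1, 1],
    [1, 2, 1],
    [1, 3, 3, 1],
    [1, 4, 6, 4, 1],
]


def entry(c, r):
    """Return the c-th entry (from the left, 0-based) of the r-th row (0-based)."""
    return TRIANGLE[r][c]


print(entry(0, 2))  # leftmost number of the row "1 2 1"
print(entry(1, 2))  # second number of the row "1 2 1"
print(entry(1, 3))  # second number of the row "1 3 3 1"
```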

Just count from the left. (0,2) is the leftmost number in the row
1 2 1
so (1,3) would be the second number in
1 3 3 1

You can simply make the triangle "rectangle", and everything will become apparent:
cols-> 0 1 2 3 4
row-0 1
row-1 1 1
row-2 1 2 1
row-3 1 3 3 1
row-4 1 4 6 4 1
And you were right that the triangle's "elements" are its numbers; there is a subtle difference between the two terms, but it is insubstantial in this case.
P.S. I would personally advise using the course forum for such questions:
It will avoid controversial issues with the honor code.
Your fellow students will understand the problem at hand more quickly.
They will have access to material which is not available to those not taking the course.
It will help build a sense of community among the course students, and give you all a chance to form new, possibly fruitful, relationships.

What you're asking is against the Coursera Honor Code: https://www.coursera.org/maestro/auth/normal/tos.php#honorcode
http://www.aiqus.com/questions/41299/coursera-cheating-scala-course

I loved solving this exercise.
My thought process was the following:
Understanding that the problem is a literal description of the binomial coefficient. https://en.wikipedia.org/wiki/Binomial_coefficient
Understanding that the ask is a literal plug into the formula row! / (col! * (row - col)!), and the formula is right there on the wiki page
The only thing missing then is implementing a tail-recursive factorial function
Bonus: if you use an extension method like this
extension (int: Int) {
def ! = factorialTailRec(int)
}
// you get to write
(r.!) / ((c.!) * ((r - c).!))
You get to write almost the identical mathematical formula. And at that moment I realised the similarities between doing maths and programming. And I cried a little with the beauty of it.
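As a rough cross-check of the formula, here is a Python sketch using an accumulator-style factorial. The function names are illustrative, and Python does not optimise tail calls, so the accumulator only mirrors the shape of a tail-recursive Scala version. One caveat worth keeping in mind: Python integers are unbounded, whereas a Scala Int overflows from 13! onward, so the factorial approach is only safe for small rows unless a wider type is used.

```python
def factorial(n, acc=1):
    """Accumulator-style factorial (mirrors a tail-recursive definition)."""
    return acc if n <= 1 else factorial(n - 1, acc * n)


def pascal(c, r):
    """Entry at column c, row r (both 0-based) via r! / (c! * (r - c)!)."""
    return factorial(r) // (factorial(c) * factorial(r - c))


print([pascal(c, 4) for c in range(5)])   # row 4 of the triangle
print(factorial(13) > 2**31 - 1)          # 13! no longer fits in a 32-bit Int
```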

Related

How does the scan operator help to identify indices where a rule holds for 2 observations in a row?

A boolean vector has been created based on some rule and we need to identify the indices where the rule holds for 2 observations in a row. The following code does that
indices:0101001101b
runs:{0 x\x}"f"$;
where 2=runs indices
Could you please help me understand how the scan operator is used in the definition of the runs function? Appreciate your help.
It's using this special shorthand commonly used in calculating exponential moving averages: https://code.kx.com/q/ref/accumulators/#alternative-syntax
So {0 x\x} is equivalent to:
q){z+x*y}\[0;indices;indices]
0 1 0 1 0 0 1 2 0 1
What this is doing is essentially using the booleans as an on/off switch (via the boolean multiplication) for the rolling sum. It adds (z+) until it hits a false (0) boolean, in which case the rolling sum resets back to zero.
In English: nextValue + [currentValue (starting at 0) * nextValue]
When nextValue is 1, 1 gets added. When nextValue is 0 the result is zero (resetting the rolling sum).
Something like this can achieve the same thing, though it is no easier to read at a glance (and it uses two scans instead of one):
q){s-maxs not[x]*s:sums x}indices
0 1 0 1 0 0 1 2 0 1i
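For readers who do not know q, the same run-length accumulation can be sketched in Python with itertools.accumulate. This is only an analogy to the scan, not a translation of q semantics, and the name runs is reused here purely for illustration.

```python
from itertools import accumulate


def runs(flags):
    """Length of the current run of 1s at each position (0 where the flag is 0)."""
    # Each step computes nxt + acc * nxt, i.e. z + x*y with y = z = next flag.
    # initial=0 plays the role of the seed; it is dropped from the output.
    scanned = accumulate(flags, lambda acc, nxt: nxt + acc * nxt, initial=0)
    return list(scanned)[1:]


indices = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
lengths = runs(indices)
print(lengths)
# Positions where the rule has held for two observations in a row.
print([i for i, n in enumerate(lengths) if n == 2])
```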
Terry has answered your question about how runs works.
Comparing adjacent items is common. You might prefer to use the prior keyword. Certainly easier to see what it is doing.
q)where (and) prior indices
,7

Extending Rabin-Karp algorithm to hash a 2D matrix

I'm trying to solve a problem here, it asks to find the size of the biggest common subsquare between two matrices.
e.g.
Matrix #1
3 3
1 2 0
1 2 1
1 2 3
Matrix #2
3 3
0 1 2
1 1 2
3 1 2
Answer: 2
Biggest common subsquare is:
1 2
1 2
I know that the Rabin-Karp algorithm can be extended to work on a 2D matrix, but I can't understand exactly how to do that. I tried to understand the author's code in the editorial, but it's too complicated. I also searched for a good explanation, but I couldn't find a clear one.
Can anyone simply explain how I can use the Rabin-Karp algorithm to hash a matrix? I know I will hash rows and columns, but I can't see how to mix their hashes together to come up with a hashed matrix, or how the rolling hash function will be handled in this case.
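One common way to combine the two directions, sketched here in Python under stated assumptions, is to hash in two passes: first roll a hash over every width-k window of each row, then treat each column of those row hashes as a sequence and roll a second hash over every height-k window. Each final value then fingerprints one k x k subsquare. The modulus, the two bases, and all function names below are arbitrary choices for illustration, not taken from the editorial. The sketch also treats equal hashes as equal squares without verifying the cells, and it binary-searches on k, which works because a common k x k square always contains a common (k-1) x (k-1) square.

```python
MOD = (1 << 61) - 1   # large prime modulus (assumed choice)
BASE_ROW = 131        # assumed base for hashing along a row
BASE_COL = 137        # assumed base for hashing down a column


def window_hashes(seq, k, base):
    """Rolling polynomial hash of every length-k window of seq."""
    if k > len(seq):
        return []
    top = pow(base, k - 1, MOD)      # weight of the element leaving the window
    h = 0
    for x in seq[:k]:
        h = (h * base + x) % MOD
    out = [h]
    for i in range(k, len(seq)):
        # Drop seq[i - k], shift the window, append seq[i].
        h = ((h - seq[i - k] * top) * base + seq[i]) % MOD
        out.append(h)
    return out


def square_hashes(matrix, k):
    """Set of hashes of all k x k subsquares of matrix."""
    # Pass 1: hash each horizontal window of width k in every row.
    row_h = [window_hashes(row, k, BASE_ROW) for row in matrix]
    if not row_h or not row_h[0]:
        return set()
    result = set()
    # Pass 2: roll a second hash down each column of row hashes.
    for j in range(len(row_h[0])):
        column = [row_h[i][j] for i in range(len(row_h))]
        result.update(window_hashes(column, k, BASE_COL))
    return result


def largest_common_subsquare(a, b):
    """Largest k such that a and b share a k x k subsquare (by hash)."""
    lo, hi = 0, min(len(a), len(a[0]), len(b), len(b[0]))
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if square_hashes(a, mid) & square_hashes(b, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


m1 = [[1, 2, 0], [1, 2, 1], [1, 2, 3]]
m2 = [[0, 1, 2], [1, 1, 2], [3, 1, 2]]
print(largest_common_subsquare(m1, m2))
```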

Dot Product: * Command vs. Loop gives different results

I have two vectors in Matlab, z and beta. Vector z is a 1x17:
1 0.430742139435890 0.257372971229541 0.0965909090909091 0.694329541928697 0 0.394960106863064 0 0.100000000000000 1 0.264704325268675 0.387774594078319 0.269207605609567 0.472226643323253 0.750000000000000 0.513121013402805 0.697062571025173
... and beta is a 17x1:
6.55269487769363e+26
0
0
-56.3867588816768
-2.21310778926413
0
57.0726052009847
0
3.47223691057151e+27
-1.00249317882651e+27
3.38202232046686
1.16425987969027
0.229504956512063
-0.314243264212449
-0.257394312588330
0.498644243389556
-0.852510642195370
I'm dealing with some singularity issues, and I noticed that if I want to compute the dot product of z*beta, I potentially get 2 different solutions. If I use the * command, z*beta = 18.5045. If I write a loop to compute the dot product (below), I get a solution of 0.7287.
summation=0;
for i=1:17
addition=z(1,i)*beta(i);
summation=summation+addition;
end
Any idea what's going on here?
Here's a link to the data: https://dl.dropboxusercontent.com/u/16594701/data.zip
The problem here is that addition of floating point numbers is not associative. When summing a sequence of numbers of comparable magnitude, this is not usually a problem. However, in your sequence, most numbers are around 1 or 10, while several entries have magnitude 10^26 or 10^27. Numerical problems are almost unavoidable in this situation.
The Wikipedia page http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems shows a worked example where (a + b) + c is not equal to a + (b + c), demonstrating that the order in which you add up floating point numbers does matter.
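The effect is easy to reproduce with magnitudes like those in beta. The Python sketch below uses made-up values (1e27 and 1.0), not the poster's data. It shows the same three terms giving different sums depending on grouping, with math.fsum as a correctly rounded reference. Whether the * operator and the loop actually differ in summation order here is an assumption; the source does not confirm it.

```python
import math

big = 1e27

# The spacing between adjacent doubles near 1e27 is far larger than 1,
# so adding 1.0 to it changes nothing.
absorbed = (big + 1.0) - big
regrouped = (big - big) + 1.0
print(absorbed, regrouped)

terms = [big, 1.0, -big]
print(sum(terms))        # plain left-to-right accumulation
print(math.fsum(terms))  # tracks the lost low-order bits
```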
I would guess that this is a homework assignment designed to illustrate these exact issues. If not, I'd ask what the data represents to suss out the appropriate approach. It would probably be much more productive to find out why such large numbers are being produced in the first place than trying to make sense of the dot product that includes them.

matlab percentage change between cells

I'm a newbie to Matlab and just stumped how to do a simple task that can be easily performed in excel. I'm simply trying to get the percent change between cells in a matrix. I would like to create a for loop for this task. The data is setup in the following format:
DAY1 DAY2 DAY3...DAY 100
SUBJECT RESULTS
I could only perform getting the percent change between two data points. How would I conduct it if across multiple days and multiple subjects? And please provide explanation
Thanks a bunch
For example, for day 1: subject1 (result=1), subject2 (result=4), subject3 (result=5); day 2: subject1 (result=2), subject2 (result=8), subject3 (result=10); day 3: subject1 (result=1), subject2 (result=4), subject3 (result=5).
I want the percent change, so the output will be day 2: subject1 (result=100%), subject2 (result=100%), subject3 (result=100%); day 3: subject1 (result=50%), subject2 (result=50%), subject3 (result=50%).
updated:
Hi thanks for responding guys. sorry for the confusion. zebediah49 is pretty close to what I'm looking for. My data is for example a 10 x 10 double. I merely wanted to get the percentage change from column to column. For example, if I want the percentage change from rows 1 through 10 on all columns (from columns 2:10). I would like the code to function for any matrix dimension (e.g., 1000 x 1000 double) zebediah49 could you explain the code you posted? thanks
updated2:
zebediah49,
(data(1:end,100)- data(1:end,99))./data(1:end,99)
output=[data(:,2:end)-data(:,1:end-1)]./data(:,1:end-1)*100;
Observing the code above, How would I go about modifying it so that column 100 is used as the index against all of the other columns(1-99)? If I change the code to the following:
(data(1:end,100)- data(1:end,:))./data(1:end,:)
matlab is unable because of exceeding matrix dimensions. How would I go about implementing that?
UPDATE 3
zebediah49,
Worked perfectly!!! Originally I created a new variable for the index and repmat the index to match the matrices which was not a good idea. It took forever to replicate when dealing with large numbers.
Thanks for your contribution once again.
Thanks Chris for your contribution too!!! I was looking more on how to address and manipulate arrays within a matrix.
It's matlab; you don't actually want a loop.
output=input(2:end,:)./input(1:end-1,:)*100;
will probably do roughly what you want. Since you didn't give anything about your matlab structure, you may have to change index order, etc. in order to make it work.
If it's not obvious, that line defines output as a matrix consisting of the input matrix, divided element by element by the same matrix shifted by one position along the first dimension. The ./ operator is important, because it means that you will divide each element by its corresponding one, as opposed to doing matrix division.
EDIT: further explanation was requested:
I assumed you wanted % change of the form 1->1->2->3->1 to be 100%, 200%, 150%, 33%.
The other form can be obtained by subtracting 100%.
input(2:end,:) will grab a sub-matrix, where the first row is cut off. (I put the time along the first dimension... if you want it the other way it would be input(:,2:end).)
Matlab is 1-indexed, and lets you use the special value end to refer to the last element.
Thus, end-1 is the second-last.
The point here is that element (i) of this matrix is element (i+1) of the original.
input(1:end-1,:), like the above, will also grab a sub-matrix, except that it's missing the last row.
I then divide element (i+1) of the original by element (i). Because of how I picked out the sub-matrices, they now line up.
As a semi-graphical demonstration, using my above numbers:
input: [1 1 2 3 1]
input(2:end): [1 2 3 1]
input(1:end-1): [1 1 2 3]
When I do the division, it's first/first, second/second, etc.
input(2:end,:)./input(1:end-1,:):
[1 2 3 1 ]
./ [1 1 2 3 ]
---------------------
== [1.0 2.0 1.5 0.3]
The extra index set to (:) means that it will do that procedure across all of the other dimension.
EDIT2: Revised question: How do I exclude a row, and keep it as an index.
You say you tried something to the effect of (data(1:end,100)- data(1:end,:))./data(1:end,:). Matlab will not like this, because the element-by-element operators need them to be the same size. If you wanted it to only work on the 100th column, setting the second index to be 100 instead of : would do that.
I would, instead, suggest setting the first row to be the index, and the rest to be data.
Thus, the data is processed by cutting off the first row:
output=(data(2:end,2:end)-data(2:end,1:end-1))./data(2:end,1:end-1)*100;
Note that (:) on its own is shorthand for (1:end), but Matlab does not accept an open-ended range such as 2:, so the end has to be written out as 2:end.
However, you will probably still want the indices back, in which case you will need to append that subarray back:
output=[data(1,1:end-1); (data(2:end,2:end)-data(2:end,1:end-1))./data(2:end,1:end-1)*100];
This is probably not how you should be doing it though-- keep data in one matrix, and time or whatever else in a separate array. That makes it much easier to do stuff like this to data, without having to worry about excluding time. It's especially nice when graphing.
Oh, and one more thing:
(data(:,2:end)-data(:,1:end-1))./data(:,1:end-1)*100;
is identically equivalent to
data(:,2:end)./data(:,1:end-1)*100-100;
Assuming zebediah49 guessed right in the comment above and you want
1 4 5
2 8 10
1 4 5
to turn into
1 1 1
-.5 -.5 -.5
then try this:
data = [1,4,5; 2,8,10; 1,4,5];
changes_absolute = diff(data);
changes_absolute./data(1:end-1,:)
ans =
1.0000 1.0000 1.0000
-0.5000 -0.5000 -0.5000
You don't need the intermediate variable; you can directly write diff(data)./data(1:end-1,:). I just thought the above might be easier to read. Getting from that result to percentage numbers is left as an exercise to the reader. :-)
Oh, and if you really want 50%, not -50%, just use abs around the final line.
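For comparison, the same row-to-row percent change can be written without any matrix library. The Python sketch below pairs each row with the one before it, which is the "shifted sub-matrix" idea expressed with zip. The function name is made up, and the data is the 3 x 3 example from the question, with days along the first dimension.

```python
def percent_change(rows):
    """Percent change of each row relative to the previous row, per column."""
    return [
        [(cur - prev) / prev * 100 for prev, cur in zip(prev_row, cur_row)]
        # zip(rows, rows[1:]) lines up row i with row i+1, like
        # data(1:end-1,:) against data(2:end,:) in the MATLAB answers.
        for prev_row, cur_row in zip(rows, rows[1:])
    ]


data = [[1, 4, 5], [2, 8, 10], [1, 4, 5]]
for row in percent_change(data):
    print(row)
```

The result has one row fewer than the input, because the first day has nothing to be compared against.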

Why does crossvalind fail?

I am using the crossvalind function on a very small data set. However, I observe that it gives me incorrect results. Is this supposed to happen?
I have Matlab R2012a and here is my output
crossvalind('KFold',1:1:11,5)
ans =
2
5
1
3
2
1
5
3
5
1
5
Notice the absence of set 4. Is this a bug? I expected at least 2 elements per set, but it gives me 0 in one. This happens a lot; that is, the values are not uniformly distributed across the sets.
The help for crossvalind says that the form you are using is: crossvalind(METHOD, GROUP, ...). In this case, GROUP is, e.g., the class labels of your data. So 1:11 as the second argument is confusing here, because it suggests no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.
I tried doing:
numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))
and it reliably gave 5 as a result, which is what I would expect; my example would correspond to a two-class problem. I would guess that, as a general rule, you'd want something like numel(unique(group)) <= numel(group) / folds. My hypothesis is that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other, with a difference between fold sizes of no more than 1, but I haven't looked at the code to verify this.
It is possible that you mean to do:
crossvalind('KFold', 11, 5);
which would compute 5 folds for 11 data points - this doesn't attempt to do anything clever with labels, so you would be sure that there will be K folds.
However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross validation, which you could do with:
crossvalind('LeaveMOut', 11, 1);
although a better method would be:
for leave_out=1:11
fold_number = (1:11) ~= leave_out;
% code here; where fold_number is 0, this is the leave-one-out example.
% fold_number = 1 means that the example is in the main fold.
end
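To illustrate what an unstratified split looks like, here is a small Python sketch that assigns N observations to K folds with sizes differing by at most one, plus the leave-one-out masks from the loop above. It is not how crossvalind is implemented. The function names and the fixed seed are assumptions made for the example.

```python
import random


def kfold_indices(n, k, seed=0):
    """Assign n observations to folds 1..k; fold sizes differ by at most one."""
    folds = [(i % k) + 1 for i in range(n)]   # 1, 2, ..., k, 1, 2, ...
    random.Random(seed).shuffle(folds)        # randomise which item gets which fold
    return folds


def leave_one_out_masks(n):
    """Yield one mask per observation; False marks the held-out item."""
    for leave_out in range(n):
        yield [i != leave_out for i in range(n)]


folds = kfold_indices(11, 5)
print(folds)
print(sorted(folds.count(f) for f in range(1, 6)))   # sizes of the five folds
```

Because the labels are dealt out round-robin before shuffling, every fold is guaranteed to be non-empty whenever n >= k.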