Passing long values into MLlib's Rating() method - scala

I am trying to build a recommender system using Spark's MLlib library (using Scala).
In order to use the ALS train method, I need to build a rating matrix using the Rating() method (which is part of the package org.apache.spark.mllib.recommendation). The method requires an Int to be passed as the user id. However, the dataset I am working with has 11-digit ids, so it throws an error when I try to pass them.
Does anyone know of a way around this, so that I can pass a Long value into the Rating method? Or some way to override this method? Or some way to uniquely convert the 11-digit number to 10 or 9 digits while keeping it an Int?
Any help will be greatly appreciated. Thanks.

This will depend, I think, on the range of your ids. Can you simply take the id modulo Int.MaxValue? That is:
(id % Int.MaxValue).toInt
Or can you just hash it to an Int?
id.hashCode
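To see why the range of the ids matters, here is a small sketch, written in Python rather than Scala purely for illustration. It mirrors the modulo expression above for non-negative ids and shows that two distinct 11-digit ids can map to the same Int. The function name to_int_id and the two sample ids are made up for this example.

```python
INT_MAX = 2**31 - 1  # same value as Scala's Int.MaxValue (2147483647)


def to_int_id(long_id: int) -> int:
    """Mirror of (id % Int.MaxValue).toInt for non-negative ids."""
    return long_id % INT_MAX


# Two hypothetical 11-digit ids that differ by exactly INT_MAX.
a = 10_000_000_000
b = a + INT_MAX

# Both ids reduce to the same value, so they would be treated as one user.
collide = to_int_id(a) == to_int_id(b)
```

If the ids in the dataset never differ by a multiple of Int.MaxValue, the modulo mapping stays unique; otherwise distinct users are silently merged. The same caveat applies to hashing.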

When subclassing "double" with new properties in MATLAB, is there an easy way to access the data value?

Say I have a class subclassing double, and I want to add a string (similar to the 'ExtendDouble' example in the documentation). Is there an easy way to access the actual numeric value without the extra properties, particularly for reassigning? Or, if I want to change the value, will I have to recreate it as a new member of the class with the new value and the same string?
e.g.
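classdef myDouble < double
    properties
        string
    end
    methods
        function obj = myDouble(data, s)
            % Construct object (simplified): hand the numeric data to the
            % double superclass constructor, then store the string
            obj = obj@double(data);
            obj.string = s;
        end
    end
end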
----------
x = myDouble(2,'string')
x =
2 string
x = 3
x =
3 string
Short answer: NO. There is no easy way to access a single member of a class when the class contains more than one member. You'll always have to let MATLAB know which part of the class you want to manipulate.
You have multiple questions in your post but let's tackle the most interesting one first:
% you'd like to instantiate a new object this way (fine)
x = myDouble(2,'string')
x =
2 string
% then you'd like to easily refer to only the numeric part of your class
% for assignment => This can NEVER work in MATLAB.
x = 3
x =
3 string
This can never work in MATLAB because of how the interpreter works. Consider the following statements:
% direct assignment
(1) dummy = 3
% indexed assignments
(2) dummy(1) = 3
(3) dummy{1} = 3
(4) dummy.somefieldname = 3
You would like the simplicity of the first statement for assignment, but this is the one we cannot achieve. Statements 2, 3 and 4 are all possible with some fiddling with subsasgn and subsref.
The main difference between (1) and [2,3,4] is this:
Direct assignment:
In MATLAB, when you execute a direct assignment to a simple variable name (without indexing with () or {} or a field name) like dummy=3, MATLAB does not check the type of dummy beforehand; in fact, it does not even check whether the variable dummy exists at all. With this kind of assignment, MATLAB takes the quickest route: it immediately creates a new variable dummy and assigns it the type and value accordingly. If a variable dummy existed before, too bad for it, that one is lost forever (a lot of MATLAB users have had their fingers bitten once or twice by this behavior, as it is easy to overwrite a variable by mistake and MATLAB will not raise any warning or complaint).
Indexed assignments:
In all the other cases, something different happens. When you execute dummy(1)=3, you are not telling MATLAB "create a new dummy variable with that value"; you are telling MATLAB "find the existing dummy variable, find the subindex I am giving you, then assign the value to that specific subindex". MATLAB will happily go on: if it finds everything, it does the sub-assignment; otherwise it may complain or error about any kind of misassignment (wrong index, type mismatch, index length mismatch...).
To find the subindex, MATLAB will call the subsasgn method of dummy. If dummy is a built-in class, the subsasgn method is also built in and usually hidden under the hood. If dummy is a custom class, you can write your own subsasgn and have full control over how MATLAB treats the assignment. You can check the type of the input and decide to apply it to this field or another one if that is more suitable. You can even do a range check and reject the assignment altogether if it is not suitable, or just assign a default value. You have full control; MATLAB will not force anything on you in your own subsasgn.
The problem is that, to get MATLAB to relinquish control and hand over to your own subsasgn, you have to use an indexed assignment (like 2, 3 or 4 above). You cannot do that with a type (1) assignment.
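The same split between rebinding a name and dispatching an indexed assignment exists in other dynamic languages, so it can be illustrated outside MATLAB. The following is a minimal sketch in Python, offered as an analogy only and not as MATLAB code: the class TaggedNumber is hypothetical, and __setitem__ merely plays the role that subsasgn plays in MATLAB.

```python
class TaggedNumber:
    """Hypothetical stand-in for myDouble: a number plus a string label."""

    def __init__(self, value, label):
        self.value = value
        self.label = label

    def __setitem__(self, index, new_value):
        # Indexed assignment lands here, so the class decides what changes.
        # Only the numeric part is replaced; the label is kept.
        self.value = new_value


x = TaggedNumber(2, "string")

x[0] = 3                     # indexed assignment: calls __setitem__
kept = (x.value, x.label)    # numeric part updated, label preserved

x = 3                        # direct assignment: the name x is rebound
still_tagged = isinstance(x, TaggedNumber)  # the old object is gone
```

Nothing in the class is consulted by the final direct assignment, which is why no amount of method overloading can make it preserve the label.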
Other considerations: You also ask whether you can change the numeric part of the class without creating a new object. The answer to that is no as well. This is because of the way value classes work in MATLAB. There could be a long explanation of what happens under the hood, but the best illustration is the MATLAB example you referenced yourself. If we look at the class definition of ExtendDouble and at its custom subsasgn method, which performs the change of numeric value, what happens there is:
obj = ExtendDouble(b,obj.DataString);
So even MathWorks, to change the numeric value of their extended double class, has to create a brand new object (with the new numeric value b, and transferring the old string value obj.DataString).

kdb/q: apply the function, pass the return value to the function again, multiple rounds

I have a list of symbols, say `A`B`C. I have a table tab0 and a function f that takes a table plus a symbol as arguments.
tab1: f[tab0;`A]
tab2: f[tab1;`B]
tab3: f[tab2;`C]
I only care about the final value. But my list of symbols can be long and of variable length, so I don't want to hardcode the above. How do I achieve this?
I think it has something to do with https://code.kx.com/q/ref/accumulators/ but I really struggle to figure out the syntax.
This is exactly the use case for the binary application of over (/) (https://code.kx.com/q/ref/accumulators/#binary-application)
So you should use:
f/[tab0;`A`B`C]
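For readers more used to a left fold, the binary application of over behaves like a reduce seeded with tab0. Here is a sketch in Python rather than q; the function f below is a hypothetical stand-in that simply records each symbol it receives, and the "table" is just a list.

```python
from functools import reduce


def f(tab, sym):
    # Hypothetical stand-in for the real table function:
    # a "table" here is a list that records each symbol applied to it.
    return tab + [sym]


tab0 = []
symbols = ["A", "B", "C"]

# Same shape as f(f(f(tab0, "A"), "B"), "C"), i.e. f/[tab0;`A`B`C] in q.
final = reduce(f, symbols, tab0)
```

Only the last accumulated value is returned, which matches the requirement of caring about the final table only.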

`[~,ui] = Unique(Day)` what is this doing?

When looking at MATLAB code I stumbled upon the following line:
[~,ui] = Unique(Day)
(Where Day is the vector containing a numeric value of day like so: 1,2,3, etc.)
What is it doing? I have noticed that it creates some kind of unique identifiers for the numeric value of the day (i.e. for 1 to 31) as well as a variable called Volume. What is Volume?
[~,ui] = Unique(Day) evaluates the function Unique with the input argument Day.
This function has two outputs, and if you wanted to use both, you would write
[a,b]=Unique(Day). However, if you need only the second output, you can put ~ in place of the first output, so that the first output is not saved.
It is impossible to say what Volume means, because you didn't provide the code of the function Unique.

Reflectively look up enum value by String in Swift 2

I'm writing an XML-based descriptor for UIKit and am wondering if there's any slim possibility at all of taking a string like "UIStackViewAlignmentCenter" or "UIStackViewAlignment.Center" and converting it into the appropriate constant value.
I'm really expecting this is impossible, but wanted to ask just in case.
My fallback plan is to create a helper class that allows me to register strings like "UIStackViewAlignmentCenter" and map them to values, but this is going to be painstaking adding all of the possible constants. :(

Function returning 2 types based on input in Perl. Is this a good approach?

I have designed a function which can return 2 different types based on the input parameters.
ex: &Foo(12,"count") -> returns record count from DB for value 12
&Foo(12,"details") -> returns resultset from DB for value 12 in hash format
My question: is this a good approach? In C# I can do it with function overloading.
Please think what part of your code gets easier by saying
Foo(12, "count")
instead of
Foo_count(12)
The only case I can think of is when the function name ("count") itself is input data. And even then you probably want to perform some validation on it, maybe by means of a function table lookup.
Unless this is for an intermediate layer that just takes a command name and passes it on, I'd go with two separate functions.
Also, the implementation of the Foo function would look at the command name and then just branch into a private function for each command anyway, right?
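The function table lookup mentioned above could look roughly like the following. This is a sketch in Python rather than Perl, and foo_count, foo_details and the FAKE_DB data are hypothetical placeholders for the real database calls.

```python
# Hypothetical in-memory stand-in for the database.
FAKE_DB = {12: [{"name": "a"}, {"name": "b"}]}


def foo_count(key):
    # Returns the number of records stored for this key.
    return len(FAKE_DB.get(key, []))


def foo_details(key):
    # Returns a copy of the records stored for this key.
    return list(FAKE_DB.get(key, []))


# The function table: command name -> implementation.
COMMANDS = {"count": foo_count, "details": foo_details}


def foo(key, command):
    # Validate the command name by looking it up before dispatching.
    try:
        handler = COMMANDS[command]
    except KeyError:
        raise ValueError(f"unknown command: {command!r}") from None
    return handler(key)
```

Note that the per-command functions still exist and can be called directly, which is the point made above: the string-based entry point only pays off when the command name really arrives as data.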
Additionally, you might consider using wantarray to make Foo return the details as well when it is called in list context:
return wantarray ? ($num, @details) : $num;