Assign a value to a single SFrame element - sframe

I want to assign a value to a single element (i.e. a single row and column) in an SFrame.
I am using a Python notebook and importing graphlab.
I created an SFrame with dimensions 16364 rows x 37 columns.
The column 'test' contains zeros.
I have used the following syntax to set the value:
sf[1]['test'] = 3;
If I then type:
sf[1]['test']
then I see the correct value, i.e. "3".
But if I type:
sf
then I just see values of zero for all rows of column 'test'
The same happens with sf.head(), sf['test'] and sf['test'].head().
I don't understand why one syntax shows the value of "3" where an alternative one does not. Is the value in sf[1]['test'] 3 or 0 ?

SFrames are immutable, so they don't support item assignment. You see a difference because
sf[1]['test']
isn't actually referring to the SFrame at all. "sf[1]" returns a dictionary whose keys are the SFrame's column names and whose values are the second row of the SFrame. When you assign a number to "sf[1]['test']", you change the value of the "test" key in that returned dictionary, so the SFrame "sf" is not involved in the assignment. The correct way to reference only the second value of the column "test" and assign it the value 3 would be this:
sf['test'][1] = 3
but that raises this error, because the SArray that holds the column is immutable too:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-c52dab41d5dd> in <module>()
----> 1 sf['test'][1] = 3
TypeError: 'SArray' object does not support item assignment
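To see why the two syntaxes disagree, here is a toy Python model of the behaviour described above. ToyFrame is purely hypothetical and is not how graphlab implements SFrame; it only reproduces the two properties the answer relies on: indexing a row returns a fresh dict, and a column cannot be modified element by element. The last lines show the usual workaround of building a complete new column and assigning it as a whole. As far as I know, SFrame accepts whole-column assignment of the form sf['test'] = <new SArray>, but check the graphlab documentation for the best way to build that SArray.

```python
# Toy model (hypothetical, NOT graphlab's implementation) of an immutable,
# column-oriented frame whose integer indexing returns a fresh dict per row.
class ToyFrame:
    def __init__(self, columns):
        # Store each column as a tuple so the stored data cannot be mutated in place.
        self._cols = {name: tuple(values) for name, values in columns.items()}

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access builds a brand-new dict on every call.
            return {name: col[key] for name, col in self._cols.items()}
        # Column access returns the stored (immutable) tuple.
        return self._cols[key]

    def __setitem__(self, name, values):
        # Replacing a whole column is allowed; single-element assignment is not.
        self._cols[name] = tuple(values)


sf = ToyFrame({"test": [0, 0, 0]})

row = sf[1]
row["test"] = 3          # mutates only the temporary dict
print(row["test"])       # 3
print(sf["test"])        # (0, 0, 0): the frame itself is unchanged

# sf["test"][1] = 3 would raise TypeError, since tuples reject item assignment.

# Workaround: build a new column and assign it as a whole.
sf["test"] = [3 if i == 1 else v for i, v in enumerate(sf["test"])]
print(sf["test"])        # (0, 3, 0)
```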


Scala - Not enough arguments for method count

I am fairly new to Scala and Spark RDD programming. The dataset I am working with is a CSV file containing a list of movies (one row per movie) and their associated user ratings (a comma-delimited list of ratings). Each column in the CSV represents a distinct user and the rating he/she gave the movie. Thus, user 1's ratings for each movie are in the 2nd column from the left:
Sample Input:
Spiderman,1,2,,3,3
Dr.Sleep, 4,4,,,1
I am getting the following error:
Task4.scala:18: error: not enough arguments for method count: (p: ((Int, Int)) => Boolean)Int.
Unspecified value parameter p.
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count()
when I execute the lines below. In the program below, the second statement splits each row on "," and produces this:
( Spiderman, [[1,0],[2,1],[-1,2],[3,3],[3,4]] )
( Dr.Sleep, [[4,0],[4,1],[-1,2],[-1,3],[1,4]] )
On the third statement, calling count() throws an error. For each pair of movies (rows), I am trying to get the number of common elements. In the example above, [-1, 2] is clearly a common element shared by both Spiderman and Dr.Sleep.
val textFile = sc.textFile(args(0))
var movieRatings = textFile.map(line => line.split(","))
.map(movingRatingList => (movingRatingList(0), movingRatingList.drop(1)
.map(ranking => if (ranking.isEmpty) -1 else ranking.toInt).zipWithIndex));
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count() )).saveAsTextFile(args(1));
My target output of line 3 is as follows:
( Spiderman, Dr.Sleep, 1 ) --> Between these 2 movies, there is 1 common entry.
Can somebody please advise ?
To get the number of elements in a collection, use length or size. count() returns the number of elements that satisfy a condition, which you must pass to it as a predicate.
Alternatively, you can avoid building the complete intersection by using count to count the elements of the first collection that the second one contains:
movieRating1._2.count(movieRating2._2.contains(_))
The error message is fairly explicit: count takes one argument (the predicate p), but your call passes an empty argument list, i.e. zero arguments. You need to pass one argument to count.
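Here is a local Python analogue of the Scala pipeline, to show what the count-with-predicate version computes and how to get one result per pair of movies. It is a sketch only: parse and common_count are hypothetical helpers, and itertools.combinations stands in for the pairing step. Note that in Spark, reduce expects a function of type (T, T) => T, so it cannot turn two movies into a (movie, movie, count) triple; building the pairs first (for example with RDD.cartesian and then dropping self-pairs) is one option, but treat that as an untested suggestion.

```python
from itertools import combinations


def parse(line):
    # Mirror the Scala pipeline: split on ",", map blank ratings to -1,
    # and pair each rating with its column index, like zipWithIndex.
    # (Python's int() tolerates surrounding spaces such as " 4".)
    name, *ratings = line.split(",")
    values = [-1 if r.strip() == "" else int(r) for r in ratings]
    return name, [(v, i) for i, v in enumerate(values)]


def common_count(a, b):
    # Same idea as movieRating1._2.count(movieRating2._2.contains(_)):
    # count elements of a that also appear in b, without building the intersection.
    return sum(1 for x in a if x in b)


lines = ["Spiderman,1,2,,3,3", "Dr.Sleep, 4,4,,,1"]
movies = [parse(line) for line in lines]

# One (movie, movie, count) triple per unordered pair of movies.
results = [(n1, n2, common_count(r1, r2))
           for (n1, r1), (n2, r2) in combinations(movies, 2)]
print(results)  # [('Spiderman', 'Dr.Sleep', 1)]
```

Because zipWithIndex makes every (rating, index) pair within a movie unique, counting with a predicate gives the same number as the size of the intersection here.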

getting the value from a checkbox in Matlab 2018

I am upgrading my Matlab from 2013b to 2018b and have found out that MathWorks has made quite a few changes to GUIs.
One problem I am having is getting the value of a checkbox. The line below is the code I used before, but it no longer works.
if get(handles.check_perf_attr,'Value') == 1
The error message is:
Undefined operator '==' for input arguments of type 'cell'.
So I tried the line below to just get the value that is being returned and then apply some logic.
tValue = get(handles.check_perf_attr,'Value');
However, tValue is a 2 x 1 cell in which (1, 1) = 0 and (2, 1) = 1. I don't really understand this, as surely a checkbox can only have one value, true (1) or false (0)?
get returns a cell array of values when applied to an array of handles.
Thus, I think your problem is that handles.check_perf_attr contains two handles, not one.
"Dot notation is a new syntax to access object properties starting in R2014b."
So try
if handles.check_perf_attr.Value == 1
or
tValue = handles.check_perf_attr.Value;

How to find all values greater than 0 in a cell array in Matlab

I want to find and save all values greater than 0 in an array and save them in a variable called "times". How do I do that? And what is the difference between saving the indices of those cells versus the actual values of the cells?
This is what I have tried, but it must be wrong because I get the error:
Undefined operator '>' for input arguments of type 'cell'.
clear all, close all
[num,txt,raw] = xlsread('test.xlsx');
times = find(raw(:,5)>0)
To access the contents of a cell you must use {} instead of ():
idx = find([raw{:, 5}] > 0);
But this gives you the index of the cells of raw containing a positive value. If you want the values instead, you can access them and collect them in a numeric array in this way:
times = [raw{idx, 5}];
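The index-versus-value distinction is the same in any language. The Python sketch below uses a hypothetical list in place of column 5 of raw and shows both steps: first the positions of the positive entries, then the entries themselves. Python indices start at 0, while Matlab's find returns 1-based indices. A NaN entry is skipped because NaN > 0 is false, in Python and Matlab alike.

```python
# Hypothetical stand-in for column 5 of raw (values made up for illustration).
column = [0, 2.5, float("nan"), -1, 7, 3]

# Step 1: the indices (0-based) of entries greater than 0, like find(...) in Matlab.
idx = [i for i, v in enumerate(column) if v > 0]

# Step 2: the values stored at those indices, like [raw{idx, 5}] in Matlab.
times = [column[i] for i in idx]

print(idx)    # [1, 4, 5]
print(times)  # [2.5, 7, 3]
```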

Iterating Over Unique Values in Matlab

I've been trying to follow this answer in order to obtain unique strings from a given cell array. However, I'm running into trouble when iterating over these values. I have tried for loops as follows:
[unique_words, ~, occurrences] = unique(words);
unique_counts = hist(occurrences, 1:max(occurrences));
for a=1:numel(unique_words)
word = unique_words{a}
count = unique_counts{a}
result = result + a_struct.(unique_words{a}) + unique_counts{a}
end
When trying to reference the items like this, I receive the error:
Cell contents reference from a non-cell array object.
Changing the curly brackets to round brackets for unique_counts yields the error:
Reference to non-existent field 'N1'.
Changing both unique_words and unique_counts to round brackets yields:
Argument to dynamic structure reference must evaluate to a valid field name.
How am I to iterate over the results of unique?
unique_words is a cell array, while unique_counts is a numeric vector. So unique_words should be accessed using curly brackets and unique_counts using round ones. The error you get in that case comes from a_struct (which is not defined in the question) not having the corresponding field; it is not related to the access method.
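The access pattern from the answer (the words used as strings, the counts as plain numbers) looks like this in a Python sketch. The contents of words and a_struct are hypothetical; the membership check plays the role of isfield(a_struct, word) in Matlab, and skipping words that have no matching field is an assumption about what you want rather than something stated in the question.

```python
from collections import Counter

words = ["b", "a", "b", "c", "b"]  # hypothetical input

unique_words = sorted(set(words))  # sorted unique values, as unique() returns them
counts = Counter(words)
unique_counts = [counts[w] for w in unique_words]  # plain numbers, like a numeric vector

# Hypothetical stand-in for a_struct: note it has no field for "c".
a_struct = {"a": 10, "b": 20}

result = 0
for word, count in zip(unique_words, unique_counts):
    if word not in a_struct:  # analogous to isfield(a_struct, word) in Matlab
        continue              # assumption: skip words with no matching field
    result += a_struct[word] + count

print(result)  # 34
```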

For Matlab, Import data from xlsx, how to get 1st row as variable name, and rest of the column as data for variable name

In my Excel file, the very first row in each column is a string. The rest of the column is data for that string, i.e.
'time'
1
2
3
4
I want to take the first row in Excel and make it the variable name in Matlab, with the rest of the column as the numerical data for that variable. So in Matlab, time would be a column vector of the numbers 1, 2, 3, 4.
I can't get this to work.
How about:
[val, nms] = xlsread( xlsFileName );
assert( size(val,2) == size(nms,2), 'mismatched number of columns and number of names');
for ci=1:size(val,2)
eval( [ nms{ci}, ' = val(:,ci);' ] ); % name the column
end
What makes this work:
This code calls xlsread with two output arguments. That way, xlsread puts the numeric data into the first output and the text data into the second. See the xlsread documentation for more info.
It uses eval to assign values to a variable (time) whose name is stored in another variable (nms{1}). The argument of the eval command is the string time = val(:,1);, which is the Matlab command that assigns the values of the first column of data (val(:,1)) to a new variable named time.
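An alternative that avoids eval is to keep the columns in a single container keyed by the header names. In Matlab that would be a struct with dynamic field names, e.g. s.(nms{ci}) = val(:,ci). The Python sketch below shows the same idea on a hypothetical sheet exported as CSV text; the column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical sheet content: the first row holds names, the remaining rows hold numbers.
sheet = "time,speed\n1,10\n2,20\n3,30\n4,40\n"

rows = list(csv.reader(io.StringIO(sheet)))
names, data = rows[0], rows[1:]

# One entry per header name, holding that column's numeric data.
columns = {name: [float(row[ci]) for row in data] for ci, name in enumerate(names)}

print(columns["time"])  # [1.0, 2.0, 3.0, 4.0]
```

Keeping the data in one keyed container also makes it easy to loop over all columns later, which is awkward with separately named variables.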