Create ordinal array with multiple groups - matlab

I need to categorize a dataset according to different age groups. The categorization depends on whether the Sex is Male or Female. I first subset the data by gender and then use the ordinal function (dataset is from a Matlab example). The following code crashes on the last line when I try to vertically concatenate the subsets:
load hospital;
subset_m=hospital(hospital.Sex=='Male',:);
subset_f=hospital(hospital.Sex=='Female',:);
edges_f=[0 20 max(subset_f.Age)];
edges_m=[0 30 max(subset_m.Age)];
labels_m = {'0-19','20+'};
labels_f = {'0-29','30+'};
subset_m.AgeGroup= ordinal(subset_m.Age,labels_m,[],edges_m);
subset_f.AgeGroup = ordinal(subset_f.Age,labels_f,[],edges_f);
vertcat(subset_m,subset_f);
Error using dataset/vertcat (line 76)
Could not concatenate the dataset variable 'AgeGroup' using VERTCAT.
Caused by:
Error using ordinal/vertcat (line 36)
Ordinal levels and their ordering must be identical.

Edit
It seems that a vital part was missing in the question, here is the answer to the corrected question. You need to use join rather than vertcat, for example:
joinFull = join(subset_f,subset_m,'LeftKeys','LastName','RightKeys','LastName','type','rightouter','mergekeys',true)
Solution of original problem
It seems like you are actually trying to work with the wrong variable. If I change all instances of hospitalCopy into hospital then everything works fine for me.
Perhaps you copied hospital and edited it, thus losing the validity of the input.
If you really need to have hospitalCopy make sure to assign to it directly after load hospital.
If this does not help, try using clear all before running the code and make sure there is no file called 'hospital' in your current directory.

Related

Cimplicity Screen - one object/button that is dependent on hundreds of points

So I have created a huge screen that essentially just shows the robot status for every robot in this factory (individually)… At the very end of the project, they decided they want one object on the screen that blinks if any of the 300 robots fault. I am trying to think of a way to make this work. Maybe a global script of some kind? Problem is, I do not do much scripting in Cimplicity, so any help is appreciated.
All the points that are currently used on this screen (to indicate a fault) have very similar names… as in, the beginning is the same… so I was thinking of a script that could maybe recognize if a bit is high based on PART of it's string name characteristic. The end will change a little each time, but I am sure there is a way to only look for part of a string and negate the rest. If the end has to be hard coded, that's fine.
You can use a Python script in Cimplicity.
I will not go into detail on the use of python in Cimplicity, which is well described in the documentation indicated above.
Here's an example of what can be done... note that I don't have a way to test it and, of course, this will work if the name of your robots in the declaration follows the format Robot_1, Robot_2, Robot_3 ... Robot_10 ... Robot_300 and it also depends on the Name and the Type of the fault variable... as you didn't define it, I imagine it can be an integer, with ZERO indicating no error. But if you use something other than that, you can easily change it.
import cimplicity
(...)
OneRobotWithFault = False
# Here you get the values and check for fault
for i in range(0, 300):
pointName = f'MyFactory.Robot_{i}.FaultCode'
robotFaultCode = cimplicity.point_get(pointName)
if robotFaultCode > 0:
OneRobotWithFault = True
break
# Set the status to the variable "WeHaveRobotWithFault"
cimplicity.point_set("WeHaveRobotWithFault", OneRobotWithFault)

Creating a structure within a structure with a dynamic name

I have large data sets which i want to work with in matlab.
I have a struct called Trail containing serveral structures called trail1, trail2 ...
which then contain several matrices. I now want to add another point to for instance trail1
I can do that with Trail.trail1.a2rotated(i,:) = rotpoint'; the problem is that i have to do it in a loop where the trail number as well as the a2rotated changes to e.g. a3rot...
I tired to do it like that
name ="trail"+num2str(z)+".a2rotated"+"("+i+",:)";
name = convertStringsToChars(name);
Trail.(name) = rotpoint'
But that gives me the error: Invalid field name: 'trail1.a2rotated(1,:)'.
Does someone have a solution?
The name in between brackets after the dot must be the name of a field of the struct. Other indexing operations must be done separately:
Trail.("trail"+z).a2rotated(i,:)
But you might be better off making trail(z) an array instead of separate fields with a number in the name.

How to create a variable in Matlab where certain subjects are coded as 1?

I want to create a variable called 'flag_artifact' where certain subjects from my dataset (for whom I know have bad quality images) are coded as e.g., 1. My dataset is stored in a table T with a certain number of rows and 'subject' is the 1st column in the table.
I managed to do it by creating a for loop. Surely there is a more efficient way to do this by perhaps directly creating the variable and using less lines of code? Could anyone give some advice?
Thank you very much! Here is what I have:
flag_artifact = {'T_300'}; %flagging subject number 300 for example
for i = 1:size(T,1)
if isequal(table2cell(T(i, 1)), flag_artifact)
T(i, 1) = {'1'};
end
end
However, when creating the variable flag_artifact = {'T_300'}, I would like it to include more than one subject. I tried using flag_artifact = {'T_300'}; {'T_301'}, as well as flag_artifact = {'T_300', 'T_301'} but it doesn't work because these subject identifiers do not get replaced with 1s.

How to set a name for the cell

I have a cell array. when I want create a cell array that its name is : '0691008752' in this case an error :"Invalid field name"
cellUsers.('0691008752') = ....
I know the reason for this error is that a number is called. But I do not know how I can set this name for the cell.
I agree with the comments above, prepending the field with a letter is the best solution for this problem..
One way to make this consistent is to use:
fname = matlab.lang.makeValidName('0691008752')
Its not widely known but you can have fields which begin with number - its bad practice and will almost certainly leads to bugs....
So how to do it, 1st you need to use mex, if you see the mathworks mex example and modify the appropriate line:
memcpy(fieldnames[0],"Doublestuff",sizeof("Doublestuff"));
to:
memcpy(fieldnames[0],"01234",sizeof("01234"));
After compiling and running you get:
Note: You can only access it through dynamic fields names. To update the field you must use mex.

dataFrame keying using pandas groupby method

I new to pandas and trying to learn how to work with it. Im having a problem when trying to use an example I saw in one of wes videos and notebooks on my data. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I loading it to a data frame and the group it by "filePath" and "vp", the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now Im trying to approach the index like a dict, as i saw in examples, but when im doing
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed to get a result when im putting the filepath, but as i understand and saw in previouse examples i should be able to use the vp keys as well, isnt is so?
Sorry if its a trivial one, i just cant understand why it is working in one example but not in the other.
Rutger you are not correct. It is possible to "partial" index a multiIndex series. I simply did it the wrong way.
The index first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name i have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav]
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You groupby two columns and therefore get a MultiIndex in return. This means you also have to slice using those to columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it in a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexibile, you could test for multiple values at once for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]