Matlab split text column in a table

Matlab split text column in a table - matlab

I have a table object in MatLab with a text column. This text column is a "tag" and contains underscores two split the tag.
I'd like to create a column with the second element of the tag. I used strsplit but It didn't work. Also I tried regexp but it gives me a cell object with 126 cells objects inside, and I don't know how to extract the second element of every cell.
Any suggestion?
Example:
a = {'a_b'; 'a_c';'a_n';'a_t'}
t = table(a)
I just want a vector with the second element.
Thanks.

How about
t=[t rowfun(#(x) x{1}(3),t)]
with 1 being the column and 3 being the element you want. For undefined length of the string parts it gets a little bit more tricky
t=[t rowfun(#(X) X{1}(strfind(X{1},'_')+1:end),t,'OutputFormat','cell')];
strfind() gets the '_' element so (find+1:end) is the rest of the string. as they can be of different length everything has to a cell as Output and then be added to the table. if the column changes you have to adopt the code in both {1}

Related

MATLAB Column headers not valid variable name

I am working in MATLAB and trying to add units to the Column headers to a table of values then I will insert into SQLite Database but I have a column names of German characters (e.g 'ß', 'ä'), but this is invalid because of the special characters. According to everything I've found thus far has said that column headers must be valid variable names, e.g. alphanumeric and "_" only.
But I can not change my original database column names so does anyone know of a work-around it?
My code of building a table and sending into database is:
insertData = cell2table(full_matrix,'VariableNames',colnames);
insert(conn,tableName,colnames,insertData);
And some of my column names:
'maß','kapazität', 'räder'
Thank you very much for helping.

Do you have to create the table first? I would try just passing the cell array of data directly to insert like so:
insert(conn, tableName, colnames, full_matrix);
The above assumes that it is the cell2table call that is giving you an error related to the special characters. If it's the insert call, then I guess MATLAB won't let you create databases with column names that don't conform to its variable naming conventions. If that's the case, you'll have to convert the column names to something valid, which you can do with either genvarname (for older MATLAB versions) or matlab.lang.makeValidName (suggested for versions R2014a and newer):
colNames = {'maß','kapazität', 'räder'};
validNames = genvarname(colNames);
% or...
validNames = matlab.lang.makeValidName(colNames, 'ReplacementStyle', 'hex');
validNames =
1×3 cell array
'ma0xDF' 'kapazit0xE4t' 'r0xE4der'
Note that the above solutions replace the invalid characters with their hex equivalents. You could also change the 'ReplacementStyle' to replace them with underscores or delete them altogether. I would go with the hex values because it gives you the option of converting the column names back to their original string values if you need those for anything later. Here's how you could do that using regexprep, hex2dec, and char:
originalNames = regexprep(validNames, '0x([\dA-F]{2})', '${char(hex2dec($1))}');
originalNames =
1×3 cell array
'maß' 'kapazität' 'räder'

Can table variable names start with a number character?

I am running something like this:
table = readtable(path,'ReadVariableNames',true);
In the .csv file all of the cells of the first row contain string identifiers like '99BM', '105CL', etc. Everything starts with a number. The command above gives a table with variable names like 'x99BM', 'x105CL', etc.
Is it possible to get rid of this 'x'? I need to compare these identifiers with a column of another table that is clear of this 'x'.

No, table variable names can't start with a number since they follow the same naming conventions as normal variables. The 'x' is automatically added by readtable to create a valid name. You should have noticed this warning when calling readtable:
Warning: Variable names were modified to make them valid MATLAB identifiers.
The original names are saved in the VariableDescriptions property.
So, you can't get rid of the 'x' within the table. But, if you need to do any comparisons, you can do them against the original values saved in the VariableDescriptions property, which will have this format:
>> T.Properties.VariableDescriptions
ans =
1×2 cell array
'Original column heading: '99BM'' 'Original column heading: '105CL''
You can parse these with a regular expression, for example:
originalNames = regexp(T.Properties.VariableDescriptions, '''(.+)''', 'tokens', 'once');
originalNames = vertcat(originalNames{:});
originalNames =
2×1 cell array
'99BM'
'105CL'
And then use these in any string comparisons you need to do.

No. As mentioned by gnovice, the readtable function automatically renames invalid names. It does this by calling matlab.lang.makevalidname and setting the output as the column name.
If I understand correctly, you're comparing the contents of a column from one table with the names of the columns of another table. In that case contains(['x' <contents of single row of column>], table.VariableNames) will prepend x to the value in a row of the table column (for this implementation, you need to loop through every row of the table) and then compare this string with the variable names of the table. You can also do this in a single line with arrayfun or something but I am doing this from memory right now and can't recall the correct syntax.

converting 1x1 matrix to a variable

I read the data from the csv which contains two columns id which text/string and the cancer which is 1/0. please see the code be
M = readtable('data.csv');
I try to access the very first value using
row= M(n,1); //It's from the ID column which is text
But it comes in the form of a 1x1matrix, and I am unable to put it in a single variable.
for example I want after the above line works row should contain a string in it like. row = 'patientID'. Now is there anyway to convert it into a single value?

Use row = M{n,1}. Note the curly braces.
The curly braces say "get the contents of the table", as opposed to the circular brackets you had been using which say "get me a portion of the table, as a table".

How to import column of numbers from mixed string numerical text file

Variations of this question have already been asked several times, for example here. However, I can't seem to get this to work for my data.
I have a text file with 3 columns. First and third columns are floating point numbers. Middle column is strings. I'm only interested in getting the first column really.
Here's what I tried:
filename=fopen('heartbeatn1nn.txt');
A = textscan(filename,'%f','HeaderLines',0);
fclose(filename);
When I do this A comes out as just a single number--the first element in the column. How do I get the whole column? I've also tried this with the '.tsv' file extension, same result.
Also tried:
filename=fopen('heartbeatn1nn.txt');
formatSpec='%f';sizeA=[1 Inf];
A = fscanf(filename,formatSpec,sizeA);
fclose(filename);
with same result.
Could the file size be a problem? Not sure how many rows but probably quite a few since file size is 1.7M.

Assuming the columns in your text file are separated by single whitespace characters your format specification should look like this:
A = textscan(filename,'%f %s %f');
A now contains the complete file content. To obtain the first column:
first_col = A{:,1};
Alternatively, you can tell textscan to skip the unneeded fields with the * option:
first_col = cell2mat( textscan(filename, '%f %*s %*f') );
This returns only the first column.

How to randomly select from a list of 47 names that are entered from a data file?

I have managed to input a number data file into a matrix but have been unable to do so for any data that is not a number.
I have a list of 47 names and supposed to generate a random name from the list. I have tried to use the function textscan but was not going anywhere. Also how do I generate a random name from the list? All I have been able to do was generate a random number between 1 to 47.
Appreciate the replies. I should have said I need it in MATLAB sorry.
Here is a sample list of data in my data file
name01
name02
name03
and the code to read it:
fid = fopen('names.dat','rt');
headerChars = fgetl(fid);
data = fscanf(fid,'%f,%f,%f,%f',[4 47]).';
fclose(fid);
The above is what I have to read the data file into a matrix but it is only reading the first line. (Yes it was modified from a previous post here on this forums :/)

Edit: As per the helpful comments from mtrw, and the fixed formatting of the sample data file, I've updated my answer with more detail.
With a single name (i.e. "Bob", "Bob Smith", or "Smith, Bob") on each line of the file, you can use the function TEXTSCAN by specifying '%s' as the format argument (to denote reading a string) and the newline character '\n' as the 'Delimiter' (the character that separates the strings in the file):
fid = fopen('namefile.txt','r');
names = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
Then it's a matter of randomly picking one of the names. You can use the function RANDI to generate a random integer in the range from 1 to the number of names read from the file (found using the NUMEL function):
names = names{1}; %# Get the contents from the cell returned by TEXTSCAN
selectedName = names{randi(numel(names))};

Sounds like you're halfway home. Take that random number and use it as an index for the list.
For example, if you randomly generate the number 23 then fetch the 23rd entry in the list which gives you a random name draw.

Use the RANDOMBETWEEN function to get a random number within your range. Use INDEX to get the actual cell value. For instance:
=INDEX(A1:A47, RANDBETWEEN(1, 47))
The above will work for your specific case of 47 names, assuming they're in column A. In general, you'd want something like:
=INDEX(MyNames, RANDBETWEEN(ROW(MyNames), ROW(MyNames) + ROWS(MyNames) - 1))
This assumes you've named your range of cells "MyNames" (for example, by selecting all the cells in your range and setting a name in the naming box). The above formula works by using the ROW function to return the top row of the MyNames array and the ROWS function to get the total rows in MyNames.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Matlab split text column in a table - matlab

Related

MATLAB Column headers not valid variable name

Can table variable names start with a number character?

converting 1x1 matrix to a variable

How to import column of numbers from mixed string numerical text file

How to randomly select from a list of 47 names that are entered from a data file?

Categories

Resources