MATLAB Column headers not valid variable name - matlab

I am working in MATLAB and trying to add units to the Column headers to a table of values then I will insert into SQLite Database but I have a column names of German characters (e.g 'ß', 'ä'), but this is invalid because of the special characters. According to everything I've found thus far has said that column headers must be valid variable names, e.g. alphanumeric and "_" only.
But I can not change my original database column names so does anyone know of a work-around it?
My code of building a table and sending into database is:
insertData = cell2table(full_matrix,'VariableNames',colnames);
insert(conn,tableName,colnames,insertData);
And some of my column names:
'maß','kapazität', 'räder'
Thank you very much for helping.

Do you have to create the table first? I would try just passing the cell array of data directly to insert like so:
insert(conn, tableName, colnames, full_matrix);
The above assumes that it is the cell2table call that is giving you an error related to the special characters. If it's the insert call, then I guess MATLAB won't let you create databases with column names that don't conform to its variable naming conventions. If that's the case, you'll have to convert the column names to something valid, which you can do with either genvarname (for older MATLAB versions) or matlab.lang.makeValidName (suggested for versions R2014a and newer):
colNames = {'maß','kapazität', 'räder'};
validNames = genvarname(colNames);
% or...
validNames = matlab.lang.makeValidName(colNames, 'ReplacementStyle', 'hex');
validNames =
1×3 cell array
'ma0xDF' 'kapazit0xE4t' 'r0xE4der'
Note that the above solutions replace the invalid characters with their hex equivalents. You could also change the 'ReplacementStyle' to replace them with underscores or delete them altogether. I would go with the hex values because it gives you the option of converting the column names back to their original string values if you need those for anything later. Here's how you could do that using regexprep, hex2dec, and char:
originalNames = regexprep(validNames, '0x([\dA-F]{2})', '${char(hex2dec($1))}');
originalNames =
1×3 cell array
'maß' 'kapazität' 'räder'

Related

Table fields digit change when replacing data of different types

Based on this post (Table fields in format in PDF report generator - Matlab), I run the script on a table. In the script, I extract one column, run the function on it, and replace the old column with this new one in identical but copied table (of the original). Strangely, I get that the modified column data is not stored correctly in the new table. Why? I suspect that is it is because of the categorical. How can I fix it?
Code:
clc;
clear all;
dataInput = webread('https://people.sc.fsu.edu/~jburkardt/data/csv/hw_25000.csv');
tempCol = (f(table2array(dataInput(:,2))));
newDataInput = dataInput;
newDataInput(:,2) = table(tempCol);
function FivDigsStr = f(x)
%formatting to character array with 5 significant digits and then splitting.
%at each tab. categorical is needed to remove ' 's that appear around char
%in the output PDF file with newer MATLAB versions
%e.g. with R2018a, there are no ' ' in the output file but ' ' appears with R2020a
FivDigsStr = categorical(split(sprintf('%0.5G\t',x)));
%Removing the last (<undefined>) value (which is included due to \t)
FivDigsStr = FivDigsStr(1:end-1);
end
You cannot replace values of one type with values of some other type. Instead, just rewrite the column. i.e use:
newDataInput.(newDataInput.Properties.VariableNames{2}) = tempCol;
instead of:
newDataInput(:,2) = table(tempCol);

Can table variable names start with a number character?

I am running something like this:
table = readtable(path,'ReadVariableNames',true);
In the .csv file all of the cells of the first row contain string identifiers like '99BM', '105CL', etc. Everything starts with a number. The command above gives a table with variable names like 'x99BM', 'x105CL', etc.
Is it possible to get rid of this 'x'? I need to compare these identifiers with a column of another table that is clear of this 'x'.
No, table variable names can't start with a number since they follow the same naming conventions as normal variables. The 'x' is automatically added by readtable to create a valid name. You should have noticed this warning when calling readtable:
Warning: Variable names were modified to make them valid MATLAB identifiers.
The original names are saved in the VariableDescriptions property.
So, you can't get rid of the 'x' within the table. But, if you need to do any comparisons, you can do them against the original values saved in the VariableDescriptions property, which will have this format:
>> T.Properties.VariableDescriptions
ans =
1×2 cell array
'Original column heading: '99BM'' 'Original column heading: '105CL''
You can parse these with a regular expression, for example:
originalNames = regexp(T.Properties.VariableDescriptions, '''(.+)''', 'tokens', 'once');
originalNames = vertcat(originalNames{:});
originalNames =
2×1 cell array
'99BM'
'105CL'
And then use these in any string comparisons you need to do.
No. As mentioned by gnovice, the readtable function automatically renames invalid names. It does this by calling matlab.lang.makevalidname and setting the output as the column name.
If I understand correctly, you're comparing the contents of a column from one table with the names of the columns of another table. In that case contains(['x' <contents of single row of column>], table.VariableNames) will prepend x to the value in a row of the table column (for this implementation, you need to loop through every row of the table) and then compare this string with the variable names of the table. You can also do this in a single line with arrayfun or something but I am doing this from memory right now and can't recall the correct syntax.

converting 1x1 matrix to a variable

I read the data from the csv which contains two columns id which text/string and the cancer which is 1/0. please see the code be
M = readtable('data.csv');
I try to access the very first value using
row= M(n,1); //It's from the ID column which is text
But it comes in the form of a 1x1matrix, and I am unable to put it in a single variable.
for example I want after the above line works row should contain a string in it like. row = 'patientID'. Now is there anyway to convert it into a single value?
Use row = M{n,1}. Note the curly braces.
The curly braces say "get the contents of the table", as opposed to the circular brackets you had been using which say "get me a portion of the table, as a table".

Matlab : Read a file name in string format from a .csv file

I am having a .csv file which contains let's say 50 rows.
At the beginning of each row I have a file name in the following format 001_02_03.bmp followed by values separated by commas. Something like this :
001_02_03.bmp,20,30,45,10,40,20
Can someone tell me how can I read the first column from the data?
I know how to obtain the data from the second column onward. I am using the csvread function like this X = csvread('filename.csv', 0, 1);. If I try to read the first column in the same manner it outputs an error, saying the csvread does not support string format.
Use textscan, ie:
fid1 = fopen(csvFileName);
X = textscan(fid1, '%s%f%f%f%f%f%f', 'Delimiter', ',');
fclose(fid1);
FirstCol = X{1, 1};
A little more detail? csvread only works with purely numeric data, so you can't use it to get in data with a .bmp, or underscores for that matter. Thus we use textscan. The funny looking format string input to textscan is just saying that the columns are, in order, of type string %s, then the next 6 columns are of type double %f%f%f%f%f%f (or you might choose to alter this to reflect an integer datatype - I personally rarely bother unless the quantity of data is huge or floating point precision is a problem).
Note, if you just wanted to get the first column and ignore the rest, you can replace the format string with %s% %*[^\n]. A final point, if your csv file has a header line, you can skip it using the HeaderLines optional input to textscan.

How to randomly select from a list of 47 names that are entered from a data file?

I have managed to input a number data file into a matrix but have been unable to do so for any data that is not a number.
I have a list of 47 names and supposed to generate a random name from the list. I have tried to use the function textscan but was not going anywhere. Also how do I generate a random name from the list? All I have been able to do was generate a random number between 1 to 47.
Appreciate the replies. I should have said I need it in MATLAB sorry.
Here is a sample list of data in my data file
name01
name02
name03
and the code to read it:
fid = fopen('names.dat','rt');
headerChars = fgetl(fid);
data = fscanf(fid,'%f,%f,%f,%f',[4 47]).';
fclose(fid);
The above is what I have to read the data file into a matrix but it is only reading the first line. (Yes it was modified from a previous post here on this forums :/)
Edit: As per the helpful comments from mtrw, and the fixed formatting of the sample data file, I've updated my answer with more detail.
With a single name (i.e. "Bob", "Bob Smith", or "Smith, Bob") on each line of the file, you can use the function TEXTSCAN by specifying '%s' as the format argument (to denote reading a string) and the newline character '\n' as the 'Delimiter' (the character that separates the strings in the file):
fid = fopen('namefile.txt','r');
names = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
Then it's a matter of randomly picking one of the names. You can use the function RANDI to generate a random integer in the range from 1 to the number of names read from the file (found using the NUMEL function):
names = names{1}; %# Get the contents from the cell returned by TEXTSCAN
selectedName = names{randi(numel(names))};
Sounds like you're halfway home. Take that random number and use it as an index for the list.
For example, if you randomly generate the number 23 then fetch the 23rd entry in the list which gives you a random name draw.
Use the RANDOMBETWEEN function to get a random number within your range. Use INDEX to get the actual cell value. For instance:
=INDEX(A1:A47, RANDBETWEEN(1, 47))
The above will work for your specific case of 47 names, assuming they're in column A. In general, you'd want something like:
=INDEX(MyNames, RANDBETWEEN(ROW(MyNames), ROW(MyNames) + ROWS(MyNames) - 1))
This assumes you've named your range of cells "MyNames" (for example, by selecting all the cells in your range and setting a name in the naming box). The above formula works by using the ROW function to return the top row of the MyNames array and the ROWS function to get the total rows in MyNames.