How to randomly select from a list of 47 names that are entered from a data file? - matlab

I have managed to input a number data file into a matrix but have been unable to do so for any data that is not a number.
I have a list of 47 names and supposed to generate a random name from the list. I have tried to use the function textscan but was not going anywhere. Also how do I generate a random name from the list? All I have been able to do was generate a random number between 1 to 47.
Appreciate the replies. I should have said I need it in MATLAB sorry.
Here is a sample list of data in my data file
name01
name02
name03
and the code to read it:
fid = fopen('names.dat','rt');
headerChars = fgetl(fid);
data = fscanf(fid,'%f,%f,%f,%f',[4 47]).';
fclose(fid);
The above is what I have to read the data file into a matrix but it is only reading the first line. (Yes it was modified from a previous post here on this forums :/)

Edit: As per the helpful comments from mtrw, and the fixed formatting of the sample data file, I've updated my answer with more detail.
With a single name (i.e. "Bob", "Bob Smith", or "Smith, Bob") on each line of the file, you can use the function TEXTSCAN by specifying '%s' as the format argument (to denote reading a string) and the newline character '\n' as the 'Delimiter' (the character that separates the strings in the file):
fid = fopen('namefile.txt','r');
names = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
Then it's a matter of randomly picking one of the names. You can use the function RANDI to generate a random integer in the range from 1 to the number of names read from the file (found using the NUMEL function):
names = names{1}; %# Get the contents from the cell returned by TEXTSCAN
selectedName = names{randi(numel(names))};

Sounds like you're halfway home. Take that random number and use it as an index for the list.
For example, if you randomly generate the number 23 then fetch the 23rd entry in the list which gives you a random name draw.

Use the RANDOMBETWEEN function to get a random number within your range. Use INDEX to get the actual cell value. For instance:
=INDEX(A1:A47, RANDBETWEEN(1, 47))
The above will work for your specific case of 47 names, assuming they're in column A. In general, you'd want something like:
=INDEX(MyNames, RANDBETWEEN(ROW(MyNames), ROW(MyNames) + ROWS(MyNames) - 1))
This assumes you've named your range of cells "MyNames" (for example, by selecting all the cells in your range and setting a name in the naming box). The above formula works by using the ROW function to return the top row of the MyNames array and the ROWS function to get the total rows in MyNames.

Related

Can table variable names start with a number character?

I am running something like this:
table = readtable(path,'ReadVariableNames',true);
In the .csv file all of the cells of the first row contain string identifiers like '99BM', '105CL', etc. Everything starts with a number. The command above gives a table with variable names like 'x99BM', 'x105CL', etc.
Is it possible to get rid of this 'x'? I need to compare these identifiers with a column of another table that is clear of this 'x'.
No, table variable names can't start with a number since they follow the same naming conventions as normal variables. The 'x' is automatically added by readtable to create a valid name. You should have noticed this warning when calling readtable:
Warning: Variable names were modified to make them valid MATLAB identifiers.
The original names are saved in the VariableDescriptions property.
So, you can't get rid of the 'x' within the table. But, if you need to do any comparisons, you can do them against the original values saved in the VariableDescriptions property, which will have this format:
>> T.Properties.VariableDescriptions
ans =
1×2 cell array
'Original column heading: '99BM'' 'Original column heading: '105CL''
You can parse these with a regular expression, for example:
originalNames = regexp(T.Properties.VariableDescriptions, '''(.+)''', 'tokens', 'once');
originalNames = vertcat(originalNames{:});
originalNames =
2×1 cell array
'99BM'
'105CL'
And then use these in any string comparisons you need to do.
No. As mentioned by gnovice, the readtable function automatically renames invalid names. It does this by calling matlab.lang.makevalidname and setting the output as the column name.
If I understand correctly, you're comparing the contents of a column from one table with the names of the columns of another table. In that case contains(['x' <contents of single row of column>], table.VariableNames) will prepend x to the value in a row of the table column (for this implementation, you need to loop through every row of the table) and then compare this string with the variable names of the table. You can also do this in a single line with arrayfun or something but I am doing this from memory right now and can't recall the correct syntax.

How to read a number from text file via Matlab

I have 1000 text files and want to read a number from each file.
format of text file as:
af;laskjdf;lkasjda123241234123
$sakdfja;lskfj12352135qadsfasfa
falskdfjqwr1351
##alskgja;lksjgklajs23523,
asdfa#####1217653asl123654fjaksj
asdkjf23s#q23asjfklj
asko3
I need to read the number ("1217653") behind "#####" in each txt file.
The number will follow the "#####" closely in all text file.
"#####" and the close following number just appear one time in each file.
clc
clear
MyFolderInfo = dir('yourpath/folder');
fidin = fopen(file_name,'r','n','utf-8');
while ~feof(fidin)
tline=fgetl(fidin);
disp(tline)
end
fclose(fidin);
It is not finish yet. I am stuck with the problem that it can not read after the space line.
This is another approach using the function regex. This will easily provide a more advanced way of reading files and does not require reading the full file in one go. The difference from the already given example is basically that I read the file line-by-line, but since the example use this approach I believe it is worth answering. This will return all occurences of "#####NUMBER"
function test()
h = fopen('myfile.txt');
str = fgetl(h);
k = 1;
while (isempty(str) | str ~= -1 ) % Empty line returns empty string and EOF returns -1
res{k} = regexp(str,'#####\d+','match');
k = k+1;
str = fgetl(h);
end
for k=1:length(res)
disp(res{k});
end
EDIT
Using the expression '#####(\d+)' and the argument 'tokens' instead of 'match' Will actually return the digits after the "#####" as a string. The intent with this post was also, apart from showing another way to read the file, to show how to use regexp with a simple example. Both alternatives can be used with suitable conversion.
Assuming the following:
All files are ASCII files.
The number you are looking to extract is directly following #####.
The number you are looking for is a natural number.
##### followed by a number only occurs once per file.
You can use this code snippet inside a for loop to extract each number:
regx='#####(\d+)';
str=fileread(fileName);
num=str2double(regexp(str,regx,'tokens','once'));
Example of for loop
This code will iterate through ALL files in yourpath/folder and save the numbers into num.
regx='#####(\d+)'; % Create regex
folderDir='yourpath/folder';
files=cellstr(ls(folderDir)); % Find all files in folderDir
files=files(3:end); % remove . and ..
num=zeros(1,length(files)); % Pre allocate
for i=1:length(files) % Iterate through files
str=fileread(fullfile(folderDir,files{i})); % Extract str from file
num(i)=str2double(regexp(str,regx,'tokens','once')); % extract number using regex
end
If you want to extract more ''advanced'' numbers e.g. Integers or Real numbers, or handle several occurrences of #####NUMBER in a file you will need to update your question with a better representation of your text files.

looping and ordering extracted data in MatLab

I have hundreds of data files, named file001~file400 for example, and should pick a numeric value from each. I know how to pick each number, but, since there are too many files, I need to loop the command and order all extracted numbers with regard to the number of corresponding file. I appreciate any help.
The problem is to loop over all files. This can be done in the following way:
for i = 1:400
filename = sprintf('file%03d',i);
// do the number picking, etc. using the filename.
end
EDIT: Per request, for filenames FT00100 to FT05320, we we make two small changes, one in the loop range and one in the first parameter to sprintf:
for i = 100:5320
filename = sprintf('FT%05d',i);
// do the number picking, etc. using the filename.
end

How to import column of numbers from mixed string numerical text file

Variations of this question have already been asked several times, for example here. However, I can't seem to get this to work for my data.
I have a text file with 3 columns. First and third columns are floating point numbers. Middle column is strings. I'm only interested in getting the first column really.
Here's what I tried:
filename=fopen('heartbeatn1nn.txt');
A = textscan(filename,'%f','HeaderLines',0);
fclose(filename);
When I do this A comes out as just a single number--the first element in the column. How do I get the whole column? I've also tried this with the '.tsv' file extension, same result.
Also tried:
filename=fopen('heartbeatn1nn.txt');
formatSpec='%f';sizeA=[1 Inf];
A = fscanf(filename,formatSpec,sizeA);
fclose(filename);
with same result.
Could the file size be a problem? Not sure how many rows but probably quite a few since file size is 1.7M.
Assuming the columns in your text file are separated by single whitespace characters your format specification should look like this:
A = textscan(filename,'%f %s %f');
A now contains the complete file content. To obtain the first column:
first_col = A{:,1};
Alternatively, you can tell textscan to skip the unneeded fields with the * option:
first_col = cell2mat( textscan(filename, '%f %*s %*f') );
This returns only the first column.

How to avoid the repeated paragraghs of long txt files being ignored for importdata in matlab

I am trying to import all double from a txt file, which has this form
#25x1 string
#9999x2 double
.
.
.
#(repeat ten times)
However, when I am trying to use import Wizard, only the first
25x1 string
9999x2 double.
was successfully loaded, the other 9 were simply ignored
How may I import all the data? (Does importdata has a maximum length or something?)
Thanks
It's nothing to do with maximum length, importdata is just not set up for the sort of data file you describe. From the help file:
For ASCII files and spreadsheets, importdata expects
to find numeric data in a rectangular form (that is, like a matrix).
Text headers can appear above or to the left of the numeric data,
as follows:
Column headers or file description text at the top of the file, above
the numeric data. Row headers to the left of the numeric data.
So what is happening is that the first section of your file, which does match the format importdata expects, is being read, and the rest ignored. Instead of importdata, you'll need to use textscan, in particular, this style:
C = textscan(fileID,formatSpec,N)
fileID is returned from fopen. formatspec tells textscan what to expect, and N how many times to repeat it. As long as fileID remains open, repeated calls to textscan continue to read the file from wherever the last read action stopped - rather than going back to the start of the file. So we can do this:
fileID = fopen('myfile.txt');
repeats = 10;
for n = 1:repeats
% read one string, 25 times
C{n,1} = textscan(fileID,'%s',25);
% read two floats, 9999 times
C{n,2} = textscan(fileID,'%f %f',9999);
end
You can then extract your numerical data out of the cell array (if you need it in one block you may want to try using 'CollectOutput',1 as an option).