Textscan from end of line - matlab

I am trying to read a very large file. Each line contains some integers, but the number of integers per line is not known. I just want to extract the last n items, and I couldn't find the right syntax for doing this.
Example:
lineA='10 200 300 400 500';
lineB='300 400 500 550';
pA=textscan(lineA,'%u %u %u');
pB=textscan(lineB,'%u %u %u');
The results should be:
pA={[300]} {[400]} {[500]}
pB={[400]} {[500]} {[550]}
I don't know how many numbers each line contains, and I want to avoid having to determine it. In this example I only read individual lines, but my actual script reads a file with 10e6 lines using the syntax textscan(fid,format,10e6).
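As far as I know, textscan matches its format from the start of each line, so there is no format specifier that anchors at the end. One workaround (a swapped-in technique, not a textscan option) is to read each line whole and keep its trailing n values with sscanf. The sketch below uses lineA and lineB from the question and assumes every line holds at least n integers. For the real file, the cell array of lines could come from textscan(fid,'%s','Delimiter','\n'), at some memory and speed cost for 10e6 lines.

```matlab
% Minimal sketch: keep the last n integers of every line.
% Assumes each line holds at least n whitespace-separated integers.
n = 3;
txt = {'10 200 300 400 500'; '300 400 500 550'};   % lineA and lineB from the question
lastN = zeros(numel(txt), n);
for k = 1:numel(txt)
    vals = sscanf(txt{k}, '%d').';     % every integer on the line, as a row vector
    lastN(k, :) = vals(end-n+1:end);   % keep only the trailing n values
end
disp(lastN)
```

Row k of lastN holds the last n values of line k; note that sscanf returns doubles rather than the uint32 values that '%u' would give.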

Related

How to pass multiple comment style to skip the header of a text file?

I am trying to read hundreds of .dat files by skipping header lines (I do not know beforehand how many of them I need to skip). The number of header lines varies from 1 to 20, and each header line begins with either "$" or "!".
A sample data file (left column: node, right column: microstructure) always has two columns and looks like the following:
!===
!Comment
$Material
1 1.452E-001
2 1.446E-001
3 1.459E-001
I tried the following code, assuming I know beforehand that there are 3 header lines:
fid = fopen('Graphite_Node_Test.dat') ;
data = textscan(fid,'%f %f','HeaderLines',3) ;
fclose(fid);
This solution works if the number of header lines is known. How can I change the code so that it can read the .dat file without knowing the number of header lines beginning with either "$" or "!" sign?
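As I understand the textscan documentation, 'CommentStyle' takes a single comment marker (or a two-element cell array for block comments such as {'/*','*/'}), so it cannot directly skip lines that start with either of two symbols. A minimal sketch of a workaround: read lines with fgetl while they start with "$" or "!", remember the byte offset with ftell, then fseek back to the first data line and call textscan. It assumes the header lines are all at the top and that no data line starts with those symbols; the temporary file merely reproduces the sample above.

```matlab
% Build a small sample file mirroring the question (hypothetical temp file).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '!===\n!Comment\n$Material\n1 1.452E-001\n2 1.446E-001\n3 1.459E-001\n');
fclose(fid);

% Skip every leading line that starts with '$' or '!'.
fid = fopen(fname, 'r');
pos = ftell(fid);                % byte offset of the line about to be read
tline = fgetl(fid);              % fgetl returns -1 (not a char) at end of file
while ischar(tline) && ~isempty(tline) && any(tline(1) == '$!')
    pos = ftell(fid);            % remember where the next line begins
    tline = fgetl(fid);
end
fseek(fid, pos, 'bof');          % go back to the start of the first data line
data = textscan(fid, '%f %f');   % same call as before, minus 'HeaderLines'
fclose(fid);
delete(fname);
```

The loop works for any number of header lines, so the same code handles files with 1 to 20 of them.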

Index out of bounds after reading a text file

I have the following simple code, in which I try to access one element of a matrix stored in a .txt file. The element I want is at (4,1), and the matrix in the .txt file is 8-by-4. When I run the code, MATLAB gives me the following error:
Attempted to access q(4,1); index out of bounds because size(q)=[1,601]
Can someone help me understand why I receive the error and how to fix it?
Here is the code:
q = fileread('sv11edit.txt');
toe = q(4,1)
The answer will depend on the format of the file sv11edit.txt. However, fileread returns a string of characters. In this case, it gives you a string that is 601 characters long. You receive an error because you assume that q is 8 by 4, but this is not the case.
Check what is stored in q before you try anything like the second line of your code. The function load may be a better alternative to fileread, provided the file contains only numbers with the same number of values on every row.
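A minimal sketch of the load route. Because the contents of sv11edit.txt aren't shown, the snippet first writes a hypothetical 8-by-4 matrix (the values 1 to 32) to a temporary file; load then returns a numeric matrix instead of a character vector, so q(4,1) is a valid index.

```matlab
% Write a hypothetical 8-by-4 numeric matrix to a temporary .txt file.
fname = [tempname '.txt'];
M = reshape(1:32, 8, 4);
fid = fopen(fname, 'w');
fprintf(fid, '%g %g %g %g\n', M.');   % transpose so each file line is one row of M
fclose(fid);

q = load(fname);     % numeric matrix (8-by-4), not a 1-by-N char vector
toe = q(4,1);        % indexing now works as expected
delete(fname);
```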

How to avoid the repeated paragraphs of a long txt file being ignored by importdata in MATLAB

I am trying to import all the double data from a txt file, which has this form:
#25x1 string
#9999x2 double
.
.
.
#(repeat ten times)
However, when I try to use the Import Wizard, only the first 25x1 string and 9999x2 double block is loaded successfully; the other 9 are simply ignored.
How can I import all the data? (Does importdata have a maximum length or something?)
Thanks
It has nothing to do with a maximum length; importdata is simply not set up for the sort of data file you describe. From the help file:
For ASCII files and spreadsheets, importdata expects to find numeric data in a rectangular form (that is, like a matrix). Text headers can appear above or to the left of the numeric data, as follows:
- Column headers or file description text at the top of the file, above the numeric data.
- Row headers to the left of the numeric data.
So what is happening is that the first section of your file, which does match the format importdata expects, is being read, and the rest ignored. Instead of importdata, you'll need to use textscan, in particular, this style:
C = textscan(fileID,formatSpec,N)
fileID is returned by fopen, formatSpec tells textscan what to expect, and N tells it how many times to repeat the format. As long as fileID remains open, repeated calls to textscan continue reading the file from wherever the last read stopped, rather than going back to the start of the file. So we can do this:
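fileID = fopen('myfile.txt');
repeats = 10;
C = cell(repeats, 2);   % one row per repeated section
for n = 1:repeats
% read one string, 25 times
C{n,1} = textscan(fileID,'%s',25);
% read two floats, 9999 times
C{n,2} = textscan(fileID,'%f %f',9999);
end
fclose(fileID);         % release the file once every section has been read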
You can then extract your numerical data out of the cell array (if you need it in one block you may want to try using 'CollectOutput',1 as an option).
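A minimal sketch of that extraction step, assuming the numeric parts should end up in one matrix. To stay self-contained it uses a hypothetical, much smaller file (2 sections, each with 2 label lines and 3 numeric rows) in place of 10 sections of 25 strings and 9999 rows; with 'CollectOutput',1 each numeric block comes back as a single matrix, and vertcat stacks the blocks.

```matlab
% Hypothetical small file: 2 sections, each 2 label lines + 3 rows of 2 numbers.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for s = 1:2
    fprintf(fid, 'labelA\nlabelB\n');
    fprintf(fid, '%d %d\n', [1 2 3; 10 20 30] * s);   % rows: 1*s 10*s, 2*s 20*s, 3*s 30*s
end
fclose(fid);

fid = fopen(fname);
C = cell(2, 2);
for n = 1:2
    C{n,1} = textscan(fid, '%s', 2);                          % the label lines
    C{n,2} = textscan(fid, '%f %f', 3, 'CollectOutput', 1);   % one 3-by-2 matrix
end
fclose(fid);
delete(fname);

blocks = cellfun(@(c) c{1}, C(:,2), 'UniformOutput', false);  % unwrap each block
allData = vertcat(blocks{:});                                 % stack into one 6-by-2 matrix
```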

How to get the number of columns of a csv file?

I have a huge CSV file that I want to load with MATLAB. However, I'm only interested in specific columns whose names I know.
As a first step, I would like to check how many columns the CSV file has. How can I do that in MATLAB?
As Jonesy and erelender suggest, I would think this will do it:
fid=fopen(filename);
tline = fgetl(fid);
fclose(fid);
length(find(tline==','))+1
Since you don't seem to know what kind of carriage return character (or character encoding?) is being used, I would suggest progressively sampling your file until you encounter a recognizable CR character. One way to do this is to loop over something like
A = fscanf(fileID, ['%' num2str(N) 'c'], sizeA);
where N is the number of characters to read. At each iteration, test A for carriage return characters and stop if one is encountered. Once you know where the carriage return is, either repeat the read with the right N and perform the length(find...) operation, or accumulate the number of commas at each iteration. You may also want to check a few samples to make sure your file is actually laid out along rows (is it always?).
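A minimal sketch of the "accumulate the commas" variant. It swaps the fscanf '%Nc' call for fread with '*char', which likewise returns raw characters; the file name and contents are hypothetical, and N is kept tiny only so the loop runs several times. It treats either LF (char(10)) or CR (char(13)) as the end of the first line.

```matlab
% Hypothetical CSV with a 4-column header line.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,name,value,flag\n1,a,2.5,0\n');
fclose(fid);

N = 4;                                  % chunk size (tiny here to force several reads)
nCommas = 0;
fid = fopen(fname, 'r');
while true
    A = fread(fid, N, '*char').';       % next chunk as a char row vector
    if isempty(A), break; end           % reached end of file without a line break
    eol = find(A == char(10) | A == char(13), 1);   % first LF or CR in this chunk
    if isempty(eol)
        nCommas = nCommas + sum(A == ',');          % whole chunk is still on line 1
    else
        nCommas = nCommas + sum(A(1:eol-1) == ','); % count only up to the line break
        break;
    end
end
fclose(fid);
delete(fname);
nCols = nCommas + 1;                    % columns = separators + 1
```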
1-) Read the first line of the file
2-) Count the number of commas (or separator characters, if the separator is not a comma)
3-) Add 1 to the count; the result is the number of columns in the file.
If the CSV file contains only numeric values, you can use:
M=csvread('file_name.csv');
[row,col]=size(M);

How to randomly select from a list of 47 names that are entered from a data file?

I have managed to read a numeric data file into a matrix, but I have been unable to do so for data that is not numeric.
I have a list of 47 names and am supposed to pick a random name from the list. I have tried to use the function textscan but did not get anywhere. Also, how do I generate a random name from the list? All I have been able to do is generate a random number between 1 and 47.
I appreciate the replies. I should have said that I need this in MATLAB, sorry.
Here is a sample list of data in my data file
name01
name02
name03
and the code to read it:
fid = fopen('names.dat','rt');
headerChars = fgetl(fid);
data = fscanf(fid,'%f,%f,%f,%f',[4 47]).';
fclose(fid);
The above is what I use to read the data file into a matrix, but it only reads the first line. (Yes, it was modified from a previous post on this forum :/)
Edit: As per the helpful comments from mtrw, and the fixed formatting of the sample data file, I've updated my answer with more detail.
With a single name (i.e. "Bob", "Bob Smith", or "Smith, Bob") on each line of the file, you can use the function TEXTSCAN by specifying '%s' as the format argument (to denote reading a string) and the newline character '\n' as the 'Delimiter' (the character that separates the strings in the file):
fid = fopen('namefile.txt','r');
names = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
Then it's a matter of randomly picking one of the names. You can use the function RANDI to generate a random integer in the range from 1 to the number of names read from the file (found using the NUMEL function):
names = names{1}; %# Get the contents from the cell returned by TEXTSCAN
selectedName = names{randi(numel(names))};
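Putting the two steps together, a self-contained sketch. The names file is hypothetical: it is generated with the entries name01 to name47 to mirror the sample list.

```matlab
% Hypothetical names file: name01 ... name47, one per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'name%02d\n', 1:47);
fclose(fid);

fid = fopen(fname, 'r');
names = textscan(fid, '%s', 'Delimiter', '\n');   % one cell per line
fclose(fid);
delete(fname);

names = names{1};                                 % unwrap the cell returned by textscan
selectedName = names{randi(numel(names))};        % one name, chosen uniformly at random
disp(selectedName)
```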
Sounds like you're halfway home. Take that random number and use it as an index for the list.
For example, if you randomly generate the number 23 then fetch the 23rd entry in the list which gives you a random name draw.
Use the RANDBETWEEN function to get a random number within your range, and INDEX to get the actual cell value. For instance:
=INDEX(A1:A47, RANDBETWEEN(1, 47))
The above will work for your specific case of 47 names, assuming they're in column A. In general, you'd want something like:
=INDEX(MyNames, RANDBETWEEN(1, ROWS(MyNames)))
This assumes you've named your range of cells "MyNames" (for example, by selecting all the cells in your range and typing a name in the Name Box). The formula uses the ROWS function to get the total number of rows in MyNames; because INDEX counts positions relative to the top of the range rather than sheet row numbers, the random number must run from 1 to that count.