Avoid list being truncated by Matlab - matlab

Edit 1: This question has been solved. The problem was due to a typo; thanks to Floris for spotting it.
I have a one-line matrix in Matlab that is being truncated, causing me to lose data.
My code reads:
[status,Vf_rpm_string] = system (fragment_velocity_string);
Vf_rpm_shape=regexprep(Vf_rpm_string,'\n',' ');
Vf_rpm_vector=str2num(Vf_rpm_string);
Vf_rpm= reshape(Vf_rpm_vector,[],1);
The code runs a system command and stores the result. The result is a matrix of numbers, and sometimes the last line of the matrix has fewer columns than the previous lines. Matlab doesn't like this, as it does not know what to do with the empty columns in the last line. So I have to remove the newline characters (\n) from the result and replace them with spaces.
This was working fine until the results from the system command became too large. When I replace the newline characters (\n) with spaces, creating a one-line matrix, it is too long for Matlab, which truncates it, and I lose a lot of my data. So when I convert the returned data (which is returned as a string) to numbers, I get an empty matrix, and the reshape command is pointless at that point.
This is how it reads in Matlab:
20.65866342... Output truncated. Text exceeds maximum line length of 25,000 characters for Command Window display.
So 20.65866342 is the last value before I start to lose data. I know the message says the text is too long for the Command Window, but the variable still does not seem to store all the data, and it is lost.
Does anyone have any solutions to avoid this truncation?
Or does anyone want to suggest an alternative method for me to convert my data?
I am using Matlab 2012b and Windows 7
Thanks for your time.

Could the problem be that you strip the newlines, but the stripped string isn't the one you are parsing?
[status,Vf_rpm_string] = system (fragment_velocity_string);
Vf_rpm_shape=regexprep(Vf_rpm_string,'\n',' ');
Vf_rpm_vector=str2num(Vf_rpm_string);
Vf_rpm= reshape(Vf_rpm_vector,[],1);
That third line of code should be
Vf_rpm_vector=str2num(Vf_rpm_shape);
if I understand the logic of your code.
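A minimal sketch of the corrected sequence, for illustration only. The sample string is a made-up stand-in for the output of the system command, and Vf_rpm_alt is a hypothetical name. The sscanf call is an alternative that I am suggesting, not part of the original code; it treats newlines as ordinary whitespace, so the replacement step is not needed at all.

```matlab
% Hypothetical stand-in for the text returned by system(...):
% a block of numbers whose last row is shorter than the others.
Vf_rpm_string = sprintf('1.5 2.5 3.5\n4.5 5.5 6.5\n7.5 8.5\n');

% Replace newlines with spaces, then parse the *stripped* string.
Vf_rpm_shape  = regexprep(Vf_rpm_string, '\n', ' ');
Vf_rpm_vector = str2num(Vf_rpm_shape);           % 1-by-8 row vector
Vf_rpm        = reshape(Vf_rpm_vector, [], 1);   % 8-by-1 column vector

% Alternative (an assumption, not from the original post):
% sscanf skips any whitespace, including newlines, and returns a column.
Vf_rpm_alt = sscanf(Vf_rpm_string, '%f');
```

The "Output truncated" message only concerns what the Command Window displays; ending the statements with semicolons, as above, avoids printing the long string in the first place.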

Related

Octave - Convert number-strings from CSV import file to numerical matrix

I'm writing a code to import data from a CSV file and convert it into an Octave matrix. The imported data can be seen in the following snap:
In the next step I added the following command to delete the commas and "":
meas_raw_temp = strrep(meas_raw_temp,',',' ')
And then I get the data format in the following form:
The problem is that Octave still sees the data as 1 single 1-dimensional array. i.e., when I use the size command I get a single number, i.e. 2647. What I need to have is a matrix output, with each line of the snaps being a row of the matrix, and with each element separated.
Any thoughts?
Here's what's happening.
You have a 1-dimensional (rows only) cell array. Each element (i.e. cell) in the cell array contains a single string.
Your strings contain commas and literal double-quotes in them. You have managed to get rid of them all by replacing them in-place with an 'empty string'. Good. However that doesn't change the fact that you still have a single string per cell.
You need to create a for loop to process each cell separately. For each cell, split the string into its components (i.e. 'tokens') using ' ' (i.e. space) as the delimiter. You can use either strsplit or strtok appropriately to achieve this. Note that the 'tokens' you get out of this process are still of 'string' type. If you want numbers, you'll need to convert them (e.g. using str2double or something equivalent).
For each cell you process, find a way to fill the corresponding row of a preallocated matrix.
As Adriaan has pointed out in the comments, the exact manner in which you follow the steps above programmatically can vary, therefore I'm not going to provide the range of possible ways that you could do so, and I'm limiting the answer to this question to the steps above, which is how you should think about solving your problem.
If you attempt these steps and get stuck on a 'programmatic' aspect of your implementation, feel free to ask another stackoverflow question.
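One possible way to follow the steps above, as a rough sketch rather than the definitive approach. The sample cell array is hypothetical (the real data was only shown as an image), and it assumes every row has the same number of values.

```matlab
% Hypothetical sample: one quoted, comma-separated string per cell.
meas_raw_temp = {'"1.5","2.5","3.5"'; '"4","5","6"'};

% Drop the literal double quotes and turn commas into spaces.
cleaned = strrep(strrep(meas_raw_temp, '"', ''), ',', ' ');

% Preallocate using the number of tokens in the first row
% (assumes all rows have the same number of values).
nRows = numel(cleaned);
nCols = numel(strsplit(strtrim(cleaned{1}), ' '));
M = zeros(nRows, nCols);

for k = 1:nRows
    tokens  = strsplit(strtrim(cleaned{k}), ' ');  % cell array of strings
    M(k, :) = str2double(tokens);                  % convert and store as row k
end
```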

Reading data from .txt file into Matlab

I have been trying in vain for days to do one seemingly simple thing--I want to read data from a .txt file that looks like this:
0.221351321
0.151351321
0.235165165
8.2254546 E-7
into Matlab. I've been able to load the data in the .txt file as a column vector using the fscanf command, like so:
U=fscanf(FileID, '%e')
provided that I go through the file first and remove the space before the 'E' wherever scientific notation occurs in the data set.
Since I have to generate a large number of such sets, it would be impractical to have to do a search-and-replace for every .txt file.
Is there a way for matlab to read the data as it appears, as in the above example (with the space preceding 'E'), and put it into a column vector?
For anyone who knows PARI-GP, an alternate fix would be to have the output devoid of spaces in the first place--but so far I haven't found a way to erase the space before 'E' in scientific notation, and I can't predict if a number in scientific notation will appear or not in the data set.
Thank you!
Thank you all for your help, I have found a solution. There is a way to eliminate the space from PARI-GP, so that the output .txt file has no spaces to begin with. I had the output set to "prettymatrix". One needs to enter the following:
? \o{0}
to change the output to "Raw," which eliminates the space before the "E" in scientific notation.
Thanks again for your help.
A simple way, may not be the best, is to read line by line, remove the space and convert back to floating point number.
For example,
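x = [];                                  % values are collected as a column vector
tline = fgetl(FileID);                   % FileID comes from an earlier fopen call
while ischar(tline)                      % fgetl returns -1 at end of file
    x = [x; str2double(tline(~isspace(tline)))];   % drop spaces, then convert
    tline = fgetl(FileID);
end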
One liner:
data = str2double(strsplit(strrep(fileread('filename.txt'),' ',''), '\n'));
strrep removes all the spaces, strsplit splits the text into one string per line, and str2double converts the strings to numbers. Note that if the file ends with a newline, the last element of data will be NaN.
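Another variation, sketched here under the assumption that the only unwanted spaces are the ones directly before the 'E' of an exponent. The sample text is made up from the values in the question; in practice it would come from fileread. Only the spaces before 'E' are removed, and sscanf returns the values as a column vector.

```matlab
% Hypothetical file contents; in practice: txt = fileread('filename.txt');
txt = sprintf('0.221351321\n0.151351321\n0.235165165\n8.2254546 E-7\n');

% Remove only the spaces that sit directly before the exponent marker.
txt = regexprep(txt, ' +E', 'E');

% sscanf skips newlines and returns a column vector of doubles.
U = sscanf(txt, '%f');
```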

Octave / Matlab - Reading fixed width file

I have a fixed width file format (original was input for a Fortran routine). Several lines of the file look like the below:
1078.0711005.481 932.978 861.159 788.103 716.076
How this actually should read:
1078.071 1005.481 932.978 861.159 788.103 716.076
I have tried various methods (textscan, fgetl, fscanf, etc.); however, the problem is that, as seen above, the fixed width of the original file sometimes leaves no whitespace between numbers. I can't seem to find a way to read them directly, and I can't change the original format.
The best I have come up with so far is to use fgetl, which reads the whole line in; I then reshape the result into an 8-by-6 array:
A = fgetl(fid);   % fid: file identifier returned by fopen
A = reshape(A, 8, 6)
which generates the following result
11
009877
703681
852186
......
049110
787507
118936
So now I have the above and thought I might be able to concatenate the rows of that array together to form each number, although that is proving difficult as well; I have tried strcat, vertcat, etc.
All of that seems a long way round so was hoping for some better suggestions.
Thanks.
If you can rely on three decimal numbers you can use a simple regular expression to generate the missing blanks:
s = '1078.0711005.481 932.978 861.159 788.103 716.076';
s = regexprep(s, '(\.\d\d\d)', '$1 ');
c = textscan(s, '%f');
Now c{1} contains your numbers. This will also work if s is in fact the whole file instead of one line.
You haven't mentioned which class of output you need, but I guess you need to read doubles from the file to do some calculations. I assume you are able to read your file, since you already have results from the reshape() function. However, using reshape() will not be efficient for your case, since your values do not have a fixed number of digits (e.g. 1078.071 and 932.978).
If I haven't misunderstood your problem:
Your data is squashed together in some places (e.g. 1078.0711005.481 instead of 1078.071 1005.481).
The fractional part of each value has 3 digits.
First of all we need to get rid of spaces from the string array:
A = A(~ismember(A,' '));
Then using the information that fractional parts are 3 digits:
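nNumbers = numel(strfind(A, '.'));   % one decimal point per number
B = zeros(1, nNumbers);              % preallocate the result
for k = 1:nNumbers
    ind = find(A == '.', 1);         % position of the first decimal point
    B(k) = str2double(A(1:ind+3));   % integer part, point, and 3 decimals
    A = A(ind+4:end);                % drop the number just read
end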
B will be an array of doubles as a result.
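If every field really is exactly 8 characters wide, as the reshape(A,8,6) attempt in the question suggests, the reshape idea can also be completed directly. This is only a sketch under that assumption; width, fields and vals are names introduced here for illustration.

```matlab
% One line of the file, taken from the question (48 characters).
s = '1078.0711005.481 932.978 861.159 788.103 716.076';

width  = 8;                          % assumed fixed field width
fields = reshape(s, width, []).';    % each row now holds one 8-character field
vals   = str2double(cellstr(fields)); % leading blanks are ignored on conversion
```

Here vals is a 6-by-1 column of doubles. Unlike the regular-expression approach, this does not depend on the number of decimals, but it breaks if any line has a different length or field width.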

How do I read the HITRAN2012 database into MATLAB?

The HITRAN database is a listing of molecular rotational-vibrational transitions. It is given in a text file where each line is 160 characters, with fixed width fields defining molecule, isotope, etc. The format is well documented, and there is even a program on the MathWorks File Exchange that will read in the database and simulate a portion of the spectrum. However, I need to read in a specific portion of the spectrum and then use it to do some fitting to a measured spectrum, so I need something much more custom.
As given in the comment section of that function, as well as elsewhere, the following line should read each line in properly:
database = which('HITRAN2012.par');
fid = fopen(database);
hitran = textscan(fid,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6c%12c%1c%7f%7f','delimiter','','whitespace','');
fclose(fid);
The first two fields denote the molecule code, which runs from 1-47, and the isotope code which runs from 1-9.
Unfortunately, molecules 1-9 do not have a leading zero, and no matter what I do, it seems to silently confuse MATLAB. If I load in the entire database and then type
unique(hitran{1})
I do not get the numbers 1-47, but I get 10-92 with a few numbers missing. As far as I can figure, when MATLAB encounters a leading space, it shifts the line over and then pads the end, so that ' 12' becomes '12', but I'm not exactly sure. I have also tried
hitran = textscan(fid,'%160c','delimiter','\n','whitespace','');
and then tried to parse the resulting strings, but that also sometimes gets confused by the first space.
For instance, the first water line looks like
exampleHitranLine = ' 14 0.007002 1.165E-32 2.071E-14.05870.305 818.00670.590.000000 0 0 0 0 0 0 7 5 2 7 5 3 005540 02227 5 2 0 90.0 90.0';
The first bit of code comes across this line and returns '14' instead of ' 1' and '4'. If I just read in a subset that only contains molecule 1 (as in this example), then the second method of reading works fine. If I try to read in the entire database, however, the lines with molecules 1-9 are shifted to the left, which messes up all the other fields.
I should note that I've tried reading the numerical fields both as floats and as integers, and neither gives satisfactory results. The entire database in text form is nearly 700 MB, and so I need something that works as efficiently as possible.
What am I doing wrong?
I have a new file on the FileExchange that will read in HITRAN 2004+ format data. Please try it out and let me know if there are any issues with it.
I don't have an answer as to why this is happening, but I do have a solution. If anyone has an answer as to why, I'd be happy to accept it.
It is the leading space that is screwing things up. MATLAB is being a little too clever, and when textscan encounters a leading space, it decides that it's extra and discards it and moves on to the next two characters. To get it to properly read in the file, I had to go line by line and test whether the first character is a space and then replace it with a leading zero, like this:
database = which('HITRAN2012_First100Lines.par');
fileParams = dir(database);
K = fileParams.bytes/162;   % number of lines: 160 characters plus, presumably, a 2-character line ending
hitran = cell(K,19);
fid = fopen(database);
for k = 1:K
    hitranTemp = fgetl(fid);
    if hitranTemp(1) == ' '      % leading space, i.e. a one-digit molecule number
        hitranTemp(1) = '0';     % pad with a zero so textscan keeps the field width
    end
    hitran(k,:) = textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6c%12c%1c%7f%7f','delimiter','','whitespace','');
end
fclose(fid);
I'm working in MATLAB 2013a. Should I consider this to be a bug and report it? Is there some reason that the leading space should be gobbled up like this?
Update:
My workaround above was slow, but worked. Then I had to process the HITEMP database, which is several times larger, so I finally did submit a support ticket to MathWorks. The workaround suggested by MathWorks technical support is to read everything in as text and then convert. This saves a lot of disk reads and works.
fileParams = dir(database);
fid = fopen(database);
hitran = textscan(fid,'%2c%1c%12c%10c%10c%5c%5c%10c%4c%8c%15c%15c%15c%15c%6c%12c%1c%7c%7c','delimiter','','whitespace','');
fclose(fid);
moleculeNumber = uint8(str2num(hitran{1}));
isotopologueNumber = uint8(str2num(hitran{2}));
vacuumWavenumber = str2num(hitran{3});
...
etc.
Depending on the application, for larger databases one would probably want to do this in chunks rather than all at once.
He also said he would forward the behavior to the development team for consideration in a future update.
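To illustrate the "read as text, then convert" idea on a small scale, here is a sketch that slices fixed-width columns out of a character matrix. The two sample rows are invented and only mimic the first three fields (widths 2, 1 and 12); they are not real HITRAN lines, and rows, mol, iso and wn are hypothetical names.

```matlab
% Two made-up fixed-width rows: molecule (2), isotope (1), wavenumber (12).
rows = [sprintf('%2d%1d%12.6f',  1, 4,   0.007002); ...
        sprintf('%2d%1d%12.6f', 12, 1, 500.123456)];

% Slice each field by column position, then convert the text to numbers.
% The leading blank in ' 1' stays inside its field, so nothing shifts.
mol = str2double(cellstr(rows(:, 1:2)));
iso = str2double(cellstr(rows(:, 3)));
wn  = str2double(cellstr(rows(:, 4:15)));
```

Because the fields are cut by position before any numeric parsing happens, a leading space cannot be skipped and cause the neighbouring digit to be absorbed into the wrong field.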

Is there a compact view for matrices in matlab?

I want to have a look at a large matrix in MATLAB such that all columns are printed in one single line rather than spread out over several lines.
Is such thing possible? That would be great to know.
Try disp(matrixName(:)). The matrixName(:) command turns your matrix into a long vector in column-major order, so it basically just shows you the first column, followed by the second, the third, etc.
If that does not do the trick, you could look into the doprint command.
EDIT: You could also save the matrix to a text file and view the file. You do this like so:
fileID = fopen('C:/path/to/file/myMatrix.txt', 'w');   % 'w' opens the file for writing
fprintf(fileID, formatString, myMat.');                 % transpose: fprintf consumes the data column by column
fclose(fileID);
The formatString variable in the above tells fprintf how the data should be displayed. If you have a really big matrix with tons of columns, where all of the values are floats, the easiest way to create this string is to use something like:
formatString = strcat(repmat('%f ', 1, size(myMat, 2)), '\n');
This will create a long string specifying that each element in your matrix is a float, and where it goes, and then cap it off with a line feed so that the next row of your matrix starts on the next line.
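Putting the pieces together, a small self-contained sketch might look like the following. The matrix and the temporary file name are placeholders chosen for illustration, and the format string is built with plain concatenation instead of strcat, which is a substitution on my part.

```matlab
myMat = [1 2 3; 4 5 6];                        % small example matrix

% One '%f ' per column, followed by a line feed.
formatString = [repmat('%f ', 1, size(myMat, 2)) '\n'];

outFile = [tempname() '.txt'];                 % placeholder output path
fileID  = fopen(outFile, 'w');                 % open for writing
fprintf(fileID, formatString, myMat.');        % transpose so each row is written on one line
fclose(fileID);

txt = fileread(outFile);                       % read the text back to inspect it
delete(outFile);                               % clean up the temporary file
```

Each row of myMat ends up on its own line of the file, however many columns it has.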
Suppress your original matrix with a semicolon and then use the "disp" command to show your matrix however you want.
for i = 1 : length(matrix(1,:))
disp(matrix(:,i))
end
Some "obvious" answers:
You can choose a smaller font - then more values will fit in a line
You can play with the format command to have fewer digits displayed
(my favourite) Use the variable viewer - via "open selection" or Ctrl-D when the name of a variable is highlighted. This will show your matrix in an excel-like table.