I have a csv file which looks like the following when I open it in notebook...
val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12,val13,result
63,1,1,145,233,1,2,150,0,2.3,3,0,6,F
67,1,4,160,286,0,2,108,1,1.5,2,3,3,T
67,1,4,120,229,0,2,129,1,2.6,2,2,7,T
37,1,3,130,250,0,0,187,0,3.5,3,0,3,F
I would like to read this data into MATLAB and have found this question that really looks promising. My code for this implementation is as follows...
fid = fopen(path);
out = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f','HeaderLines',1,'delimiter',',','CollectOutput',1);
fclose(fid);
However, this only seems to read in the first line into matlab. How can I get it to read in the whole file?
out{1}
ans =
63.0000 1.0000 1.0000 145.0000 233.0000 1.0000 2.0000 150.0000 0 2.3000 3.0000 0 6.0000
After banging my head on my desk for a while, it hit me that the problem might be that I hadn't specified the result string in the format specifier. This is data that I don't need in my code, so I left it out. Adding an additional %s on the end allowed all the data to be read.
Note for future: Specify all the fields in the format specifier, and ignore the ones you don't need when coding.
The actual code should look like the following...
out = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f%s','HeaderLines',1,'delimiter',',','CollectOutput',1);
What often happens with textscan is that if you use the wrong specifier, or if there is something unexpected in the file, textscan reads as much of the file as it can and then stops when it gets to something it can't parse. Unfortunately, it stops silently, without errors. Failure to read the full file, or output which appears to have stopped midway through a line, are common symptoms of this issue. If you don't need the strings, you can tell textscan to skip over them by putting * after the % in the specifier (%*s):
out = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f%*s','HeaderLines',1,'delimiter',',','CollectOutput',1);
It can be easier when constructing longer format specifiers to use repmat:
out = textscan(fid, [repmat('%f',[1,13]),'%*s'],'HeaderLines',1,'delimiter',',','CollectOutput',1);
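Because textscan stops silently, it can help to check afterwards whether the file position actually reached the end of the file. The following is only a minimal sketch: the sample file, its three numeric columns, and the variable names (fname, bad, good, stoppedEarly) are made up for illustration and are not from the original question.

```matlab
% Write a small sample file (hypothetical data, same layout idea as the question).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'val1,val2,val3,result\n');
fprintf(fid, '63,1,1,F\n67,1,4,T\n37,1,3,F\n');
fclose(fid);

% Format spec that forgets the trailing text column.
fid = fopen(fname, 'r');
bad = textscan(fid, '%f%f%f', 'HeaderLines', 1, 'Delimiter', ',', 'CollectOutput', 1);
stoppedEarly = ~feof(fid);   % true if textscan gave up before the end of the file
fclose(fid);

% Format spec that accounts for every column and skips the text with %*s.
fid = fopen(fname, 'r');
good = textscan(fid, [repmat('%f', [1, 3]), '%*s'], ...
    'HeaderLines', 1, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);

if stoppedEarly
    warning('textscan stopped before the end of the file; check the format specifier.');
end
delete(fname);
```

Checking feof right after the call is a cheap guard; it does not say which line caused the stop, only that some of the file was left unread.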
Related
I am reading a csv file into memory in my MATLAB program, and the last line of the file is not being read.
The end of the csv file looks like this:
30000,0.99534,1.4E-07,0.001945
40000,0.997967,4.74E-08,0.000656
50000,0.998953,2.02E-08,0.000279
75000,0.999713,4.19E-09,5.8E-05
100000,1,1.36E-09,1.9E-05
When I use readmatrix from the R2019a standard library, it works and reads every line. When I use csvread with only the filename as an argument, for some reason the last line of the file is not read.
When I use csvread, this is the result.
>> dat = csvread('../data/black_body.csv');
>> dat(end, :)
ans =
1.0e+04 *
7.5000 0.0001 0.0000 0.0000
And in the file black_body.csv, the final line is
100000,1,1.36E-09,1.9E-05
Why is matlab not reading the last line of the file?
edit: Here is the link to the csv file.
link
I have checked the CSV file and it has a problem on the fourth line.
There is an extra "." in the third field, which csvread cannot parse as a number and which shifts all the data after this line.
Original CSV:
800,1.6E-05,0..991126E-7,0.001372
Revised CSV:
800,1.6E-05,0.991126E-7,0.001372
After the CSV file correction, I was able to get the correct result using csvread.
dat(end, :)
ans =
1.0e+05 *
1.0000 0.0000 0.0000 0.0000
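For a large file, a malformed field like this can be located programmatically rather than by eye. The sketch below is one possible approach, not part of the original answer: it reads the file line by line and flags any line containing a field that str2double cannot convert. The sample data and the names fname and badLines are hypothetical.

```matlab
% Build a small sample file with one deliberately malformed number (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '600,1.2E-05,0.5E-7,0.001\n');
fprintf(fid, '800,1.6E-05,0..991126E-7,0.001372\n');
fprintf(fid, '900,1.8E-05,1.2E-7,0.0015\n');
fclose(fid);

% Read line by line and record every line with a field that is not a valid number.
% Note: str2double also returns NaN for a literal 'NaN' or an empty field.
badLines = [];
lineNo = 0;
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    lineNo = lineNo + 1;
    fields = strsplit(tline, ',');
    if any(isnan(str2double(fields)))
        badLines(end+1) = lineNo; %#ok<AGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

disp(badLines)
```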
When I import my data (numerical matrix of NYSE stock data), the data isn't loaded properly:
The final part of my CSV data, as displayed by disp(), should be:
9.76, 10, 9.99, 9.94, 9.97,9.944,9.95,10,9.956,10.01
What I get when I call disp(importDataResult) is:
0.0100 0.0099 0.0099 0.0100 etc..
Have you got any idea why when I import the data it is transformed completely? The below link contains my zipped CSV file so you can see the problem (I completely understand if you can't be bothered checking this out, but I'd be interested to know if the same problem applies to others' MATLAB / computers).
https://www.sendspace.com/file/slif0y
The code I'm using is:
function [ c ] = CreateCov_Test()
c = csvread('nyse_data_matrix_no_tags.csv');
disp(c);
end
Here is a screenshot of the issue:
https://s32.postimg.org/os74qfrlx/matlab_screen.png
Thank you very much!
MATLAB is not transforming any data. How MATLAB displays variables is controlled with format, the default being format short.
An excerpt from the documentation:
format may be used to switch between different output display formats of all float variables as follows:
format SHORT Scaled fixed point format with 5 digits.
So what does "Scaled fixed point format with 5 digits" mean? Well, let's see:
>> a = [0.1 10000 100]
>> disp(a)
1.0e+04 *
0.0000 1.0000 0.0100
Note the 1.0e+04 *: it's a multiplier for all data in the matrix. When displaying a large matrix, this multiplier is often scrolled out of view (as in your case), which admittedly can be rather confusing.
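To confirm that only the display is scaled and not the stored values, the format can be switched or the values printed explicitly. This is a small sketch using the same example vector; the variable s is introduced here just for illustration.

```matlab
a = [0.1 10000 100];

format short     % default: one common scale factor for the whole matrix
disp(a)

format shortG    % chooses fixed or floating notation per element, no common factor
disp(a)

format short     % restore the default display format

% The stored values are untouched; printing them explicitly shows this.
s = sprintf('%g ', a);
disp(s)
```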
I have several text files: (participant1, participant2, participant3,....participant5)
I have made these files using a loop. My loop for that is something like this:
%The subinfo_vect is a prompt that allows users to input what number they are, so every time there is a new file.
%This appends the results x y z h within file
for i = 1
empty_mat = zeros(0);
filename=['participant', subinfo_vect, '.txt'];
dlmwrite(filename, [x,y,z,h], '-append');
end
This code creates files corresponding to our prompt (subinfo_vect). Now I was wondering how to loop through these files (6 in total) so that we can catch the result and find the mean of those. To clarify the results, each file (txt) looks like this (below) and I need to find the mean of column 2 and 3:
n =
1.0000 1.0000 1.2986 1.3973
1.0000 0 0.4159 0.5138
1.0000 1.0000 0.3955 0.4924
1.0000 0 0.3574 0.4539
1.0000 1.0000 0.3489 0.4458
1.0000 1.0000 0.4403 0.5372
How do I loop through 6 files that look like the above so that I can get the mean of all 6 in sequence? Any ideas?
What I have so far is a manual input of loading all the files. I am manually reading those files by adding:
dlmread('participant1.txt') <-- This however is manual, I want the computer to do it automatically without me giving the command, so something where I can just input a looping folder and it will read all the files one by one? Using a for loop?
Can you please help me with this
Assuming you have saved your .txt files in a folder called myFolder, then:
fileList=dir([myFolder '/*.txt']);
fileList={fileList.name}; %just extracting names for convenience.
for i=1:length(fileList)
contents=dlmread([myFolder '/' fileList{i}]); %do something
end
Set a variable looping_dir to a string containing the name of the directory where your files are saved, and then loop over the files in it. Using fullfile means it does not matter whether looping_dir ends with a file separator. You can try something like this:
files = dir(fullfile(looping_dir, '*.txt')); % get files ending in .txt in the given directory
for f = 1:numel(files)
data = dlmread(fullfile(looping_dir, files(f).name));
% do calculations...
end
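Putting the loop together with the requested calculation, the mean of columns 2 and 3 of each file could be collected as sketched below. The temporary folder and the sample numbers are invented so that the snippet runs on its own; colMeans and overallMean are illustrative names, and in practice looping_dir would point at the real folder.

```matlab
% Create a temporary folder with two small sample files (hypothetical data).
looping_dir = tempname;
mkdir(looping_dir);
dlmwrite(fullfile(looping_dir, 'participant1.txt'), [1 1 0.5 0.6; 1 0 1.5 0.8]);
dlmwrite(fullfile(looping_dir, 'participant2.txt'), [1 1 2 1; 1 1 4 1]);

% Loop over every matching file and store the mean of columns 2 and 3.
files = dir(fullfile(looping_dir, 'participant*.txt'));
colMeans = zeros(numel(files), 2);
for f = 1:numel(files)
    data = dlmread(fullfile(looping_dir, files(f).name));
    colMeans(f, :) = mean(data(:, 2:3), 1);   % one row of means per file
end

% Mean across files (equals the pooled mean only if all files have the same number of rows).
overallMean = mean(colMeans, 1);

rmdir(looping_dir, 's');   % clean up the temporary folder
```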
I am currently writing a MATLAB program and the initial stage involves calling up .csv files from within folders. For reasons unknown, MATLAB will not read the files (checked using csvread, importdata and fopen). Note: it is definitely a csv file!
However, I opened one of the files, pressed 'save as', gave it the same name and fileformat. The only noticeable thing that happened is that the filesize reduced significantly and then matlab could magically open it but I have no idea why.
Can anyone shed any light on why this is happening? I would just open and resave the files except the data is related to a large number of samples which would make the manual process very long. If it is of relevance, the data is outputted from an Instron.
Many thanks :)
EDIT
So this is a sample of one of the files called '2mm.csv', opened using Notepad (first 10 lines of ~111,000):
Time,Extension,Load
(s),(mm),(N)
"0.00000","51.97554","0.09549"
"1.00000","52.13438","0.24999"
"2.00000","52.30102","0.13996"
"3.00000","52.46782","0.19513"
"4.00000","52.63449","0.15348"
"5.00000","52.80097","0.26828"
"6.00000","52.96780","0.32510"
"7.00000","53.13446","0.67119"
"8.00000","53.30105","4.56026"
"9.00000","53.46772","17.80811"
This is code I use to open it and the result:
>> importdata('2mm.csv',',',2)
ans =
'Time,Extension,Load'
'(s),(mm),(N)'
Note that it has only captured the first 2 lines and has not delimited the comma.
So I opened the file in MS Excel, saved it as 2mmv2.csv and ran the same code. I was given a structure as expected:
>> importdata('2mmv2.csv',',',2);
>> ans.data(1:10,:)
ans =
0 51.9755 0.0955
1.0000 52.1344 0.2500
2.0000 52.3010 0.1400
3.0000 52.4678 0.1951
4.0000 52.6345 0.1535
5.0000 52.8010 0.2683
6.0000 52.9678 0.3251
7.0000 53.1345 0.6712
8.0000 53.3010 4.5603
9.0000 53.4677 17.8081
While I can now call up the file, I am none the wiser as to why this is the case.
Try this:
file=fopen('test.csv');
c=textscan(file,'%f%f%f','HeaderLines',2,'CollectOutput',true, ...
'delimiter', {',','"'},'MultipleDelimsAsOne',true);
fclose(file);
dat=c{1}
I am trying to obtain a 5x2 matrix from a text file.
For Example :
0_0 1_0
0_200 1_200
0_400 1_400
0_600 1_600
0_800 1_800
This is the code I am currently using:
[filename,pathname]= uigetfile({'*.txt'});
set(handles.temp1,'String',fullfile(pathname,filename));
chosenfile=get(handles.temp1,'String');
fid=fopen(chosenfile);
allcoordinates=textscan(fid,'%s,%s','whitespace','\n');
fclose(fid);
This code would produce a 5X1 matrix as shown below :
0_01_0
0_2001_200
0_4001_400
0_6001_600
0_8001_800
Sadly, the approach that works best when interpreting files is to be very conservative about relying on the capabilities of 'canned routines' like textscan, dlmread and similar.
This is not because these routines are implemented badly, it's because there is very little standardization in number formatting in text files, and basically, everybody just invents a new standard on the spot.
You just can't design a routine that always works correctly for all text files. I think The Mathworks did a very decent job with their dlmread and similar, however, you have just presented yet another standard in number formatting that is overly difficult to interpret in one go with textscan or dlmread or others. Therefore, be conservative: just read it without too much hassle, and do the conversion yourself.
For example:
%// Read data
fid = fopen('yourFile.txt', 'r');
C = textscan(fid, '%s %s');
fclose(fid);
%// Replace all underscores with '.', and convert to 'double'
C = str2double(strrep([C{:}],'_','.'))
Results:
C =
0 1.0000
0.2000 1.2000
0.4000 1.4000
0.6000 1.6000
0.8000 1.8000