Excess of data when reading file in MatLab - matlab

I have run into a problem: my txt file has 1000 double values, but when I load the file I get 2000 values, and the extra values are zeros.
Help me please, I just started learning Matlab.
clear all
file = 'C:\data.txt';
x = load(file);
x = sort(x(:));
x
x = 1.0e+010 *
0.0000
0.0000
0.0000
0.0000
...

Use the MATLAB function textscan. It parses the file according to formatSpec and returns the data in C as a cell array. You can also limit how much it reads with N, the number of times formatSpec is applied.
C = textscan(fileID,formatSpec,N)
The Matlab website has many examples on how to call this function.
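As a minimal sketch of the textscan call above: the file name and its four values are made up for illustration, and the file is assumed to hold one number per line. Comparing numel(x) with the expected count shows whether extra values really come from the file.

```matlab
% Write a small hypothetical data file so the sketch is self-contained.
fileName = [tempname '.txt'];
fileID = fopen(fileName, 'w');
fprintf(fileID, '%g\n', [3.5 1.25 2e10 0.75]);
fclose(fileID);

% Read every number in the file: '%f' is applied until the end of the file.
fileID = fopen(fileName, 'r');
C = textscan(fileID, '%f');
fclose(fileID);
delete(fileName);

x = sort(C{1});          % C{1} is a column vector of doubles
numValues = numel(x);    % should match the number of values in the file
```

One large value such as 2e10 is enough to make disp show a common scale factor like 1.0e+10 *, under which small entries appear as 0.0000 even though they are not zero.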

Related

MATLAB does not read csv file completely

I am reading a csv file into memory in my MATLAB program, and the last line of the file is not being read.
The end of the csv file looks like this:
30000,0.99534,1.4E-07,0.001945
40000,0.997967,4.74E-08,0.000656
50000,0.998953,2.02E-08,0.000279
75000,0.999713,4.19E-09,5.8E-05
100000,1,1.36E-09,1.9E-05
When I use readmatrix from the R2019a standard library, it works and reads every line. When I use csvread with only the filename as an argument, for some reason the last line of the file is not read.
When I use csvread, this is the result.
>> dat = csvread('../data/black_body.csv');
>> dat(end, :)
ans =
1.0e+04 *
7.5000 0.0001 0.0000 0.0000
And in the file black_body.csv, the final line is
100000,1,1.36E-09,1.9E-05
Why is matlab not reading the last line of the file?
edit: Here is the link to the csv file.
link
I have checked the CSV file, and it has a problem on the fourth line.
There is an extra "." in one of the numbers, which shifts all of the data after this line.
Original CSV:
800,1.6E-05,0..991126E-7,0.001372
Revised CSV:
800,1.6E-05,0.991126E-7,0.001372
After the CSV file correction, I was able to get the correct result using csvread.
dat(end, :)
ans =
1.0e+05 *
1.0000 0.0000 0.0000 0.0000
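A rough way to locate such a line is to split each line on commas and flag any field that str2double cannot convert. The sketch below runs on two lines copied from this thread; the variable names are only illustrative, and for a real file the lines would first have to be read in (for example with fgetl).

```matlab
% Two lines from the CSV in question: the malformed one and a valid one.
csvLines = {'800,1.6E-05,0..991126E-7,0.001372', ...
            '30000,0.99534,1.4E-07,0.001945'};

badLines = [];
for k = 1:numel(csvLines)
    fields = strsplit(csvLines{k}, ',');
    % str2double returns NaN for any field that is not a valid number.
    if any(isnan(str2double(fields)))
        badLines(end+1) = k; %#ok<AGROW>
    end
end
disp(badLines)
```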

Trouble with the data read from UR5 robot's TCP/IP port

I wrote the MATLAB code below to read a stream of data from TCP/IP port 30003 of a UR5. The result I get is different from what is shown on my robot (I am using the URSim 3.5.3 virtual environment), although it is supposed to be the same. It looks plausible in that there are 6 values, as expected.
I believe this is possible because someone used python to achieve it http://www.zacobria.com/universal-robots-knowledge-base-tech-support-forum-hints-tips/knowledge-base/client-interfaces-cartesian-matlab-data/, but I am instructed to use MATLAB.
MATLAB CODE
clear all
HOST = '192.168.56.101';
PORT_30003 = 30003;
while (true)
s = tcpclient(HOST, PORT_30003);
disp('connected and starting program');
disp('data received:');
data = read(s, 80, 'double');
disp(data(56:61));
pause(1);
end
RESULT OBTAINED:
>> ur5
connected and starting program
data received:
1.0e-15 *
-0.4406 0.0000 -0.0000 0.0000 0.0000 -0.0000
connected and starting program
data received:
1.0e-15 *
-0.4406 0.0000 -0.0000 0.0000 0.0000 -0.0000
The output stays the same since the robot isn't moving, but it is very different from the values on my robot. The 1st line is for the Cartesian coordinates (X,Y,Z,RX,RY,RZ), but the values on my robot at the time of running the program were: -120.11mm, -431.76mm, 146.07mm, 0.0012, -3.1664, -0.0395, and the 2nd one is for the gripper state (X,Y,Z,RX,RY,RZ). Does anyone have an idea whether it is a conversion problem, and how can I rectify it?
The Universal Robots Matlab interface (real time client / port 30003) sends packets with the following structure:
int LENGTH
double PAYLOAD_0
double PAYLOAD_1
double ...
double PAYLOAD_n
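The length can vary by software version, so it is suggested to read the length first and then read the remaining double values. If you do not read the complete package, the remainder of the last package may still be in your input buffer, and the next package you read will be nonsense.
Therefore I would recommend reading the length, buffering the rest of the package, and then extracting the values of interest.
% LENGTH is a 4-byte int, so it is read as int32 rather than int64.
% The robot sends big-endian data, hence swapbytes on a little-endian host.
% Assumption: LENGTH is the size of the whole packet in bytes, including its
% own 4 bytes. Check this against the documentation for your software version.
msgLen = swapbytes(read(s, 1, 'int32'));
payload = swapbytes(read(s, (double(msgLen) - 4) / 8, 'double'));
disp(payload(x:y))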
You can find the official documentation of the UR TCP interfaces here.
Note that the port 30003 interface sometimes gets extended, with new variables at the end. The complete length should be something like 139 doubles.
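The packet layout described above (an int length followed by doubles) can be tried out without a robot. The following sketch builds a fake packet in memory and parses it back. It assumes that the data is big-endian and that the length field counts the whole packet in bytes, including its own 4 bytes; both should be checked against the official documentation for your software version. The three payload values are made up.

```matlab
% Build a fake big-endian packet: int32 length followed by three doubles.
values = [0.5 -1.25 3];
body   = typecast(swapbytes(values), 'uint8');
header = typecast(swapbytes(int32(4 + numel(body))), 'uint8');
packet = [header body];

% Parse it the way a received byte stream would be parsed.
msgLen   = swapbytes(typecast(packet(1:4), 'int32'));   % total bytes (assumed)
nDoubles = (double(msgLen) - 4) / 8;                    % 8 bytes per double
payload  = swapbytes(typecast(packet(5:4 + 8*nDoubles), 'double'));
disp(payload)
```

Reading the 4-byte header as part of the first double, or skipping the byte swap, produces tiny or meaningless numbers of the kind shown in the question.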

MATLAB CSV Import Warping Data

When I import my data (numerical matrix of NYSE stock data), the data isn't loaded properly:
The final part of my CSV data, as displayed by disp(), should be:
9.76, 10, 9.99, 9.94, 9.97,9.944,9.95,10,9.956,10.01
What I get when I call disp(importDataResult) is:
0.0100 0.0099 0.0099 0.0100 etc..
Have you got any idea why when I import the data it is transformed completely? The below link contains my zipped CSV file so you can see the problem (I completely understand if you can't be bothered checking this out, but I'd be interested to know if the same problem applies to others' MATLAB / computers).
https://www.sendspace.com/file/slif0y
The code I'm using is:
function [ c ] = CreateCov_Test()
c = csvread('nyse_data_matrix_no_tags.csv');
disp(c);
end
Here is a screenshot of the issue:
https://s32.postimg.org/os74qfrlx/matlab_screen.png
Thank you very much!
MATLAB is not transforming any data. How MATLAB displays variables is controlled with format, the default being format short.
An excerpt from the documentation:
format may be used to switch between different output display formats of all float variables as follows:
format SHORT Scaled fixed point format with 5 digits.
So what does "Scaled fixed point format with 5 digits" mean? Let's see:
>> a = [0.1 10000 100]
>> disp(a)
1.0e+04 *
0.0000 1.0000 0.1000
Note the 1.0e+04 *: it's a multiplier for all of the data in the matrix. When displaying a large matrix, this multiplier is often hidden (as in your case), which admittedly can be rather confusing.
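To see that only the display changes, the same matrix can be shown under a different format setting. This sketch uses format shortG, which picks a compact representation per element instead of one common multiplier; the stored values are identical in both cases.

```matlab
a = [0.1 10000 100];

format short     % default: one common scale factor for the whole matrix
disp(a)

format shortG    % per-element representation, no common scale factor
disp(a)

format short     % restore the default display format

% The stored values never changed, only how they are printed.
unchanged = (a(1) == 0.1) && (a(2) == 10000) && (a(3) == 100);
```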

I want to loop through files that have been made corresponding to a prompt

I have several text files: (participant1, participant2, participant3,....participant5)
I have made these files using a loop. My loop for that is something like this:
%The subinfo_vect is a prompt that allows users to input what number they are, so every time there is a new file.
%This appends the results x y z h within file
for i = 1
empty_mat = zeros(0);
filename=['participant', subinfo_vect, '.txt'];
dlmwrite(filename, [x,y,z,h], '-append');
end
This code creates files corresponding to our prompt (subinfo_vect). Now I was wondering how to loop through these files (6 in total) so that we can catch the result and find the mean of those. To clarify the results, each file (txt) looks like this (below) and I need to find the mean of column 2 and 3:
n =
1.0000 1.0000 1.2986 1.3973
1.0000 0 0.4159 0.5138
1.0000 1.0000 0.3955 0.4924
1.0000 0 0.3574 0.4539
1.0000 1.0000 0.3489 0.4458
1.0000 1.0000 0.4403 0.5372
How do I loop through 6 files that look like the above so that I can get the mean of all 6 in sequence? Any ideas?
What I have so far is a manual input of loading all the files. I am manually reading those files by adding:
dlmread('participant1.txt')
This, however, is manual. I want the computer to do it automatically without me giving the command each time, so something where I can just give it a folder to loop over and it will read all the files one by one. Using a for loop?
Can you please help me with this
Assuming you have saved your .txt files in a folder called myFolder, then:
fileList=dir([myFolder '/*.txt']);
fileList={fileList.name}; %just extracting names for convenience.
for i=1:length(fileList)
contents=dlmread([myFolder '/' fileList{i}]); %do something
end
Set a variable looping_dir to a string containing the name of the directory where your files are saved, and then loop over the files in it. Building the paths with fullfile avoids problems when looping_dir has no trailing file separator. You can try something like this:
files = dir(fullfile(looping_dir, '*.txt')); % get files ending in .txt in the given directory
for f = 1:numel(files)
    data = dlmread(fullfile(looping_dir, files(f).name));
    % do calculations...
end
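Putting the loop together with the requested calculation, here is a self-contained sketch that computes the mean of columns 2 and 3 for each file and then across files. The temporary folder, the three demo files, and their contents are made up so the example can run anywhere; with real data only the dir/dlmread loop is needed.

```matlab
% Create a temporary folder with three hypothetical participant files.
myFolder = tempname;
mkdir(myFolder);
for p = 1:3
    demoData = p * [1 2 3 4; 1 4 5 6];
    dlmwrite(fullfile(myFolder, sprintf('participant%d.txt', p)), demoData);
end

% Loop over the files and collect the mean of columns 2 and 3 for each one.
fileList = dir(fullfile(myFolder, 'participant*.txt'));
colMeans = zeros(numel(fileList), 2);
for k = 1:numel(fileList)
    contents = dlmread(fullfile(myFolder, fileList(k).name));
    colMeans(k, :) = mean(contents(:, 2:3), 1);
end

% Mean over all files (one row per file, so average down the rows).
overallMean = mean(colMeans, 1);

rmdir(myFolder, 's');   % remove the temporary demo folder
```

Averaging the per-file means equals the mean over all rows only when every file has the same number of rows; otherwise, concatenate the rows first and take one mean.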

MATLAB reading large text log files with errors

I'm trying to analyze a large text log file (11 GB). All of the data are numerical values, and a snippet of the data is listed below.
-0.0623 0.0524 -0.0658 -0.0015 0.0136 -0.0063 0.0259 -0.003
-0.0028 0.0403 0.0009 -0.0016 -0.0013 -0.0308 0.0511 0.0187
0.0894 0.0368 0*0243 0.0279 0.0314 -0.0212 0.0582 -0.0403 //<====row 3, weird ASCII char * is present
-0.0548 0.0132 0.0299 0.0215 0.0236 0.0215 0.003 -0.0641
-0.0615 0.0421 0.0009 0.0457 0.0018 -0.0259 0.041 0.031
-0.0793 0.01 //<====row 6, the data is misaligned here
0.0278 0.0053 -0.0261 0.0016 0.0233 0.0719
0.0143 0.0163 -0.0101 -0.0114 -0.0338 -0.0415 0.0143 0.129
-0.0748 -0.0432 0.0044 0.0064 -0.0508 0.0042 0.0237 0.0295
0.040 -0.0232 -0.0299 -0.0066 -0.0539 -0.0485 -0.0106 0.0225
Every set of data consists of 2048 rows, and each row has 8 columns.
Here comes the problem: when the logging software converts the data from binary files to text files, a lot of the data gets distorted. In the sample above, row 3, column 3 contains a " * ". In row 6, one row of data is broken into two rows: one has 2 values and the other has 6.
I am currently struggling to read this large text file using MATLAB. Since the file is so large, I can only use textscan to read the data.
for example:
C = textscan(fd,'%f%f%f%f%f%f%f%f',1,'Delimiter','\t ');
However, I cannot use '%f' as the format, since the data contains several weird ASCII characters such as " * " or " ! ". These distorted values cannot be treated as floating point numbers.
So I choose to use:
C = textscan(fd,'%s%s%s%s%s%s%s%s',1,'Delimiter','\t ');
and then I transfer those strings into doubles to be processed. However, this encounters the problem of broken lines. When it reaches row 6, it gives:
[-0.0793],[0.01],[],[],[],[],[],[];
[0.0278],[0.0053],[-0.0261],[0.0016],[0.0233],[0.0719],[0.0143],[0.0163];
while it is supposed to look like:
-0.0793 0.01 0.0278 0.0053 -0.0261 0.0016 0.0233 0.0719 ===> one set
0.0143 0.0163 -0.0101 -0.0114 -0.0338 -0.0415 0.0143 0.129 ===> another set
Then the data will be offset by one row and the columns are messed up.
Then I try to do:
C = textscan(fd,'%s',1,'Delimiter','\t ');
to read one element at one time. If this element is NaN, it will textscan the next one until it sees something other than NaN. Once it obtains 2048 non-empty elements, it will store those 2048 data into a matrix to be processed. After being processed, this matrix is cleared.
This method works well for the first 20% of the whole file....BUT,
since the file itself is 11GB which is very large, after reading about 20% of the file, MATLAB shows:
Error using ==> textscan
Out of memory. Type HELP MEMORY for your options.
(some people suggest using %f while doing textscan, but it won't work because there are some ASCII chars which are causing problem)
Any suggestions to deal with this file?
EDIT:
I have tried:
C = textscan(fd,'%s%s%s%s%s%s%s%s',2048,'Delimiter','\t ');
Although the result is incorrect due to the misalignment of data (like row 6), this code indeed does not cause the "Out of memory" problem. Out of memory problem only occurs when I try to use
C= textscan(fd,'%s',1,'Delimiter','\t ').
to read the data one entry at a time. Does anyone have any idea why this memory problem happens?
This might seem silly, but are you preallocating an array for this data? If the only issue (as it seems to be) with your last function is memory, perhaps
C = zeros(2048,8);
will alleviate your problem. Try inserting that line before you call textscan. I know that MATLAB often exhorts programmers to preallocate for speed; this is just a shot in the dark, but preallocating memory may fix your issue.
Edit: also see this MATLAB Central discussion of a similar issue. It may be that you will have to run the file in chunks, and then concatenate the arrays when each chunk is finished.
Try something like the code below. It preallocates space and reads numRows*numColumns values from the text file at a time. If you can initialize the bigData matrix, then it shouldn't run out of memory ... I think.
Note: I used 9 for the number of rows since your sample data had 9 complete rows; you will want to use 2048, I presume. This might also need some end-of-file checks and some error handling. Also, any numbers with odd ASCII text in them will turn into NaN.
Note 2: This still might not work or be very very slow. I had a similar problem reading large text files (10-20GB) that were slightly more complicated. I had to abandon reading them in Matlab. Instead I used Perl for an initial pass which output to binary. Then used Matlab to read the binary back into data. The 2 step approach ended up saving lots and lots of runtime. Link in case you are interested
function bigData = readData(fileName)
% Read numBlocks blocks of r-by-c values from a whitespace-delimited text file.
fid = fopen(fileName,'r');
numBlocks = 1; % Number of blocks to read; this has to be determined for your file
r = 9; % Rows per block; replace 9 with your size, 2048
c = 8; % Columns per row
bigData = zeros(r*numBlocks,c); % Preallocate the full result
for k = 1:numBlocks
    [dataBlock, rFlag] = readDataBlock(fid,r,c);
    if rFlag
        % Incomplete block (for example, end of file): stop here, or raise an error.
        break
    end
    bigData((k-1)*r+1:k*r,:) = dataBlock;
end
fclose(fid);
end

function [dataBlock, rFlag] = readDataBlock(fid,r,c)
% Read r*c tokens as text; tokens that are not valid numbers become NaN.
C = textscan(fid,'%s',r*c,'Delimiter','\t ');
dataBlock = [];
if numel(C{1}) == r*c
    % Tokens arrive row by row, but reshape fills column by column,
    % so reshape to c-by-r and transpose to get an r-by-c block.
    dataBlock = reshape(str2double(C{1}),c,r).';
    rFlag = false;
else
    rFlag = true;
    % ?? Throw an error or whatever is appropriate
end
end
While I don't really know how to solve your problem with the broken data, I can give some advice on how to process big text data: read it in batches and write the output directly to the hard drive. In your case the second part might be unnecessary; if everything is working, you could try replacing data with an ordinary variable.
The code was originally written for a different purpose; I deleted the parser for my problem and replaced it with parsed_data=buffer; %TODO;
outputfile='out1.mat';
inputfile='logfile1';
batchsize=1000; %read up to 1000 whitespace-delimited tokens at once (with '%s', textscan counts tokens, not lines)
data=matfile(outputfile,'writable',true); %Simply delete this line if you want "data" to be a variable in your memory
h=dir(inputfile);
bytes_to_read=h.bytes;
data.out={};
input=fopen(inputfile);
buffer={};
while ftell(input)<bytes_to_read
buffer=[buffer,textscan(input,'%s',batchsize-numel(buffer))];
parsed_data=buffer; %TODO;
data.out(end+1,1)={parsed_data};
buffer={}; %In the default case you empty your buffer here.
%If an incomplete line was read only partially, leave it in the buffer.
end
fclose(input);
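For the broken-row problem in the question, one option is to ignore line breaks entirely and treat the file as a stream of tokens, taking a fixed number of tokens per set. The sketch below demonstrates this on a tiny made-up file with one corrupted token and one row split across two lines. It assumes that damaged tokens should be kept as NaN so that the columns stay aligned; whether that is acceptable depends on the later analysis. The file contents and variable names are hypothetical.

```matlab
% Hypothetical demo file with the two kinds of damage from the question.
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, '1 2 3 4\n');
fprintf(fid, '5 6 7*1 8\n');      % corrupted token
fprintf(fid, '9 10\n');           % a row broken in two...
fprintf(fid, '11 12\n');          % ...continues here
fprintf(fid, '13 14 15 16\n');
fclose(fid);

rowsPerSet = 2;                   % use 2048 for the real log
cols = 4;                         % use 8 for the real log
tokensPerSet = rowsPerSet * cols;

fid = fopen(demoFile, 'r');
sets = {};
while true
    % '%s' reads whitespace-delimited tokens and does not care about line breaks.
    C = textscan(fid, '%s', tokensPerSet);
    tokens = C{1};
    if numel(tokens) < tokensPerSet
        break                     % end of file or incomplete trailing set
    end
    vals = str2double(tokens);    % damaged tokens become NaN
    % Tokens arrive row by row, so reshape to cols-by-rows and transpose.
    sets{end+1} = reshape(vals, cols, rowsPerSet).'; %#ok<AGROW>
end
fclose(fid);
delete(demoFile);
```

For an 11 GB file, each set would be processed and discarded inside the loop rather than collected in sets, so memory use stays at one set at a time.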