How can I save huge data to a .txt file in MATLAB?

I made an 865850-by-4464 matrix.
Then I need to save it to a .txt file.
For that I use fprintf, but I have hit a hard obstacle:
there are 4464 columns, so how can I specify their format spec?
They are all integers.
Right now I only know one way:
fprintf(fid, '%10d %10d.....%10d', Zeros); (4464 times...)
Is that the only way to save them?
Thank you!
clear all; close all;
loop = 1;
Zeros = zeros(15000, 4464);
fileID = fopen('data2.txt','r');
while loop < 4200
    Data = fscanf(fileID, '%d %d %d:%d %d\n', [5, 100000]);
    Data = Data';
    DataA = Data(:,1);
    DataB = Data(:,2);
    DataC = Data(:,3);
    DataD = Data(:,4);
    DataE = Data(:,5);
    for m = 1:100000
        r = DataA(m);
        c = ((DataB(m)-1)*24*6 + DataC(m)*6 + DataD(m)) + 1;
        Zeros(r,c) = DataE(m);
    end
    for n = 1:4464
        Zeros1{n} = Zeros(:,n);
        fileID2 = fopen('result.txt','a');
        fprintf(fileID2, '%10d %10d\n ', Zeros1{1}, Zeros1{2});
    end
    fclose(fileID2);
    loop = loop + 1;
end

Don't use fprintf with the whole row. Use the CSV export functions, or iterate over each element of each row and print it individually.
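For example, a minimal sketch of the delimited-text export, which avoids building a 4464-column format string (dlmwrite is the classic function; writematrix is its newer replacement):
% Write the whole integer matrix in one call, space-delimited
dlmwrite('result.txt', Zeros, 'delimiter', ' ', 'precision', '%d');
% In newer MATLAB releases:
% writematrix(Zeros, 'result.txt', 'Delimiter', ' ');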
I frequently like to add that for data of this size, textual storage is a bad idea. No one will ever open this in a text editor and think, "Oh, this is practical." Everyone will have a bad time carrying around hundreds of megabytes of unnecessary file size. Simply save the data to a MAT-file if you plan to open it in MATLAB again, or use a binary format, for example by just doing fwrite on the data to a file with a sensible binary representation of your numbers.
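A minimal sketch of that binary route, assuming the values fit into 32-bit integers:
% Save the matrix as raw 32-bit integers
fid = fopen('result.bin', 'wb');
fwrite(fid, Zeros, 'int32');
fclose(fid);
% Read it back later (the matrix size must be known or stored separately)
fid = fopen('result.bin', 'rb');
Zeros2 = fread(fid, [15000, 4464], 'int32');
fclose(fid);
% Or, if it only ever needs to be reopened in MATLAB:
% save('result.mat', 'Zeros');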

You could also just use the built-in MATLAB ASCII save format (instead of fprintf):
>> foo = magic( 4 )
foo =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
>> save( 'foo.txt', '-ascii', 'foo' )

Related

Reading data from a Text File into Matlab array

I am having difficulty reading data from a .txt file using MATLAB.
I have to create a 200x128 array in MATLAB, using the data from the .txt file. This is a repetitive task and needs automation.
Each row of the .txt file is a complex number a+ib, written as a[space]b. A sample of my text file:
(0)
1.2 2.32222
2.12 3.113
.
.
.
3.2 2.22
(1)
4.4 3.4444
2.33 2.11
2.3 33.3
.
.
.
(2)
.
.
(3)
.
.
(199)
.
.
I have row indices (X) inside the .txt file, surrounded by brackets. My final matrix should be of size 200x128; after each (X) there are exactly 128 complex numbers.
Here is what I would do. First, delete the "(0)"-type lines from your text file (you could even use a simple shell script for that). I put the result into a file called post2.txt.
% First, load the text file into MATLAB:
A = load('post2.txt');
% Create the complex numbers from the two columns of data:
vals = A(:,1) + 1i*A(:,2);
% Then reshape the column of complex numbers into a matrix,
% with one (X) block of 128 values per row:
mat = reshape(vals, [128,200]).';
mat will then be a 200x128 matrix of complex data. Obviously, at this point you can put a loop around this to do it multiple times.
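For example, a rough sketch of such a loop (the post2_*.txt naming is hypothetical):
nFiles = 5;                                % hypothetical number of files
mats = cell(nFiles, 1);
for k = 1:nFiles
    A = load(sprintf('post2_%d.txt', k));  % hypothetical file names
    vals = A(:,1) + 1i*A(:,2);
    mats{k} = reshape(vals, [128,200]).';  % one (X) block per row
end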
Hope that helps.
You can read the data in using the following function:
function data = readData(aFilename, m, n)
% if no parameters were passed, use these as defaults:
if ~exist('aFilename', 'var')
    m = 128;
    n = 200;
    aFilename = 'post.txt';
end
% init some stuff:
data = nan(n, m);
formatStr = repmat('%f', 1, 2*m);
% Read in the Data:
fid = fopen(aFilename);
for ind = 1:n
    lineID = fgetl(fid);
    dataLine = fscanf(fid, formatStr);
    dataLineComplex = dataLine(1:2:end) + dataLine(2:2:end)*1i;
    data(ind, :) = dataLineComplex;
end
fclose(fid);
(edit) This function can be improved by including the (X) parts in the format string and throwing them out:
function data = readData(aFilename, m, n)
% if no parameters were passed, use these as defaults:
if ~exist('aFilename', 'var')
    m = 128;
    n = 200;
    aFilename = 'post.txt';
end
% init format stuff:
formatStr = ['(%*d)\n' repmat('%f%f\n', 1, m)];
% Read in the Data:
fid = fopen(aFilename);
data = fscanf(fid, formatStr);
data = data(1:2:end) + data(2:2:end)*1i;
data = reshape(data, m, n).';   % one (X) block per row
fclose(fid);
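It can then be called like this (arguments matching the defaults above):
data = readData('post.txt', 128, 200);   % returns a 200-by-128 complex matrix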

Can I read a gigantic text file with Parallel Computing?

I have multiple text files that are about 2GB in size (approximately 70 million lines). I also have a quad-core machine and access to the Parallel Computing toolbox.
Typically you might open a file and read lines as so:
f = fopen('file.txt');
l = fgets(f);
while ischar(l)   % fgets returns -1 at end of file
    % do something with l
    l = fgets(f);
end
I wanted to distribute the "do something with l" across my 4 cores, but that of course requires the use of a parfor loop. That would require that I "slurp" the 2GB file (to borrow a Perl term) into MATLAB a priori, instead of processing on the fly. I don't actually need l, just the result of the processing.
Is there a way to read lines out of a text file with parallel computing?
EDIT: It's worth mentioning that I can find the exact number of lines ahead of time (!wc -l mygiantfile.txt).
EDIT2: The structure of the file is as follows:
15 1180 62444 e0e0 049c f3ec 104
So 3 decimal numbers, 3 hex numbers, and 1 decimal number. Repeat this for 70 million lines.
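For illustration, one such line can be read with sscanf, whose %x conversion handles the hex fields (just a sketch):
v = sscanf('15 1180 62444 e0e0 049c f3ec 104', '%d %d %d %x %x %x %d');
% v is a 7-by-1 double vector; v(4:6) hold the hex fields as decimal values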
As requested, I'm showing an example of memory-mapped files using the memmapfile class.
Since you didn't provide the exact format of the data file, I will create my own. The data I am creating is a table of N rows, each consisting of 4 columns:
first is a double scalar value
second is a single value
third is a fixed-length string representing a uint32 in HEX notation (e.g: D091BB44)
fourth column is a uint8 value
The code to generate the random data and write it to a binary file structured as described above:
% random data
N = 10;
data = [...
    num2cell(rand(N,1)), ...
    num2cell(rand(N,1,'single')), ...
    cellstr(dec2hex(randi(intmax('uint32'), [N,1]), 8)), ...
    num2cell(randi([0 255], [N,1], 'uint8')) ...
];
% write to binary file
fid = fopen('file.bin', 'wb');
for i=1:N
    fwrite(fid, data{i,1}, 'double');
    fwrite(fid, data{i,2}, 'single');
    fwrite(fid, data{i,3}, 'char');
    fwrite(fid, data{i,4}, 'uint8');
end
fclose(fid);
Viewing the resulting file in a hex editor, we can confirm the first record (note that my system uses little-endian byte ordering):
>> num2hex(data{1,1})
ans =
3fd4d780d56f2ca6
>> num2hex(data{1,2})
ans =
3ddd473e
>> arrayfun(@dec2hex, double(data{1,3}), 'UniformOutput',false)
ans =
'46' '35' '36' '32' '37' '35' '32' '46'
>> dec2hex(data{1,4})
ans =
C0
Next we open the file using memory-mapping:
m = memmapfile('file.bin', 'Offset',0, 'Repeat',Inf, 'Writable',false, ...
    'Format',{
        'double', [1 1], 'd';
        'single', [1 1], 's';
        'uint8' , [1 8], 'h';   % since it doesn't directly support char
        'uint8' , [1 1], 'i'});
Now we can access the records as an ordinary structure array:
>> rec = m.Data; % 10x1 struct array
>> rec(1) % same as: data(1,:)
ans =
d: 0.3257
s: 0.1080
h: [70 53 54 50 55 53 50 70]
i: 192
>> rec(4).d % same as: data{4,1}
ans =
0.5799
>> char(rec(10).h) % same as: data{10,3}
ans =
2B2F493F
The benefit for large data files is that you can restrict the mapping's "viewing window" to a small subset of the records, and move this view along the file:
% read the records two at a time
numRec = 10;                    % total number of records
lenRec = 8*1 + 4*1 + 1*8 + 1*1; % length of each record in bytes
numRecPerView = 2;              % how many records in a viewing window
m.Repeat = numRecPerView;
for i=1:(numRec/numRecPerView)
    % move the window along the file
    m.Offset = (i-1) * numRecPerView*lenRec;
    % read the two records in this window:
    %for j=1:numRecPerView, m.Data(j), end
    m.Data(1)
    m.Data(2)
end
Some of MATLAB's built-in functions support multithreading (MathWorks publishes a list of them); those need no Parallel Computing Toolbox at all.
If the "do something with l" step can benefit from the toolbox, just apply that processing to each line before reading the next one.
You may alternatively want to read the whole file using
fid = fopen('textfile.txt');
C = textscan(fid,'%s','delimiter','\n');
fclose(fid);
and then compute the cells in C in parallel.
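For example, a minimal sketch of that last step (processLine is a hypothetical stand-in for whatever "do something with l" is):
lines = C{1};                       % cell array with one line per cell
results = zeros(numel(lines), 1);
parfor k = 1:numel(lines)
    results(k) = processLine(lines{k});   % hypothetical per-line processing
end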
If the reading time is a key issue, you may also want to access parts of the data file within a parfor loop. Here is an example from Edric M Ellis.
% Some data
x = rand(1000, 10);
fh = fopen( 'tmp.bin', 'wb' );
fwrite( fh, x, 'double' );
fclose( fh );
% Read the data
y = zeros(1000, 10);
parfor ii = 1:10
    fh = fopen( 'tmp.bin', 'rb' );
    % Get to the correct spot in the file:
    offset_bytes = (ii-1) * 1000 * 8; % 8 bytes/double
    fseek( fh, offset_bytes, 'bof' );
    % read a column
    y(:,ii) = fread( fh, 1000, 'double' );
    fclose( fh );
end
% Check
assert( isequal( x, y ) );

How would I open multiple files, and combine one line of data from each document into a single number? Matlab

I have several files named add_1.txt through add_5.txt, and I want to take the first line of information (a 1-by-5 matrix of all ones) from each file, add them together, and create a new text file with the result. Obviously the answer would simply be [5 5 5 5 5], but I would like to know how to program my way there.
I've been able to teach myself how to "add" two data rows from the same file and create a text file with the answer, using this code:
fid=fopen('add.txt');
A = fgetl(fid);
AA = str2num(A)
B = fgets(fid);
BB = str2num(B)
C = AA + BB;
fclose(fid);
dlmwrite('results.txt', C)
but I do not know how to make the jump to automated calculations across multiple files. Any help would be great.
Something like this should do the trick:
% List of file names
% (can be auto-generated like so: filename = ['add_' num2str(ii) '.txt'],
% with ii your iteration variable)
filenames = {'add_1.txt', 'add_2.txt', 'add_3.txt', 'add_4.txt', 'add_5.txt'};
% If you know the size of the first line:
A = zeros(1,5);
% Loop through all filenames
for filename = filenames
    fid = fopen(filename{1});
    A = A + str2num( fgetl(fid) ); %#ok
    fclose(fid);
end
% Write results to file
dlmwrite('results.txt', A);
If you don't know beforehand how many elements there are in A, you'll have to modify the loop a little bit:
A = 0;
for filename = filenames
    fid = fopen(filename{1});
    A = A + str2num( fgetl(fid) ); %#ok
    fclose(fid);
end

Problem with loop MATLAB

no time scores
1 10 123
2 11 22
3 12 22
4 50 55
5 60 22
6 70 66
. . .
. . .
n n n
Above is the content of my .txt file (thousands of lines).
1st column - sample number
2nd column - time (accumulated from beginning to end)
3rd column - scores
I want to create a new file where each entry is the sum of the scores over every three samples, divided by the time difference across those same samples.
e.g. (123+22+22)/ (12-10) = 167/2 = 83.5
(55+22+66)/(70-50) = 143/20 = 7.15
new txt file
83.5
7.15
.
.
.
n
so far I have this code:
fid = fopen('data.txt');
data = textscan(fid, '%*d %d %d');
time = data{1};
score = data{2};
for sample = 1:length(score)
    % ... I'm stuck here ...
end
If you are feeling adventurous, here's a vectorized one-line solution using ACCUMARRAY (assuming you already read the file in a matrix variable data like the others have shown):
NUM = 3;
result = accumarray(reshape(repmat(1:size(data,1)/NUM,NUM,1),[],1),data(:,3)) ...
./ (data(NUM:NUM:end,2)-data(1:NUM:end,2))
Note that here the number of samples NUM=3 is a parameter and can be substituted by any other value.
Also, reading your comment above, if the number of samples is not a multiple of this number (3), then simply discard the remaining samples by doing this beforehand:
data = data(1:fix(size(data,1)/NUM)*NUM,:);
I'm sorry, here's a much simpler one :P
result = sum(reshape(data(:,3), NUM, []))' ./ (data(NUM:NUM:end,2)-data(1:NUM:end,2));
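As a quick check against the sample data in the question (just the six rows shown above):
data = [1 10 123; 2 11 22; 3 12 22; 4 50 55; 5 60 22; 6 70 66];
NUM = 3;
sum(reshape(data(:,3), NUM, []))' ./ (data(NUM:NUM:end,2)-data(1:NUM:end,2))
% returns [83.5; 7.15], matching the expected output in the question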
%# Easier to load with importdata; with a header line it returns a struct,
%# so keep only the numeric part
raw = importdata('data.txt',' ',1);
data = raw.data;
%# Get the number of rows
n = size(data,1);
%# Column IDs
time = 2; score = 3;
%# The interval size (3 in your example)
interval = 3;
%# Pre-allocate space
new_data = zeros(numel(interval:interval:n),1);
%# For each new element in the new data
index = 1;
%# This will ignore elements past the closest (floor) multiple of 3, as requested
for i = interval:interval:n
    %# First and last elements in a batch
    a = i-interval+1;
    b = i;
    %# Compute the new data
    new_data(index) = sum( data(a:b,score) )/(data(b,time)-data(a,time));
    %# Increment
    index = index+1;
end
For what it's worth, here is how you would go about to do that in Python. It is probably adaptable to Matlab.
import numpy
no, time, scores = numpy.loadtxt('data', skiprows=1).T
# here I assume that your n is a multiple of 3! otherwise you have to adjust
sums = scores[::3]+scores[1::3]+scores[2::3]
dt = time[2::3]-time[::3]
result = sums/dt
I suggest you use the importdata() function to get your data into a variable called data. Something like this:
data = importdata('data.txt', ' ', 1);
data = data.data;
Replace ' ' with the delimiter your file uses; the 1 specifies that MATLAB should ignore 1 header line (with a header line present, importdata returns a struct, so take its data field). Then, to compute your results, try this statement:
(data(1:3:end,3)+data(2:3:end,3)+data(3:3:end,3))./(data(3:3:end,2)-data(1:3:end,2))
This worked on your sample data and should work on the real data you have. If you figure it out yourself, you'll learn some useful MATLAB.
Then use save() to write the results back to a file.
PS: If you find yourself writing loops in MATLAB, you are probably doing something wrong.
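Putting that together, a minimal sketch (file name, delimiter, and the single header line assumed as above; the row count is assumed to be a multiple of 3):
raw = importdata('data.txt', ' ', 1);
data = raw.data;
result = (data(1:3:end,3)+data(2:3:end,3)+data(3:3:end,3)) ./ (data(3:3:end,2)-data(1:3:end,2));
save('results.txt', 'result', '-ascii');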

Problem (bug?) loading hexadecimal data into MATLAB

I'm trying to load the following ASCII file into MATLAB using load():
% some comment
1 0xc661
2 0xd661
3 0xe661
(This is actually a simplified file. The actual file I'm trying to load contains an undefined number of columns and an undefined number of comment lines at the beginning, which is why the load function was attractive)
For some strange reason, I obtain the following:
K>> data = load('testMixed.txt')
data =
1 50785
2 58977
3 58977
I've observed that the problem occurs anytime there's a "d" in the hexadecimal number.
Direct hex2dec conversion works properly:
K>> hex2dec('d661')
ans =
54881
importdata seems to have the same conversion issue, and so does the ImportWizard:
K>> importdata('testMixed.txt')
ans =
1 50785
2 58977
3 58977
Is that a bug, am I using the load function in some prohibited way, or is there something obvious I'm overlooking?
Are there workarounds around the problem, save from reimplementing the file parsing on my own?
Edited my input file to better reflect my actual file format. I had a bit oversimplified in my original question.
"GOLF" ANSWER:
This starts with the answer from mtrw and shortens it further:
fid = fopen('testMixed.txt','rt');
data = textscan(fid,'%s','Delimiter','\n','MultipleDelimsAsOne','1',...
'CommentStyle','%');
fclose(fid);
data = strcat(data{1},{' '});
data = sscanf([data{:}],'%i',[sum(isspace(data{1})) inf]).';
PREVIOUS ANSWER:
My first thought was to use TEXTSCAN, since it has an option that allows you to ignore certain lines as comments when they start with a given character (like %). However, TEXTSCAN doesn't appear to handle numbers in hexadecimal format well. Here's another option:
fid = fopen('testMixed.txt','r');  % Open file
% First, read all the comment lines (lines that start with '%'):
comments = {};
position = 0;
nextLine = fgetl(fid);  % Read the first line
while strcmp(nextLine(1),'%')
    comments = [comments; {nextLine}];  % Collect the comments
    position = ftell(fid);              % Get the file pointer position
    nextLine = fgetl(fid);              % Read the next line
end
fseek(fid,position,-1);  % Rewind to beginning of last line read
% Read numerical data:
nCol = sum(isspace(nextLine))+1;       % Get the number of columns
data = fscanf(fid,'%i',[nCol inf]).';  % Note: '%i' works for all integer formats
fclose(fid);  % Close file
This will work for an arbitrary number of comments at the beginning of the file. The computation to get the number of columns was inspired by Jacob's answer.
New:
This is the best I could come up with. It should work for any number of comment lines and columns. You'll have to do the rest yourself if there are strings, etc.
% Define the character representing the start of a commented line,
% and the delimiter
COMMENT_START = '%';
DELIMITER = ' ';
% Open the file
fid = fopen('testMixed.txt');
% Read each line till we reach the data
l = COMMENT_START;
while l(1) == COMMENT_START
    l = fgetl(fid);
end
% Compute the number of columns
cols = sum(l==DELIMITER)+1;
% Split the first line
split_l = regexp(l,' ','split');
% Read all the data
A = textscan(fid,'%s');
% Compute the number of rows
rows = numel(A{:})/cols;
% Close the file
fclose(fid);
% Assemble all the data into a matrix of cell strings
DATA = [split_l ; reshape(A{:},[cols rows])'];
% Recognize each column and process accordingly
% by analyzing each element in the first row
numeric_data = zeros(size(DATA));
for i=1:cols
    str = DATA(1,i);
    % If there is no '0x' present
    if isempty(strfind(str{1},'0x'))
        % This is a number
        numeric_data(:,i) = str2num(char(DATA(:,i)));
    else
        % This is a hexadecimal number
        col = char(DATA(:,i));
        numeric_data(:,i) = hex2dec(col(:,3:end));
    end
end
% Display the data
format short g;
disp(numeric_data)
This works for data like this:
% Comment 1
% Comment 2
1.2 0xc661 10 0xa661
2 0xd661 20 0xb661
3 0xe661 30 0xc661
Output:
1.2 50785 10 42593
2 54881 20 46689
3 58977 30 50785
OLD:
Yeah, I don't think LOAD is the way to go. You could try:
a = char(importdata('testHexa.txt'));
a = hex2dec(a(:,3:end));
This is based on both gnovice's and Jacob's answers, and is a "best of breed"
For files like:
% this is my comment
% this is my other comment
1 0xc661 123
2 0xd661 456
% surprise comment
3 0xe661 789
4 0xb661 1234567
(where the number of columns within the file MUST be the same, but not known ahead of time, and all comments denoted by a '%' character), the following code is fast and easy to read:
f = fopen('hexdata.txt', 'rt');
A = textscan(f, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', '1', ...
    'CollectOutput', '1', 'CommentStyle', '%');
fclose(f);
A = A{1};
data = sscanf(A{1}, '%i')';
data = repmat(data, length(A), 1);
for ctr = 2:length(A)
    data(ctr,:) = sscanf(A{ctr}, '%i')';
end
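Run on the sample file above, this should produce something like (hex columns converted to decimal):
data =
        1    50785      123
        2    54881      456
        3    58977      789
        4    46689  1234567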