I'm writing some larger (~500MB - 3GB) pieces of binary data in MATLAB using the fwrite command.
I want the data to be written in a tabular format, so I'm using the skip parameter. E.g. I have 2 vectors of uint8 values, a = [1 2 3 4]; b = [5 6 7 8], and I want the binary file to look like this: 1 5 2 6 3 7 4 8
So in my code I do something similar to this (my data is more complex):
fwrite(f,a,'1*uint8',1);
fseek(f,2,'bof');
fwrite(f,b,'1*uint8',1);
But the writes are painfully slow (2 MB/s).
I ran the following block of code, and when I passed in a skip count of 1 the write was approximately 300x slower.
>> f = fopen('testfile.bin', 'w');
>> d = uint8(1:500e6);
>> tic; fwrite(f,d,'1*uint8',1); toc
Elapsed time is 58.759686 seconds.
>> tic; fwrite(f,d,'1*uint8',0); toc
Elapsed time is 0.200684 seconds.
>> 58.759686/0.200684
ans =
292.7971
I could understand a 2x or 4x slowdown, since you have to traverse twice as many bytes with the skip parameter set to 1, but 300x makes me think I'm doing something wrong.
Has anyone encountered this before? Is there a way to speed up this write?
Thanks!
UPDATE
I wrote the following function to format arbitrary data sets. Write speed is vastly improved (~300MB/s) for large data sets.
%
% data: A cell array of matrices. Matrices can be composed of any
%       non-complex numeric data. Each entry in data is considered
%       to be an independent column in the data file. Rows are indexed
%       by the last dimension of each matrix, hence the count of elements
%       in the last dimension must match across all entries.
%
% e.g.
%   size(data{1}) == [1,5]
%   size(data{2}) == [4,5]
%   size(data{3}) == [3,2,5]
%
% The data variable has 3 columns and 5 rows. Column 1 is made of scalar
% values, column 2 is made of vectors of length 4, and column 3 is made
% of 3 x 2 matrices.
%
% returns buffer: an N x M matrix of bytes, where N is the number of bytes
%                 of each row of data, and M is the number of rows of data.
function [buffer] = makeTabularDataBuffer(data)
    dataTypes = {};
    dataTypesLengthBytes = [];
    rowElementCounts = []; % the number of elements in each "row"
    rowCount = [];
    % figure out properties of the tabular data
    for idx = 1:length(data)
        cDat = data{idx};
        dimSize = size(cDat);
        % ensure each column has the same number of rows
        if isempty(rowCount)
            rowCount = dimSize(end);
        else
            if dimSize(end) ~= rowCount
                throw(MException('e:e', sprintf('data column %d does not have the required number of rows (%d)\n',idx,rowCount)));
            end
        end
        dataTypes{idx} = class(data{idx});
        dataTypesLengthBytes(idx) = length(typecast(eval([dataTypes{idx},'(1)']),'uint8'));
        rowElementCounts(idx) = prod(dimSize(1:end-1));
    end
    rowLengthBytes = sum(rowElementCounts .* dataTypesLengthBytes);
    % rows of the dataset map to columns of the buffer matrix because
    % fwrite writes columnwise
    buffer = zeros(rowLengthBytes, rowCount, 'uint8');
    bufferRowStartIdxs = cumsum([1 dataTypesLengthBytes .* rowElementCounts]);
    % load data one column at a time into the buffer
    for idx = 1:length(data)
        cDat = data{idx};
        columnWidthBytes = dataTypesLengthBytes(idx) * rowElementCounts(idx);
        cRowIdxs = bufferRowStartIdxs(idx):(bufferRowStartIdxs(idx+1)-1);
        buffer(cRowIdxs,:) = reshape(typecast(cDat(:),'uint8'), columnWidthBytes, []);
    end
end
I've done some very limited testing of the function, but it appears to be working as expected. The returned buffer matrix can then be passed to fwrite without the skip argument, and fwrite will write the buffer in column-major order.
dat = {};
dat{1} = uint16([1 2 3 4]);
dat{2} = uint16([5 6 7 8]);
dat{3} = double([9 10 ; 11 12; 13 14; 15 16])';
buffer = makeTabularDataBuffer(dat)
buffer =
20×4 uint8 matrix
1 2 3 4
0 0 0 0
5 6 7 8
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
34 38 42 46
64 64 64 64
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
36 40 44 48
64 64 64 64
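The buffer can then be written sequentially in a single call; a minimal usage sketch (the file name here is arbitrary):
f = fopen('tabular.bin', 'w');
fwrite(f, buffer, 'uint8'); % column-major write emits one table row after another
fclose(f);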
For best I/O performance, use sequential writes, and avoid skipping.
Reorder the data in RAM before saving it to file.
Reordering the data in RAM is on the order of 100 times faster than reordering data on disk.
I/O operations and storage devices are optimized for sequential writes of large data chunks (optimized both in hardware and in software).
On mechanical drives (HDD), writing data with skipping may take a very long time, because the mechanical head of the drive must move (usually the OS optimizes it by using a memory buffer, but in principle it takes a long time).
With an SSD, there is no mechanical seeking, but sequential writes are still much faster. Read the following post Sequential vs Random I/O on SSDs? for some explanation.
Example for reordering data in RAM:
a = uint8([1 2 3 4]);
b = uint8([5 6 7 8]);
% Allocate memory space for the reordered elements (use uint8 type to save RAM).
c = zeros(1, length(a) + length(b), 'uint8');
% Reorder a and b in RAM.
c(1:2:end) = a;
c(2:2:end) = b;
% Write array c to the file sequentially.
f = fopen('testfile.bin', 'w');
fwrite(f, c, 'uint8');
fclose(f);
Time measurements in my machine:
Writing file to SSD:
Elapsed time is 56.363397 seconds.
Elapsed time is 0.280049 seconds.
Writing file to HDD:
Elapsed time is 56.063186 seconds.
Elapsed time is 0.522933 seconds.
Reordering d in RAM:
Elapsed time is 0.965358 seconds.
Why 300x slower and not 4x?
I am guessing the software implementation of writing data with skipping is not optimized for best performance.
According to the following post:
fseek() or fflush() require the library to commit buffered operations.
Daniel's guess (in the comment) is probably correct.
"The skip causes MATLAB to flush after each byte."
Skipping is probably implemented using fseek(), and fseek() forces flushing data to disk.
It could explain why writing with skipping is painfully slow.
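A rough way to test that hypothesis is to emulate the skip by hand, writing one value at a time and seeking past each skipped byte; a minimal sketch (hypothetical file name, small sample size) that should show similarly poor throughput if seeking is the culprit:
f = fopen('skiptest.bin', 'w');
d = uint8(1:1e5); % small sample; the per-value overhead dominates anyway
tic
for k = 1:numel(d)
    fwrite(f, d(k), 'uint8'); % write one value...
    fseek(f, 1, 'cof');       % ...then seek past the skipped byte
end
toc
fclose(f);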
Related
I have the following problem:
I need certain columns of a huge triangular 1-0 matrix.
E.g.
Matrix =
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
Index =
[1 4]
Result =
1 0
1 0
1 0
1 1
I figured the easiest way would be:
index = [10 20 300]; % arbitrary index
buf = tril(ones(60000, 60000));
matr = buf(:, index);
However, this does not work as the buffer matrix is too large and leads to MATLAB throwing an error. Thus, this approach is blocked.
How can I solve this problem efficiently? (E.g. it would be trivial to just loop over the index array and concatenate self-made rows; however, this would be slow, and I was hoping for a faster approach.)
The index array will not be larger than 1/10th of the available columns.
If the matrix contains ones on the main diagonal and below, and zeros otherwise, you can do it as follows without actually generating the matrix:
N = 10; % number of rows of (implicit) matrix
Index = [1 4]; % column indices
Result = bsxfun(@ge, (1:N).', Index);
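As a side note, on R2016b or newer you can rely on implicit expansion and drop bsxfun entirely; a sketch of the same comparison:
N = 10;
Index = [1 4];
Result = (1:N).' >= Index; % logical N x numel(Index) matrix, same as the bsxfun version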
I'd like to insert columns into a matrix, but the insertion positions within the matrix differ by row. How can I do this without using a for-loop?
The following is a simplified example in MATLAB.
From A, X, and P, I want to get APX without using a for-loop.
>> A = zeros(4,5) % inclusive matrix
A =
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
>> X = [9,8;5,7;8,3;6,7] % data to insert
X =
9 8
5 7
8 3
6 7
>> P = [3;2;4;1] % insertion position within the matrix
P =
3
2
4
1
>> APX = [0,0,9,8,0;0,5,7,0,0;0,0,0,8,3;6,7,0,0,0] % what I want
APX =
0 0 9 8 0
0 5 7 0 0
0 0 0 8 3
6 7 0 0 0
It's simply a matter of determining the right column-major indices to access the matrix so you can populate it with your desired values. This requires generating the right row and column values that access the right positions in APX so you can use X to populate those positions.
Using P, each element tells you which column you should start populating for each row of X. To generate the column indices, create a row vector of offsets spanning from 0 up to the number of columns in X minus 1 (i.e. 0:size(X,2)-1), and add it to P. This gives you a column index matrix that tells you specifically where each element should go with regards to the columns of the output matrix, per row of P. To generate the row indices, create a matrix that is the same size as X where each column spans from 1 up to as many rows as there are in X. Finally, use sub2ind to generate the column-major indices using the rows and columns generated above to place X in APX.
In other words:
P = [3;2;4;1];
X = [9,8;5,7;8,3;6,7];
rowInd = repmat((1:size(X,1)).', 1, size(X,2));
colInd = bsxfun(@plus, P, 0:size(X,2)-1);
APX = zeros(size(X,1), max(colInd(:)));
APX(sub2ind(size(APX), rowInd, colInd)) = X;
To generate the row locations, we use repmat to create a matrix that is the same size as X where each column spans from 1 up to as many rows as X. To generate the column locations, we use bsxfun to create a matrix where each column is the vector P but increasing by 1 per column. We then create APX to be of compatible size then use sub2ind to finally populate the matrix.
With your above test inputs, we get:
APX =
0 0 9 8 0
0 5 7 0 0
0 0 0 8 3
6 7 0 0 0
Minor Note
You really should try using loops before going vectorized. Loops were slow in older versions of MATLAB, but MATLAB R2015b introduced an improved JIT engine, and loops are now competitive. Time your loop code and make sure a vectorized implementation is justified before switching to one.
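In that spirit, here is a loop-based version of the same insertion (a sketch using the names from the code above) that you could time against the vectorized implementation:
APX2 = zeros(size(X,1), max(P) + size(X,2) - 1);
for r = 1:size(X,1)
    % place row r of X starting at column P(r)
    APX2(r, P(r):P(r)+size(X,2)-1) = X(r,:);
end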
With a geographical grid of size 20x30, I have two (temperature) variables:
the data A, of size 20x30x100,
and a threshold, of size 20x30.
I'd like to apply the threshold to the data, i.e. to cut out the values in A that are above the threshold, with every grid point having its own threshold. Since that will give a different number of values for each grid point, I thought I would pad the rest with zeros, so that the resulting variable, let's call it B, will also be of size 20x30x100.
I was thinking to do something like this, but there's something wrong with the loop:
B = sort(A,3); %// sort third dimension in ascending order
threshold_3d = repmat(threshold,1,1,100); %// make threshold into same size as B
for i = 1:20
    for j = 1:30
        if B(i,j,:) > threshold_3d(i,j,:) %// if B is above threshold
            B(i,j,:); %// keep values
        else
            B(i,j,:) = 0; %// otherwise set to zero
        end
    end
end
What is the correct way to do the loop?
What other options to do this are there?
Thanks for any help!
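For reference, a corrected version of the loop might look like the sketch below; it masks the values at or below each grid point's threshold and zeroes them, which matches the vectorized answer that follows:
B = sort(A, 3); % sort third dimension in ascending order
for i = 1:20
    for j = 1:30
        mask = B(i,j,:) <= threshold(i,j); % values at or below the threshold
        B(i,j,mask) = 0;                   % set them to zero, keep the rest
    end
end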
You can use bsxfun for a much more efficient solution, which will internally take care of the replication done with repmat, like so -
B = bsxfun(@times,B,bsxfun(@gt,B,threshold))
A more efficient solution still is to use logical indexing: create a mask of the elements to discard with bsxfun(@le, ...) and set those elements of B to zero directly. This avoids bsxfun(@times), which for huge multidimensional arrays could be a bit expensive, like so -
B(bsxfun(@le,B,threshold)) = 0
Note on efficiency: being a relational operation, the vectorized approach with bsxfun provides both memory and runtime efficiency. The memory efficiency part is discussed here - BSXFUN on memory efficiency with relational operations - and the performance numbers are explored here - Comparing BSXFUN and REPMAT.
Sample run -
>> B
B(:,:,1) =
8 3 9
2 8 3
B(:,:,2) =
4 1 8
4 5 6
B(:,:,3) =
4 8 5
5 6 5
>> threshold
threshold =
1 3 9
1 9 1
>> B(bsxfun(@le,B,threshold)) = 0
B(:,:,1) =
8 0 0
2 0 3
B(:,:,2) =
4 0 0
4 0 6
B(:,:,3) =
4 8 0
5 0 5
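On R2016b or newer, implicit expansion makes the bsxfun call unnecessary; the 20x30 threshold expands along the third dimension automatically:
B(B <= threshold) = 0; % same result as the bsxfun(@le, ...) version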
I want to create a matrix with all the possible combinations of 10 numbers between 0 and 100, in intervals of 5, whose sum equals 100. I mean something like this:
(0 0 0 0 0 0 0 0 10 90; 10 10 10 10 10 10 10 10 20 0; ...)
I use "allcomb.m" to create something like all the possible numbers that are between 0 and 100, in intervals of 5. However, this matrix is so big that MATLAB cannot create it. I was thinking that, if I had that matrix, I could reduce it using a condition, but that is impossible because I never get the matrix.
So, the question is how I can modify allcomb's code to apply the condition inside the code itself, or maybe (and better) find another way to create the matrix I described.
Be warned that even the result matrix is very large - to be precise, it has 10,015,005 rows and ten columns, and (if stored as a double) takes up about 1GB of space. On my machine it takes about ten minutes to compute. Nevertheless, it is computable, and the following function computes it.
function w = allconstrainedcombinations(n,k)
    if n == 1
        w = k;
    else
        t = nchoosek(n+k-1,k); % total number of rows
        w = zeros(t,n);        % pre-allocate
        r = 1;                 % current row
        for v = 0:k
            u = allconstrainedcombinations(n-1,k-v);
            m = size(u,1);
            w(r:r+m-1,1) = v;
            w(r:r+m-1,2:end) = u;
            r = r + m;
        end
    end
end
To get the result you want, you should call
>> x = allconstrainedcombinations(10,20) * 5;
Here's the result for a small example:
>> allconstrainedcombinations(3,2)
ans =
0 0 2
0 1 1
0 2 0
1 0 1
1 1 0
2 0 0
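A quick sanity check on the full-size result (a sketch; it just verifies the row count and sum constraint stated above):
x = allconstrainedcombinations(10, 20) * 5;
assert(size(x, 1) == nchoosek(29, 9)); % 10,015,005 rows
assert(all(sum(x, 2) == 100));         % every row sums to 100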
I have a 429x1 vector that represents a hydrological time series. I am looking to "lag" the time series by a time step and turn it into a matrix for input into nftool for some ANN analysis. The width of the matrix is controlled by the number of input neurons in my input layer, which is a value I read in from a spreadsheet. This is what I would like to do, using a shorter time series to illustrate the example:
inp_neur = 5; % number of input neurons (read in from Excel)
A = [9;6;8;3;2]; % hypothetical hydrological time series
% do pad zero process
RESULT:
newA =
9 0 0 0 0
6 9 0 0 0
8 6 9 0 0
3 8 6 9 0
2 3 8 6 9
I'm sure this isn't the hardest thing to do, but can it be done in a one liner?
Any help would be greatly appreciated.
Cheers,
JQ
Another example, with inp_neur = 7:
A = [11;35;63;21;45;26;29;84;51]
newA =
11 0 0 0 0 0 0
35 11 0 0 0 0 0
63 35 11 0 0 0 0
21 63 35 11 0 0 0
45 21 63 35 11 0 0
26 45 21 63 35 11 0
29 26 45 21 63 35 11
84 29 26 45 21 63 35
51 84 29 26 45 21 63
I know that this question has already been marked accepted; however, I think it is worth pointing out that the currently accepted answer will be very inefficient if T (the number of observations in the time series) is much larger than K (the number of lags, i.e. inp_neur in the OP's notation). This is because it creates a T by T matrix and then truncates it to T by K.
I would propose two possible alternatives. The first uses a function from the Econometrics Toolbox designed to do exactly what the OP wants: lagmatrix. The second is a loop-based solution.
The lagmatrix solution returns NaN where the OP wants 0, so an additional line is necessary to convert them. The full solution is:
newA2 = lagmatrix(A, 0:K-1);
newA2(isnan(newA2)) = 0;
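Applied to the OP's second example (this assumes the Econometrics Toolbox is available), it reproduces the 9-by-7 matrix shown in the question:
A = [11;35;63;21;45;26;29;84;51];
K = 7;
newA2 = lagmatrix(A, 0:K-1); % column k holds A lagged by k-1 steps (NaN-padded)
newA2(isnan(newA2)) = 0;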
The loop-based solution is:
newA3 = zeros(T, K);
for k = 1:K
    newA3(k:end, k) = A(1:end-k+1);
end
The obvious advantage of the loop-based solution is that it does not require the Econometrics Toolbox. But is that the only advantage? Let's try some timed runs. Set T = K = 10. Then:
Elapsed time is 0.045809 seconds. % 3lectrologos' solution
Elapsed time is 0.049845 seconds. % lagmatrix solution
Elapsed time is 0.017340 seconds. % loop solution
3lectrologos' solution and the lagmatrix solution are essentially the same. The loop-based solution is 3 times faster! Now, to emphasize the problem with 3lectrologos' solution, set T = 1000 and K = 10. Then:
Elapsed time is 10.615298 seconds.
Elapsed time is 0.149164 seconds.
Elapsed time is 0.056074 seconds.
Now 3lectrologos' solution is two orders of magnitude slower than the lagmatrix solution. But the real winner on the day is the loop-based solution, which still manages to be 3 times faster than the lagmatrix solution.
Conclusion: Don't discount single-loops in Matlab anymore. They are getting really fast!
For those who are interested, the code for the timed runs is below:
M = 1000;            % number of iterations for the timed test
T = 1000;            % length of your vector of inputs
K = 10;              % equivalent to your inp_neur
A = randi(20, T, 1); % generate random data

% 3lectrologos' solution (inefficient if T is large relative to K)
tic
for m = 1:M
    tmp = tril(toeplitz(A));
    newA1 = tmp(:, 1:K);
end
toc

% lagmatrix solution
tic
for m = 1:M
    newA2 = lagmatrix(A, 0:K-1);
    newA2(isnan(newA2)) = 0;
end
toc

% loop-based solution
tic
for m = 1:M
    newA3 = zeros(T, K);
    for k = 1:K
        newA3(k:end, k) = A(1:end-k+1);
    end
end
toc
Here's a two-liner:
tmp = tril(toeplitz(A));
newA = tmp(:, 1:inp_neur);