Read text files containing binary data as a single matrix in MATLAB

I have a text file which contains binary data in the following manner:
00000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111110000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000000000111111111111111111111111111111111111110000000011100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111100111110000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111110111110000000000000000000000000000000
00000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111000000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111110000011100000000000000000000000000000000
00000000000000000000000000000000000000000000011111111111111111111111111111111111100000000011100000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111110111100000000000000000000000000000000
Please note that each 1 or 0 is an independent value, i.e., the rows are not decimal numbers. I need to find the column-wise sum of the file. There are 125 columns in all and 840946 rows.
I have tried textread, fscanf and a few other MATLAB commands, but they all read each row as a single decimal number and create an 840946x1 array. I want to create an 840946x125 matrix so that I can compute a column-wise sum.

You can use textread to do it. Just read the lines as strings and then process them with sscanf, one digit at a time:
A = textread('data.txt', '%s');
ncols = size(A, 1);
nrows = size(A{1}, 2);
A = reshape(sscanf([A{:}], '%1d'), nrows, ncols);
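Note that A is now transposed, i.e. it has 125 rows (one per column of the file) and one column per line of the file.
The column-wise sum of the original file is therefore the sum along the second dimension of A:
colsum = sum(A, 2);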
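As a quick check of the orientation, here is a minimal sketch of the same sscanf approach on three short lines held in memory instead of a file. The variable names and the sample data are made up for illustration. Because each line of text ends up as one column of A, the per-column totals of the original data come from summing along dimension 2.

```matlab
% Hypothetical sample data standing in for the cell array textread returns.
txt    = {'0101'; '0111'; '1100'};
nlines = numel(txt);        % number of lines of text
width  = numel(txt{1});     % digits per line

% Concatenate all lines into one string and read it one digit at a time.
digits = sscanf([txt{:}], '%1d');

% Each column of A holds one line of text, so A is width-by-nlines.
A = reshape(digits, width, nlines);

% Sum across the lines to get one total per original column.
colsum = sum(A, 2).';
disp(colsum)
```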

Here's a slightly hack-ish approach:
A = textread('data.txt', '%s');
colsum = sum(cat(1,A{:})-'0')
Breakdown:
textread will read each line of 0's and 1's as a single string. A will therefore be a cell-string, with each element equal to a string of length 125.
cat(1,A{:}) will concatenate the cell string into a "normal" Matlab character array of size 840946-by-125.
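Subtracting the character '0' (ASCII value 48) from a character array consisting of 0's and 1's returns their numeric values, because MATLAB converts characters to their ASCII codes in arithmetic. For example, '1'-'0' = 49-48 = 1, just as 'a'-0 = 97, the ASCII value of lower-case 'a'.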
sum will finally sum over the columns of this array.
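The breakdown above can be tried on a small in-memory example. This is only a sketch: the cell array txt is hypothetical and stands in for what textread would return.

```matlab
% Hypothetical stand-in for the cell array of lines read from the file.
txt = {'0101'; '0111'; '1100'};

% Stack the strings into a 3-by-4 char array (all lines must be equally long).
C = cat(1, txt{:});

% Character arithmetic: '0' -> 0 and '1' -> 1, result is of class double.
bits = C - '0';

% Sum down each column.
colsum = sum(bits, 1);
disp(colsum)
```

One thing to keep in mind for the full file: the result of the subtraction is a double matrix, so 840946-by-125 values need about 840946*125*8 bytes, roughly 840 MB.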

Related

How to separate 0's and 1's in text document after 50 consecutive values by using MATLAB

I have a black and white image that I read into MATLAB. I
have saved the image to a file as binary 0's and 1's using the dlmwrite
command.
Now I want to implement an algorithm that puts the first 50
values in the first row, the next 50 values in the second row, the third
50 values in the next row, and so on. What should the code for this be? I
have written the following code, but there is an error in the logic.
a= 100;
b= 100;
for j= 0:length(a)
for i= 0:length(b)
dlmwrite('rd.txt','rd[i,j]','delimiter','b','c');
end
end
I need result like this
00000000000000000000000000000000000000000000000000 // 50 bits per line
00000000000000000000000000000000000000000000000000 // 50 bits per line
00000000000000000000000000000000000000000000000000 // 50 bits per line
.
.
So on
I'm assuming you have your 0s and 1s in an array, and want to somehow output them with a given column width. If that is the case, reshape can generally do what I believe you want, except that your data may not be evenly divisible by the width of your columns, i.e., 50. One way would be to first pad the data such that it is, e.g.
data = 'abcdefghijklmnopqrstuvwxyz';
colwidth = 6;
% padding data to fit evenly in full rows
data = [data, repmat(' ', 1, mod(-numel(data), colwidth))];
% reshaping into columns of given width
reshape(data, colwidth, [])'
This will give you
5×6 char array
'abcdef'
'ghijkl'
'mnopqr'
'stuvwx'
'yz    '
Or you can get the parts separately:
reshape(data(1:end-mod(numel(data), colwidth)), colwidth, [])'
data(end-mod(numel(data), colwidth)+1:end)
giving you
ans =
4×6 char array
'abcdef'
'ghijkl'
'mnopqr'
'stuvwx'
ans =
'yz'
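Applied to the question's case of 50 bits per line, the padding variant might look like the sketch below. The bit string is made up, and the pad length is computed with mod(-n, colwidth) so that no extra row is added when the length is already a multiple of 50.

```matlab
% Hypothetical bit string of 120 characters ('0101...').
bits     = repmat('01', 1, 60);
colwidth = 50;

% Number of blanks needed to reach the next multiple of colwidth (0 if none).
npad   = mod(-numel(bits), colwidth);
padded = [bits, repmat(' ', 1, npad)];

% reshape fills column by column, so build colwidth-by-N and transpose.
rows = reshape(padded, colwidth, []).';
disp(rows)
```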

Matlab fscanf read two column character/hex data from text file

Need to read data stored as two columns of hex values in the text file temp.dat into a MATLAB variable with 8 rows and two columns.
Would like to stick with the fscanf method.
temp.dat looks like this (8 rows, two columns):
0000 7FFF
30FB 7641
5A82 5A82
7641 30FB
7FFF 0000
7641 CF05
5A82 A57E
30FB 89BF
% Matlab code
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
% Matlab treats hex as a character string
formatSpec = '%s %s';
% Want the output variable to be 8 rows two columns
sizeA = [8,2];
A = fscanf(fid,formatSpec,sizeA)
fclose(fid);
Matlab is producing the following which I don't expect.
A = 8×8 char array
'03577753'
'00A6F6A0'
'0F84F48F'
'0B21F12B'
'77530CA8'
'F6A00F59'
'F48F007B'
'F12B05EF'
In another variation, I attempted changing the format string like this
formatSpec = '%4c %4c';
Which produced this output:
A =
8×10 char array
'0↵45 F7↵78'
'031A3F65E9'
'00↵80 4A↵B'
'0F52F0183F'
'7BA7B0C20 '
'F 86↵0F F '
'F724700AB '
'F6 1F↵55 '
Still another variation like this:
formatSpec = '%4c %4c';
sizeA = [8,16];
A = fscanf(fid,formatSpec);
Produces a one by 76 character array:
A =
'00007FFF
30FB 7641
5A82 5A827641 30FB
7FFF 0000
7641CF05
5A82 A57E
30FB 89BF'
Would like and expect Matlab to produce a workspace variable with 8 rows and 2 columns.
Have followed the example on the Matlab help area here:
https://www.mathworks.com/help/matlab/ref/fscanf.html
My Matlab code is based on the 'read file contents into an array' section about 1/3 of the way down the page. The example I reference is doing something very similar except that the two columns are one int and one float rather than two characters.
Running Matlab R2017a on Redhat.
Here is the complete code with the solution provided by Azim and comments about
what I learned as a result of posting the question.
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
formatSpec = '%9c\n';
% specify the output size as the input transposed, NOT the input.
sizeA = [9,8];
A = fscanf(fid,formatSpec,sizeA);
% A' is an 8 by 9 character array, which is the goal matrix size.
% B is an 8 by 1 cell array, each member has this format 'dead beef'.
%
% Cell arrays are data types with indexed data containers called cells,
% where each cell can contain any type of data.
B = cellstr(A');
% split divides each char vector in B at whitespace characters.
S = split(B)
fclose(fid)
S =
8×2 cell array
'0000' '7FFF'
'30FB' '7641'
'5A82' '5A82'
'7641' '30FB'
'7FFF' '0000'
'7641' 'CF05'
'5A82' 'A57E'
'30FB' '89BF'
Your 8x2 MATLAB variable will most likely end up being a cell array. This can be done in two steps.
First, your lines have 9 characters, so you could use formatSpec = '%9c\n' to read each line. Next you need to adjust the size parameter to read 9 rows and 8 columns: sizeA = [9 8]. This reads the 9 characters of each line into one column of the output; transposing the output will get you closer.
In the second step you need to convert the result of fscanf into your 8x2 cell array. Since you have R2017a you can then use cellstr and split to get your result.
Finally, if you need the integer values of each hex value you can use hex2dec on each cell in the cell-array.
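A minimal sketch of the second step and the hex2dec conversion, using a small char matrix in memory in place of the transposed fscanf output (only the first two lines of temp.dat are used here). It assumes R2016b or later for split, and uses cellfun so that the numeric result keeps the same 2-column shape as the cell array.

```matlab
% Stand-in for A' from fscanf: one row of characters per line of the file.
A = ['0000 7FFF'; '30FB 7641'];

% Convert the char matrix to a cell array with one char vector per line.
B = cellstr(A);

% Split every line at whitespace: gives a 2-by-2 cell array of hex strings.
S = split(B);

% Convert each hex string to its integer value, keeping the 2-by-2 layout.
V = cellfun(@hex2dec, S);
disp(V)
```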

How to split cell array values into two columns in MATLAB?

I have data in a cell array as shown in the variable viewer here:
{[2.13949546690144;56.9515770543056],
[1.98550875192835;50.4110852121618],
...}
I want to split it into two columns with two decimal-point numbers as:
2.13 56.95
1.98 50.41
by removing opening and closing braces and semicolons such as [;]
(to do as like "Text to columns" in Excel).
If your N-element cell array C has 2-by-1 numeric data in each cell, you can easily convert that into an N-by-2 numeric matrix M like so (using the round function to round each element to 2 decimal places):
M = round([C{:}].', 2);
The syntax C{:} creates a comma-separated list of the contents of C, equivalent to C{1}, C{2}, ... C{N}. These are all horizontally concatenated using [ ... ], then the result is transposed using .'.
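Here is a short sketch using the two cells shown in the question. Note that the question's sample output (2.13, 1.98) looks truncated rather than rounded, so a truncating variant based on fix is included as well; whether truncation is really what is wanted is an assumption.

```matlab
% The two cells from the question, each holding a 2-by-1 column vector.
C = {[2.13949546690144; 56.9515770543056], ...
     [1.98550875192835; 50.4110852121618]};

% Concatenate the columns side by side (2-by-N), then transpose to N-by-2.
M = round([C{:}].', 2);          % rounded to two decimal places

% Variant that cuts off after two decimals instead of rounding.
T = fix([C{:}].' * 100) / 100;

disp(M)
disp(T)
```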
% let's build a matching example...
c = cell(2,1);
c{1} = [2.13949546690144; 56.9515770543056];
c{2} = [1.98550875192835; 50.4110852121618];
% convert your cell array to a double array...
m = cell2mat(c);
% take the odd rows and place them to the left
% take the even rows and place them to the right
m = [m(1:2:end,:) m(2:2:end,:)];
% round the whole matrix to two decimal digits
m = round(m,2);
Depending on your environment settings, you may still see a lot of trailing zeros after the first two decimal digits, but don't worry, everything is fine from a precision point of view. If you want to display only the "real" digits of your numbers, use this command:
format short g;
You should use cell2mat:
A={2.14,1.99;56.95,50.41};
B=cell2mat(A);
As for the rounding, you can do:
B=round(100*B)/100;

Importing text file into matrix form with indexes as strings?

I'm new to Matlab so bear with me. I have a text file in this form :
b0002 b0003 999
b0002 b0004 999
b0002 b0261 800
I need to read this file and convert it into a matrix. The first and second column in the text file are analogous to row and column of a matrix(the indices). I have another text file with a list of all values of 'indices'. So it should be possible to create an empty matrix beforehand.
b0002
b0003
b0004
b0005
b0006
b0007
b0008
Is there any way to access matrix elements using custom string indices (I doubt it, but just wondering)? If not, I'm guessing the only way to do this is to assign the first row and first column the index string values and then assign the third column values based on the first text file. Can anyone help me with that?
You can easily convert those strings to numbers and then use those as indices. For a given string, b0002:
s = 'b0002'
str2num(s(2:end)) % output = 2
Furthermore, you can also do this with a char matrix:
t = ['b0002';'b0003';'b0004']
t =
b0002
b0003
b0004
str2num(t(:,2:end))
ans =
2
3
4
First, we use textscan to read the data in as two strings and a float (other numerical formats could be used). We have to open the file for reading first.
fid = fopen('myfile.txt');
A = textscan(fid,'%s%s%f');
textscan returns a cell array, so we have to extract your three variables. x and y are converted to single char arrays using cell2mat (works only if all the strings inside are the same length), n is a list of numbers.
x = cell2mat(A{1});
y = cell2mat(A{2});
n = A{3};
We can now convert x and y to numbers by taking every row (:) but only the second to final characters of each row (2:end), e.g. 0002, 0003, not b0002, b0003.
x = str2num(x(:,2:end));
y = str2num(y(:,2:end));
Slight problem with indexing - if I have a matrix A and I do this:
A = magic(8);
A([1,5],[3,8])
Then it returns four elements - [1,3],[5,3],[1,8],[5,8] - not two. But what you want is the location in your matrix equivalent to x(1),y(1) to be set to n(1) and so on. To do this, we need to 1) work out the final size of matrix. 2) use sub2ind to calculate the right locations.
% find the size
% if you have a specified size you want the output to be use that instead
xsize = max(x);
ysize = max(y);
% initialise the output matrix - not always necessary but good practice
out = zeros(xsize,ysize);
% turn our x,y into linear indices
ind = sub2ind([xsize,ysize],x,y);
% put our numbers in our output matrix
out(ind) = n;
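To make the indexing point concrete, here is a small self-contained sketch. The first part contrasts subscript-pair indexing with sub2ind on magic(8); the second part runs the same steps on the three rows from the question, held in memory rather than read with textscan. The variable names rowLabels and colLabels are made up for illustration.

```matlab
% Indexing with two index vectors returns every combination (a 2-by-2 block).
A     = magic(8);
block = A([1,5], [3,8]);

% sub2ind pairs the subscripts up, giving just A(1,3) and A(5,8).
pair = A(sub2ind(size(A), [1,5], [3,8]));

% The three rows from the question (hypothetical in-memory copy).
rowLabels = {'b0002'; 'b0002'; 'b0002'};
colLabels = {'b0003'; 'b0004'; 'b0261'};
n         = [999; 999; 800];

% Drop the leading 'b' and convert the remaining digits to numbers.
x = cellfun(@(s) str2double(s(2:end)), rowLabels);
y = cellfun(@(s) str2double(s(2:end)), colLabels);

% Fill the output matrix at the paired (x, y) locations.
out = zeros(max(x), max(y));
out(sub2ind(size(out), x, y)) = n;
```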

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of MATLAB have a strsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into user ids and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing each rating in the correct place in the matrix.
(I didn't see the previous answer when I opened this for some reason. It is more sophisticated than this one, though this one may be a little easier to follow at the expense of speed.) I throw the \s* into the regex in case the spacing is inconsistent, but otherwise don't perform much in the way of data sanity checking. The output is the full array, which you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end
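For comparison, the same regexp tokens can feed sparse directly instead of filling a full array. The following is only a sketch on two lines held in memory (the example file from the other answer); it assumes strsplit is available (R2013a or later) and that every line has the form "id: (item,rating),...".

```matlab
% Hypothetical in-memory copy of the example file, one line per cell.
txt = {'1: (1,3),(2,4),(3,5)'; '10: (1,1),(2,2)'};
expression = '\((\d+)\s*,\s*(\d+)\)';

rows = []; cols = []; vals = [];
for k = 1:numel(txt)
    % Separate the user id from the list of tuples.
    parts = strsplit(txt{k}, ':');
    uid   = str2double(parts{1});

    % One token pair per (item,rating) tuple; stack them into an n-by-2 cell.
    tok  = regexp(parts{2}, expression, 'tokens');
    nums = str2double(vertcat(tok{:}));

    % Collect row, column and value triplets for sparse.
    rows = [rows; repmat(uid, size(nums,1), 1)];
    cols = [cols; nums(:,1)];
    vals = [vals; nums(:,2)];
end

% Build the sparse matrix from the triplets.
M = sparse(rows, cols, vals);
disp(M)
```

If the number of users and items is known in advance, sparse(rows, cols, vals, num_users, num_items) fixes the matrix size instead of letting it be inferred from the largest ids.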