Import a big txt file into MATLAB without cutting it

I have a text file of size 40000 x 40000, called for example "Name". The first row and first column are labels, and all other entries are decimal numbers.
I need to save it to a MAT-file in MATLAB without the labels, i.e. without the first row and first column.
I have tried the following method:
data = importdata('Name.txt');
save data.mat -v7.3
But the data gets cut to only 590 x 590, i.e. only part of the file is imported.
How can I save the whole data to a MAT-file in MATLAB?
EDIT
I also tried this way:
M = readmatrix('Name.txt');
M(:,1) = [];
It reads all 40000 rows, but only the first 587 columns!!

First, find out where the error happens: is your data read correctly?
The following solution should work in any case:
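fID = fopen('Name.txt');
headerline = fgetl(fID);   % read and discard the label row
% '%s' reads the row label; adjust 40000 to the number of numeric columns
C = textscan(fID, ['%s' repmat('%f', 1, 40000)], 'Delimiter', '\r');
fclose(fID);
dMat = [C{2:end}];         % drop the label column and concatenate the numbers
save('Data.mat', 'dMat', '-v7.3')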
If this doesn't work, you can also try reading the data with:
T = readtable('Name.txt')
and then have a look at the resulting table.
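A small self-contained sketch of the same textscan idea, which derives the number of numeric columns from the label row instead of hard-coding 40000. It assumes a hypothetical whitespace-separated layout in which the label row also has a corner entry, so that the number of header tokens equals the total number of columns; the sample file is generated on the fly.

```matlab
% Hypothetical sample in the assumed layout: a label row (with a corner
% entry) followed by rows that each start with a label
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'id c1 c2 c3\n');
fprintf(fid, 'r1 0.1 0.2 0.3\nr2 0.4 0.5 0.6\n');
fclose(fid);

fid = fopen(fname);
header = fgetl(fid);                          % the label row
nNum = numel(strsplit(strtrim(header))) - 1;  % all columns minus the label column
% %*s reads and discards the row label; one %f per numeric column
C = textscan(fid, ['%*s' repmat('%f', 1, nNum)]);
fclose(fid);
delete(fname);

dMat = [C{:}];    % numbers only, without the label row and label column
```

For the real file, follow this with save('Data.mat', 'dMat', '-v7.3') as in the answer; the -v7.3 MAT-file format is needed for variables larger than 2 GB.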


String vector to array

I am trying to make a script in MATLAB that pulls data from a file and generates an array of data. Since the data is a string, I've tried to split it into columns, take the transpose, and split it into columns again to populate an array.
When I run the script I don't get any errors, but I also don't get any useful data. When I display the final variable (FullArray), I get {1×4 cell} 8 times. When I try to use strsplit I get the error:
'Error using strsplit (line 80) First input must be either a character vector or a string scalar.'
I'm pretty new to MATLAB, and after reading through similar threads and the documentation I honestly have no idea how to fix it. I've attached the code and the data to read in below. Thank you.
clear
File_Name = uigetfile;                            % Brings up a file browser to locate the .xyz file
Open_File = fopen(File_Name);                     % Opens the file given by File_Name
File2Vector = fscanf(Open_File, '%s');            % Reads the file into a 1xN char vector (whitespace is dropped)
fclose(Open_File);
Vector2ColumnArray = strsplit(File2Vector, ';');  % Splits the char vector at each ';' into a 1xN cell array
Transpose = transpose(Vector2ColumnArray);        % Turns the 1xN row cell array into an Nx1 column
FullArray = regexp(Transpose, ',', 'split');      % Splits each line at ',' into a cell of strings
The data I am trying to read comes from a .xyz file that I have named methylformate.xyz; here is the data:
O2,-0.23799,0.65588,-0.69492;
O1,0.50665,0.83915,1.47685;
C2,-0.32101,2.08033,-0.75096;
C1,0.19676,0.17984,0.49796;
H4,0.66596,2.52843,-0.59862;
H3,-0.67826,2.36025,-1.74587;
H2,-1.03479,2.45249,-0.00927;
H1,0.23043,-0.91981,0.45346;
When I started using Matlab I also had problems with the data structure. The last line
FullArray = regexp(Transpose, ',', 'split');
splits each line and stores it in a cell array. In order to access the individual strings you have to index with curly brackets into FullArray:
FullArray{1}{1} % -> 'O2'
FullArray{1}{2} % -> '-0.23799'
FullArray{2}{1} % -> 'O1'
FullArray{2}{2} % -> '0.50665'
Here the first index selects the line and the second selects the element within that line.
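To turn that nested cell array into something easier to work with, here is a short sketch that pulls out the labels and converts the coordinates to a numeric matrix. It hard-codes the first two lines of methylformate.xyz in the shape the question's regexp call produces, and it assumes every entry of FullArray has exactly four fields.

```matlab
% First two lines of methylformate.xyz, split the way
% regexp(Transpose, ',', 'split') returns them: a column cell of 1x4 cells
FullArray = {{'O2', '-0.23799', '0.65588', '-0.69492'}; ...
             {'O1', '0.50665', '0.83915', '1.47685'}};

% Atom labels: the first field of every line
labels = cellfun(@(row) row{1}, FullArray, 'UniformOutput', false);

% Coordinates: str2double converts fields 2-4 of each line to a 1x3 double;
% cell2mat stacks those rows into an N-by-3 matrix
coords = cell2mat(cellfun(@(row) str2double(row(2:4)), FullArray, ...
                          'UniformOutput', false));
```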
However, MATLAB also has easier functions that parse text files for you.
Usually, the easiest function for reading mixed data is readtable.
data = readtable('methylformate.txt');
However, in your case this is more complex because:
1) readtable does not recognize the .xyz extension, so you'd have to copy the file to .txt (or pass 'FileType','text');
2) the semicolons confuse the import and make the last column characters instead of numbers.
Instead, you can loop through each line and use textscan like so:
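fid = fopen('methylformate.xyz');
tline = fgetl(fid);                  % first line (fgetl returns -1 at end of file)
myoutput = cell(0,4);
while ischar(tline)
    % label and three coordinates; %*[^\n] discards the trailing ';'
    myoutput(end+1,:) = textscan(tline, '%s %f %f %f %*[^\n]', 'Delimiter', ',');
    tline = fgetl(fid);
end
fclose(fid);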
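The output is an N-by-4 cell array: the first column holds the atom labels (each as a 1x1 cell containing the string), and the other three columns hold the coordinates as doubles.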
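If the per-line loop turns out to be slow on larger files, a single textscan call over the whole file should also work, because textscan reapplies the format to every line. This is a swapped-in variant rather than the answer's method, sketched with a two-line sample in the question's layout written to a temporary file:

```matlab
% Write a small sample in the .xyz layout from the question (temporary file)
fname = [tempname '.xyz'];
fid = fopen(fname, 'w');
fprintf(fid, 'O2,-0.23799,0.65588,-0.69492;\nO1,0.50665,0.83915,1.47685;\n');
fclose(fid);

% One textscan call for the whole file: the format is reused for every line,
% and %*[^\n] discards the trailing semicolon up to the end of the line
fid = fopen(fname);
C = textscan(fid, '%s %f %f %f %*[^\n]', 'Delimiter', ',');
fclose(fid);
delete(fname);

labels = C{1};              % cell array of atom labels
coords = [C{2} C{3} C{4}];  % N-by-3 numeric matrix
```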

How to extract one combination of two elements in a cell at a time

I want to extract one row at a time in MATLAB from a .txt file like the following.
29,Downstairs,9481262431000,3.79,8.2,5.6252036;
29,Downstairs,9481312266000,2.96,7.08,4.1814466;
13,Walking,1047162303000,-2.41,5.18,1.5390993;
13,Walking,1047212260000,-0.3,1.73,-0.50395286;
13,Walking,1047262309000,1.27,11.03,2.5606253;
13,Walking,1047312266000,-1.42,14.75,8.158588;
14,Jogging,60423222332000,13.82,-4.37,12.64;
14,Jogging,60423272319000,14.33,7.08,-2.3;
14,Jogging,60423322338000,19.42,19.46,-7.59;
This is a portion of the whole file. I need to extract every combination that the first two columns can have. For example, I extract all the rows containing
29,Downstairs
Then
13,Walking
Then
14,Jogging
and so on.
Is there an easy way to express all the combinations I want to traverse? The .txt file is huge: it contains 36*6 = 216 different combinations (the first column contains numbers from 1 to 36, the second column contains 6 activities). I want to store the rows sharing the same combination in a cell structure (usually one combination has more than 500 entries, unlike 29,Downstairs in the example, which has only 2).
Assume you have all your data in a file called myFile.txt, then:
fid = fopen('myFile.txt');
A = textscan(fid,'%s');          % one cell per line (the lines contain no spaces)
fclose(fid);
lines = A{1};
allInstances = cell(length(lines),1);
for i = 1:length(lines)
    % split the line at commas and keep the first two fields as the key
    temp = textscan(lines{i}, '%s', 'delimiter', ',');
    allInstances{i} = [temp{1}{1}, ',',temp{1}{2}];
end
uniqueCombos = unique(allInstances)
Result:
uniqueCombos =
'13,Walking'
'14,Jogging'
'29,Downstairs'
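The answer stops at listing the unique keys, while the question also asks to store the rows of each combination in a cell structure. One way to do that is to use the third output of unique, which maps every line to its key. In this sketch a few lines from the question are hard-coded in place of the A{1} read from myFile.txt:

```matlab
% A few lines from the question, standing in for A{1} from textscan
lines = {'29,Downstairs,9481262431000,3.79,8.2,5.6252036;'; ...
         '29,Downstairs,9481312266000,2.96,7.08,4.1814466;'; ...
         '13,Walking,1047162303000,-2.41,5.18,1.5390993;'; ...
         '13,Walking,1047212260000,-0.3,1.73,-0.50395286;'; ...
         '14,Jogging,60423222332000,13.82,-4.37,12.64;'};

% Key for each line: its first two comma-separated fields
keys = cell(size(lines));
for i = 1:numel(lines)
    parts = strsplit(lines{i}, ',');
    keys{i} = [parts{1} ',' parts{2}];
end

% idx(i) is the position of keys{i} within the sorted list uniqueCombos
[uniqueCombos, ~, idx] = unique(keys);

% groups{k} holds every line whose key is uniqueCombos{k}
groups = cell(numel(uniqueCombos), 1);
for k = 1:numel(uniqueCombos)
    groups{k} = lines(idx == k);
end
```

Each groups{k} can then be parsed further (for example with textscan) once all rows of that combination are collected.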

Performing logical OR between multiple CSVs with 32 bit hex values using MATLAB

I am trying to read multiple (50+) CSV files within the same folder using MATLAB. These CSVs contain three 32-bit hex values per row, and the format of the data is the same for all files. Each CSV contains the data in 2 rows and 3 columns with no headers. For example:
00000800,D404002C,4447538F
000008FF,D404002C,4447538F
After ORing the 2 rows from all files, the final 2 rows of three 32-bit hex values need to be written out to a CSV.
Before jumping in the deep end and processing multiple files, I have started by just trying to OR row 1 with row 2 of the same file, i.e. 00000800 | 000008FF, D404002C | D404002C, and so on. I have been able to convert them to binary and do a logical OR between the 3 values; however, I currently have the following issues:
1) If the most significant hex digit is 3 or 4 (binary 0011 or 0100), the leading 0s are lost; likewise, if the second hex value happens to be 800, the leading 00000s are lost.
2) I cannot convert the integer cell array back to hex.
I have seen many posts on Stack Overflow and MATLAB Central about reading CSVs in MATLAB, separating the data, etc., but I have not been able to apply any of them to my issue. Any help would be much appreciated. Below is what I have so far:
fid = fopen('File1.csv');
c = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
contents = c{1};
row1 = strsplit(contents{1},',','CollapseDelimiters',0);
row2 = strsplit(contents{2},',','CollapseDelimiters',0);
x = 1;
y = 1;
while x <= length(row1)
    column1{x} = hex2dec(row1(x));
    column2{x} = hex2dec(row2(x));
    x = x + 1;
end
while y <= length(column1)
    bin1{y} = zeros(1,32);
    bin2{y} = zeros(1,32);
    bin1{y} = dec2bin(column1{y});
    bin2{y} = dec2bin(column2{y});
    result{y} = bitor(bitget(uint8(bin1{y}),1),bitget(uint8(bin2{y}),1));
    y = y + 1;
end
Eventually I need to be able to do this with multiple CSVs, so I have attached links to File1.csv and File2.csv in case someone wants to try ORing row 1 of File1.csv with row 2 of File2.csv, and so on.
Apologies if I have missed anything. Please leave a comment and I'll try to explain further.
Thanks!
EDIT: Hope the image below explains what I am trying to do better.
You can try the following approach:
1) use the dir function to get the list of files to be processed;
2) create a loop that goes through those files. In the loop:
   a) read the input file;
   b) convert the hexadecimal values read from the file into a matrix of characters using the char function;
   c) convert the data stored in the char matrix from hex to decimal and then to uint32, using the functions hex2dec and uint32;
   d) perform the OR using the bitor function;
   e) go to the next iteration;
3) at the end of the loop, write the output.
The approach described above has been implemented in the following code:
% Get the list of CSV files (adjust the pattern to match your file names)
hex_files = dir('File*.csv');
% Open the output file
fp_out = fopen('new_hex_file.csv', 'wt');
% Loop over the CSV files
for i = 1:length(hex_files)
    % Read the i-th CSV file
    fid = fopen(hex_files(i).name);
    c = textscan(fid, '%s', 'Delimiter', '\n');
    fclose(fid);
    % Get the 2 rows as 3x8 char matrices (one hex value per row)
    contents = c{1};
    row_1 = char(strsplit(contents{1}, ',', 'CollapseDelimiters', 0));
    row_2 = char(strsplit(contents{2}, ',', 'CollapseDelimiters', 0));
    % Convert from hex to uint32
    row_d_1 = uint32(hex2dec(row_1));
    row_d_2 = uint32(hex2dec(row_2));
    if i == 1
        % Store the rows of the first file
        tmp_var_1 = row_d_1;
        tmp_var_2 = row_d_2;
    else
        % OR the rows into the running result
        tmp_var_1 = bitor(tmp_var_1, row_d_1);
        tmp_var_2 = bitor(tmp_var_2, row_d_2);
    end
end
% Write the OR values into the new file (%08X keeps the leading zeros)
fprintf(fp_out, '%08X,%08X,%08X\n', tmp_var_1);
fprintf(fp_out, '%08X,%08X,%08X\n', tmp_var_2);
% Close the output file
fclose(fp_out);
The following input files have been used to test it:
File1.csv
00000800,D404002C,4447538F
000008FF,D404002C,4447538F
File2.csv
000008FF,D404DD2C,49475115
11100800,D411EC2C,3ACD1266
File3.csv
123456FF,ABCDEF2C,369ABC15
01012369,00110033,36936966
The output is:
12345EFF,FFCDFF2C,7FDFFF9F
11112BFF,D415EC3F,7EDF7BEF
Hope this helps.
Qapla'
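The two issues from the question can also be handled directly. A short sketch, with the two rows of File1.csv hard-coded: giving dec2bin a minimum width of 32 keeps the leading zeros, and dec2hex with a minimum width of 8 converts the OR result back to hex (the answer's fprintf with %08X does the same job).

```matlab
row1 = {'00000800', 'D404002C', '4447538F'};
row2 = {'000008FF', 'D404002C', '4447538F'};

% char turns each row into a 3x8 char matrix; hex2dec then returns a 3x1
% column of doubles, and uint32 makes them 32-bit words for bitor
a = uint32(hex2dec(char(row1)));
b = uint32(hex2dec(char(row2)));
c = bitor(a, b);

% Issue 1: dec2bin drops leading zeros unless a minimum width is given
bits = dec2bin(double(c), 32);      % 3x32 char matrix, zero-padded

% Issue 2: back to hex, padded to 8 digits, joined into one CSV line
hexOut = dec2hex(double(c), 8);     % 3x8 char matrix
csvLine = strjoin(cellstr(hexOut)', ',');
```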

Importing large amount of data into MATLAB?

I have a text file that is ~80 MB. It has 2 columns and around 6e6 rows. I would like to import the data into MATLAB, but it is too much data for the load function. I have been playing around with the fopen function but can't get anything to work.
Ideally I would like to import the first column of data and eventually have it in one large array in MATLAB. If that isn't possible, I would like to split it into arrays of length 34,013. I would also like to do the same for the 2nd column.
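fileID = fopen('yourfilename.txt');
formatSpec = '%f %f';
chunks = {};
while ~feof(fileID)
    C = textscan(fileID, formatSpec, 34013);   % read the next 34013 rows
    chunks{end+1} = [C{1} C{2}];               % keep each block instead of overwriting it
end
fclose(fileID);
Hope this helps.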
Edit:
The reason you are getting an error is that C is a 1-by-2 cell array with one cell per column, so you need to extract the columns individually with curly braces and handle them.
For example (301 x 113 = 34013):
column1data = reshape(C{1},301,113);
column2data = reshape(C{2},301,113);
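A self-contained sketch of the block-wise reading that keeps every block and finally concatenates them into one long array per column, which is what the question ideally asks for. The sample file and the block size of 4 are hypothetical stand-ins for the real 6e6-row file and 34013.

```matlab
% Small synthetic two-column file standing in for the ~80 MB one
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%f %f\n', [1:10; 11:20]);   % 10 rows: (1,11), (2,12), ...
fclose(fid);

blockSize = 4;            % rows per textscan call (34013 in the question)
col1 = {};
col2 = {};
fid = fopen(fname);
while ~feof(fid)
    C = textscan(fid, '%f %f', blockSize);
    col1{end+1} = C{1};   % keep every block instead of overwriting C
    col2{end+1} = C{2};
end
fclose(fid);
delete(fname);

col1 = vertcat(col1{:});  % one long column vector per file column
col2 = vertcat(col2{:});
```

Keeping the blocks in a cell array and concatenating once at the end avoids growing a numeric array inside the loop.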
You may also consider converting your file to a binary format if the data file does not change between loads; it will then load much faster.
Or you can do a "transparent binary conversion" as in the function below. Only the first load of the data will be slow; all subsequent loads will be fast.
function Data = ReadTextFile(FileName,NColumns)
    MatFileName = sprintf('%s.mat',FileName);   % binary file name
    if exist(MatFileName,'file')==2              % if it exists,
        S = load(MatFileName,'Data');            % load it instead of
        Data = S.Data;                           % the original text file
        return;
    end
    fh = fopen(FileName);   % if the binary file does not exist, load the data from the original text file
    fh_closer = onCleanup( @() fclose(fh) );     % the file will be closed properly even in case of an error
    Data = fscanf(fh, repmat('%f ',1,NColumns), [NColumns,inf]);
    Data = Data';
    save(MatFileName,'Data');   % and make a binary "cache" of the original data for faster subsequent reading
end
Do not forget to remove the MAT file when the original data file is changed.
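If you prefer a raw binary file to a MAT-file, fwrite and fread can play the same caching role. A minimal sketch with a small hypothetical matrix; it relies on fwrite storing the matrix column by column, so the number of columns (2 here) must be known when reading it back:

```matlab
Data = [(1:5)' (11:15)'];      % hypothetical N-by-2 data already parsed from text
binName = [tempname '.bin'];

% Write once: fwrite stores the matrix column by column as raw doubles
fid = fopen(binName, 'w');
fwrite(fid, Data, 'double');
fclose(fid);

% Later loads: read all values back and restore the N-by-2 shape
fid = fopen(binName, 'r');
raw = fread(fid, inf, 'double');
fclose(fid);
delete(binName);
Loaded = reshape(raw, [], 2);
```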

Textscan until end of line

I'm trying to textscan a file and read a single line to its end, independently of the number of elements in that line.
My file is a .txt file formatted like this :
602,598,302,456,1023,523,....
293,291,566,331,987,56,....
589,202,429,2911,294,567,...
And so on. I have the number of the line, and all lines have the same number of elements, but it can vary from one file to another.
I wrote something like:
fid = fopen('somefile.txt');
C = textscan(fid, formatSpec,'HeaderLines',Row-1);
TheLine = C{1};
fclose(fid);
X = numel(TheLine);
plot(1:X,TheLine);
I really don't know what to put in the formatSpec field. I've tried a few things along the lines of %[^\n], but without much success.
Try this -
C = textscan(fid, '%d,','HeaderLines',Row-1);
Row will specify the row of data that you want to extract from the text file.
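If you would rather not depend on how textscan handles the end of the line, a more explicit sketch reads the target line with fgetl and splits it at the commas, so the number of elements is whatever that line contains. The sample file below is hypothetical and uses shortened lines from the question:

```matlab
% Hypothetical sample file in the layout from the question
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '602,598,302,456\n293,291,566,331\n589,202,429,2911\n');
fclose(fid);

Row = 2;                         % line to extract (1-based)
fid = fopen(fname);
for k = 1:Row-1
    fgetl(fid);                  % skip the lines before the target
end
txt = fgetl(fid);                % the whole target line as a char vector
fclose(fid);
delete(fname);

% 1-by-N double, where N is the number of elements on that line
TheLine = str2double(strsplit(txt, ','));
```

The result can then be plotted with plot(1:numel(TheLine), TheLine) as in the question.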