Saving common strings to a new text file in MATLAB

To find the file names common to several lists, I've used ismember():
file1 = {'DSC01605.bmp';'Hampi8.bmp';'DSC01633.bmp';...
'DSC01198.bmp';'DSC01619.bmp'}
file2 = {'DSC01605.bmp';'Hampi8.bmp';'DSC01633.bmp'}
file3 = {'DSC01605.bmp';'Hampi8.bmp'}
matching12 = ismember(file1, file2)
matching13 = ismember(file1, file3)
matchesAll3 = matching12 & matching13
allMatchingStrings = file1(matchesAll3)
Now allMatchingStrings contains
'DSC01605.bmp'
'Hampi8.bmp'
How can I write these file names to a new text file, all.txt? The catch is that allMatchingStrings may contain around 10 file names, but I need only 5 of them, and I need to save just those 5 to the new text file (say all.txt). How can I do that?

A quick way to write them to disk is with the fprintf command.
fid = fopen('all.txt', 'w');
fprintf(fid, '%s\n', allMatchingStrings{:});
fclose(fid);
If you only wanted to write the first 2 file names in allMatchingStrings, you could limit the output like this:
filenamesIWant = 1:2;
fid = fopen('all2.txt', 'w');
fprintf(fid, '%s\n', allMatchingStrings{filenamesIWant});
fclose(fid);
This works because fprintf reuses its format string for each remaining argument, and allMatchingStrings{...} expands into one argument per string. The only trick is getting the curly brackets in the right place.
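Here is a minimal sketch that picks the subset with a logical mask instead of explicit indices. It assumes you can express which names you want as a condition. The example names come from the question. The selection rule (names starting with 'DSC') and the temporary output path standing in for all.txt are hypothetical. The sketch also checks that fopen succeeded, because fopen returns -1 rather than raising an error when it cannot open the file.

```matlab
% Example list standing in for allMatchingStrings (names from the question).
allMatchingStrings = {'DSC01605.bmp'; 'Hampi8.bmp'; 'DSC01633.bmp'};

% Hypothetical selection rule: keep only names starting with 'DSC'.
keep = strncmp(allMatchingStrings, 'DSC', 3);

outFile = [tempname '.txt'];       % stand-in for 'all.txt'
fid = fopen(outFile, 'w');
if fid == -1
    error('Could not open %s for writing.', outFile);
end
fprintf(fid, '%s\n', allMatchingStrings{keep});   % one name per line
fclose(fid);

% Read the file back (as a column cell array) to check what was written.
written = regexp(strtrim(fileread(outFile)), '\n', 'split')';
delete(outFile);
```

The logical mask works anywhere the index vector filenamesIWant does, so you can build it from whatever criterion decides which 5 of the 10 files you need.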

Related

Matlab Error: ()-indexing must appear last in an index expression

I have this code and want to write an array to a tab-delimited txt file:
fid = fopen('oo.txt', 'wt+');
for x = 1 :length(s)
fprintf(fid, '%s\t\n', s(x)(1)) ;
end;
fclose(fid);
but I receive this error:
Error: ()-indexing must appear last in an index expression.
How should I index s(x)(1)? s is a cell array:
s <2196017x1 cell>
When I use the code below I get no error, but it writes individual characters rather than words:
fprintf(fid, '%s\t\n', ( s{x}{1})) ;
In MATLAB, you cannot chain () indexing directly onto the result of another indexing expression or function call (as in s(x)(1)) without first assigning that result to a temporary variable. Octave does allow this. MATLAB disallows it because of the syntactic ambiguities it would introduce.
tmp = s(x);
fprintf(fid, '%s\t\n', tmp(1)) ;
There are some workarounds, but they aren't pretty.
It is unclear exactly what your data structure is, but s looks like a cell array, so you should really be using {} indexing to access its contents:
fprintf(fid, '%s\t\n', s{x});
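This small sketch shows the difference between () and {} on a cell array. It uses a made-up three-word cell array in place of s. It also shows that adding (1) after {} indexes into the word itself and returns only its first character.

```matlab
% A small cell array of words standing in for s (hypothetical data).
s = {'alpha'; 'beta'; 'gamma'};

c  = s(2);      % () indexing returns a 1x1 cell that contains 'beta'
w  = s{2};      % {} indexing returns the char array 'beta' itself
ch = s{2}(1);   % () after {} indexes into the char array: first character 'b'

% Write every word on its own line, tab-terminated as in the question.
outFile = [tempname '.txt'];
fid = fopen(outFile, 'w');
for x = 1:numel(s)
    fprintf(fid, '%s\t\n', s{x});
end
fclose(fid);
written = fileread(outFile);
delete(outFile);
```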
Update
If you're trying to read individual words in from your input file and then write those out to a tab-delimited file, I'd probably do something like the following:
fid = fopen('input.txt', 'r');
contents = fread(fid, '*char')';
fclose(fid);
% Break the text into words, yielding a cell array of strings
% (strtrim avoids empty "words" from leading/trailing whitespace)
words = regexp(strtrim(contents), '\s+', 'split');
% Write the words out to a file, separated by tabs
fout = fopen('output.tsv', 'w');
fprintf(fout, '%s\t', words{:});
fclose(fout);
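The fprintf call above leaves a tab after the last word. If that is a problem, one option is to join the words with strjoin and write a single line. This sketch uses an invented string in place of the file contents. The delimiter is built with sprintf('\t') so that it is a real tab character.

```matlab
% Stand-in for the text read with fread (hypothetical contents).
contents = sprintf('  the quick\tbrown\n fox  \n');

% Trim first so leading/trailing whitespace does not produce empty "words".
words = regexp(strtrim(contents), '\s+', 'split');

% Join with real tab characters: no trailing tab after the last word.
joined = strjoin(words, sprintf('\t'));
% To save it: fprintf(fout, '%s\n', joined);
```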

Read data to matlab with for loop

I want to read the data from a file of about 60 MB into some MATLAB variables, but I get errors. This is my code:
clear all ;
clc ;
% Reading Input File
Dataz = importdata('leak0.lis');
%Dataz = load('leak0.lis');
for k = 1:1370
foundPosition = 1 ;
for i=1:size(Dataz,1)
strp = sprintf('I%dz=',k);
fprintf(strp);
findValue = strfind(Dataz{i}, strp) ;
if ~isempty(findValue)
eval_param = strp + '(foundPosition) = sscanf(Dataz{i},''%*c%*c%*f%*c%*c%f'') ;';
disp(eval_param);
% str(foundPosition) = sscanf(Dataz{i},'%*c%*c%*f%*c%*c%f') ;
eval(eval_param);
foundPosition = foundPosition + 1 ;
end
end
end
When I debugged it, I found that Dataz is empty, so the loop never processes any lines. I replaced importdata with fopen, load, etc., but that didn't work either.
According to the MATLAB help files, importdata is likely failing because it doesn't understand your file format.
From the help files
Name and extension of the file to import, specified as a string. If importdata recognizes the file extension, it calls the MATLAB helper function designed to import the associated file format (such as load for MAT-files or xlsread for spreadsheets). Otherwise, importdata interprets the file as a delimited ASCII file.
For ASCII files and spreadsheets, importdata expects to find numeric data in a rectangular form (that is, like a matrix). Text headers can appear above or to the left of the numeric data.
Assuming your .lis files actually contain delimited text, you should adjust the delimiter (and the number of header lines) in the importdata call so that MATLAB can understand your file:
filename = 'myfile01.txt';
delimiterIn = ' ';
headerlinesIn = 1;
A = importdata(filename,delimiterIn,headerlinesIn);
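If the file is not a rectangular numeric table, another option is to skip importdata and read each line as a string. That is all the loop in the question needs, because it only uses Dataz{i} with strfind. This sketch assumes a plain-text file. The file contents below are invented for illustration.

```matlab
% Create a small stand-in for leak0.lis (hypothetical content).
fname = [tempname '.lis'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line\nI1z= 0.5\nI2z= 1.5\n');
fclose(fid);

% Read the whole file and split it into lines ('\r?\n' also handles CRLF endings).
Dataz = regexp(fileread(fname), '\r?\n', 'split')';
if ~isempty(Dataz) && isempty(Dataz{end})
    Dataz(end) = [];    % drop the empty entry left by a trailing newline
end
delete(fname);

% Dataz{i} is now the i-th line, so strfind(Dataz{i}, 'I1z=') works as in the question.
hits = find(~cellfun(@isempty, strfind(Dataz, 'I1z=')));
```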

Read textfile with a mix of floats, integers and strings in the same column

Loading a well-formatted, delimited text file in MATLAB is relatively simple, but I'm struggling with a text file I have to read in. Sadly, I cannot change the structure of the source file, so I have to deal with what I have.
The basic file structure is:
123 180 (two integers, white space delimited)
1.5674e-8
.
.
(floating point numbers in column 1, column 2 empty)
.
.
100 4501 (another two integers)
5.3456e-4 (followed by even more floating point numbers)
.
.
.
.
45 String (an integer in column 1, a string in column 2)
.
.
.
A simple call such as
[data1,data2]=textread('filename.txt','%f %s', ...
'emptyvalue', NaN)
does not work.
How can I properly filter the input data? All examples I found online and in the Matlab help so far deal with well structured data, so I am a bit lost at where to start.
Because I have to read a whole bunch of these files (>100), I'd rather not iterate through every single line in every file. I hope there is a much faster approach.
EDIT:
I made a sample file available here: test.txt (google drive)
I've looked at the text file you supplied and tried to draw a few general conclusions:
When there are two integers on a line, the second integer gives the number of rows that follow.
You always have two integers (A, B) followed by B floats, and this pattern occurs twice.
After that you have some free-form text (or at least, I couldn't deduce anything useful about the format after that).
This is a messy format so I doubt there are going to be any nice solutions. Some useful general principles are:
Use fgetl when you need to read a single line (it reads up to the next newline character)
Use textscan when it's possible to read multiple lines at once - it is much faster than reading a line at a time. It has many options for how to parse, which it is worth getting to know (I recommend typing doc textscan and reading the entire thing).
If in doubt, just read the lines in as strings and then analyse them in MATLAB.
With that in mind, here is a simple parser for your files. It will probably need some modifications as you infer more about the structure of the files, but it is reasonably fast on the ~700-line test file you gave.
I've just given the variables dummy names like "a", "b", "floats" etc. You should change them to something more specific to your needs.
function output = readTestFile(filename)
fid = fopen(filename, 'r');
% Read the first line
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
a = nums{1}(1);
b = nums{1}(2);
% Read 'b' of the next lines:
contents = textscan(fid, '%f', b);
floats1 = contents{1};
% Read the next line:
line = '';
while isempty(line)
line = fgetl(fid);
end
nums = textscan(line, '%d %d', 'CollectOutput', 1);
c = nums{1}(1);
d = nums{1}(2);
% Read 'd' of the next lines:
contents = textscan(fid, '%f', d);
floats2 = contents{1};
% Read the rest:
rest = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
output.a = a;
output.b = b;
output.c = c;
output.d = d;
output.floats1 = floats1;
output.floats2 = floats2;
output.rest = rest{1};
end
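You can try the key step of the parser on a tiny file: read exactly b floats with textscan(fid, '%f', b), then move on to the next header line. This sketch uses invented values that follow the inferred layout. It keeps the "skip empty lines" loop because textscan can leave the rest of the last float's line (just its newline) unread. In that case the next fgetl may return an empty string.

```matlab
% Build a small sample file (hypothetical values following the inferred layout).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '123 3\n1.5674e-8\n2.5e-3\n7\n45 String\n');
fclose(fid);

fid = fopen(fname, 'r');
header = fgetl(fid);                % '123 3'
counts = sscanf(header, '%d %d');   % column vector [123; 3]
nFloats = counts(2);
c = textscan(fid, '%f', nFloats);   % read exactly nFloats numbers
floats = c{1};

% Skip any empty remainder of the last float's line before the next real line.
nextLine = '';
while isempty(nextLine)
    nextLine = fgetl(fid);          % returns -1 (non-empty) at end of file
end
fclose(fid);
delete(fname);
```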
You can read in the file line by line using the lower-level functions, then parse each line manually.
You open a file handle much as you would in C:
fid = fopen(filename);
Then you can read a line using fgetl:
line = fgetl(fid);
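Tokenizing each line on spaces is probably the best first pass. Store each piece in a cell array, because a matrix doesn't support ragged rows:
rest = line;   % start with the whole line (avoid the name rem, a built-in function)
colnum = 1;
while ~isempty(rest)
    [token, rest] = strtok(rest, ' ');
    if ~isempty(token)   % strtok returns '' when only spaces remain
        entries{linenum, colnum} = token;
        colnum = colnum + 1;
    end
end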
Then you can wrap all of that inside another while loop to iterate over the lines:
linenum = 1;
while ~feof(fid)
    % fgetl, strtok, index bookkeeping as above
    linenum = linenum + 1;
end
It's up to you whether it's best to parse the file as you read it or read it into a cell array first and then go over it afterwards.
All your cell entries will be strings (char arrays), so you will need to use str2num to convert them to numbers. It does a good job of working out the format, so that might be all you need.
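Putting the pieces of this answer together, a runnable sketch of the line-by-line tokenizer might look like the following. The sample file is invented. The sketch uses str2double rather than str2num for the final conversion. str2double does not evaluate its input as code, and it returns NaN for tokens (or empty cells) that are not numbers.

```matlab
% Sample file following the layout described in the question (hypothetical values).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '123 180\n1.5674e-8\n45 String\n');
fclose(fid);

fid = fopen(fname, 'r');
entries = {};
linenum = 1;
while true
    tline = fgetl(fid);
    if ~ischar(tline)          % fgetl returns -1 at end of file
        break;
    end
    rest = tline;
    colnum = 1;
    while true
        [token, rest] = strtok(rest, ' ');
        if isempty(token)      % nothing but whitespace was left
            break;
        end
        entries{linenum, colnum} = token;
        colnum = colnum + 1;
    end
    linenum = linenum + 1;
end
fclose(fid);
delete(fname);

% Convert to numbers; non-numeric tokens and unused (empty) cells become NaN.
values = str2double(entries);
```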

Loading multiple text files from a single directory in matlab

First time here, so please be gentle.
So the basic idea is: I have folders containing only .txt files, each with about 20000 points. I only want specific intervals from each of them.
I have made a single file with the ranges, which looks like this:
2715 2955
1132 1372
Each row represents the range I want from one file.
I want to batch load all the files and export just the range from each one. I've lost too much sleep over this, please help.
dirName = '*'; %# folder path
files = dir( fullfile(dirName,'*.txt') ); %# list all *.xyz files
files = {files.name}' ; %'# file names
data = cell(numel(files),1) ; %# store file contents
for u=1:numel(files)
A=files{u} ; %# full path to file
files{u};
STR1 = A
B=load(STR1);
end
This is all I have come up with in 2 days. I'm new to MATLAB.
Thanks
The MATLAB help for fscanf is a very good resource: http://www.mathworks.co.uk/help/matlab/ref/fscanf.html. Also, your load call doesn't include the folder path. Replace the last two lines in your for loop with:
STR1 = fullfile(dirName, A)   % fullfile inserts the path separator
fileID = fopen(STR1,'r');
formatSpec = '%f';
B = fscanf(fileID,formatSpec)
fclose(fileID);
Or try:
delim = ' ';
nrhdr = 0;
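STR1 = fullfile(dirName, A)
A = importdata(STR1, delim, nrhdr);
Assuming no header lines and purely numeric content, importdata returns A as a plain numeric matrix. If the file did have header lines, importdata would instead return a structure, and the numbers would be in A.data.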
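For the actual goal of extracting one interval per file, a sketch could loop over the files, index each loaded vector with the corresponding row of the ranges table, and write the slice to a new file. This assumes the following:
- each file is a single column of numbers,
- row k of the ranges file belongs to the k-th file in alphabetical order,
- a hypothetical output suffix _range.txt is acceptable.

The folder, data, and ranges below are invented so that the example runs on its own.

```matlab
% Hypothetical setup: a temporary folder with two single-column data files.
dataDir = tempname;
mkdir(dataDir);
fid = fopen(fullfile(dataDir, 'a.txt'), 'w'); fprintf(fid, '%d\n', 1:20);    fclose(fid);
fid = fopen(fullfile(dataDir, 'b.txt'), 'w'); fprintf(fid, '%d\n', 101:120); fclose(fid);

% Row k holds [first last] for the k-th file. In practice this might be
% ranges = load('ranges.txt');   (hypothetical file name)
ranges = [3 5; 10 12];

files = dir(fullfile(dataDir, '*.txt'));
names = sort({files.name});           % fix the order in which files pair with rows
segments = cell(numel(names), 1);
for k = 1:numel(names)
    data = load(fullfile(dataDir, names{k}));        % numeric column vector
    segments{k} = data(ranges(k, 1):ranges(k, 2));   % keep only the wanted interval
    [~, base] = fileparts(names{k});
    fid = fopen(fullfile(dataDir, [base '_range.txt']), 'w');
    fprintf(fid, '%g\n', segments{k});
    fclose(fid);
end
```

The file list is taken before any output is written, so the new *_range.txt files are not picked up by the loop.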

How to put conversion operation in a for loop?

Below is the code that converts one particular .tim file to an ASCII file. But I need to convert 500 .tim files, and I need to save each ASCII file with the SAME name as its .tim file, as below, for all 500 files.
bin=fopen('file_01.tim','r');
ascii = fread(bin, [43,21000], 'float32');
data_values=ascii';
dlmwrite('file_01.xls', data_values, 'delimiter', '\t', ...
'precision', '%.6f','newline','pc');
My first thought was to use a for loop to do the conversion and save each ASCII file with the same name as its .tim file, but I don't know how to do that.
You can use dir to get a list of all the file names in your folder and then proceed just as you have, but replace 'file_01.tim' with D(ii).name.
For example:
D = dir('*.tim');
for ii = 1:size(D,1)
bin=fopen(D(ii).name,'r');
%your processing etc
[~, base] = fileparts(D(ii).name);   % strips only the final extension, unlike strtok
savename = [base, '.xls']; % change the file extension from .tim to .xls
dlmwrite(savename, ...