MATLAB: removing multiple delimiters during import - matlab

I am struggling with a problem and would like an easy way around it. I have a large array of data in which some vector values are stored in the format ( 1.02 1.23 3.32). I want them as 1.02 1.23 3.32 in tabular form. The problem is that there are two delimiter characters, '(' and ')'.
Can anyone help me write code for this? I have something like this:
filename = 'U.dat';
delimiterIn = '(';
headerlinesIn = 0;
A = textscan(filename,delimiterIn,headerlinesIn);
But this handles only one delimiter, '(', and it does not work either.
Nishant

If your text file looks something like this:
(1 2 3) (4 5 6)
(7 8 9) (10 11 12)
You can read it in as strings and convert it to a cell array of vectors like this:
% read in file
filename = 'delim.txt';
fid = fopen(filename); % opens file for reading
tline = fgets(fid); % reads in a line
index = 1;
data = {};
% reads in all lines until the end of the file is reached
while ischar(tline)
    data{index} = tline;
    tline = fgets(fid);
    index = index + 1;
end
fclose(fid); % close file
% convert strings to a cell array of vectors
rowIndex = 1;
colIndex = 1;
outData = {};
innerStr = [];
for aCell = data % for each entry in data
    sline = aCell{1,1};
    for c = sline % for each character in the line
        if strcmp(c, '(')
            innerStr = []; % start collecting a new group
        elseif strcmp(c, ')')
            % parse the collected text into a numeric row vector
            outData{rowIndex,colIndex} = sscanf(innerStr, '%f').';
            colIndex = colIndex + 1;
        else
            innerStr = [innerStr, c];
        end
    end
    rowIndex = rowIndex + 1;
    colIndex = 1;
end
outData
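If every parenthesized group holds the same number of values, the parentheses can also be stripped in one pass before parsing. The following is a minimal sketch under that assumption; the sample text, the variable names, and the choice of three values per group are illustrative and are not taken from the question's U.dat file:

```matlab
% Sample text standing in for the file contents
% (with a real file, this could be txt = fileread('U.dat');)
txt = sprintf('(1.02 1.23 3.32)\n(4.50 6.70 8.90)\n');
% Replace both delimiter characters with spaces
clean = regexprep(txt, '[()]', ' ');
% sscanf skips spaces and line breaks and returns a column vector
vals = sscanf(clean, '%f');
% Assumption: three values per group, one group per output row
A = reshape(vals, 3, []).';
```

Here A is a plain numeric matrix with one vector per row, which is closer to the tabular form the question asks for than a cell array.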

Related

Importing data block with Matlab

I have a set of data in the following format, and I would like to import each block in order to analyze them with Matlab.
Emax=0.5/real
----------------------------------------------------------------------
4.9750557 14535
4.9825821 14522
4.990109 14511
4.9976354 14491
5.0051618 14481
5.0126886 14468
5.020215 14437
5.0277414 14418
5.0352678 14400
5.0427947 14372
5.0503211 14355
5.0578475 14339
5.0653744 14321
Emax=1/real
----------------------------------------------------------------------
24.965595 597544
24.973122 597543
24.980648 597543
24.988174 597542
24.995703 597542
25.003229 597542
I have modified this piece of code from MathWorks, but I think I have problems dealing with the spaces between the columns.
Each block of data consists of 3874 rows and is separated by a line of text (Emax=XX/real) and a line of dashes; unfortunately, this is the only way the software exports the data.
Here is one way to import the data:
% read file as a cell-array of lines
fid = fopen('file.dat', 'rt');
C = textscan(fid, '%s', 'Delimiter','');
C = C{1};
fclose(fid);
% remove separator lines
C(strncmp('---',C,3)) = [];
% location of section headers
headInd = [find(strncmp('Emax=', C, length('Emax='))) ; numel(C)+1];
% extract each section
num = numel(headInd)-1;
blocks = struct('header',cell(num,1), 'data',cell(num,1));
for i=1:num
% section header
blocks(i).header = C{headInd(i)};
% data
X = regexp(C(headInd(i)+1:headInd(i+1)-1), '\s+', 'split');
blocks(i).data = str2double(vertcat(X{:}));
end
The result is a structure array containing the data from each block:
>> blocks
blocks =
2x1 struct array with fields:
header
data
>> blocks(2)
ans =
header: 'Emax=1/real'
data: [6x2 double]
>> blocks(2).data(:,1)
ans =
24.9656
24.9731
24.9806
24.9882
24.9957
25.0032
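Once the blocks are in a structure array, the numeric Emax value can be recovered from each header. This is a small sketch that assumes every header has exactly the form Emax=<number>/real; the blocks variable is rebuilt here with a few sample rows so the snippet runs on its own, and emax and peak are illustrative names:

```matlab
% Stand-in for the structure array produced by the import code
blocks = struct( ...
    'header', {'Emax=0.5/real'; 'Emax=1/real'}, ...
    'data',   {[4.9750557 14535; 4.9825821 14522]; ...
               [24.965595 597544; 24.973122 597543]});
% Parse the number out of each header (assumed format: Emax=<number>/real)
emax = arrayfun(@(b) sscanf(b.header, 'Emax=%f/real'), blocks);
% Example per-block analysis: largest value in the second column
peak = arrayfun(@(b) max(b.data(:,2)), blocks);
```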
This should work. I don't think textscan() will work with a file like this because of the breaks between blocks.
Essentially, this code loops through the lines between blocks until it finds a line that matches the data format. The code is naive and assumes that the file has exactly the number of blocks and lines per block that you specify. If there were a fixed number of lines between blocks, it would be a lot easier: you could remove the first inner loop and replace it with a single fgets(fid) call, discarding the result, once for each separator line.
function block_data = readfile(in_file_name)
fid = fopen(in_file_name, 'r');
delimiter = ' ';
line_format = '%f %f';
n_cols = 2; % Number of numbers per line
block_length = 3874; % Number of lines per block
n_blocks = 2; % Total number of blocks in file
tline = fgets(fid);
line_data = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
block_n = 0;
block_data = zeros(n_blocks,block_length,n_cols);
while ischar(tline) && block_n < n_blocks
    block_n = block_n+1;
    tline = fgets(fid);
    if ischar(tline)
        line_data = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
    end
    % skip header and separator lines until a data line is found
    while ischar(tline) && isempty(line_data)
        tline = fgets(fid);
        if ischar(tline) % fgets returns -1 at the end of the file
            line_data = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
        end
    end
    % read the data lines of this block
    line_n = 1;
    while line_n <= block_length && ischar(tline)
        block_data(block_n,line_n,:) = cell2mat(textscan(tline,line_format,'delimiter',delimiter,'MultipleDelimsAsOne',1));
        tline = fgets(fid);
        line_n = line_n+1;
    end
end
fclose(fid);
end
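When each block is always preceded by exactly one Emax header line and one line of dashes, the separators can also be matched with a single regular expression applied to the whole file text. This is only a sketch under that assumption; the sample text and the names pattern, parts, and data are illustrative, and with a real file the text could come from fileread:

```matlab
% Sample text standing in for the file contents
txt = sprintf(['Emax=0.5/real\n-----\n4.9750557 14535\n4.9825821 14522\n', ...
               'Emax=1/real\n-----\n24.965595 597544\n']);
% One header line followed by one line of dashes (tolerates \r\n line ends)
pattern = 'Emax=[^\n]*\r?\n-+\r?\n';
parts = regexp(txt, pattern, 'split');
parts = parts(2:end); % the first piece is the (empty) text before the first header
% Parse each block into an N-by-2 matrix
data = cellfun(@(s) sscanf(s, '%f', [2 Inf]).', parts, 'UniformOutput', false);
```

Unlike the function above, this does not need the number of blocks or the block length in advance, but it does hold the whole file in memory.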

how to dlmwrite a file from array

How can I write the cell array below to a text file (my_data.out)?
http_only = cell2mat(http_only)
dlmwrite('my_data.out',http_only)
I get an error (I have tried to fix it, but the error persists).
Here is my full code:
I want to generate a text file for each data item that stores only 'http_only',
then check whether it matches a word in split_URL.
%data = importdata('DATA/URL/training_URL')
data = importdata('DATA/URL/testing_URL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')
no_http_URL = regexp(domain_URL,'https?://(?:www\.)?(.*)','tokens','once');
no_http_URL = vertcat(no_http_URL{:});
split_URL = regexp(no_http_URL,'[:/.]*','split')
[sizeData b] = size(split_URL);
for i = 1:100
A7_data = split_URL{i};
data2=fopen(strcat('DATA\WEBPAGE_SOURCE\TESTING_DATA\',int2str(i),'.htm'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
img_only = regexp(CharData, '<img src.*?>', 'match'); %checking
http_only = regexp(img_only, '"http.*?"', 'match');
http_only1 = horzcat(http_only{:});
fid = fopen('my_data.out',int2str(i),'w');
for col = 1:numel(http_only1)
fprintf(fid,'%s\n',http_only1{:,col});
end
fclose(fid);
feature7_data=(~cellfun('isempty', regexpi(CharData , A7_data, 'once')))
B7(i)=sum(feature7_data)
end
feature7(B7>=5)=-1;
feature7(B7<5&B7>2)=0;
feature7(B7<=2)=1;
feature7'
Write cell-by-cell using fprintf -
fid = fopen('my_data.out','w');
for col = 1:numel(http_only)
fprintf(fid,'%s\n',http_only{:,col});
end
fclose(fid);
Edit 1: If your input is a cell array of cell arrays, use this code instead.
http_only1 = horzcat(http_only{:});
fid = fopen('my_data.out','w');
for col = 1:numel(http_only1)
fprintf(fid,'%s\n',http_only1{:,col});
end
fclose(fid);
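The loop can also be collapsed into one fprintf call, because fprintf reuses its format for every argument it receives. The sketch below uses made-up URLs and a temporary file name (nested, flat, and out_file are illustrative names) and flattens a nested cell array the same way as the code above:

```matlab
% Hypothetical nested result, similar in shape to what regexp returns for a cell input
nested = {{'"http://a.com"'}, {}, {'"http://b.com"', '"http://c.com"'}};
flat = horzcat(nested{:}); % 1-by-3 cell array of strings; empty cells drop out
out_file = [tempname '.out']; % temporary file, used only for this demo
fid = fopen(out_file, 'w');
fprintf(fid, '%s\n', flat{:}); % one line per string
fclose(fid);
written = fileread(out_file); % read back to check the result
```

dlmwrite expects a numeric matrix, and cell2mat cannot turn cells that themselves contain cells, or strings of different lengths, into one, which is presumably where the original error comes from.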
Edit 2: To store a number of inputs in separate files, use this demo -
data1 = {{'[]'} {'"http://google.com"'} {'"http://yahoo.com"'}};
data2 = {{'[]'} {'"http://overflow.com"'} {'"http://meta.exchange.com"'}};
data = cat(1,data1,data2);
out_basename = 'my_data'; % placeholder base name for the output files
for k = 1:size(data,1)
    data_mat = horzcat(data{k,:});
    out_filename = strcat(out_basename,num2str(k),'.out');
    fid = fopen(out_filename,'w');
    for col = 1:numel(data_mat)
        fprintf(fid,'%s\n',data_mat{:,col});
    end
    fclose(fid);
end

Matlab too many outputs

The program myfile.m reads a txt file that contains a total of 25 names and numbers, for example:
John doughlas 15986
Filip duch 357852
and so on.
The program converts them to
15986 Doughlas John
357852 duch Filip
This works without a function; with a function I get too many outputs.
Error message:
Error using disp
Too many output arguments.
Error in red4 (line 26)
array = disp(All);
Original code below:
function array = myfile(~)
if nargin == 0
dirr = '.';
end
answer = dir(dirr);
k=1;
while k <= length(answer)
if answer(k).isdir
answer(k)=[];
else
filename{k}=answer(k).name;
k=k+1;
end
end
chose=menu( 'choose file',filename);
namn = char(filename(chose));
fid = fopen(namn, 'r');
R = textscan(fid,'%s %s %s');
x=-1;
k=0;
while x <= 24
x = k + 1;
All = [R{3}{x},' ',R{1}{x},' ',R{2}{x}];
disp(All)
k = k + 1;
end
fclose(fid);
I have received many good answers from people and sites about functions, but I can't get results like the above with a function.
I have tried combining them and got these results:
y = 15986 & [a,z,b] = myfile
y = 25 & myfile = x
y = numbers name1,2,3,4 and so on & myfile = fprintf(All)
y = & I used results().namn,
numbers name 1 & results().id, results().lastname
y =
numbers name 2 and so on.
The result I want is:
y = myfile
y =
15986 Doughlas John
357852 duch Filip
Update: I changed it as Eitan T suggested but didn't get the result above.
I got this result instead:
'John doughlas 15986'
'Filip duch 357852'
function C = myfile()
if nargin == 0
dirr = '.';
end
answer = dir(dirr);
k=1;
while k <= length(answer)
if answer(k).isdir
answer(k)=[];
else
filname{k}=answer(k).name;
k=k+1;
end
end
chose=menu( 'choose',filname);
name = char(filname(chose));
fid = fopen(name, 'r');
C = textscan(fid, '%s', 'delimiter', '');
C = regexprep(C{1}, '(\w+) (\w+) (\w+)', '$3 $2 $1');
fclose(fid);
Why use loops? Read the lines at once with textscan and use regexprep to manipulate the words:
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'delimiter', '');
C = regexprep(C{1}, '(\w+) (\w+) (\w+)', '$3 $2 $1')
fclose(fid);
The result is a cell array C, each cell storing a line. For your example, you'll get a 2×1 cell array:
C =
'15986 doughlas John'
'357852 duch Filip'
I'm not sure what you want to do with it, but if you provide more details I can improve my answer further.
Hope this helps!
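As for the error itself: disp prints its argument but returns nothing, so array = disp(All) raises "Too many output arguments". Assigning the reordered text to the function's output variable avoids that. The sketch below works on two sample lines instead of a file and uses a pattern that tolerates repeated spaces or tabs between the words, which may be why the updated function returned the lines unchanged; lines, y, and y_table are illustrative names:

```matlab
% Sample lines standing in for the file contents (note the extra spaces)
lines = {'John doughlas 15986'; 'Filip   duch 357852'};
% Reorder "first last number" into "number last first", whatever the spacing
y = regexprep(lines, '^\s*(\S+)\s+(\S+)\s+(\S+)\s*$', '$3 $2 $1');
% Optional: a padded char matrix, which displays one line per row
y_table = char(y);
```

Inside a function declared as function y = myfile(), the value of y at the end of the function is what the call y = myfile returns.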

Matlab: Having trouble with this code regarding arrays

Here is the question:
The file upcs.txt contains a list of UPC codes that were scanned in a grocery
store. Each line should, ideally, contain 12 digits corresponding to a single product. Read
the contents of the file and store the entries into an m x 12 sized numeric array named
codes, where m is the number of valid lines in the file. Lines that have less or more than
12 digits should be discarded. Some lines with 12 digits may have digits that were not
correctly scanned, which were replaced by the letter 'X'. These missing digits should be
represented in the array codes by the integer '-1'. After processing the file, print the total
number of lines read, the number of lines discarded, and the number of lines correctly
processed and stored in codes.
upcs.txt:
X9096X082489
921642004330
810905023006
733554287763
413527622XX1
287X35871528
100093334850
764491079X90
1537X8886614
086755751640
860053705316
980098819206
038356338621
577577248178
82825685985
684580785580
736657539753
71113617151
935014271064
702345843488
58316491755
110118383664
333841856254
996003013296
495258095746
4457870230
684104168936
522784039910
6504512835
699553963094
853110488363
554147120089
Here is my code so far:
fid = fopen('upcs.txt');
mat = [];
if fid == -1
disp('File open was not successful')
else codes = {};
while feof(fid) == 0
aline = fgetl(fid);
num = strtok(aline);
codes = [codes; num]
end;
[m n] = size(codes)
discard = 0
for i = 1:m
len = length (codes(i))
if len ~= 12
codes = [];
discard = discard + 1
else
char(codes(i))
codes = strrep(codes, 'X', '-1')
end
end
codes
end
The trouble I am having is that I don't know how to delete the codes that have less or more than 12 digits in my code.
clear;clc;
fid = fopen('upcs.txt','r');
if fid == -1
    error('File open was not successful');
end
C = textscan(fid,'%s');
fclose(fid); % the file is no longer needed once the lines are read
C = C{1};
all_codes_num = size(C,1);
codes_discarded_num = 0;
codes_missed_digit_num = 0;
codes_correct_num = 0;
codes = [];
for i = 1:all_codes_num
    one_code = C{i};
    if length(one_code) == 12
        x_flag = 0;
        code_tmp = zeros(1,12);
        for j = 1:12
            if one_code(j) == 'X' || one_code(j) == 'x'
                code_tmp(j) = -1; % unreadable digit
                x_flag = 1;
            else
                code_tmp(j) = str2num(one_code(j));
            end
        end
        if x_flag == 1
            codes_missed_digit_num = codes_missed_digit_num +1;
        end
        codes = [codes;code_tmp];
    else
        % wrong length: discard the line
        codes_discarded_num = codes_discarded_num + 1;
    end
end
all_codes_num
codes_discarded_num
codes_with_x = codes_missed_digit_num
correct_codes_without_x = all_codes_num - codes_discarded_num - codes_with_x
codes: contains all the correct codes as well as the 12-character codes with missing digits, which have been replaced with -1. This is an m-by-12 numeric matrix; each row is a code.
all_codes_num: the number of lines read
codes_discarded_num: the number of codes with more or fewer than 12 characters
codes_with_x: the number of 12-character codes that have missing digits
correct_codes_without_x: the number of 12-character codes that consist of digits only
The code assumes that each line of 'upcs.txt' holds one code.
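The same filtering can be written without explicit loops. This is a sketch on four sample lines rather than the full file; it assumes every kept line contains only digits and the letter X, and the names keep and chars are illustrative:

```matlab
% Sample lines standing in for the cell array returned by textscan
C = {'X9096X082489'; '921642004330'; '82825685985'; '4457870230'};
keep = cellfun(@numel, C) == 12;     % lines with exactly 12 characters
chars = upper(char(C(keep)));        % m-by-12 char matrix
codes = double(chars) - double('0'); % '0'..'9' become 0..9
codes(chars == 'X') = -1;            % unreadable digits become -1
n_read = numel(C);
n_discarded = sum(~keep);
n_stored = size(codes, 1);
```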

Writing a mixed matrix of integers and strings to .csv in Matlab

Perhaps this question has been answered before but I can't seem to find any good documentation on it. So my problem is the following:
Suppose I have two vectors of the same length in Matlab
x = [1;2;3];
and
y = ['A';'B';'C'];
Basically I would like to create the matrix {x,y} (ie 3 rows, 2 columns) and then write it to a .csv file. So in the end I'd like to see a .csv file like
1,A
2,B
3,C
This is just a mocked-up example but really I have 75 columns with each being either a column of strings or numerics. Any suggestions are greatly appreciated!
Actually, here is the solution:
http://www.mathworks.com/help/matlab/import_export/write-to-delimited-data-files.html#br2ypq2-1
This is much simpler.
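For the small example in the question, writing row by row with fprintf might look like the sketch below. It assumes an integer first column and a string second column, so the format string would have to be extended for 75 columns; A, out_file, and written are illustrative names, and a temporary file is used only so the snippet is self-contained:

```matlab
x = [1;2;3];
y = ['A';'B';'C'];
% Combine both columns into a 3-by-2 cell array
A = [num2cell(x), cellstr(y)];
out_file = [tempname '.csv']; % temporary file for the demo
fid = fopen(out_file, 'w');
for row = 1:size(A,1)
    % A{row,:} expands to the row's entries as separate arguments
    fprintf(fid, '%d,%s\n', A{row,:});
end
fclose(fid);
written = fileread(out_file); % read back to check the result
```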
If you sort your data into a suitable cell
A = cell(3,2);
A{1,1} = 1;
A{2,1} = 2;
A{3,1} = 3;
A{1,2} = 'A';
A{2,2} = 'B';
A{3,2} = 'C';
you may then call this function:
cell2csv(filename,A)
function cell2csv(filename,cellArray,delimiter)
% Writes cell array content into a *.csv file.
%
% CELL2CSV(filename,cellArray,delimiter)
%
% filename = Name of the file to save. [ i.e. 'text.csv' ]
% cellArray = Name of the Cell Array where the data is in
% delimiter = separating sign, normally:',' (default)
%
% by Sylvain Fiedler, KA, 2004
% modified by Rob Kohr, Rutgers, 2005 - changed to english and fixed delimiter
if nargin<3
    delimiter = ',';
end
datei = fopen(filename,'w');
for z=1:size(cellArray,1)
    for s=1:size(cellArray,2)
        value = cellArray{z,s}; % direct indexing, no eval needed
        if isempty(value)
            value = '';
        end
        if isnumeric(value)
            value = num2str(value);
        end
        % explicit format so that '%' or '\' in the data is written literally
        fprintf(datei,'%s',value);
        if s ~= size(cellArray,2)
            fprintf(datei,'%s',delimiter);
        end
    end
    fprintf(datei,'\n');
end
fclose(datei);
end