How can I read multiple text files in MATLAB?

How can I read several different text files in MATLAB?
There are 33 .txt files, and each one needs to be processed.
This is my code, which produces an error:
textFilename = cell(1,33);
id = cell(1,33);
for k=1:33;
textFilename{k} = fullfile('C:\Users\Desktop\SentimentCode\textfiles',['file' num2str(k) '.txt']);
id{k} = fopen(textFilename{k},'rt');
str{k} = textscan(id{k},'%s%s');
end
str(str == '.') = '';
str(str == '_') = '';
str(str == '-') = '';
% Remove numbers from text
T =regexprep(str, '[\d]', ' ');
The error is:
??? Undefined function or method 'eq' for input arguments of type 'cell'.
Error in ==> Untitled9 at 23
str(str == '.') = '';

In your current edit, the error comes from the removal of the '.', '-' and '_' characters.
The == comparison works on character arrays, but textscan returns a cell array, and == is not defined for cells.
Instead of
str(str == '.') = '';
str(str == '_') = '';
str(str == '-') = '';
try using
regexprep(str,'(\.|-|_)','')
to remove all three at once (the '\.' is needed because '.' is a special character in regular expressions). Assign the result back to a variable, since regexprep returns a new value rather than modifying str in place.
regexprep works on cell arrays of strings, so depending on how deeply your cells are nested you may need to call it inside a for loop on str{k}, str{k}{1}, str{k}{i}, etc.
An alternative is to look at cellfun and/or strjoin, depending on how your data are arranged in the files.
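As a rough sketch of the nested-cell case: the str variable below is made-up stand-in data with the shape that textscan(fid,'%s%s') would return for two files (a 1x2 cell per file, each element a column of strings). The cleaning step is applied to each column with cellfun; the character class '[._-]' is just a shorter way of writing the same three-character pattern.

```matlab
% Hypothetical stand-in for the textscan output of two files.
str = { {{'good.'; 'bad_'}, {'so-so'; 'ok'}}, ...
        {{'a1'; 'b'},       {'c.'; 'd2'}} };

T = cell(size(str));
for k = 1:numel(str)
    % str{k} is a 1x2 cell; each element is a column cell array of strings.
    % Remove '.', '_' and '-', then replace digits with a space.
    T{k} = cellfun(@(col) regexprep(regexprep(col, '[._-]', ''), '\d', ' '), ...
                   str{k}, 'UniformOutput', false);
end
```

After this, T{k}{1} and T{k}{2} hold the cleaned first and second columns of file k.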

Just by looking at the example code:
extFilename{k} = fullfile(..);
should be
textFilename{k} = fullfile(...);
Also, it is a good idea to close each file after you have read it: fclose(id{k})
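A minimal sketch of a read loop that checks the file identifier and closes each file straight away. To keep it self-contained it first writes three small sample files into a temporary folder; the folder, the file contents and the count of 3 are assumptions for illustration only, so substitute your own folder and 33.

```matlab
% Create three small sample files (illustration only).
folder = tempname;
mkdir(folder);
for k = 1:3
    fid = fopen(fullfile(folder, sprintf('file%d.txt', k)), 'wt');
    fprintf(fid, 'word%d another_word\n', k);
    fclose(fid);
end

% Read them back, one cell of words per file.
nFiles = 3;
contents = cell(1, nFiles);
for k = 1:nFiles
    name = fullfile(folder, sprintf('file%d.txt', k));
    fid = fopen(name, 'rt');
    if fid == -1
        warning('Could not open %s', name);
        continue
    end
    c = textscan(fid, '%s');   % one column of whitespace-separated words
    fclose(fid);               % close as soon as the file has been read
    contents{k} = c{1};
end

rmdir(folder, 's');            % remove the sample files again
```

Checking for fid == -1 gives a clear message when a path is wrong, instead of a less obvious error from textscan.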

Related

Shifting a string in matlab

I have retrieved a string from a text file, and now I am supposed to shift it by a specified amount. For example, if the string I retrieved was
To be, or not to be
That is the question
and the shift number was 5 then the output should be
stionTo be, or not
to beThat is the que
I was going to use circshift, but the lines of the string would not have matching dimensions. Also, the string I retrieve comes from a .txt file.
Here is the code I used:
S = sprintf('To be, or not to be\nThat is the question')
circshift(S,5,2)
but the output is
stionTo be, or not to be
That is the que
but i need
stionTo be, or not
to beThat is the que
By storing the locations of the newlines, removing them, and adding them back after the shift, we can achieve this. This code relies on the insertAfter function, which is only available in MATLAB R2016b and later.
S = sprintf('To be, or not to be\nThat is the \n question');
nlPos = regexp(S,'\n');      % positions of the newline characters
S(nlPos) = '';               % remove the newlines
S = circshift(S,5,2);        % shift the remaining characters
for ii = 1:numel(nlPos)
    % earlier newlines are already back in place, so the original
    % index nlPos(ii) is valid again at this point
    S = insertAfter(S,nlPos(ii)-1,sprintf('\n'));
end
You can do this by performing a circular shift on the indices of the non-newline characters. (The code below actually skips all control characters with ASCII code < 32.)
function T = strshift(S, k)
T = S;
c = find(S >= ' '); % shift only printable characters (ascii code >= 32)
T(c) = T(circshift(c, k, 2));
end
Sample run:
>> S = sprintf('To be, or not to be\nThat is the question')
S = To be, or not to be
That is the question
>> r = strshift(S, 5)
r = stionTo be, or not
to beThat is the que
If you want to skip only the newline characters, just change that line to
c = find(S ~= 10);
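For reference, here is the same index-shifting idea as a plain script (no function file), skipping only newlines and compared against the output asked for in the question. The variable names idx and expected are mine. Note that the first output line ends in a space, which is not visible in the question's listing.

```matlab
S = sprintf('To be, or not to be\nThat is the question');
k = 5;

T = S;
idx = find(S ~= 10);                 % positions of everything except newlines
T(idx) = S(circshift(idx, k, 2));    % shift characters, leave newlines in place

expected = sprintf('stionTo be, or not \nto beThat is the que');
disp(strcmp(T, expected))            % displays 1
```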

Display result after validating for loop in Matlab

In the following code, I check to see if the first letter is in the dictionary of words and if the length of the word matches. If it does, return the word. Otherwise, return an error statement.
words = {'apple', 'banana', 'bee', 'salad', 'corn', 'elephant', 'pterodactyl'};
user_letter_input = input('Please enter the first letter of a word: ', 's');
user_num_input = input('Please enter how long you would like the word to be: ');
for i = words
if ((i{1}(1) == user_letter_input) && (length(i{1}) == user_num_input))
result = i;
else
result = 0;
end
end
if (result == 0)
disp('There are no matching words');
else
disp(['Your new word is: ' result]);
end
The comparison returns i being 'apple' if I type a for the first input and 5 for the second input - as it should.
However, at the end when I try to see if (result == 0), it does not display the new word, even though result is not 0.
Could someone help me fix this please?
You are overwriting result each time through your for loop. After the loop, result holds a word only if the last word in words matches your criteria; in every other case it has been reset to 0.
I would recommend storing the matching words in a separate cell array, or have a boolean array to indicate which words match. In my opinion, using a boolean is better as it takes less memory and doesn't duplicate data.
words = {'apple', 'banana', 'bee', 'salad', 'corn', 'elephant', 'pterodactyl'};
user_letter_input = input('Please enter the first letter of a word: ', 's');
user_num_input = input('Please enter how long you would like the word to be: ');
isMatch = false(size(words));
for k = 1:numel(words)
word = words{k};
isMatch(k) = word(1) == lower(user_letter_input) && ...
numel(word) == user_num_input;
end
if ~any(isMatch)
disp('There are no matching words');
else
disp(['Your matching words are:', sprintf(' %s', words{isMatch})]);
end
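Also, as a side note, don't loop over the cell array directly like that (for i = words). Each iteration yields a 1x1 cell rather than a string, which leads to a lot of confusion. Also avoid using i as a loop variable, since it doubles as the imaginary unit.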
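The same logical-array idea can also be written without an explicit loop. This is only a sketch: the inputs are hard-coded here (letter and n are hypothetical stand-ins for the two input() calls) so that it runs on its own.

```matlab
words = {'apple', 'banana', 'bee', 'salad', 'corn', 'elephant', 'pterodactyl'};
letter = 'B';   % stand-in for user_letter_input
n = 3;          % stand-in for user_num_input

% One logical value per word: first letter and length must both match.
isMatch = cellfun(@(w) w(1) == lower(letter(1)) && numel(w) == n, words);
matches = words(isMatch);

if isempty(matches)
    disp('There are no matching words');
else
    disp(['Your matching words are:', sprintf(' %s', matches{:})]);
end
```

Using letter(1) guards against the user typing more than one character, which would otherwise make the && comparison fail.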
You're overwriting result each time the word in your dictionary doesn't match. The only time this will work is if the last word matches. You need to change both your initialization of result and your loop:
result = 0; %// assume that no words match
for i = words
if (....
result = 1; %// we found a match... record it
end
%// no else! If we get no match, result will already be 0
end
You can use a flag to detect whether a match was found:
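breakflag = 0;
for i = words
    if ((i{1}(1) == user_letter_input) && (length(i{1}) == user_num_input))
        breakflag = 1;   % remember that a match was found
        break;           % i still holds the matching word after the loop
    end
end
if (breakflag == 0)
    disp('There are no matching words');
else
    disp(['Your new word is: ' i{1}]);   % i is a 1x1 cell, so unwrap it
end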

Split word and check spelling error within article

I want to check for spelling errors within an article. I have 100 articles to check; if an article contains at least one error the result should be 1, otherwise 0. I have to split the article into individual words before checking. I have done all of this, but the problem is that I cannot check the spelling of the split words. However, I can check a single word with
deliberate_mistake = 'tabel';
suggestion = checkSpelling(deliberate_mistake)
output:
suggestion =
'table'
checkSpelling.m file
function suggestion = checkSpelling(word)
h = actxserver('word.application');
h.Document.Add;
correct = h.CheckSpelling(word);
if correct
suggestion = []; %return empty if spelled correctly
else
%If incorrect and there are suggestions, return them in a cell array
if h.GetSpellingSuggestions(word).count > 0
count = h.GetSpellingSuggestions(word).count;
for i = 1:count
suggestion{i} = h.GetSpellingSuggestions(word).Item(i).get('name');
end
else
%If incorrect but there are no suggestions, return this:
suggestion = 'no suggestions';
end
end
%Quit Word to release the server
h.Quit
f20.m file
for i = 1:1
data2=fopen(strcat('DATA\',int2str(i),''),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
word =regexp(CharData,' ','split')
[sizeData b] = size(word);
suggestion = checkSpelling(word)
Your input is a cell array, but checkSpelling expects a single string. Call it with one word at a time, for example checkSpelling(word{1}). That works for me.
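A sketch of the per-word loop that produces the 1/0 result per article. So that it runs without Word installed, it uses a made-up stand-in checker (isCorrect, based on a tiny hard-coded word list) and a made-up CharData string; both are assumptions for illustration. With the real function you would use isCorrect = @(w) isempty(checkSpelling(w)); instead.

```matlab
% Stand-in spell checker (hypothetical): a word is "correct" if it is in this list.
knownWords = {'the', 'table', 'is', 'red'};
isCorrect = @(w) ismember(lower(w), knownWords);

CharData = 'the tabel is red';                      % stand-in for the file contents
words = regexp(strtrim(CharData), '\s+', 'split');  % split on any whitespace

isBad = false(size(words));
for k = 1:numel(words)
    isBad(k) = ~isCorrect(words{k});   % one word (a char array) per call
end
hasError = double(any(isBad));         % 1 if the article has any error, else 0
```

Splitting on '\s+' rather than a single space avoids empty strings when the text contains double spaces or line breaks.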

Remove "square brackets and characters within it" from the string-- Matlab

I have a problem removing the text and special characters inside square brackets from a string. For example:
str = 'Acceleration [ms^{{-}2}]';
The expected output is str_out = 'Acceleration'. I tried using the function regexprep but couldn't get the expected result.
You can try
opens = str == '[';
closes = str == ']';
nestingcount = cumsum(opens - [0 closes(1:end-1)]);
outstr = str(nestingcount == 0);
Note that trimming the trailing space was not part of your specification; you'll have to do that as well (for example with strtrim) to get exactly the output in your example.
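Putting the pieces together, a short sketch that adds the strtrim step and, for comparison, a regexprep variant. The regexprep pattern assumes the square brackets are not nested inside each other; the nesting-count version does not need that assumption. The names outNested and outRegex are mine.

```matlab
str = 'Acceleration [ms^{{-}2}]';

% Nesting-count approach, followed by strtrim for the trailing space.
opens = str == '[';
closes = str == ']';
nestingcount = cumsum(opens - [0 closes(1:end-1)]);
outNested = strtrim(str(nestingcount == 0));

% regexprep alternative: remove '[', anything that is not ']', then ']'.
outRegex = strtrim(regexprep(str, '\[[^\]]*\]', ''));
```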

delimiting by a char but not deleting it

I have a text file that looks like this:
(a (bee (cold down)))
if I load it using
c=textscan(fid,'%s');
I get this:
'(a'
'(bee'
'(cold'
'down)))'
What I would like to get is:
'('
'a'
'('
'bee'
'('
'cold'
'down'
')'
')'
')'
I know I can delimit with '(' and ')' by specifying 'Delimiter' in textscan, but then I will lose these characters, which I want to keep.
Thank you in Advance.
The %s specifier indicates that you want strings; what you want is individual chars. Use %c instead.
c=textscan(fid,'%c');
Update: if you want to keep your words intact, then load your text using the %s specifier. After the text is loaded you can either solve this problem with regular expressions (not my forte) or write your own parser that processes each word individually and saves the parentheses and words to a new cell array.
AFAIK, there is no canned routine capable of preserving arbitrary delimiters.
You'd have to do it yourself:
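str = '(a (bee (cold down)))';
bo = str == '(';   % open parentheses
bc = str == ')';   % close parentheses
sp = str == ' ';   % spaces
output = cell(numel(str),1);   % upper bound on the number of tokens
j = 1;
for ii = 1:numel(str)
    if bo(ii) || bc(ii)
        % finish any word in progress before storing the parenthesis
        if ~isempty(output{j}), j = j + 1; end
        output{j} = str(ii);
        j = j + 1;
    elseif sp(ii)
        % a space ends the current word, if there is one
        if ~isempty(output{j}), j = j + 1; end
    else
        output{j} = [output{j} str(ii)];
    end
end
output = output(~cellfun('isempty', output));   % drop the unused cells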
Which can probably be improved -- the growing character array will prevent the loop from being JIT'ed. The array bc | bo | sp holds all the information to vectorize this thing, I just don't see how at this hour...
Nevertheless, it should give you a place to start.
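For what it's worth, one possible vectorized version uses regexp with the 'match' option, so that the parentheses are returned as tokens rather than consumed as delimiters. This is a sketch that assumes words never contain parentheses or whitespace.

```matlab
str = '(a (bee (cold down)))';

% A token is either a single parenthesis, or a run of characters
% that are neither parentheses nor whitespace.
tokens = regexp(str, '[()]|[^()\s]+', 'match');
tokens = tokens(:);   % column cell array, as in the desired output
```

For this input, tokens contains the ten entries listed in the question, in order.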
MATLAB has a strtok function similar to the one in C. Its format is:
token = strtok(str)
token = strtok(str, delimiter)
[token, remain] = strtok(str, ...)
There is also a string replace function, strrep:
modifiedStr = strrep(origStr, oldSubstr, newSubstr)
What I would do is modify the original string with strrep to add in delimiters, then use strtok. Assuming the line is available as a char array str (for example read with fgetl rather than textscan):
str = '(a (bee (cold down)))';
str = strrep(str,'(','( ');    % add a space after each open paren
str = strrep(str,')',' ) ');   % add a space before and after each close paren
tokens = {};                   % cell array, since the tokens differ in length
[tok, remain] = strtok(str, ' ');
while ~isempty(tok)
    tokens{end+1} = tok;
    [tok, remain] = strtok(remain, ' ');   % continue with the rest of the string
end
This gives you a cell array containing each of the tokens you requested, in order.
strtok reference: http://www.mathworks.com/help/techdoc/ref/strtok.html
strrep reference: http://www.mathworks.com/help/techdoc/ref/strrep.html