I have a problem removing text and special characters from a string. For example:
str = 'Acceleration [ms^{{-}2}]';
The expected output is str_out = 'Acceleration'; I tried using the function regexprep but couldn't get the expected result.
You can try
opens = str == '[';
closes = str == ']';
nestingcount = cumsum(opens - [0 closes(1:end-1)]);
outstr = str(nestingcount == 0);
Note that trimming the trailing space was not part of your specification; you'll have to do that as well (for example with strtrim) to get exactly 'Acceleration' for your example.
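Putting it together, here is a minimal sketch on the sample string from the question: the bracket-counting mask followed by strtrim. Since you tried regexprep, an alternative one-liner is included as well. It assumes there is at most one bracketed group, because the greedy pattern \[.*\] removes everything from the first [ to the last ], including any text that sits between two groups.

```matlab
str = 'Acceleration [ms^{{-}2}]';

% Bracket-counting approach: keep only characters at nesting depth 0
opens  = str == '[';
closes = str == ']';
nestingcount = cumsum(opens - [0 closes(1:end-1)]);
outstr = strtrim(str(nestingcount == 0));   % strtrim drops the trailing space

% Alternative with regexprep (assumes a single bracketed group):
% remove optional whitespace plus everything from the first '[' to the last ']'
outstr2 = regexprep(str, '\s*\[.*\]', '');
```

The counting version copes with several bracketed groups and with nested brackets; the regexprep version is shorter but only safe under the single-group assumption.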
OK, so I have retrieved this string from a text file, and now I am supposed to shift it by a specified amount. For example, if the string I retrieved was
To be, or not to be
That is the question
and the shift amount was 5, then the output should be
stionTo be, or not
to beThat is the que
I was going to use circshift, but the lines of the string wouldn't have matching dimensions. Also, the string I retrieve would come from a .txt file.
Here is the code I used:
S = sprintf('To be, or not to be\nThat is the question')
circshift(S,5,2)
but the output is
stionTo be, or not to be
That is the que
but I need
stionTo be, or not
to beThat is the que
By storing the locations of the newlines, removing them, and adding them back in afterwards, we can achieve this. This code relies on the insertAfter function, which is only available in MATLAB R2016b and later.
S = sprintf('To be, or not to be\nThat is the question');
nl = find(S == char(10));       % positions of the newlines
S(nl) = '';                     % remove them
S = circshift(S, 5, 2);         % shift only the remaining characters
for ii = 1:numel(nl)
    % re-insert each newline at its original position; the earlier newlines
    % are already back in place, so nl(ii)-1 is the character that precedes it
    S = insertAfter(S, nl(ii) - 1, char(10));
end
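The same store, remove, and restore idea can also be written with a logical mask instead of insertAfter, so no index arithmetic is needed to put the newlines back. A minimal sketch using the string from the question (isNL and txt are just illustrative names):

```matlab
S = sprintf('To be, or not to be\nThat is the question');

isNL = S == char(10);              % remember where the newlines are
txt  = S(~isNL);                   % the text with the newlines removed
S(~isNL) = circshift(txt, 5, 2);   % shift the text and write it back around the newlines
```

The first line keeps its trailing space (the one that separated "not" and "to"), just as in the output shown in the question.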
You can do this by performing a circular shift on the indices of the non-newline characters. (The code below actually skips all control characters with ASCII code < 32.)
function T = strshift(S, k)
T = S;
c = find(S >= ' '); % shift only printable characters (ascii code >= 32)
T(c) = T(circshift(c, k, 2));
end
Sample run:
>> S = sprintf('To be, or not to be\nThat is the question')
S = To be, or not to be
That is the question
>> r = strshift(S, 5)
r = stionTo be, or not
to beThat is the que
If you want to skip only the newline characters, change the find line to
c = find(S ~= 10);   % MATLAB uses ~= for "not equal"; != only works in Octave
I know a similar question has been asked before here:
How to concatenate a number and a string in auto hotkey
But that question only takes numbers into account. My problem is a bit different. For example:
myStr = A literal string
myInt = 5
Now I wish to concatenate both into a new string: 5A literal string
This is what I've tried so far:
newStr = %myInt%%myStr% ;Result: Error illegal character found
newStr = % myInt myStr ;Result: Some number
convertString = (%myInt% . String)
newStr = %convertString%%myStr% ;Result: Error illegal character found
It seems like no matter what I try, AHK just can't handle concatenating an integer with a text string. Does anyone have experience with this and know a way to get it working?
EDIT
I should add that I can't solve the issue by writing myInt = "5", because I need to increment the integer in a loop with myInt++. A second question I haven't figured out yet: how do I add a Unicode character to the string? I thought it was U+0003, but that doesn't seem to work.
EDIT 2
It seems someone else isn't getting the same results. I've updated AHK, but the problem remains, so I'm including my exact code here; perhaps I'm doing something wrong?
global OriText ;Contains textstring
global NewText ;Empty
global ColorNumber
ColorNumber = 2
convert_text(){
StringSplit, char_array, OriText
Loop, %char_array0%
{
thisChar := char_array%a_index%
NewText += % ColorNumber thisChar
MsgBox, %NewText%
ColorNumber++
if (ColorNumber = 13){
ColorNumber = 2
}
}
GuiControl,, NewText, %NewText%
ColorNumber = 2
}
Short explanation: I'm building a little tool that automatically colorizes text in IRC by giving each character a different color. That's why I split the string into an array and try to add:
U:0003ColorNumberCharacter
where U:0003 is meant to be the Unicode character (U+0003) that mIRC inserts with Ctrl+K.
You used
NewText += % ColorNumber thisChar
+ is used for adding numbers, but the string concatenation operator in AutoHotkey is . (this varies from language to language). So it should be:
NewText .= ColorNumber . thisChar
which is the same as
NewText := NewText . ColorNumber . thisChar
And whenever you use the := operator, there is no need for any % in simple assignments. You only need % for a dynamic variable reference, such as the pseudo-array element char_array%a_index%, which you used correctly when assigning thisChar.
Another way to express the assignment above, with the plain = operator, would be
NewText = %NewText%%ColorNumber%%thisChar%
which you figured out yourself already.
It turned out I was simply using the wrong operator. The correct code was:
NewText = %NewText%%ColorNumber%%thisChar%
Eliminate punctuation
Split the text into words at newlines and spaces, then store them in an array
Check each word for spelling errors with the checkSpelling.m function
Sum up the total number of errors in the article
No suggestion is treated as no error; in that case, return -1
If the number of errors > 20, return 1
If the number of errors <= 20, return -1
I would like to check the spelling errors in a certain paragraph, but I have trouble getting rid of the punctuation. There may be another cause as well; it returns the error below:
My data2 file is:
checkSpelling.m
function suggestion = checkSpelling(word)
h = actxserver('word.application');
h.Document.Add;
correct = h.CheckSpelling(word);
if correct
suggestion = []; %return empty if spelled correctly
else
%If incorrect and there are suggestions, return them in a cell array
if h.GetSpellingSuggestions(word).count > 0
count = h.GetSpellingSuggestions(word).count;
for i = 1:count
suggestion{i} = h.GetSpellingSuggestions(word).Item(i).get('name');
end
else
%If incorrect but there are no suggestions, return this:
suggestion = 'no suggestion';
end
end
%Quit Word to release the server
h.Quit
f19.m
for i = 1:1
data2=fopen(strcat('DATA\PRE-PROCESS_DATA\F19\',int2str(i),'.txt'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
word_punctuation=regexprep(CharData,'[`~!@#$%^&*()-_=+[{]}\|;:\''<,>.?/','')
word_newLine = regexp(word_punctuation, '\n', 'split')
word = regexp(word_newLine, ' ', 'split')
[sizeData b] = size(word)
suggestion = cellfun(@checkSpelling, word, 'UniformOutput', 0)
A19(i)=sum(~cellfun(@isempty,suggestion))
feature19(A19(i)>=20)=1
feature19(A19(i)<20)=-1
end
Replace your regexprep call with
word_punctuation=regexprep(CharData,'\W','\n');
Here \W finds all non-word characters (anything other than letters, digits, and underscore, including spaces), which get substituted with a newline.
Then
word = regexp(word_punctuation, '\n', 'split');
As you can see, you don't need to split on spaces separately, since the substitution above has already turned them into newlines. You can, however, remove the empty cells:
word(cellfun(@isempty,word)) = [];
Everything worked for me. However, I have to say that your checkSpelling function is very slow. On every call it has to create an ActiveX server object, add a new document, and delete the object after the check is done. Consider rewriting the function to accept a cell array of strings, so that a single Word server checks all the words.
UPDATE
The only problem I see is that the quote character ' gets removed (I'm, don't, etc.). You can temporarily substitute it with an underscore (yes, the underscore counts as a word character for \W) or with any sequence of unused characters. Or you can list all the non-alphanumeric characters to be removed in square brackets instead of using \W.
UPDATE 2
Another solution to the 1st UPDATE:
word_punctuation=regexprep(CharData,'[^A-Za-z0-9''_]','\n');
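If you prefer, the replace-then-split steps can be collapsed into a single regexp call that splits on runs of unwanted characters. A minimal sketch that keeps apostrophes and underscores, as in the pattern above; the sample text is made up for illustration:

```matlab
CharData = sprintf('I''m here, don''t panic!\nIt''s fine.');

% Split on every run of characters that are not letters, digits, apostrophes or underscores
word = regexp(CharData, '[^A-Za-z0-9''_]+', 'split');

% A separator at the start or end of the text leaves an empty cell; drop those
word(cellfun(@isempty, word)) = [];
```

Single quotes used as quotation marks (as in 'word') stay attached to the word with this pattern, so the limitation from the first UPDATE still applies.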
I have a text file that looks like this:
(a (bee (cold down)))
if I load it using
c=textscan(fid,'%s');
I get this:
'(a'
'(bee'
'(cold'
'down)))'
What I would like to get is:
'('
'a'
'('
'bee'
'('
'cold'
'down'
')'
')'
')'
I know I can delimit with '(' and ')' by specifying 'Delimiter' in textscan, but then I will lose these characters, which I want to keep.
Thank you in advance.
The %s specifier indicates that you want strings; what you want is individual characters. Use %c instead:
c=textscan(fid,'%c');
Update: if you want to keep your words intact, you'll want to load your text using the %s specifier. After the text is loaded, you can either solve this problem with regular expressions (not my forte) or write your own parser that parses each word individually and saves the parentheses and words to a new cell array.
AFAIK, there is no canned routine capable of preserving arbitrary delimiters.
You'd have to do it yourself:
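string = '(a (bee (cold down)))';
bo = string == '(';
bc = string == ')';
sp = string == ' ';
output = cell(numel(string), 1);   % upper bound: there are never more tokens than characters
j = 1;                             % index of the cell currently being filled
for ii = 1:numel(string)
    if bo(ii) || bc(ii) || sp(ii)
        if ~isempty(output{j})     % a delimiter ends the word being built, if any
            j = j + 1;
        end
        if ~sp(ii)                 % parentheses become tokens of their own
            output{j} = string(ii);
            j = j + 1;
        end
    else
        output{j} = [output{j} string(ii)];   % extend the current word
    end
end
output = output(~cellfun(@isempty, output));  % drop the unused cells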
This can probably be improved: the growing character array will prevent the loop from being JIT'ed. The array bo | bc | sp holds all the information needed to vectorize this; I just don't see how at this hour.
Nevertheless, it should give you a place to start.
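One way to vectorize this, offered as an alternative to the loop rather than as the method above, is to let regexp match the tokens directly: each match is either a single parenthesis or a run of characters that are neither parentheses nor whitespace. A sketch, assuming the whole line is available as one char array (for example read with fileread instead of textscan):

```matlab
txt = '(a (bee (cold down)))';

% A token is one parenthesis, or a maximal run of non-parenthesis, non-space characters
tokens = regexp(txt, '[()]|[^()\s]+', 'match');
tokens = tokens(:);   % column cell array, like the output of textscan
```

Because whitespace never matches either alternative, spaces are skipped without any explicit splitting step.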
MATLAB has a strtok function similar to C's. Its syntax is:
token = strtok(str)
token = strtok(str, delimiter)
[token, remain] = strtok(str, ...)
There is also a string replacement function, strrep:
modifiedStr = strrep(origStr, oldSubstr, newSubstr)
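What I would do is modify the string with strrep to add in delimiters, then use strtok. Working on the line as a single char array s (for example, read with fgetl rather than split up by textscan):
s = '(a (bee (cold down)))';    % or: s = fgetl(fid);
s = strrep(s, '(', '( ');       % add a space after each open paren
s = strrep(s, ')', ' ) ');      % add a space before and after each close paren
token = {};                     % cell array, since the tokens are strings
[tok, remain] = strtok(s, ' ');
while ~isempty(tok)
    token{end+1} = tok;                    % store the current token
    [tok, remain] = strtok(remain, ' ');   % continue with the rest of the string
end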
This gives you a cell array holding each of the tokens you requested.
strtok reference: http://www.mathworks.com/help/techdoc/ref/strtok.html
strrep reference: http://www.mathworks.com/help/techdoc/ref/strrep.html
I have the following string in MATLAB, for example
##%%F1_USA(40)_u
and I want
F1_USA_40__u
Is there a function for this?
Your best bet is probably regexprep which allows you to replace parts of a string using regular expressions:
s_new = regexprep(regexprep(s, '[()]', '_'), '[^A-Za-z0-9_]', '')
Update: based on your updated comment, this is probably what you want:
s_new = regexprep(regexprep(s, '^[^A-Za-z0-9_]*', ''), '[^A-Za-z0-9_]', '')
or:
s_new = regexprep(regexprep(s, '[^A-Za-z0-9_]', '_'), '^_*', '')
One way to do this is to use the function ISSTRPROP to find the indices of alphanumeric characters and replace or remove the others accordingly:
>> str = '##%%F1_USA(40)_u'; %# Sample string
>> index = isstrprop(str,'alphanum'); %# Find indices of alphanumeric characters
>> str(~index) = '_'; %# Set non-alphanumeric characters to '_'
>> str = str(find(index,1):end) %# Remove any leading '_'
str =
F1_USA_40__u %# Result
If you want to use regular expressions (which can get a little more complicated) then the last suggestion from Tamas will work. However, it can be greatly simplified to the following:
str = regexprep(str,{'\W','^_*'},{'_',''});
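To confirm that the ISSTRPROP version and the one-line regexprep agree on the sample string, a small script like the following can be used. It is only a sketch; the anonymous function name clean is illustrative.

```matlab
str = '##%%F1_USA(40)_u';

% regexprep with cell arrays applies each pattern/replacement pair in order:
% first every non-word character becomes '_', then leading underscores are removed
clean = @(s) regexprep(s, {'\W', '^_*'}, {'_', ''});
viaRegexp = clean(str);

% isstrprop version from the answer above
index = isstrprop(str, 'alphanum');
viaIsstrprop = str;
viaIsstrprop(~index) = '_';
viaIsstrprop = viaIsstrprop(find(index, 1):end);
```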