Which regexp is faster - pcre

I am using PCRE. Which is faster: |^/foo/(.*?)(?::(?:bar)?)?$| or |^/foo/(.*?)(?::bar)?:?$|? This will be used in a replacement, so we want to strip : and :bar from the end while doing it. I know the two are not exactly the same, but it does not matter much here.

I would use the first one as it only has to check for : once. The second one could match the first three characters of :bat before having to backtrack, then check for : again. Also, the second one could match :bar: whereas the first one can't. The actual speed difference would be tiny. The second way would be better written as /^\/foo\/(.*?)(?::bar|:)?$/
Try not to use regex metacharacters as delimiters!
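To see concretely how the patterns above differ, here is a small sketch. It uses Python's re module rather than PCRE (an assumption; the lazy quantifier and non-capturing groups used here are written the same way in both). The helper captured and the sample paths are made up for illustration.

```python
import re

# The two patterns from the question, plus the rewrite suggested in the answer.
FIRST = re.compile(r"^/foo/(.*?)(?::(?:bar)?)?$")
SECOND = re.compile(r"^/foo/(.*?)(?::bar)?:?$")
REWRITE = re.compile(r"^/foo/(.*?)(?::bar|:)?$")


def captured(pattern, path):
    """Return what group 1 keeps after the optional suffix is stripped."""
    m = pattern.match(path)
    return m.group(1) if m else None


# Hypothetical inputs covering each suffix discussed above.
SAMPLES = ["/foo/x", "/foo/x:", "/foo/x:bar", "/foo/x:bat", "/foo/x:bar:"]

for path in SAMPLES:
    print(path, [captured(p, path) for p in (FIRST, SECOND, REWRITE)])
```

On /foo/x:bar: the second pattern strips the whole :bar: suffix and keeps x, while the first pattern and the rewrite strip only the final : and keep x:bar. All three leave :bat in the capture.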

Collision between two lines of code makes the code not work the way I intend(?)

Try running the following code yourself, and you will notice that "/hello" changes to "/HELLO", but I want it to change to "hi". On the other hand, I want to keep the first line of code, which changes "hello" to "HELLO". How can I achieve this(?)
This problem is closely related to my last one:
Collision between two lines of code makes the code not work the way I intend; what could I do differently to get this to work(?)
The solution to my last problem was good for that problem, but it does not work for the new problem described above.
::hello::HELLO
::/hello::hi
That is interesting. I really expected it to work by removing / from the EndChars. But after looking at it for a while, it becomes obvious why it's behaving this way. When you type "/hello" it actually matches both hotstrings, so AHK chooses the first one defined. Anyway, there are two solutions that I know of:
Reorder your hotstrings. Place ::/hello::hi above the other one and you'll always get the desired result. Additionally, you don't need to change the EndChars since / is the first character.
Use the asterisk option on the second hotstring. This will make it update immediately, which may or may not be desirable (I prefer it).

Subscript multiple characters in Julia variable name?

I can write:
x\_m<TAB> = 5
to get x subscript m as a variable name in Julia. What if I want to subscript a word instead of a single character? This
x\_max<TAB> = 5
doesn't work. However,
x\_m<TAB>\_a<TAB>\_x<TAB> = 5
does work, it's just very uncomfortable. Is there a better way?
As I noted in my comment, not all ASCII characters exist as unicode super- or sub-scripts. In addition, another difficulty in generalizing this tab completion will be determining what \_phi<TAB> should mean: is it ₚₕᵢ or ᵩ? Finally, I'll note that since these characters are cobbled together from different ranges for different uses they look pretty terrible when used together.
A simple hack to support common words you use would be to add them piecemeal to the Base.REPLCompletions.latex_symbols dictionary:
Base.REPLCompletions.latex_symbols["\\_max"] = "ₘₐₓ"
Base.REPLCompletions.latex_symbols["\\_min"] = "ₘᵢₙ"
You can put these additions in your .juliarc.jl file to load them every time on startup. While it may be possible to get a comprehensive solution, it'll take much more work.
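The point that not every ASCII letter has a Unicode subscript can be checked directly. The following is a minimal sketch in Python (not Julia, and unrelated to how the Julia REPL implements tab completion); the function name subscript is hypothetical. It looks characters up by their official Unicode names and fails for letters that have no subscript form.

```python
import unicodedata


def subscript(word):
    """Return word spelled in Unicode subscript letters, or raise ValueError."""
    out = []
    for ch in word:
        try:
            # Subscript Latin letters are named "LATIN SUBSCRIPT SMALL LETTER X".
            name = f"LATIN SUBSCRIPT SMALL LETTER {ch.upper()}"
            out.append(unicodedata.lookup(name))
        except KeyError:
            raise ValueError(f"no Unicode subscript for {ch!r}") from None
    return "".join(out)


print(subscript("max"))
print(subscript("min"))

try:
    subscript("eq")  # 'e' has a subscript form, 'q' does not
except ValueError as err:
    print(err)
```

Words such as max and min convert cleanly, which is why they are reasonable candidates for the dictionary entries above, while a word containing q cannot be written in subscript at all.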
Since Julia 1.6, this works for subscripts (\_) and superscripts (\^) in the Julia REPL.
x\_max<TAB> will print out like this: xₘₐₓ.
x\^max<TAB> will print out like this: xᵐᵃˣ.

sed seems to match pattern properly only when newline inserted

I am currently running the following sed command:
sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g' myfile.tex
Essentially, I have inherited an oddly formatted tex file, and want to replace everything like this:
P(<foo>)\mid(<bar>)
With this
\condprob{<foo>}{<bar>}
The file I am trying to run sed on contains the following line:
P(\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)
Which I would like to change to this:
\condprob{\vec{m}_i}{t,h,\alpha} = \prod_{u\in\mathcal{U}}\condprob{\vec{m}_{iu}}{t,h,\alpha}
However, sed keeps missing the first \mid and instead gives me this:
\condprob{\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu}}{t,h,\alpha}
If I add a line break at the = sign it matches everything fine
Can someone please a) help me resolve this, and b) perhaps tell me why it is happening?
Thanks.
Edit: thanks choroba and Sloopjon, you've both answered my why, and Sloopjon's solution is actually exactly what I was needing. choroba: I guess I will have to wait another day to learn perl.
For those that are interested Sloopjon's solution when translated into my problem looks like this (match everything that isn't a closing parenthesis):
sed 's/P(\([^)]*\))\\mid(\([^)]*\))/\\condprob{\1}{\2}/g' myfile.tex
It looks like you expect P(\(.*\)) to match only P(\vec{m}_i), but the * quantifier is greedy, so it actually matches P(\vec{m}_i)\mid...P(\vec{m}_{iu}). There are two common fixes for this: use a non-greedy quantifier if your tool supports it, or change the pattern so that it only matches what you expect. For example, if you know that parentheses won't nest in this P() construct, change .* to [^)]*.
Edit: I also suggest that you look for a regex visualizer or debugger when you have a problem like this. For example, pasting your example into debuggex.com makes it clear what's happening.
The problem is the greediness of the * quantifier. It matches as many times as it can, i.e. it doesn't stop at the first ).
You can try Perl, which features the "non-greedy" (frugal, lazy) *?:
perl -pe 's/P\((.*?)\)\\mid\((.*?)\)/\\condprob{$1}{$2}/g'
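The greedy behaviour and both fixes described in the answers can be reproduced on the line from the question. This is a sketch using Python's re module instead of sed or Perl (an assumption made only so the three variants can be compared side by side); the variable names are illustrative.

```python
import re

# The problem line from the question.
LINE = (r"P(\vec{m}_i)\mid(t,h,\alpha) = "
        r"\prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)")

# In the replacement, \\ produces a literal backslash; \1 and \2 are the groups.
REPL = r"\\condprob{\1}{\2}"

# Greedy .* runs to the last ")\mid(" on the line, as in the original sed command.
greedy = re.sub(r"P\((.*)\)\\mid\((.*)\)", REPL, LINE)

# Fix 1: a negated character class stops at the first closing parenthesis.
negated = re.sub(r"P\(([^)]*)\)\\mid\(([^)]*)\)", REPL, LINE)

# Fix 2: a lazy quantifier, as in the Perl one-liner.
lazy = re.sub(r"P\((.*?)\)\\mid\((.*?)\)", REPL, LINE)

print(greedy)
print(negated)
print(lazy)
```

The greedy variant reproduces the unwanted output from the question with a single oversized match, while the negated-class and lazy variants both replace the two P(...)\mid(...) occurrences separately. The negated class assumes, as the answer notes, that parentheses do not nest inside P().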

CLUTO doc2mat specified stop word list not working

I am trying to convert my documents into vector-space format using doc2mat
On the website, it says I can use my specified text file where words are white-space separated or on multiple lines. So, I use some code similar to this one:
./doc2mat -mystoplist=stopword.txt -skipnumeric mydocuments.txt myvectorspace.txt
However, when I check the output .clabel file, it still contains stop words that are in stopword.txt.
I really do not know how to fix this. Could someone help me out, please? Thank you!
There's one important thing I should remember: I need to include ALL the unwanted words in my stop list. This is somewhat difficult, since there are always variations.
For example, if I want to exclude method, I add it to my list. However, the resulting vocabulary may still contain method, since the documents contain words like methodist, methods, etc. doc2mat stems these words by default, so I still get method in the output.
Another thing is to make sure that the "-nostop" option is provided when using a user-specified stop list.

Using text as condition in a while loop

I'm having some trouble using text as a condition for a while loop. Currently the basic code is:
result=struct('val','yes');
while result.val=='yes'
result.val=input('more digits?');
end
So as you can see, what I'm trying to do is keep the loop going as long as the user types in 'yes'. But that's one of the problems I am having: is there a way to get rid of the need to write the quotes (e.g. yes instead of 'yes')? Secondly, when I run the code it gives me the error message "Error using == ,Matrix dimensions must agree.". I realise this has to do with the word yes being longer than no, but I don't know how to fix it. It's not really an issue, considering the program ends anyway, but it is an annoyance I would like to get rid of.
To compare strings, use strcmp, or strcmpi to ignore case. It will handle comparison of different length strings. For example:
strcmpi(result.val,'yes')
If you want to search for a substring, such as just a 'y', at the beginning of the input, consider strncmpi (strncmpi(result.val,'y',1)) or just check the first character (result.val(1)).