Regexp doesn't support negative lookahead? Any workarounds?

Regexp doesn't support negative lookahead? Any workarounds? - sphinx

Trying to do a version of this for regexp:
brown fox(?! that)
But I get:
re2/re2.cc:534: Invalid RE2: invalid perl operator: (?!
Is there a work around for this as the negative words are key here

Definitely a bodge, but just reverse the unneeded replacement?
regexp_filter = brown fox => blabla
regexp_filter = blabla that => brown fox that
Maybe?

Related

How to replace characters between flags in MATLAB

Suppose I have a char variable in Matlab like this:
x = 'hello my name $ is Sean $ Daley.';
I want to replace the first '$' with the symbol '&', and the second '$' with the symbol '#'.
Furthermore, if I have a more complicated char such that pairs of '$' repeat many times, I want to repeat the same pattern. So the following:
y = 'hello $ my $ name is $ Sean $ Daley $.$.';
would be transformed into:
'hello & my # name is & Sean # Daley &.#.'
I have tried coding this manually via for loops and while loops, but the code is just so ugly. Are there any simple functions that I can use?

Since you're dealing with single characters and non-nested pairs of flags, you can easily do this with a simple call to find and some indexed replacement:
y = 'hello $ my $ name is $ Sean $ Daley $.$.';
index = find(y == '$');
y(index(1:2:end)) = '&';
y(index(2:2:end)) = '#';
And the result:
y =
'hello & my # name is & Sean # Daley &.#.'

Getting IndexOutOfBounds Exception while search for a subtring

I have a string like
var word = "banana"
and a sentence like var sent = "the monkey is holding a banana which is yellow"
sent1 = "banana!!"
I want to search banana in sent and then write to a file in the following way:
the monkey is holding a
banana
which is yellow
I'm doing it in the following way:
var before = sent.substring(0, sent.indexOf(word))
var after = sent.substring(sent.indexOf(word) + word.length)
println(before)
println(after)
This works fine but when I do the same for sent1, then it gives me IndexOutOfBoundsException. I think it is because there is nothing before banana in sent1. How to deal with this?

You can split based on the word and you will get an array with everything before and after the word.
val search = sent.split(word)
search: Array[String] = Array("the monkey is holding a ", " which is yellow")
This works in the "banana!!!" case:
"banana!!".split(word)
res5: Array[String] = Array("", !!)
Now you can write the three lines to a file in your favorite way:
println(search(0))
println(word)
println(search(1))

What if you had more than one occurrence of the word? .split understands regular expressions, so you could improve the previous solution with something like this:
string
.replaceAll("\\s+(?=banana)|(?<=banana)\\s+")
.foreach(println)
\\s means a whitespace character
(?=<word>) means "followed by <word>"
(?<=<word>) means "preceded by <word>"
So, this would split your string into pieces, using any spaces either preceded or followed by the "banana", and not the word itself. The actual word ends up in the list, just like the other parts of the string, so you don't need to print it out explicitly
This regex trick is called "positive look-around" ( ?= is look-ahead, ?<= is look-behind) in case you are wondering.

Forcing gaps between words in a Marpa grammar

I'm trying to set up a grammar that requires that [\w] characters cannot appear directly adjacent to each other if they are not in the same lexeme. That is, words must be separated from each other by a space or punctuation.
Consider the following grammar:
use Marpa::R2; use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
:start ::= Rule
Rule ::= '9' 'september'
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');
This parses successfully. Now I want to change the grammar to force a separation between 9 and september. I thought of doing this by introducing an unused lexeme that matches [\w]+:
use Marpa::R2; use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
:start ::= Rule
Rule ::= '9' 'september'
:discard ~ whitespace
whitespace ~ [\s]+
word ~ [\w]+ ### <== Add unused lexeme to match joined keywords
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');
Unfortunately, this grammar fails with:
A lexeme is not accessible from the start symbol: word
Marpa::R2 exception at marpa.pl line 3.
Although this can be resolved by using a lexeme default statement:
use Marpa::R2; use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
lexeme default = action => [value] ### <== Fix exception by adding lexeme default statement
:start ::= Rule
Rule ::= '9' 'september'
:discard ~ whitespace
whitespace ~ [\s]+
word ~ [\w]+
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\'9september');
This results in the following output:
Inaccessible symbol: word
Error in SLIF parse: No lexemes accepted at line 1, column 1
* String before error:
* The error was at line 1, column 1, and at character 0x0039 '9', ...
* here: 9september
Marpa::R2 exception at marpa.pl line 16.
That is, the parse has failed due to the fact that there is no gap between 9 and september which is exactly what I want to happen. The only fly in the ointment is that there is an annoying Inaccessible symbol: word message on STDERR because the word lexeme is not used in the actual grammar.
I see that in Marpa::R2::Grammar I could have declared word as inaccessible_ok in the constructor options but I can't do that in Marpa::R2::Scanless.
I also could have done something like the following:
Rule ::= nine september
nine ~ word
september ~ word
then used a pause to use custom code to examine the actual lexeme value and return the appropriate lexeme depending on the value.
What is the best way to construct a grammar that uses keywords or numbers and words but will disallow adjacent lexemes to be run together without white space or punctuation separating them?

Well, the obvious solution is to require some whitespace in between (on the G1 level). When we use the following grammar
:default ::= action => ::array
:start ::= Rule
Rule ::= '9' (Ws) 'september'
Ws ::= [\s]+
:discard ~ whitespace
whitespace ~ [\s]+
then 9september fails, but 9 september is parsed. Important points to note:
Lexemes can be both discarded and required, when they are both a longest token. This is why the :discard and Ws rule don't interfere with each other. Marpa doesn't mind this kind of “ambiguity”.
The Ws rule is enclosed in parens, which discards the value – to keep the resulting parse tree clean.
You do not usually want to use tricks like phantom lexemes to misguide the parser. That way lies breakage.
When every bit of whitespace is important, you might want to get rid of :discard ~ whitespace. This is meant to be used e.g. for C-like languages where whitespace traditionally does not matter.

How to match text in [...] with NSRegularExpression

what would be the NSRegularExpression to match the text between the "[" and the "]" in the following example:
"[this is some] text i would [like] to [parse] with NSRegularExpression"
Thx in advance :)

Not sure about NSRegularExpression, but I suspect it's like other regular expressions in terms of greediness. So, if there could be multiple [words] or [phrases in brackets] that you want to capture in the same string, you'll have to make sure the regex isn't greedy.
Maybe something like (broken up across a few lines for clarity:)
\[ // find an opening square bracket
([^]]+) // find one or more characters that ARE NOT a square bracket
\] // find the corresponding closing square bracket

Like a Daniel answer right pattern is \\[(.*?)\\]
Let me explain why.
\\[ is mean "open-bracket" symbol. Double slash allow to use special symbol [
All in ( ) allow to union as atomic sequence
. — any symbol without special * ? + [ ( ) { } ^ $ | \ . /!
*? — symbol entry 0 or more times. Match as few times as possible.
\\] — close-bracket

Your pattern should be \[(.*?)\]

Code Golf - Word Scrambler

Please answer with the shortest possible source code for a program that converts an arbitrary plaintext to its corresponding ciphertext, following the sample input and output I have given below. Bonus points* for the least CPU time or the least amount of memory used.
Example 1:
Plaintext: The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!
Ciphertext: eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
Example 2:
Plaintext: 123 1234 12345 123456 1234567 12345678 123456789
Ciphertext: 312 4213 53124 642135 7531246 86421357 975312468
Rules:
Punctuation is defined to be included with the word it is closest to.
The center of a word is defined to be ceiling((strlen(word)+1)/2).
Whitespace is ignored (or collapsed).
Odd words move to the right first. Even words move to the left first.
You can think of it as reading every other character backwards (starting from the end of the word), followed by the remaining characters forwards. Corporation => XoXpXrXtXoX => niaorCoprto.
Thank you to those who pointed out the inconsistency in my description. This has lead many of you down the wrong path, which I apologize for. Rule #4 should clear things up.
*Bonus points will only be awarded if Jeff Atwood decides to do so. Since I haven't checked with him, the chances are slim. Sorry.

Python, 50 characters
For input in i:
' '.join(x[::-2]+x[len(x)%2::2]for x in i.split())
Alternate version that handles its own IO:
print ' '.join(x[::-2]+x[len(x)%2::2]for x in raw_input().split())
A total of 66 characters if including whitespace. (Technically, the print could be omitted if running from a command line, since the evaluated value of the code is displayed as output by default.)
Alternate version using reduce:
' '.join(reduce(lambda x,y:y+x[::-1],x) for x in i.split())
59 characters.
Original version (both even and odd go right first) for an input in i:
' '.join(x[::2][::-1]+x[1::2]for x in i.split())
48 characters including whitespace.
Another alternate version which (while slightly longer) is slightly more efficient:
' '.join(x[len(x)%2-2::-2]+x[1::2]for x in i.split())
(53 characters)

J, 58 characters
>,&.>/({~(,~(>:#+:#i.#-#<.,+:#i.#>.)#-:)#<:##)&.><;.2,&' '

Haskell, 64 characters
unwords.map(map snd.sort.zip(zipWith(*)[0..]$cycle[-1,1])).words
Well, okay, 76 if you add in the requisite "import List".

Python - 69 chars
(including whitespace and linebreaks)
This handles all I/O.
for w in raw_input().split():
o=""
for c in w:o=c+o[::-1]
print o,

Perl, 78 characters
For input in $_. If that's not acceptable, add six characters for either $_=<>; or $_=$s; at the beginning. The newline is for readability only.
for(split){$i=length;print substr$_,$i--,1,''while$i-->0;
print"$_ ";}print $/

C, 140 characters
Nicely formatted:
main(c, v)
char **v;
{
for( ; *++v; )
{
char *e = *v + strlen(*v), *x;
for(x = e-1; x >= *v; x -= 2)
putchar(*x);
for(x = *v + (x < *v-1); x < e; x += 2)
putchar(*x);
putchar(' ');
}
}
Compressed:
main(c,v)char**v;{for(;*++v;){char*e=*v+strlen(*v),*x;for(x=e-1;x>=*v;x-=2)putchar(*x);for(x=*v+(x<*v-1);x<e;x+=2)putchar(*x);putchar(32);}}

Lua
130 char function, 147 char functioning program
Lua doesn't get enough love in code golf -- maybe because it's hard to write a short program when you have long keywords like function/end, if/then/end, etc.
First I write the function in a verbose manner with explanations, then I rewrite it as a compressed, standalone function, then I call that function on the single argument specified at the command line.
I had to format the code with <pre></pre> tags because Markdown does a horrible job of formatting Lua.
Technically you could get a smaller running program by inlining the function, but it's more modular this way :)
t = "The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!"
T = t:gsub("%S+", -- for each word in t...
function(w) -- argument: current word in t
W = "" -- initialize new Word
for i = 1,#w do -- iterate over each character in word
c = w:sub(i,i) -- extract current character
-- determine whether letter goes on right or left end
W = (#w % 2 ~= i % 2) and W .. c or c .. W
end
return W -- swap word in t with inverted Word
end)
-- code-golf unit test
assert(T == "eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos")
-- need to assign to a variable and return it,
-- because gsub returns a pair and we only want the first element
f=function(s)c=s:gsub("%S+",function(w)W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end return W end)return c end
-- 1 2 3 4 5 6 7 8 9 10 11 12 13
--34567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
-- 130 chars, compressed and written as a proper function
print(f(arg[1]))
--34567890123456
-- 16 (+1 whitespace needed) chars to make it a functioning Lua program,
-- operating on command line argument
Output:
$ lua insideout.lua 'The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!'
eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
I'm still pretty new at Lua so I'd like to see a shorter solution if there is one.
For a minimal cipher on all args to stdin, we can do 111 chars:
for _,w in ipairs(arg)do W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end io.write(W ..' ')end
But this approach does output a trailing space like some of the other solutions.

For an input in s:
f=lambda t,r="":t and f(t[1:],len(t)&1and t[0]+r or r+t[0])or r
" ".join(map(f,s.split()))
Python, 90 characters including whitespace.

TCL
125 characters
set s set f foreach l {}
$f w [gets stdin] {$s r {}
$f c [split $w {}] {$s r $c[string reverse $r]}
$s l "$l $r"}
puts $l

Bash - 133, assuming input is in $w variable
Pretty
for x in $w; do
z="";
for l in `echo $x|sed 's/\(.\)/ \1/g'`; do
if ((${#z}%2)); then
z=$z$l;
else
z=$l$z;
fi;
done;
echo -n "$z ";
done;
echo
Compressed
for x in $w;do z="";for l in `echo $x|sed 's/\(.\)/ \1/g'`;do if ((${#z}%2));then z=$z$l;else z=$l$z;fi;done;echo -n "$z ";done;echo
Ok, so it outputs a trailing space.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Regexp doesn't support negative lookahead? Any workarounds? - sphinx

Trying to do a version of this for regexp: brown fox(?! that) But I get: re2/re2.cc:534: Invalid RE2: invalid perl operator: (?! Is there a work around for this as the negative words are key here

Definitely a bodge, but just reverse the unneeded replacement? regexp_filter = brown fox => blabla regexp_filter = blabla that => brown fox that Maybe?

Related

How to replace characters between flags in MATLAB

Getting IndexOutOfBounds Exception while search for a subtring

Forcing gaps between words in a Marpa grammar

How to match text in [...] with NSRegularExpression

Code Golf - Word Scrambler

Categories

Resources