Why the 'stop' pattern doesn't work in my sed command? - sed

I was following this sed tutorial: http://www.grymoire.com/unix/Sed.html#uh-29
And write the following command to dealing with "zen of python" motto, like:
sed '/Although/,/There/ s/a/A/g' zen
Special cases aren't special enough to break the rules.
Although prActicAlity beAts purity.
Errors should never pAss silently.
Unless explicitly silenced.
In the fAce of Ambiguity, refuse the temptAtion to guess.
There should be one-- And preferAbly only one --obvious wAy to do it.
Although thAt wAy mAy not be obvious At first unless you're Dutch.
Now is better thAn never.
It seems that the start pattern works well but the stop pattern didn't work!
How to fix it?

Actually your stop pattern does work. But hey, look:
Special cases aren't special enough to break the rules.
>>> Although <<< prActicAlity beAts purity. # Start
Errors should never pAss silently.
Unless explicitly silenced.
In the fAce of Ambiguity, refuse the temptAtion to guess.
>>> There <<< should be one-- And preferAbly only one \
--obvious wAy to do it. # Stop
>>> Although <<< thAt wAy mAy not be obvious At first unless \
you're Dutch. # Start again!
Now is better thAn never.
You have another start pattern (Although word) the line after! So sed starts substituting again. That's all. Replace second Although with anything else in text, or choose Although p as start pattern, and everything works as expected.

Related

sed seems to match pattern properly only when newline inserted

I am currently running the following sed command:
sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g' myfile.tex
Essentially, I have inherited an oddly formatted tex file, and want to replace everything like this:
P(<foo>)\mid(<bar>)
With this
\condprob{<foo>}{<bar>}
The file I am trying to run sed on contains the following line:
P(\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)
Which I would like to change to this:
\condprob{\vec{m}_i}{t,h,\alpha} = \prod_{u\in\mathcal{U}}\condprob{\vec{m}_{iu}}{t,h,\alpha}
However, sed keeps missing the first \mid and instead gives me this:
\condprob{\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu}}{t,h,\alpha}
If I add a line break at the = sign it matches everything fine
Can someone please a) help me resolve this, and b) perhaps tell me why it is happening?
Thanks.
Edit: thanks choroba and Sloopjon, you've both answered my why, and Sloopjon's solution is actually exactly what I was needing. choroba: I guess I will have to wait another day to learn perl.
For those that are interested Sloopjon's solution when translated into my problem looks like this (match everything that isn't a closing parenthesis):
sed 's/P(\([^)]*\))\\mid(\([^)]\))/\\condprob{\1}{\2}/g' myfile.tex
It looks like you expect P(\(.*\)) to match only P(\vec{m}_i), but the * quantifier is greedy, so it actually matches P(\vec{m}_i)\mid...P(\vec{m}_{iu}). There are two common fixes for this: use a non-greedy quantifier if your tool supports it, or change the pattern so that it only matches what you expect. For example, if you know that parentheses won't nest in this P() construct, change .* to [^)]*.
Edit: I also suggest that you look for a regex visualizer or debugger when you have a problem like this. For example, pasting your example into debuggex.com makes it clear what's happening.
The problem is the greediness of the * quantifier. It matches as many times as it can, i.e. it doesn't stop at the first ).
You can try Perl, that features "non-greedy" (frugal, lazy) *?:
perl -pe 's/P\((.*?)\)\\mid\((.*?)\)/\\condprob{$1}{$2}/g'

What's the recommended replacement for Perl's deprecated-ish given/when?

Now that the Perl devs have decided to sort-of deprecate given/when statements, is there a recommended replacement, beyond just going back to if/elsif/else?
if/elsif/else chains are the best option most of the time — except when something completely different is better than both if/elsif/else and given/when, which is actually reasonably often. Examples of "completely different" approaches are creating different types of objects to handle different scenarios, and letting method dispatch do your work for you, or finding an opportunity to make your code more data-driven. Both of those, if they're appropriate and you do them right, can greatly reduce the number of "switch statement" constructs in your code.
Just as a supplement, I've found that a combination of 'for' and if/elsif/else is good if you have some given/when/default code that needs to be quickly updated. Just replace given with for and replace the when statements with a cascade of if & elsif, and replace default with else. This allows all your tests to continue using $_ implicitly, requiring less rewriting. (But be aware that other special smart match features will not work any more.)
This is just for rewriting code that already uses given/when, though. For writing new code, #hobbs has the right answer.

General check of missing semicolon

As a Perl beginner I am sometimes getting compilation errors and have to search a lot to find it. In the end it is just a missing semicolon at the end of a line. Some syntax errors with missing semicolon are checked by Perl but not in general. Is there a way to get this check?
edit:
I know about Perl::Critic but can't use it atm. And I don't know if it checks for missing semicolon in general.
Because semicolons actually mean something in Perl and aren't just there for decoration, it's not possible for any tool (even the Perl interpreter itself) to know in every case whether you actually meant to leave off the semi-colon or not. Thus, there's no general-case answer to your question; you'll just need to go through your code and make sure it's correct.
As mentioned in my comments, there are various tricks you can try with your editor to expedite the process of finding potentially-incorrect lines; you must, however, either examine and fix these by hand or risk introducing new problems.
The syntax check is perl -c, but that's no different than attempting to run the program outright. Due to its flexible/undecidable syntax, one cannot generally do what you want. That's the downside of comfort and expressiveness.
Upgrade to the latest stable Perl, the parser's error messages got better/more exact over the last years and will correctly recognise many circumstances of a missing semicolon.
Rule of thumb that works for many parsers/other languages: if the error makes no sense, look a couple of lines before.
use diagnostics; usually gives you a nice hint, same as use warnings;. Try to keep a consistent coding style, check perlstyle.
Also you can use Perl::Critic online.
Also as general advice learn how to use packages and modules, try to group code into subs and study the syntax of arrays, lists and hashes. A common mistake is forgetting the ; after an anonymous hashref assignment:
my $hashref = { a => 5, b => 10};

Is this trivial function silly?

I came across a function today that made me stop and think. I can't think of a good reason to do it:
sub replace_string {
my $string = shift;
my $regex = shift;
my $replace = shift;
$string =~ s/$regex/$replace/gi;
return $string;
}
The only possible value I can see to this is that it gives you the ability to control the default options used with a substitution, but I don't consider that useful. My first reaction upon seeing this function get called is "what does this do?". Once I learn what it does, I am going to assume it does that from that point on. Which means if it changes, it will break any of my code that needs it to do that. This means the function will likely never change, or changing it will break lots of code.
Right now I want to track down the original programmer and beat some sense into him or her. Is this a valid desire, or am I missing some value this function brings to the table?
The problems with that function include:
Opaque: replace_string doesn't tell you that you're doing a case-insensitive, global replace without escaping.
Non-idiomatic: $string =~ s{$this}{$that}gi is something you can learn what it means once, and its not like its some weird corner feature. replace_string everyone has to learn the details of, and its going to be different for everyone who writes it.
Inflexible: Want a non-global search-and-replace? Sorry. You can put in some modifiers by passing in a qr// but that's far more advanced knowledge than the s/// its hiding.
Insecure: A user might think that the function takes a string, not a regex. If they put in unchecked user input they are opening up a potential security hole.
Slower: Just to add the final insult.
The advantages are:
Literate: The function name explains what it does without having to examine the details of the regular expression (but it gives an incomplete explanation).
Defaults: The g and i defaults are always there (but that's non-obvious from the name).
Simpler Syntax: Don't have to worry about the delimiters (not that s{}{} is difficult).
Protection From Global Side Effects: Regex matches set a salad of global variables ($1, $+, etc...) but they're automatically locally scoped to the function. They won't interfere if you're making use of them for another regex.
A little overzealous with the encapsulation.
print replace_string("some/path", "/", ":");
Yes, you get some magic in not having to replace / with a different delimiter or escape / in the regex.
If it's just a verbose replacement for s/// then I'd guess that it was written by someone who came to Perl from a language where using regular expressions required extra syntax and who is/was more comfortable coding that way. If that's the case I'd classify it as Perl baby-talk: silly and awkward to seasoned coders but not bad -- not bad enough to warrant a beating, anyway. ;)
If I squint really hard I can almost see cases where such a function might be useful: applying a bunch of patterns to a bunch of strings, allowing user input for the terms, supplying a CODE reference for a callback...
My first reaction upon seeing that is a new Perl programmer didn't want to remember the syntax for a regular expression and created a function he or she could easily remember, without learning the syntax.
The only reason I can see other than the ones mentioned already ( new programmer does not want to remember regex syntax ) is that it is possible they may be using some IDE that does not have any syntax highlighting for regex, but it does exist for functions they've written. Not the best of reasons, but plausible.

What is Perl's secret of getting small code do so much?

I've seen many (code-golf) Perl programs out there and even if I can't read them (Don't know Perl) I wonder how you can manage to get such a small bit of code to do what would take 20 lines in some other programming language.
What is the secret of Perl? Is there a special syntax that allows you to do complex tasks in few keystrokes? Is it the mix of regular expressions?
I'd like to learn how to write powerful and yet short programs like the ones you know from the code-golf challenges here. What would be the best place to start out? I don't want to learn "clean" Perl - I want to write scripts even I don't understand anymore after a week.
If there are other programming languages out there with which I can write even shorter code, please tell me.
There are a number of factors that make Perl good for code golfing:
No data typing. Values can be used interchangeably as strings and numbers.
"Diagonal" syntax. Usually referred to as TMTOWTDI (There's more than one way to do it.)
Default variables. Most functions act on $_ if no argument is specified. (A few act
on #_.)
Functions that take multiple arguments (like split) often have defaults that
let you omit some arguments or even all of them.
The "magic" readline operator, <>.
Higher order functions like map and grep
Regular expressions are integrated into the syntax (i.e. not a separate library)
Short-circuiting operators return the last value tested.
Short-circuiting operators can be used for flow control.
Additionally, without strictures (which are off be default):
You don't need to declare variables.
Barewords auto-quote to strings.
undef becomes either 0 or '' depending on context.
Now that that's out of the way, let me be very clear on one point:
Golf is a game.
It's great to aspire to the level of perl-fu that allows you to be good at it, but in the name of $DIETY do not golf real code. For one, it's a horrible waste of time. You could spend an hour trying to trim out a few characters. Golfed code is fragile: it almost always makes major assumptions and blithely ignores error checking. Real code can't afford to be so careless. Finally, your goal as a programmer should be to write clear, robust, and maintainable code. There's a saying in programming: Always write your code as if the person who will maintain it is a violent sociopath who knows where you live.
So, by all means, start golfing; but realize that it's just playing around and treat it as such.
Most people miss the point of much of Perl's syntax and default operators. Perl is largely a "DWIM" (do what I mean) language. One of it's major design goals is to "make the common things easy and the hard things possible".
As part of that, Perl designers talk about Huffman coding of the syntax and think about what people need to do instead of just giving them low-level primitives. The things that you do often should take the least amount of typing, and functions should act like the most common behavior. This saves quite a bit of work.
For instance, the split has many defaults because there are some use cases where leaving things off uses the common case. With no arguments, split breaks up $_ on whitespace because that's a very common use.
my #bits = split;
A bit less common but still frequent case is to break up $_ on something else, so there's a slightly longer version of that:
my #bits = split /:/;
And, if you wanted to be explicit about the data source, you can specify the variable too:
my #bits = split /:/, $line;
Think of this as you would normally deal with life. If you have a common task that you perform frequently, like talking to your bartender, you have a shorthand for it the covers the usual case:
The usual
If you need to do something, slightly different, you expand that a little:
The usual, but with onions
But you can always note the specifics
A dirty Bombay Sapphire martini shaken not stirred
Think about this the next time you go through a website. How many clicks does it take for you to do the common operations? Why are some websites easy to use and others not? Most of the time, the good websites require you to do the least amount of work to do the common things. Unlike my bank which requires no fewer than 13 clicks to make a credit card bill payment. It should be really easy to give them money. :)
This doesn't answer the whole question, but in regards to writing code you won't be able to read in a couple days, here's a few languages that will encourage you to write short, virtually unreadable code:
J
K
APL
Golfscript
Perl has a lot of single character special variables that provide a lot of shortcuts eg $. $_ $# $/ $1 etc. I think it's that combined with the built in regular expressions, allows you to write some very concise but unreadable code.
Perl's special variables ($_, $., $/, etc.) can often be used to make code shorter (and more obfuscated).
I'd guess that the "secret" is in providing native operations for often repeated tasks.
In the domain that perl was originally envisioned for you often have to
Take input linewise
Strip off whitespace
Rip lines into words
Associate pairs of data
...
and perl simple provided operators to do these things. The short variable names and use of defaults for many things is just gravy.
Nor was perl the first language to go this way. Many of the features of perl were stolen more-or-less intact (or often slightly improved) from sed and awk and various shells. Good for Larry.
Certainly perl wasn't the last to go this way, you'll find similar features in python and php and ruby and ... People liked the results and weren't about to give them up just to get more regular syntax.
What's Java's secret of copying a variable in only one line, without worrying about buses and memory? Answer: the code is transformed to bigger code. Same for every language ever invented.