Processing paragraph using sed

I want to process a paragraph using sed. I want to extract only the sentences with an odd number of words and then print their words in reverse order. E.g. the input is 'Hello world. hello world again.' and the required output is 'again world hello'.

If this is homework, your teacher wants you to discover sed's hold space. If this is not homework, it is much less awkward to do this in e.g. Perl. Welcome to the world of wonderful Perl one-liners:
perl -00 -lane 'next unless @F % 2; $, = " "; print reverse @F' yourfilenamehere
This only does paragraphs. Splitting into sentences and looping over those should not be too hard to hack in.

Related

Unexpected character when running one-liner on Windows

I want to generate an output file that shows the frequency of each word inside an input file. After some searching, I found that Perl is the ideal language for this problem, but I don't know this language.
After some more searching, I found the following code here at Stack Overflow. Supposedly it provides the solution I want with great efficiency:
perl -lane '$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a} || $a cmp $b} keys %h) {print "$h{$w}\t$w"}}' file > freq
I tried running this command line using the form below:
perl -lane 'code' input.txt > output.txt
The execution halts due to an unexpected '>' (the one at '<=>'). I did some research but can't understand what is wrong.
Could someone enlighten me? Thanks!
Here is the topic from where I got the code:
Elegant ways to count the frequency of words in a file
If it's relevant, my words use letters and numbers and are separated by a single white space.
You are probably using Windows. You therefore need to use double quotes " instead of single quotes ' around your code:
perl -lane "$h{$_}++ for @F; END{for $w (sort {$h{$b}<=>$h{$a} || $a cmp $b} keys %h) {print qq($h{$w}\t$w)}}" file > freq
Also, note how I used qq() instead of "..." within the code, as suggested by @mob. Another option is to escape the quotes with \".

Function name inside parentheses in Perl one liner

I'm working on a Perl one liner tutorial and there are one liners like this:
ls -lAF | perl -e 'while (<>) {next if /^[dt]/; print +(split)[8] . " size: " . +(split)[4] . "\n"}'
Notice that the function name split appears inside parentheses. I couldn't find any documentation about this use of functions on Google. Could somebody explain it? Thank you.
It probably doesn't help that the use of split is defaulting everything - it's splitting $_ by spaces and returning a list of values.
The (...)[8] is called a list slice, and it filters out all but the 9th value returned by split. The preceding plus is there to prevent Perl from misparsing the brackets as being part of a function call. Which also means you don't need it on the second instance.
So print +(split)[8]; is basically a very succinct way of writing
my @results = split(' ', $_);
print $results[8];
The example you've included is performing the split twice so it might be more efficient to do the more verbose version as you can get $results[4] from the above without any extra effort.
Or because you can put a list of indexes inside the [], you could do the split once and use printf to format the output like this
printf "%s size: %s\n", (split)[8,4];
In my opinion you should be avoiding this author's advice, both for the reasons laid out in my comments on your question, and because they don't appear to know their topic at all well.
The original "one-liner" was this
ls -lAF | perl -e 'while (<>) {next if /^[dt]/; print +(split)[8] . " size: " . +(split)[4] . "\n"}'
This could be written much more succinctly by using the -n and -a options, giving this
ls -lAF | perl -wane 'print "$F[8] size: $F[4]\n" unless /^[dt]/'
Even without the "luxury" of these options you could write
ls -lAF | perl -e '/^[dt]/ or printf "%s size: %s\n", (split)[8,4] while <>'
I recommend that you go and read the Camel Book several times over the next few years. That is the best way to learn the language that I have found.
Most installations of Perl include a full set of documentation, accessible using the perldoc command.
You need to read the Slices section of perldoc perldata which makes very clear this use of slicing.

Prevent perl from printing a newline

I have this simple command:
printf TEST | perl -nle 'print lc'
Which prints:
test
​
I want:
test
...without the newline. I tried perl's printf but that removes all newlines, and I'd like to keep existing ones in place. Plus, that wouldn't work for my second example that doesn't even use print in it:
printf "BOB'S BIG BOY" | perl -ple 's/([^\s.,-]+)/\u\L$1/g'
Which prints:
Bob's Big Boy
​
...with that annoying newline as well. I'm hoping for a magical switch like --no-newline but I'm guessing it's something more involved.
EDIT: I've changed my use of echo in the examples to printf to clarify the problem. A few commenters were correct in stating that my problem wouldn't actually be fixed as it was written.
You simply have to remove the -l switch; see perldoc perlrun:
-l[octnum]
enables automatic line-ending processing. It has two separate
effects. First, it automatically chomps $/ (the input record
separator) when used with -n or -p. Second, it assigns $\ (the output
record separator) to have the value of octnum so that any print
statements will have that separator added back on. If octnum is
omitted, sets $\ to the current value of $/.

Extract a specific pattern from lines with sed, awk or perl

Can I use sed if I need to extract a pattern enclosed by a specific pattern, if it exists in a line?
Suppose I have a file with the following lines :
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
In both cases I have to scan the line for the first opening pattern, i.e. '[/' or '/*' respectively, and capture the text that follows up to the closing pattern, i.e. '/]' or '*/' respectively.
In short, I need fear and answer. If possible, can it be extended to multiple lines, in the sense that the closing pattern may occur on a different line from the opening one?
Any kind of help in the form of suggestions or algorithms is welcome. Thanks in advance for the replies.
use strict;
use warnings;
while (<DATA>) {
while (m#/(\*?)(.*?)\1/#g) {
print "$2\n";
}
}
__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
As a one-liner:
perl -nlwe 'while (m#/(\*?)(.*?)\1/#g) { print $2 }' input.txt
The inner while loop will iterate over all matches found with the /g modifier. The backreference \1 will make sure we only match identical open/close tags.
If you need to match blocks that extend over multiple lines, you need to slurp the input:
use strict;
use warnings;
$/ = undef;
while (<DATA>) {
while (m#/(\*?)(.*?)\1/#sg) {
print "$2\n";
}
}
__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say. /* foofer */
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
foo bar /
baz
baaz / fooz
One-liner:
perl -0777 -nlwe 'while (m#/(\*?)(.*?)\1/#sg) { print $2 }' input.txt
The -0777 switch and $/ = undef will cause file slurping, meaning all of the file is read into a scalar. I also added the /s modifier to allow the wildcard . to match newlines.
Explanation for the regex: m#/(\*?)(.*?)\1/#sg
m# # a simple m//, but with # as delimiter instead of slash
/(\*?) # slash followed by optional *
(.*?) # shortest possible string of wildcard characters
\1/ # backref to optional *, followed by slash
#sg # s modifier to make . match \n, and g modifier
The "magic" here is that the backreference requires a star * only when one is found before it.
Quick and dirty way in awk
awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' input_file
Test:
$ cat file
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn't.
$ awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' file
fear
answer
Single-Line Matches
If you really want to do this in sed, you can extract your delimited patterns relatively easily as long as they are on the same line.
# Using GNU sed. Escape a whole lot more if your sed doesn't handle
# the -r flag.
sed -rn 's![^*/]*(/\*?.*/).*!\1!p' /tmp/foo
Multi-Line Matches
If you want to perform multi-line matches with sed, things get a little uglier. However, it can certainly be done.
# Multi-line matching of delimiters with GNU sed.
sed -rn ':loop
/\/[^\/]/ {
N
s![^*/]+(/\*?.*\*?/).*!\1!p
T loop
}' /tmp/foo
The trick is to look for a starting delimiter, then keep appending lines in a loop until you find the ending delimiter.
This works really well as long as you really do have an ending delimiter. Otherwise, the contents of the file will keep being appended to the pattern space until sed finds one, or until it reaches the end of the file. This may cause problems with certain versions of sed or with really, really large files where the size of the pattern space gets out of hand.
See GNU sed's Limitations and Non-limitations for more information.

How can I interpolate literal \t and \n in Perl strings? [duplicate]

This question already has answers here:
How can I manually interpolate string escapes in a Perl string?
Say I have an environment variable myvar:
myvar=\tapple\n
When I print this variable with the following command
perl -e 'print "$ENV{myvar}"'
I literally get \tapple\n. However, I want those control characters to be evaluated rather than printed as escapes. How would I achieve that?
In my real code, $ENV{myvar} is used inside a substitution, but I hope the answer will cover that too.
Use eval:
perl -e 'print eval qq{"$ENV{myvar}"}'
UPD: You can also use substitution with the ee switch, which is safer:
perl -e '(my $s = $ENV{myvar}) =~ s/(\\n|\\t)/"qq{$1}"/gee; print $s'
You should probably be using String::Escape.
use String::Escape qw(unbackslash);
my $var = unbackslash($ENV{'myvar'});
unbackslash unescapes any string escape sequences it finds, turning them into the characters they represent. If you want to explicitly only translate \n and \t, you'll probably have to do it yourself with a substitution as in this answer.
There's nothing particularly special about a sequence of characters that includes a \. If you want to substitute one sequence of characters for another, it's very simple to do in Perl:
my %sequences = (
'\\t' => "\t",
'\\n' => "\n",
'foo' => 'bar',
);
my $string = '\\tstring fool string\\tfoo\\n';
print "Before: [$string]\n";
$string =~ s/\Q$_/$sequences{$_}/g for ( keys %sequences );
print "After: [$string]\n";
The only trick with \ is to keep track of the times when Perl thinks it's an escape character.
Before: [\tstring fool string\tfoo\n]
After: [ string barl string bar
]
However, as darch notes, you might just be able to use String::Escape.
Note that you have to be extremely careful when you're taking values from environment variables. I'd be reluctant to use String::Escape since it might process quite a bit more than you are willing to translate. The safe way is to only expand the particular values you explicitly want to allow. See my "Secure Programming Techniques" chapter in Mastering Perl where I talk about this, along with the taint checking you might want to use in this case.