Perl one-liner to extract groups of characters - perl

I am trying to extract a group of characters with a Perl one-liner, but I have been unsuccessful:
echo "hello_95_.txt" | perl -ne 's/.*([0-9]+).*/\1/'
Returns nothing, while I would like it to return 95. How can I do this with Perl?
Update:
Note that, in contrast to the suggested duplicate, I am interested in how to do this from the command-line. Surely this looks like a subtle difference, but it's not straightforward unless you already know how to effectively use Perl one-liners.
Since people are asking, eventually I want to learn to use Perl to write powerful one-liners, but most immediately I need a one-liner to extract consecutive digits from each line in a large text file.

perl -pe's/\D*(\d+).*/$1/'
or
perl -nE'/\d+/&&say$&'
or
perl -nE'say/(\d+)/'
or
perl -ple's/\D//g'
or may be
perl -nE'$,=" ";say/\d+/g'

Well first, you need to use the -p rather than the -n switch.
And you need to amend your regular expression, as in:
echo "hello_95_.txt" | perl -pe "s/^.*?([0-9]+).*$/\1/"
which looks for the longest non-greedy string of chars, followed by one or more digits, followed by any number of chars to the end of the line.
Note that while '\1' is acceptable as a back-reference and is more familiar to SED/AWK users, '$1' is the more up-to-date form. So, you might wish to use:
echo "hello_95_.txt" | perl -pe "s/^.*?([0-9]+).*$/$1/"
instead.

Related

How to modify the matched pattern

Just wondering if there is a handy way to modify matched pattern variable in Perl one liner. For instance in the string abcdef I'd like to replace def with e (output abce) using a command looking like this :
echo "abcdef" | perl -pne 's/(def)/{command that trims first and last character of $1 and returns it as a string for perl to use it as a replacement}/'
It would be easy to use such functionality to perform various formating tasks. Can we do this in sed ?
This is easy in Perl with the /e flag:
echo 'abcdef' | perl -pe 's/(def)/substr $1, 1, -1/e'
e tells perl to parse the replacement part as a block of code, not a string. You can put arbitrary code in there.
But your concrete task (trimming the first and last character) can also be done like this:
echo 'abcdef' | perl -pe 's/d(e)f/$1/'
(Also, perl -p already implies -n. No need to specify both.)

What is the purpose of filtering a log file using this Perl one-liner before displaying it in the terminal?

I came across this script which was not written by me, but because of an issue I need to know what it does.
What is the purpose of filtering the log file using this Perl one-liner?
cat log.txt | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g'
The log.txt file contains the output of a series of commands. I do not understand what is being filtered here, and why it might be useful.
It looks like the code should remove ANSI escape codes from the input, i.e codes to set colors, window title .... Since some of these code might cause harm it might be a security measure in case some kind of attack was able to include such escape codes into the log file. Since usually a log file does not contain any such escape codes this would also explain why you don't see any effect of this statement for normal log files.
For more information about this kind of attack see A Blast From the Past: Executing Code in Terminal Emulators via Escape Sequences.
BTW, while your question looks bad on the first view it is actually not. But you might try to improve questions by at least formatting it properly. Otherwise you risk that this questions gets down-voted fast.
First, the command line suffers from a useless use of cat. perl is fully capable of reading from a file name on the command line.
So,
$ perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)/ /g' log.txt
would have done the same thing, but avoided spawning an extra process.
Now, -e is followed by a script for perl to execute. In this case, we have a single global substitution.
\e in a Perl regex pattern corresponds to the escape character, x1b.
The pattern following \e looks like the author wants to match ANSI escape sequences.
The -p option essentially wraps the script specified with -e in while loop, so the s/// is executed for each line of the input.
The pattern probably does the job for this simple purpose, but one might benefit from using Regexp::Common::ANSIescape as in:
$ perl -MRegexp::Common::ANSIescape=ANSIescape,no_defaults -pe 's/$RE{ANSIescape}/ /g' log.txt
Of course, if one uses a script like this very often, one might want to either use an alias, or even write a very short script that does this, as in:
#!/usr/bin/env perl
use strict;
use Regexp::Common 'ANSIescape', 'no_defaults';
while (<>) {
s/$RE{ANSIescape}/ /g;
print;
}

sed in perl script

I am using following kind of script in my perl script and expecting entry at line 1. I am getting some error as below; any help?
plz ignore perl variable....
Error Messaage -
sed: -e expression #1, char 22: extra characters after command
# Empty file will not work for Sed line number 1
`"Security Concerns Report" > $outputFile`;
`sed '1i\
\
DATE :- $CDate \
Utility accounts with super access:- $LNumOfSupUserUtil \
Users not found in LDAP with super access: - $LNumOfSupUserNonLdap\
' $outputFile > $$`;
`mv $$ $outputFile`;
}
Your immediate problem is that the backslash character is interpreted by Perl inside the backtick operator, as is the dollar character. So your backslash-newline sequence turns into a newline in the shell command that is executed. If you replace these backslashes by \\, you'll go over this hurdle, but you'll still have a very brittle program.
Perl is calling a shell which calls sed. This requires an extra level of quoting for the shell which you are not performing. If your file names and data contain no special characters, you may be able to get away with this, until someone uses a date format containing a ' (among many things that would break your code).
Rather than fix this, it is a lot simpler to do everything in Perl. Everything sed and shells can do, Perl can do almost as easily or easier. It's not very clear from your question what you're trying to do. I'll focus on the sed call, but this may not be the best way to write your program.
If you really need to prepend some text to an existing file, there's a widely-used module on CPAN that already does this well. Use existing libraries in preference to reinventing the wheel. File::Slurp has a prepend_file method just for that. In the code below I use a here-document operator for the multiline string.
use File::Slurp; # at the top of the script with the other use directives
File::Slurp->prepend_file($outputFile, <<EOF);
DATE :- $CDate
Utility accounts with super access:- $LNumOfSupUserUtil
Users not found in LDAP with super access: - $LNumOfSupUserNonLdap
EOF

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//'
But that removed all the newlines and put everything one line. Is there someway I can tell Perl to not include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
#sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).

How do I count the number of rows in a large CSV file with Perl?

I have to use Perl on a Windows environment at work, and I need to be able to find out the number of rows that a large csv file contains (about 1.4Gb).
Any idea how to do this with minimum waste of resources?
Thanks
PS This must be done within the Perl script and we're not allowed to install any new modules onto the system.
Do you mean lines or rows? A cell may contain line breaks which would add lines to the file, but not rows. If you are guaranteed that no cells contain new lines, then just use the technique in the Perl FAQ. Otherwise, you will need a proper CSV parser like Text::xSV.
Yes, don't use perl.
Instead use the simple utility for counting lines; wc.exe
It's part of a suite of windows utilities ported from unix originals.
http://unxutils.sourceforge.net/
For example;
PS D:\> wc test.pl
12 26 271 test.pl
PS D:\>
Where 12 == number of lines, 26 == number of words, 271 == number of characters.
If you really have to use perl;
D:\>perl -lne "END{print $.;}" < test.pl
12
perl -lne "END { print $. }" myfile.csv
This only reads one line at a time, so it doesn't waste any memory unless each line is enormously long.
This one-liner handles new lines within the rows:
Considering lines with an odd number of quotes.
Considering that doubled quotes is a way of indicating quotes within the field.
It uses the awesome flip-flop operator.
perl -ne 'BEGIN{$re=qr/^[^"]*(?:"[^"]*"[^"]*)*?"[^"]*$/;}END{print"Count: $t\n";}$t++ unless /$re/../$re/'
Consider:
wc is not going to work. It's awesome for counting lines, but not CSV rows
You should install--or fight to install--Text::CSV or some similar standard package for proper handling.
This may get you there, nonetheless.
EDIT: It slipped my mind that this was windows:
perl -ne "BEGIN{$re=qr/^[^\"]*(?:\"[^\"]*\"[^\"]*)*?\"[^\"]*$/;}END{print qq/Count: $t\n/;};$t++ unless $pq and $pq = /$re/../$re/;"
The weird thing is that The Broken OS' shell interprets && as the OS conditional exec and I couldn't do anything to change its mind!! If I escaped it, it would just pass it that way to perl.
Upvote for edg's answer, another option is to install cygwin to get wc and a bunch of other handy utilities on Windows.
I was being idiotic, the simple way to do it in the script is:
open $extract, "<${extractFileName}" or die ("Cannot read row count of $extractFileName");
$rowCount=0;
while (<$extract>)
{
$rowCount=$rowCount+1;
}
close($extract);