translate pcregrep into Perl one-liner - perl

I need to find all active network interfaces on new macOS. That means the following one-liner with pcregrep will not work:
ifconfig | pcregrep -M -o '^[^\t:]+(?=:([^\n]|\n\t)*status: active)'
because pcregrep is no default install on macOS.
I tried to translate it into egrep to no avail, because a positive lookahead is not possible, right?
So I tried with a one-liner in perl. But the following command is not working, because the switch -pe is not gobbling up all lines together. I tried with -p0e too.
ifconfig | perl -pe 'while (<>) {if (/^[^\t:]+(?=:([^\n]|\n\t)*status: active)/){print "$1";};}'
If I search with a positive lookahead the same line, it is working; for example:
ifconfig | perl -pe 'while (<>) {if (/^([^\t:]+)(?=:([^\n]|\n\t)*mtu 1380)/){print "$1";};}'
utun0
A typical output of ifconfig:
en10: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=6467<RXCSUM,TXCSUM,VLAN_MTU,TSO4,TSO6,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
ether 00:e0:4c:68:01:20
inet6 fe80::1470:31b9:a01c:6f5e%en10 prefixlen 64 secured scopeid 0xd
inet 192.168.178.39 netmask 0xffffff00 broadcast 192.168.178.255
inet6 2003:ee:4f1a:ce00:864:f90c:9a11:6ad9 prefixlen 64 autoconf secured
inet6 2003:ee:4f1a:ce00:d89a:7e34:6dd4:1370 prefixlen 64 autoconf temporary
nd6 options=201<PERFORMNUD,DAD>
media: autoselect (1000baseT <full-duplex>)
status: active
The expected result would be:
en10
I am on macOS Monterey, zsh and perl 5.34
Thank you for your help
marek

Since output of ifconfig normally† has a multiline block of text for each interface, all separated by blank lines, it is convenient to read it in paragraphs (-00). Then the rest simplifies a lot
ifconfig -a | perl -00 -nE'say $1 if /^(.+?)\s*:.*?status:\s+active/s'
We still need the /s modifier, making . match a newline as well, as each paragraph itself is a multiline string and the pattern needs to match across multiple lines.
† Except that it doesn't on the MacOS used for this question -- there are no blank lines separating blocks for interfaces. Then there is no point in seeking paragraphs (breaking on newline which isn't) and this answer doesn't work for that system.
Here is then a classic line-by-line approach that does -- set the interface name at the first line for that interface's output (no spaces at line beginning), then test for active status
perl -wnE'$ifn=$1, next if /^(\S[^:]+?)\s*:/; say $ifn if /status:\s+active/' file
This allows spaces inside an interface name, what is very unlikely (and perhaps not even allowed). For a more restrictive pattern, which doesn't allow spaces in the name, use /^(\S+?)\s*:/ (or the more efficient /^([^:\s]+)/). The \s* and the preceding ? are there only to make it not capture the trailing spaces (right before :), if any were possible.
This works in the case when there are empty lines between interface blocks as well.

You can use
perl -0777 -nE 'say "$&" while /^[^\n\r\t:]+(?=:(?:.*\R\t)*status:\h+active)/gm'
See the regex test.
Here, -0777 slurps the file so that a regex could match multiline text spans (the M equivalent that exposes the newlines to the pattern in pcregrep), say "$&" prints all matched substrings (the o equivalent, also see g flag).
I edited the [^\t:]+ to match any one or more chars other than tabs, colons and also CR/LF chars. Also, I replaced ([^\n]|\n\t)* into a more efficient (?:.*\R\t)* that matches zero or more occurrences of any zero or more chars other than line break chars till the end of a line (.*), then a line break sequence (\R), and then a tab char (\t).
Also, note the m flag to make ^ anchor also match any line start position.

perl's -n and -p command-line switches add an implicit while (<>) {...} block around the -e code, and in addition -p prints the line at the end of each iteration. So you need to change the -p to -n and only print out the lines which match; and remove the extra and unneeded while loop. So something like
ifconfig | perl -ne 'print if /...../'

Related

How to make regex works with perl command and extract numbers from a file?

I'm trying to extract from a tab delimited file a number that i need to store in a variable. I'm approaching the problem with a regex that thanks to some research online I have been able to built.
The file is composed as follow:
0 0 2500 5000
1 5000 7500 10000
2 10000 12500 15000
3 15000 17500 20000
4 20000 22500 25000
5 25000 27500 30000
I need to extract the number in the second column given a number of the first one. I wrote and tested online the regex:
(?<=5\t).*?(?=\t)
I need the 25000 from the sixth line.
I started working with sed but as you already know, it doesn't like lookbehind and lookahead pattern even with the -E option to enable extended version of regular expressions. I tried also with awk and grep and failed for similar reasons.
Going further I found that perl could be the right command but I'm not able to make it work properly. I'm trying with the command
perl -pe '/(?<=5\t).*?(?=\t)/' | INFO.out
but I admit my poor knowledge and I'm a bit lost.
The next step would be to read the "5" in the regex from a variable so if you already know problems that could rise, please let me know.
No need for lookbehinds -- split each line on space and check whether the first field is 5.
In Perl there is a command-line option convenient for this, -a, with which each line gets split for us and we get #F array with fields
perl -lanE'say $F[1] if $F[0] == 5' data.txt
Note that this tests for 5 numerically (==)
grep supports -P for perl regex, and -o for only-matching, so this works with a lookbehind:
grep -Po '(?<=5\t)\d+' file
That can use a shell variable pretty easily:
VAR=5 && grep -Po "(?<=$VAR\t)\d+"
Or perl -n, to show using s///e to match and print capture group:
perl -lne 's/^5\t(\d+)/print $1/e' file
Why do you need to use a regex? If all you are doing is finding lines starting with a 5 and getting the second column you could use sed and cut, e.g.:
<infile sed -n '/^5\t/p' | cut -f2
Output:
25000
One option is to use sed, match 5 at the start of the string and after the tab capture the digits in a group
sed -En 's/^5\t([[:digit:]]+)\t.*/\1/p' file > INFO.out
The file INFO.out contains:
25000
Using sed
$ var1=$(sed -n 's/^5[^0-9]*\([^ ]*\).*/\1/p' input_file)
$ echo "$var1"
25000

sed or awk to change a specific number in a file on RHEL7

I need help figuring out the syntax or what command to use to find an replace a specific number in a file.
I need to replace the number 10 with 25 in a configuration file. I have tried the following:
sed 's/10/25/g' /etc/security/limits.conf
This changes other instances that contain 10 such as 1000 and 10000 to 2500 and 25000, I need to juct change the need to just change 10 to 25. Please help.
Thank you,
Joseph
The trick here is to limit the sed substitution to the line you want to change. For limits.conf you are best off matching the domain, type and item. So if you wanted to just change a limit for domain #student, type hard, item nproc, you'd use something like
sed '/#student.*hard.*nproc/s/10/25/g' /etc/security/limits.conf
sed -ri '/^#/!s/(^.*)([[:space:]]10$)/\1 25/' /etc/security/limits.conf
With regular expression interpretation enabled (-r or -E), process all lines that don't start with a # by using ! We then split the lines into two sections, and replace the line for the first section followed by a space and 25. The $ ensure that the entry to replace is anchored at the end of the line.
Awk is another option:
awk -i 'NF==4 && $4==10 { gsub("10","25",$4) }1' /etc/security/limits.conf
Check if the line has 4 space delimited fields (NF==4) and the 4th field ($4) is 10. If this condition is met, replace 10 with 25 using gsub and print all lines with 1
The -i is an inplace amend flag on more recent versions of awk. If a compliant version is not available, use:
awk 'NF==4 && $4==10 { gsub("10","25",$4) }1' /etc/security/limits.conf > /etc/security/limits.tmp && mv -f /etc/security/limits.tmp /etc/security/limits.conf
Use this Perl one-liner, where \b stands for word break (so that 10 will not match 210 or 102):
perl -pe 's/\b10\b/25/g' in_file > out_file
Or to change the file in-place:
perl -i.bak -pe 's/\b10\b/25/g' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
The regex uses modifier /g : Match the pattern repeatedly.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

I want to replace last / by ,

I want to replace this:
a/b/c|d,385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400/0.162,214|229|254|255|270|272|276|287|346|356|361|362|365|366|367|369/0.18,improve/11.11,
With:
a/b/c|d,385|386|387|388|389|390|391|392|393|394|395|396|397|398|399|400/0.162,214|229|254|255|270|272|276|287|346|356|361|362|365|366|367|369/0.18,improve,11.11,
With this sed command:
sed -i 's/\(.*\)\//\1,/'
This works in Unix. I tried to use this with system in Perl code, but it doesnt work. I request a solution using sed in Perl for the same.
First of all, the code you claim works doesn't.
$ printf 'a/b/c\n' | sed 's/(.*)//\1,/'
sed: -e expression #1, char 9: unknown option to `s'
It should be
$ printf 'a/b/c\n' | sed 's/\(.*\)\//\1,/'
a/b,c
You're asking how to execute this command from Perl. You can use the following:
system('sed', '-i', '/\\(.*\\)\\//\\1,/', '--', $qfn)
Note that you can quite easily do the same task in Perl itself.
local #ARGV = $qfn;
local $^I = '';
while (<>) {
s{^.*\K/}{,};
print;
}
Here is way to do this in sed:
echo "365|366|367|369/0.18,improve/11.11," | sed 's/^\(.*\)\/\(.*\)$/\1,\2/'
365|366|367|369/0.18,improve,11.11,
The regex pattern used is:
^\(.*\)\/\(.*\)$
This says to match and capture everything up until the last forward slash. Then, also match and capture everything after the last forward slash. Finally replace with the first two capture groups, but now separated by a comma.
Notes:
forward slash / needs to be escaped by a backslash, to distinguish it from being the pattern delimiter
parentheses in the capture groups also need to be escaped with backslash

Multiple lines from one with sed

Is there some way in sed to create multiple output lines from a single input line? I have a template file (there are more lines in the file, I'm just simplifying it):
http://hostname:#PORT#
I am currently using sed to replace #PORT# with a real port. However, I'd like to be able to pass in multiple ports, and have sed create a line for each. Is that possible?
I'm assuming you would want to duplicate the whole line for each port number. In that case it's easier to think of it as replacing the port numbers with the URL:
$ cat ports.in
1
2
3
4
5
$ sed 's#^\([0-9]*\$)#http://hostname:\1#' ports.in
http://hostname:1
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
To do it the other way around is easier with awk:
$ cat url.in
http://hostname:#PORT#
$ awk '/^[0-9]/ {ports[++i]=$0} /^http/ {sub(":#PORT#", ":%d\n"); for (p in ports) printf($0, ports[p])}' ports.in url.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
This reads both ports.in and url.in, and if a line starts with a number it is assumed that it's a port number from ports.in. Otherwise, if the line starts with http it's assumed to be an URL from url.in and will replace the port placeholder with a printf formatting string and then print the URL once for each port number read. It will fail to do the right thing if the files are fed in the wrong order.
A similar solution, but taking the URL from a shell variable:
$ myurl="http://hostname:#PORT#"
$ awk -v url="$myurl" 'BEGIN{sub(":#PORT#", ":%d\n",url)} /^[0-9]/ {ports[++i]=$0} END {for (p in ports) printf(url, ports[p])}' ports.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
It seems you have multiple templates and multiple ports to apply to them. Here's how to do it in a shell script (tested with bash), but you'll need to do it in two sed executions if you want to keep it simple because you have two multiply valued inputs. It is mathematically a cross product of the templates and the substitution values.
ports='80
8080
8081'
templates='http://domain1.net:%PORT/
http://domain2.org:%PORT/
http://domain3.com:%PORT/'
meta="s/(.*)/g; s|%PORT|\1|p; /p"
sed="`echo \"$ports\" |sed -rn \"$meta\" |tr '\n' ' '`"
echo "$templates" |sed -rn "h; $sed"
The shell variable meta is a meta sed script because it writes another sed script. The h saves the pattern buffer in the sed hold space. The sed commands generated from the meta sed recall, substitute, and print for each port. This is the result.
http://domain1.net:80/
http://domain1.net:8080/
http://domain1.net:8081/
http://domain2.org:80/
http://domain2.org:8080/
http://domain2.org:8081/
http://domain3.com:80/
http://domain3.com:8080/
http://domain3.com:8081/

sed to remove URLs from a file

I am trying to write a sed expression that can remove urls from a file
example
http://samgovephotography.blogspot.com/ updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB Contributor #kdpartak :)
But I dont get it:
sed 's/[\w \W \s]*http[s]*:\/\/\([\w \W]\)\+[\w \W \s]*/ /g' posFile
FIXED!!!!!
handles almost all cases, even malformed URLs
sed 's/[\w \W \s]*http[s]*[a-zA-Z0-9 : \. \/ ; % " \W]*/ /g' positiveTweets | grep "http" | more
The following removes http:// or https:// and everything up until the next space:
sed -e 's!http\(s\)\{0,1\}://[^[:space:]]*!!g' posFile
updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N Thx to HMB Contributor #kdpartak :)
Edit:
I should have used:
sed -e 's!http[s]\?://\S*!!g' posFile
"[s]\?" is a far more readable way of writing "an optional s" compared to "\(s\)\{0,1\}"
"\S*" a more readable version of "any non-space characters" than "[^[:space:]]*"
I must have been using the sed that came installed with my Mac at the time I wrote this answer (brew install gnu-sed FTW).
There are better URL regular expressions out there (those that take into account schemes other than HTTP(S), for instance), but this will work for you, given the examples you give. Why complicate things?
The accepted answer provides the approach that I used to remove URLs, etc. from my files. However it left "blank" lines. Here is a solution.
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' input_file
perl -i -pe 's/^'`echo "\012"`'${2,}//g' input_file
The GNU sed flags, expressions used are:
-i Edit in-place
-e [-e script] --expression=script : basically, add the commands in script
(expression) to the set of commands to be run while processing the input
^ Match start of line
$ Match end of line
? Match one or more of preceding regular expression
{2,} Match 2 or more of preceding regular expression
\S* Any non-space character; alternative to: [^[:space:]]*
However,
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g'
leaves nonprinting character(s), presumably \n (newlines). Standard sed-based approaches to remove "blank" lines, tabs and spaces, e.g.
sed -i 's/^[ \t]*//; s/[ \t]*$//'
do not work, here: if you do not use a "branch label" to process newlines, you cannot replace them using sed (which reads input one line at a time).
The solution is to use the following perl expression:
perl -i -pe 's/^'`echo "\012"`'${2,}//g'
which uses a shell substitution,
'`echo "\012"`'
to replace an octal value
\012
(i.e., a newline, \n), that occurs 2 or more times,
{2,}
(otherwise we would unwrap all lines), with something else; here:
//
i.e., nothing.
[The second reference below provides a wonderful table of these values!]
The perl flags used are:
-p Places a printing loop around your command,
so that it acts on each line of standard input
-i Edit in-place
-e Allows you to provide the program as an argument,
rather than in a file
References:
perl flags: Perl flags -pe, -pi, -p, -w, -d, -i, -t?
ASCII control codes: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
remove URLs: sed to remove URLs from a file
branch labels: How can I replace a newline (\n) using sed?
GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html
quick regex guide: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Example:
$ cat url_test_input.txt
Some text ...
https://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file
https://www.google.ca/search?dcr=0&ei=QCsyWtbYF43YjwPpzKyQAQ&q=python+remove++citations&oq=python+remove++citations&gs_l=psy-ab.3...1806.1806.0.2004.1.1.0.0.0.0.61.61.1.1.0....0...1c.1.64.psy-ab..0.0.0....0.-cxpNc6youY
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html
http://datasynce.org/2017/05/sentiment-analysis-on-python-through-textblob/
https://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
http://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/20100804.alignment.index
Some more text.
$ sed -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' url_test_input.txt > a
$ cat a
Some text ...
Some more text.
$ perl -i -pe 's/^'`echo "\012"`'${2,}//g' a
Some text ...
Some more text.
$