I needed some Perl code to match balanced parens in a string.
So I found the regular expression below, written for .NET, and pasted it into my Perl program, thinking the regex engines were similar enough for it to work:
/
\s*\(
(?: [^\(\)] | (?<openp>\() | (?<-openp>\)) )+
(?(openp)(?!))
\)\s*
/x
My understanding of how this regex works is as follows:
Match first paren:
\(
Match pattern a, b, or c at least once:
(?: <a> | <b> | <c>)+
where a, b, and c are:
a is any character that is not a paren
[^\(\)]
b is character that is a left-paren
\(
c is character that is a right-paren
\)
and:
b is a capture group that pushes to named capture "openp"
(?<openp>\()
c is a capture group that pops from named capture "openp"
(?<-openp>\))
Reject any match where the "openp" stack is not empty:
(?(openp)(?!))
Match end paren:
\)
Here's the perl code:
sub eat_parens($) {
    my $line = shift;
    if ($line !~ /
        \s*\(
        (?: [^\(\)] | (?<openp>\() | (?<-openp>\)) )+
        (?(openp)(?!))
        \)\s*
        /x)
    {
        return $line;
    }
    return $';
}
sub testit2 {
    my $t1 = "(( (sdfasd)sdfsas (sdfasd) )sadf) ()";
    $t2 = eat_parens($t1);
    print "t1: $t1\n";
    print "t2: $t2\n";
}
testit2();
Error is:
$ perl x.pl
Sequence (?<-...) not recognized in regex; marked by <-- HERE in m/\s*\((?: [^\(\)] | (?<openp> \( ) | (?<- <-- HERE openp> \) ) )+ (?(openp)(?!) ) \) \s*/ at x.pl line 411.
Not sure what's causing this.... any ideas?
Here's one way to do it:
/
    (?&TEXT)
    (?(DEFINE)
        (?<TEXT>
            [^()]*+
            (?: \( (?&TEXT) \)
                [^()]*+
            )*+
        )
    )
/x
It can also be done without naming anything. Search for "recursive" in perlre.
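For comparison, the balancing-group idea from the question (push on "(", pop on ")", require an empty stack at the end) can be sketched outside the regex engine as an explicit depth counter. This is a minimal sketch in shell and awk, not the recursive-regex approach above. The function name eat_parens is borrowed from the question, and unlike the original regex it only looks for the group at the start of the line.

```shell
# Strip one leading balanced (...) group plus surrounding blanks.
# Prints the rest of the line, or the line unchanged if there is no
# balanced group at the start (assumption: the input is a single line).
eat_parens() {
    printf '%s\n' "$1" | awk '{
        line = $0
        sub(/^[ \t]*/, "", line)            # skip leading blanks
        if (substr(line, 1, 1) != "(") { print $0; next }
        depth = 0
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c == "(") {
                depth++                     # "push"
            } else if (c == ")") {
                depth--                     # "pop"
                if (depth == 0) {           # matching close paren found
                    rest = substr(line, i + 1)
                    sub(/^[ \t]*/, "", rest)
                    print rest
                    next
                }
            }
        }
        print $0                            # never balanced: leave as is
    }'
}
```

Called as eat_parens "(( (sdfasd)sdfsas (sdfasd) )sadf) ()", it prints only the trailing "()".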
This is my input text file:
< > & * ^ % $ # # ! ) ( ) < > < > > > <
This is the sed shell script that I am using.
sed 's/&/&amp;/g ; s/</&lt;/g ; s/>/&gt;/g' html_file.txt > new_file.txt
This is the output file:
<lt; >gt; &amp; * ^ % $ # # ! ) ( ) <lt; >gt; <lt; >gt; >gt; >gt; <lt;
I can't understand why the output still has < and > signs where the & should be.
From info sed:
3.3 The 's' Command
[...]
The 's' command (as in substitute) is probably the most important in
'sed' [...]. The syntax of the 's' command is 's/REGEXP/REPLACEMENT/FLAGS'.
[...]
The REPLACEMENT can contain [...] unescaped '&' characters which reference the
whole matched portion of the pattern space.
Escape & with \ to \&.
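A minimal sketch of the escaped form, assuming the intended replacements are the HTML entities &amp;, &lt; and &gt; (the function name escape_html is made up for illustration). The ampersand rule has to run first, otherwise it would re-escape the & introduced by the other two rules:

```shell
# Escape the three HTML-special characters in the first argument.
# Each & in the replacement is written \& so sed treats it as a
# literal ampersand instead of "the whole match".
escape_html() {
    printf '%s\n' "$1" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g'
}
```

For a whole file the same sed expression can be applied directly, for example to html_file.txt from the question.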
I have several AIX systems with a configuration file, let's call it /etc/bar/config. The file may or may not have a line declaring values for foo. An example would be:
foo = A_1,GROUP_1,USER_1,USER_2,USER_3
The foo line may or may not be the same on all systems. Different systems may have different values and a different number of values. My task is to add "bare minimum" values in the config file on all systems. The bare minimum line will look like this.
foo = A_1,USER_1,SYS_1,SYS_2
If the line does not exist, I must create it. If the line does exist, I must merge the two lines. Using my examples, the result would be this. The order of the values does not matter.
foo = A_1,GROUP_1,USER_1,USER_3,USER_2,SYS_1,SYS_2
Obviously I want a script to do my work. I have the standard sh, ksh, awk, sed, grep, perl, cut, etc. Since this is AIX, I do not have access to the GNU versions of these utilities.
Originally, I had a script with these commands to replace the entire foo line.
cp /etc/bar/config /etc/bar/config.$$
sed "s/foo = .*/foo = A_1,USER_1,SYS_1,SYS_2/" /etc/bar/config.$$ > /etc/bar/config
But this simply replaces the line. It does not take into consideration any pre-existing configuration, including a line that's missing. I'm also making other configuration changes in the script, such as adding completely unique lines to other files and restarting a process, so I'd prefer some type of shell-based code snippet I can add to my change script. I am open to other options, especially if the solution is simpler.
Some dirty bash/sed:
#!/usr/bin/bash
input_file="some_filename"
# grep -n prefixes the match with "N:"; appending "0:" yields line 0 when nothing matched
v=$(grep -n '^foo *=' "$input_file")
lineno=$(cut -d: -f1 <<< "${v}0:")
base="A_1,USER_1,SYS_1,SYS_2,"
if [[ "$lineno" == 0 ]]; then
    echo "foo = A_1,USER_1,SYS_1,SYS_2" >> "$input_file"
else
    # prepend the defaults, split on commas, de-duplicate, re-join
    all=$(sed -n ${lineno}'s/^foo *= */'"$base"'/p' "$input_file" | \
        tr ',' '\n' | sort | uniq | tr '\n' ',' | \
        sed -e 's/^/foo = /' -e 's/, *$//')
    # note: sed -i is a GNU extension
    sed -i "${lineno}"'s/.*/'"$all"'/' "$input_file"
fi
Untested bash, etc.
config=/etc/bar/config
default=A_1,USER_1,SYS_1,SYS_2
pattern='^foo[[:blank:]]*=[[:blank:]]*' # shared with grep and sed
if grep "$pattern" "$config" > /dev/null
then
    # test grep itself: the status of a grep | sed pipeline is sed's, which is always 0
    current=$( sed -n "s/$pattern//p" "$config" )
    new=$( echo "$current,$default" | tr ',' '\n' | sort | uniq | paste -sd, - )
    sed "s/$pattern.*/foo = $new/" "$config" > "$config.$$.tmp" &&
    mv "$config.$$.tmp" "$config"
else
    echo "foo = $default" >> "$config"
fi
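The merge step on its own can be tried without touching any file. The following is a small sketch using only tr, sed, sort and paste; the function name merge_values is hypothetical, and LC_ALL=C is an added assumption so that the sort order does not depend on the locale:

```shell
# Merge two comma-separated lists into one sorted list without duplicates.
# Either argument may be empty (e.g. when the foo line has no values yet).
merge_values() {
    printf '%s,%s\n' "$1" "$2" |
        tr ',' '\n' |          # one value per line
        sed '/^$/d' |          # drop empty entries left by empty arguments
        LC_ALL=C sort -u |     # sort and de-duplicate
        paste -sd, -           # join back with commas
}
```

With the example values from the question, merge_values "A_1,GROUP_1,USER_1,USER_2,USER_3" "A_1,USER_1,SYS_1,SYS_2" prints A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3.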
A vanilla perl solution:
perl -i -lpe '
    BEGIN {%foo = map {$_ => 1} qw/A_1 USER_1 SYS_1 SYS_2/}
    if (s/^foo\s*=\s*//) {
        $found=1;
        $foo{$_}=1 for split /,/;
        $_ = "foo = " . join(",", keys %foo);
    }
    # under -i, print in an END block goes to STDOUT, so append on the last line instead
    $_ .= "\nfoo = " . join(",", keys %foo) if !$found and eof;
' /etc/bar/config
This Perl code will do as you ask. It expects the path to the file to be modified as a parameter on the command line.
Note that it reads the entire input file into the array @config and then overwrites the same file with the modified data.
It works by building a hash %values from a combination of the items already present in the foo = line and the list of default items in @defaults. The combination is sorted in alphabetical order and joined with commas.
use strict;
use warnings;
my @defaults = qw/ A_1 USER_1 SYS_1 SYS_2 /;
my ($file) = @ARGV;
my @config = <>;
open my $out_fh, '>', $file or die $!;
select $out_fh;
for ( @config ) {
    if ( my ($pfx, $vals) = /^(foo \s* = \s* ) (.+) /x ) {
        my %values;
        ++$values{$_} for $vals =~ /[^,\s]+/g;
        ++$values{$_} for @defaults;
        print $pfx, join(',', sort keys %values), "\n";
    }
    else {
        print;
    }
}
close $out_fh;
output
foo = A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3
Since you didn't provide sample input and expected output I couldn't test this but this is the right approach:
awk '
/foo = / { old = ","$3; next }
{ print }
END {
split("A_1,USER_1,SYS_1,SYS_2"old,all,/,/)
for (i in all)
if (!seen[all[i]]++)
new = (new ? new "," : "") all[i]
print "foo =", new
}
' /etc/bar/config > tmp && mv tmp /etc/bar/config
I have a file in which one line contains nested parentheses; I want to display only the words inside the innermost parentheses.
Example:
(abc (defg) or hij(klmn)) and (opq(rstuv))
Expected Result:
defg
klmn
rstuv
I have tried with awk - awk -F "[(())]" '{ for (i=2; i<NF; i+=2) print $i}'
I have tried with sed - sed 's/.*(\([a-zA-Z0-9_]*\)).*/\1/'
Using perl global matching and lazy quantifiers:
#! /usr/bin/perl -n
use feature 'say';
while (/\((.*?\)[^(]*?)\)/g) {
$m=$1;
while ($m =~ /\((.*?)\)/g) {
say $1;
}
}
Output:
defg
klmn
rstuv
Maybe with grep?
$ echo "(abc (defg) or hij(klmn)) and (opq(rstuv))" | grep -o "([a-z]*)"
(defg)
(klmn)
(rstuv)
It catches the groups of ( + letters + ).
I tried to get rid of the parentheses but could not. This is my approach:
grep -Po '(?<=()[a-z]*(?=))'
but it indicates that "grep: lookbehind assertion is not fixed length", as I guess it cannot decide up to which ) to look for.
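The error most likely comes from the unescaped parentheses: in a Perl-compatible pattern a bare ( opens a group, so the lookbehind ends up containing [a-z]*, which is variable length. Escaped as (?<=\()[a-z]*(?=\)), the pattern should be accepted where grep is built with -P support. A sketch that avoids -P entirely strips the parentheses afterwards with tr; it still assumes a grep with the non-POSIX -o option, and innermost_words is a made-up name:

```shell
# Print the contents of every innermost (...) group, one per line.
# In a basic regular expression "(" and ")" are literal, so no
# escaping is needed; [^()]* guarantees the group has no nested parens.
innermost_words() {
    printf '%s\n' "$1" | grep -o '([^()]*)' | tr -d '()'
}
```

For the example line, innermost_words "(abc (defg) or hij(klmn)) and (opq(rstuv))" prints defg, klmn and rstuv on separate lines.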
This might work for you (GNU sed):
sed -r 's/\(([^()]*)\)/\n\1\n/;s/[^\n]*\n//;/[^()]/P;D' file
I would be happy if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts, for example to divide each line into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} captures every four characters
\1 refers to the captured group which is surrounded by the parenthesis ( ) and adds a space behind this group
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
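Note that this produces groups of four characters rather than four groups. A portable variant of the same substitution, written as a sketch with a basic regular expression so that -r is not required (the function name chunk_fixed and its width argument are assumptions for illustration):

```shell
# Insert a space after every run of $1 characters on each input line.
# \{N\} is the POSIX basic-regex interval; & is the whole match.
chunk_fixed() {
    sed "s/.\{$1\}/& /g"
}
```

Used as echo "ATGCATHLMNPHLNTPLML" | chunk_fixed 4. When the line length is an exact multiple of the width, the result ends with a trailing space.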
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a posix compliant awk you can omit the --posix, but --posix is necessary for gnu awk and since that seems to be the most commonly used implementation I've given the solution in terms of gawk.
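The same ceiling-division idea can be sketched without interval expressions, using substr in any POSIX awk. The function name split_parts is hypothetical and the part count is passed as its first argument:

```shell
# Split each input line into chunks of width ceil(length / n),
# separated by single spaces. n is the first argument.
split_parts() {
    awk -v n="$1" '{
        w = int((length($0) + n - 1) / n)   # chunk width, rounded up
        out = ""; sep = ""
        for (i = 1; i <= length($0); i += w) {
            out = out sep substr($0, i, w)
            sep = " "
        }
        print out
    }'
}
```

For the 19-character example, echo "ATGCATHLMNPHLNTPLML" | split_parts 4 gives ATGCA THLMN PHLNT PLML. Because the width is rounded up, some lengths yield fewer than n chunks (a 9-character line with n=4 comes out as three chunks of 3).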
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a introduce a loop label
/^\n/bb if we reach a newline we are done and branch to the b label
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for any length of line; however, if the length is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile