I would like to get the line that begins with "Xboy" and the lines following it that begin with "+". How can I do this using sed?
The input looks like below:
Xapple
+apple1
+apple2
.ends
Xboy
+boy1
+boy2
V2
Xcat
+cat1
+cat2
Xcat
The output should look like below:
Xboy
+boy1
+boy2
This will do the job in sed, but really this problem is more complicated than sed is intended for. You'd be better off using perl or python.
$ cat foo.txt
Xapple
+apple1
+apple2
.ends
Xboy
+boy1
+boy2
V2
Xcat
+cat1
+cat2
Xcat
$ sed ':section;/Xboy/!d;:plusline;n;/^+/b plusline;b section' foo.txt
Xboy
+boy1
+boy2
In a proper programming language, the nested loop structure of the data becomes clearer, and we can be more confident there are no edge cases we've forgotten about.
In Perl:
my $line = <>;
while (defined($line)) {
    chomp($line);
    if ($line eq "Xboy") {
        print $line, "\n";
        $line = <>;
        while (defined($line) && $line =~ /^\+/) {
            print $line;
            $line = <>;
        }
    }
    else {
        $line = <>;
    }
}
In Python:
import fileinput

lines = fileinput.input()
line = lines.readline()
while line != '':
    line = line.rstrip('\n')
    if line == 'Xboy':
        print(line)
        line = lines.readline()
        while line != '' and line.startswith('+'):
            print(line, end='')
            line = lines.readline()
    else:
        line = lines.readline()
An awk version
awk '/Xboy/ {f=1;print;next} {/^+/?a=1:f=0} a&&f' file
Xboy
+boy1
+boy2
This might work for you (GNU sed):
sed -n ':a;/Xboy/{:b;p;n;/^+/bb;ba}' file
If a line contains Xboy, print it and any following lines that begin with +; otherwise print nothing.
I guessed that this is what you intended. However, you may have meant that other lines beginning with non-word characters should also be skipped; in that case use:
sed -n ':a;/Xboy/{:b;p;:c;n;/^+/bb;/^\W/bc;ba}' file
or perhaps you meant this:
sed -n ':a;/Xboy/{:b;p;:c;n;/^+/bb;/^[^[:upper:]]/bc;ba}' file
It may be that you only want to print Xboy if it is followed by a line that begins with +; in that case use:
sed -n ':a;/Xboy/{$d;h;:b;n;/^+/{H;$!bb};x;/\n/p;x;ba}' file
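For comparison, that last variant (print the Xboy line only when at least one + line follows it) can be sketched in Python. This is a minimal sketch rather than a translation of the sed; the names `xboy_blocks` and `header` are made up for illustration, and it assumes lines have already had their newlines stripped.

```python
def xboy_blocks(lines, header="Xboy"):
    """Yield each header line plus its '+' continuation lines.

    A header with no '+' line after it is skipped, mirroring the last
    sed variant. `xboy_blocks` and `header` are illustrative names.
    """
    block = []                      # lines of the block being collected
    for line in lines:
        if block and line.startswith("+"):
            block.append(line)      # continuation of the current block
            continue
        if len(block) > 1:          # header followed by at least one '+'
            yield from block
        # Start a new block only if this line contains the header text.
        block = [line] if header in line else []
    if len(block) > 1:              # flush a block that ends the input
        yield from block


sample = ["Xapple", "+apple1", "+apple2", ".ends", "Xboy", "+boy1", "+boy2",
          "V2", "Xcat", "+cat1", "+cat2", "Xcat"]
print("\n".join(xboy_blocks(sample)))
```

With the sample input from the question this prints the Xboy line and its two + lines; a lone Xboy followed by a non-+ line produces nothing.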
Related
I have several AIX systems with a configuration file, let's call it /etc/bar/config. The file may or may not have a line declaring values for foo. An example would be:
foo = A_1,GROUP_1,USER_1,USER_2,USER_3
The foo line may or may not be the same on all systems. Different systems may have different values and a different number of values. My task is to add "bare minimum" values in the config file on all systems. The bare minimum line will look like this.
foo = A_1,USER_1,SYS_1,SYS_2
If the line does not exist, I must create it. If the line does exist, I must merge the two lines. Using my examples, the result would be this. The order of the values does not matter.
foo = A_1,GROUP_1,USER_1,USER_3,USER_2,SYS_1,SYS_2
Obviously I want a script to do my work. I have the standard sh, ksh, awk, sed, grep, perl, cut, etc. Since this is AIX, I do not have access to the GNU versions of these utilities.
Originally, I had a script with these commands to replace the entire foo line.
cp /etc/bar/config /etc/bar/config.$$
sed "s/foo = .*/foo = A_1,USER_1,SYS_1,SYS_2/" /etc/bar/config.$$ > /etc/bar/config
But this simply replaces the line. It does not take into consideration any pre-existing configuration, including a line that's missing. And I'm doing other configuration modifications in the script, such as adding completely unique lines to other files and restarting a process, so I'd prefer this be some type of shell-based code snippet I can add to my change script. I am open to other options, especially if the solution is simpler.
Some dirty bash/sed:
#!/usr/bin/bash
input_file="some_filename"
v=$(grep -n '^foo *=' "$input_file")
lineno=$(cut -d: -f1 <<< "${v}0:")   # yields 0 when grep found nothing
base="A_1,USER_1,SYS_1,SYS_2,"
if [[ "$lineno" == 0 ]]; then
    echo "foo = A_1,USER_1,SYS_1,SYS_2" >> "$input_file"
else
    # Prepend the defaults to the existing values, de-duplicate, rebuild the line.
    all=$(sed -n ${lineno}'s/^foo *= */'"$base"'/p' "$input_file" |
        tr ',' '\n' | sort | uniq | tr '\n' ',' |
        sed -e 's/^/foo = /' -e 's/, *$//' -e 's/  */ /g')
    sed -i "${lineno}"'s/.*/'"$all"'/' "$input_file"
fi
Untested bash, etc.
config=/etc/bar/config
default=A_1,USER_1,SYS_1,SYS_2
pattern='^foo[[:blank:]]*=[[:blank:]]*'   # shared with grep and sed
# Test grep on its own: a pipeline's exit status is that of its last
# command (sed), which would succeed even when no foo line exists.
if grep "$pattern" "$config" > /dev/null
then
    current=$( sed -n "s/$pattern//p" "$config" )
    new=$( echo "$current,$default" | tr ',' '\n' | sort | uniq | paste -sd, - )
    sed "s/$pattern.*/foo = $new/" "$config" > "$config.$$.tmp" &&
        mv "$config.$$.tmp" "$config"
else
    echo "foo = $default" >> "$config"
fi
A vanilla perl solution:
perl -i -lpe '
    BEGIN {%foo = map {$_ => 1} qw/A_1 USER_1 SYS_1 SYS_2/}
    if (s/^foo\s*=\s*//) {
        $found=1;
        $foo{$_}=1 for split /,/;
        $_ = "foo = " . join(",", keys %foo);
    }
    END {print "foo = " . join(",", keys %foo) unless $found}
' /etc/bar/config
This Perl code will do as you ask. It expects the path to the file to be modified as a parameter on the command line.
Note that it reads the entire input file into the array @config and then overwrites the same file with the modified data.
It works by building a hash %values from a combination of the items already present in the foo = line and the list of default items in @defaults. The combination is sorted in alphabetical order and joined with commas.
use strict;
use warnings;

my @defaults = qw/ A_1 USER_1 SYS_1 SYS_2 /;
my ($file) = @ARGV;
my @config = <>;

open my $out_fh, '>', $file or die $!;
select $out_fh;

for ( @config ) {
    if ( my ($pfx, $vals) = /^(foo \s* = \s* ) (.+) /x ) {
        my %values;
        ++$values{$_} for $vals =~ /[^,\s]+/g;
        ++$values{$_} for @defaults;
        print $pfx, join(',', sort keys %values), "\n";
    }
    else {
        print;
    }
}
close $out_fh;
output
foo = A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3
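The same merge can be sketched in Python, this time also covering the case where the foo line is missing. This is only an illustration under stated assumptions: `merge_foo`, `DEFAULTS` and `FOO_LINE` are hypothetical names, lines are expected without trailing newlines, and reading and rewriting /etc/bar/config is left to the caller.

```python
import re

DEFAULTS = ["A_1", "USER_1", "SYS_1", "SYS_2"]   # the "bare minimum" values
FOO_LINE = re.compile(r"^foo\s*=\s*(.*)$")


def merge_foo(lines, defaults=DEFAULTS):
    """Return config lines with the defaults merged into the foo line.

    If no foo line exists, one is appended. Values are sorted, as in the
    Perl script. `merge_foo` is a hypothetical helper name.
    """
    out, found = [], False
    for line in lines:
        match = FOO_LINE.match(line)
        if match:
            found = True
            # Split the existing values, dropping blanks and empty items.
            existing = [v.strip() for v in match.group(1).split(",") if v.strip()]
            line = "foo = " + ",".join(sorted(set(existing) | set(defaults)))
        out.append(line)
    if not found:
        out.append("foo = " + ",".join(sorted(defaults)))
    return out


print(merge_foo(["foo = A_1,GROUP_1,USER_1,USER_2,USER_3"])[0])
```

Using a set for the union removes duplicates regardless of the order in which the existing values appear, which matches the requirement that order does not matter.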
Since you didn't provide sample input and expected output I couldn't test this but this is the right approach:
awk '
    /foo = / { old = ","$3; next }
    { print }
    END {
        split("A_1,USER_1,SYS_1,SYS_2"old,all,/,/)
        for (i in all)
            if (!seen[all[i]]++)
                new = (new ? new "," : "") all[i]
        print "foo =", new
    }
' /etc/bar/config > tmp && mv tmp /etc/bar/config
I need to be able to replace character positions 58-71 with whitespace on every line in a file, on Unix / Solaris.
Extract example:
LOCAX0791LOCPIKAX0791LOC AX0791LOC095200130008PIKAX079100000000000000WL1G011 000092000000000000
LOCAX0811LOCPIKAX0811LOC AX0811LOC094700450006PIKAX0811000000000000006C1G011 000294000000000000
LOCAX0831LOCPIKAX0831LOC AX0831LOC094000180006PIKAX083100000000000000OJ1G011 000171000000000000
Or:
sed -r 's/^(.{57})(.{14})/\1              /' bar.txt
With apologies for the horrible 14 space string.
Simple Perl oneliner
perl -pe 'substr($_, 57, 14) = " " x 14' inputfile.txt > outputfile.txt
try this:
awk 'BEGIN{FS=OFS=""} {for(i=57;i<=71;i++)$i=" "}1' file
output for your first line:
LOCAX0791LOCPIKAX0791LOC AX0791LOC095200130008PIKAX079 WL1G011
Try this in Perl:
use strict;
use warnings;

while (<STDIN>) {
    my @input = split(//, $_);
    # Positions 58-71 (1-based) are indexes 57..70.
    for (my $i = 57; $i < 71; $i++) {
        $input[$i] = " ";
    }
    $_ = join('', @input);
    print $_;
}
If you have gawk on your Solaris box, you could try:
gawk 'BEGIN{FIELDWIDTHS = "57 14 1000"} gsub(/./," ",$2)' OFS= file
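The answers above differ in how they count positions, so here is a small Python sketch that makes the indexing explicit. It assumes the positions in the question are 1-based and inclusive (so 58-71 is 14 characters); `blank_columns` is an illustrative name, not something from the question.

```python
def blank_columns(line, first=58, last=71):
    """Replace 1-based character positions first..last with spaces.

    Lines shorter than `last` are only blanked as far as they extend,
    and a trailing newline is preserved.
    """
    body = line.rstrip("\n")
    ending = line[len(body):]       # "" or the original newline
    start = first - 1               # convert to a 0-based slice start
    stop = min(last, len(body))     # slice end is exclusive
    if start >= stop:
        return line                 # nothing in range to blank
    return body[:start] + " " * (stop - start) + body[stop:] + ending


# Small demonstration on a short string: blank positions 3 to 5.
print(repr(blank_columns("0123456789", first=3, last=5)))
```

To process a file, the function could be applied to each line read from standard input and the result written to standard output.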
I'm writing a Perl script that reads a log and rewrites it to a new log, collapsing any run of 4 or more consecutive blank lines into a single blank line. Runs of 1, 2 or 3 blank lines must be left as they are. I have searched online, but the only solutions I could find are
perl -00 -pe ''
or
perl -00pe0
I have also seen a vim example that deletes blocks of 4 empty lines, :%s/^\n\{4}//, which matches what I'm looking for, but it is for vim, not Perl. Can anyone help with this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;

my $cnt = 0;

sub flush_ws {
    $cnt = 1 if ($cnt >= 4);
    while ($cnt > 0) { print "\n"; $cnt--; }
}

while (<>) {
    if (/^$/) {
        $cnt++;
    } else {
        flush_ws();
        print $_;
    }
}
flush_ws();
Your -0 hint is a good one, since you can use -0777 to slurp the whole file in -p mode. Read more about these switches in perlrun. This one-liner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four new lines in a row, nothing happens. Five newlines or more (four empty lines or more) are replaced by two newlines (one empty line). Note the /g switch here to replace not only the first match.
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    s/\n{5,}/\n\n/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
HTH! :)
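The newline arithmetic in this answer (N empty lines show up as N + 1 consecutive newlines) can be sketched in Python as well. This is a rough equivalent of the one-liner, not part of the original answer; `collapse_blank_runs` and `threshold` are illustrative names.

```python
import re


def collapse_blank_runs(text, threshold=4):
    """Collapse runs of `threshold` or more empty lines into one empty line.

    N empty lines appear as N + 1 consecutive newlines, so the pattern
    looks for at least threshold + 1 of them.
    """
    pattern = r"\n{%d,}" % (threshold + 1)
    return re.sub(pattern, "\n\n", text)


# Four empty lines between "a" and "b" collapse to one; three are kept.
print(repr(collapse_blank_runs("a\n\n\n\n\nb\n")))
print(repr(collapse_blank_runs("a\n\n\n\nb\n")))
```

Like the one-liner, this counts newlines rather than lines, so a run of empty lines at the very start of the file has no preceding newline and is counted one short.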
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that your definition of empty excludes whitespace.
This will do what you need
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile
I am not very proficient in perl, awk, or sed and I have been searching the web for a solution to my problem for some while now, but wasn't very successful.
I would like to replace
<math> ... </math>
with
<math>\begin{align} ... \end{align}</math>
if ... contains \\. My problem is that the string between the <math> tags can span multiple lines. I managed to replace the tags within one line with sed but couldn't get it to run for multiple lines.
Any simple solution with perl, awk, or sed is very welcome. Thanks a lot.
Use separate expressions for each tag and the script will be immune to multilinedness:
sed -e 's,<math>,&\\begin{align},g' -e 's,</math>,\\end{align}&,g'
Edit:
Multiline awk version:
awk '/<math>/,/<\/math>/ {
    if (index($0, "<math>")) {
        a=$0
    } else {
        b = b $0
    }
    if (index($0, "</math>")) {
        if (index(b,"\\\\")) {
            sub("<math>","&\\begin{align}", a)
            sub("</math>","\\end{align}&", b)
        };
        print a,b
        a=""
        b=""
    }
}'
Try the following Perl command. How does it work? It reads the file content in slurp mode, saving it in the $f variable, and then uses a regexp in single-line mode (where . also matches newlines) to add \begin{align} and \end{align} if \\ is found between the math tags.
perl -e '
    do {
        $/ = undef;
        $f = <>
    };
    $f =~ s#(<math>)(.*\\\\.*)(</math>)#$1\\begin{align}$2\\end{align}$3#s;
    printf qq|%s|, $f
' infile
This might work for you (GNU sed):
sed ':a;$!{N;ba}
/[\x00\x01\x02]/q1
s/<math>/\x00/g
s/<\/math>/\x01/g
s/\\\\/\x02/g
s/\x00\([^\x01\x02]*\)\x01/<math>\1<\/math>/g
s/\x00/<math>\\begin{align}/g
s/\x01/\\end{align}<\/math>/g
s/\x02/\\\\/g' file
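For readers who find the placeholder trick hard to follow, the same conditional replacement can be sketched in Python with a replacement function. This is an illustration only, assuming the whole file fits in memory and that math elements are not nested; `add_align` and `MATH` are made-up names.

```python
import re

# Non-greedy body so that each <math>...</math> element is matched on its own;
# DOTALL lets the body span several lines.
MATH = re.compile(r"<math>(.*?)</math>", re.DOTALL)


def add_align(text):
    """Wrap the body of each <math> element in an align environment,
    but only when that body contains a LaTeX line break (two backslashes)."""
    def wrap(match):
        body = match.group(1)
        if "\\\\" not in body:
            return match.group(0)          # leave this element untouched
        return "<math>\\begin{align}" + body + "\\end{align}</math>"
    return MATH.sub(wrap, text)


print(add_align("<math>x</math> <math>a \\\\\nb</math>"))
```

Passing a function to `sub` avoids backslash processing in the replacement string, and the non-greedy `.*?` keeps separate elements apart, which a greedy `.*` would not do when a file contains more than one.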
input file:
$ cat t.txt
id1;value1_1
id1;value1_2
id2;value2_1
id3;value3_1
id4;value4_1
id4;value4_2
id5;value5_1
result would be:
id1;value1_1;id1;value1_2
id3;value3_1
id4;value4_1;id4;value4_2
id5;value5_1
using sed or awk. Please give your opinion.
Here's one way to do it:
awk -F';' 'BEGIN { getline; id=$1; line=$0 } { if ($1 != id) { print line; line = $0; } else { line = line ";" $0; } id=$1; } END { print line; }' t.txt
Explanation:
Set field separator to ;:
-F';'
Start by reading the first line of input (getline), save the first field ($1) as id, and the first line ($0) as line:
BEGIN { getline; id=$1; line=$0 }
For each line of input, check if the first field differs from the stored id:
if ($1 != id)
If it does, then print the saved line and store the new one ($0):
print line; line = $0;
Otherwise, append the new line to the stored line(s):
line = line ";" $0;
And save the new id:
id=$1
At the end, print whatever is left in line:
END { print line; }
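The steps above amount to grouping consecutive lines by their first field. As a point of comparison, here is a minimal Python sketch of that idea using itertools.groupby; `join_by_id` is an illustrative name, and like the awk it assumes lines with the same id are adjacent.

```python
from itertools import groupby


def join_by_id(lines, sep=";"):
    """Join consecutive lines that share the same first field."""
    cleaned = (line.rstrip("\n") for line in lines)
    # groupby only merges neighbours, so the input must be grouped by id.
    for _key, group in groupby(cleaned, key=lambda l: l.split(sep, 1)[0]):
        yield sep.join(group)


rows = ["id1;value1_1", "id1;value1_2", "id2;value2_1", "id3;value3_1",
        "id4;value4_1", "id4;value4_2", "id5;value5_1"]
print("\n".join(join_by_id(rows)))
```

With the input from the question this keeps the id2 line on its own, so the output has five lines.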
I guess the id2; line is missing from your example result by mistake, right?
Anyway, you could try the awk line below:
awk -F';' '{a[$1]=($1 in a)?a[$1]";"$0:$0}END{for(x in a)print a[x]}' yourFile|sort
output would be:
id1;value1_1;id1;value1_2
id2;value2_1
id3;value3_1
id4;value4_1;id4;value4_2
id5;value5_1
This might work for you:
sed -e '1{h;d};H;${x;:a;s/\(\([^;]*;\)\([^\n]*\)\)\n\2/\1;\2/;ta;p};d' t.txt
Explanation:
Slurp file in to hold space (HS) then on end-of-file swap to the HS and using substitution concatenate lines with duplicate keys and print. N.B. lines normally printed are all deleted.
EDIT:
The above solution works (as far as I know) but for large volumes is not very fast (read incredibly slow). This solution is better:
# cat -A /tmp/t.txt
id1;value1_1$
id1;value1_2$
id2;value2_1$
id3;value3_1$
id4;value4_1$
id4;value4_2$
id5;value5_1$
# for x in {1..1000};do cat /tmp/t.txt;done |
> sed ':a;$!N;/^\([^;]*;\).*\n\1/s/\n//;ta;P;D'| sort | uniq
id1;value1_1;id1;value1_2
id2;value2_1
id3;value3_1
id4;value4_1;id4;value4_2
id5;value5_1