Perform action on line range in sed/awk - sed

How can I extract certain variables from a specific range of lines in sed/awk?
Example: I want to extract the host and port from .tnsnames.ora
from this section that starts at line 105.
DB_CONNECTION=
(description=
(address=
(protocol=tcp)
(host=127.0.0.1)
(port=1234)
)
(connect_data=
(sid=ABCD)
(sdu=4321)
)

gawk can use a regular expression as the field separator (FS).
'$0=$2' is true whenever $2 is non-empty, which it is on the matched host/port lines, so the script prints $2 by default.
$ gawk -F'[()]' 'NR>105&&NR<115&&(/host/||/port/)&&$0=$2' .tnsnames.ora
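If the section's position in the file isn't stable, a flag set on the DB_CONNECTION= marker can replace the hard-coded line numbers (a sketch; note there is no end condition, so it keeps matching to the end of the file):
gawk -F'[()]' '/^DB_CONNECTION=/{f=1} f && (/host/||/port/) && $0=$2' .tnsnames.ora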

use:
sed '105,$< whatever sed code you want here >'
If you specifically want the host and the port you can do something like:
sed -n '105,115p' .tnsnames.ora | grep -e 'host=' -e 'port='
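A pure-sed equivalent groups commands inside the same address range (a sketch, GNU sed syntax):
sed -n '105,115{/host=/p;/port=/p}' .tnsnames.ora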

You can use address ranges to specify the section to which the regular expressions apply. If you leave out the end-line address (but keep the comma), the range matches to the end of the file. You can also chain multiple expressions by using '-e' multiple times. The following command prints just the port and host values to standard out. It uses back references (\1) to print only the matching parts.
sed -n -e '105,115s/(port=\([0-9].*\))/\1/p' -e '105,115s/(host=\([0-9\.].*\))/\1/p' tnsnames.ora
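With GNU sed's \| alternation (a GNU extension to BREs), the two expressions can be folded into one (a sketch):
sed -n '105,115s/(\(host\|port\)=\(.*\))/\2/p' tnsnames.ora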

@lk, to address the answer you posted:
You can write awk code like C, but it's more succinctly expressed as "pattern {action}" pairs.
If you have gawk or nawk, the field separator is an ERE, as Hirofumi Saito said:
gawk -F'[()=]' '
NR < 105 { next }
NR > 115 { exit }
$2 == "host" || $2 == "port" {
    # do stuff with $2 and $3
    print $2 "=" $3
}
'
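Assuming the sample section really does occupy lines 105-115, that prints:
host=127.0.0.1
port=1234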

Related

Replacing all occurrences after the nth occurrence in a line in perl

I need to replace all occurrences of a string after nth occurrence in every line of a Unix file.
My file data:
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
My output data:
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
I tried using sed: sed 's/://3g' test.txt
Unfortunately, the g option combined with an occurrence number does not work as expected; instead, it replaces all the occurrences.
Another approach using awk
awk -v c=':' -v n=2 'BEGIN{
    FS=OFS=""
}
{
    j=0;
    for(i=0; ++i<=NF;)
        if($i==c && j++>=n)$i=""
}1' file
$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
With GNU awk you can use gensub. This is based entirely on the samples you've shown, where you want to remove : from the 3rd occurrence onwards: gensub splits the matched value into two parts, and all colons are then removed from the second part (everything after the 2nd colon).
awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
    firstPart = restPart = ""
    firstPart = gensub(regex, "\\1\\2", "1", $0)
    restPart  = gensub(regex, "\\3", "1", $0)
    gsub(/:/, "", restPart)
    print firstPart restPart
}
' Input_file
I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon-delimited fields.
So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
    chomp;
    my ( undef, $first, @rest ) = split /:/;
    print ":$first:", join( "", @rest ), "\n";
}
__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
You can use the perl solution like
perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt
The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:
^(?:[^:]*:){2}(*SKIP)(?!) - match
^ - start of string (here, a line)
(?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
(*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
| - or
: - match a : char.
And the replacement only runs if the current line starts with :account_id: (see the if /^:account_id:/ modifier).
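If the (*SKIP)(?!) control verbs feel opaque, a /e substitution gets the same result (a sketch; the /r flag on tr needs Perl 5.14+):
perl -pe 's/^((?:[^:]*:){2})(.*)/$1 . ($2 =~ tr|:||dr)/e if /^:account_id:/' test.txt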
Or an awk solution like
awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt
Details:
BEGIN{OFS=FS=":"} - sets the input/output field separator to :
/^:account_id:/ - line must start with :account_id:
result="" - sets result variable to an empty string
for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS) }; print result - iterates over the fields: if the field number is greater than 2, append just the current field value to result; otherwise append the value plus the output field separator. Then print the result.
I would use GNU AWK the following way, assuming n is fixed and equal to 2. Let file.txt content be
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
then
awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt
output
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Explanation: use : as the field separator and nothing as the output field separator; by itself this removes every :, so I re-add the two that have to be preserved: the 1st (before the second column) and the 2nd (after the second column). Beware that I tested this solely against this data, so if you want to use it you should first test it with more possible inputs.
(tested in gawk 4.2.1)
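The same idea generalizes to any n by keeping only the first n colons (a sketch based on the answer above):
awk -v n=2 'BEGIN{FS=":"} {out=$1; for(i=2;i<=NF;i++) out = out (i<=n+1 ? ":" : "") $i; print out}' file.txt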
This might work for you (GNU sed):
sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file
Replace the third occurrence of : by a newline.
Make a copy of the line.
Delete all occurrences of :'s.
Append the amended line to the copy.
Join the two lines by removing everything from third occurrence of the copy to the third occurrence of the amended line.
N.B. A newline is the best delimiter to use in sed, because the line presented to sed's commands is initially devoid of newlines. However, the important property of the delimiter is that it is unique, so it can be any character as long as it is not found anywhere in the data set.
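For readability, here is the same program laid out with comments, one GNU sed command per line:
sed '
  # mark the third colon with a newline
  s/:/\n/3
  # save a copy in the hold space
  h
  # delete every colon from the pattern space
  s/://g
  # append the amended line to the copy
  H
  # bring both lines back
  g
  # keep the head of the copy and the tail of the amended line
  s/\n.*\n//
' file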
An alternative solution uses a loop to remove all :'s after the first two:
sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
With GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
and with any awk in any shell on every Unix box:
$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus

perl or awk: zero-proof division

I have to add a field showing the difference in percentage between 2 fields in a file like:
BI,1266,908
BIL,494,414
BKC,597,380
BOOM,2638,654
BRER,1453,1525
BRIG,1080,763
DCLE,0,775
The output should be:
BI,1266,908,-28.3%
BIL,494,414,-16.2%
BKC,597,380,-36.35%
BOOM,2638,654,-75.2%
BRER,1453,1525,5%
BRIG,1080,763,-29.4%
DCLE,0,775,-
Note the zero in the last row. Either of these fields could be zero. If a zero is present in either field, N/A or - is acceptable.
What I'm trying --
Perl:
perl -F, -ane 'if ($F[2] > 0 || $F[3] > 0){print $F[0],",",$F[1],",",$F[2],100*($F[2]/$F[3])}' file
I get Illegal division by zero at -e line 1, <> line 2. If I change the || to && it prints nothing.
In awk:
awk '$2>0{$4=sprintf("%d(%.2f%)", $3, ($3/$2)*100)}1' file
Just prints the file.
$ awk -F, '$2 == 0 || $3 == 0 { printf("%s,-\n", $0); next }
{ printf("%s,%.2f%%\n", $0, 100 * ($3 / $2) - 100) }' input.csv
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,-
How it works: if the second or third column is equal to 0, append a - field to the line. Otherwise, calculate the percentage difference and append that.
Your perl's main issue was confusing awk's 1-based column indexes with perl's 0-based column indexes.
perl -F, -ane 'print "$1," if /(.+)/;if ($F[1] > 0 && $F[2] > 0){printf ("%.2f%%", ((100*$F[2]/$F[1])-100)) } else {print "-"};print "\n"' file
The $1 here refers to the capture group (.+) which means "The whole line but the linefeed". The rest is probably self-explanatory if you understand the awk.
You're not telling awk that the fields are separated by commas, so it assumes the default (spaces), and $2 is never greater than zero because it's null: there's only one space-separated field per line. Change it to:
$ awk 'BEGIN{FS=OFS=","} $2>0{$4=sprintf("%d(%.2f%)", $3, ($3/$2)*100)}1' file
BI,1266,908,908(71.72%)
BIL,494,414,414(83.81%)
BKC,597,380,380(63.65%)
BOOM,2638,654,654(24.79%)
BRER,1453,1525,1525(104.96%)
BRIG,1080,763,763(70.65%)
DCLE,0,775
and then tweak it for your desired output:
$ awk 'BEGIN{FS=OFS=","} {$4=($2 && $3 ? sprintf("%.2f%", (($3/$2)-1)*100) : "N/A")} 1' file
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,N/A

sed replace positional match of unknown string divided by user-defined separator

Want to rename the (known) 3rd folder within an (unknown) file path in a string, when it is positioned on the 3rd level and the separator is /.
I need a one-liner explicitly for sed, because I later want to use it for tar --transform=EXPRESSION.
string="/db/foo/db/bar/db/folder"
echo "$string" | sed 's,db,databases,'
sed replace "db" only on 3th level
expected result
/db/foo/databases/bar/db/folder
You could use a capturing group to capture /db/foo/ and then match db. Then use the first capturing group in the replacement using \1:
string="/db/foo/db/bar/db/folder"
echo -e "$string" | sed 's,^\(/[^/]*/[^/]*/\)db,\1databases,'
About the pattern
^ Start of string
\( Start capture group
/[^/]*/[^/]*/ Match the first 2 parts using a negated character class
\) Close capture group
db Match literally
That will give you
/db/foo/databases/bar/db/folder
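Since the stated goal is tar --transform, the expression should drop in as-is. Note that tar normally strips the leading / from member names, so the anchor has to match the stored name; a sketch, with a hypothetical archive name and path:
tar -cf backup.tar --transform='s,^\([^/]*/[^/]*/\)db,\1databases,' db/foo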
If awk is also an option for this task:
$ awk 'BEGIN{FS=OFS="/"} $4=="db"{$4="database"} 1' <<<'/db/foo/db/bar/db/folder'
/db/foo/database/bar/db/folder
FS = OFS = "/" assign / to both input and output field separators,
$4 == "db" { $4 = "database }" if fourth field is db, make it database,
1 print the record.
Here is a pure bash way to get this done by setting IFS=/, without calling any external utility. Since the string starts with /, arr[0] is empty and the 3rd path component lands in arr[3]:
string="/db/foo/db/bar/db/folder"
string=$(IFS=/; read -a arr <<< "$string"; arr[3]='databases'; echo "${arr[*]}")
echo "$string"
/db/foo/databases/bar/db/folder

Sed - replace words

I have a problem with replacing string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find the occurrence of Svc up to the next | and swap its place with Stm up to the next |.
My attempts ended up replacing characters, which is not my goal.
awk -F'|' -v OFS='|' '{a=b=0;
    for(i=1;i<=NF;i++){a=$i~/^Stm=/?i:a;b=$i~/^Svc=/?i:b}
    t=$a; $a=$b; $b=t}7' file
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
the code exchanges the Stm.. and Svc.. columns, no matter which one comes first.
If a perl solution is okay (this assumes exactly one column matches each search term):
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
    @i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
    $t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
    print join "|", @F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane split input line on |, see also Perl flags -pe, -pi, -p, -w, -d, -i, -t?
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F get index of columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swap the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); courtesy How can I swap two Perl variables
print join "|", #F print the modified array
You need to use capture groups and backreferences in a string substitution.
The below will swap the 2:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm.*|\)\(.*\)\(Svc.*|\)/\3\2\1/'
As pointed out in the comment from @Kent, this will not work if the strings were not in that order.

Swap two columns - awk, sed, python, perl

I've got data in a large file (280 columns wide, 7 million lines long!) and I need to swap the first two columns. I think I could do this with some kind of awk for loop, to print $2, $1, then a range to the end of the file - but I don't know how to do the range part, and I can't print $2, $1, $3...$280! Most of the column swap answers I've seen here are specific to small files with a manageable number of columns, so I need something that doesn't depend on specifying every column number.
The file is tab delimited:
Affy-id chr 0 pos NA06984 NA06985 NA06986 NA06989
You can do this by swapping values of the first two fields:
awk ' { t = $1; $1 = $2; $2 = t; print; } ' input_file
I tried perreal's answer with Cygwin on a Windows system with a tab-separated file. It didn't work, because the standard separator is space.
If you encounter the same problem, try this instead:
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file
The incoming separator is defined by -F $'\t' and the separator for output by OFS=$'\t'.
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file > output_file
Try this, which is more relevant to your question:
awk '{printf("%s\t%s\n", $2, $1)}' inputfile
This might work for you (GNU sed):
sed -i 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/' file
Have you tried using the cut command? E.g.
cut -c10-20,1-9,21- myhugefile > myrearrangedhugefile
Note, though, that cut always emits the selected ranges in their original order, so on its own it cannot reorder columns.
This is also easy in perl:
perl -pe 's/^(\S+)\t(\S+)/$2\t$1/;' file > outputfile
You could do this in Perl:
perl -F\\t -nlae 'print join("\t", @F[1,0,2..$#F])' inputfile
The -F specifies the delimiter. In most shells you need to precede a backslash with another backslash to escape it. In recent versions of Perl (5.20+), -F automatically implies -n and -a, so they can be dropped.
For this problem you don't need -l because the last column appears last in the output. But in a different situation, where the last column needs to appear between other columns, the newline character must be removed; the -l switch takes care of this.
The "\t" in join can be changed to anything else to produce a different delimiter in the output.
2..$#F specifies a range from 2 until the last column. As you might have guessed, inside the square brackets, you can put any single column or range of columns in the desired order.
No need to call anything else but your shell:
bash> while read col1 col2 rest; do
echo $col2 $col1 $rest
done <input_file
Test:
bash> echo "first second a c d e f g" |
while read col1 col2 rest; do
echo $col2 $col1 $rest
done
second first a b c d e f g
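Because the file is tab-delimited, note that the unquoted echo above rejoins fields with spaces; a tab-preserving variant (a sketch):
bash> while IFS=$'\t' read -r col1 col2 rest; do
    printf '%s\t%s\t%s\n' "$col2" "$col1" "$rest"
done < input_file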
Maybe even with "inlined" Python - as in a Python script within a shell script - but only if you want to do some more scripting with Bash beforehand or afterwards... Otherwise it is unnecessarily complex.
Content of script file process.sh:
#!/bin/bash
# inline Python script
read -r -d '' PYSCR << EOSCR
from __future__ import print_function
import codecs
import sys
encoding = "utf-8"
fn_in = sys.argv[1]
fn_out = sys.argv[2]
# print("Input:", fn_in)
# print("Output:", fn_out)
with codecs.open(fn_in, "r", encoding) as fp_in, \
        codecs.open(fn_out, "w", encoding) as fp_out:
    for line in fp_in:
        # split into two columns and rest
        col1, col2, rest = line.split("\t", 2)
        # swap columns in output
        fp_out.write("{}\t{}\t{}".format(col2, col1, rest))
EOSCR
# ---------------------
# do setup work?
# e. g. list files for processing
# call python script with params
python3 -c "$PYSCR" "$inputfile" "$outputfile"
# do some more processing
# e. g. rename outputfile to inputfile, ...
If you only need to swap the columns for a single file, then you can also just create a single Python script and statically define the filenames. Or just use an answer above.
awk swapping sans temp variable:
echo '777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541' |
mawk '1; ($1 = $2 substr(_, ($2 = $1)^_))^_' FS=':' OFS=':'
777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541
317 647 14423 262927714037 :777777744444444464449: 0x2A29D5A1BAA7A95541