Sed or awk: how to call line addresses from separate file? - sed

I have 'file1' with (say) 100 lines. I want to use sed or awk to print lines 23, 71 and 84 (for example) to 'file2'. Those 3 line numbers are in a separate file, 'list', with each number on a separate line.
When I use either of these commands, only line 84 gets printed:
for i in $(cat list); do sed -n "${i}p" file1 > file2; done
for i in $(cat list); do awk 'NR==x {print}' x=$i file1 > file2; done
Can a for loop be used in this way to supply line addresses to sed or awk?

This might work for you (GNU sed):
sed 's/.*/&p/' list | sed -nf - file1 >file2
Use list to build a sed script.

You need to put the > after the loop in order to capture everything. Because you use it inside the loop, the file gets overwritten on every iteration. If you redirect inside the loop, you need >> instead.
It is good practice to use > outside the loop, so that the file is opened for writing only once rather than on every iteration.
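A minimal sketch of the redirect-after-the-loop form, assuming made-up sample data (100 lines of "line N"; the temporary directory and the variable names tmp, n and result are only for illustration):

```shell
# Build sample data; the file names mirror the question.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$tmp/file1"
printf '%s\n' 23 71 84 > "$tmp/list"

# Redirect once, after "done": file2 is opened a single time and
# receives the output of every iteration.
while IFS= read -r n; do
    sed -n "${n}p" "$tmp/file1"
done < "$tmp/list" > "$tmp/file2"

result=$(cat "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Reading the list with while read instead of for i in $(cat list) is a substitution on my part; it avoids word splitting surprises but is otherwise equivalent here.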
However, you can do everything in awk without for loop.
awk 'NR==FNR{a[$1]++;next}FNR in a' list file1 > file2

You have to use >> (append to the file), but you are overwriting the file with >. That is why you always end up with only line 84 in file2.
Try using:
for i in $(cat list); do sed -n "${i}p" file1 >> file2; done

With sed:
sed -n $(sed -e 's/^/-e /' -e 's/$/p/' list) input
Given the example input, the inner command creates a string like this:
-e 23p
-e 71p
-e 84p
so the outer sed then prints the given lines.
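The two steps can be looked at separately. This is only a sketch with made-up sample data; args and result are illustrative variable names, and the unquoted expansion relies on sh/bash word splitting:

```shell
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$tmp/file1"
printf '%s\n' 23 71 84 > "$tmp/list"

# Step 1: the inner sed turns each number N into "-e Np".
args=$(sed -e 's/^/-e /' -e 's/$/p/' "$tmp/list")
printf '%s\n' "$args"

# Step 2: $args is left unquoted on purpose, so the shell splits it
# into separate words: -e 23p -e 71p -e 84p
result=$(sed -n $args "$tmp/file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```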

You can avoid running sed/awk in a for/while loop altogether:
# join all line numbers into one variable, separated by |
lines=$(echo $(<list) | sed 's/ /|/g')
# print lines of specified line numbers and store output
awk -v lineS="^($lines)$" 'NR ~ lineS' file1 > out
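Since $(<list) is a bash feature, here is a sketch of the same idea using paste to build the alternation. The sample data and the names lines, re and result are assumptions for illustration:

```shell
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$tmp/file1"
printf '%s\n' 23 71 84 > "$tmp/list"

# Join the line numbers with "|" to form a regex alternation.
lines=$(paste -sd'|' "$tmp/list")
printf '%s\n' "$lines"

# The ^( ... )$ anchors make the record number match as a whole,
# so 23 does not also select 123 or 230.
result=$(awk -v re="^($lines)$" 'NR ~ re' "$tmp/file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```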

Related

Parse file and insert new line after each occurrence

On a Unix system I am trying to add a new line in a file using sed or perl but it seems I am missing something.
Supposing my file has multiple lines of texts, always ending like this {TNG:}}${1:F01.
I am trying to find a way to add a new line after the }$, so that {1 always starts on a new line.
I tried it by escaping $ sign using this:
perl -e '$/ = "\${"; while (<>) { s/\$}\{$/}\n{/; print; }' but it does not work.
Any ideas will be appreciated.
give this a try:
sed 's/{TNG:}}\$/&\n/' file > newfile
The sed will by default use BRE, that is, the {}s are literal characters. But we must escape the $.
kent$ cat f
{TNG:}}${1:F01.
kent$ sed 's/{TNG:}}\$/&\n/' f
{TNG:}}$
{1:F01.
With perl:
$ cat input.txt
line 1 {TNG:}}${1:F01
line 2 {TNG:}}${1:F01
$ perl -pe 's/TNG:\}\}\$\K/\n/' input.txt
line 1 {TNG:}}$
{1:F01
line 2 {TNG:}}$
{1:F01
(Read up on the -p and -n options in perlrun and use them instead of trying to do what they do in a one-liner yourself)

Which is the simple and fast UNIX command to print all lines from the last occurrence of a pattern?

Which is the simple and fast UNIX command to print all lines from the last occurrence of a pattern to the end of the file ?
sed -n '/pattern/,$p' file
This sed command prints from the first occurrence onwards.
This might work for you (GNU sed):
sed 'H;/pattern/h;$!d;x;//!d' file
Stashes the last pattern and following lines in the hold space and at end-of-file prints them out.
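The same script spelled out one command per line, as a sketch. The sample file (a, b, c, b, d) and the pattern b are made up for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b c b d > "$tmp/file"

result=$(sed '
# append every line to the hold space
H
# on a match, restart the hold space from this line
/b/h
# not the last line yet: print nothing and start the next cycle
$!d
# last line: fetch what was collected since the last match
x
# the empty regex reuses /b/: if nothing ever matched, print nothing
//!d
' "$tmp/file")

printf '%s\n' "$result"
rm -rf "$tmp"
```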
Or using the same method in awk:
awk '{x=x ORS $0};/pattern/{x=$0};END{if(x ~ /pattern/)print x}' file
However, on my machine jaypal's way with sed seems to be the quickest:
tac file | sed '/pattern/q' | tac
Reverse the file, print until the first pattern, exit and reverse the file.
tac file | awk '/pattern/{print;exit}1' | tac
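A small sketch of the reverse, cut, reverse idea on made-up data. It assumes tac is installed (it is part of GNU coreutils and may be missing elsewhere); the pattern b and the variable names are illustrative:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b c b d > "$tmp/file"

# Reversed, the file reads d b c b a. sed prints up to and including
# the first match (d, b) and quits; the second tac restores the order.
result=$(tac "$tmp/file" | sed '/b/q' | tac)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because sed quits at the first match in the reversed stream, only the tail of the file has to pass through the pipeline, which is presumably why it was fast in the timing mentioned above.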
Here's a Perlish way to do it:
perl -ne '$seen = 1, @a = () if /pattern/; push @a, $_; END { print @a if $seen }' file
Simplest solution is just to use a regex matching on the entire file:
perl -0777 -ne 'print $1 if /.*(pattern.*)/s' file
A standalone awk:
awk '/pattern/{delete a;c=0}{a[c++]=$0}END{for (i=0;i<c;i++){print a[i]}}' file
Here is a pure awk solution:
awk 'FNR==NR {if ($0~/pattern/) f=FNR;next} FNR==f {a=1}a' file{,}
It reads the file twice: the first pass records the line number of the last match of the pattern, and the second pass prints from that line to the end.
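The same two-pass approach written out without the bash-only file{,} brace expansion. This is a sketch on made-up data; last and show are illustrative variable names:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b c b d > "$tmp/file"

result=$(awk '
    # First pass (NR == FNR): remember the line number of the last match.
    FNR == NR { if ($0 ~ /b/) last = FNR; next }
    # Second pass: switch printing on at that line and keep it on.
    FNR == last { show = 1 }
    show
' "$tmp/file" "$tmp/file")

printf '%s\n' "$result"
rm -rf "$tmp"
```

If the pattern never matches, last stays unset, FNR == last is never true, and nothing is printed.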
Or you can store data in an array like this:
awk '/pattern/ {f=NR} {a[NR]=$0} END {for (i=f;i<=NR;i++) print a[i]}' file
Using GNU awk for multi-char RS and gensub():
$ awk -v RS='^$' -v ORS= '{print gensub(/.*(pattern)/,"\\1","")}' file
e.g.:
$ cat file
a
b
c
b
d
$ awk -v RS='^$' -v ORS= '{print gensub(/.*(b)/,"\\1","")}' file
b
d
The above simply deletes from the start of the file up to just before the last occurrence of "b".

Remove all the characters from string after last '/'

I have the following input file and I need to remove all the characters from the strings that appear after the last '/'. I'll also show my expected output below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.
You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep matches greedily, meaning .* consumes as many characters as possible, including any /, up to the last / in the path.
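A quick way to try this on the sample paths from the question without creating a file (a sketch; -o is not in POSIX grep but is supported by GNU and BSD grep, and result is an illustrative name):

```shell
# -o prints only the matched part; the greedy .* extends the match
# to the last "/" on each line.
result=$(printf '%s\n' \
    /start/one/two/stopone.js \
    /start/one/two/three/stoptwo.js \
    /start/one/stopxyz.js | grep -o '.*/')

printf '%s\n' "$result"
```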
Original Answer:
You can use dirname:
while IFS= read -r line ; do
echo "$(dirname "$line")/"    # dirname drops the trailing slash, so add it back
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt
perl -lne 'print $1 if(/(.*\/)/)' your_file
Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file

delete a column with awk or sed

I have a file with three columns. I would like to delete the 3rd column(in-place editing). How can I do this with awk or sed?
123 abc 22.3
453 abg 56.7
1236 hjg 2.3
Desired output
123 abc
453 abg
1236 hjg
try this short thing:
awk '!($3="")' file
With GNU awk for inplace editing, \s/\S, and gensub() to delete
1) the FIRST field:
awk -i inplace '{sub(/^\S+\s*/,"")}1' file
or
awk -i inplace '{$0=gensub(/^\S+\s*/,"",1)}1' file
2) the LAST field:
awk -i inplace '{sub(/\s*\S+$/,"")}1' file
or
awk -i inplace '{$0=gensub(/\s*\S+$/,"",1)}1' file
3) the Nth field where N=3:
awk -i inplace '{$0=gensub(/\s*\S+/,"",3)}1' file
Without GNU awk you need a match()+substr() combo or multiple sub()s + vars to remove a middle field. See also Print all but the first three columns.
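As a rough sketch of what such a match()+substr() combo could look like in POSIX awk, here is one way to drop the Nth field while leaving the remaining spacing untouched. The variable names n, rest, tok and out are made up for illustration, and removing the first field this way leaves its trailing blanks at the start of the line:

```shell
tmp=$(mktemp -d)
printf '%s\n' '123 abc 22.3' '453 abg 56.7' '1236 hjg 2.3' > "$tmp/file"

result=$(awk -v n=2 '{
    rest = $0; out = ""
    for (i = 1; i <= NF; i++) {
        # Peel off the next field together with the blanks before it.
        match(rest, /^[ \t]*[^ \t]+/)
        tok  = substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
        # Keep every field except number n.
        if (i != n) out = out tok
    }
    print out rest
}' "$tmp/file")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Writing the result back to the original still needs the usual temporary file and mv, since there is no in-place option outside GNU awk.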
This might work for you (GNU sed):
sed -i -r 's/\S+//3' file
If you want to delete the white space before the 3rd field:
sed -i -r 's/(\s+)?\S+//3' file
It seems you could simply go with
awk '{print $1 " " $2}' file
This prints the two first fields of each line in your input file, separated with a space.
Try using cut; it's fast and easy.
First, you have repeated spaces. If that's what you want, you can squeeze those down to a single space between columns with tr -s ' '.
If each column already has just one delimiter between it and the next, you can use cut -d ' ' -f-2 to print fields (columns) <= 2.
for example if your data is in a file input.txt you can do one of the following:
cat input.txt | tr -s ' ' | cut -d ' ' -f-2
Or, if you find it easier to think of this as removing the 3rd column, you can write the following (--complement is a GNU cut option):
cat input.txt | tr -s ' ' | cut -d ' ' --complement -f3
cut is pretty powerful, you can also extract ranges of bytes, or characters, in addition to columns
Excerpt from the man page on how to specify the list range:
    Each LIST is made up of one range, or many ranges separated by commas.
    Selected input is written in the same order that it is read, and is
    written exactly once. Each range is one of:
      N     N'th byte, character or field, counted from 1
      N-    from N'th byte, character or field, to end of line
      N-M   from N'th to M'th (included) byte, character or field
      -M    from first to M'th (included) byte, character or field
so you also could have said you want specific columns 1 and 2 with...
cat input.txt | tr -s ' ' | cut -d ' ' -f1,2
Try this :
awk '$3="";1' file.txt > new_file && mv new_file file.txt
or
awk '{$3="";print}' file.txt > new_file && mv new_file file.txt
Try
awk '{$3=""; print $0}' file
If you're open to a Perl solution...
perl -ane 'print "$F[0] $F[1]\n"' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the following perl code

How do I remove selected endlines with sed?

I'm trying to remove endlines for all lines in my file where the endline splits two equal signs
ie:
1
a=
=b
2
to
1
a==b
2
I have
sed -i.bak -e 's/=\n =//g' fileName
however, it doesn't seem to make any changes to my file. Is my script correct?
Try this. It reads the whole file content into the pattern space and then removes all newline characters between equal signs.
sed -i.bak -e ':a ; $! { N; b a }; s/=\n=/==/g' fileName
It yields:
1
a==b
2
This might work for you (GNU sed):
sed '$!N;s/=\n=/==/;P;D' file
or
sed -e '$!N' -e 's/='$"\n"'=/==/' -e 'P' -e 'D' file
Different seds on different OSs treat newlines in different ways. The most portable way to specify a newline in sed is to use backslash before a return:
sed -e 's/=\
=//g' file
BUT that's not going to work for you until you invoke some other magic sed characters to slurp up multiple lines into a buffer, etc....
Just use awk:
$ cat file
1
a=
=b
2
$ awk '{printf "%s%s", $0, (/=$/ ? "" : "\n")}' file
1
a==b
2
Just prints the current line followed by nothing if the current line ends in an "=" or a newline otherwise. Couldn't be simpler and it's highly portable....
Oh, and if you want to change your original file, that's just:
awk '{printf "%s%s", $0, (/=$/ ? "" : "\n")}' file > tmp && mv tmp file