How do I remove selected endlines with sed?

I'm trying to remove endlines for all lines in my file where the endline splits two equal signs
ie:
1
a=
=b
2
to
1
a==b
2
I have
sed -i.bak -e 's/=\n =//g' fileName
however, it doesn't seem to make any changes to my file. Is my script correct?

Try this. It reads the whole file into the pattern space and then removes every newline that sits between two equal signs.
sed -i.bak -e ':a ; $! { N; b a }; s/=\n=/==/g' fileName
It yields:
1
a==b
2
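To see why the original attempt has no effect, it helps to compare it with the looped version. By default sed holds one input line at a time in the pattern space, with its trailing newline removed, so a pattern containing \n can only match after N has appended another line. A minimal sketch, assuming a sed that accepts \n in the regex (GNU sed does); the temporary file and the variable names are made up for illustration:

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '1\na=\n=b\n2\n' > "$tmp"

# One line per cycle: the pattern space never contains a newline,
# so this substitution silently matches nothing.
unchanged=$(sed 's/=\n=/==/g' "$tmp")

# Same substitution after looping N until the last line, so the whole
# file sits in the pattern space with embedded newlines.
joined=$(sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/=\n=/==/g' "$tmp")

rm -f "$tmp"
printf 'without N:\n%s\n\nwith N:\n%s\n' "$unchanged" "$joined"
```

The loop is split into one -e per command here, which avoids relying on how a particular sed parses semicolons after labels and braces.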

This might work for you (GNU sed):
sed '$!N;s/=\n=/==/;P;D' file
or
sed -e '$!N' -e 's/='$"\n"'=/==/' -e 'P' -e 'D' file
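The first command above can be read as a two-line sliding window. The sketch below spells out what each command does, using one -e per command, on the sample input from the question. The variable names are illustrative, and \n in the regex is assumed to be supported, as in GNU sed:

```shell
input=$(printf '1\na=\n=b\n2')

# $!N          append the next line to the pattern space (except on the last line)
# s/=\n=/==/   join the two lines if the newline sits between two equal signs
# P            print the pattern space up to the first newline
# D            delete up to the first newline and restart the cycle with the rest;
#              if no newline is left, start a normal new cycle
result=$(printf '%s\n' "$input" | sed \
    -e '$!N' \
    -e 's/=\n=/==/' \
    -e 'P' \
    -e 'D')

printf '%s\n' "$result"
```

Because only two lines are held at a time, this approach does not need to load the whole file into memory.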

Different seds on different OSs treat newlines in different ways. The most portable way to specify a newline in sed is to use backslash before a return:
sed -e 's/=\
=//g' file
BUT that's not going to work for you until you invoke some other magic sed characters to slurp up multiple lines into a buffer, etc....
Just use awk:
$ cat file
1
a=
=b
2
$ awk '{printf "%s%s", $0, (/=$/ ? "" : "\n")}' file
1
a==b
2
Just prints the current line followed by nothing if the current line ends in an "=" or a newline otherwise. Couldn't be simpler and it's highly portable....
Oh, and if you want to change your original file, that's just:
awk '{printf "%s%s", $0, (/=$/ ? "" : "\n")}' file > tmp && mv tmp file
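Note that the awk command above joins every line ending in = to the following line, whatever that line starts with. If the join should happen only when the next line also begins with =, as in the question's wording, one possible sketch is to remember the previous line and decide about the newline when the next line arrives. The variable prev and the extended sample input are assumptions made for illustration:

```shell
# Extended sample: "c=" is followed by "d", which must not be joined.
input=$(printf '1\na=\n=b\nc=\nd\n2')

result=$(printf '%s\n' "$input" | awk '
    # Before printing each line after the first, emit the separator that
    # belongs between it and the previous line.
    NR > 1 { printf "%s", ((prev ~ /=$/ && $0 ~ /^=/) ? "" : "\n") }
    { printf "%s", $0; prev = $0 }
    # Terminate the final line.
    END { if (NR) print "" }
')

printf '%s\n' "$result"
```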

Related

How to replace only specific spaces in a file using sed?

I have this content in a file where I want to replace spaces at certain positions with pipe symbol (|). I used sed for this, but it is replacing all the spaces in the string. But I don't want to replace the space for the 3rd and 4th string.
How to achieve this?
Input:
test test test test
My attempt:
sed -e 's/ /|/g' file.txt
Expected Output:
test|test|test test
Actual Output:
test|test|test|test
sed 's/ /\
/3;y/\n / |/'
Since a newline cannot appear in the pattern space when sed reads a single line, it is safe to use as a temporary marker: change the third space to a newline, then use y to turn every remaining space into a pipe and the newline back into a space.
GNU sed can use \n in the replacement text:
sed 's/ /\n/3;y/\n / |/'
If the original input doesn't contain any pipe characters, you can do
sed -e 's/ /|/g' -e 's/|/ /3' file
to retain the third white space. Otherwise see other answers.
You could replace the 'first space' twice, e.g.
sed -e 's/ /|/' -e 's/ /|/' file.txt
Or, if you want to specify the positions (e.g. the 2nd and 1st spaces):
sed -e 's/ /|/2' -e 's/ /|/1' file.txt
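The numeric flag counts matches in the pattern space as it stands when that s command runs, which is why the order of the two commands matters. A small sketch of the difference, using the sample line from the question (the variable names are illustrative):

```shell
line='test test test test'

# Higher position first: replacing the 2nd space does not renumber the 1st.
good=$(printf '%s\n' "$line" | sed -e 's/ /|/2' -e 's/ /|/1')

# Lower position first: once the 1st space is gone, the space that was
# originally 3rd is now the 2nd, so the wrong one gets replaced.
shifted=$(printf '%s\n' "$line" | sed -e 's/ /|/1' -e 's/ /|/2')

printf '%s\n%s\n' "$good" "$shifted"
```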
Using GNU sed to replace the first and second one or more whitespace chunks:
sed -i -E 's/\s+/|/;s/\s+/|/' file
Details
-i - edit the file in place
-E - POSIX ERE syntax enabled
s/\s+/|/ - replaces the first run of one or more whitespace chars
; - and then
s/\s+/|/ - replaces the next run of one or more whitespace chars on each line (if present).
Keep it simple and use awk, e.g. using any awk in any shell on every Unix box no matter what other characters your input contains:
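$ awk '{n=NF-2; for (i=1;i<=n;i++) sub(/ /,"|")} 1' file
test|test|test test
The above replaces all but the last " " on each line, assuming the fields are separated by single spaces. The count is saved in n before the loop because each sub() changes $0, which makes awk re-split the record and update NF. If you want to replace a specific number, e.g. 2, then just set n=2.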

Parse file and insert new line after each occurrence

On a Unix system I am trying to add a new line in a file using sed or perl but it seems I am missing something.
Supposing my file has multiple lines of texts, always ending like this {TNG:}}${1:F01.
I am trying to find a way to add a new line after the }$, so that {1 always starts on a new line.
I tried it by escaping $ sign using this:
perl -e '$/ = "\${"; while (<>) { s/\$}\{$/}\n{/; print; }' but it does not work.
Any ideas will be appreciated.
Give this a try:
sed 's/{TNG:}}\$/&\n/' file > newfile
sed uses BRE by default, so { and } are literal characters, but the $ must be escaped to match a literal dollar sign.
kent$ cat f
{TNG:}}${1:F01.
kent$ sed 's/{TNG:}}\$/&\n/' f
{TNG:}}$
{1:F01.
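The \n in the replacement is understood by GNU sed; other implementations may not accept it. A more portable sketch, under that assumption, writes the newline as a backslash followed by a literal line break (the variable names here are illustrative):

```shell
line='{TNG:}}${1:F01.'

# "&" re-inserts the matched text; the backslash at the end of the first
# script line makes the following line break part of the replacement.
result=$(printf '%s\n' "$line" | sed 's/{TNG:}}\$/&\
/')

printf '%s\n' "$result"
```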
With perl:
$ cat input.txt
line 1 {TNG:}}${1:F01
line 2 {TNG:}}${1:F01
$ perl -pe 's/TNG:\}\}\$\K/\n/' input.txt
line 1 {TNG:}}$
{1:F01
line 2 {TNG:}}$
{1:F01
(Read up on the -p and -n options in perlrun and use them instead of trying to do what they do in a one-liner yourself)

Data transformation using sed

I have a file like:
A
B
C
D

E
F
G
H

I
J
K
L
and I want it to come out like
A,B,C,D
E,F,G,H
I'm assuming I'd use sed, but actually I'm not even sure if that's the best tool. I'm open to using anything commonly available on a Linux system.
In perl, I did it like this ... it works, but it's dirty and has a trailing comma. Was hoping for something simpler:
$ perl -ne 'if (/^(\w)\R/) {print "$1,";} else {print "\n";}' test
A,B,C,D,
E,F,G,H,
I,J,K,L,
Set the input record separator to paragraph mode (-00) and then split each record on any remaining whitespace:
$ perl -00 -ne 'print join("," => split), "\n"' test
Add -l to enable automatic newlines (but make sure it comes before -00, because we want $\ to be set to the value of $/ before modification):
$ perl -l -00 -ne 'print join("," => split)' test
Add -a to enable autosplit mode and implicitly split to @F:
$ perl -l -00 -ane 'print join("," => @F)' test
Swap out -n for -p for automatic printing:
$ perl -l -00 -ape '$_ = join("," => @F)' test
You could use
awk 'BEGIN {RS=""; FS="\n"; ORS="\n"; OFS=","} {$1=$1} 1' file
I see the gawk manual says this:
If RS is set to the null string, then records are separated by blank lines. When RS is set to the null string, the newline character always acts as a field separator, in addition to whatever value FS may have.
So we don't actually need to specify FS to get the desired output:
awk 'BEGIN {RS=""; ORS="\n"; OFS=","} {$1=$1} 1' file
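As a quick check of the shortened command, the sketch below feeds it a sample with blank lines between the groups, which is what paragraph mode relies on. The sample data and variable names are assumptions for illustration, and ORS is left at its default, a newline:

```shell
input=$(printf 'A\nB\nC\nD\n\nE\nF\nG\nH\n\nI\nJ\nK\nL')

# RS=""   : records are separated by blank lines (paragraph mode),
#           and newlines inside a record act as field separators.
# $1 = $1 : forces awk to rebuild the record, joining fields with OFS.
result=$(printf '%s\n' "$input" | awk 'BEGIN { RS = ""; OFS = "," } { $1 = $1 } 1')

printf '%s\n' "$result"
```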
xargs could do it,
$ xargs -n4 < file | tr ' ' ','
A,B,C,D
E,F,G,H
I,J,K,L
Replacing newlines with sed is a bit complicated (see this question). It is easier to use tr for the newlines. The rest can be done by sed.
The following command assumes that yourFile does not contain any ,.
tr '\n' , < yourFile | sed 's/,*$/\n/;s/,,/\n/g'
The tr part converts all newlines to ,. The resulting string will have no newlines.
s/,*$/\n/ removes trailing commas and appends a newline (text files usually end with a newline).
s/,,/\n/g replaces ,, by a newline. Two consecutive commas appear only where your original file contained two consecutive newlines, that is where the sections are separated by an empty line.

Sed or awk: how to call line addresses from separate file?

I have 'file1' with (say) 100 lines. I want to use sed or awk to print lines 23, 71 and 84 (for example) to 'file2'. Those 3 line numbers are in a separate file, 'list', with each number on a separate line.
When I use either of these commands, only line 84 gets printed:
for i in $(cat list); do sed -n "${i}p" file1 > file2; done
for i in $(cat list); do awk 'NR==x {print}' x=$i file1 > file2; done
Can a for loop be used in this way to supply line addresses to sed or awk?
This might work for you (GNU sed):
sed 's/.*/&p/' list | sed -nf - file1 >file2
Use list to build a sed script.
You need to do > after the loop in order to capture everything. Since you are using it inside the loop, the file gets overwritten. Inside the loop you need to do >>.
Good practice is to use > outside the loop so the file is not opened for writing on every loop iteration.
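A minimal sketch of redirecting once, after the loop, so that file2 is opened a single time and collects the output of every iteration. The stand-in contents of file1 and the temporary directory are made up for illustration, and a while read loop is used here in place of for i in $(cat list):

```shell
dir=$(mktemp -d)
# Stand-in for file1: 100 lines reading "line 1" .. "line 100".
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$dir/file1"
printf '23\n71\n84\n' > "$dir/list"

# The redirection is attached to the loop, not to the sed inside it.
while read -r n; do
    sed -n "${n}p" "$dir/file1"
done < "$dir/list" > "$dir/file2"

result=$(cat "$dir/file2")
rm -rf "$dir"
printf '%s\n' "$result"
```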
However, you can do everything in awk without for loop.
awk 'NR==FNR{a[$1]++;next}FNR in a' list file1 > file2
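The one-liner relies on the common two-file idiom: NR==FNR is true only while the first file is being read. A commented sketch of the same logic on a small made-up input (the array name want and the sample files are illustrative). Lines come out in the order they appear in file1, not in the order listed:

```shell
dir=$(mktemp -d)
printf 'a\nb\nc\nd\ne\n' > "$dir/file1"
printf '4\n2\n' > "$dir/list"

result=$(awk '
    # First file (list): NR and FNR advance together, so record each
    # wanted line number as an array key and skip to the next line.
    NR == FNR { want[$1]; next }
    # Second file (file1): print the line if its number is a key.
    FNR in want
' "$dir/list" "$dir/file1")

rm -rf "$dir"
printf '%s\n' "$result"
```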
You have to use >> (append to the file), but you are overwriting the file with >. That is why only line 84 ends up in file2.
Try:
for i in $(cat list); do sed -n "${i}p" file1 >> file2; done
With sed:
sed -n $(sed -e 's/^/-e /' -e 's/$/p/' list) input
Given the example input, the inner command creates a string like this:
-e 23p
-e 71p
-e 84p
so the outer sed then prints the given lines.
You can avoid running sed/awk in a for/while loop altogether:
# store all lines numbers in a variable using pipe
lines=$(echo $(<list) | sed 's/ /|/g')
# print lines of specified line numbers and store output
awk -v lineS="^($lines)$" 'NR ~ lineS' file1 > out

How can I apply Unix's / Sed's / Perl's transliterate (tr) to only a specific column?

I have program output that looks like this (tab delim):
$ ./mycode somefile
0000000000000000000000000000000000 238671
0000000000000000000000000000000001 0
0000000000000000000000000000000002 0
0000000000000000000000000000000003 0
0000000000000000000000000000000010 0
0000000000000000000000000000000011 1548.81
0000000000000000000000000000000012 0
0000000000000000000000000000000013 937.306
What I want to do is on FIRST column only: replace 0 with A, 1 with C, 2 with G, and 3 with T.
Is there a way I can transliterate that output piped directly from "mycode".
Yielding this:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 238671
...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACT 937.306
Using Perl:
C:\> ./mycode file | perl -lpe "($x,$y)=split; $x=~tr/0123/ACGT/; $_=qq{$x\t$y}"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 238671
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACA 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACC 1548.81
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACG 0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACT 937.306
You can use single quotes in Bash:
$ ./mycode file | perl -lpe '($x,$y)=split; $x=~tr/0123/ACGT/; $_="$x\t$y"'
As @ysth notes in the comments, perl actually provides the command line options -a and -F:
-a autosplit mode with -n or -p (splits $_ into @F)
...
-F/pattern/ split() pattern for -a switch (//'s are optional)
Using those:
perl -lawnF'\t' -e '$,="\t"; $F[0] =~ y/0123/ACGT/; print @F'
It should be possible to do it with sed. Put this in a file (you can also do it on the command line with -e; just don't forget the semicolons, or use a separate -e for each line). (EDIT: Keep in mind that since your data is tab delimited, the character in the first s// should in fact be a tab, not a space. Make sure your editor doesn't turn it into spaces.)
#!/usr/bin/sed -f
h
s/ .*$//
y/0123/ACGT/
G
s/\n[0-3]*//
and use
./mycode somefile | sed -f sedfile
or chmod 755 sedfile and do
./mycode somefile | sedfile
The steps performed are:
copy buffer to hold space (replacing held content from previous line, if any)
remove trailing stuff (from first space to end of line)
transliterate
append contents from hold space
remove the newline (from the append step) and all digits following it (up to the space)
Worked for me on your data at least.
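The steps above can be traced on a single sample line. This sketch passes the tab explicitly so that no editor can turn it into spaces. The shortened 10-digit key and the variable names are assumptions for illustration, and \n in the regex is assumed to be supported (as in GNU sed):

```shell
tab=$(printf '\t')
line="0000000013${tab}937.306"

# h               hold:    0000000013<TAB>937.306
# s/<TAB>.*$//    pattern: 0000000013
# y/0123/ACGT/    pattern: AAAAAAAACT
# G               pattern: AAAAAAAACT\n0000000013<TAB>937.306
# s/\n[0-3]*//    pattern: AAAAAAAACT<TAB>937.306
result=$(printf '%s\n' "$line" | sed \
    -e 'h' \
    -e "s/${tab}.*\$//" \
    -e 'y/0123/ACGT/' \
    -e 'G' \
    -e 's/\n[0-3]*//')

printf '%s\n' "$result"
```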
EDIT:
Ah, you wanted a one-liner...
GNU sed
sed -e "h;s/ .*$//;y/0123/ACGT/;G;s/\n[0-3]*//"
or old-school sed (no semicolons)
sed -e h -e "s/ .*$//" -e "y/0123/ACGT/" -e G -e "s/\n[0-3]*//"
@sarathi
AWK solution for this:
awk '{gsub("0","A",$1);gsub("1","C",$1);gsub("2","G",$1);gsub("3","T",$1); print $1"\t"$2}' temp.txt