unescaped newline inside substitute pattern - sed

I have a txt file with a list of 100 countries, without quotation marks around them. I am trying to change this
Canada
USA
into this
countries['Canada']=true
etc.
This is the sed command I am trying, with '\1' representing the country in quotation marks.
sed -e "s/\(.*\)/countries['\1']=true" source.txt > output.txt
The error I'm getting is
unescaped newline inside substitute pattern
What sed command do I need to achieve what I'm trying to do, and why am I getting this error?

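You just missed the trailing / at the end. Without it, sed reaches the end of the line while still looking for the delimiter that closes the replacement, hence the error: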
$ sed -e "s/\(.*\)/countries['\1']=true/" file
countries['Canada']=true
countries['USA']=true
Note also that you don't need a capture group: just match everything with .* and then use & to print the match back:
$ sed -e "s/.*/countries['&']=true/" a
countries['Canada']=true
countries['USA']=true

I would just add stuff at the beginning and end:
sed -e "s/^/countries['/" -e "s/$/']=true/" source.txt > output.txt
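To check that the three variants above really are interchangeable, here is a small sketch. The two-line sample input and the variable names are made up for illustration; with the real data you would read from source.txt instead.

```shell
# Hypothetical sample standing in for source.txt
input=$(printf 'Canada\nUSA\n')

# 1) capture group + \1, with the closing / in place
with_group=$(printf '%s\n' "$input" | sed -e "s/\(.*\)/countries['\1']=true/")
# 2) no group, & stands for the whole match
with_amp=$(printf '%s\n' "$input" | sed -e "s/.*/countries['&']=true/")
# 3) two substitutions anchored at the start and end of the line
with_anchors=$(printf '%s\n' "$input" | sed -e "s/^/countries['/" -e "s/$/']=true/")

printf '%s\n' "$with_group"
```

All three should print the same two lines, so the choice is mostly a matter of taste.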

Related

How do I join the previous line with the current line with sed?

I have a file with the following content.
test1
test2
test3
test4
test5
If I want to concatenate all lines into one line separated by commas, I can use vi and run the following command:
:%s/\n/,/g
I then get this, which is what I want
test1,test2,test3,test4,test5,
I'm trying to use sed to do the same thing but I'm missing some unknown command/option to make it work. When I look at the file in vi and search for "\n" or "$", it finds the newline or end of line. However, when I tell sed to look for a newline, it pretends it didn't find one.
$ cat test | sed --expression='s/\n/,/g'
test1
test2
test3
test4
test5
$
If I tell sed to look for end of line, it finds it and inserts the comma but it doesn't concatenate everything into one line.
$ cat test | sed --expression='s/$/,/g'
test1,
test2,
test3,
test4,
test5,
$
What command/option do I use with sed to make it concatenate everything into one line and replace the end of line/newline with a comma?
sed reads one line at a time, so, unless you're doing tricky things, there's never a newline to replace.
Here's the trickiness:
$ sed -n '1{h; n}; H; ${g; s/\n/,/gp}' test.file
test1,test2,test3,test4,test5
h, H, g documented at https://www.gnu.org/software/sed/manual/html_node/Other-Commands.html
When using a non-GNU sed, as found on MacOS, semi-colons before the closing braces are needed.
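For reference, a sketch of the same hold-space command with the extra semi-colons written in, which should make it acceptable to both GNU and BSD sed. The sample input and the variable name joined are only for illustration.

```shell
# Sample input matching the question (assumed; substitute your own file).
joined=$(printf 'test1\ntest2\ntest3\ntest4\ntest5\n' |
  sed -n '1{h;n;};H;${g;s/\n/,/gp;}')
# 1{h;n;}  first line: copy it to the hold space, then read the next line
# H        append the current line to the hold space (newline-separated)
# ${...;}  last line: fetch the hold space, turn newlines into commas, print

printf '%s\n' "$joined"
```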
However, paste is really the tool for this job
$ paste -s -d, test.file
test1,test2,test3,test4,test5
If you really want the trailing comma:
printf '%s,\n' "$(paste -sd, file)"
tr instead of sed for this one:
$ tr '\n' ',' < input.txt
test1,test2,test3,test4,test5,
Just straight up translate newlines to commas.
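A quick side-by-side of paste and tr, as a sketch on an assumed three-line input. The difference is the last newline: tr translates it too, so you get a trailing comma and no final newline, while paste only puts the delimiter between lines.

```shell
# Assumed sample input with three lines.
with_paste=$(printf 'test1\ntest2\ntest3\n' | paste -s -d, -)
with_tr=$(printf 'test1\ntest2\ntest3\n' | tr '\n' ',')

printf '%s\n' "$with_paste"   # no trailing comma
printf '%s\n' "$with_tr"      # trailing comma: the last newline is translated too
```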
Based on "How can I replace each newline (\n) with a space using sed?":
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g' <file>
testing:
$ cat file.txt
test1
test2
test3
test4
test5
$ sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g' file.txt
test1,test2,test3,test4,test5
Of course, if the question had been more generic (how do I replace \n with any character using sed?), you would only need to replace the , with the desired character:
export CHAR_TO_REPLACE=','
export FILE_TO_PROCESS=<filename>
sed -e ':a' -e 'N' -e '$!ba' -e "s/\n/${CHAR_TO_REPLACE}/g" "$FILE_TO_PROCESS"
This answer is to satisfy the requirement of using sed. Otherwise, you can use alternatives like tr, awk etc.
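In case the loop is not obvious, here is a sketch that wraps it in a small function and annotates each piece. join_lines is a hypothetical helper name, and the separator is pasted straight into the sed script, so this assumes it contains no /, & or backslash.

```shell
# join_lines SEP: hypothetical helper, reads stdin and joins all lines with SEP.
# SEP must not contain /, & or backslashes, since it is pasted into the sed script.
join_lines() {
  sed -e ':a' -e 'N' -e '$!ba' -e "s/\n/$1/g"
}
# :a     define a label named a
# N      append the next input line to the pattern space
# $!ba   if this is not the last line, branch back to a
# s///g  once every line is in the pattern space, replace each newline

printf 'test1\ntest2\ntest3\n' | join_lines ','
```

Note that this collects the whole input in the pattern space before substituting, so for very large files paste or tr is likely the lighter choice.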
This might work for you (GNU sed):
sed 'H;1h;$!d;x;y/\n/,/' file
Append all lines but the first to the hold space (the first replaces the hold space).
If it is not the last line of the file, delete it.
Otherwise, swap to the hold space and translate all newlines to commas.

How to replace only specific spaces in a file using sed?

I have a file with the content below, and I want to replace the spaces at certain positions with a pipe symbol (|). I used sed for this, but it replaces every space in the string, and I don't want to replace the space between the 3rd and 4th strings.
How to achieve this?
Input:
test test test test
My attempt:
sed -e 's/ /|/g' file.txt
Expected Output:
test|test|test test
Actual Output:
test|test|test|test
sed 's/ /\
/3;y/\n / |/'
A freshly read line never contains a newline in the sed pattern space, so you can change the third space to a newline as a temporary marker, then use y to turn that newline back into a space and every remaining space into a pipe.
GNU sed can use \n in the replacement text:
sed 's/ /\n/3;y/\n / |/'
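A sketch of the marker trick run on the sample line from the question, using the backslash-newline form so it does not depend on GNU sed. The variable names are just for illustration.

```shell
line='test test test test'
# Step 1: s/ /<newline>/3  marks the third space with a newline.
# Step 2: y/\n / |/        maps newline -> space and space -> pipe.
result=$(printf '%s\n' "$line" | sed 's/ /\
/3;y/\n / |/')

printf '%s\n' "$result"
```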
If the original input doesn't contain any pipe characters, you can do
sed -e 's/ /|/g' -e 's/|/ /3' file
to retain the third white space. Otherwise see other answers.
You could replace the 'first space' twice, e.g.
sed -e 's/ /|/' -e 's/ /|/' file.txt
Or, if you want to specify the positions (e.g. the 2nd and 1st spaces):
sed -e 's/ /|/2' -e 's/ /|/1' file.txt
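The order matters when you give positions, because each substitution sees the line as the previous one left it. A sketch on a made-up line shows the difference: going from the higher position to the lower keeps the numbering stable, while going low to high shifts it.

```shell
line='a b c d'
# Higher position first: replaces the original 2nd and 1st spaces.
high_first=$(printf '%s\n' "$line" | sed -e 's/ /|/2' -e 's/ /|/1')
# Lower position first: after the 1st space is gone, "2" now means the original 3rd.
low_first=$(printf '%s\n' "$line" | sed -e 's/ /|/1' -e 's/ /|/2')

printf '%s\n' "$high_first" "$low_first"
```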
Using GNU sed to replace the first and second one or more whitespace chunks:
sed -i -E 's/\s+/|/;s/\s+/|/' file
Details
-i - edit the file in place
-E - enable POSIX ERE syntax
s/\s+/|/ - replaces the first run of one or more whitespace chars
; - and then
s/\s+/|/ - replaces the next run of one or more whitespace chars on each line (if present).
Keep it simple and use awk, e.g. using any awk in any shell on every Unix box no matter what other characters your input contains:
$ awk '{for (i=1;i<NF;i++) sub(/ /,"|")} 1' file
test|test|test test
The above replaces all but the last " " on each line. If you want to replace a specific number, e.g. 2, then just change i<NF to i<=2.
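If the count should be adjustable, one option is to pass it in with -v. This is only a sketch: n is a hypothetical variable name, and the loop runs sub() exactly n times, each call replacing the first space still left on the line.

```shell
# n is a hypothetical awk variable: how many leading spaces to turn into pipes.
replaced=$(printf 'test test test test\n' |
  awk -v n=2 '{ for (i = 1; i <= n; i++) sub(/ /, "|") } 1')

printf '%s\n' "$replaced"
```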

How to add quote at the end of line by SED

sed -i 's/$/\'/g'
sed -i "s/$/\'/g"
How do I escape both $ and ' in one command?
This might work for you (GNU sed):
sed 's/$/'\''/' file
Adds a single quote to the end of a line.
sed 's/\$/'\''/' file
Replaces a $ by a single quote.
sed 's/\$$/'\''/' file
Replaces a $ at the end of line by a single quote.
N.B. Surrounding sed commands by double quotes is fine for some interpolation but may return unexpected results.
Use octal values
sed 's/$/\o47/'
Take care to use a backslash, then the lowercase letter o, then an octal number of 1 to 3 digits (\oNNN is a GNU sed extension).
Just don't use single quotes to start the sed script?
sed "s/$/'/"
The /g at the end means to apply the substitution to every match on each line; you don't need it here, since $ is a special character that matches the end of the line, which occurs only once per line.
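The quoting styles for getting a single quote into the script all hand sed the same text, s/$/'/. A sketch to confirm that, with a made-up input line and variable names:

```shell
# Close the single-quoted string, add an escaped quote, reopen it.
q1=$(printf 'abc\n' | sed 's/$/'\''/')
# Use double quotes around the whole script ($/ is not expanded by the shell).
q2=$(printf 'abc\n' | sed "s/$/'/")
# Close the single-quoted string, add a double-quoted quote, reopen it.
q3=$(printf 'abc\n' | sed 's/$/'"'"'/')

printf '%s\n' "$q1" "$q2" "$q3"
```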
To add a quote at the end of a line use
sed -i "s/$/'/g" file
sed -i 's/$/'"'"'/g' file
If there are already single quotes, and you want to make sure there is single occurrence at the end of string use
sed -i "s/'*$/'/g" file
sed -i 's/'"'*"'$/'"'"'/g' file
To escape $ and ' chars use
sed -i "s/[\$']/\\\\&/g" file
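[\$'] - matches $ (escaped because inside double quotes the shell could treat it as the start of a variable expansion) or '
\\\\& - a literal backslash followed by the whole match (&). Four backslashes are needed: the shell's double quotes reduce them to two, and sed reads those two as one literal backslash, since backslash is special in the replacement.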
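To see the backslash counting in action, here is a sketch that runs the double-quoted script next to an equivalent single-quoted one, where only two backslashes are needed because the shell leaves them alone. The input string and variable names are invented for the example.

```shell
input="cost \$5 isn't"
# Double quotes: the shell turns \\\\ into \\ and \$ into $ before sed sees them.
escaped_dq=$(printf '%s\n' "$input" | sed "s/[\$']/\\\\&/g")
# Single quotes: sed receives \\& as typed; the ' is spliced in via "'".
escaped_sq=$(printf '%s\n' "$input" | sed 's/[$'"'"']/\\&/g')

printf '%s\n' "$escaped_dq"
```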

sed to copy part of line to end

I'm trying to copy part of a line to append to the end:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz
becomes:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz
I have tried:
sed 's/\(.*(GCA_\)\(.*\))/\1\2\2)'
$ f1=$'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'
$ echo "$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz
$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/' <<<"$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz
sed -E (or -r on some systems) enables extended regex support in sed, so you don't need to escape the group parentheses ( ).
The group (GCA_.[^.]*) means "capture from GCA_ up to, but excluding, the first dot":
$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\2/' <<<"$f1"
GCA_900169985
Similarly, (.[^_]*) means capture all chars up to the first _ (excluding the _ itself). This is the regex way to emulate a non-greedy (lazy) capture; in a Perl regex it could be written as something like (.*?)_ instead.
$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\3/' <<<"$f1"
.1
Short sed approach:
s="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz"
sed -E 's/(GCA_[^._]+)\.([^_]+)/\1.\2\/\1/' <<< "$s"
The output:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz
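Since the replacement contains a /, another option is to pick a different delimiter for the s command so the slash needs no escaping. A sketch of the short approach both ways; the choice of # as delimiter is arbitrary and assumes the pattern and replacement contain no #.

```shell
url='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'

# Default / delimiter: the slash in the replacement must be written as \/
with_slash=$(printf '%s\n' "$url" | sed -E 's/(GCA_[^._]+)\.([^_]+)/\1.\2\/\1/')
# Alternative # delimiter: the slash can appear as-is
with_hash=$(printf '%s\n' "$url" | sed -E 's#(GCA_[^._]+)\.([^_]+)#\1.\2/\1#')

printf '%s\n' "$with_hash"
```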

Delete line if string between the 4th and 5th delimiter is empty

"text";"text";"text";"text";;"text";"text"
If the 4th delimiter is immediately followed by another delimiter, the line should be deleted.
Currently I'm doing that with sed:
sed -n '/;;/!p' input.txt
Is this a reliable solution?
Thanks for help.
This guards a bit against potential escaped double quotes and ; characters inside fields (thanks to #SLePort for the remark):
sed -e 'h;s/\\"//g' -e ':c' -e 's/^\(\("[^"]*";\)*"[^"]*\);/\1/;t c' -e '/^\([^;]*;\)\{4\};/d;h'
sed -r '/^([^;]+;){4}\s*;/d' input.txt
awk -F';' '$5' input.txt
To remove lines that have a ; immediately after the fourth delimiter:
sed '/^\("*[^"]*"*;\)\{4\};/d' input.txt
This might work for you (GNU sed):
sed -r '/^("(\\.|[^"])*";){4};/d' file
If the fourth double-quoted field followed by a semicolon, where each character inside the quotes is either a backslash-escaped character or anything other than a double quote, is immediately followed by a further semicolon, then delete the line.
A more efficient regexp would be:
sed -r '/^("[^"\\]*(\\.[^"\\]*)*";){4};/d' file
This uses the pattern normal*(abnormal normal*)*
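To illustrate why the plain /;;/ test from the question may not be reliable, here is a sketch comparing it with the quoted-field regexp on three made-up lines, one of which has ;; inside a quoted field. It uses -E rather than -r, which current GNU sed also accepts.

```shell
# Three hypothetical lines: empty 5th field, filled 5th field, ";;" inside quotes.
sample=$(printf '%s\n' \
  '"a";"b";"c";"d";;"e";"f"' \
  '"a";"b";"c";"d";"x";"e"' \
  '"a;;b";"c";"d";"e";"f"')

# Question's approach: drops any line containing ;; anywhere.
naive=$(printf '%s\n' "$sample" | sed -n '/;;/!p')
# Quoted-field regexp: drops a line only when the 5th field is empty.
strict=$(printf '%s\n' "$sample" | sed -E '/^("[^"\\]*(\\.[^"\\]*)*";){4};/d')

printf '%s\n' "$strict"
```

The naive version also throws away the third line, even though its fifth field is not empty.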