I'm trying to append a string at the end of each line in a file, but it's not working.
The file has the format
1992988282,78.93,SOMETEXT
and I tried
sed -e 's/$/,2012-09-03/' sample.csv
but the appended text ends up overwriting the characters at the beginning of the line instead. I have tried
awk '{ print $0 ",2012-09-03" }' < sample.csv
and I'm getting the exact same problem.
sed -r 's/[[:cntrl:]]*$/,2012-09-03&/' sample.csv
This keeps the trailing control character (typically a carriage return from Windows line endings) in place; if you'd rather drop it, remove the final &.
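If the underlying cause is a trailing carriage return from Windows line endings (which is what typically makes appended text appear to overwrite the start of the line), another option is to strip it first and then append. A minimal sketch, assuming GNU sed:
sed -e 's/\r$//' -e 's/$/,2012-09-03/' sample.csv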
$ cat input.txt
abc
$ sed -e '/PLACE_HOLDER/ {
s/PLACE_HOLDER//g
r input.txt
}' <<< '<div>PLACE_HOLDER</div>'
<div></div>
abc
I'm trying to replace PLACE_HOLDER with the content of a file, but it pastes the file content after the matching line. How do I replace just the match?
This is not a duplicate of
Use the contents of a file to replace a string using SED
none of the answers there answer my question specifically. The second answer uses a bash variable, which is not appropriate when the file is very large, and the first answer does not address the problem in my example; in fact, my code is essentially the same as that first answer.
As you discovered, the r command inserts the lines from the file after the current line.
That's not suitable if you want to embed the contents of another file in the middle of other text on the same line which should not be replaced.
A crude fix is to build a sed script from your input file. Notice then that any & characters in the input file have to be escaped, as well as any literal newlines.
Because we will be escaping ampersands, I decided to use that as the separator for the s command, too.
sed 's/\&/\\&/g
1s/^/s\&PLACE_HOLDER\&/
$!s/$/\\/
$s/$/\&/' input.txt |
sed -f - targetfile
Unfortunately, because standard input is tied up by -f -, the script cannot also read the data to transform from standard input. A simple workaround is to save the generated sed script to a temporary file and pass that as the value for the -f option; this will also be necessary if your sed does not accept the script on standard input.
I believe this should be reasonably portable, apart from the notes about -f - above.
Demo: https://ideone.com/oVgIni
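To make the generated script concrete: assuming input.txt contains the single line abc, as in the question, the first sed emits
s&PLACE_HOLDER&abc&
which the second sed then applies to targetfile.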
Using any awk:
$ awk '
BEGIN { old="PLACE_HOLDER" }
NR==FNR { new=(NR>1 ? new ORS : "") $0; next }
s=index($0,old) { $0=substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' input.txt - <<< '<div>PLACE_HOLDER</div>'
<div>abc</div>
The above will work no matter which characters are present in the string you want to match or the file you want to replace it with.
This might work for you (GNU sed):
sed -i 's/PLACE_HOLDER/$(cat input.txt)/g;s/.*/echo "&"/e' file
The first substitution replaces each match of PLACE_HOLDER throughout file with the literal text $(cat input.txt); the second wraps the whole line in an echo command, which the e flag then evaluates in the shell, expanding the command substitution.
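A quick demonstration of the idea (GNU sed is needed for the e flag; input.txt is assumed to contain abc as in the question, and the line must not contain characters the shell would expand):
$ sed 's/PLACE_HOLDER/$(cat input.txt)/g;s/.*/echo "&"/e' <<< '<div>PLACE_HOLDER</div>'
<div>abc</div>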
I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very

is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not preserve the blank second line in the above example. I have searched for over an hour and tried a few things, but nothing's worked out.
Will happily use a solution in sed if that's easier!
You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
See the online demo, which yields
is very

is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses default line output
s/.*This(.*)new.*|.*/\1/ - finds any text, This, any text (captured into Group 1, \1), and then any text again, or else the whole string (in sed, the whole line), and replaces the match with the Group 1 value.
p - prints the result of the substitution.
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
See this online demo. The [^"]* matches zero or more chars other than a " char.
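For illustration, with two made-up sample lines (the values here are only placeholders):
$ printf '%s\n' '{"user_ip":"10.0.0.1","port":80}' '{"port":80}' | sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
The first line yields the captured IP (10.0.0.1) and the second, which has no match, yields a blank line.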
With your shown samples, please try the following awk code.
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: set This\\s+ OR \\s+new as field separators for all the lines of Input_file. Then, in the main program, check whether NF (number of fields) is 3; if so, print the 2nd field (next then moves the cursor to the next line). In the other condition, if NF (number of fields) is NOT equal to 3, simply print a blank line.
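A quick check against the question's sample (gawk is assumed, since \s is a GNU extension):
$ printf 'This is very new\nThis is quite old\nThis is not so new\n' | gawk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}'
is very

is not so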
sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
first line: for lines not matching "This.*new", remove all characters, leaving a blank line
second line: for lines matching the pattern, keep only the "middle" text
Note that this is not the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
To continue to use PCRE, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file
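For example, on the tricky line from above:
$ echo 'This is new but that is not new' | perl -lpe '$_ = /This(.*?)new/ ? $1 : ""'
this prints just " is " (the shortest match), whereas the greedy sed version prints " is new but that is not ".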
This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first alternative matches, the line is replaced by everything between This and new.
Otherwise the second match will remove everything.
N.B. The substitution will always match one of the conditions. The solution was suggested by Wiktor Stribiżew.
I'm trying to perform the following substitution on lines of the general format:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......
As you see, the problem is that it's a comma-separated file, with a specific field containing a decimal comma. I would like to replace that comma with a dot.
I've tried the following, to replace the first occurrence of a pattern after a match, but to no avail; could someone help me?
sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"
Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:
gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'
That yielded the same file unchanged.
CSV files should be parsed in awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need.
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
{ for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file
See this answer of mine to understand how to define and parse CSV file content with FPAT variable. Also see Save modifications in place with awk to do in-place file modifications like sed -i''.
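For example, run against the sample line from the question:
$ echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC' |
  gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
  { for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1'
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC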
The following sed will convert all decimal separators in quoted numeric fields:
sed 's/"\([-+]\?[0-9]*\)[,]\?\([0-9]\+\([eE][-+]\?[0-9]+\)\?\)"/"\1.\2"/g'
See: https://www.regular-expressions.info/floatingpoint.html
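Applied to the sample line from the question, for example:
$ echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC' |
  sed 's/"\([-+]\?[0-9]*\)[,]\([0-9]\+\([eE][-+]\?[0-9]\+\)\?\)"/"\1.\2"/g'
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC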
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file
This regexp matches a , within a pair of "s and replaces it with a .. The regexp is anchored to the start of the line and thus needs to be repeated until no further match is found, hence the :a label and the ta command, which keep iterating the substitution for as long as it succeeds.
N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.
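For example, with a field containing more than one comma (assuming, per the note above, that all double quotes are matched):
$ echo 'a,"1,234,56",b' | sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta'
a,"1.234.56",b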
If your input always follows that format of only one quoted field containing 1 comma then all you need is:
$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......
If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.
Assuming you have this:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC
Try this:
awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","
Output will be:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC
You simply need to know the numbers of the fields between which the separator should be replaced.
In order to use regexps as in perl you have to activate extended regular expressions with -r.
So if you want to replace the decimal comma in all quoted numbers and drop the " signs at the same time, you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'
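This prints the following; note that the quotes around the number are dropped, as described:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,109.07,DF,CCCCCCCCCCC, .......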
If you want to replace first occurrence only you can use that:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'
https://www.gnu.org/software/sed/manual/sed.txt
I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contains a new sequence:
#example
ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK
SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH
and I want to end up with a file looking like:
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
Each sequence is the same length (if that helps).
I would also be looking to do this over multiple files stored in different directories.
I have just tried
sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt
however this just deleted the entire file :S
Any help would be appreciated - it doesn't have to be in sed; if you know how to do it in perl or something else then that's also great.
Thanks.
All you're asking to do is convert a file of blank-lines-separated records (RS) where each field is separated by newlines into a file of newline-separated records where each field is separated by nothing (OFS). Just set the appropriate awk variables and recompile the record:
$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
awk '
/^[[:space:]]*$/ {if (line) print line; line=""; next}
{line=line $0}
END {if (line) print line}
'
perl -00 -pe 's/\n//g; $_.="\n"'
For multiple files:
# adjust your glob pattern to suit,
# don't be shy to ask for assistance
for file in */*.txt; do
newfile="/some/directory/$(basename "$file")"
perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done
A Perl one-liner, if you prefer:
perl -nle 'BEGIN{$/=""};s/\n//g;print $_' file
The $/ variable is the equivalent of awk's RS variable. When set to the empty string ("") it causes two or more empty lines to be treated as one empty line. This is the so-called "paragraph mode" of reading. For each record read, all newline characters are removed. The -l switch adds a newline to the end of each output string, thus giving the desired result.
Just find the double line breaks first (\n\n, or \r\n\r\n for Windows files) and replace them with a special marker such as :$:
After that, replace every remaining line break with an empty string to get the whole file on one line.
Next, replace your special marker with a single line break :)
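Here is one way that three-step idea could be sketched with perl, slurping the whole file at once (the :$: marker is arbitrary and assumed not to occur in the data):
perl -0777 -pe 's/\n{2,}/:\$:/g; s/\n//g; s/:\$:/\n/g; $_ .= "\n"' file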
I'm using perl from the command line to replace duplicate spaces in a text file.
The command I use is:
perl -pi -e 's/\s+/ /g' file.csv
The problem: this procedure also removes the newlines in the resulting file....
Any idea why this occurs?
Thanks!
\s means whitespace - the five characters [ \f\n\r\t] (plus, in newer perls, the vertical tab). So you're replacing newlines with single spaces too.
In your case, the simplest way is to enable automatic line-ending processing with the -l flag:
perl -pi -le 's/\s+/ /g' file.csv
This way, newlines will be chomped off before the -e statement runs and appended back afterwards.
I'll add my two cents to the previous answer.
If you use this regexp in the Perl script itself, then you can just change it to:
s/[ ]+/ /gis;
That will squeeze the runs of spaces on every line and won't delete line endings (note that [ ]+ matches only spaces, not tabs).
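Alternatively, if the goal is only to squeeze runs of spaces and tabs while leaving line endings alone, restricting the class to horizontal whitespace also works. A sketch:
perl -pi -e 's/[ \t]+/ /g' file.csv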