Have sed substitute on a string but SKIP the first occurrence

I have been through the sed one-liners but am still having trouble with my goal. I want to substitute matching strings at all but the first occurrence on a line. My exact usage would be:
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | sed <<MAGIC HAPPENS>
cd /Users/joeuser/bump\ bonding/initial\ trials
The line should replace the space in bump bonding with a backslash-escaped space, bump\ bonding, so that I can execute this line (since I can't cd to the directory when the spaces aren't escaped).
Update: I solved this by just using single quotes and outputting
cd 'blah blah/thing/another space/'
and then using source to execute the command. But that didn't answer my question, and I'm still curious: how would you use sed to fix it?

s/ /\\ /2g
The 2 specifies that the substitution should apply starting from the second occurrence, and the g specifies that all the rest should be replaced too. (This probably only works in GNU sed. According to the Open Group Base Specifications, "If both g and n are specified, the results are unspecified.")
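For example, applying that to the question's input (assuming GNU sed, per the caveat above) should give:
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | sed 's/ /\\ /2g'
cd /Users/joeuser/bump\ bonding/initial\ trials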

You can avoid the problem of combining g and n.
Replace all of them, then undo the first one:
sed -e 's/ /\\ /g' -e 's/\\ / /1'
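Tried against the question's sample line, that should give:
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | sed -e 's/ /\\ /g' -e 's/\\ / /1'
cd /Users/joeuser/bump\ bonding/initial\ trials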
Here's another method which uses the t branch-if-substituted command:
sed ':a;s/\([^ ]* .*[^\\]\) \(.*\)/\1\\ \2/;ta'
which has the advantage of leaving existing backslash-space sequences in the input intact.
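For instance, with input that already contains an escaped space (a made-up path, fed via printf so the shell leaves the backslash alone), only the unescaped spaces after the first one should be touched:
$ printf '%s\n' 'cd /tmp/already\ escaped dir/more stuff' | sed ':a;s/\([^ ]* .*[^\\]\) \(.*\)/\1\\ \2/;ta'
cd /tmp/already\ escaped\ dir/more\ stuff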

use awk
$ echo cd 'blah blah/thing/another space/' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd blah\ blah/thing/another\ space/
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd /Users/joeuser/bump\ bonding/initial\ trials

Parsing a line with sed using regular expression

Using sed I want to parse Heroku's log-runtime-metrics like this one:
2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages
the desired output is:
worker.2: 54.01MB (54.01MB being the memory_total value)
I could not manage it, although I tried several alternatives, including:
sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'
What is wrong with my command? How can it be corrected?
The .+ after source= and memory_total= are both greedy, so they accept as much of the line as possible. Use [^ ] to mean "anything except a space" so that it knows where to stop.
sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g'
Putting your content into https://regex101.com/ makes it really obvious what's going on.
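For instance, running the corrected expression over the sample log line should print just the two captured fields (heroku.log here is a hypothetical file holding that line, and -E is assumed to be available, as discussed in the next answer):
$ sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g' heroku.log
worker.2: 54.01MB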
I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy:
sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'
Note that -e is not the opposite of -E; -E (extended regular expressions) is primarily a Mac OS X (BSD) sed option, and the traditional GNU sed equivalent is -r. The -e simply means that the next argument is an expression to add to the script.
This produces your desired output from the given line of data:
worker.2: 54.01MB
Bonus question: there are some odd lines within the stream; I can usually filter them out using a grep pipe like | grep memory_total. However, if I try to use that together with the sed command, it does not work; no output is produced by this:
heroku logs -t -s heroku | grep memory_total | sed.......
Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).
You should be able to use:
sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'
The -n means "don't print by default". The /memory_total=/ matches the lines you're after; the s/// content is the same as before. I removed the g suffix that was there previously; the regex would never match multiple times anyway. I added the p to print the line when the substitution occurs.
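With that, the pipeline from the bonus question should collapse into a single sed with no grep (a sketch, reusing the heroku command exactly as given in the question):
heroku logs -t -s heroku | sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'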

Using variables in sed -f (where sed script is in a file rather than inline)

We have a process which can use a file containing sed commands to alter piped input.
I need to replace a placeholder in the input with a variable value; e.g. with a single inline -e command I can run:
$ echo "Today is XX" | sed -e "s/XX/$(date +%F)/"
Today is 2012-10-11
However, I can only specify the sed commands in a file (and then point the process at the file). E.g. a file called replacements.sed might contain:
s/XX/Thursday/
So obviously:
$ echo "Today is XX" | sed -f replacements.sed
Today is Thursday
If I want to use an environment variable or shell value, though, I can't find a way to make it expand. E.g. if replacements.sed contains:
s/XX/$(date +%F)/
Then:
$ echo "Today is XX" | sed -f replacements.sed
Today is $(date +%F)
Including double quotes in the text of the file just prints the double quotes.
Does anyone know a way to be able to use variables in a sed file?
This might work for you (GNU sed):
cat <<\! > replacements.sed
/XX/{s//'"$(date +%F)"'/;s/.*/echo '&'/e}
!
echo "Today is XX" | sed -f replacements.sed
If you don't have GNU sed, try:
cat <<\! > replacements.sed
/XX/{
s//'"$(date +%F)"'/
s/.*/echo '&'/
}
!
echo "Today is XX" | sed -f replacements.sed | sh
AFAIK, it's not possible. Your best bet will be:
INPUT FILE
aaa
bbb
ccc
SH SCRIPT
#!/bin/bash
STRING="${1//\//\\/}" # using parameter expansion to escape any / in the replacement
shift
sed "
s/aaa/$STRING/
" "$@"
COMMAND LINE
./sed.sh "fo/obar" <file path>
OUTPUT
fo/obar
bbb
ccc
As others have said, you can't use variables in a sed script, but you might be able to "fake" it using extra leading input that gets added to your hold buffer. For example:
[ghoti@pc ~/tmp]$ cat scr.sed
1{;h;d;};/^--$/g
[ghoti@pc ~/tmp]$ sed -f scr.sed <(date '+%Y-%m-%d'; printf 'foo\n--\nbar\n')
foo
2012-10-10
bar
[ghoti@pc ~/tmp]$
In this example, I'm using process substitution to get input into sed. The "important" data is generated by printf. You could cat a file instead, or run some other program. The "variable" is produced by the date command and becomes the first line of input to the script.
The sed script takes the first line, puts it in sed's hold buffer, then deletes the line. Then for any subsequent line, if it matches a double dash (our "macro replacement"), it substitutes the contents of the hold buffer. And prints, because that's sed's default action.
Hold buffers (g, G, h, H and x commands) represent "advanced" sed programming. But once you understand how they work, they open up new dimensions of sed fu.
Note: This solution only helps you replace entire lines. Replacing substrings within lines may be possible using the hold buffer, but I can't imagine a way to do it.
(Another note: I'm doing this in FreeBSD, which uses a different sed from what you'll find in Linux. This may work in GNU sed, or it may not; I haven't tested.)
I am in agreement with sputnick. I don't believe that sed would be able to complete that task.
However, you could generate that file on the fly.
You could change the date to a fixed string, like
__DAYOFWEEK__.
Create a temp file, use sed to replace __DAYOFWEEK__ with $(date +%Y).
Then parse your file with sed -f $TEMPFILE.
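A minimal sketch of that idea, assuming a hypothetical template file named replacements.sed.in and using the question's %F format (the date shown is just the question's example value):
$ cat replacements.sed.in
s/XX/__DAYOFWEEK__/
$ TEMPFILE=$(mktemp)
$ sed "s/__DAYOFWEEK__/$(date +%F)/" replacements.sed.in > "$TEMPFILE"
$ echo "Today is XX" | sed -f "$TEMPFILE"
Today is 2012-10-11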
sed is great, but it might be time to use something like perl that can generate the date on the fly.
To add a newline in the replacement expression using a sed file, what finally worked for me was escaping a literal newline. Example: to append a newline after the string NewLineHere, this worked for me:
#! /usr/bin/sed -f
s/NewLineHere/NewLineHere\
/g
Not sure it matters but I am on Solaris unix, so not GNU sed for sure.
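As an aside, GNU sed (though apparently not the Solaris sed used here) also accepts \n in the replacement text, so the same script file could presumably be written on a single line:
s/NewLineHere/NewLineHere\n/g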

sed to remove URLs from a file

I am trying to write a sed expression that can remove URLs from a file.
Example:
http://samgovephotography.blogspot.com/ updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB Contributor @kdpartak :)
But I don't get it:
sed 's/[\w \W \s]*http[s]*:\/\/\([\w \W]\)\+[\w \W \s]*/ /g' posFile
FIXED!!!!!
This handles almost all cases, even malformed URLs:
sed 's/[\w \W \s]*http[s]*[a-zA-Z0-9 : \. \/ ; % " \W]*/ /g' positiveTweets | grep "http" | more
The following removes http:// or https:// and everything up until the next space:
sed -e 's!http\(s\)\{0,1\}://[^[:space:]]*!!g' posFile
updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N Thx to HMB Contributor @kdpartak :)
Edit:
I should have used:
sed -e 's!http[s]\?://\S*!!g' posFile
"[s]\?" is a far more readable way of writing "an optional s" compared to "\(s\)\{0,1\}"
"\S*" a more readable version of "any non-space characters" than "[^[:space:]]*"
I must have been using the sed that came installed with my Mac at the time I wrote this answer (brew install gnu-sed FTW).
There are better URL regular expressions out there (those that take into account schemes other than HTTP(S), for instance), but this will work for you, given the examples you give. Why complicate things?
The accepted answer provides the approach that I used to remove URLs, etc. from my files. However, it left "blank" lines. Here is a solution.
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' input_file
perl -i -pe 's/^'`echo "\012"`'${2,}//g' input_file
The GNU sed flags and expressions used are:
-i Edit in-place
-e [-e script] --expression=script : basically, add the commands in script
(expression) to the set of commands to be run while processing the input
^ Match start of line
$ Match end of line
? Match zero or one of the preceding regular expression
{2,} Match 2 or more of the preceding regular expression
\S* Zero or more non-space characters; alternative to: [^[:space:]]*
However,
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g'
leaves nonprinting character(s), presumably \n (newlines). Standard sed-based approaches to remove "blank" lines, tabs and spaces, e.g.
sed -i 's/^[ \t]*//; s/[ \t]*$//'
do not work here: if you do not use a "branch label" to process newlines, you cannot replace them using sed (which reads input one line at a time).
The solution is to use the following perl expression:
perl -i -pe 's/^'`echo "\012"`'${2,}//g'
which uses a shell command substitution,
'`echo "\012"`'
to replace an octal value
\012
(i.e., a newline, \n), that occurs 2 or more times,
{2,}
(otherwise we would unwrap all lines), with something else; here:
//
i.e., nothing.
[The second reference below provides a wonderful table of these values!]
The perl flags used are:
-p Places a printing loop around your command,
so that it acts on each line of standard input
-i Edit in-place
-e Allows you to provide the program as an argument,
rather than in a file
References:
perl flags: Perl flags -pe, -pi, -p, -w, -d, -i, -t?
ASCII control codes: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
remove URLs: sed to remove URLs from a file
branch labels: How can I replace a newline (\n) using sed?
GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html
quick regex guide: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Example:
$ cat url_test_input.txt
Some text ...
https://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file
https://www.google.ca/search?dcr=0&ei=QCsyWtbYF43YjwPpzKyQAQ&q=python+remove++citations&oq=python+remove++citations&gs_l=psy-ab.3...1806.1806.0.2004.1.1.0.0.0.0.61.61.1.1.0....0...1c.1.64.psy-ab..0.0.0....0.-cxpNc6youY
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html
http://datasynce.org/2017/05/sentiment-analysis-on-python-through-textblob/
https://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
http://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/20100804.alignment.index
Some more text.
$ sed -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' url_test_input.txt > a
$ cat a
Some text ...
Some more text.
$ perl -i -pe 's/^'`echo "\012"`'${2,}//g' a
$ cat a
Some text ...
Some more text.
$

How do I push `sed` matches to the shell call in the replacement pattern?

I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.
What I'm trying is this:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt
This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?
The accepted answer is just plain wrong. Proof:
Make an executable script foo.sh:
#! /bin/bash
echo $* 1>&2
Now run it:
$ echo foo | sed -e "s/\\(foo\\)/$(./foo.sh \\1)/"
\1
$
The $(...) is expanded before sed is run.
So you are trying to call an external command from inside the replacement pattern of a sed substitution. I don't think it can be done; the $... inside a pattern just lets you use an already existing (constant) shell variable.
I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).
UPDATE: I was wrong; sed plays nicely with the shell, and it allows you to do that. But then the backslash in \1 should be escaped. Try instead:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt
Try this:
sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s#URL=\($url\)#TITLE=$(curl -s $url | head -n 1)#" file.txt; done
If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The # as a delimiter depends on the URLs not including that character.
Late reply, but to make sure people don't get thrown off by the answers here: this can be done in GNU sed using the e flag of the s command. The following, for example, decrements a number at the beginning of a line:
echo "444 foo" | sed "s/\([0-9]*\)\(.*\)/expr \1 - 1 | tr -d '\n'; echo \"\2\";/e"
will produce:
443 foo
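Applied back to the original question, a sketch along the same lines might be (untested; it assumes the URLs in file.txt are shell-safe, since the e flag hands the whole pattern space to /bin/sh):
sed 's/^URL=\(.*\)/echo "TITLE=$(curl -s \1 | head -n 1)"/e' file.txt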

sed script to delete all characters up to & including the 2nd comma on a line

Can anyone explain how to use sed to delete all characters up to & including the 2nd comma on a line in a CSV file?
The beginning of a typical line might look like
1234567890,ABC/DEF, and the number of digits in the first column varies, i.e. there might be 9, 10 or 11 digits in random order, and the letters in the second column could also be random. This randomness and varying length make it impossible to search for any explicit pattern.
You could do it with sed like this
sed -e 's/^\([^,]*,\)\{2\}//'
I'm not 100% sure on the syntax, but I tried it and it seems to work. It deletes zero or more of anything-but-a-comma followed by a comma, with that whole group matched twice in succession.
But even easier would be to use cut, like this
cut -d, -f3-
which will use comma as a delimiter, and print fields 3 and up.
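Both, tried against a made-up line in the question's format ('rest of line' stands in for whatever follows the second comma), should give the same result:
$ echo '1234567890,ABC/DEF,rest of line' | sed -e 's/^\([^,]*,\)\{2\}//'
rest of line
$ echo '1234567890,ABC/DEF,rest of line' | cut -d, -f3-
rest of line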
EDIT:
Just for the record, both sed and cut can work with a file as a parameter, just append it at the end like so
cut -d, -f3- myfile.txt
or you can pipe the output of your program through them
./myprogram | cut -d, -f3-
sed is not the "right" tool for this (although it can be done). Since you have structured data, you can use a fields/delimiter approach instead of crafting a complicated regex.
You can use cut:
$ cut -f3- -d"," file
or gawk:
$ gawk -F"," '{$1=$2=""}1' file
$ gawk -F"," '{for(i=3;i<NF;i++) printf "%s,",$i; print $NF}' file
Thanks for all the replies. With the help provided I have written the simple executable script below, which does what I want.
#!/bin/bash
cut -d, -f3- ~/Documents/forex_convert/input.csv |
sed -e '1d' \
-e 's/-/,/g' \
-e 's/ /,/g' \
-e 's/:/,/g' \
-e 's/,D//g' > ~/Documents/forex_convert/converted_input
exit