Replace a matching string from the end of line using sed

I have a file with content like this:
/var/lib/mlocate
/var/lib/dpkg/info/mlocate.conffiles
/var/lib/dpkg/info/mlocate.list
/var/lib/dpkg/info/mlocate.md5sums
/var/lib/dpkg/info/mlocate.postinst
/var/lib/dpkg/info/mlocate.postrm
/var/lib/dpkg/info/mlocate.prerm
In this file, I want to use sed to replace the last slash (/) on every line with a space, like this:
/var/lib mlocate
/var/lib/dpkg/info mlocate.conffiles
/var/lib/dpkg/info mlocate.list
/var/lib/dpkg/info mlocate.md5sums
/var/lib/dpkg/info mlocate.postinst
/var/lib/dpkg/info mlocate.postrm
/var/lib/dpkg/info mlocate.prerm
How can I do this?

sed 's|\(.*\)/|\1 |' file
sed's regexes are greedy by default, so .*/ matches everything up to the last / instance.
sed uses BREs (basic regular expressions) by default, in which, perhaps surprisingly, ( and ) must be \-escaped in order to enclose a capture group (capture the enclosed sub-expression separately).
GNU (Linux) Sed and BSD/macOS Sed support nonstandard option -E to enable EREs (extended regular expressions), in which case you don't need the escaping of parentheses:
sed -E 's|(.*)/|\1 |' file # or -r with GNU Sed / non-macOS BSD Sed
Note how regex delimiter | was chosen instead of the customary /; using | allows unescaped use of / as a literal to match.
In the replacement part, \1 refers to what the 1st (and only) capture group (\(...\)) matched, which is everything up to, but not including, the last /. By following \1 with a literal space you get the desired result.
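As a quick check of the greedy-match explanation, here is a minimal sketch that runs the substitution on two lines from the question and compares it with a variant that anchors on the end of the line instead. The variable names and the anchored variant are illustrative additions, not part of the original answer.

```shell
# Two sample lines taken from the question.
input='/var/lib/mlocate
/var/lib/dpkg/info/mlocate.list'

# Greedy: .* consumes as much as it can, so the / that follows it is the last one.
greedy=$(printf '%s\n' "$input" | sed 's|\(.*\)/|\1 |')

# Anchored (illustrative alternative): the last / is the one followed only by
# non-slash characters up to the end of the line.
anchored=$(printf '%s\n' "$input" | sed 's|/\([^/]*\)$| \1|')

printf '%s\n' "$greedy"
```

Both variables should hold the same two lines, with only the final slash of each line turned into a space.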

awk version: there is NO reason to do the task with awk, but it works :)
awk 'BEGIN{FS=OFS="/"} {$(NF-1)=$(NF-1) " " $NF; NF--} 1' inputfile
/var/lib mlocate
/var/lib/dpkg/info mlocate.conffiles
/var/lib/dpkg/info mlocate.list
/var/lib/dpkg/info mlocate.md5sums
/var/lib/dpkg/info mlocate.postinst
/var/lib/dpkg/info mlocate.postrm
/var/lib/dpkg/info mlocate.prerm

Related

Use sed to replace every character by itself followed by $n times a char?

I'm trying to run the command below to replace every char in DECEMBER by itself followed by $n question marks. I tried both escaping {$n}, like so: \{$n\}, and leaving it as is. Yet my output just keeps being D?{$n}E?{$n}... Is it just not possible to do this with sed?
How should I go about this?
echo 'DECEMBER' > a.txt
sed -i "s%\(.\)%\1\(?\){$n}%g" a.txt
cat a.txt
This might work for you (GNU sed):
n=5
sed -E ':a;s/[^\n]/&\n/g;x;s/^/x/;/x{'"$n"'}/{z;x;y/\n/?/;b};x;ba' file
This appends a newline to each non-newline character in the line, $n times, then replaces all newlines with the intended character ?.
N.B. The newline is chosen as the intermediate character because it cannot occur within a line (sed uses newlines to separate lines), so the substitutions remain correct even if the final substitution character already exists within the current line.
Range quantifiers (also called interval or limiting quantifiers), like {3} / {3,} / {3,6}, are part of regex syntax, not of replacement patterns.
You can use
sed -i "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" a.txt
See the online demo:
#!/bin/bash
sed "s/./&$(for i in {1..7}; do echo -n '?'; done)/g" <<< "DECEMBER"
# => D???????E???????C???????E???????M???????B???????E???????R???????
Here, . matches any char, and & in the replacement pattern puts it back and $(for i in {1..7}; do echo -n '?'; done) adds seven question marks right after it.
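To make the point about quantifiers concrete, here is a small sketch contrasting a quantifier written in the replacement (taken literally) with a replacement string built beforehand. The variable names n, fill, literal and expanded are illustrative, and the loop is just one POSIX sh way to build the repeated text.

```shell
n=3   # hypothetical repeat count

# {3} in the replacement is plain text, so the braces show up in the output.
literal=$(printf 'AB\n' | sed 's/./&?{3}/g')

# Build the run of question marks first, then splice it into the replacement.
fill=''
i=0
while [ "$i" -lt "$n" ]; do
    fill="${fill}?"
    i=$((i + 1))
done
expanded=$(printf 'AB\n' | sed "s/./&${fill}/g")

printf '%s\n%s\n' "$literal" "$expanded"
```

The first line printed keeps the braces (A?{3}B?{3}); the second has three question marks after each character.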
This one-liner should do the trick:
sed 's/./&'$(printf '%*s' "$n" '' | tr ' ' '?')'/g' a.txt
with the assumption that $n expands to a positive integer and the command is executed in a POSIX shell.
Efficiently using any awk in any shell on every Unix box after setting n=2:
$ awk -v n="$n" '
BEGIN {
    new = sprintf("%*s",n,"")
    gsub(/./,"?",new)
}
{
    gsub(/./,"&"new)
    print
}
' a.txt
D??E??C??E??M??B??E??R??
To make the changes "in place", use GNU awk with -i inplace, just as GNU sed has -i.
Caveat: if the character you want to use in the replacement text is &, then you'd need to use gsub(/./,"\\\\\\&",new) in the BEGIN section to make sure it is treated as a literal instead of a backreference metacharacter. You'd have that issue and more (e.g. handling \1 or /) with any sed solution. Any solution that uses double quotes around the script would have further issues with handling $, and the solutions that leave a shell expansion unquoted would have even more issues with globbing characters.
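For the sed solutions, one way to sidestep the & and \ problem mentioned in this caveat is to escape the fill text before splicing it into the replacement. The sketch below assumes a hypothetical helper named escape_replacement and a single-line fill string; it is an illustration, not something taken from the answers above.

```shell
# Hypothetical helper: backslash-escape \, & and the / delimiter so that the
# text is taken literally in the replacement part of s/.../.../.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'
}

fill=$(escape_replacement '&&')                  # becomes \&\&
result=$(printf 'AB\n' | sed "s/./&${fill}/g")   # the leading & is still "the match"

printf '%s\n' "$result"
```

Here each input character ends up followed by two literal ampersands. A fill string containing newlines would need extra handling that this sketch does not attempt.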

How can I achieve the following in sed?

The original text is:
apr_array_pstrcat(anythingbutalwayshereincludingspaces,anythingbutalwayshereincludingspaces, ',')
I want to change it to:
apr_array_pstrcat(samethingasabove,samethingasabove, ", ")
I got the following sed command, but it is not working:
find . -type f -exec sed -i "s/apr_array_pstrcat\((.*),(.*),(.*)','\)/apr_array_pstrcat\($1,$2,$3\", \"\)/g" {} +
How can I do this? I am able to understand PCRE regex, but I am not sure about this sed one.
Issues with OP's attempts:
-E is needed to enable ERE; otherwise, with the default BRE, the roles of \( and ( are reversed
$1, $2, etc should be \1, \2, etc
there should be only two capture groups as per given sample
also, g flag isn't needed if there can be only one match per line
sed -E "s/apr_array_pstrcat\((.*),(.*)','\)/apr_array_pstrcat\(\1,\2\", \"\)/g"
This can be simplified to:
sed -E "s/(apr_array_pstrcat\(.*),(.*)','\)/\1,\2\", \"\)/g"
# or this one, since using double quotes for entire expression can lead to
# conflict with shell double quote interpretation
sed -E 's/(apr_array_pstrcat\(.*),(.*)\x27,\x27\)/\1,\2", "\)/g'
This can be further simplified depending on what kind of data is present in the input:
# change ',' to ", " if a line contains apr_array_pstrcat(
sed '/apr_array_pstrcat(/ s/\x27,\x27/", "/'
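The BRE/ERE point can be checked side by side. This is a minimal sketch on a made-up input line (the argument names arr and pool are hypothetical); it writes the same substitution once with -E and once in default BRE syntax.

```shell
# Hypothetical sample call; the argument names are made up.
line="apr_array_pstrcat(arr, pool, ',')"

# ERE (-E): ( ) group, \( \) are literal parentheses.
ere=$(printf '%s\n' "$line" |
    sed -E "s/(apr_array_pstrcat\(.*),(.*)','\)/\1,\2\", \")/")

# BRE (default): \( \) group, ( ) are literal parentheses.
bre=$(printf '%s\n' "$line" |
    sed "s/\(apr_array_pstrcat(.*\),\(.*\)',')/\1,\2\", \")/")

printf '%s\n' "$ere"
```

Both forms should turn the trailing ',' into ", " and leave the first two arguments untouched.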
sed has the -E flag for "use extended regular expressions in the script".
I'd also match the arguments with 'anything that's not a comma': "[^,]+"
So this works for me:
sed -E "s/(apr_array_pstrcat\([^,]+, *[^,]+,) ','\)/\1 \", \")/"

Escape backslash character in sed

I need to modify some Windows paths.
For instance,
D:\usr
to
D:\first\usr
So, I have created a variable.
$path = "first\usr"
then used the following command:
sed -i -e 's!\\usr!${path}/g;' test.txt
However, this ends up with the following:
D:\firstSr
How do I escape \u in sed?
Assuming your path variable was assigned properly (without spaces in the assignment: path='first\usr'), fixing step by step for an input file test.txt with one example path:
$ cat test.txt
D:\usr
Your original command
$ sed 's!\\usr!${path}/g;' test.txt
sed: -e expression #1, char 18: unterminated `s' command
doesn't do much, as you've mixed ! and / as the delimiter.
Fixing delimiters:
$ sed 's!\\usr!${path}!g;' test.txt
D:${path}
Now no interpolation happens at all because of the single quotes. I suspect these are just copy-paste mistakes, as you obviously got some output.
Double quotes:
$ sed "s!\\usr!${path}!g" test.txt
bash: !\\usr!${path}!g: event not found
Now this clashes with history expansion. We could escape the !, or use a different delimiter.
/ as delimiter:
$ sed "s/\\usr/${path}/g" test.txt
D:\firstSr
Now we're where the question actually started. ${path} expands to first\usr, but \u has a special meaning in GNU sed in the replacement string: it uppercases the following character, hence the S.
Even without the special meaning, \u would most likely just expand to u and the backslash would be gone.
Escaping the backslash:
$ path='first\\usr'
$ sed "s/\\usr/${path}/g" test.txt
D:\first\usr
This works.
Depending on which shell you are using, you may be able to use parameter expansion to double \ in your substitution string and prevent the \u interpretation:
path="first\usr"
sed -e "s/\\usr/${path//\\/\\\\}/g" <<< "D:\usr"
The syntax for replacing a pattern with the shell parameter expansion is ${parameter/pattern/string} (one replacement) or ${parameter//pattern/string} (replace all matches).
This substitution is not specified by POSIX, but is available in Bash.
Where it is not available, you may need to filter $path through a process:
path=$(echo "$path" | sed 's/[][\\*.%$]/\\&/g')
(N.B. I have also quoted other sed metacharacters in this filter).
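Putting the pieces together, here is a minimal POSIX sh sketch of the "filter through a process" approach, reduced to doubling backslashes only. The variable names escaped and result are illustrative; the sample path and input line come from the question.

```shell
path='first\usr'   # single quotes keep the backslash as-is

# Double each backslash so sed's replacement sees \\ (a literal backslash)
# rather than \u.
escaped=$(printf '%s\n' "$path" | sed 's/\\/\\\\/g')

# Match the literal \usr and put back a backslash followed by the escaped path.
result=$(printf '%s\n' 'D:\usr' | sed 's/\\usr/\\'"$escaped"'/')

printf '%s\n' "$result"
```

Keeping the fixed parts of the sed script in single quotes and only the variable in double quotes avoids having to count how many backslashes the shell removes.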

What is wrong with this sed expression?

sed 's_((checksum|compressed)=\").*(\")_\1\2_' -i filename
I am using this command to replace the values of the checksum and compressed fields with empty strings, but it didn't change anything.
For example, I want to change checksum="XXXXX" to checksum="", and also replace
compressed="XXXX" with compressed="".
What is wrong with my sed command?
It's because sed uses a funny regex dialect by default: you have to escape capturing brackets.
If you want to use "normal" regex that you're familiar with, use the -r flag (if you're on unix, GNU sed) or the -E flag (Mac OS X BSD sed):
sed -r 's_((checksum|compressed)=\").*(\")_\1\3_' -i filename
Additionally, note that you have three sets of capturing brackets in your sed, and I think you want to change the \1\2 to \1\3. (\1 contains checksum=", \2 contains checksum, and \3 contains ").
(For interest, here's how you would do it without the extended-regexp (-r/-E) flag, note that capturing brackets and the OR | are only considered in the regex sense if they are escaped:
sed 's_\(\(checksum\|compressed\)=\"\).*\(\"\)_\1\3_' -i filename
)
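One thing to watch with the commands above: .* is greedy, so on a line with several quoted attributes it runs to the last double quote. The sketch below uses a made-up input line to compare that behaviour with a [^"]* pattern that stops at each value's closing quote; the variable names are illustrative.

```shell
# Hypothetical input with several attributes on one line.
line='<file checksum="abc123" compressed="yes" name="x"/>'

# Greedy: .* runs to the last " on the line, taking the other attributes with it.
greedy=$(printf '%s\n' "$line" |
    sed -E 's_((checksum|compressed)=").*(")_\1\3_')

# Bounded: [^"]* stops at the closing quote of each value.
bounded=$(printf '%s\n' "$line" |
    sed -E 's/(checksum|compressed)="[^"]*"/\1=""/g')

printf '%s\n%s\n' "$greedy" "$bounded"
```

The greedy form drops the compressed and name attributes entirely, while the bounded form empties only the two targeted values.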
This might work for you:
echo 'checksum="XXXXX" compressed="YYYYYYY"' |
sed 's/\(checksum\|compressed\)="[^"]*"/\1=""/g'
checksum="" compressed=""
In sed (without the -r switch), the characters ()|+?{} must have a \ prepended to give them the qualities of grouping, alternation, one or more, zero or one, and intervals. .[]* work as metacharacters either way.
Try:
sed 's/\(\(checksum\|compressed\)\)="[^"]*"/\1=""/' -i filename

sed to remove URLs from a file

I am trying to write a sed expression that can remove urls from a file
example
http://samgovephotography.blogspot.com/ updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB Contributor #kdpartak :)
But I don't get it:
sed 's/[\w \W \s]*http[s]*:\/\/\([\w \W]\)\+[\w \W \s]*/ /g' posFile
FIXED!!!!!
handles almost all cases, even malformed URLs
sed 's/[\w \W \s]*http[s]*[a-zA-Z0-9 : \. \/ ; % " \W]*/ /g' positiveTweets | grep "http" | more
The following removes http:// or https:// and everything up until the next space:
sed -e 's!http\(s\)\{0,1\}://[^[:space:]]*!!g' posFile
updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N Thx to HMB Contributor #kdpartak :)
Edit:
I should have used:
sed -e 's!http[s]\?://\S*!!g' posFile
"[s]\?" is a far more readable way of writing "an optional s" compared to "\(s\)\{0,1\}"
"\S*" a more readable version of "any non-space characters" than "[^[:space:]]*"
I must have been using the sed that came installed with my Mac at the time I wrote this answer (brew install gnu-sed FTW).
There are better URL regular expressions out there (those that take into account schemes other than HTTP(S), for instance), but this will work for you, given the examples you give. Why complicate things?
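For a quick check of the portable form on a sed without \? and \S, here is a minimal sketch on a made-up line (example.com stands in for real URLs; the variable names are illustrative).

```shell
# Hypothetical sample line; example.com stands in for a real URL.
line='Read this https://example.com/?p=2442 and http://example.com/x today'

# s\{0,1\} is the portable BRE spelling of "optional s";
# [^[:space:]]* is the portable spelling of \S*.
cleaned=$(printf '%s\n' "$line" | sed 's!https\{0,1\}://[^[:space:]]*!!g')

printf '%s\n' "$cleaned"
```

Note that the spaces on either side of each removed URL are left in place, so the result contains doubled spaces where a URL used to be.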
The accepted answer provides the approach that I used to remove URLs, etc. from my files. However it left "blank" lines. Here is a solution.
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' input_file
perl -i -pe 's/^'`echo "\012"`'${2,}//g' input_file
The GNU sed flags and expressions used are:
-i      Edit in-place
-e      [-e script] --expression=script : basically, add the commands in script
        (expression) to the set of commands to be run while processing the input
^       Match start of line
$       Match end of line
\?      Match zero or one of the preceding regular expression
{2,}    Match 2 or more of the preceding regular expression
\S*     Zero or more non-space characters; alternative to: [^[:space:]]*
However,
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g'
leaves nonprinting character(s), presumably \n (newlines). Standard sed-based approaches to remove "blank" lines, tabs and spaces, e.g.
sed -i 's/^[ \t]*//; s/[ \t]*$//'
do not work here: unless you use a "branch label" to process newlines, you cannot replace them using sed (which reads input one line at a time).
The solution is to use the following perl expression:
perl -i -pe 's/^'`echo "\012"`'${2,}//g'
which uses a shell substitution,
'`echo "\012"`'
to replace an octal value
\012
(i.e., a newline, \n), that occurs 2 or more times,
{2,}
(otherwise we would unwrap all lines), with something else; here:
//
i.e., nothing.
[The second reference below provides a wonderful table of these values!]
The perl flags used are:
-p Places a printing loop around your command,
so that it acts on each line of standard input
-i Edit in-place
-e Allows you to provide the program as an argument,
rather than in a file
References:
perl flags: Perl flags -pe, -pi, -p, -w, -d, -i, -t?
ASCII control codes: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
remove URLs: sed to remove URLs from a file
branch labels: How can I replace a newline (\n) using sed?
GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html
quick regex guide: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Example:
$ cat url_test_input.txt
Some text ...
https://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file
https://www.google.ca/search?dcr=0&ei=QCsyWtbYF43YjwPpzKyQAQ&q=python+remove++citations&oq=python+remove++citations&gs_l=psy-ab.3...1806.1806.0.2004.1.1.0.0.0.0.61.61.1.1.0....0...1c.1.64.psy-ab..0.0.0....0.-cxpNc6youY
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html
http://datasynce.org/2017/05/sentiment-analysis-on-python-through-textblob/
https://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
http://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/20100804.alignment.index
Some more text.
$ sed -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' url_test_input.txt > a
$ cat a
Some text ...
Some more text.
$ perl -i -pe 's/^'`echo "\012"`'${2,}//g' a
Some text ...
Some more text.
$
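If the goal is only to drop the lines that become empty, a possible alternative is sed's d command, which deletes a whole line (including its newline) rather than trying to substitute the newline away. The sketch below is an assumption-laden illustration: it uses made-up example.com URLs, portable bracket expressions instead of \S, and it would also delete lines that were already blank in the input.

```shell
# Hypothetical input; example.com stands in for real URLs.
input='Some text ...
https://example.com/a
www.example.com/b
ftp://example.com/c
Some more text.'

cleaned=$(printf '%s\n' "$input" |
    sed -e 's!https\{0,1\}://[^[:space:]]*!!g' \
        -e 's!www\.[^[:space:]]*!!g' \
        -e 's!ftp:[^[:space:]]*!!g' \
        -e '/^[[:space:]]*$/d')

printf '%s\n' "$cleaned"
```

With this input, only the two text lines remain.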