Sed command to remove all lines not containing punctuation - sed

I'm struggling to find a sed command to remove all lines in a text file that do not contain punctuation (of any kind) without doing each manually.
For example:
111.222.123.234
222.11.34.54
word # To remove
www.facebook.com
www.stackoverflow.com
another # To remove
random#email.com
Does such a command exist?

You can use the [:punct:] character class, which corresponds to
[!"#$%&'()*+,-./:;<=>?#[\]^_`{|}~]
and negate it:
$ sed '/[[:punct:]]/!d' infile
111.222.123.234
222.11.34.54
www.facebook.com
www.stackoverflow.com
random#email.com
Or, instead of the negated match, negate the character class directly:
sed '/[^[:punct:]]/d'
Or don't print anything unless a line does contain a punctuation character:
sed -n '/[[:punct:]]/p'
Or use grep instead of sed:
grep '[[:punct:]]' infile

Related

sed - Replace comma after first regex match

i m trying to perform the following substitution on lines of the general format:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......
as you see the problem is that its a comma separated file, with a specific field containing a comma decimal. I would like to replace that with a dot .
I ve tried this, to replace the first occurence of a pattern after match, but to no avail, could someone help me?
sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"
Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:
gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'
That yielded the same file unchanged.
CSV files should be parsed in awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
{ for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file
See this answer of mine to understand how to define and parse CSV file content with FPAT variable. Also see Save modifications in place with awk to do in-place file modifications like sed -i''.
The following sed will convert all decimal separators in quoted numeric fields:
sed 's/"\([-+]\?[0-9]*\)[,]\?\([0-9]\+\([eE][-+]\?[0-9]+\)\?\)"/"\1.\2"/g'
See: https://www.regular-expressions.info/floatingpoint.html
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file
This regexp matches a , within a pair of "'s and replaces it by a .. The regexp is anchored to the start of the line and thus needs to be repeated until no further matches can be matched, hence the :a and the ta commands which causes the substitution to be iterated over whilst any substitution is successful.
N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.
If your input always follows that format of only one quoted field containing 1 comma then all you need is:
$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......
If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.
Assuming you have this:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC
Try this:
awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","
Output will be:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC
You simply need to know the field numbers for replacing the field separator between them.
In order to use regexp as in perl you have to activate extended regular expression with -r.
So if you want to replace all numbers and omit the " sign, then you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'
If you want to replace first occurrence only you can use that:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'
https://www.gnu.org/software/sed/manual/sed.txt

sed command not working properly on ubuntu

I have one file named `config_3_setConfigPW.ldif? containing the following line:
{pass}
on terminal, I used following commands
SLAPPASSWD=Pwd&0011
sed -i "s#{pass}#$SLAPPASSWD#" config_3_setConfigPW.ldif
It should replace {pass} to Pwd&0011 but it generates Pwd{pass}0011.
The reason is that the SLAPPASSWD shell variable is expanded before sed sees it. So sed sees:
sed -i "s#{pass}#Pwd&0011#" config_3_setConfigPW.ldif
When an "&" is on the right hand side of a pattern it means "copy the matched input", and in your case the matched input is "{pass}".
The real problem is that you would have to escape all the special characters that might arise in SLAPPASSWD, to prevent sed doing this. For example, if you had character "#" in the password, sed would think it was the end of the substitute command, and give a syntax error.
Because of this, I wouldn't use sed for this. You could try gawk or perl?
eg, this will print out the modified file in awk (though it still assumes that SLAPPASSWD contains no " character
awk -F \{pass\} ' { print $1"'${SLAPPASSWD}'"$2 } ' config_3_setConfigPW.ldif
That's because$SLAPPASSWD contains the character sequences & which is a metacharacter used by sed and evaluates to the matched text in the s command. Meaning:
sed 's/{pass}/match: &/' <<< '{pass}'
would give you:
match: {pass}
A time ago I've asked this question: "Is it possible to escape regex metacharacters reliably with sed". Answers there show how to reliably escape the password before using it as the replacement part:
pwd="Pwd&0011"
pwdEscaped="$(sed 's/[&/\]/\\&/g' <<< "$pwd")"
# Now you can safely pass $pwd to sed
sed -i "s/{pass}/$pwdEscaped/" config_3_setConfigPW.ldif
Bear in mind that sed NEVER operates on strings. The thing sed searches for is a regexp and the thing it replaces it with is string-like but has some metacharacters you need to be aware of, e.g. & or \<number>, and all of it needs to avoid using the sed delimiters, / typically.
If you want to operate on strings you need to use awk:
awk -v old="{pass}" -v new="$SLAPPASSWD" 's=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' file
Even the above would need tweaked if old or new contained escape characters.

Printing all words that start with "#" using sed in BASH

I have a file with a lot of text, but I want to print only words that contain "#" at the beginning. Ex:
My name is #Laura and I live in #London. Name=#Laura. City=#London
How can I print all words that start with #?.I did this the following and it worked, but I want to do it using sed. I tried several patters, but I cannot make it print anything.
grep -o -E "#\w+" file.txt
Thanks
Use this sed command:
sed 's/[^#]*\(#[^ .]*\)/\1\n/g' file.txt
Explanation: we invoke the substitution command of sed. This has following structure: sed 's/regex/replace/options'. We will search for a regex and replace it using the g option. g makes sure the match is made multiple times per line.
We look for a series of non at chars followed by an # and a number of non-spaces #[^ ]*. We put this last part in a group \(\) and sub it during the replacement \1.
Note that we add a newline at the end of each match, you can also get the output on a single line by omitting the \n.

sed pattern negation with a comma separated line

I have a text file full of lines looking like:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
I am trying to change all of the commas , to pipes |, except for the commas within the quotes.
Trying to use sed (which I am new to)... and it is not working. Using:
sed '/".*"/!s/\,/|/g' textfile.csv
Any thoughts?
As a test case, consider this file:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
"x,y,z",foo,"a,b,c",foo,"yes,no",foo
Here is a sed command to replace non-quoted commas with pipe symbols:
$ sed -r ':a; s/^([^"]*("[^"]*"[^"]*)*),/\1|/g; t a' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
"x,y,z"|foo|"a,b,c"|foo|"yes,no"|foo
Explanation
This looks for commas that appear after pairs of double quotes and replaces them with pipe symbols.
:a
This defines a label a.
s/^([^"]*("[^"]*"[^"]*)*),/\1|/g
If 0, 2, 4, or any an even number of quotes precede a comma on the line, then replace that comma with a pipe symbol.
^
This matches at the start of the line.
(`
This starts the main grouping (\1).
[^"]*
This looks for zero or more non-quote characters.
("[^"]*"[^"]*)*
The * outside the parens means that we are looking for zero or more of the pattern inside the parens. The pattern inside the parens consists of a quote, any number of non-quotes, a quote and then any number on non-quotes.
In other words, this grouping only matches pairs of quotes. Because of the * outside the parens, it can match any even number of quotes.
)
This closes the main grouping
,
This requires that the grouping be followed by a comma.
t a
If the previous s command successfully made a substitution, then the test command tells sed to jump back to label a and try again.
If no substitution was made, then we are done.
using awk could be eaiser:
kent$ cat f
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
kent$ awk -F'"' -v OFS='"' '{for(i=1;i<=NF;i++)if(i%2)gsub(",","|",$i)}7' f
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
I suggest a language with a proper CSV parser. For example:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
Female|$0 to $25,000|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
Here I would have used gnu awks FPAT. It define how a field looks like FS that tells what the separator is. Then you can just set the output separator to |
awk '{$1=$1}1' OFS=\| FPAT="([^,]+)|(\"[^\"]+\")" file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
If your awk does not support FPAT, this can be used:
awk -F, '{for (i=1;i<NF;i++) {c+=gsub(/\"/,"&",$i);printf "%s"(c%2?FS:"|"),$i}print $NF}' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
sed 's/"\(.*\),\(.*\)"/"\1##HOLD##\2"/g;s/,/|/g;s/##HOLD##/,/g'
This will match the text in quotes and put a placeholder for the commas, then switch all the other commas to pipes and put the placeholder back to commas. You can change the ##HOLD## text to whatever you want.

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which will only operate on lines that do not match the regular expression /.../! that begins ^ with optional whiltespace\s* followed by an octothorp #
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It will ignore first hits of string1 since f is not true.