How to sed stuff within pairs of quotes?

I want to change lines like:
<A HREF="classes_index_additions.html"class="hiddenlink">
to
<A HREF="classes_index_additions.html" class="hiddenlink">
(note the added ' ' before class) but it should leave lines like
<meta name="generator" content="JDiff v1.1.1">
alone. sed -e 's|\("[^"]*"\)\([^ />]\)|\1 \2|g' satisfies the first condition but it changes the other text to
<meta name="generator" content=" JDiff v1.1.1"/>
How do I get sed to process the correct pairs of double quotes?

You can try this:
sed -e 's/"\([^" ]*\)=/" \1=/g'
With sed, though, the regular expression may match other parts of your document that you didn't intend, so try it and look over the results for unintended side effects.
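As a quick check, the sketch below runs that expression against both sample lines from the question. The function name fix_attr_spacing is made up for illustration. The link line should gain the space, and the meta line should come through unchanged.

```shell
# Hypothetical helper wrapping the sed expression from the answer above.
fix_attr_spacing() {
  sed -e 's/"\([^" ]*\)=/" \1=/g'
}

link_out=$(printf '%s\n' '<A HREF="classes_index_additions.html"class="hiddenlink">' | fix_attr_spacing)
meta_out=$(printf '%s\n' '<meta name="generator" content="JDiff v1.1.1">' | fix_attr_spacing)

printf '%s\n' "$link_out"   # a space now separates the closing quote from class=
printf '%s\n' "$meta_out"   # no quote is directly followed by name=, so nothing changes
```

The pattern only fires when a double quote is followed directly by an attribute name and an equals sign, which is why the quoted value "JDiff v1.1.1" is left alone.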

With GNU sed, you can try putting each attribute on a new line, then trimming trailing spaces on each line before removing the newlines.
sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
This works as follows:
s/(\w*="[^"]*")/\n\1/g
Put every attribute on a new line so your node looks like this
<A
HREF="classes_index_additions.html"
class="hiddenlink">
After that you remove trailing spaces
s/ *\n/\n/g
And remove new lines
s/\n/ /g
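The three steps can be tried end to end with the sketch below. It assumes GNU sed, since -r, \w, and \n in the replacement are GNU extensions, and the function name respace_attrs is hypothetical.

```shell
# Assumes GNU sed: -r, \w and \n in the replacement are GNU extensions.
# Hypothetical helper: split attributes onto lines, trim, then rejoin.
respace_attrs() {
  sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
}

link_fixed=$(printf '%s\n' '<A HREF="classes_index_additions.html"class="hiddenlink">' | respace_attrs)
meta_same=$(printf '%s\n' '<meta name="generator" content="JDiff v1.1.1">' | respace_attrs)

printf '%s\n' "$link_fixed"   # attributes end up separated by exactly one space
printf '%s\n' "$meta_same"    # already well spaced, so it comes back unchanged
```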

Related

how to find a specific character combination and add a newline

I have a large file that looks like this
(something,something1,something2),(something,something1,something2)
How do I use sed to find ),( and replace it with );(, or insert a newline between the parentheses that are separated by a comma?
I tried sed 's/),(/),\n(/g' filename.txt, but for some reason it does not work.

For those who come here and want to know how this works without getting a lot of Stack Overflow "greetings":
Since I was on Mac OS X, I needed to replace \n with \'$'\n''.
So to find ),( and add a new line between the parentheses, this is the command I used. Note that, as written, it replaces each ; with a newline, so it presumably ran after ),( had already been changed to );(.
sed 's/;/\'$'\n''/g' testdone.txt > testdone2.txt
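A more portable alternative, sketched here on the assumption that your sed follows POSIX: a backslash followed by a literal newline in the replacement inserts a newline. That avoids both \n in the replacement, which not every sed accepts, and the $'\n' quoting trick. The variable names are illustrative only.

```shell
input='(something,something1,something2),(something,something1,something2)'

# The replacement is ")" + backslash-newline + "(": the comma is dropped
# and each parenthesised group ends up on its own line.
split=$(printf '%s\n' "$input" | sed 's/),(/)\
(/g')

printf '%s\n' "$split"
```

The second line of the sed script must start at the left margin, because any indentation there would become part of the replacement.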

echo "(something,something1,something2),(something,something1,something2)" | sed "s|),(|);(|"
This prints the below for me.
(something,something1,something2);(something,something1,something2)
For new line
echo "(something,something1,something2),(something,something1,something2)" | sed "s|),(|)\n(|"
And the above prints the below.
(something,something1,something2)
(something,something1,something2)

sed pattern negation with a comma separated line

I have a text file full of lines looking like:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
I am trying to change all of the commas , to pipes |, except for the commas within the quotes.
Trying to use sed (which I am new to)... and it is not working. Using:
sed '/".*"/!s/\,/|/g' textfile.csv
Any thoughts?

As a test case, consider this file:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
"x,y,z",foo,"a,b,c",foo,"yes,no",foo
Here is a sed command to replace non-quoted commas with pipe symbols:
$ sed -r ':a; s/^([^"]*("[^"]*"[^"]*)*),/\1|/g; t a' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
"x,y,z"|foo|"a,b,c"|foo|"yes,no"|foo
Explanation
This looks for commas that appear after pairs of double quotes and replaces them with pipe symbols.
:a
This defines a label a.
s/^([^"]*("[^"]*"[^"]*)*),/\1|/g
If 0, 2, 4, or any other even number of quotes precede a comma on the line, then replace that comma with a pipe symbol.
^
This matches at the start of the line.
(
This starts the main grouping (\1).
[^"]*
This looks for zero or more non-quote characters.
("[^"]*"[^"]*)*
The * outside the parens means that we are looking for zero or more of the pattern inside the parens. The pattern inside the parens consists of a quote, any number of non-quotes, a quote and then any number of non-quotes.
In other words, this grouping only matches pairs of quotes. Because of the * outside the parens, it can match any even number of quotes.
)
This closes the main grouping
,
This requires that the grouping be followed by a comma.
t a
If the previous s command successfully made a substitution, then the test command tells sed to jump back to label a and try again.
If no substitution was made, then we are done.
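To see why the loop is needed, the sketch below compares a single pass with the looped version on a shortened test line. It uses -E and separate -e expressions, which should behave the same on GNU and BSD sed. The variable names are assumptions for illustration.

```shell
line='foo,foo,"x,y,z",foo'

# One pass: the match is anchored at ^ and is as long as possible, so only
# the last comma preceded by an even number of quotes gets replaced.
one_pass=$(printf '%s\n' "$line" | sed -E 's/^([^"]*("[^"]*"[^"]*)*),/\1|/')

# With the label and t, the substitution repeats until it no longer matches.
looped=$(printf '%s\n' "$line" | sed -E -e ':a' -e 's/^([^"]*("[^"]*"[^"]*)*),/\1|/' -e 'ta')

printf '%s\n' "$one_pass"   # foo,foo,"x,y,z"|foo
printf '%s\n' "$looped"     # foo|foo|"x,y,z"|foo
```

Because the pattern is anchored at the start of the line, the g flag cannot find a second match in the same pass, so the t loop does the repetition.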

Using awk could be easier:
kent$ cat f
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
kent$ awk -F'"' -v OFS='"' '{for(i=1;i<=NF;i++)if(i%2)gsub(",","|",$i)}7' f
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive

I suggest a language with a proper CSV parser. For example:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
Female|$0 to $25,000|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive

Here I would use GNU awk's FPAT. It defines what a field looks like, whereas FS defines what the separator is. Then you can just set the output separator to |:
awk '{$1=$1}1' OFS=\| FPAT="([^,]+)|(\"[^\"]+\")" file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
If your awk does not support FPAT, this can be used:
awk -F, '{for (i=1;i<NF;i++) {c+=gsub(/\"/,"&",$i);printf "%s"(c%2?FS:"|"),$i}print $NF}' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive

sed 's/"\(.*\),\(.*\)"/"\1##HOLD##\2"/g;s/,/|/g;s/##HOLD##/,/g'
This will match the text in quotes and put a placeholder for the commas, then switch all the other commas to pipes and put the placeholder back to commas. You can change the ##HOLD## text to whatever you want.
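One thing to check before relying on the placeholder approach: because .* is greedy, the first substitution can protect only one comma per line. The sketch below runs the command on a shortened version of the sample line and on a line with two commas inside one quoted field. The function name protect_commas is hypothetical.

```shell
# Hypothetical helper wrapping the placeholder command from the answer above.
protect_commas() {
  sed 's/"\(.*\),\(.*\)"/"\1##HOLD##\2"/g;s/,/|/g;s/##HOLD##/,/g'
}

one_comma=$(printf '%s\n' 'Female,"$0 to $25,000",Arlington Heights' | protect_commas)
two_commas=$(printf '%s\n' 'foo,"x,y,z",foo' | protect_commas)

printf '%s\n' "$one_comma"    # the single quoted comma survives
printf '%s\n' "$two_commas"   # only the last quoted comma survives
```

The first result is Female|"$0 to $25,000"|Arlington Heights, while the second is foo|"x|y,z"|foo, so this approach fits data with at most one quoted comma per line.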

Matching strings even if they start with white spaces in SED

I'm having trouble matching strings when they start with any number of whitespace characters. I only started using regular expressions recently, so I need some help.
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
I'm trying to change the value for the second line without affecting line 1, so I used this:
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you

With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
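A minimal sketch of that last command, using the portable [[:space:]] class and a pipe instead of -i so that it can be tried without touching a file. The sample data mirrors the question, with leading spaces added to the second line as an assumption.

```shell
# Two sample lines; the indentation on the second one is assumed.
sample=$(printf '%s\n' "#String1='Test One'" "   String1='Test Two'")

# Capture the optional whitespace plus the key, then rewrite only the value.
updated=$(printf '%s\n' "$sample" | sed "s/^\([[:space:]]*String1=\).*/\1'Test Three'/")

printf '%s\n' "$updated"
```

The commented line is untouched because # is not whitespace, so the anchored pattern cannot reach String1= on that line.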

There are a couple of ways to approach your problem.
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
This operates only on lines that do not (!) match the regular expression /.../, which here anchors at the start of the line (^), allows optional whitespace (\s*), and then requires an octothorpe (#).
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
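The negated-address form can be tried on a few lines as sketched below. It uses [[:space:]] rather than \s for portability and reads from a pipe rather than editing in place. The indented comment line is an assumed extra test case, not part of the question.

```shell
# Sample lines; the middle one (an indented comment) is an assumed extra case.
sample=$(printf '%s\n' "#String1='Test One'" "  # String1='disabled'" "String1='Test Two'")

# Skip every line whose first non-blank character is #, substitute on the rest.
updated=$(printf '%s\n' "$sample" | sed "/^[[:space:]]*#/! s|String1=.*|String1='Test Three'|")

printf '%s\n' "$updated"
```

Only the last line changes, since both commented lines match the address and are therefore excluded by the !.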

With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file

Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It ignores the first hit of String1, because f is still 0 (false) when it is tested there; the post-increment makes it true for every later hit.

Use Sed to modify a line that has an initial space and contains a comma

This should be extremely simple, but for the life of me I just can't get gnu-sed to do it this afternoon.
The file in question has lines that look like this:
PART NUMBER PART NUMBER QUANTITY WEIGHT -999 -4,999 -9,999
w/ UL APPROVAL
MIN-3
I need to prepend every line like the "MIN-3" line with a ">" character, and the only thing specifically differentiating those lines from the others are two things:
The first character is a space " ".
The lines do not contain a comma.
I've tried mostly things like any of the following:
/^ +[^,]+$/ s/^/>/
/^ +[\w\-]+$/ s/^/>/
/^ +(\w|\-)+$/ s/^/>/
I will admit, I am somewhat new to sed. :)
Edit: Answers that use perl, or awk could also be appreciated, though my initial target is sed.

try this:
sed '/^ [^,]*$/s/^/>/'
In the output, only the MIN-3 line gets a leading >.
sed uses basic regular expressions by default, so the + in your script would have to be written \+ (a GNU extension). I think that could be the problem that cost you the time. Alternatively, you could add -r to make sed use extended regular expressions.
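The difference between the two regex flavours can be seen in the sketch below. The sample lines are assumptions, because the leading spaces in the question's data are not recoverable: here only the first and third lines start with a space, and only the third has no comma.

```shell
# Assumed sample data: leading spaces on lines 1 and 3, a comma on line 1.
sample=$(printf '%s\n' ' PART NUMBER QUANTITY -4,999' 'w/ UL APPROVAL' ' MIN-3')

# Basic regex: + is a literal plus sign, so the address never matches.
bre_plain=$(printf '%s\n' "$sample" | sed '/^ +[^,]+$/s/^/>/')

# Extended regex (-E): + means "one or more", so the MIN-3 line matches.
ere=$(printf '%s\n' "$sample" | sed -E '/^ +[^,]+$/s/^/>/')

# Basic regex rewritten with *, as in the answer above.
bre_star=$(printf '%s\n' "$sample" | sed '/^ [^,]*$/s/^/>/')

printf '%s\n' "$ere"
```

Here bre_plain comes back identical to the input, while ere and bre_star both prefix only the MIN-3 line with >.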

According to your description this should do:
sed 's/^\([ ][^,]*\)$/> \1/' input
which matches the complete line if the line starts with a space and then contains anything but a comma until the end.

Here is a simple answer:
sed 's/^ [^,]*$/>&/'

Remove Leading Whitespace from File

My shell has a call to 'fortune' in my .login file, to provide me with a little message of the day. However, some of the fortunes begin with one leading whitespace line, some begin with two, and some don't have any leading whitespace lines at all. This bugs me.
I sat down to wrap fortune in my own shell script, which would remove all the leading whitespace from the input without destroying any formatting of the actual fortune, which may intentionally have lines of whitespace.
It doesn't appear to be an easy one-liner two-minute fix, and as I read(reed) through the man pages for sed and grep, I figured I'd ask our wonderful patrons here.

Using the same source as Dav:
# delete all leading blank lines at top of file
sed '/./,$!d'
Source: http://www.linuxhowtos.org/System/sedoneliner.htm?ref=news.rdf
Additionally, here's why this works:
The comma separates a "range" of operation. sed can accept regular expressions for range definitions, so /./ matches the first line with "anything" (.) on it and $ specifies the end of the file. Therefore,
/./,$ matches "the first not-blank line to the end of the file".
! then inverts that selection, making it effectively "the blank lines at the top of the file".
d deletes those lines.
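The range can be tried with the sketch below, which feeds sed two empty lines followed by text that contains an intentional blank line. The second command is a variant, offered as an assumption, for the case where the leading lines hold spaces or tabs: /./ treats such lines as non-blank, so it would not delete them.

```shell
# Two empty lines first, then text with an intentional blank line in it.
message=$(printf '\n\nfirst line\n\nsecond line\n' | sed '/./,$!d')

# Variant: start the range at the first line containing a non-whitespace
# character, so leading lines made only of spaces or tabs are dropped too.
message_ws=$(printf ' \n\t\nfirst line\n\nsecond line\n' | sed '/[^[:space:]]/,$!d')

printf '%s\n' "$message"
printf '%s\n' "$message_ws"
```

In both cases the blank line between the two text lines is kept, because it falls inside the range and is therefore not deleted.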

# delete all leading blank lines at top of file
sed '/./,$!d'
Source: http://www.linuxhowtos.org/System/sedoneliner.htm?ref=news.rdf
Just pipe the output of fortune into it:
fortune | sed '/./,$!d'

How about:
sed "s/^ *//" < fortunefile

I'm not sure what your fortune message actually looks like, but here's an illustration. Unquoted, the shell's word splitting drops the leading whitespace; quoted, it is preserved:
$ string=" my message of the day"
$ echo $string
my message of the day
$ echo "$string"
 my message of the day
Or you could use awk:
echo "${string}" | awk '{gsub(/^ +/,"")}1'