sed: change word order and replace

I'm trying to replace
randomtext{{XX icon}}
with
randomtext{{ref-XX}}
in a file, where XX can be any sequence of 2 or 3 lowercase letters.
I tried rearranging the word order with awk before replacing "icon" with "ref-" using sed:
awk '{print $2, $1}'
but since there is no space before the first word or after the second one, it messed up the curly brackets:
icon}} {{XX
What is the simplest way to achieve this using sed?

sed 's/{{\([a-z]\{2,3\}\)\sicon/{{ref-\1/'
This one-liner uses the substitute command s/PATTERN/REPLACE/. {{ matches two opening curly brackets. \([a-z]\{2,3\}\) captures 2 or 3 lowercase letters. \s matches a whitespace character (a GNU sed extension). icon matches the literal string "icon". The whole match, that is {{XX icon, is then replaced with the literal string {{ref- followed by the captured 2 or 3 letter word.

Here's a more generic version using hash signs (#) as the regex delimiter:
sed 's#{{\([^ ]*\) [^}]*#{{ref-\1#'
{{ anchors the regex at the double open curly braces.
\([^ ]*\) captures everything up until a space.
[^}]* eats everything up until a closing curly brace.
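
To try the substitution without touching a real file, a sample line can be piped through it. This is a minimal sketch: the function name to_ref and the sample lines are made up for illustration, and [[:space:]] is swapped in for \s on the assumption that a POSIX character class is wanted instead of the GNU-only escape.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Same idea as the first command above, with the POSIX class
# [[:space:]] in place of the GNU-specific \s.
to_ref() {
    sed 's/{{\([a-z]\{2,3\}\)[[:space:]]icon/{{ref-\1/'
}

# Sample input invented for this sketch.
printf '%s\n' 'randomtext{{en icon}}' 'other{{deu icon}} tail' | to_ref
```

Only the part from {{ up to icon is rewritten, so the closing }} and any surrounding text pass through unchanged.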

Related

sed character to leave a match untouched

If I have
123456red100green
123456bee010yellow
123456usb110orange
123456sos011querty
123456let101bottle
and I want it to be
123456red111green
123456bee111yellow
123456usb111orange
123456sos111querty
123456let111bottle
Note: the first 6 characters don't change,
the following 6 vary from line to line,
and these strings might be anywhere in a file (beginning, end, anywhere).
I want to tell sed to
1) find 123456
2) skip the next three characters
3) replace the next three with 111
The closest I've come to is:
sed '/s/123456....../123456...111/g'
I know dots mean "any character", but I don't know the equivalent on the replacement side. In short: how do I tell sed to leave characters in a match untouched?
Sorry for having been unclear about what I want; please bear with me.

Matching 123456 followed by three characters that are not to be modified, and then replacing the next three characters with 111:
sed 's/\(123456...\).../\1111/g' file
The \( ... \) captures the part of the string that we don't want to modify. These are re-inserted with \1. The whole matching bit of the line is replaced by "the bit in the \( ... \) (i.e. \1) followed by 111".

If you want to change each and every zero (as in your examples), then just sed 's/0/1/g' would do. Or sed -e '/^123456/ s/0/1/g' to do the same on lines starting with 123456.
But to count characters, as you ask, use ( .. ) to capture the varying parts and \1 to replace them (using sed -E). So:
echo 123456abcdefgh | sed -Ee 's/^(123456...).../\1111/'
outputs 123456abc111gh. The \1 puts back the part matched by 123456..., the next three ones are literal characters.
(Without -E, you'd need \( .. \) to group.)
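
As a quick check of the capture-group approach, the sketch below runs the unanchored command on two sample lines, one of them with the pattern in the middle of the line. The function name fix_codes and the second sample line are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# \1 re-inserts "123456" plus the three characters after it;
# the three characters after that are replaced by the literal 111.
fix_codes() {
    sed 's/\(123456...\).../\1111/g'
}

printf '%s\n' '123456red100green' 'abc 123456bee010yellow xyz' | fix_codes
```

A backreference in sed is a single digit, so \1111 reads as \1 followed by the text 111.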

How to use sed to change the first and second columns of a text file to uppercase

I have text file input.txt which has
april,december,month.gmail.com
lion,tiger,animal.gmail.com
Is there a way to change the first and second columns to uppercase using sed?

With GNU sed:
sed 's/^[a-z]*,[a-z]*,/\U&/' file
s: the substitute command
[a-z]*,: matches zero or more lowercase letters followed by a comma. The pattern is repeated for the second field
\U: turns the replacement text to uppercase
&: refers to the whole matched string, so \U& uppercases the match
Or, if there are only three comma-separated fields:
sed 's/^[a-z].*,/\U&/' file
output:
APRIL,DECEMBER,month.gmail.com
LION,TIGER,animal.gmail.com
As @Sundeep suggests, the second sed command can be shortened to:
s/^.*,/\U&/
which converts all characters up to and including the last comma.
For more on GNU sed substitution command, see this article
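
Since \U is specific to GNU sed, a sed without it needs another route. The sketch below is one possible alternative that swaps in awk's toupper function; it is not part of the answer above, and the function name upper_first_two is made up.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Uppercases fields 1 and 2 of comma-separated input using awk's
# toupper instead of GNU sed's \U.
upper_first_two() {
    awk 'BEGIN { FS = OFS = "," } { $1 = toupper($1); $2 = toupper($2); print }'
}

printf '%s\n' 'april,december,month.gmail.com' 'lion,tiger,animal.gmail.com' | upper_first_two
```

Unlike the [a-z]* pattern, this uppercases whatever the first two fields contain, and a line with fewer than two fields gains an empty second field, so it assumes input shaped like the sample.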

sed pattern negation with a comma separated line

I have a text file full of lines looking like:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
I am trying to change all of the commas , to pipes |, except for the commas within the quotes.
Trying to use sed (which I am new to)... and it is not working. Using:
sed '/".*"/!s/\,/|/g' textfile.csv
Any thoughts?

As a test case, consider this file:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
"x,y,z",foo,"a,b,c",foo,"yes,no",foo
Here is a sed command to replace non-quoted commas with pipe symbols:
$ sed -r ':a; s/^([^"]*("[^"]*"[^"]*)*),/\1|/g; t a' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
"x,y,z"|foo|"a,b,c"|foo|"yes,no"|foo
Explanation
This looks for commas that appear after pairs of double quotes and replaces them with pipe symbols.
:a
This defines a label a.
s/^([^"]*("[^"]*"[^"]*)*),/\1|/g
If 0, 2, 4, or any other even number of quotes precede a comma on the line, then replace that comma with a pipe symbol.
^
This matches at the start of the line.
(
This starts the main grouping (\1).
[^"]*
This looks for zero or more non-quote characters.
("[^"]*"[^"]*)*
The * outside the parens means that we are looking for zero or more of the pattern inside the parens. The pattern inside the parens consists of a quote, any number of non-quotes, a quote and then any number of non-quotes.
In other words, this grouping only matches pairs of quotes. Because of the * outside the parens, it can match any even number of quotes.
)
This closes the main grouping
,
This requires that the grouping be followed by a comma.
t a
If the previous s command successfully made a substitution, then the test command tells sed to jump back to label a and try again.
If no substitution was made, then we are done.
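
The role of the loop can be seen by comparing a single pass with the looped version. This is a sketch under a few assumptions: the function names one_pass and all_passes and the sample line are invented, -E is used as the more widely accepted spelling of -r, and the commands are split across lines rather than joined with semicolons so the label is not GNU-specific.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# A single substitution: because of the ^ anchor the pattern can match
# only once per pass, so only one unquoted comma is replaced.
one_pass() {
    sed -E 's/^([^"]*("[^"]*"[^"]*)*),/\1|/'
}

# The same substitution inside a loop: "t a" jumps back to label a
# after every successful replacement until none is left.
all_passes() {
    sed -E ':a
s/^([^"]*("[^"]*"[^"]*)*),/\1|/
t a'
}

line='foo,"x,y",bar,baz'
printf '%s\n' "$line" | one_pass
printf '%s\n' "$line" | all_passes   # foo|"x,y"|bar|baz
```

Each pass converts the last remaining unquoted comma, since the grouping matches as much of the line as it can before the final comma.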

Using awk could be easier:
kent$ cat f
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
kent$ awk -F'"' -v OFS='"' '{for(i=1;i<=NF;i++)if(i%2)gsub(",","|",$i)}7' f
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive

I suggest a language with a proper CSV parser. For example:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
Female|$0 to $25,000|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive

Here I would use GNU awk's FPAT. It defines what a field looks like, whereas FS defines what the separator is. Then you can just set the output separator to |:
awk '{$1=$1}1' OFS=\| FPAT="([^,]+)|(\"[^\"]+\")" file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
If your awk does not support FPAT, this can be used:
awk -F, '{c=0; for (i=1;i<NF;i++) {c+=gsub(/\"/,"&",$i);printf "%s"(c%2?FS:"|"),$i}print $NF}' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive

sed 's/"\(.*\),\(.*\)"/"\1##HOLD##\2"/g;s/,/|/g;s/##HOLD##/,/g'
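This matches the text in quotes and puts a placeholder in place of the comma, then switches all the other commas to pipes and turns the placeholder back into a comma. You can change the ##HOLD## text to whatever you want. Note that because .* is greedy, this only handles lines with a single quoted field containing a single comma.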

how to use sed/awk to remove words with multiple pattern count

I have a file of string records where one of the fields - delimited by "," - can contain one or more "-" inside it.
The goal is to delete the field value if it contains more than two "-".
I am trying to recoup my past knowledge of sed/awk but can't make much headway.
==========
info,whitepaper,Data-Centers,yes-the-6-top-problems-in-your-data-center-lane
info,whitepaper,Data-Centers,the-evolution-center
info,whitepaper,Data-Centers,the-evolution-of-lan-technology-lanner
==========
expected outcome:
info,whitepaper,Data-Centers
info,whitepaper,Data-Centers,the-evolution-center
info,whitepaper,Data-Centers
thanks

Try
sed -r 's/(^|,)([^,-]+-){3,}[^,]+(,|$)/\3/g'
or if you're into slashes
sed 's/\(^\|,\)\([^,-]\+-\)\{3,\}[^,]\+\(,\|$\)/\3/g'
Explanation:
I'm using the most basic sed command: substitution. The syntax is: s/pattern/replacement/flags.
Here pattern is (^|,)([^,-]+-){3,}[^,]+(,|$), replacement is \3, flags is g.
The g flag means global replacement (all matching parts are replaced, not only the first in line).
In pattern:
parentheses () create a group, somewhat like in math. They also let you refer to the group by number later.
^ and $ mean beginning and end of the string.
| means "or", so (^|,) means "comma or beginning of the string".
square brackets [] mean a character class, and ^ inside means negation. So [^,-] means "anything but a comma or a hyphen". Note that the hyphen usually has a special meaning in character classes: [a-z] means all lowercase letters. But here it's just a hyphen because it's not in the middle.
+ after an expression means "match it 1 or more times" (like * means match it 0 or more times).
{N} means "match it exactly N times". {N,M} is "from N to M times". {3,} means "three times or more". + is equivalent to {1,}.
So this is it. The replacement is just \3. This refers to the third group in (), in this case (,|$). This will be the only thing left after the substitution.
P.S. The -r option just changes which characters need to be escaped: without it, all of ()+{}| are treated as regular characters unless you escape them with \. Conversely, to match a literal ( with the -r option you'll need to escape it.
P.P.S. Here's a reference for sed. man sed is your friend as well.
Let me know if you have further questions.
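
The extended-regex version can be checked against the three records from the question. A minimal sketch, assuming a sed that accepts -E (used here in place of -r); the function name drop_hyphenated is made up.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Deletes any comma-separated field containing three or more hyphens,
# keeping only the delimiter captured by the third group.
drop_hyphenated() {
    sed -E 's/(^|,)([^,-]+-){3,}[^,]+(,|$)/\3/g'
}

drop_hyphenated <<'EOF'
info,whitepaper,Data-Centers,yes-the-6-top-problems-in-your-data-center-lane
info,whitepaper,Data-Centers,the-evolution-center
info,whitepaper,Data-Centers,the-evolution-of-lan-technology-lanner
EOF
```

The second record keeps its last field because it has only two hyphens. One thing to watch: if the first field of a line matches, \3 is the comma after it, so the line is left starting with a comma.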

You could try perl instead of sed or awk:
perl -F, -lane 'print join ",", grep { !/-.*-.*-/ } @F' < file.txt

This might work for you:
sed 's/,\{,1\}[^,-]*\(-[^,]*\)\{3,\}//g' file

sed 's/\(^\|,\)\([^,]*-\)\{3\}[^,]*\(,\|$\)//g'
This should work in more cases:
sed 's/,$/\n/g;s/\(^\|,\|\n\)\([^,\n]*-\)\{3\}[^,\n]*\(,\|\n\|$\)/\3/g;s/,$//;s/\n/,/g'

Basic SED question for Linux

Temp file has only the number 22.5 in it.
I use
sed 's/.//' Temp
and I expect 225 but get 2.5
Why?

The dot is a special character meaning "match any character".
$ sed s/\\.// temp
225
You might think that you could do sed s/\.// temp, but the shell treats the unquoted backslash as its own escape character, strips it, and passes s/.// to sed. So you need two backslashes to pass a literal backslash to sed, which will then treat \. as a literal dot. Or you could quote the command to retain the backslash:
$ sed "s/\.//" temp
225
The reason you get 2.5 when you do s/.// is that the dot matches the first character in the file and removes it.
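
The difference between the spellings is easy to see side by side. In this sketch the variable name temp_value is made up, and the bracket expression [.] is an extra spelling not used in the answer above; inside [...] a dot is literal, so no backslash is involved at all.

```shell
#!/bin/sh
# Stand-in for the contents of the Temp file (variable name invented).
temp_value='22.5'

# Unescaped dot: matches any character, so the leading "2" is removed.
printf '%s\n' "$temp_value" | sed 's/.//'      # 2.5

# Escaped dot in single quotes: sed receives \. and removes the dot.
printf '%s\n' "$temp_value" | sed 's/\.//'     # 225

# Bracket expression: the dot is literal inside [...].
printf '%s\n' "$temp_value" | sed 's/[.]//'    # 225
```

Single-quoting the sed script sidesteps the shell's own backslash handling, which is why the single backslash is enough here.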

Because '.' is a regular expression that matches any character. You want 's/\.//'

. is a wildcard character for any character, so the first character is replaced by nothing, then sed is done.
You want sed 's/\.//' Temp. The backslash is used to escape special characters so that they regain their face value.

'.' is special: it matches any single character. So in your case, the sed expression matches the first character on the line. Try escaping it like this:
s/\.//

You can also use awk; the dot needs escaping there too, since awk treats the pattern as a regular expression:
awk '{sub(/\./,"")}1' temp

We Keep Coding
