Cannot use sed with regex via script - sed

I have the following .sed script:
# replace female-male with F-M
s/female/F/
s/male/M/
# capitalize the name when the sport is volleyball or taekwondo
s/^([^,]*,)([^,]+)((,[^,]*){5},(volleyball|taekwondo),)/\1\U\2\L\3/
And the following csv file (first 10 lines)
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
736041664,A Jesus Garcia,ESP,male,1969-10-17,1.72,64,athletics,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,handball,0,0,0,
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
33922579,Aaron Gate,NZL,male,1990-11-26,1.81,,cycling,0,0,0,
173071782,Aaron Royle,AUS,male,1990-01-26,1.80,67,triathlon,0,0,0,
266237702,Aaron Russell,USA,male,1993-06-04,,98,volleyball,0,0,1,
382571888,Aaron Younger,AUS,male,1991-09-25,1.93,100,football,0,0,0,
87689776,Aauri Lorena Bokesa,ESP,female,1988-12-14,1.80,62,athletics,0,0,0,
The output must be done by the following command
sed -f script.sed ./file.csv
The problem I have is that despite making sure the regex is matching all the pertinent lines, I can only get it to replace the female-male values with F-M, the rest of the file is still the exact same. The names are not being capitalized.
If I run each regex directly (i.e. 'sed -E 's/^([^,]*,)([^,]+)((,[^,]*){5},(volleyball|taekwondo),)/\1\U\2\L\3/' file.csv') it works. But I need to do it via script, and with -f.
What am I missing? Thank you.

You still need to indicate that you're using extended regular expressions:
sed -Ef script.sed file.csv
Otherwise, sed uses basic regular expressions, where escaping rules are different, specifically for () for capture groups, and {} for counts.
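A minimal sketch of the fix, assuming GNU sed (the \U and \E case-conversion escapes are GNU extensions) and hypothetical temporary file names. It uses \E to end the upper-casing instead of the \L from the question, because \L would lower-case the fields captured in \3:

```shell
# Hypothetical temp files; assumes GNU sed for \U and \E.
tmp=$(mktemp -d)

cat > "$tmp/script.sed" <<'EOF'
s/female/F/
s/male/M/
s/^([^,]*,)([^,]+)((,[^,]*){5},(volleyball|taekwondo),)/\1\U\2\E\3/
EOF

cat > "$tmp/file.csv" <<'EOF'
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
EOF

# -E applies to every expression in the script file loaded with -f.
result=$(sed -E -f "$tmp/script.sed" "$tmp/file.csv")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the taekwondo row should come back with its name in upper case; both rows should have male replaced by M.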

Have you tried using sed -Ef <script> <csv file>? You need -E to use extended regular expressions.

Related

Remove a specific word from a file using shell script

I would request some help with a basic shell script that should do the following job.
Find a particular word in a given file (the file path is always constant)
Backup the file
Delete the specific word or replace the word with ;
Save the file changes
Example
File Name - abc.cfg
Contains the following lines
network;private;Temp;Windows;System32
I've used the following SED command for the operation
sed -i -e "/Temp;/d" abc.cfg
The output is not as expected. The complete line is removed instead of just the word Temp;
Any help would be appreciated. Thank you
sed matches against lines, and d is the delete command, which is why the whole line gets deleted. Instead, use substitution to replace the offending word with nothing:
sed 's/Temp;//g' abc.cfg
The g flag means "global", in case the offending word appears more than once on a line. In general, I would hold off on the -i (in-place) flag until you are sure of your command, or use -i.backup to keep a copy of the original file.
Thank you. I used your suggestion but couldn't get through. I appreciate the input though.
I was able to achieve this using the following SED syntax
sed -e "s/Temp//g" -i.backup abc.cfg
I wanted to take the backup before the change & hence -i was helpful.
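The difference between the two commands, and the backup behaviour of -i with a suffix, can be sketched as follows. The temporary directory is an assumption for illustration; the file name and content are taken from the question:

```shell
dir=$(mktemp -d)
cfg="$dir/abc.cfg"
printf '%s\n' 'network;private;Temp;Windows;System32' > "$cfg"

# Preview without touching the file:
# d drops the entire matching line, s///g removes only the matched text.
deleted=$(sed '/Temp;/d' "$cfg")
preview=$(sed 's/Temp;//g' "$cfg")

# Apply in place, keeping the original as abc.cfg.backup.
sed -i.backup 's/Temp;//g' "$cfg"
edited=$(cat "$cfg")
backup=$(cat "$cfg.backup")

printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"
rm -rf "$dir"
```

Writing the suffix directly after -i, with no space, is the form accepted by both GNU and BSD sed.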

Trouble using sed to replace contents of .coveralls.yml configuration in Travis-CI

I'm trying to take an environment variable in travis-ci and replace the contents of a file at runtime using sed.
The file in question contains:
service_name: travis-ci
repo_token: COVERALLS_TOKEN
On an Ubuntu system, running sed -i 's/COVERALLS_TOKEN/ASDF/g' .coveralls.yml on the command line works, but carrying that over to the travis-ci configuration as something like sed -i 's/COVERALLS_TOKEN/$COVERALLS_TOKEN/g' .coveralls.yml doesn't pull in the environment variable.
What really throws me off is that I have a project today where the .travis.yml entry below works, but adapting it to this circumstance doesn't.
Original implementation, still works today
sed -ri 's/^MY_ENV_VAR=/MY_ENV_VAR='$MY_ENV_VAR'/' .env
Adaptation (doesn't work)
sed -ri 's/^COVERALLS_TOKEN/$COVERALLS_TOKEN/' .coveralls.yml
You have two problems with your command. First, the ^ means it will only match COVERALLS_TOKEN where it occurs at the very beginning of a line. Since it's not at the beginning of a line in your YAML file, there is no match and the sed command does nothing.
Second, there is no variable substitution inside single quotation marks.
So remove the ^ and use double quotes instead of single ones:
sed -ri "s/COVERALLS_TOKEN/$COVERALLS_TOKEN/" .coveralls.yml
Some notes:
The variable $COVERALLS_TOKEN must be set in the shell at the time that you run the sed command.
The substitution will fail with a syntax error if the value of $COVERALLS_TOKEN contains the delimiter you use on the substitution command. The command above uses /, but you can change that if needed - just pick something that doesn't occur in the token string.
The token value will not be quoted in any way in the YAML file. Normally that's ok, but if there are any weird characters in the value, you will need to put quotes around it in the YAML by adding them to the replacement side of the substitution command as well:
sed -ri "s/COVERALLS_TOKEN/'$COVERALLS_TOKEN'/" .coveralls.yml
Single quotes suppress variable expansion, so close the single-quoted string before the variable and reopen it afterwards.
This works for me:
sed -ri 's,IMAGE_REPOSITORY,'"$IMAGE_REPO"',g' chart/values.yaml
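To make the quoting and delimiter points concrete, here is a small sketch. The token value abc/123 and the temporary file are made up for illustration; the slash is there to show why a different delimiter can be needed:

```shell
COVERALLS_TOKEN='abc/123'   # hypothetical value containing a slash
yml=$(mktemp)
printf '%s\n' 'service_name: travis-ci' 'repo_token: COVERALLS_TOKEN' > "$yml"

# Single quotes: the shell does not expand the variable, sed sees the literal text.
single=$(sed 's/COVERALLS_TOKEN/$COVERALLS_TOKEN/' "$yml" | tail -n 1)

# Double quotes: the shell expands the variable before sed runs.
# | is used as the delimiter because the value contains /.
double=$(sed "s|COVERALLS_TOKEN|$COVERALLS_TOKEN|" "$yml" | tail -n 1)

printf '%s\n%s\n' "$single" "$double"
rm -f "$yml"
```

A value containing the chosen delimiter, & or a backslash would still need escaping before being placed in the replacement.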

Replace special characters with sed

I am trying to use a sed command to replace special characters in my file.
The character sequence %> should be replaced by ].
I'm using sed -r s/\%>\/\]\/g but I get this error: bash: /]/g: No such file or directory. It looks like sed doesn't like it.
Put your sed code inside quotes and also add the file-path you want to work with and finally don't escape the sed delimiters.
$ echo '%>' | sed 's/%>/]/g'
]
ie,
sed 's/%>/]/g' file
To complement Avinash Raj's correct and helpful answer:
Since you were using an overall unquoted string (neither single- nor double-quoted), you were on the right track by \-escaping individual characters in your sed command.
However, you neglected to \-quote >, which is what caused your problem:
> is one of the shell's so-called metacharacters
Metacharacters have special meaning and separate words
Thus, s/\%>\/\]\/g is mistakenly split into 2 arguments by >:
s/\% is passed to sed - as s/%, because the shell removes the \ instances (a process called quote removal).
As you can see, this is not a valid sed command, but that doesn't even come into play - see below.
>\/\]\/g is interpreted by the shell (bash), because it starts with output-redirection operator >; after quote removal, the shell sees >/]/g, tries to open file /]/g for writing, and fails, because your system doesn't have a subdirectory named ] in its root directory.
bash tries to open an output file specified by a redirection before running the command and, if it fails to open the file, does not run the command - which is what happened here:
bash complained about the nonexistent target directory and aborted processing of the command - sed was never even invoked.
Upshot:
In a string that is neither enclosed in single nor in double-quotes, you must \-quote:
all metacharacters: | & ; ( ) < > space tab
additionally, to prevent accidental pathname expansion (globbing): * ? [
Also note that if you need to quote (escape) characters for sed, you need to add an extra layer of quoting; for instance, to instruct sed to use a literal . in the regex, you must pass \\. - two backslashes - so that sed sees the properly escaped \..
Given the above, it is much simpler to (habitually) use single quotes around your sed command, because it ensures that the string is passed as is to sed.
Let's compare a working version of your command to the one from Avinash Raj's answer (leaving out the -r for brevity):
sed s/\%\>\/\]\/g # ok - all metachars. \-quoted, others are, but needn't be quoted
sed s/%\>/]/g # ok - minimum \-quoting
sed 's/%>/]/g' # simplest: single-quoted command
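As a quick check of the three spellings above, the following sketch feeds each of them the same made-up input line. It also shows the extra backslash needed for a literal dot when the sed command is left unquoted:

```shell
input='a %> b'

# All three spellings hand sed the same script: s/%>/]/g
full=$(printf '%s\n' "$input" | sed s/\%\>\/\]\/g)
minimal=$(printf '%s\n' "$input" | sed s/%\>/]/g)
quoted=$(printf '%s\n' "$input" | sed 's/%>/]/g')

# Unquoted, a literal dot for sed needs two backslashes: the shell removes one.
dot=$(printf '%s\n' 'a.b' | sed s/\\./-/)

printf '%s\n' "$full" "$minimal" "$quoted" "$dot"
```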
I'm not sure whether I got the question correctly. If you want to replace either % or > by ] then sed is not required here. Use tr in this case:
tr '%>' ']' < input.txt
If you want to replace the sequence %> by ] then the sed command as shown by @AvinashRaj is the way to go.
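The two readings of the question give different results, as this sketch with a made-up sample line shows. The second tr set is written out as ]] so the example does not depend on how a particular tr pads a shorter set:

```shell
sample='50% > 40 %> done'

# tr works on single characters: every % and every > becomes ].
each=$(printf '%s\n' "$sample" | tr '%>' ']]')

# sed works on the sequence: only the adjacent pair %> becomes ].
pair=$(printf '%s\n' "$sample" | sed 's/%>/]/g')

printf '%s\n%s\n' "$each" "$pair"
```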

sed: matching unicode blocks with

I am desperately trying to replace certain unicode characters (graphemes) from a file using sed. However I keep failing for some of them, namely the ones from unicode blocks:
\p{InHigh_Surrogates}: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
I tried (in a sed config file loaded via the -f switch):
s/\p{InHigh_Surrogates}/###/ --> no effect at all
s/\\p\{InHigh_Surrogates\}/###_D-NON-UTF8_###/ -> error message 'Invalid content of \{\}'
Anybody got a suggestion? Also, I am not necessarily focused on using the blocks - but I also failed trying to define a character range of the form \xd800-\xdfff.
Thanks,
Thomas
Try using the -r flag for sed:
$ sed -r 's/\\p\{InHigh_Surrogates\}/###/g' file
###: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
From man sed:
-r, --regexp-extended
use extended regular expressions in the script.

sed command to extract from xml

I'm using my mac terminal to do a script, it basically does:
wget http://p2.edms-pr.ccomrcdn.com/player/player_dispatcher.html?section=radio&action=listen_live
This file returns an XML which I can save to txt or XML, I'm saving it as "url.xml"
<PlayerContent>
<ListenLiveInitialize>
<StreamInfo>
<stream id="4694" primary_location="rtmp://cp58082.live.edgefcs.net/live/COR_5103_OR#s5137?auth=daEaIcRcbb.afahbOdwbWdjdYcEdYaOaDdc-bn7nM7-4q-PN0X1_3nqDHom4EBvmEuwr&aifp=1234&CHANNELID=4694&CPROG=_&MARKET=PREMIERE&REQUESTOR=EDMS-PR&SERVER_NAME=p2.edms-pr.ccomrcdn.com&SITE_ID=13293&STATION_ID=EDMS-PR&MNM=_&TYPEOFPLAY=0" backup_location=""/>
</StreamInfo>
<JustPlayed/>
I want to use sed to return the auth code inside "primary_location". So basically I want to store
daEaIcRcbb.afahbOdwbWdjdYcEdYaOaDdc-bn7nM7-4q-PN0X1_3nqDHom4EBvmEuwr
on a variable.
I found this online but it doesn't seem to be working.
sed -n 's/.*\(auth=......................................... ...........................\).*/\1/p' url.xml
Try
sed -n 's|^<stream.*auth\=\(.*\)\&ai.*|\1|p' url.xml
which reads the file and matches the line up to the = before the auth code, stores everything from there up to the & in &ai as \1 which is then substituted for the whole pattern space.
You have a stray space in the middle of your run of . characters!
This is neater and will output auth= with the value (it looks like a string of alphanumerics with dots, hyphens and underscores):
% grep -o 'auth=[[:alnum:]._-]\+' url.xml
You could even use it like so:
% eval $(grep -o 'auth=[[:alnum:]._-]\+' url.xml)
% echo ${auth}
daEaIcRcbb.afahbOdwbWdjdYcEdYaOaDdc-bn7nM7-4q-PN0X1_3nqDHom4EBvmEuwr
Works on OSX.
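If the value should go straight into a variable without eval, one possible sketch is to capture everything after auth= up to the next & or double quote. The shortened URL and auth value below are placeholders, not the real data from the question:

```shell
xml=$(mktemp)
cat > "$xml" <<'EOF'
<stream id="4694" primary_location="rtmp://example.invalid/live?auth=daEaIc.Rcbb-7nM_3nq&aifp=1234&CHANNELID=4694" backup_location=""/>
EOF

# -n plus the p flag prints only lines where the substitution matched.
# [^&"]* stops at the first & or " after auth=.
auth=$(sed -n 's/.*auth=\([^&"]*\).*/\1/p' "$xml")

printf '%s\n' "$auth"
rm -f "$xml"
```

This avoids hard-coding the length of the value, which is what made the original run of dots fragile.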