sed: matching unicode blocks

I am desperately trying to replace certain unicode characters (graphemes) in a file using sed. However, I keep failing for some of them, namely the ones from these unicode blocks:
\p{InHigh_Surrogates}: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
I tried (in a sed config file loaded via the -f switch):
s/\p{InHigh_Surrogates}/###/ --> no effect at all
s/\\p\{InHigh_Surrogates\}/###_D-NON-UTF8_###/ -> error message 'Invalid content of \{\}'
Anybody got a suggestion? Also, I am not necessarily focused on using the blocks - but I also failed trying to define a character range of the form \xd800-\xdfff.
Thanks,
Thomas

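Try using the -r flag for sed. Note that this matches the literal text \p{InHigh_Surrogates} in the input; sed's POSIX regular expressions have no \p{...} Unicode property classes: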
$ sed -r 's/\\p\{InHigh_Surrogates\}/###/g' file
###: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
From man sed:
-r, --regexp-extended
use extended regular expressions in the script.
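Since sed has no \p{...} classes, one possible workaround is to match the encoded bytes directly. This is only a sketch under stated assumptions: it assumes the surrogates appear in the file as three-byte sequences (0xED, then 0xA0-0xBF, then 0x80-0xBF, which is what U+D800–U+DFFF look like when encoded as if they were ordinary code points), and that sed runs in the C locale so that bracket ranges compare raw bytes. The function name replace_surrogates and the b_* variables are made up for illustration.

```shell
# Build the raw bytes with printf so nothing depends on sed-specific \xHH escapes.
b_ed=$(printf '\355')   # 0xED: lead byte of every encoded surrogate
b_a0=$(printf '\240')   # 0xA0: lowest possible second byte
b_bf=$(printf '\277')   # 0xBF: highest continuation byte
b_80=$(printf '\200')   # 0x80: lowest continuation byte

# Hypothetical helper: replace each encoded surrogate on stdin with ###.
replace_surrogates() {
  LC_ALL=C sed "s/${b_ed}[${b_a0}-${b_bf}][${b_80}-${b_bf}]/###/g"
}

# Sample input: the bytes ED A0 80 (an encoded U+D800) between "a" and "b".
printf 'a\355\240\200b\n' | replace_surrogates
```

Valid characters such as é (bytes C3 A9) do not start with 0xED and are left alone, and U+D7FF (ED 9F BF) falls just below the matched range.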

Related

Cannot use sed with regex via script

I have the following .sed script:
# replace female-male with F-M
s/female/F/
s/male/M/
# capitalize the name when the sport is volleyball or taekwondo
s/^([^,]*,)([^,]+)((,[^,]*){5},(volleyball|taekwondo),)/\1\U\2\L\3/
And the following csv file (first 10 lines)
id,name,nationality,sex,date_of_birth,height,weight,sport,gold,silver,bronze,info
736041664,A Jesus Garcia,ESP,male,1969-10-17,1.72,64,athletics,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,handball,0,0,0,
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
33922579,Aaron Gate,NZL,male,1990-11-26,1.81,,cycling,0,0,0,
173071782,Aaron Royle,AUS,male,1990-01-26,1.80,67,triathlon,0,0,0,
266237702,Aaron Russell,USA,male,1993-06-04,,98,volleyball,0,0,1,
382571888,Aaron Younger,AUS,male,1991-09-25,1.93,100,football,0,0,0,
87689776,Aauri Lorena Bokesa,ESP,female,1988-12-14,1.80,62,athletics,0,0,0,
The output must be done by the following command
sed -f script.sed ./file.csv
The problem I have is that despite making sure the regex is matching all the pertinent lines, I can only get it to replace the female-male values with F-M, the rest of the file is still the exact same. The names are not being capitalized.
If I run each regex directly (i.e. sed -E 's/^([^,]*,)([^,]+)((,[^,]*){5},(volleyball|taekwondo),)/\1\U\2\L\3/' file.csv) it works. But I need to do it via script, and with -f.
What am I missing? Thank you.
You still need to indicate that you're using extended regular expressions:
sed -Ef script.sed file.csv
Otherwise, sed uses basic regular expressions, where escaping rules are different, specifically for () for capture groups, and {} for counts.
Have you tried using sed -Ef <script> <csv file>? You need -E to use extended regular expressions.
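To see the two answers in action, here is a small self-contained sketch that writes a script file and a three-line sample of the CSV into a temporary directory and runs sed with both -E and -f. It assumes GNU sed, since \U is a GNU extension. One deliberate deviation from the question's script: \E (stop case conversion) is used in place of \L, so that the fields after the name keep their original case. The temporary paths are made up for the example.

```shell
workdir=$(mktemp -d)

# The sed script; the quoted 'EOF' keeps the shell from touching the backslashes.
cat > "$workdir/script.sed" <<'EOF'
# replace female-male with F-M
s/female/F/
s/male/M/
# capitalize the name when the sport is volleyball or taekwondo
s/^([^,]*,)([^,]+)((,[^,]*){5},(volleyball|taekwondo),)/\1\U\2\E\3/
EOF

# A few rows from the question's CSV file.
cat > "$workdir/file.csv" <<'EOF'
435962603,Aaron Brown,CAN,male,1992-05-27,1.98,79,athletics,0,0,1,
521041435,Aaron Cook,MDA,male,1991-01-02,1.83,80,taekwondo,0,0,0,
532037425,A Lam Shin,KOR,female,1986-09-23,1.68,56,handball,0,0,0,
EOF

# -E switches the script file to extended regular expressions.
result=$(sed -E -f "$workdir/script.sed" "$workdir/file.csv")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Only the taekwondo row should come back with an upper-case name; the other rows only have the sex column shortened.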

Remove a specific word from a file using shell script

I would request some help with a basic shell script that should do the following job.
Find a particular word in a given file (file path is always constant)
Backup the file
Delete the specific word or replace the word with ;
Save the file changes
Example
File Name - abc.cfg
Contains the following lines
network;private;Temp;Windows;System32
I've used the following SED command for the operation
sed -i -e "/Temp;/d" abc.cfg
The output is not as expected. The complete line is removed instead of just the word Temp;
Any help would be appreciated. Thank you
sed matches against lines, and d is the delete command, which removes the whole matching line; that is why you get a deleted line. Instead, use substitution to replace the offending word with nothing:
sed 's/Temp;//g' abc.cfg
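The g flag means "global": it replaces every occurrence on a line, in case the offending word appears more than once. In general, I would hold off on the -i (in-place) flag until you are sure of your command, or use -i.backup to keep a copy of the original (BSD sed also accepts the suffix as a separate argument: -i .backup).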
Thank you. I used your suggestion but couldn't get through. I appreciate the input though.
I was able to achieve this using the following SED syntax
sed -e "s/Temp//g" -i.backup abc.cfg
I wanted to take the backup before the change & hence -i was helpful.
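For reference, a minimal sketch of the substitute-with-backup approach, run against a throwaway copy of the example line in a temporary directory (the directory is made up for the example; abc.cfg stands in for the real file). The suffix is attached directly to -i, a form that both GNU and BSD sed accept.

```shell
workdir=$(mktemp -d)
cfg="$workdir/abc.cfg"
printf 'network;private;Temp;Windows;System32\n' > "$cfg"

# Remove every "Temp;" in place; the untouched original is kept as abc.cfg.backup.
sed -i.backup -e 's/Temp;//g' "$cfg"

cat "$cfg"          # the edited file
cat "$cfg.backup"   # the original content
```

Matching Temp; (with the semicolon) rather than just Temp avoids leaving a doubled ;; behind.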

sed repetition-operator operand invalid *****

I have text files that contain ***** in some locations. I need to replace the ***** with 9.999. This obviously came from some formatting error, but I do not have the program that created the files I now have to work with. I tried using the following command in csh:
sed -i "" 's/*****/9.999/g' *.dat
However, as I expected, I get the following error message:
sed: 1: "s/*****/9.999/g": RE error: repetition-operator operand invalid
I'm assuming this is because ***** is considered a special operator or something like that, but I can't figure out how to exempt them while using the sed command.
Does anyone have a hint that could help?
sed -E 's/\*{5}/9.999/g' file
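Some background on why this works: an unescaped * is the "zero or more" repetition operator and needs something in front of it to repeat, which is what the error message complains about. Written as \*, it matches a literal asterisk. The sketch below shows both spellings on a made-up sample line; the variable names are only for illustration.

```shell
line='1.0 ***** 2.0'

# Basic regular expressions (the default): escape each asterisk.
bre=$(printf '%s\n' "$line" | sed 's/\*\*\*\*\*/9.999/g')

# Extended regular expressions (-E): one escaped asterisk with a repeat count.
ere=$(printf '%s\n' "$line" | sed -E 's/\*{5}/9.999/g')

printf '%s\n%s\n' "$bre" "$ere"
```

Once the output looks right, the in-place flag from the question can be added back to apply the change to the *.dat files.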

Replace special characters with sed

I am trying to use a sed command to replace special characters in my file.
The characters %> should be replaced by ].
I am using sed -r s/\%>\/\]\/g but I get this error: bash: /]/g: No such file or directory. It looks like sed doesn't like it.
Put your sed code inside quotes and also add the file-path you want to work with and finally don't escape the sed delimiters.
$ echo '%>' | sed 's/%>/]/g'
]
ie,
sed 's/%>/]/g' file
To complement Avinash Raj's correct and helpful answer:
Since you were using an overall unquoted string (neither single- nor double-quoted), you were on the right track by \-escaping individual characters in your sed command.
However, you neglected to \-quote >, which is what caused your problem:
> is one of the shell's so-called metacharacters
Metacharacters have special meaning and separate words
Thus, s/\%>\/\]\/g is mistakenly split into 2 arguments by >:
s/\% is passed to sed - as s/%, because the shell removes the \ instances (a process called quote removal).
As you can see, this is not a valid sed command, but that doesn't even come into play - see below.
>\/\]\/g is interpreted by the shell (bash), because it starts with output-redirection operator >; after quote removal, the shell sees >/]/g, tries to open file /]/g for writing, and fails, because your system doesn't have a subdirectory named ] in its root directory.
bash tries to open an output file specified by a redirection before running the command and, if it fails to open the file, does not run the command - which is what happened here:
bash complained about the nonexistent target directory and aborted processing of the command - sed was never even invoked.
Upshot:
In a string that is neither enclosed in single nor in double-quotes, you must \-quote:
all metacharacters: | & ; ( ) < > space tab
additionally, to prevent accidental pathname expansion (globbing): * ? [
Also note that if you need to quote (escape) characters for sed, you need to add an extra layer of quoting; for instance, to instruct sed to use a literal . in the regex, you must pass \\. - two backslashes - so that sed sees the properly escaped \..
Given the above, it is much simpler to (habitually) use single quotes around your sed command, because it ensures that the string is passed as is to sed.
Let's compare a working version of your command to the one from Avinash Raj's answer (leaving out the -r for brevity):
sed s/\%\>\/\]\/g # ok - all metachars. \-quoted, others are, but needn't be quoted
sed s/%\>/]/g # ok - minimum \-quoting
sed 's/%>/]/g' # simplest: single-quoted command
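One way to check what the shell actually hands to a command is to print the argument instead of running sed. The sketch below uses a made-up helper, show_arg, for that; it is only an illustration of quote removal, not part of any tool.

```shell
# Hypothetical helper: print the first argument exactly as the shell passes it on.
show_arg() { printf '%s\n' "$1"; }

show_arg s/\%\>\/\]\/g    # every special character backslash-quoted
show_arg s/%\>/]/g        # only the metacharacter > quoted
show_arg 's/%>/]/g'       # single-quoted
show_arg s/a\\.b/X/       # unquoted: two backslashes so that sed sees \.
```

The first three calls should print the same sed command, while the last one prints a single backslash before the dot.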
I'm not sure whether I got the question correctly. If you want to replace either % or > by ] then sed is not required here. Use tr in this case:
tr '%>' ']' < input.txt
If you want to replace the sequence %> by ] then the sed command as shown by @AvinashRaj is the way to go.
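The difference between the two readings shows up on a small made-up input. In this sketch the second tr set is written as ]] so that both sets have the same length, which avoids relying on how a particular tr pads a shorter second set; the variable names are only for illustration.

```shell
input='100%> done'

# tr maps single characters: % and > each become ].
per_char=$(printf '%s\n' "$input" | tr '%>' ']]')

# sed replaces the two-character sequence %> with one ].
per_seq=$(printf '%s\n' "$input" | sed 's/%>/]/g')

printf '%s\n%s\n' "$per_char" "$per_seq"
```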

Unable to use SED to edit files fast

The file is initially
$ cat so/app.yaml
application: SO
...
I run the following command. I get an empty file.
$ sed s/SO/so/ so/app.yaml > so/app.yaml
$ cat so/app.yaml
$
How can I use sed to edit the file without ending up with an empty file?
$ sed -i -e's/SO/so/' so/app.yaml
The -i means in-place.
The shell opens the output file named by > (truncating it) while it sets up the command, i.e. before the command executes. Thus, the input file is truncated before sed ever reads it. This is a problem with all shell redirection, not just with sed.
Sheldon Young's answer shows how to use in-place editing.
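The truncation is easy to reproduce on a throwaway file. This sketch recreates the one-line app.yaml from the question in a temporary directory (the path is made up for the example) and then redirects sed's output onto its own input.

```shell
workdir=$(mktemp -d)
f="$workdir/app.yaml"
printf 'application: SO\n' > "$f"

# The shell truncates $f before sed starts, so sed reads an empty file.
sed 's/SO/so/' "$f" > "$f"

# Byte count of the file afterwards.
wc -c < "$f"
```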
You are using the wrong tool for the job. sed is a stream editor (that's why it's called sed), so it's for in-flight editing of streams in a pipe. ed OTOH is a file editor, which can do everything sed can do, except it works on files instead of streams. (Actually, it's the other way round: ed is the original utility and sed is a clone that avoids having to create temporary files for streams.)
ed works very much like sed (because sed is just a clone), but with one important difference: you can move around in files, but you can't move around in streams. So, all commands in ed take an address parameter that tells ed, where in the file to apply the command. In your case, you want to apply the command everywhere in the file, so the address parameter is just , because a,b means "from line a to line b" and the default for a is 1 (beginning-of-file) and the default for b is $ (end-of-file), so leaving them both out means "from beginning-of-file to end-of-file". Then comes the s (for substitute) and the rest looks much like sed.
So, your sed command s/SO/so/ turns into the ed command ,s/SO/so/.
And, again because ed is a file editor, and more precisely, an interactive file editor, we also need to write (w) the file and quit (q) the editor.
This is how it looks in its entirety:
ed -- so/app.yaml <<-HERE
,s/SO/so/
w
q
HERE
See also my answer to a similar question.
What happens in your case, is that executing a pipeline is a two-stage process: first construct the pipeline, then run it. > means "open the file, truncate it, and connect it to filedescriptor 1 (stdout)". Only then is the pipe actually run, i.e. sed is executed, but at this time, the file has already been truncated.
Some versions of sed also have a -i parameter for in-place editing of files, which makes sed behave a little more like ed, but I would not rely on it: first of all, it doesn't support all the features of ed, but more importantly, it is a non-standard extension (it is not specified by POSIX) that is missing or behaves differently on some non-GNU systems. It's been a while since I used a non-GNU system, but last I used one, neither Solaris nor OpenBSD nor HP-UX nor IBM AIX sed supported the -i parameter.
I believe that redirecting output into the same file you are editing is causing your problem.
You need to redirect standard output to a temporary file and, when sed is done, overwrite the original file with the temporary one.
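A minimal sketch of that temporary-file approach, again on a throwaway copy of the file from the question (the temporary directory and the tmp variable are made up for the example):

```shell
workdir=$(mktemp -d)
f="$workdir/app.yaml"
printf 'application: SO\n' > "$f"

# Write the edited text to a temporary file next to the original ...
tmp=$(mktemp "$workdir/app.yaml.XXXXXX")

# ... and replace the original only if sed succeeded.
sed 's/SO/so/' "$f" > "$tmp" && mv "$tmp" "$f"

cat "$f"
```

Because mv replaces the file, the result carries the temporary file's permissions, so those may need to be restored on a real file.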