sed replace from CSV: include the last character of the search term

I am trying to replace a list of words found in a CSV file with index markup (DocBook). The CSV is in this format:
testword[ -?],testword<indexterm><primary>testword</primary></indexterm>
This finds every occurrence of testword followed by a punctuation mark, and that part works. However, I need the final punctuation mark to be included in the replacement part of the sed command.
sed -e 's/\(.*\)/s,\1,g/' index.csv > index.sed
sed -i -f index.sed file.xml
For example, This is a testword, in a test.
should be replaced with This is a testword,<indexterm><primary>testword</primary></indexterm> in a test.

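The problem is the line in the CSV file that drives the process: the bracket expression matches the punctuation, but the replacement never writes it back, so it is lost.
Replacing:
testword[ -?],testword<indexterm><primary>testword</primary></indexterm>
with:
testword\([ -?]\),testword\1<indexterm><primary>testword</primary></indexterm>
already solves your problem: \( \) captures the trailing character and \1 puts it back right after testword.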
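A minimal end-to-end sketch of the corrected pipeline, assuming GNU sed (for -i). The scratch directory, the single-entry index.csv and the one-line file.xml are illustrative only; LC_ALL=C is set so that the range [ -?] is interpreted by byte value.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1
# Byte-value ranges: [ -?] means space (0x20) through '?' (0x3F).
export LC_ALL=C

# Corrected CSV entry: \( \) captures the trailing character, \1 writes it back.
cat > index.csv <<'EOF'
testword\([ -?]\),testword\1<indexterm><primary>testword</primary></indexterm>
EOF

printf '%s\n' 'This is a testword, in a test.' > file.xml

# Wrap each CSV line as "s,<pattern>,<replacement>,g": the CSV comma
# becomes the delimiter of the generated s command.
sed -e 's/\(.*\)/s,\1,g/' index.csv > index.sed
sed -i -f index.sed file.xml

cat file.xml
```

The comma delimiter works here because neither the pattern nor the replacement contains a literal comma, while the replacement does contain the / of </primary>.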

Related

Adding a space before each capital letter in a selected set of lines in a yaml file using sed

I want to write a regex for a shell script that matches only this kind of line in a YAML file (lines with the key summary:, e.g. summary: Example Summary):
summary: GetMembersSavedSearchesByMemberId
What I want to do is add a space before each uppercase letter, producing output like this:
summary: Get Members Saved Searches By Member Id
I tried this regex
matchregex="summary[:][[:space:]].\([A-Z]\)"
replacement="summary: .\1"
sed -e "s/${matchregex}/${replacement}/g"
It is not working. What is the correct way of writing this?
Would you please try the following:
sed -E '/^summary:/ s/([a-z])([A-Z])/\1 \2/g'
Result:
summary: Get Members Saved Searches By Member Id
This might work for you (GNU sed):
sed 's/\B[[:upper:]]/ &/g' file
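This inserts a space before every uppercase letter that is inside a word (i.e. not at a word boundary). It applies to every line of the file; prefix the command with the address /^summary:/ to limit it to summary lines.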
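A small sketch comparing both answers, assuming GNU sed (the second relies on \B). The file api.yml and its operationId line are hypothetical, added only to show that lines without summary: stay untouched; the second command is given the same /^summary:/ address so it can be compared like for like.

```shell
cd "$(mktemp -d)" || exit 1
# C locale so [a-z] and [A-Z] are plain ASCII ranges.
export LC_ALL=C

# Hypothetical sample: one summary line and one line that must not change.
cat > api.yml <<'EOF'
operationId: getMembers
summary: GetMembersSavedSearchesByMemberId
EOF

# Answer 1: on summary lines, put a space between each lowercase letter
# and the uppercase letter that follows it.
sed -E '/^summary:/ s/([a-z])([A-Z])/\1 \2/g' api.yml > out1.yml

# Answer 2 with the same address: a space before every uppercase letter
# that is not at a word boundary (\B is a GNU extension).
sed '/^summary:/ s/\B[[:upper:]]/ &/g' api.yml > out2.yml

cat out1.yml out2.yml
```

Both commands produce summary: Get Members Saved Searches By Member Id here. Without the address, the second command would also rewrite the first line as operation Id: get Members.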

Use sed for Mixed Case Tags

I'm trying to reformat tags in an XML file with GNU sed 4.7 on Windows 10 (shoot me). sed is on the PATH and is run from the Command Prompt, so some Windows command-line characters need to be escaped with ^.
sourcefile
BEGIN
...
<trn:description>V7906 03/11 ALFREDOCAMEL HATSWOOD 74564500125</trn:description>
...
END
(There are three spaces at the start of the line.)
Expected output:
BEGIN
...
<trn:description>V7906 03/11 Alfredocamel Hatswood 74564500125</trn:description>
...
END
I want Title Case, but this converts the text to lower case in place:
sed -i 's/^<trn:description^>\(.*\)^<\/trn:description^>$/^<trn:description^>\L\1^<\/trn:description^>/g' sourcefile
This command changes to Title Case:
sed 's/.*/\L^&/; s/\w*/\u^&/g' sourcefile
Can this be brought together as a one-liner to edit the original sourcefile in-place?
I want to use sed because it is available on the system and the code is consistently structured. I'm aware I should use a tool like xmlstarlet as explained:
sed ... code can't distinguish a comment that talks about sessionId tags from a real sessionId tag; can't recognize element encodings; can't deal with unexpected attributes being present on your tag; etc.
Thanks to Whirlpool Forum members for the answer and discussion.
Pattern matching "within the tags" was too hard to achieve in sed, but the file was well formed, so the required lines were changed with:
sed -i.bak '/^<trn:description^>/s/\w\+/\L\u^&/g; s/^&.*;\^|Trn:Description/\L^&/g' filename
Explanation
-i.bak: in-place edit, saving the original file with a .bak extension
/^<trn:description^>/: select lines containing <trn:description>
s/\w\+/: for one or more words
\L\u^&/g: replace the first character with uppercase and the rest with lowercase
s/^&.*;\^|Trn:Description/: select strings starting with & and ending with ;, or the text Trn:Description
\L^&/g: restore the codes (e.g. &amp;) and the tag name by converting them back to lowercase
filename: source/target filename
Note: ^ is the Windows Command Prompt escape character; it is not required in other shells, such as a Unix shell, where the single quotes already protect <, >, & and |.
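For comparison, a sketch of the same one-liner for a Unix shell, assuming GNU sed 4.x (\w, \+, \| and the \L/\u case conversions are GNU extensions); inside single quotes no ^ escapes are needed. It makes two deliberate changes to the Windows command, both assumptions rather than part of the accepted answer: the two substitutions are grouped under the address with { }, so other lines are never touched, and &[^;]*; replaces &.*;, so each code stops at its own semicolon. The second description line is hypothetical, added to exercise that case.

```shell
cd "$(mktemp -d)" || exit 1

# Sample from the question (three leading spaces); the second description
# line is hypothetical and contains two &amp; codes.
cat > sourcefile <<'EOF'
BEGIN
   <trn:description>V7906 03/11 ALFREDOCAMEL HATSWOOD 74564500125</trn:description>
   <trn:description>SMITH &amp; JONES &amp; CO</trn:description>
END
EOF

# On <trn:description> lines only:
#   1) title-case every word (this also turns the tag into Trn:Description
#      and &amp; into &Amp;),
#   2) lowercase the tag name and each &...; code again.
sed -i.bak '/<trn:description>/{
  s/\w\+/\L\u&/g
  s/&[^;]*;\|Trn:Description/\L&/g
}' sourcefile

cat sourcefile
```

With the original greedy &.*;, the hypothetical second line would come out as Smith &amp; jones &amp; Co, because the match runs from the first & to the last ; on the line.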

sed command to replace a value in a file not using find and replace

I have a file, log.txt, that contains multiple lines:
line 1 text
line2/random/string/version:0.0.30
line 3 randome stuff
http://someurl:8550/
Currently I use sed to find 0.0.30 and replace it with a new value such as 0.0.31:
sed -i s/0.0.30/0.0.31/g log.txt
The problem is that I need to know the previous value.
Is there a way to replace whatever version is in the string (0.0.30 here) with a new value, without knowing the old one?
Maybe something like indexOf or substring.
You can use a regex to match the version instead of the literal 0.0.30 and replace it with 0.0.31, as below. [[:digit:]]\{1,\} is BRE (Basic Regular Expression) syntax for one or more digits, so the pattern also matches versions such as 0.0.9 or 0.0.100 (a fixed \{2\} would turn 0.0.100 into 0.0.310). The --posix flag disables GNU extensions so that only plain POSIX BRE is accepted; \{1,\} is standard BRE and works without it too.
sed -i --posix 's/[[:digit:]]\{1,\}\.[[:digit:]]\{1,\}\.[[:digit:]]\{1,\}/0.0.31/' log.txt
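A sketch of the same idea wrapped in a small function, assuming GNU sed (for -i) and assuming the version always follows version:, as in the sample; anchoring on that prefix keeps any other dotted numbers in the file safe. set_version is a hypothetical name.

```shell
cd "$(mktemp -d)" || exit 1

# Sample log.txt from the question.
cat > log.txt <<'EOF'
line 1 text
line2/random/string/version:0.0.30
line 3 randome stuff
http://someurl:8550/
EOF

# Hypothetical helper: replace whatever version follows "version:" with $1.
# Assumes $1 contains no "/" or "&" (both are special in the s command).
set_version() {
  sed -i "s/version:[0-9]\{1,\}\.[0-9]\{1,\}\.[0-9]\{1,\}/version:$1/" log.txt
}

set_version 0.0.31    # 0.0.30 -> 0.0.31
set_version 0.0.100   # 0.0.31 -> 0.0.100, old value never needed

cat log.txt
```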

sed - Replace comma after first regex match

I'm trying to perform the following substitution on lines of this general format:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......
As you can see, the problem is that it's a comma-separated file in which one field contains a decimal comma. I would like to replace that comma with a dot (.).
I've tried the following to replace the first occurrence of a pattern after a match, but to no avail. Could someone help me?
sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"
Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:
gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'
That yielded the same file unchanged.
In GNU awk (gawk), CSV files should be parsed with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields and do the substitution you need:
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
{ for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file
See this answer of mine to understand how to define and parse CSV file content with the FPAT variable. Also see Save modifications in place with awk for in-place file modifications like sed -i''.
The following sed converts the decimal comma in every quoted numeric field (the comma is mandatory, so quoted integers are left alone):
sed 's/"\([-+]\?[0-9]*\),\([0-9]\+\([eE][-+]\?[0-9]\+\)\?\)"/"\1.\2"/g'
See: https://www.regular-expressions.info/floatingpoint.html
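A quick check of this substitution, assuming GNU sed (\? and \+ are GNU extensions to BRE). The comma between the integer and fractional parts is mandatory in this version: if it were optional, a quoted integer such as "123" would be split by backtracking into "12.3". The "123" field is hypothetical, added to show that it stays unchanged.

```shell
# Sample line from the question with a hypothetical extra field "123".
line='BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,"123",CCCCCCCCCCC'

# "digits,digits[exponent]" inside quotes -> the same with a dot.
# The comma is required, so "123" cannot be split into "12.3".
result=$(printf '%s\n' "$line" |
  sed 's/"\([-+]\?[0-9]*\),\([0-9]\+\([eE][-+]\?[0-9]\+\)\?\)"/"\1.\2"/g')

printf '%s\n' "$result"
```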
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file
This regexp matches a comma inside a pair of double quotes and replaces it with a dot. Because the regexp is anchored to the start of the line, it has to be repeated until it no longer matches; hence the :a label and the ta command, which branches back to :a whenever a substitution has succeeded.
N.B. The solution expects that all double quotes are balanced and that no double quotes are escaped, i.e. \" does not appear in a line.
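A sketch of the loop in action, assuming GNU sed (for -E and the ; separated :a/ta script). The second input line is hypothetical, added to show that more than one quoted field is handled.

```shell
cd "$(mktemp -d)" || exit 1

# The question's line plus a hypothetical line with two quoted decimals.
printf '%s\n' \
  'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC' \
  'a,"1,5",b,"2,25"' > file

# :a is a label; ta branches back to it after every successful substitution,
# so each quoted comma is replaced in turn until none is left.
sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file > out

cat out
```

Note that the loop rewrites every comma inside double quotes, including commas in quoted text, not only decimal separators.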
If your input always follows that format, with only one quoted field containing one comma, then all you need is:
$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......
If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.
Assuming you have this:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC
Try this:
awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","
Output will be:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC
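You simply need to know the field numbers on either side of the unwanted comma, so that they can be joined with a dot instead. This assumes no field contains a space (the second awk re-splits the line on whitespace), and it drops any fields after the seventh.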
To write the regexp with plain ( ) and + as in Perl, you have to enable extended regular expressions with -r (also spelled -E).
So if you want to convert every such number and drop the surrounding " signs, you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......' | sed -r 's/"([0-9]+),([0-9]+)"/\1.\2/g'
If you want to replace only the first occurrence, you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......' | sed -r 's/"([0-9]+),([0-9]+)"/\1.\2/1'
https://www.gnu.org/software/sed/manual/sed.txt

Sed replace specific substring

I have a file that's generated as the output of an SQL query. I need to replace the nulls in the file with blanks, so something like
sed -e "s/null//g" would work.
However there's a valid string of the form 'null/' (with a trailing forward slash) and that should not be replaced. Is there a way to replace only 'null' values while leaving 'null/' intact?
The sed one-liner:
sed 's#null\([^/]\|$\)#\1#g' file
should work for your requirement.
It searches for null followed by a non-slash character (or the end of the line) and replaces the match with that following character.
Thus, null/ won't be touched.
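A quick check of this one-liner, assuming GNU sed (\| alternation in a BRE is a GNU extension). The input line is hypothetical query output containing plain nulls, a trailing null and a null/ value that must survive.

```shell
# Hypothetical line of query output.
line='1,null,null/path,null'

# null followed by a non-slash character (kept via \1) or by end of line.
result=$(printf '%s\n' "$line" | sed 's#null\([^/]\|$\)#\1#g')

printf '%s\n' "$result"
```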
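I think this command should be enough; the group keeps the character after null, and the second expression handles null at the end of a line:
sed -e 's/null\([^/]\)/\1/g' -e 's/null$//'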