Selectively replace characters on specific lines - sed

I've got a batch file that creates a file for upload to z/OS, concatenating all Pascal files from a directory. The file is like the following:
./ ADD LIST=ALL,NAME=AFTER_W
text
text
text
./ ADD LIST=ALL,NAME=WHATEVER
text
more text
./ ADD LIST=ALL,NAME=A-FILE
text
and other text
./ ADD LIST=ALL,NAME=(C)OPY
text
blah
blah
The problem is that I cannot use certain characters ((, ), -, and _) in z/OS PDS member names, so I need something that changes these four characters into something acceptable (C, C, #, and $), but only on the lines that start with ./ ADD. So for the above input, the output would be:
./ ADD LIST=ALL,NAME=AFTER$W
text
text
text
./ ADD LIST=ALL,NAME=WHATEVER
text
more text
./ ADD LIST=ALL,NAME=A#FILE
text
and other text
./ ADD LIST=ALL,NAME=CCCOPY
text
blah
blah

If you just want to make those substitutions (()-_ to CC#$) on the lines matching ./ ADD, this should suffice:
sed '/\.\/ ADD /y/()-_/CC#$/' yourinput
This code applies the y command¹ to all lines that match ./ ADD, where . is escaped because it is a regex metacharacter, / is escaped because it is the delimiter of the address regex (a different delimiter is possible only with the \%regex% form), and everything else is literal. Since the regex is not anchored, prepend ^ if the match must be at the start of the line. The y command has the syntax y/abc/def/ and substitutes a with d, b with e, and c with f; its delimiter can be changed to something else, e.g. y!abc!def! is fine.
[1] It is the transliterate command; I don't know Y it is called y, but t is taken by the test command.
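As a quick check of the command above, the following sketch feeds a shortened version of the question's sample through it. The variable names sample and result are made up for this illustration, and the ^ anchor is an addition that assumes the ./ ADD cards always start in the first column.

```shell
# Shortened sample from the question (variable names are illustrative).
sample='./ ADD LIST=ALL,NAME=AFTER_W
some_text (unchanged)
./ ADD LIST=ALL,NAME=A-FILE
./ ADD LIST=ALL,NAME=(C)OPY'

# y maps ( -> C, ) -> C, - -> #, _ -> $, but only on lines
# that start with "./ ADD "; every other line passes through as-is.
result=$(printf '%s\n' "$sample" | sed '/^\.\/ ADD /y/()-_/CC#$/')
printf '%s\n' "$result"
```

Note that y acts on the whole matching line, so this relies on the fixed part of the card (./ ADD LIST=ALL,NAME=) containing none of the four characters.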

Related

How to find and replace with sed, except when between curly braces?

I have a command like this, it is marking words to appear in an index in the document:
sed -i "s/\b$line\b/\\\keywordis\{$line\}\{$wordis\}\{$definitionis\}/g" file.txt
The problem is that it finds matches within existing matches. For example, "hello" is replaced with \keywordis{hello}{a common greeting}, but then "greeting" might be searched for too, giving \keywordis{hello}{a common \keywordis{greeting}{a phrase used when meeting someone}}...
How can I tell sed to perform the replacement, but ignore text that is already inside curly brackets?
Curly brackets in this case will always appear on the same line.

How can I tell sed to perform the replacement, but ignore text that is already inside curly brackets?
First tokenize the input. Place something unique, like | or byte \x01, around every \keywordis{hello}{a common greeting} and store that in hold space. Something along the lines of s/\\the regex to match{hello}{a common greeting}/\x01&\x01/g.
Then iterate over the elements in hold space. Use \n to separate elements already parsed from those not yet parsed, i.e. input from output. If an element matches the format \keywordis{hello}{a common greeting}, just move it to the front, before the newline in hold space; if it does not, perform the replacement. Here's an example: "Identify and replace selective space inside given text file", which uses a double newline \n\n as the input/output separator.
Because, as you noted, replacements can contain words that overlap with the patterns you are searching for, I believe the simplest approach is, after each replacement, to shuffle the pattern space as if it were ready for output and start the process all over again for the current line.
Then at the end, shuffle the hold space to remove \x01, the newline, and any leftovers, and output.
Overall, it's LaTeX. I believe it would be simpler to do it manually.
By "eating" the string from the back and placing it in front of input/output separator inside pattern space, I simplified the process. The following program:
sed '
# add our input/output separator - just a newline
s/^/\n/
: loop
# l1000
# Ignore any "\keywords" and "{stuff}"
/^\([^\n]*\)\n\(.*\)\(\\[^{}]*\|{[^{}]*}\)$/{
s//\3\1\n\2/
b loop
}
# Replace hello followed by anything not {}
# We match till the end because regex is greedy
# so that .* will eat everything.
/^\([^\n]*\)\n\(.*\)hello\([^{}]*\)$/{
s//\\keywordis{hello}{a common greeting}\3\1\n\2/
b loop
}
# Hello was not matched - ignore anything irrelevant
# note - it has to match at least one character after newline
/^\([^\n]*\)\n\(.*\)\([^{}]\+\)$/{
s//\3\1\n\2/
b loop
}
s/\n//
' <<<'
\keywordis{hello}{hello} hello {some other hello} another hello yet
'
outputs:
\keywordis{hello}{hello} \keywordis{hello}{a common greeting} {some other hello} another \keywordis{hello}{a common greeting} yet

Sed command to delete "\" which causes "*** multiple target patterns. Stop." error

In a file, I'm having the lines like this -
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
Here you can see one extra \ (backslash) at the end of line number 4 (/usr/lib/toy.c \). How can I use sed to remove this \ (backslash)? Because of it I'm getting the "*** multiple target patterns. Stop." error.
P.S. - This extra \ (backslash) occurs at multiple places in my file, so using sed to delete it by line number isn't feasible. I need something that detects a line containing .lo and .o, checks the line before it, and removes a trailing \ (backslash) if it finds one.

Maybe not the simplest but this should work:
sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' input_file
The main idea is to concatenate input lines in the pattern space (a sed internal text buffer), such that it always contains 2 consecutive lines, separated by a newline character. We then just delete the last \ before a :, if any, print the first of the 2 lines and remove it from the pattern space before continuing with the next line.
sed commands are separated by semicolons (;) and grouped with curly braces ({...}). They are optionally preceded by an address (a line specification), for instance $, which stands for the last line of the input. So, in our case, ${s/\\$//;p;} applies only to the last line while the rest (N;s/\\([^\\]*:)/\1/;P;D) applies to all lines.
The -n option suppresses the default output. We need this to control the output ourselves with the p (print) command.
The -E option enables the use of extended regular expressions.
Let's first explain the tricky part: N;s/\\([^\\]*:)/\1/;P;D. It is a list of 4 commands that are run for each line of the input because there is no line(s) specification before the commands.
When sed starts processing the input the pattern space already contains the first line (a.lo a.o: abc/util.c \ in your example). This is how sed works: by default it puts the current line in the pattern space, applies the commands and restarts with the next line.
N appends the next input line (/usr/lib/def.h) to the pattern space with a newline character as separator. The pattern space now contains:
a.lo a.o: abc/util.c \
/usr/lib/def.h
N also increments the current line number which becomes 2.
s/\\([^\\]*:)/\1/ deletes the last \ before the first : in the pattern space, if there is one. In our example the only \ is after the first :. The pattern space is not modified.
P prints the first part of the pattern space, up to the first newline character. In our example what is printed is:
a.lo a.o: abc/util.c \
D deletes the first part of the pattern space, up to the first newline character (what has just been printed). The pattern space contains:
/usr/lib/def.h
D also starts a new cycle but, unlike normal sed processing, it does not read the next line and leaves the pattern space and current line number unmodified. So when restarting, the pattern space contains line number 2 and the current line number is still 2.
By induction we see that, each time sed restarts executing the list of commands, the pattern space contains the current line, as normal. When processing line number 4 of your example it contains:
/usr/lib/toy.c \
After N it contains:
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
And there, the substitution command (s/\\([^\\]*:)/\1/) matches and deletes the first \:
/usr/lib/toy.c
c.lo c.o: high/scan.c \
It is thus:
/usr/lib/toy.c
that is printed and removed from the pattern space. Exactly what you want.
The last line needs a special treatment. When we start processing it the pattern space contains:
high/scan_f.c
If we don't do anything special N does not change it (there is no next line to concatenate) and terminates the processing. The last line is never printed.
This is why another list of commands is needed, just for the last line: ${s/\\$//;p;}. It applies only to the last line because it is preceded by a line(s) specification ($ for last line). The first command in the list (substitute s/\\$//) removes a trailing \, if there is one. The second (p) prints the pattern space.
Note: if you know that the last line does not end with a trailing backslash you can simplify a bit:
sed -nE '$p;N;s/\\([^\\]*:)/\1/;P;D' input_file
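To see the two-line window at work, here is a small sketch that runs the full command (the variant with the last-line handling) on the sample from the question. The variable name fixed and the use of a here-document are choices made for this illustration only.

```shell
# Run the N/P/D command on the question's sample. The quoted 'EOF'
# keeps the backslashes in the here-document literal.
fixed=$(sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' <<'EOF'
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
EOF
)
printf '%s\n' "$fixed"
```

Only the backslash is deleted, so the space that preceded it on the /usr/lib/toy.c line remains as trailing whitespace, which make tolerates.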

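I agree with @G.M. in general, but this will work.
This sed command captures the text before a trailing "\" (if present) on lines ending with " \" and prints only that text on those lines. Note that it does not look at the following line, so it strips the backslash from every such line, including genuine continuation lines. All other lines are printed unchanged, of course.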
sed -e 's/\(.* \)\\$/\1/' input_file

The question is a bit unclear about how to identify the lines from which a trailing backslash should be removed, but inasmuch as the input looks like a set of makefile-format prerequisite lists from which some lines have been removed, I take the objective to be to remove backslashes where they appear after the last (remaining) prerequisite in a list. That requires looking ahead to the next line, so it will be helpful to make use of sed's hold space to store data while you look ahead at the next line to figure out what to do with it.
This would be a pretty robust solution for that problem:
sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' input
That builds up each prerequisite list in the hold space, with backslashes and newlines embedded, then dumps it when the next target list or the end of the input arrives.
Details:
the -n option turns off automatically printing the pattern space after each line
the -E option turns on extended regular expressions
the sed expression contains several sub-expressions, joined by semicolons:
s/\s*(\\){0,1}$/ \\/ : ensure that the current line in the pattern space ends with a space and backslash, without adding a second backslash to lines that already have one
:a : labels that point in the script 'a'
/:/ { x; s/\s*\\$//; p; d; } : on lines that contain a colon, swap the pattern and hold spaces, remove the trailing backslash from (the new contents of) the pattern space, print the result, then start the next cycle
H : (if control reaches this point) append a newline and the contents of the pattern space to the hold space
$ { s/.*/:/; b a } : on the last line of input trigger dumping the hold space by putting a colon in the pattern space and jumping to label 'a'
[end of expression] : read the next line into the pattern space and start over
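The steps above can be tried out on the question's sample with the following sketch. It assumes GNU sed (for \s), and the variable name rebuilt is made up for this illustration.

```shell
# Hold-space variant: each prerequisite list is collected in the hold
# space and dumped when the next line containing a colon (or the end of
# the input) arrives. Assumes GNU sed because of \s.
rebuilt=$(sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' <<'EOF'
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
EOF
)
printf '%s\n' "$rebuilt"
```

The output starts with an empty line, because the first colon-containing line swaps in a hold space that is still empty and prints it.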
Alternatively, it would more exactly follow your request, and avoid introducing a leading blank line, to do this:
sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\$//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' input
That also assembles pieces in the hold space before ultimately printing them, but it goes about it in a different way:
it starts (at label a) by checking whether the line in the pattern space ends with a backslash. If not (/\\$/!), then it prints the pattern space and starts the next cycle.
otherwise, it replaces the current contents of the hold space with the contents of the pattern space (which must already end with a backslash), then
(at label b) if the current line is the last then it retrieves the contents of the hold space, strips the trailing backslash, and prints the result ($ { x; s/\\$//; p; }). Either way,
it attempts to read the next input line, and terminates if there are no more (n).
if that results in the pattern space containing a colon within, then the contents of the hold space are printed, less trailing backslash, and control is sent back to label a to process the colon-containing line as a new first line (/:/ { x; s/\\$//; p; x; b a; }).
otherwise, a newline and the contents of the pattern space are appended to the hold space (H).
if the pattern space ends with a backslash then control branches back to label b to consider reading another line (/\\$/ b b).
otherwise, the hold space is printed and cleared (s/.*//; x; p), and
if there are any more lines then the next is read and a new cycle started.
That makes fewer assumptions about the nature of the input, but it is a bit more complicated.

Using sed to replace leading and trailing spaces in a CSV file

I am using the following command to strip leading and trailing spaces from a file A.csv
sed "s/^ \+//g;s/[ \t]*$//;s/ \{1,\}/ /g" <A.csv> B.csv
Here is an example of A.csv:
"a"," v b","z"
"a"," vd","z"
"a"," v, b, c ","z "
"a"," vb ","z "
The problem is that not all leading and trailing spaces are removed as shown below:
"a"," v b","z"
"a"," vd","z"
"a"," v, b, c ","z "
"a"," vb ","z "
Below is an example of what I was expecting:
"a","v b","z"
"a","vd","z"
"a","v, b, c","z"
"a","vb","z"
How can I get this right?

sed 's/" \+/"/g;s/[ \t]*"/"/g;s/ \{1,\}/ /g' A.csv
The output:
"a","v b","z"
"a","vd","z"
"a","v, b, c","z"
"a","vb","z"
In your own command, only s/ \{1,\}/ /g has a visible effect here.
The thing is, sed treats a CSV file as plain text; it does not know that the commas and quotes delimit columns.
So ^ and $ only match the beginning and the end of each line, not of each field.
Adding g to the second s would not change that, because $ can match only once per line.
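A minimal sketch of the quote-anchored command on input with explicit padding follows. The exact spacing in the sample is an assumption, the names padded and trimmed are made up for this illustration, and GNU sed is assumed for \+ and \t.

```shell
# Two rows with padding inside the quoted fields (spacing is assumed).
padded='"a","  v   b","z"
"a"," v, b, c  ","z  "'

# 1) drop spaces that follow a quote, 2) drop blanks that precede a
# quote, 3) squeeze any remaining run of spaces down to one space.
trimmed=$(printf '%s\n' "$padded" |
  sed 's/" \+/"/g;s/[ \t]*"/"/g;s/ \{1,\}/ /g')
printf '%s\n' "$trimmed"
```

Because the expressions key on the quote characters rather than on ^ and $, they act on every field. They do not distinguish opening from closing quotes, so this only suits files where each field is quoted and contains no embedded quotes.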

You can't/shouldn't do this properly with just sed. I recommend switching to some better language that can work with CSV files.
There is also a tool called csvtool:
$ cat /path/to/trim
#!/usr/bin/env bash
shopt -s extglob
for c; do
c=${c##*([[:space:]])} c=${c%%*([[:space:]])}
printf '"%s"\n' "${c//'"'/'""'}"
done | paste -sd,
$ csvtool call /path/to/trim A.csv
"a","v b","z"
"a","vd","z"
"a","v, b, c","z"
"a","vb","z"
As much as I like csvtool for simple stuff, this will unfortunately be painfully slow! It took my VBox nearly 15 seconds to process a short 4000-line CSV.

This might work for you (GNU sed):
sed -r 's/"\s*([^[:space:]"]+(\s*[^[:space:]"]+)*)\s*"/"\1"/g' file
This removes the white space immediately inside each pair of double quotes, globally throughout the file.

apply "backspace" to the line directly after the matching line perl

I'd like to re-format a text file such that the line immediately below the matching string gets cut and appended to the line with the matching string. Here is an example text:
Answer:
renice
X.
find / -name filename &
Y.
find / -name filename
Z.
bg find / -name filename
I'm looking for the end result:
Answer: renice
X. find / -name filename &
Y. find / -name filename
Z. bg find / -name filename
I haven't been able to get the following right-trim suggestion:
$str =~ s/\s+$//;
to generate the result I need inline. The space is gone, but the string I need is still on the line below. The lines to cut and paste only occur directly below "Answer:", "X.", "Y." or "Z."

It would help to see your full solution, but here's a one-liner that does what you'd like:
perl -pe's/\s*$/ / if /^.+?[:.]/'
This replaces any whitespace, including a newline, at the end of the string with a single space, but only if the pattern matches. The pattern looks for some characters at the beginning of the line followed by a period or colon. Add -i.bak to modify files in-place. Hope this helps!

I need to replace source '/root with source '/root/app in a script via sed

I have written a few scripts which use source a lot. Now I need to execute them in a different folder, which means I need to replace
source '/root/
with
source '/root/app/
I have tried sed -r and other options but they didn't work.

sed "s|source '/root/|source '/root/app/|g"
s => Replace command.
| => Delimiter used with the replace command; the string between the first and second | is replaced with the string between the second and third |. Usually / is used as the delimiter, but because your strings contain it, something else must be used.
g => The g at the end is a modifier to the s command, which would usually only replace the first occurrence of the target string in each line; with g every occurrence in the line is changed.
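Here is a short sketch of the substitution on a made-up script fragment. The file name env.sh and the variable names are hypothetical. The replacement keeps a trailing slash after app so that the file name is not glued onto the directory name.

```shell
# Hypothetical two-line script; only the source line should change.
script="source '/root/env.sh'
echo done"

# | is used as the delimiter so the slashes in the paths need no escaping.
patched=$(printf '%s\n' "$script" | sed "s|source '/root/|source '/root/app/|g")
printf '%s\n' "$patched"
```

To change the scripts themselves rather than print the result, GNU sed's -i option can be added along with the file names.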
