Deleting multiline text from multiple files - sed

I have a bunch of java files from which I want to remove the javadoc lines with the license [am changing it on my code].
The pattern I am looking for is
^\* \* ProjectName .* USA\.$
but matched across lines
Is there a way sed [or a commonly used editor in Windows/Linux] can do a search/replace for a multiline pattern?

Here's the appropriate reference point in my favorite sed tutorial.

Probably someone is still looking for such solution from time to time. Here is one.
Use awk to find the lines to be removed. Then use diff to remove the lines and let sed clean up.
awk "/^\* \* ProjectName /,/ USA\.$/" input.txt \
| diff - input.txt \
| sed -n -e"s/^> //p" \
>output.txt
A warning note: if the first pattern exist while the second does not, you will loose all text below the first pattern - so check that first.

Yes. Are you using sed, awk, perl, or something else to solve this problem?
Most regular expression tools allow you to specify multi-line patterns. Just be careful with regular expressions that are too greedy, or they'll match the code between comments if it exists.
Here's an example:
/\*(?:.|[\r\n])*?\*/
perl -0777ne 'print m!/\*(?:.|[\r\n])*?\*/!g;' <file>
Prints out all the comments run
together. The (?: notation must be
used for non-capturing parenthesis. /
does not have to be escaped because !
delimits the expression. -0777 is used
to enable slurp mode and -n enables
automatic reading.
(From: http://ostermiller.org/findcomment.html )

Related

How to replace a specific character in bash

I want to replace '_v' with a whitespace and the last dot . into a dash "-". I tried using
sed 's/_v/ /' and tr '_v' ' '
Original Text
src-env-package_v1.0.1.18
output
src-en -package 1.0.1.18
Expected Output
src-env-package 1.0.1-18
This might work for you (GNU sed):
sed -E 's/(.*)_v(.*)\./\1 \2-/' file
Use the greed of the .* regexp to find the last occurrence of _v and likewise . and substitute a space for the former and a - for the latter.
If one of the conditions may occur but not necessarily both, use:
sed -E 's/(.*)_v/\1 /;s/(.*)\./\1-/' file
With your shown samples please try following sed code. Using sed's capability to store matched regex values into temp buffer(called capturing groups) here. Also using -E option here to enable ERE(extended regular expressions) for handling regex in better way.
Here is the Online demo for used regex.
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' Input_file
OR if its a variable value on which you want to run sed command then use following:
var="src-env-package_v1.0.1.18"
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' <<<"$var"
src-env-package 1.0.1-18
Bonus solution: Adding a perl one-liner solution here, using capturing groups concept(as explained above) in perl and getting the values as per requirement.
perl -pe 's/^(src-env-package)_v((?:[0-9]+\.){1,}[0-9]+)\.([0-9]+)$/\1 \2-\3/' Input_file

Why is my sed multiline find-and-replace not working as expected?

I have a simple sed command that I am using to replace everything between (and including) //thistest.com-- and --thistest.com with nothing (remove the block all together):
sudo sed -i "s#//thistest\.com--.*--thistest\.com##g" my.file
The contents of my.file are:
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
As I am using # as my delimiter for the regex, I don't need to escape the / characters. I am also properly (I think) escaping the . in .com. So I don't see exactly what is failing.
Why isn't the entire block being replaced?
You have two problems:
Sed doesn't do multiline pattern matches—at least, not the way you're expecting it to. However, you can use multiline addresses as an alternative.
Depending on your version of sed, you may need to escape alternate delimiters, especially if you aren't using them solely as part of a substitution expression.
So, the following will work with your posted corpus in both GNU and BSD flavors:
sed '\#^//thistest\.com--#, \#^//--thistest\.com# d' /tmp/corpus
Note that in this version, we tell sed to match all lines between (and including) the two patterns. The opening delimiter of each address pattern is properly escaped. The command has also been changed to d for delete instead of s for substitute, and some whitespace was added for readability.
I've also chosen to anchor the address patterns to the start of each line. You may or may not find that helpful with this specific corpus, but it's generally wise to do so when you can, and doesn't seem to hurt your use case.
# separation by line with 1 s//
sed -n -e 'H;${x;s#^\(.\)\(.*\)\1//thistest.com--.*\1//--thistest.com#\2#;p}' YourFile
# separation by line with address pattern
sed -e '\#//thistest.com--#,\#//--thistest.com# d' YourFile
# separation only by char (could be CR, CR/LF, ";" or "oneline") with s//
sed -n -e '1h;1!H;${x;s#//thistest.com--.*\1//--thistest.com##;p}' YourFile
Note:
assuming there is only 1 section thistest per file (if not, it remove anything between the first opening until the last closing section) for the use of s//
does not suite for huge file (load entire file into memory) with s//
sed using addresses pattern cannot select section on the same line, it search 1st pattern to start, and a following line to stop but very efficient on big file and/or multisection

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which will only operate on lines that do not match the regular expression /.../! that begins ^ with optional whiltespace\s* followed by an octothorp #
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It will ignore first hits of string1 since f is not true.

need a regular expression for extracting data and writing to file

I have a file with following content:
[A hi] [B hello]
[A how] [A why] [C some where]
I basically want to extract the "text" with marker 'A' I mean
hi
how
why
in a new file on separate lines.
I tried using sed but I could not get the regular expression. Can someone suggest me what can I use ?
Try doing this using grep :
grep -oP '\[A\s+\K[^\]]+' file.txt > new_file.txt
or
grep -oP '\[A\s+\K[^\]]+' file.txt | tee new_file.txt
RESULT
hi
how
why
EXPLANATIONS
-o for grep stands for "get only the matching part"
-P for grep stands for "Perl extented regex"
for the \K regex trick, see Support of \K in regex (it's an advanced look-around regex trick)
The same regex in perl with comments :
use strict; use warnings;
use feature qw/say/;
while (<>) {
say for
/ # starting regex
\[A # a literal "[" and "A"
\s+ # at least one whitespace (\n, \r, \t, \f, and " ")
\K # restart the match
[^\]]+ # at least one character that is not a literal "]"
/gsx; # end of the regex and the modifiers
}
LINKS
To learn regex, see
http://www.regular-expressions.info/
Learning Regular Expressions
I'm not sure how to do this with sed (not too familiar with it), but you could use GNU grep with Perl-compatible regular expressions (see this answer for another example).
Here's a quick regex I've put together for your test input (assuming your data is in a file named 'foo'):
cat foo | grep -Po '(?<=\[A )[^\]]+'
This outputs:
hi
how
why
update - How this works:
The first portion of the regex (?<=\[A ) uses a negative-lookbehind, which basically means you ensure this think you are looking for is preceded by something (in this case \[A). This helps give context to what you are looking for. This can also be accomplished with capture groups, but since I've not done this sort of thing before with grep, I wasn't sure how to use them here. The syntax for one of the lookbehinds is (?<=THING_TO_PRECEDE_YOUR_MATCH_WITH).
The second chunk [^\]]+ just says "find one or more characters that are not \]. Note that we have to escape the square-brackets because they mean something in regular expressions. [^CHARSET] just says anything but some given character set or character-class. The + just says find one or more of what we just mentioned.
Depending on your experience with regular expressions this may or may not have been helpful, let me know if there are any points that I could better explain. I'm not sure of the best place to learn these. Having used python a lot, I find their syntax reference quite handy. Also, google tends to point to http://www.regular-expressions.info/ a lot, but I can't say from experience how useful it is.
This might work for you (GNU sed):
sed -r '/\[A\s+([^]]*)\]/{s//\n\1\n/;s/[^\n]*\n//;P};D' file

Confining Substitution to Match Space Using sed?

Is there a way to substitute only within the match space using sed?
I.e. given the following line, is there a way to substitute only the "." chars that are contained within the matching single quotes and protect the "." chars that are not enclosed by single quotes?
Input:
'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Desired result:
'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Or is this just a job to which perl or awk might be better suited?
Thanks for your help,
Mark
Give the following a try which uses the divide-and-conquer technique:
sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g" inputfile
Explanation:
s/\('[^']*'\)/\n&\n/g - Add newlines before and after each pair of single quotes with their contents
s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "Z"
s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "uF"
s/\n//g - Remove the newlines added in the first step
You can restrict the command to acting only on certain lines:
sed "/foo/{s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g}" inputfile
where you would substitute some regex in place of "foo".
Some versions of sed like to be spoon fed (instead of semicolons between commands, use -e):
sed -e "/foo/{s/\('[^']*'\)/\n&\n/g" -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" -e "s/\n//g}" inputfile
$ cat phoo1234567_sedFix.sed
#! /bin/sed -f
/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/\1_\2/
This answers your specific question. If the pattern you need to fix isn't always like the example you provided, they you'll need multiple copies of this line, with reg-expressions modified to match your new change targets.
Note that the cmd is in 2 parts, "/'[0-9][0-9].[0-9][a-zA-Z][a-zA-Z]'/" says, must match lines with this pattern, while the trailing "s/'([0-9][0-9]).([0-9][a-zA-Z][a-zA-Z])'/\1_\2/", is the part that does the substitution. You can add a 'g' after the final '/' to make this substitution happen on all instances of this pattern in each line.
The \(\) pairs in match pattern get converted into the numbered buffers on the substitution side of the command (i.e. \1 \2). This is what gives sed power that awk doesn't have.
If your going to do much of this kind of work, I highly recommend O'Rielly's Sed And Awk book. The time spent going thru how sed works will be paid back many times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer.
this is a job most suitable for awk or any language that supports breaking/splitting strings.
IMO, using sed for this task, which is regex based , while doable, is difficult to read and debug, hence not the most appropriate tool for the job. No offense to sed fanatics.
awk '{
for(i=1;i<=NF;i++) {
if ($i ~ /\047/ ){
gsub(".","_",$i)
}
}
}1' file
The above says for each field (field seperator by default is white space), check to see if there is a single quote, and if there is , substitute the "." to "_". This method is simple and doesn't need complicated regex.