Perl conditional extraction of lines from file - perl

Could anyone suggest a method for extracting a few lines of text from a file while reading it?
Sample file structure:
A blah blah string1
B blah blah
C blah string2
D blah string3 blah
E blah blah
F blah string2
G blah string3 blah
H blah blah string1
I blah blah
J blah string2
Here I want to extract each block of lines that starts at a line containing "string1" and ends at the next line containing "string2".
In effect I want lines A-C and H-J in the above example.
My experiments fail because of line F, which I want to ignore.

Perl one liner and the Flip-flop Operator ..:
$ perl -ne 'print if /\bstring1\b/ .. /\bstring2\b/' file
A blah blah string1
B blah blah
C blah string2
H blah blah string1
I blah blah
J blah string2
\b in the above regex is a word boundary. It matches between a word character and a non-word character.
From perl --help:
-n assume "while (<>) { ... }" loop around program
-e program one line of program (several -e's allowed, omit programfile)

This can be done as well in awk and sed by indicating the patterns between which you want to print the lines:
sed -n '/string1/,/string2/p' file
awk '/string1/,/string2/' file
In Perl you can say:
perl -e 'while (<>){print if (/string1/../string2/);}' file
Which is equivalent to the following, where -n supplies the loop:
perl -ne '{print if (/string1/../string2/)}' file
All of them return:
A blah blah string1
B blah blah
C blah string2
H blah blah string1
I blah blah
J blah string2
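To see why line F is skipped, it can help to spell the range out as an explicit on/off flag. What follows is only a sketch in awk, run from the shell: the function name `extract_ranges` and the variable `on` are made up for illustration, and the patterns are the literal strings from the question.

```shell
# Hypothetical helper: print every block that starts at a line containing
# string1 and ends at the next line containing string2 (both inclusive).
extract_ranges() {
    awk '
        /string1/       { on = 1 }   # a start line switches printing on
        on              { print }    # print while inside a block
        on && /string2/ { on = 0 }   # the end line is printed, then closes the block
    ' "$@"
}

# Sample input from the question, fed on stdin.
printf '%s\n' \
    'A blah blah string1' 'B blah blah' 'C blah string2' \
    'D blah string3 blah' 'E blah blah' 'F blah string2' \
    'G blah string3 blah' 'H blah blah string1' 'I blah blah' \
    'J blah string2' | extract_ranges
```

Line F contains string2, but it arrives while the flag is off, so it is never printed.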

Related

Using sed, how can I append a string to lines that contain a pattern and don't already have the string appended?

E.g., I have a file with the following contents:
ERR001 just some random text
ERR002 blah blah blah
ERR001 again some text //IGNORE
ERR001 blah blahblah blah blah
ERR002 abc def ghi
I want to write a sed command that appends //IGNORE to every line that has ERR001 but does not already have //IGNORE appended. The sed command should give the output below for the above file:
ERR001 just some random text //IGNORE
ERR002 blah blah blah
ERR001 again some text //IGNORE
ERR001 blah blahblah blah blah //IGNORE
ERR002 abc def ghi
sed solution:
sed '/\/\/IGNORE$/! s/^ERR001 .*/& \/\/IGNORE/' inputfile
/\/\/IGNORE$/! - negated match: the command that follows runs only on lines that do not end with //IGNORE (! is the negation sign)
s/^ERR001 .*/& \/\/IGNORE/ - on a line that starts with ERR001, replace the whole line with itself (&) followed by //IGNORE
The output:
ERR001 just some random text //IGNORE
ERR002 blah blah blah
ERR001 again some text //IGNORE
ERR001 blah blahblah blah blah //IGNORE
ERR002 abc def ghi
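The escaped slashes make the command hard to read. As a sketch of the same idea, sed also accepts another delimiter for both the address (written `\#...#`) and the `s` command, which leaves `//IGNORE` unescaped. The wrapper name `append_ignore` is hypothetical.

```shell
# Hypothetical helper: append " //IGNORE" to ERR001 lines that lack it.
# "#" serves as the regex delimiter, so the slashes need no escaping.
append_ignore() {
    sed '\#//IGNORE$#!s#^ERR001 .*#& //IGNORE#' "$@"
}

printf '%s\n' \
    'ERR001 just some random text' \
    'ERR002 blah blah blah' \
    'ERR001 again some text //IGNORE' | append_ignore
```

This assumes `#` never appears in the patterns themselves. Any character other than backslash or newline can take its place.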

Idiomatic way to write long multi-line string constants (or vars) in Lisp code

What is the idiomatic way to insert long multi-line string variables or constants in Common Lisp code?
Is there something like a HEREDOC in the Unix shell or some other languages, to eliminate the indenting spaces inside string literals?
For example:
(defconstant +help-message+
             "Blah blah blah blah blah
              blah blah blah blah blah
              some more more text here")
; ^^^^^^^^^^^ these spaces will appear - not good
And writing it this way is kind of ugly:
(defconstant +help-message+
             "Blah blah blah blah blah
blah blah blah blah blah
some more more text here")
How should we write it? If there is a way that doesn't require escaping quotes, that would be even better.
I don't know about idiomatic, but format can do this for you. (naturally. format can do anything.)
See Hyperspec section 22.3.9.3, Tilde Newline. Undecorated, it removes both the newline and any subsequent whitespace. If you want to preserve the newline, use the @ modifier:
(defconstant +help-message+
  (format nil "Blah blah blah blah blah~@
               blah blah blah blah blah~@
               some more more text here"))
CL-USER> +help-message+
"Blah blah blah blah blah
blah blah blah blah blah
some more more text here"
There is no such thing.
Indentation is usually:
(defconstant +help-message+
  "Blah blah blah blah blah
blah blah blah blah blah
some more more text here")
Maybe use a reader macro or read-time eval. Sketch:
(defun remove-3-chars (string)
  (with-output-to-string (o)
    (with-input-from-string (s string)
      (write-line (read-line s nil nil) o)
      (loop for line = (read-line s nil nil)
            while line
            do (write-line (subseq line 3) o)))))
(defconstant +help-message+
  #.(remove-3-chars
  "Blah blah blah blah blah
   blah blah blah blah blah
   some more more text here"))
CL-USER 18 > +help-message+
"Blah blah blah blah blah
blah blah blah blah blah
some more more text here
"
Would need more polishing... You could use 'string-trim' or similar.
I sometimes use this form:
(concatenate 'string "Blah blah blah"
"Blah blah blah"
"Blah blah blah"
"Blah blah blah")

Replace complete line getting number from variable

I have a file with a certain line, let's say...
AAA BBB CCC
I need to replace that entire line, after finding it, so I did:
q1=`grep -Hnm 1 "AAA" FILE | cut -d : -f 2`
That puts the line number of the first occurrence in q1 (the file has more than one occurrence). Now here comes my problem. In a previous step I was using this sed command to replace a certain line in the file:
sed -e '3s/.*/WHATEVER/' FILE
That replaces the full line (line 3 in the example) with WHATEVER. But if I try to use $q1 instead of the "3" as the line number, it doesn't work:
sed -e '$q1s/.*/WHATEVER/' FILE
It's probably a stupid syntax mistake, any help is welcome; thanks in advance
Try double quotes and braces, so that the shell expands q1:
sed -e "${q1}s/.*/WHATEVER/" FILE
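The difference between the two quoting styles can be checked without touching any file. This is a small sketch: the value `q1=3` and the names `single` and `braced` are assumptions made for the demonstration.

```shell
q1=3

# Single quotes: the shell passes the text through untouched, so sed
# receives a literal "$q1s..." and never sees the number.
single='$q1s/.*/WHATEVER/'

# Double quotes with braces: ${q1} expands, and the braces stop the shell
# from reading the following "s" as part of the variable name.
braced="${q1}s/.*/WHATEVER/"

printf '%s\n' "$single" "$braced"

# Applying the braced form to a five-line sample replaces line 3 only.
printf '%s\n' a b 'AAA BBB CCC' d e | sed -e "$braced"
```

Without the braces, `"$q1s/.*/WHATEVER/"` would make the shell look up a variable named `q1s`, which is normally unset and expands to nothing.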
I'd use awk for this:
awk '/AAA/ && !r {print "WHATEVER"; r=1; next} {print}' <<END
a
b
AAA BBB CCC
d
e
AAA foo bar
f
END
a
b
WHATEVER
d
e
AAA foo bar
f
If you want to replace the first occurrence of a string in a file, you could use this awk script:
awk '/occurrence/ && !f++ {print "replacement"; next}1' file
The replacement will only be printed the first time, as !f++ will only evaluate to true once (on subsequent evaluations, f will be greater than zero so !f will be false). The 1 at the end is always true, so for each line other than the matched one, awk does the default action and prints the line.
Testing it out:
$ cat file
blah
blah
occurrence 1 and some other stuff
blah
blah
some more stuff and occurrence 2
blah
$ awk '/occurrence/ && !f++ {print "replacement"; next}1' file
blah
blah
replacement
blah
blah
some more stuff and occurrence 2
blah
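The `!f++` guard used above can be observed in isolation. The sketch below prints its value for each input line. The function name `first_only_flags` is hypothetical, and `f` is an ordinary awk variable that starts out unset.

```shell
# Print the value of !f++ for each input line.
# f starts out unset (numerically 0), so the test is true once and false after.
first_only_flags() {
    awk '{ print (!f++) }'
}

printf '%s\n' one two three four | first_only_flags
```

The first line yields 1 and every later line yields 0, which is why only the first match gets replaced.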
The "replacement" string could easily be set to the value of a shell variable in the following way:
awk -v rep="$r" '/occurrence/ && !f++ {print rep; next}1' file
where $r is a shell variable.
Using the same file as above and the example variable in your comment:
$ q2="x=\"Second\""
$ awk -v rep="$q2" '/occurrence/ && !f++ {print rep; next}1' file
blah
blah
x="Second"
blah
blah
some more stuff and occurrence 2
blah
sed "${q1} c\\
WHATEVER" YourFile
But you can also match the pattern directly:
sed '/YourPatternToFound/ {s/.*/WHATEVER/
:a
N;$!ba
}' YourFile

how to find specific lines when it hits a certain pattern

Whenever I see a line containing ; followed later by a line containing abc[0], I need to print the first line just after the ; and the line that contains abc[0].
I have something like this:
blah blah;
blah blah blah;
xyz blah blah,
blah blah
abc[2]
abc[1],
abc[0]
blah blah,
blah blah
abc[1],
abc[0]
blah blah
blah blah;
pqr blah blah
blah blah blah
abc[0]
required output is as shown below
xyz blah blah,
abc[0]
pqr blah blah
abc[0]
Thanks.
awk '/;/ { f=1; next } f{ print $0 ; f=0; next} /abc\[0\]/ { print }' inputfile
Explanation:
/;/ { f=1; next } - Set the flag to 1 when you encounter a line with the `;` pattern. Since I believe you want to print the line after the `;` and not the one with the `;`, you do next to skip the remaining pattern-action statements.
f{ print $0 ; f=0; next} - If the flag is true, you print the line, set the flag to false and skip the rest.
/abc\[0\]/ { print } - If you find the second pattern you print it.
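On the sample input, the one-liner above prints abc[0] three times, because a second abc[0] appears before the next ; line, while the required output lists it once per group. Assuming the intended rule is "only the first abc[0] after each ; group", one possible variant tracks a three-valued state instead of a flag. The names `first_after_semicolon` and `state` are made up for this sketch.

```shell
# Hypothetical variant: after a line containing ";", print the next line
# without ";", then print only the first following line containing abc[0].
first_after_semicolon() {
    awk '
        /;/                      { state = 1; next }         # remember that a ";" line was seen
        state == 1               { print; state = 2; next }  # first line after the ";" run
        state == 2 && /abc\[0\]/ { print; state = 0 }        # first abc[0] of the group only
    ' "$@"
}

# Sample input from the question, fed on stdin.
printf '%s\n' \
    'blah blah;' 'blah blah blah;' 'xyz blah blah,' 'blah blah' \
    'abc[2]' 'abc[1],' 'abc[0]' 'blah blah,' 'blah blah' \
    'abc[1],' 'abc[0]' 'blah blah' 'blah blah;' 'pqr blah blah' \
    'blah blah blah' 'abc[0]' | first_after_semicolon
```

If every abc[0] line should be printed after all, the original one-liner already does that.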
Using GNU Grep
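With the input from your question stored as /tmp/corpus, the following prints each matching line plus one line of trailing context, then filters out the ; lines and the -- group separators. Because -A1 also adds a context line after every abc[0] match, the result on the sample above contains more lines than the required output.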
{ egrep -A1 ';|abc\[0\]' | egrep -v ';|^--'; } < /tmp/corpus
If you have GNU Grep, but your shell isn't Bash or doesn't support the redirection as above, the following pipeline is equivalent but (in my opinion) less readable:
egrep -A1 ';|abc\[0\]' /tmp/corpus | egrep -v ';|^--'
Pure GNU sed
sed -n '/;/ {:k n;h; // bk}; /abc\[0\]/ {x ;//! {p;x;p;x}}' file
xyz blah blah,
abc[0]
pqr blah blah
abc[0]

SED renaming with unknown amount of characters before a "

I have a file1 that has some PHP code in it. I need to find the following: action="blahblah" and replace it with action="error.php". Problem is, I don't know how many characters are between the quotes in the original.
Here's what I have that doesn't work:
sed 's:action="^[^"]*":action="error.php":' <file1> file2
How can I do this?
Why have you got the ^ start-of-line marker before the character class? Try it with:
sed 's:action="[^"]*":action="error.php":' <file1 > file2
Here's a transcript showing your version alongside that correction:
pax$ echo 'blah action="something" blah' | sed '
...$ s:action="^[^"]*":action="error.php":'
blah action="something" blah
pax$ echo 'blah action="something" blah' | sed '
...$ s:action="[^"]*":action="error.php":'
blah action="error.php" blah
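The corrected expression works for any length because `[^"]*` matches any run of characters that are not a double quote, including an empty run. A short sketch that tries it on values of different lengths follows. The wrapper name `rewrite_action` and the sample lines are invented for illustration.

```shell
# Hypothetical helper: replace whatever sits between the quotes of
# action="..." with error.php, regardless of how long the old value is.
rewrite_action() {
    sed 's:action="[^"]*":action="error.php":' "$@"
}

printf '%s\n' \
    'blah action="something" blah' \
    'x action="" y' \
    'action="a/much/longer/value.php" method="post"' | rewrite_action
```

Only the first action="..." on each line is replaced. Adding the g flag to the s command would cover lines with several.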