Need to delete the entire line except the matching strings - perl

What I need is:
I need to delete the entire line but keep the matching string.
The matching pattern starts with Unhandled and ends with a :.
I tried the code below, which prints the matching pattern, but I need to delete the extra lines from the file.
perl -0777 -ne 'print "Unhandled error at$1\n" while /Unhandled\ error\ at(.*?):/gs' filename
Below is the sample input:
2012-04-09 01:52:13,717 - uhrerror - ERROR - 22866 - /home/shabbir/web/middleware.py process_exception - 217 - Unhandled error at /user/resetpassword/: : {'mod_wsgi.listener_port': '8080', 'HTTP_COOKIE': "__utma=1.627673239.1309689718.1333823126.1333916263.156; __utmz=1.1333636950.152.101.utmgclid=CMmkz934na8CFY4c6wod_R8JbA|utmccn=(not%20set)|utmcmd=(not%20set)|utmctr=non-stick%20kadai%20online; subpopdd=yes; _msuuid_1690zlm11992=FCC09820-3004-413A-97A3-1088EE128CE9; _we_wk_ls_=%7Btime%3A'1322900804422'%7D; _msuuid_lf2uu38ua0=08D1CEFE-3C19-4B9E-8096-240B92BA0ADD; nevermissadeal=True; sessionid=c1e850e2e7db09e98a02415fc1ef490; __utmc=1; __utmb=1.7.10.1333916263; 'wsgi.file_wrapper': , 'HTTP_ACCEPT_ENCODING': 'gzip, deflate'}

The code you gave already provides the requested behaviour.
That said, there is a redundant string in your program (the repeated "Unhandled error at") that you can eliminate by capturing the whole phrase:
perl -0777nE'say $1 while /(Unhandled error at .*?):/gs' filename
Finally, slurping the entire file seems entirely superfluous.
perl -nE'say $1 if /(Unhandled error at .*?):/g' filename

perl -0777 -i -pe 's/.*?(Unhandled error .*?):.*/$1/g' filename
This will replace each error block with the matched string in the file.
-0777 : forces Perl to read the whole file in one shot.
-i : means in-place editing of the file.
-p : loops over the input (here a single whole-file record, because of -0777), executes the code in single quotes, i.e. 's/.*?(Unhandled error .*?):.*/$1/g', and prints the result (the matched string), which is written back to the file thanks to the -i option.
-e : tells Perl that the program is given on the command line.
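For reference, here is a rough sketch of the plain Perl script those switches build up, printing to standard output rather than editing in place (rewriting the file is the part -i adds):
$/ = undef;                              # -0777: read the whole file in one shot
while (<>) {                             # -p: loop over each input record
    s/.*?(Unhandled error .*?):.*/$1/g;  # the code supplied with -e
    print;                               # -p: print the (modified) record
}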

If one match is all you want to keep from the whole string, you could replace the string value with the match afterwards (i.e., simply assign the new value).
If you have several matches within the string, the least complicated method may be to store the matches temporarily in an array. Then just discard the original variable if you don't need it anymore.
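For example, a sketch of that array approach using the pattern from this question (in list context, a //g match returns every captured match at once):
perl -nE 'my @found = /(Unhandled error at .*?):/g; say for @found' filename
After this, @found holds only the pieces you wanted, and the original line can simply be discarded.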

I would use the -l option to handle line endings (it is less version dependent and prints a newline for each match), and a for loop to print all the matches, not just the first one in $1. There is no need to slurp the file with -0777.
perl -nwle 'print for /Unhandled error at .*?:/g'
Note that with the /g modifier (in list context), capturing parentheses are not required.
If only one (the first) match is to be printed, /g is redundant and you can just use $1:
perl -nlwe 'print $1 if /(Unhandled error at .*?):/'
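On the sample line from the question, the first of these prints just the matched part (here including the trailing colon), something like:
$ perl -nwle 'print for /Unhandled error at .*?:/g' filename
Unhandled error at /user/resetpassword/: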

Related

How to replace a comment line that starts with specific characters

I have Fortran code with global comments, which start with a double exclamation mark (i.e., !!), and personal comments, which start with a single exclamation mark (i.e., !). I just want to hide my personal comment lines (or substitute each such line with another line, e.g., '! jw'). For example, the original code looks like this:
!! This is a global comment
Code..
Code..
! This is a personal comment
code... ! This is a personal comment
!! This is a global comment
code...
Then, I want the updated code to look like this:
!! This is a global comment
Code..
Code..
! jw
code... ! jw
!! This is a global comment
code...
I have tried to use "sed" and "awk", but I failed. So, could someone help me? I prefer to use "sed" instead of "awk", by the way.
Use a Perl one-liner with a negative lookbehind pattern:
perl -pe 's/(?<!!)!\s.*/! jw/' in_file > out_file
To change the file in-place:
perl -i.bak -pe 's/(?<!!)!\s.*/! jw/' in_file
To change multiple files in-place, for example ex*.f90 files:
perl -i.bak -pe 's/(?<!!)!\s.*/! jw/' ex*.f90
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
(?<!!)! : Exclamation point that is not preceded by an exclamation point.
\s : Whitespace.
.* : Any character, repeated 0 or more times.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlre: Negative lookbehind
perldoc perlrequick: Perl regular expressions quick start
sed '/!!/!s/!.*/! jw/' file
/!!/! If the line does not contain !!, then
s/!.*/! jw/ substitute the exclamation mark and everything following it with ! jw.
awk 'BEGIN{FS=OFS="!"}$2{$2=" jw"}1' file
BEGIN{FS=OFS="!"} Set the field separators to !.
$2{$2=" jw"} If the 2nd field is not empty, substitute it by jw.
1 Print the line.
If the line starts with ! then you could do something like
sed 's/^! /! jw/' mycode.fortran >newcodefile.fortran
I would put it into a new file and then rename it afterwards. If you overwrite your file, you could end up causing problems if anything goes wrong.
The s in the expression passed to sed tells it to search and replace.
The ^ means start of line, so if the comment appears later in the line rather than at the beginning, this won't find that comment.
Then we search for a line that starts with ! followed by a space and replace that with ! jw.
If you just run it as:
sed 's/^! /! jw/' mycode.fortran
without redirecting the output to a file, it will stream the output to your console so you can see if it's working. Then run it again with the redirect > to a file, check the file, and then do your renaming. Don't get rid of your original code file until you're completely sure it worked and didn't do anything you didn't want.

Using sed, prepend line only once, if there's a match later in file content

I'd like to add a line on top of my output if my input file has a specific word.
However, if I'm just looking for the specific string, then as I understand it, it's too late: the first line is already in the output and I can't prepend to it anymore.
Here's an example of input.
one
two
two
three
If I can find a line with, say, the word two, I'd like to add a new line before the first one, with for example FOUND. I want that line prepended only once, even if there are several matches.
So an input file without any two would remain unchanged, and the example file above would become:
FOUND
one
two
two
three
I know how to prepend with i\, but I can't get the context right. From what I understand, it would be something like:
1{
/two/{ # This will search for "two" in the first line; how do I look for it in the whole file?
1i\
FOUND
}
}
EDIT:
I know how to do it using other languages/methods, that's not my question.
Sed has advanced features to work on several lines at once, append/prepend lines and is not limited to substitution. I have a sed file already filled with expressions to modify a python source file, which is why I'd prefer to avoid using something else. I want to be able to add an import at the beginning of a file if a certain class is used.
A Perl solution:
perl -i.bak -0777 -pE 'say "FOUND" if /two/;' in_file
The Perl one-liner uses these command line flags:
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
-E : Tells Perl to look for code in-line, instead of in a file. Also enables all optional features. Here, enables say.
-0777 : Slurp files whole.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
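For illustration, here is a sketch of what the slurp-mode approach amounts to, written out explicitly and printing to standard output instead of editing in place: the whole file arrives in $_ as a single record, so the say runs at most once, before the record itself is printed back out.
perl -0777 -E '$_ = <>; say "FOUND" if /two/; print' in_file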
sed is for doing s/old/new on individual strings; that's not what you're trying to do, so you shouldn't bother trying to use sed. There are lots of ways to do this; this one is very efficient, robust and portable to all Unix systems:
$ grep -Fq 'two' file && echo "FOUND"; cat file
FOUND
one
two
two
three
To operate on a stream instead of (or in addition to) a file, without reading the whole input into memory (lines are only buffered until the first match):
awk 'f{print; next} {buf[NR]=$0} /two/{print "FOUND"; for (i=1;i<=NR;i++) print buf[i]; f=1}'
e.g.:
$ cat file | awk 'f{print; next} {buf[NR]=$0} /two/{print "FOUND"; for (i=1;i<=NR;i++) print buf[i]; f=1}'
FOUND
one
two
two
three
That awk script will also work using any awk in any shell on every Unix box.

Perl one-liner: deleting a line with pattern matching

I am trying to delete a bunch of lines in a file if they match a particular pattern which is variable.
I am trying to delete every line which matches abc12, abc13, etc.
I tried writing a C-shell script, and this is the code:
#!/bin/csh
foreach $x (12 13 14 15 16 17)
perl -ni -e 'print unless /abc$x/' filename
end
This doesn't work, but when I use the one-liner without a variable (abc12), it works.
I am not sure if there is something wrong with the pattern matching or if there is something else I am missing.
Yes, it's the fact you're using single quotes. It means that $x is being interpreted literally.
Of course, you're also doing it very inefficiently, because you're processing each file multiple times.
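For what it's worth, the quoting alone could be fixed with something like this sketch, which keeps the loop (note that csh's foreach takes a bare variable name, and double quotes, unlike single quotes, let the shell expand $x before Perl ever sees the pattern):
#!/bin/csh
foreach x (12 13 14 15 16 17)
    perl -ni -e "print unless /abc$x/" filename
end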
If you're looking to remove lines abc12 to abc17 you can do this all in one go:
perl -n -i.bak -e 'print unless m/abc1[234567]/' filename
Try this
perl -n -i.bak -e 'print unless m/abc1[2-7]/' filename
Using the range [2-7] only removes the need to type [234567], which has the effect of saving you three keystrokes.
man 1 bash: Pattern Matching
[...] Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale's collating sequence and character set, is matched. If the first character following the [ is a ! or a ^ then any character not enclosed is matched.
A - may be matched by including it as the first or last character in the set. A ] may be matched by including it as the first character in the set.

Using command line to remove text?

I have a huge file that contains lines that follow this format:
New-England-Center-For-Children-L0000392290
Southboro-Housing-Authority-L0000392464
Crew-Star-Inc-L0000391998
Saxony-Ii-Barber-Shop-L0000392491
Test-L0000392334
What I'm trying to do is narrow it down to just this:
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Test
Can anyone help with this?
Using GNU awk:
awk -F\- 'NF--' OFS=\- file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Set the input and output field separator to -.
NF contains number of fields. Reduce it by 1 to remove the last field.
Using sed:
sed 's/\(.*\)-.*/\1/' file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Simple greedy regex to match up to the last hyphen.
In the replacement, use the captured group and discard the rest.
Version 1 of the Question
The first version of the input was in the form of HTML and parts had to be removed both before and after the desired text:
$ sed -r 's|.*[A-Z]/([a-zA-Z-]+)-L0.*|\1|' input
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
Version 2 of the Question
In the revised question, it is only necessary to remove the text that starts with -L00:
$ sed 's|-L00.*||' input2
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Both of these commands use a single "substitute" command. The command has the form s|old|new|.
The Perl code for this would be: perl -nle 'print $1 if (m{-.*?/(.*?-.*?)-})'
We can break the Regex down into matching the following:
- matches the dash that's between the city and state
.*? matches the smallest set of characters that makes the Regex work, i.e. the State
/ matches the slash between the State and the data you want
( starts the capture of the data you are interested in
.*?-.*? will match the data you care about
) will close out the capture
- will match the dash before the L####### to give the regex something to match after your data. This will prevent the minimal Regex from matching 0 characters.
Then the print statement will print out what was captured (your data).
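For example, run against the version 1 lines (shown under the last Perl answer below), it should print the same names as the sed command above:
$ perl -nle 'print $1 if (m{-.*?/(.*?-.*?)-})' input
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing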
awk likes these things:
$ awk -F[/-] -v OFS="-" '{print $(NF-3), $(NF-2)}' file
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
This sets / and - as possible field separators. Based on them, it prints the fields at positions NF-3 and NF-2, separated by the delimiter -. Note that $NF stands for the last field, hence $(NF-1) is the penultimate one, etc.
This sed is also helpful:
$ sed -r 's#.*/(\w*-\w*)-\w*\.\w*</loc>$#\1#' file
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
It selects the word-word block that appears after a slash / and is followed by word.word</loc> at the end of the line, then prints that block back.
Update
Based on your new input, this will do it:
$ sed -r 's/(.*)-L\w*$/\1/' file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
It selects everything up to the block -L + something + end of line, and prints it back.
You can also use another trick:
rev file | cut -d- -f2- | rev
Since what you want is every one of the - separated fields except the last, let's get all of them but the last one. How? By reversing the line, taking everything from the 2nd field onward, and then reversing back.
Here's how I'd do it with Perl:
perl -nle 'm{example[.]com/bp/(.*?)/(.*?)-L\d+[.]htm} && print $2' filename
Note: the original question was matching input lines like this:
<loc>http://www.example.com/bp/Lowell-MA/Special-Restaurant-L0000423916.htm</loc>
<loc>http://www.example.com/bp/Houston-TX/Eliot-Cleaning-L0000422797.htm</loc>
<loc>http://www.example.com/bp/New-Orleans-LA/Kennedy-Plumbing-L0000423121.htm</loc>
The -n option tells Perl to loop over every line of the file (but not print them out).
The -l option adds a newline onto the end of every print
The -e 'perl-code' option executes perl-code for each line of input
The pattern:
/regex/ && print
Will only print if the regex matches. If the regex contains capture parentheses you can refer to the first captured section as $1, the second as $2 etc.
If your regex contains slashes, it may be cleaner to use a different regex delimiter ('m' stands for 'match'):
m{regex} && print
If you have a modern Perl, you can use -E to enable modern features and use say instead of print to print with a newline appended:
perl -nE 'm{example[.]com/bp/(.*?)/(.*?)-L\d+[.]htm} && say $2' filename
This is very concise in Perl
perl -i.bak -lpe's/-[^-]+$//' myfile
Note that this will modify the input file in-place but will keep a backup of the original data in a file called myfile.bak.

Removing text with command line?

I have a huge list of locations in this form in a text file:
ar,casa de piedra,Casa de Piedra,20,,-49.985133,-68.914673
gr,riziani,Ríziani,18,,39.5286111,20.35
mx,tenextepec,Tenextepec,30,,19.466667,-97.266667
Is there any way with the command line to remove everything that isn't between the first and second commas? For example, I want my list to look like this:
casa de piedra
riziani
tenextepec
with Perl
perl -F/,/ -ane 'print $F[1]."\n"' file
Use cut(1):
cut -d, -f2 inputfile
With perl:
perl -pe 's/^.*?,(.*?),.*/$1/' filename
Breakdown of the above code
perl - the command to use the perl programming language.
-pe - flags.
e means "run this as perl code".
p means:
Set $_ to the first line of the file (given by filename)
Run the -e code
Print $_
Repeat from step 1 with the next line of the file
What -p actually does behind the scenes is described in perldoc perlrun; a rough sketch of the loop it builds appears after this breakdown.
s/.*?,(.*?),.*/$1/ is a regular expression:
s/pattern/replacement/ looks for pattern in $_ and replaces it with replacement
.*? basically means "anything" (it's more complicated than that but outside the scope of this answer)
, is a comma (nothing special)
() capture whatever is in them and save it in $1
.* is another (slightly different) "anything" (this time it's more like "everything")
$1 is what we captured with ()
so the whole thing basically says to search in $_ for:
anything
a comma
anything (save this bit)
another comma
everything
and replace it with the bit it saved. This effectively saves the stuff between the first and second commas, deletes everything, and then puts what it saved into $_.
filename is the name of your text file
To review, the code goes through your file line by line, applies the regular expression to extract your needed bit, and then prints it out.
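As a rough sketch, the loop that -p wraps around the -e code looks something like this (the one-liner simply does the same thing, one line at a time):
while (<>) {                # read the next line of filename into $_
    s/^.*?,(.*?),.*/$1/;    # the -e code: keep what sits between the 1st and 2nd commas
    print;                  # print $_ after each iteration
}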
If you want the result in a file, use this:
perl -pe 's/^.*?,(.*?),.*/$1/' filename > out.txt
and the result goes into a file named out.txt (which will be created in whatever directory your terminal is currently in). What this does is tell the shell to write the command's output to a file instead of to the screen.
Also, if it isn't crucial to use the command line, you can just import into Excel (it's in CSV format) and work with it graphically.
With awk:
$ awk -F ',' '{ print $2 }' file
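For the sample input above, that prints:
casa de piedra
riziani
tenextepec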