The following command is correctly returning all the lines with warnings more than 0 from all files.
grep -i warning * | grep -v 'Warnings: 0' | more
I want to see the 4 lines above the warnings line where warning is more than 0. The -B4 switch does not work for obvious reasons.
If I understood your question correctly, here is a solution:
grep -v "Warnings: 0" * | grep -B4 -i warning
How about using a little regex instead:
grep -e "Warnings: [1-9][0-9]*" -B4 * | more
The grep should look for 1 or more warnings and print the previous 4 lines.
your are unclear if the warnings message should be include in the output or not.
In this version it is.
Try:
awk -F"[: ]" '($1 /Warning/ && $2> 0){a[NR]=$0;for (i=4;i>=0;i--) \
print a[NR-i]}{a[NR]=$0}' file
HTH Chris
Related
In a part of my script I am trying to generate a list of the year and month that a file was submitted. Since the file contains the timestamp, I should be able to cut the filenames to the month position, and then do a sort+uniq filtering. However sed is generating an outlier for one of the files.
I am using this command sequence
ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
And this works for most of time except in some cases it outputs the whole timestamp:
$ ls
service-parent-20181119092630.json service-parent-20181123134132.json service-parent-20181202124532.json service-parent-20190121091830.json service-parent-20190125124209.json
service-parent-20181119101003.json service-parent-20181126104300.json service-parent-20181211095939.json service-parent-20190121092453.json service-parent-20190128163539.json
service-parent-20181120095850.json service-parent-20181127083441.json service-parent-20190107035508.json service-parent-20190122093608.json
service-parent-20181120104838.json service-parent-20181129155835.json service-parent-20190107042234.json service-parent-20190122115053.json
$ ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
service-parent-201811
service-parent-201811201048
service-parent-201812
service-parent-201901
I have also tried this variation but the second output line is still returned:
ls -1 service*json | sed -e "s|\(.*201.\{3\}\).*json$|\1|g" | sort |uniq
Can somebody explain why service-parent-201811201048 is returned past the requested 3 characters?
Thanks.
service-parent-201811201048 happens to have 201048 to match 201....
Might try ls -1 service*json | sed -e "s|\(.*-201...\).*json$|\1|g" | sort |uniq to ask for a dash - before 201....
It is not recommended to parse the output of ls. Please try instead:
for i in service*json; do
sed -e "s|^\(service-.*-201[0-9]\{3\}\).*json$|\1|g" <<< "$i"
done | sort | uniq
Your problem is explained at https://stackoverflow.com/a/54565973/1745001 (i.e. .* is greedy) but try this:
$ ls | sed -E 's/(-[0-9]{6}).*/\1/' | sort -u
service-parent-201811
service-parent-201812
service-parent-201901
The above requires a sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed.
I appreciate your help with this problem. I like to eliminate everything that is not a specific pattern from a string.
For example, below I like to eliminate everything that is not "5TTGTC".
But as seen here ^5TTGTC is not right. I used different combinations of ^(), ^{}, ^[], but none gave me what I am looking for. Appreciate your feedback!
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed 's/^5TTGTC//g'
Thanks in advance
You may use the following command if you want case sensitivity:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|[,.A-Za-z+0-9]/\1/g'
The code above prints:
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
The regular expression used above uses alternation to capture what you are interested in.
We match and capture what we are interested in (5TTGCC) and we match everything that is not the substring, in this case characters ,.A-Za-z+0-9.
You can check the behaviour of the regex here.
As pointed out by #EdMorton, the command can be simplified to:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|./\1/g'
You can try this here.
For compatibility across sed versions the -r flag can be replaced by the -E flag.
You don't make it very clear what you are trying to achieve.
One way to get where you are trying to go could be the -o option in grep.
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | grep -o '5TTGTC'
Output:
5TTGTC
5TTGTC
5TTGTC
5TTGTC
5TTGTC
You can then change 5TTGTC into a pattern, e.g. grep -o '[0-9]TT[AG]GTC'
With any sed:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
sed 's/#//g; s/5TTGTC/#/g; s/[^#]//g; s/#/5TTGTC/g'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
With any awk:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
awk -v str='5TTGTC' '{gsub(str,"\n"); gsub(/[^\n]/,""); gsub(/\n/,str)}1'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
I'm trying to grab data from HTML output that looks like this:
<strong>Target1NoSpaces</strong><span class="creator"> ....
<strong>Target2 With Spaces</strong><span class="creator"> ....
I'm using a pipe train to whittle down the data to the targets I'm trying to hit. Here's my approach so far:
grep "/strong" output.html | awk '{print $1}'
Grep on "/strong" to get the lines with the targets; that works fine.
Pipe to 'awk '{print $1}'. That works in case #1 when the target has no spaces, but fails in case #2 when the target has spaces..only the first word is preserved as below:
<strong>Target1NoSpaces</strong><span
<strong>Target2
Do you have any tips on hitting the target properly, either in my awk or in different command? Anything quick and dirty (grep, awk, sed, perl) would be appreciated.
Try pup, a command line tool for processing HTML. For example:
$ pup 'strong text{}' < file.html
Target1NoSpaces
Target2 With Spaces
To search via XPath, try xpup.
Alternatively, for a well-formed HTML/XML document, try html-xml-utils.
One way using mojolicious and its DOM parser:
perl -Mojo -E '
g("http://your.web")
->dom
->find("strong")
->each( sub { if ( $t = shift->text ) { say $t } } )'
Using Perl regex's look-behind and look-ahead feature in grep. It should be simpler than using awk.
grep -oP "(?<=<strong>).*?(?=</strong>)" file
Output:
Target1NoSpaces
Target2 With Spaces
Add:
This implementation of Perl's regex's multi-matching in Ruby could match values in multiple lines:
ruby -e 'File.read(ARGV.shift).scan(/(?<=<strong>).*?(?=<\/strong>)/m).each{|e| puts "----------"; puts e;}' file
Input:
<strong>Target
A
B
C
</strong><strong>Target D</strong><strong>Target E</strong>
Output:
----------
Target
A
B
C
----------
Target D
----------
Target E
Here's a solution using xmlstarlet
xml sel -t -v //strong input.html
Trying to parse HTML without a real HTML parser is a bad idea. Having said that, here is a very quick and dirty solution to the specific example you provided. It will not work when there is more than one <strong> tag on a line, when the tag runs over more than one line, etc.
awk -F '<strong>|</strong>' '/<strong>/ {print $2}' filename
You never need grep with awk and the field separator doesn't have to be whitespace:
$ awk -F'<|>' '/strong/{print $3}' file
Target1NoSpaces
Target2 With Spaces
You should really use a proper parser for this however.
Since you tagged perl
perl -ne 'if(/(?:<strong>)(.*)(?:<\/strong>)/){print $1."\n";}' input.html
I am surprised no one mensions W3C HTML-XML-utils
curl -Ss https://stackoverflow.com/questions/18746957/parsing-html-on-the-command-line-how-to-capture-text-in-strong-strong |
hxnormalize -x |
hxselect -s '\n' strong
output:
<strong class="fc-black-750 mb6">Stack Overflow
for Teams</strong>
<strong>Teams</strong>
To capture only content:
curl -Ss https://stackoverflow.com/questions/18746957/parsing-html-on-the-command-line-how-to-capture-text-in-strong-strong |
hxnormalize -x |
hxselect -s '\n' -c strong
Stack Overflow
for Teams
Teams
If I have input file containing
statementes
asda
rertte
something
nothing here
I want to grep / extract (without using awk) every line from starting till I get the string "something". How can I do this? grep -B does not work since it needs the exact number of lines.
Desired output:
statementes
asda
rertte
something
it's not completely robust, but sure -B works... just make the -B count huge:
grep -B `wc -l <filename>` -e 'something' <filename>
You could use a bash while loop and exit early when you hit the string:
$ cat file | while read line; do
> echo $line
> if echo $line | grep -q something; then
> exit 0
> fi
> done
head -n `grep -n -e 'something' <filename> | cut -d: -f1` <filename>
There are often times I will grep -n whatever file to find what I am looking for. Say the output is:
1234: whatev 1
5555: whatev 2
6643: whatev 3
If I want to then just extract the lines between 1234 and 5555, is there a tool to do that? For static files I have a script that does wc -l of the file and then does the math to split it out with tail & head but that doesn't work out so well with log files that are constantly being written to.
Try using sed as mentioned on
http://linuxcommando.blogspot.com/2008/03/using-sed-to-extract-lines-in-text-file.html. For example use
sed '2,4!d' somefile.txt
to print from the second line to the fourth line of somefile.txt. (And don't forget to check http://www.grymoire.com/Unix/Sed.html, sed is a wonderful tool.)
The following command will do what you asked for "extract the lines between 1234 and 5555" in someFile.
sed -n '1234,5555p' someFile
If I understand correctly, you want to find a pattern between two line numbers. The awk one-liner could be
awk '/whatev/ && NR >= 1234 && NR <= 5555' file
You don't need to run grep followed by sed.
Perl one-liner:
perl -ne 'if (/whatev/ && $. >= 1234 && $. <= 5555) {print}' file
Line numbers are OK if you can guarantee the position of what you want. Over the years, my favorite flavor of this has been something like this:
sed "/First Line of Text/,/Last Line of Text/d" filename
which deletes all lines from the first matched line to the last match, including those lines.
Use sed -n with "p" instead of "d" to print those lines instead. Way more useful for me, as I usually don't know where those lines are.
Put this in a file and make it executable:
#!/usr/bin/env bash
start=`grep -n $1 < $3 | head -n1 | cut -d: -f1; exit ${PIPESTATUS[0]}`
if [ ${PIPESTATUS[0]} -ne 0 ]; then
echo "couldn't find start pattern!" 1>&2
exit 1
fi
stop=`tail -n +$start < $3 | grep -n $2 | head -n1 | cut -d: -f1; exit ${PIPESTATUS[1]}`
if [ ${PIPESTATUS[0]} -ne 0 ]; then
echo "couldn't find end pattern!" 1>&2
exit 1
fi
stop=$(( $stop + $start - 1))
sed "$start,$stop!d" < $3
Execute the file with arguments (NOTE that the script does not handle spaces in arguments!):
Starting grep pattern
Stopping grep pattern
File path
To use with your example, use arguments: 1234 5555 myfile.txt
Includes lines with starting and stopping pattern.
If I want to then just extract the lines between 1234 and 5555, is
there a tool to do that?
There is also ugrep, a GNU/BSD grep compatible tool but one that offers a -K option (or --range) with a range of line numbers to do just that:
ugrep -K1234,5555 -n '' somefile.log
You can use the usual GNU/BSD grep options and regex patterns (but it also offers a lot more such as -K.)
If you want lines instead of line ranges, you can do it with perl: eg. if you want to get line 1, 3 and 5 from a file, say /etc/passwd:
perl -e 'while(<>){if(++$l~~[1,3,5]){print}}' < /etc/passwd