grep all lines from start of file to line containing a string - command-line

If I have input file containing
statementes
asda
rertte
something
nothing here
I want to grep / extract (without using awk) every line from the start of the file until I reach the string "something". How can I do this? grep -B does not work, since it needs an exact number of lines.
Desired output:
statementes
asda
rertte
something

It's not completely robust, but -B does work... just make the -B count huge:
grep -B `wc -l <filename>` -e 'something' <filename>

You could use a bash while loop and break out of it as soon as you hit the string:
$ while IFS= read -r line; do
>   echo "$line"
>   if echo "$line" | grep -q something; then
>     break
>   fi
> done < file

head -n `grep -n -e 'something' <filename> | cut -d: -f1` <filename>
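sed can also do this in one step, since its q command quits right after printing the line that matched (a one-liner sketch):
sed '/something/q' <filename>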

Related

Removing a specific line in bash with an exact string

I'm having trouble in getting sed to remove just the specific line I want. Let's say I have a file that looks like this:
testfile
testfile.txt
testfile2
Currently I'm using this to remove the line I want:
sed -i "/$1/d" file
The issue is that, with this, giving testfile as input deletes all three lines, but I want it to remove only the first line. How do I do this?
With grep
grep -x -F -v -- "$1" file
# or
grep -xFv -- "$1" file
-F is for "fixed strings" -- turns off regex engine.
-x is to match entire line.
-v is for "everything but" the matched line(s).
-- to signal the end of options, in case $1 starts with a hyphen.
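With the sample file above, this keeps everything except the exact line testfile (output reconstructed from the question's data):
$ grep -xFv -- testfile file
testfile.txt
testfile2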
To save the file
grep -xFv -- "$1" file | sponge file # `moreutils` package
# or
tmp=$(mktemp)
grep -xFv -- "$1" file > "$tmp" && mv "$tmp" file
So, match the whole line:
var=testfile
sed -i '/^'"$var"'$/d' file
# or with " quoting
sed -i "/^$var\$/d" file
You can have fun learning regex online with regex crosswords.

search and select decimal numbers in a text file line

I have XML text files which contain lines of multiple numbers (3) separated by tabs/spaces, from which I would like to select each number separately.
From:
<tagname1> 110.0912 99.1234 55.1326 </tagname1>
Result:
110.0912
and:
99.1234
and:
55.1326
I would like to use sed, awk, grep, etc.; perl is fine too. It seems simple, but I can't figure out a cleaner line. I've tried:
more FILENAME | grep tagname1 | grep -E -o "[0-9]+*\.[0-9]+" | head -n 1
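One way with Perl and the Regexp::Common module (strip the tags, then print every real number found):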
perl -MRegexp::Common -nE 's/<.*?>//g; say for /($RE{num}{real})/g' file
You can use grep -o option.
$ cat file
<tagname1> 110.0912 99.1234 55.1326 </tagname1>
$ grep -oE '\b[0-9.]+\b' file
110.0912
99.1234
55.1326
\b defines a word boundary
[0-9.]+ is a character class that matches digits and . one or more times
the -o option prints only the matched text
awk -v which=2 '/<tagname1>(([0-9]*(\.[0-9]*)?)|[ \t])*<\/tagname1>/ {print $(which+1)}' input.txt
Select which number you want printed using the variable which; in this example it prints the second number (which=2).
input.txt:
<tagname1> 110.0912 99.1234 55.1326 </tagname1>
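For example, with which=3 it prints the third number:
$ awk -v which=3 '/<tagname1>(([0-9]*(\.[0-9]*)?)|[ \t])*<\/tagname1>/ {print $(which+1)}' input.txt
55.1326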
You can use awk
awk '{print $2,$3,$4}' OFS="\n" file
110.0912
99.1234
55.1326
$ cat file
<tagname1> 110.0912 99.1234 55.1326 </tagname1>
$ awk -v tag="tagname1" -v nr=1 '$0~"<"tag">"{print $(nr+1)}' file
110.0912
$ awk -v tag="tagname1" -v nr=2 '$0~"<"tag">"{print $(nr+1)}' file
99.1234
$ awk -v tag="tagname1" -v nr=3 '$0~"<"tag">"{print $(nr+1)}' file
55.1326
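To print all the numbers from the tagged line in one pass (a sketch along the same lines):
$ awk -v tag="tagname1" '$0~"<"tag">"{for(i=2;i<NF;i++) print $i}' file
110.0912
99.1234
55.1326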

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on a new line,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
Is there a simple way to do that?
Any help is appreciated,
Fred
I would use tr for this job:
tr -d '()' < in_file > out_file
With the -d switch it just deletes any characters in the given set.
To add new lines you could pipe it through two trs:
tr -d '(' < in_file | tr ')' '\n' > out_file
As others have said, sed can do it too:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile
This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
This might work for you:
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/g;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
Guess we all know this, but just to emphasize:
Simple single-purpose utilities like tr tend to run faster than awk or sed doing the same job; likewise, try not to use sed/awk where grep can suffice.
In this particular case, I created a file 100,000 lines long, each line containing the characters "(" and ")". Then I ran
$ /usr/bin/time -f%E -o log tr -d "()" < file
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed
sed 's/[()]//g' in_file > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.

How can I check whether piped content is text with Perl

I've written an svn hook for text files. The content test looks like this:
svnlook cat -t $txn $repos $file 2>/dev/null | file - | egrep -q 'text$'
and I was wondering if this could be done with Perl. However, something like this doesn't work:
svnlook cat -t $txn $repos $file 2>/dev/null | perl -wnl -e '-T' -
I'm testing the exit status of this invocation ($?) to see if the given file was text or binary. Since I'm getting the content out of svn, I can't use Perl's normal file test on a path.
I've done a simulation with the file program and perl on a text file and a binary file (text.txt, icons.png):
find -type f | xargs -i /bin/bash -c 'if $(cat {} | file - | egrep -q "text$"); then echo "{}: text"; else echo "{}: binary"; fi'
./text.txt: text
./icons.png: binary
find -type f | xargs -i /bin/bash -c 'if $(cat {} | perl -wln -e "-T;"); then echo "{}: text"; else echo "{}: binary"; fi'
./text.txt: text
./icons.png: text
You're testing perl's exit code, but you never set it. You need:
perl -le 'exit(-T STDIN ? 0 : 1)' < file
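Applied to the hook from the question (a sketch that just combines the question's pipeline with this test):
svnlook cat -t $txn $repos $file 2>/dev/null | perl -e 'exit(-T STDIN ? 0 : 1)'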

How to "grep" out specific line ranges of a file

There are often times I will grep -n whatever file to find what I am looking for. Say the output is:
1234: whatev 1
5555: whatev 2
6643: whatev 3
If I want to then just extract the lines between 1234 and 5555, is there a tool to do that? For static files, I have a script that does wc -l on the file and then does the math to split it out with tail & head, but that doesn't work so well with log files that are constantly being written to.
Try using sed as mentioned on
http://linuxcommando.blogspot.com/2008/03/using-sed-to-extract-lines-in-text-file.html. For example use
sed '2,4!d' somefile.txt
to print from the second line to the fourth line of somefile.txt. (And don't forget to check http://www.grymoire.com/Unix/Sed.html, sed is a wonderful tool.)
The following command will do what you asked for "extract the lines between 1234 and 5555" in someFile.
sed -n '1234,5555p' someFile
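On a log file that is still growing, you can also tell sed to quit once it is past the range, so it does not keep reading to end of file (a small refinement):
sed -n '1234,5555p;5555q' someFile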
If I understand correctly, you want to find a pattern between two line numbers. An awk one-liner could be:
awk '/whatev/ && NR >= 1234 && NR <= 5555' file
You don't need to run grep followed by sed.
Perl one-liner:
perl -ne 'if (/whatev/ && $. >= 1234 && $. <= 5555) {print}' file
Line numbers are OK if you can guarantee the position of what you want. Over the years, my favorite flavor of this has been something like this:
sed "/First Line of Text/,/Last Line of Text/d" filename
which deletes all lines from the first matched line to the last match, including those lines.
Use sed -n with "p" instead of "d" to print those lines. Way more useful for me, as I usually don't know where those lines are.
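That is:
sed -n "/First Line of Text/,/Last Line of Text/p" filename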
Put this in a file and make it executable:
#!/usr/bin/env bash
start=$(grep -n "$1" < "$3" | head -n1 | cut -d: -f1; exit "${PIPESTATUS[0]}")
if [ "${PIPESTATUS[0]}" -ne 0 ]; then
  echo "couldn't find start pattern!" 1>&2
  exit 1
fi
stop=$(tail -n +"$start" < "$3" | grep -n "$2" | head -n1 | cut -d: -f1; exit "${PIPESTATUS[1]}")
if [ "${PIPESTATUS[0]}" -ne 0 ]; then
  echo "couldn't find end pattern!" 1>&2
  exit 1
fi
stop=$(( stop + start - 1 ))
sed "${start},${stop}!d" < "$3"
Execute the file with arguments (the start and stop arguments are grep patterns):
Starting grep pattern
Stopping grep pattern
File path
To use with your example, use arguments: 1234 5555 myfile.txt
Includes lines with starting and stopping pattern.
If I want to then just extract the lines between 1234 and 5555, is
there a tool to do that?
There is also ugrep, a GNU/BSD grep-compatible tool that offers a -K (or --range) option taking a range of line numbers to do just that:
ugrep -K1234,5555 -n '' somefile.log
You can use the usual GNU/BSD grep options and regex patterns (but it also offers a lot more, such as -K).
If you want specific lines instead of line ranges, you can do it with perl: e.g., to get lines 1, 3, and 5 from a file, say /etc/passwd:
perl -ne 'print if $. == 1 || $. == 3 || $. == 5' < /etc/passwd
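For comparison, a sed sketch that does the same (the trailing 5q stops reading after line 5):
sed -n '1p;3p;5p;5q' /etc/passwd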