I want to parse a crontab file and store the schedule fields and the command in separate columns of a database.
cat root | awk '{print $1, $2, $3, $4, $5, $6}'
47 * * * * string=`find somefile`; echo $string > success.txt 2> err.txt
It is easy to extract the first five (schedule) fields with awk. But how do I select the actual command?
In the above example I want to select everything from "string" to "$string".
I also want to select the standard output and standard error file paths, i.e. success.txt and err.txt.
update:
awk '{print $NF}'
The above works for the last field, but the following does not return the second-last field:
awk '{print $NF-2}'
A simple approach to get rid of the initial fields is to use cut:
cut -d' ' --complement -f1-5
I believe this requires the GNU version of cut, as the --complement flag is an extension.
I'd use awk to split the rest into fields:
awk '{
    outfile=$(NF-2)
    errfile=$NF
    for(n=NF; n>(NF-4); n--) { $n = "" }
    printf("command: %s\noutput: %s\nerror: %s\n", $0, outfile, errfile)
}'
Note that this approach involves a few assumptions:
whitespace can be standardized to single spaces
both stdout and stderr are redirected
no spaces are present in the names for the output and error files
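Putting the two steps together on the sample line from the question gives roughly the following. This is only a sketch: it uses cut -f6- (keep field 6 onward) as a stand-in for --complement -f1-5, the variable names line and parsed are made up for the example, and the same three assumptions still apply.

```shell
# Sample crontab line taken from the question.
line='47 * * * * string=`find somefile`; echo $string > success.txt 2> err.txt'

# -f6- keeps field 6 onward, which is the same as dropping fields 1-5.
parsed=$(printf '%s\n' "$line" | cut -d' ' -f6- | awk '{
    outfile = $(NF-2)          # parentheses matter: $NF-2 would mean ($NF) - 2
    errfile = $NF
    for (n = NF; n > NF-4; n--) $n = ""
    sub(/ +$/, "")             # drop the blanks left behind by the emptied fields
    printf("command: %s\noutput: %s\nerror: %s\n", $0, outfile, errfile)
}')
printf '%s\n' "$parsed"
```

The comment on outfile also covers the question's update: awk parses $NF-2 as the last field minus 2, while $(NF-2) addresses a field by position.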
Related
Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d' ' -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by a space (-d' ' sets the delimiter to a single space; the quotes keep the shell from discarding it) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
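All of the variants above match potato: anywhere in the line, so a key such as sweetpotato: would be picked up as well. A small sketch that compares the key exactly, assuming every line has the form "key: value" with ": " as the only separator (the variable names data and values are illustrative):

```shell
# Sample data from the question.
data='potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789'

# -F': ' splits each line at colon-space; $1 is the key, $2 the value.
values=$(printf '%s\n' "$data" | awk -F': ' '$1 == "potato" { print $2 }')
printf '%s\n' "$values"
```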
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to drop everything matched so far (here "potato: ") from the reported match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while IFS= read -r line;
do
    if [[ "${line%%:\ *}" == "potato" ]];
    then
        echo "${line##*:\ }";
    fi;
done < file.txt
## tells bash to delete the longest leading match of the pattern from $line; here the pattern is "*: ", i.e. everything up to and including the last ": ".
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or, if you want the key rather than the value, %% tells bash to delete the longest trailing match of the pattern from $line; here the pattern is ": *", i.e. everything from the first ": " onward.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The pattern is written as ":\ ", with a backslash-escaped space. Inside ${...} bash also accepts the space unescaped, so the backslash is a precaution rather than a requirement.
You can find more examples like these at the Linux Documentation Project.
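The difference between the single and doubled operators is easiest to see on a value that contains the separator more than once. A minimal sketch, where the variable names and the "a: b: c" string are made up for illustration:

```shell
line='potato: 1234'
value=${line##*: }    # strip the longest prefix matching "*: "
key=${line%%: *}      # strip the longest suffix matching ": *"

nested='a: b: c'
short=${nested#*: }   # single #: shortest prefix, leaves "b: c"
long=${nested##*: }   # double ##: longest prefix, leaves "c"

printf '%s\n' "$key" "$value" "$short" "$long"
```

These expansions are plain POSIX shell, so they do not depend on bash.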
Modern Bash has support for regular expressions:
while read -r line; do
    if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
done < file.txt
grep potato file | grep -o "[0-9].*"
I have a file as below
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.70.xxx.xx)
NAME(BOLIVIA) TYPE(SA)
APPLIC(Java) IP(192.71.xxx.xx)
I am trying to extract the values NAME and IP using sed:
cat file1 |
sed ':a
N
$!ba
s/\n/ /g' | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
However, I'm only getting the output:
NAME(BOLIVIA) IP(192.71.xxx.xx)
What I would like is:
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)
Would appreciate it if someone could give me a pointer on what I'm missing.
TIA
Your first sed command reformats the file into one long line. You could have used tr '\n' ' ' for this, but that is not the problem.
The problem is in the second part, where each greedy .* consumes as much as possible, so the expression matches only once and ends at the last IP(...).
Your solution could be "fixed" with the ugly
# Do not use this:
sed -zn 's/[^\n]*\(NAME(BOLI...)\)[^\n]*\n[^\n]*\(IP([^)]*)\)[^\n]*/\1 \2/gp' file1
Possible solutions:
cat file1 | paste -d " " - - | sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p'
# or
grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1 | paste -d " " - -
# or
printf "%s %s\n" $(grep -Eo "(NAME\(BOLI...\)|IP\(.*\))" file1)
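To see the greedy behaviour in isolation, the joined line can be fed to the original expression and then to a variant that uses [^)]* so that a match cannot run past a closing parenthesis. This is a sketch on the sample data only; the variable names joined, greedy and pairs are illustrative.

```shell
# The question's data after the newlines have been replaced by spaces.
joined='NAME(BOLIVIA) TYPE(SA) APPLIC(Java) IP(192.70.xxx.xx) NAME(BOLIVIA) TYPE(SA) APPLIC(Java) IP(192.71.xxx.xx)'

# Original expression: the leading .* swallows the first record entirely.
greedy=$(printf '%s\n' "$joined" |
    sed -n 's/.*\(NAME(BOLI...)\).*\(IP(.*)\).*/\1 \2/p')

# [^)]* stops at the first ")", so every NAME(...) and IP(...) is found;
# paste then joins each pair of output lines with a space.
pairs=$(printf '%s\n' "$joined" |
    grep -Eo 'NAME\([^)]*\)|IP\([^)]*\)' | paste -d' ' - -)

printf '%s\n' "$greedy" "$pairs"
```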
If you are OK with awk, you could try the following. It was written and tested at
https://ideone.com/bJDzgf with the shown samples only.
awk '
match($0,/^NAME\([^)]*/){
name=substr($0,RSTART+5,RLENGTH-5)
next
}
match($0,/IP\([^)]*/){
print name,substr($0,RSTART+3,RLENGTH-3)
name=""
}
' Input_file
This might work for you (GNU sed):
sed -n '/NAME/{N;/IP/s/\s.*\s/ /p}' file
If a line contains NAME and the following line contains IP, remove everything between them and print the result.
An alternative shorter awk:
awk '$1 ~ /^NAME/ {nm = $1} $2 ~ /^IP/ {print nm, $2}' file
NAME(BOLIVIA) IP(192.70.xxx.xx)
NAME(BOLIVIA) IP(192.71.xxx.xx)
The issue in your script is the use of .*, which matches greedily,
so that on the joined line you only get the last NAME(BOLI...) and the last IP(.*).
If you can use Python (the print call below needs Python 3):
#!/bin/bash
python3 -c '
import re, sys
for ar in re.findall(r"(NAME\(BOLI.*?\)).*?(IP\(.*?\))", sys.stdin.read(), re.DOTALL):
print(*ar)
' < input-file
What is a simple and fast UNIX command to print all lines from the last occurrence of a pattern to the end of the file?
sed -n '/pattern/,$p' file
This sed command prints from the first occurrence onwards.
This might work for you (GNU sed):
sed 'H;/pattern/h;$!d;x;//!d' file
Stashes the last pattern and following lines in the hold space and at end-of-file prints them out.
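The same script, spread over several lines with one comment per command, may be easier to follow. This is a sketch that uses b as the pattern and the five-line a b c b d sample that appears elsewhere in this thread; the variable name last_block is illustrative.

```shell
last_block=$(printf '%s\n' a b c b d | sed '
# Append every line to the hold space.
H
# On a match, restart the hold space from the current line.
/b/h
# Any line but the last: print nothing and read on.
$!d
# Last line: swap the collected block into the pattern space.
x
# An empty regex reuses /b/; delete the block if it never matched.
//!d
')
printf '%s\n' "$last_block"
```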
Or using the same method in awk:
awk '{x=x ORS $0};/pattern/{x=$0};END{if(x ~ /pattern/)print x}' file
However, on my machine jaypal's approach with sed seems to be the quickest:
tac file | sed '/pattern/q' | tac
Reverse the file, print until the first pattern, exit and reverse the file.
tac file | awk '/pattern/{print;exit}1' | tac
Here's a Perlish way to do it:
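perl -ne '$seen = 1, @a = () if /pattern/; push @a, $_; END { print @a if $seen }' file
A simple solution is to run a regex over the entire file. The greedy leading .* (with /s, so that . also matches newlines) pushes the match to the last occurrence, and everything from that occurrence to the end of the file is printed:
perl -0777 -ne 'print $1 if /.*(pattern.*)/s' file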
A standalone awk:
awk '/pattern/{delete a;c=0}{a[c++]=$0}END{for (i=0;i<c;i++){print a[i]}}' file
Here is a pure awk solution:
awk 'FNR==NR {if ($0~/pattern/) f=FNR;next} FNR==f {a=1}a' file{,}
It reads the file twice: the first pass records the line number of the last match of the pattern, and the second pass prints from that line to the end of the file.
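The two-pass idea can be tried on a small file as follows. This is a sketch: file{,} is bash brace expansion for naming the file twice, so the example passes the name twice explicitly, and the temporary file, the pattern b and the variable names are all made up for the demonstration.

```shell
tmp=$(mktemp)
printf '%s\n' a b c b d > "$tmp"

# Pass 1 (FNR == NR): remember the line number of the last match.
# Pass 2: switch printing on at that line and keep it on.
tail_from_last=$(awk '
    FNR == NR { if ($0 ~ /b/) last = FNR; next }
    FNR == last { show = 1 }
    show
' "$tmp" "$tmp")

rm -f "$tmp"
printf '%s\n' "$tail_from_last"
```

If the pattern never matches, last stays unset and nothing is printed.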
Or you can store data in an array like this:
awk '/pattern/ {f=NR} {a[NR]=$0} END {for (i=f;i<=NR;i++) print a[i]}' file
Using GNU awk for multi-char RS and gensub():
$ awk -v RS='^$' -v ORS= '{print gensub(/.*(pattern)/,"\\1","")}' file
e.g.:
$ cat file
a
b
c
b
d
$ awk -v RS='^$' -v ORS= '{print gensub(/.*(b)/,"\\1","")}' file
b
d
The above simply deletes from the start of the file up to just before the last occurrence of "b".
I have a file with three columns. I would like to delete the 3rd column (editing the file in place). How can I do this with awk or sed?
123 abc 22.3
453 abg 56.7
1236 hjg 2.3
Desired output
123 abc
453 abg
1236 hjg
Try this short one-liner:
awk '!($3="")' file
With GNU awk for inplace editing, \s/\S, and gensub() to delete
1) the FIRST field:
awk -i inplace '{sub(/^\S+\s*/,"")}1' file
or
awk -i inplace '{$0=gensub(/^\S+\s*/,"",1)}1' file
2) the LAST field:
awk -i inplace '{sub(/\s*\S+$/,"")}1' file
or
awk -i inplace '{$0=gensub(/\s*\S+$/,"",1)}1' file
3) the Nth field where N=3:
awk -i inplace '{$0=gensub(/\s*\S+/,"",3)}1' file
Without GNU awk you need a match()+substr() combo or multiple sub()s + vars to remove a middle field. See also Print all but the first three columns.
This might work for you (GNU sed):
sed -i -r 's/\S+//3' file
If you want to delete the white space before the 3rd field:
sed -i -r 's/(\s+)?\S+//3' file
It seems you could simply go with
awk '{print $1 " " $2}' file
This prints the two first fields of each line in your input file, separated with a space.
Try using cut; it's fast and easy.
First, if you have repeated spaces, you can squeeze them down to a single space between columns with tr -s ' '.
If the columns are already separated by exactly one delimiter, you can use cut -d ' ' -f-2 to print fields (columns) 1 through 2.
for example if your data is in a file input.txt you can do one of the following:
cat input.txt | tr -s ' ' | cut -d ' ' -f-2
Or, if you find it easier to think of the problem as removing the 3rd column, you can write the following (--complement is a GNU extension):
cat input.txt | tr -s ' ' | cut -d ' ' --complement -f3
cut is pretty powerful: you can also extract ranges of bytes or characters, in addition to columns.
Here is an excerpt from the man page on how to specify the list of ranges:
Each LIST is made up of one range, or many ranges separated by commas.
Selected input is written in the same order that it is read, and is
written exactly once. Each range is one of:
N     N'th byte, character or field, counted from 1
N-    from N'th byte, character or field, to end of line
N-M   from N'th to M'th (included) byte, character or field
-M    from first to M'th (included) byte, character or field
So you could also have asked for columns 1 and 2 specifically with:
cat input.txt | tr -s ' ' | cut -d ' ' -f1,2
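The range forms from the man page excerpt can be tried on a single row. A minimal sketch, assuming a row with uneven spacing (the extra spaces and the variable names are made up for the example):

```shell
row='1236   hjg  2.3'

# Squeeze runs of spaces first, otherwise cut sees empty fields.
squeezed=$(printf '%s\n' "$row" | tr -s ' ')

first_two=$(printf '%s\n' "$squeezed" | cut -d' ' -f-2)   # -M form
from_two=$(printf '%s\n' "$squeezed" | cut -d' ' -f2-)    # N- form
only_three=$(printf '%s\n' "$squeezed" | cut -d' ' -f3)   # N form

printf '%s\n' "$first_two" "$from_two" "$only_three"
```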
Try this :
awk '$3="";1' file.txt > new_file && mv new_file file.txt
or
awk '{$3="";print}' file.txt > new_file && mv new_file file.txt
Try
awk '{$3=""; print $0}'
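One detail of the $3="" variants is that awk rebuilds the line with the output separator still in place, so every line ends in a blank. If that matters, the blank can be trimmed in the same command. A sketch on the sample rows from the question (the variable names rows and trimmed are illustrative):

```shell
rows='123 abc 22.3
453 abg 56.7
1236 hjg 2.3'

# Empty the third field, then strip the separator it leaves behind.
trimmed=$(printf '%s\n' "$rows" |
    awk '{ $3 = ""; sub(/[ \t]+$/, ""); print }')
printf '%s\n' "$trimmed"
```

As with the other awk answers, the result still has to be written to a new file and moved over the original to get in-place behaviour.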
If you're open to a Perl solution...
perl -ane 'print "$F[0] $F[1]\n"' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-a autosplit mode: split input lines into the @F array. Defaults to splitting on whitespace
-e execute the following perl code
I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on its own line,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred
I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
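To add new lines you could pipe it through two trs (the first one also deletes the space between the pairs, so that the lines do not start with a blank):
cat in_file | tr -d '( ' | tr ')' '\n' > out_file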
As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile
This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
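This might work for you (GNU sed, which understands \n in the replacement; the g flag handles more than two pairs per line):
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/g;s/[()]//g'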
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
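Since the question says the line goes on with further pairs, here is a sketch that handles any number of pairs per line using only tr and a final sed to drop empty lines. The third pair in the sample is invented for the demonstration, and the variable names are illustrative.

```shell
# Two pairs from the question plus one hypothetical extra pair.
pairs='(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) (1.0E+00,2.0E+00)'

# Delete "(" and spaces, turn each ")" into a newline, drop blank lines.
lines=$(printf '%s\n' "$pairs" | tr -d '( ' | tr ')' '\n' | sed '/^$/d')
printf '%s\n' "$lines"
```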
I guess we all know this, but just to emphasize:
Simple, single-purpose commands usually take less time to execute than awk or sed doing the same job. For instance, try not to use sed/awk where grep can suffice.
In this particular case, I created a file 100000 lines long, each line containing the characters "(" and ")". Then I ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
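And the results were as follows (note that in the first command time wraps only cat, not the tr on the other side of the pipe, so the two figures are not strictly comparable):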
05.44 sec : Using tr
05.57 sec : Using sed
cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.