Using sed to grep a pattern from a log file

I have the following log line:
2021-10-28 10:26:19,624 INFO [Native-Thread-7f9e6effd700] idol.nifi.connector.GetWeb GetWeb[id=c59c415c-017c-1000-195c-e85818b0a032] Processing: [depth:0] https://qed.qld.gov.au/about-us/rti/disclosure-log/disclosure-log-2015
I want to extract the date and time (without the milliseconds) and everything that comes after Processing:, so that the line looks like this:
2021-10-28 10:26:19 [depth:0] https://qed.qld.gov.au/about-us/rti/disclosure-log/disclosure-log-2015
I am not sure how to achieve this with grep or sed. This is as far as I got:
#!/bin/bash
cat ./logs/*.log | grep Processing: | grep -E "(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)"

sed 's/,.*Processing://' file
Output:
2021-10-28 10:26:19 [depth:0] https://qed.qld.gov.au/about-us/rti/disclosure-log/disclosure-log-2015
See: man sed and The Stack Overflow Regular Expressions FAQ
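The command above prints every line of the file, whether or not it was rewritten. If the logs also contain lines without Processing:, a minimal sketch that filters and rewrites in one pass (replacing the cat | grep steps from the question) adds -n, which suppresses sed's automatic printing, and the p flag, which prints a line only when the substitution succeeded. The function name extract_processing is hypothetical.

```shell
# Hypothetical helper: for every line containing "Processing:", print the
# timestamp without milliseconds plus everything after "Processing:".
# Lines without "Processing:" produce no output.
#   -n  do not print input lines automatically
#   p   print the line only if the substitution was made
extract_processing() {
    sed -n 's/,.*Processing://p' "$@"
}

# Usage (path taken from the question):
#   extract_processing ./logs/*.log
```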

Related

Why is sed returning more characters than requested

In part of my script I am trying to generate a list of the year and month in which each file was submitted. Since each file name contains a timestamp, I should be able to cut the file names at the month position and then filter them with sort and uniq. However, sed produces an outlier for one of the files.
I am using this command sequence:
ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
This works most of the time, except that in some cases it outputs more of the timestamp than the year and month:
$ ls
service-parent-20181119092630.json service-parent-20181123134132.json service-parent-20181202124532.json service-parent-20190121091830.json service-parent-20190125124209.json
service-parent-20181119101003.json service-parent-20181126104300.json service-parent-20181211095939.json service-parent-20190121092453.json service-parent-20190128163539.json
service-parent-20181120095850.json service-parent-20181127083441.json service-parent-20190107035508.json service-parent-20190122093608.json
service-parent-20181120104838.json service-parent-20181129155835.json service-parent-20190107042234.json service-parent-20190122115053.json
$ ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
service-parent-201811
service-parent-201811201048
service-parent-201812
service-parent-201901
I have also tried this variation, but the second output line is still returned:
ls -1 service*json | sed -e "s|\(.*201.\{3\}\).*json$|\1|g" | sort |uniq
Can somebody explain why service-parent-201811201048 is returned past the requested 3 characters?
Thanks.
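The file name service-parent-20181120104838.json happens to contain 201048, which also matches 201..., and because .* is greedy, sed uses the last such match rather than the first.
You might try
ls -1 service*json | sed -e "s|\(.*-201...\).*json$|\1|g" | sort |uniq
to require a dash (-) immediately before 201....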
It is not recommended to parse the output of ls. Please try instead:
for i in service*json; do
sed -e "s|^\(service-.*-201[0-9]\{3\}\).*json$|\1|g" <<< "$i"
done | sort | uniq
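The loop above starts one sed process per file. A sketch of the same idea using only POSIX parameter expansion, assuming every name ends in a 14-digit YYYYMMDDhhmmss timestamp followed by .json, as in the listing: ${f%????????.json} removes the last eight timestamp digits (DDhhmmss) together with .json. The function name list_months is hypothetical.

```shell
# Hypothetical helper: print each distinct "prefix-YYYYMM" once.
# Assumes names look like service-parent-20181119092630.json, i.e. the
# 8 characters just before ".json" are DDhhmmss.
list_months() {
    for f in service*json; do
        [ -e "$f" ] || continue              # no match: the glob stays literal
        printf '%s\n' "${f%????????.json}"   # drop DDhhmmss.json
    done | sort -u
}
```

Run it in the directory that holds the files; sort -u does the job of sort | uniq.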
Your problem is explained at https://stackoverflow.com/a/54565973/1745001 (i.e. .* is greedy), but try this:
$ ls | sed -E 's/(-[0-9]{6}).*/\1/' | sort -u
service-parent-201811
service-parent-201812
service-parent-201901
The above requires a sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed.
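For a sed without -E, the same substitution can be written as a basic regular expression, where the group and the interval braces are backslash-escaped. This sketch also avoids parsing ls by passing the glob results as arguments; the function name month_prefixes is hypothetical.

```shell
# Hypothetical helper: BRE version of  sed -E 's/(-[0-9]{6}).*/\1/'
# Keeps everything up to and including the first "-" followed by six
# digits (the -YYYYMM part) and deletes the rest of the name.
month_prefixes() {
    printf '%s\n' "$@" | sed 's/\(-[0-9]\{6\}\).*/\1/' | sort -u
}

# Usage:
#   month_prefixes service*json
```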

Sed not matching one or more patterns

I have this list of files:
$ more files
one_this_2017_1_abc.txt
two_that_2018_1_abc.txt
three_another_2017_10.abc.txt
four_again_2018_10.abc.txt
five_back_2018_1a.abc.txt
I would like to get this output:
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_YY.abc.txt
four_again_XXXX_YY.abc.txt
five_back_XXXX_YY.abc.txt
I am trying to replace the year, and the short piece after it, with other strings; this is to generate test cases.
I can match the year just fine, but I can't seem to match the one- or two-character piece after it.
This should work, right?
~/test_cases
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[[:alnum:]]\{1,2\}_/_YY_/'
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
Except it doesn't work for the two-character cases.
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[[:alnum:]]\{2\}_/_YY_/'
one_this_XXXX_1_abc.txt
two_that_XXXX_1_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
That doesn't work for the two-character cases either, and this doesn't work at all (though according to the docs it should):
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[[:alnum:]]\+_/_YY_/'
one_YY_XXXX_1_abc.txt
two_YY_XXXX_1_abc.txt
three_YY_XXXX_10.abc.txt
four_YY_XXXX_10.abc.txt
five_YY_XXXX_1a.abc.txt
Other random experiments that don't work:
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[a-zA-Z0-9]\+_/_YY_/'
one_YY_XXXX_1_abc.txt
two_YY_XXXX_1_abc.txt
three_YY_XXXX_10.abc.txt
four_YY_XXXX_10.abc.txt
five_YY_XXXX_1a.abc.txt
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[a-zA-Z0-9]\{1\}_/_YY_/'
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
$ cat files | sed -e 's/_[[:digit:]]\{4\}_/_XXXX_/' -e 's/_[a-zA-Z0-9]\{2\}_/_YY_/'
one_this_XXXX_1_abc.txt
two_that_XXXX_1_abc.txt
three_another_XXXX_10.abc.txt
four_again_XXXX_10.abc.txt
five_back_XXXX_1a.abc.txt
I tried this with both GNU sed 4.2.1 under Linux and GNU sed 4.4 under Cygwin.
And yes, I realize I can pipe this through multiple sed calls to get it to work, but that regex SHOULD work, right?
If your Input_file is the same as the sample shown, the following may help:
sed 's/\([^_]*\)_\([^_]*\)_\(.*_\)\(.*\)/\1_\2_XXXX_YY_\4/g' Input_file
The output will be as follows:
one_this_XXXX_YY_abc.txt
two_that_XXXX_YY_abc.txt
three_another_XXXX_YY_10.abc.txt
four_again_XXXX_YY_10.abc.txt
five_back_XXXX_YY_1a.abc.txt
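As for why the question's attempts fail: in three_another_XXXX_10.abc.txt, four_again_XXXX_10.abc.txt and five_back_XXXX_1a.abc.txt the piece after the year is followed by a dot, not an underscore, so _[[:alnum:]]\{1,2\}_ has nothing to match there. With \+ the leftmost match is the word between the first two underscores (_this_, _that_, and so on), which is why that word became _YY_. Note also that the answer above keeps 10 and 1a in those three names. A minimal sketch that matches the year and the following piece together and lets the piece end in either _ or ., assuming the names follow the pattern shown; the function name mask_dates is hypothetical.

```shell
# Hypothetical helper: replace _YYYY_<1 or 2 alnum chars> with _XXXX_YY,
# keeping whichever separator (_ or .) followed the short piece.
#   \([._]\)  captures that separator; inside brackets "." is literal
mask_dates() {
    sed 's/_[[:digit:]]\{4\}_[[:alnum:]]\{1,2\}\([._]\)/_XXXX_YY\1/' "$@"
}

# Usage:
#   mask_dates files
```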

sed does not recognize -r flag on AIX

Thanks in advance for the help.
I have the following line, which does work on Linux.
myfile (extract):
active_instance_count=
aq_tm_processes=1
archive_lag_target=0
audit_file_dest=?/rdbms/audit
audit_sys_operations=FALSE
audit_trail=NONE
background_core_dump=partial
background_dump_dest=/home1/oracle/app/oracle/admin/iopecom/bdump
...
cat myfile |sed -r 's/ {1,}//g'|sed -r 's/\t*//g' |grep -v "^#"|sed -s "/^$/d" |sed =|sed 'N;s/\n/\t/'|sed -r "s/#.*//g" | sed "s/\t/;/g"|sed "s/\t/;/g"|sed -e "s,',\o042,g"
The result will be:
1;O7_DICTIONARY_ACCESSIBILITY=TRUE
2;active_instance_count=
3;aq_tm_processes=1
4;archive_lag_target=0
5;audit_file_dest=?/rdbms/audit
6;audit_sys_operations=FALSE
7;audit_trail=NONE
8;background_core_dump=partial
9;background_dump_dest=/home1/oracle/app/oracle/admin/iopecom/bdump
But I can't figure out how to run the same command on an AIX server.
Help is very welcome.
Regards.
Antonio.
Unless you have a compelling reason to use sed, you could use alternate tools:
awk -v OFS=';' '{print NR,$0}' filename
would produce the desired output.
You could also use perl:
perl -ne 'print "$.;$_"' filename
It appears that your sed expression would skip lines beginning with a #. As such, you could say:
perl -ne '$,=";"; !/^#/ && print ++$i,$_' filename
or something like:
grep -v '^#' filename | awk ...
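The one-liners above either number every line or skip only the # lines. A sketch of the whole Linux pipeline as a single awk program, which uses only POSIX awk features (that AIX's awk accepts it is an assumption): it strips spaces and tabs, drops lines that then start with # or are empty, numbers the remaining lines, strips trailing # comments and turns ' into ". The function name number_params is hypothetical.

```shell
# Hypothetical helper: one awk program doing the work of the whole
# sed/grep pipeline. The awk variable q holds a single quote, which
# cannot appear inside the single-quoted awk program itself.
number_params() {
    awk -v q="'" '
    {
        gsub(/[ \t]/, "")            # strip all spaces and tabs
        if ($0 ~ /^#/ || $0 == "")   # drop comment lines and empty lines
            next
        sub(/#.*/, "")               # strip trailing # comments
        gsub(q, "\"")                # single quote -> double quote
        n++
        print n ";" $0               # prefix the running count and ";"
    }' "$@"
}

# Usage:
#   number_params myfile
```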
Reformatting your pipeline:
cat myfile |
sed -r 's/ {1,}//g' | # strip all spaces (1)
sed -r 's/\t*//g' | # strip all tabs (2)
grep -v "^#" | # delete all lines beginning `#` (3)
sed -s "/^$/d" | # delete all empty lines (4)
sed = | # interleave with line numbers (5)
sed 'N;s/\n/\t/' | # join line number and line with `\t` (6)
sed -r "s/#.*//g" | # strip all `#` comments (7)
sed "s/\t/;/g" | # replace all tabs with `;` (8)
sed "s/\t/;/g" | # do it again (9)
sed -e "s,',\o042,g" # replace all ' with " (10)
Boiling that down and using cat -n to provide the line numbers up front gets:
cat -n myfile |
sed "$(print 's/\t/;/')
$(print 's/[ \t]*//g')
s/#.*//g
/^$/d
s/'/\"/g"
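This is close to the original, but not identical: cat -n numbers every input line, so comment-only and empty lines still use up a line number and are left behind as bare "N;" lines, whereas the original pipeline numbers only the lines that survive filtering. The $(...) construction is command substitution: it runs the command and substitutes its output, which here turns \t into a real tab character. print is the ksh built-in; on Linux it would be printf.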
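If you would rather stay with sed, a sketch of the Linux pipeline in portable form: no -r, no \t (a real tab is produced with printf), and the steps kept in the original order, so only lines that survive the filtering are numbered. Every construct used here is specified by POSIX, but that AIX sed accepts all of them is an assumption. The function name number_params_sed is hypothetical.

```shell
# Hypothetical helper: the same pipeline without GNU-only sed features.
number_params_sed() {
    tab=$(printf '\t')                                  # a real tab character
    sed -e "s/[ $tab]//g" -e '/^#/d' -e '/^$/d' "$@" |  # strip blanks, drop comment/empty lines
    sed = |                                             # print each line number on its own line
    sed 'N;s/\n/;/' |                                   # join number and line with ";"
    sed -e 's/#.*//' -e "s/'/\"/g"                      # strip trailing comments, ' -> "
}

# Usage:
#   number_params_sed myfile
```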

print value into sed -n

I use sed to get the content of a file from a given point onward, but I have a problem.
I cannot get the value of $variable into this sed command:
count=$(sed -n '/$variable/,$p' file.log | grep '"KO"' -c)
I tried double quotes, and also closing the single quotes around the variable, but neither works:
count=$(sed -n "/$variable/,$p" file.log | grep '"KO"' -c)
ERROR: unexpected `,'
count=$(sed -n '/'$variable'/,$p' file.log | grep '"KO"' -c)
ERROR: unterminated address regex
I know that sed is searching for the literal text $variable, but I cannot pass its value in.
Thanks in advance.
It's a question of getting the quoting right.
Your first example:
count=$(sed -n '/$variable/,$p' file.log | grep '"KO"' -c)
doesn't expand $variable because it's in single quotes, the second:
count=$(sed -n "/$variable/,$p" file.log | grep '"KO"' -c)
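expands $variable but has issues with its contents, as mentioned by choroba. It also has an issue with $p, which the shell interprets as a (usually empty) variable. Your third example:
count=$(sed -n '/'$variable'/,$p' file.log | grep '"KO"' -c)
comes pretty close to what you need, but $variable is outside any quotes, so the shell splits its value at spaces (a timestamp such as 17-09-12 00:01:03 becomes two arguments, a likely source of the "unterminated address regex" error). It also still fails if $variable contains characters that sed treats specially. Quote the variable and escape those characters; for example, the following works:
variable="\[17-09-12 00:01:03\]"
count=$(sed -n '/'"$variable"'/,$p' file.log | grep -c '"KO"')
Since brackets are also special to the shell, you can have printf's %q directive escape them for you. Note that %q escapes for the shell, not for sed, so it leaves regex characters such as . and * alone:
variable="[17-09-12 00:01:03]"
variable=$(printf "%q" "$variable")
count=$(sed -n '/'"$variable"'/,$p' file.log | grep -c '"KO"')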
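[ has a special meaning in sed. I would use something more powerful than sed, e.g. Perl, which can escape the variable for you with \Q...\E. To print from the first matching line to the end of the file, as sed -n '/regex/,$p' does, keep printing once a match has been seen:
perl -ne '$on ||= /\Q'"$variable"'\E/; print if $on' file.log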
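Along the same lines as the Perl answer, a sketch that avoids regular expressions altogether: awk's index() does a plain substring search, so characters such as [ or . in the variable need no escaping. The value is handed over through the environment and read with ENVIRON, because awk -v would interpret backslashes in it. The function name from_literal is hypothetical.

```shell
# Hypothetical helper: print file $2 from the first line that contains the
# literal string $1 through the end of the file, like sed -n '/re/,$p'
# but without any regex interpretation of $1.
from_literal() {
    pat=$1 awk '
        !found && index($0, ENVIRON["pat"]) { found = 1 }
        found
    ' "$2"
}

# Usage with the variable and file from the question:
#   count=$(from_literal "$variable" file.log | grep -c '"KO"')
```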

Change sed line separator to NUL to act as "xargs -0" prefilter?

I'm running a command line like this:
filename_listing_command | xargs -0 action_command
where filename_listing_command uses null bytes to separate the file names, which is what xargs -0 expects.
The problem is that I want to filter out some of the files, with something like this:
filename_listing_command | sed -e '/\.py/!d' | xargs ac
but I need to use xargs -0.
How do I change the line separator that sed wants from newline to NUL?
If you've landed on this SO question looking for an answer and are using GNU sed 4.2.2 or later, sed now has a -z option that does what the OP is asking for.
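With GNU sed 4.2.2 or later, the filter from the question then works as written apart from adding -z: each NUL-terminated name becomes one record, and $ anchors at the end of that name. A sketch that keeps only names ending in .py, assuming GNU sed; the function name keep_py is hypothetical, and action_command stands for the question's command.

```shell
# Hypothetical helper (GNU sed 4.2.2+): read and write NUL-separated names,
# deleting every record that does not end in ".py".
keep_py() {
    sed -z '/\.py$/!d'
}

# Usage, with the commands named in the question:
#   filename_listing_command | keep_py | xargs -0 action_command
```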
Pipe it through grep:
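filename_listing_command | grep -vz '\.py$' | xargs -0 action_command
The -z option makes GNU grep read and write NUL-terminated records, and -v inverts the match (excludes). (-Z, --null, only changes how file names are printed, so it is not needed here.) Drop -v to keep only the .py files instead, as the sed -e '/\.py/!d' in the question does.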
Edit:
Try this if you prefer to use sed:
filename_listing_command | sed 's/[^\x0]*\.py\x0//g' | xargs -0 action_command
If none of your file names contain a newline, a solution using GNU Parallel may be easier to read:
filename_listing_command | grep -v '\.py$' | parallel ac
Learn more about GNU Parallel http://www.youtube.com/watch?v=OpaiGYxkSuQ
With the help of Tom Hale and that answer, we have:
sed -nzE "s/^$PREFIX(.*)/\1/p"
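A sketch of that line in context, assuming GNU sed: it keeps only the names that start with a given prefix and prints them with the prefix removed. Here the prefix is passed as a function argument instead of $PREFIX, and | is used as the s delimiter so that prefixes containing / work; the prefix is still an extended regex, so a literal dot must be written \. (backslash-dot). The function name strip_prefix is hypothetical.

```shell
# Hypothetical helper (GNU sed): NUL-separated names in, NUL-separated names
# out; only names starting with the ERE in $1 are kept, minus that prefix.
#   -n  print nothing unless the p flag fires
#   -z  records are NUL-terminated
#   -E  extended regex, so ( ) groups without backslashes
strip_prefix() {
    sed -nzE "s|^$1(.*)|\1|p"
}

# Usage:
#   filename_listing_command | strip_prefix '\./src/' | xargs -0 action_command
```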