Print log records within a time interval using sed

I am using sed to print log records within a time interval. My time format is YYYY-MM-DD HH:MM:ss,sss, for example "2014-07-22 15:33:25,758". I tried
sed -n '/2014-07-23 01:00:00,000/,/2014-07-23 02:00:00,000/ p'
But it does not work. I can find solutions for the YYYY-MM-DD format, but they do not cover my case. Can anyone help?

You can replace insignificant digits with . so it would match anything within the range:
sed -n '/2014-07-23 01:..:..,.../,/2014-07-23 02:..:..,.../p'
And perhaps just remove them totally:
sed -n '/2014-07-23 01:/,/2014-07-23 02:/p'

Ultimately, sed is not the best tool for this job; it does exact matches rather than range-based matches (within the exactness of regular expressions).
Using awk, you can do range-based checking:
awk '$1 == "2014-07-23" && $2 >= "01:00:00" && $2 < "02:00:00" { print }'
where the { print } is optional but explicit. One of the advantages of the ISO 8601 date and time notations is precisely that lexicographic order is the same as time order.
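As a small illustration of that lexicographic comparison, the sketch below runs the same kind of awk test against a few made-up log lines. The sample records and the helper name in_window are assumptions for the demo, not part of the original question.

```shell
# Hypothetical helper: print records from one day whose time falls in [from, to).
# Assumes field 1 is YYYY-MM-DD and field 2 is HH:MM:SS,mmm.
in_window() {
    awk -v day="$1" -v from="$2" -v to="$3" \
        '$1 == day && $2 >= from && $2 < to'
}

# Made-up sample records; only the two lines inside the window are printed.
printf '%s\n' \
    '2014-07-23 00:59:59,999 before' \
    '2014-07-23 01:00:00,123 first' \
    '2014-07-23 01:47:10,500 second' \
    '2014-07-23 02:00:00,000 after' |
    in_window 2014-07-23 01:00:00 02:00:00
```

The comparison is a plain string comparison, so it only works because every field is zero-padded to a fixed width.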

For your sed command to work, those exact times must appear in the file. konsolebox has a good solution that should work in your case (but see Jonathan Leffler's comment there, and also his awk solution which is simpler than mine).
In general you need something more powerful than sed, like awk. In the example below, note how you must specify the input times (space-separated values, no decimals on seconds). Also note that it is gawk-specific. Also note that I've assumed the time is the first and second space-separated fields. Adjust as needed.
gawk -vstart="2014 07 23 01 00 00" -vend="2014 07 23 02 00 00" '
BEGIN {nstart=mktime(start); nend=mktime(end)}
{
t = $1 " " $2
gsub(/[-:]/, " ", t);
nt = mktime(substr(t, 1, 19))
if (nt >= nstart && nt <= nend)
print
}
' file

Related

Print specific lines that have two or more occurrences of a particular character

I have a file with some lines of text. I need to print lines 3-7 and 11 if they contain two "b" characters. I tried
sed -n '/b\{2,\}/p' file but it only printed lines where "b" occurs twice in a row
You can use
sed -n '3,7{/b[^b]*b/p};11{/b[^b]*b/p}' file
## that is equal to
sed -n '3,7{/b[^b]*b/p};11{//p}' file
Note that b[^b]*b matches a b, then zero or more characters other than b, and then another b. The //p in the second part reuses the most recent pattern, i.e. it matches the same b[^b]*b regex.
Note that you could also use the b.*b regex, but bracket expressions tend to work faster.
See an online demo, tested with sed (GNU sed) 4.7:
s='11bb1
b222b
b n b
ww
ee
bb
rrr
fff
999
10
11 b nnnn bb
www12'
sed -ne '3,7{/b[^b]*b/p};11{/b[^b]*b/p}' <<< "$s"
Output:
b n b
bb
11 b nnnn bb
Only lines 3, 6 and 11 are returned.
Just use awk for simplicity, clarity, portability, maintainability, etc. Using any awk in any shell on every Unix box:
awk '( (3<=NR && NR<=7) || (NR==11) ) && ( gsub(/b/,"&") >= 2 )' file
Notice how if you need to change a range, add a range, add other line numbers, change how many bs there are, add other chars and/or strings to match, add some completely different condition, etc. it's all absolutely clear and trivial.
For example, want to print the line if there are exactly 13 or 27 bs instead of 2 or more?
awk '( (3<=NR && NR<=7) || (NR==11) ) && ( gsub(/b/,"&") ~ /^(13|27)$/ )' file
Want to print the line if the line number is between 23 and 59 but isn't 34?
awk '( 23<=NR && NR<=59 && NR!=34 ) && ( gsub(/b/,"&") >= 2 )' file
Try making similar changes to a sed script. I'm not saying you can't force it to happen, but it's not nearly as trivial, clear, portable, etc. as it is using awk.
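To make the comparison concrete, here is a minimal sketch that runs the awk condition on the same sample lines used in the sed demo. The helper name pick_lines and its min parameter are assumptions added for illustration.

```shell
# Hypothetical helper: print lines 3-7 and line 11 if they contain
# at least "min" b characters. gsub(/b/,"&") replaces each b with
# itself, so its return value is simply the number of b's on the line.
pick_lines() {
    awk -v min="$1" '( (3<=NR && NR<=7) || NR==11 ) && gsub(/b/,"&") >= min'
}

# Same twelve sample lines as in the sed demo.
printf '%s\n' 11bb1 b222b 'b n b' ww ee bb rrr fff 999 10 '11 b nnnn bb' www12 |
    pick_lines 2
```

With min set to 2 this should select the same lines 3, 6 and 11 as the sed version.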

Replacing all occurrence after nth occurrence in a line in perl

I need to replace all occurrences of a string after nth occurrence in every line of a Unix file.
My file data:
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
My output data:
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
I tried using sed: sed 's/://3g' test.txt
Unfortunately, combining the g flag with an occurrence number is not working as expected; instead, it replaces all the occurrences.
Another approach using awk
awk -v c=':' -v n=2 'BEGIN{
FS=OFS=""
}
{
j=0;
for(i=0; ++i<=NF;)
if($i==c && j++>=n)$i=""
}1' file
$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
With GNU awk, you could try the following, which uses gensub. It is based entirely on the samples shown, where the colons are to be removed from the 3rd occurrence onwards. gensub splits the line into two parts, and all colons are then removed from the second part (everything after the 2nd colon).
awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
firstPart=restPart=""
firstPart=gensub(regex, "\\1\\2", "1", $0)
restPart=gensub(regex,"\\3","1",$0)
gsub(/:/,"",restPart)
print firstPart restPart
}
' Input_file
I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon delimited fields.
So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ( undef, $first, @rest ) = split /:/;
print ":$first:", join ( "", @rest ),"\n";
}
__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
You can use the perl solution like
perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt
See the online demo and the regex demo.
The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:
^(?:[^:]*:){2}(*SKIP)(?!) - match
^ - start of string (here, a line)
(?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
(*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
| - or
: - match a : char.
And only run the replacement if the current line starts with :account_id: (see the trailing if /^:account_id:/ condition).
Or an awk solution like
awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt
See this online demo. Details:
BEGIN{OFS=FS=":"} - sets the input/output field separator to :
/^:account_id:/ - line must start with :account_id:
result="" - sets result variable to an empty string
for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result} - iterates over the fields and if the field number is greater than 2, just append the current field value to result, else, append the value + output field separator; then print the result.
If n is fixed and equal to 2, I would use GNU AWK in the following way. Let the content of file.txt be
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
then
awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt
output
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Explanation: use : as the field separator and the empty string as the output field separator. This by itself removes every :, so I add back the two that have to be preserved: the 1st (before the second field) and the 2nd (after the second field). Beware that I tested it solely on this data, so test it with more varied inputs before relying on it.
(tested in gawk 4.2.1)
This might work for you (GNU sed):
sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file
Replace the third occurrence of : by a newline.
Make a copy of the line.
Delete all occurrences of :'s.
Append the amended line to the copy.
Join the two lines by removing everything from the newline marker in the copy (the former third :) up to and including the newline marker in the amended line.
N.B. A newline is the best delimiter to use with sed, because the lines presented to sed's commands initially contain no newlines. The important property of the delimiter, however, is that it is unique, so any character will do as long as it does not occur anywhere in the data set.
An alternative solution uses a loop to remove all :'s after the first two:
sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
With GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
and with any awk in any shell on every Unix box:
$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
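For a variable n, the same idea can be sketched in portable awk with index() and substr() instead of regexes. The helper name strip_after_nth and its two arguments are assumptions for illustration, and the character is assumed to be a single literal character other than a backslash.

```shell
# Hypothetical helper: keep the first n occurrences of character c
# on each line and delete every later occurrence.
strip_after_nth() {
    awk -v c="$1" -v n="$2" '{
        out = ""; rest = $0; seen = 0
        # index() does a literal search, so c needs no regex escaping.
        while ((p = index(rest, c)) > 0) {
            seen++
            # Copy the text before c, plus c itself only while seen <= n.
            out = out substr(rest, 1, p - 1) (seen <= n ? c : "")
            rest = substr(rest, p + length(c))
        }
        print out rest
    }'
}

printf '%s\n' ':account_id:12345:6789:Melbourne:Aus' | strip_after_nth : 2
```

Called with : and 2, this keeps the first two colons and drops the rest, matching the requested output for the sample data.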

result splitted with a specific character using ksh

I have several input files that look like the one below. Before looping over all the files for processing, I would like to get the values of the 1st column on a single line, separated by ||.
Input.txt
aa ,DEC
bb ,CHAR
cc ,CHAR
dd ,DEC
ee ,DEC
ff ,CHAR
gg ,DEC
This is my attempt:
cat $1| while read line
do
cle=`echo $line|cut -d"," -f1`
for elem in $cle
do
echo -n "$elem||"
done
done
The problem is that I get a trailing || at the end of the output.
Here is the result I'm looking for, on one line:
aa || bb || cc || dd || ee || ff || gg
Probably use Awk instead.
awk -F ',' '{ printf "%s%s", sep, $1; sep = "||"; } END { printf "\n" }' "$1"
If you really wanted to use the shell, you can do pretty much the same thing, but it will typically be both clunkier and slower. Definitely prefer the Awk version for any real system.
sep=''
while IFS=',' read -r cle _; do
printf "%s%s" "$sep" "$cle"
sep="||"
done <"$1"
printf "\n"
Notice the absence of a useless cat and how the read command itself is perfectly able to split on whatever IFS is set to. (Your example looks like maybe you want to split on whitespace instead, which is the default behavior of both Awk and the shell. Drop the -F ',' or remove the IFS=',', respectively.) You obviously don't need a for loop to iterate over a single value, either. And always quote your variables.
If you want a space after the delimiter, set it to "|| " instead of just "||". Your example is not entirely consistent (or maybe the markup here hides some of your formatting).
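Since the sample rows have a blank before the comma, one way to get exactly the spacing shown in the question is to trim each value and join with " || ". The following is only a sketch under that assumption; the helper name join_first_column is made up for the demo.

```shell
# Hypothetical helper: join the first comma-separated column of each
# line with " || ", trimming blanks around each value first.
join_first_column() {
    awk -F ',' '{
        val = $1
        gsub(/^[ \t]+|[ \t]+$/, "", val)   # strip leading and trailing blanks
        printf "%s%s", sep, val            # sep is empty for the first value
        sep = " || "
    }
    END { printf "\n" }' "$@"
}

printf '%s\n' 'aa ,DEC' 'bb ,CHAR' 'cc ,CHAR' | join_first_column
```

Because the separator is printed before each value rather than after it, there is never a trailing delimiter to clean up.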

Keeping first character in string, in a specific single field

I am trying to remove all but the first character of a specific field in a .tab file. I want to keep only first character in fields 10 and 11.
Normally the fields have 35 characters in them, so I used:
awk '{gsub("..................................$","",$10); print}' file
However, some fields have fewer than 35 characters, and these were ignored by this replacement. I tried using substring, but I cannot figure out how to make it field specific. I believe there is a way to use perl inside awk so that I can use the function
perl -pe 's/(.).*/$1/g'
but I am not sure how to do that and use the field as the input value, so the file comes out identical except for the altered field.
Is there a way to do the perl equivalent with gsub, or the awk equivalent with perl?
Help is appreciated!
One way using awk:
awk '{ for (i=10;i<=11;i++) { $i = substr( $i, 1, 1) } } { print }' infile
Another way using gensub function of gawk
gawk '{ for (i=10;i<=11;i++) { $i = gensub(/(.).*/ , "\\1", "g" , $i) } }1' infile
The shortest awk version I could come up with:
awk '($10=substr($10,1,1))&&$11=substr($11,1,1)' infile
If the 10th and/or 11th field does not exist, the line is not printed.
Similar version in perl
perl -ane '$F[9]=~s/(.).*/$1/;$F[10]=~s/(.).*/$1/;print "@F\n"' infile
This prints the line even if 10th and/or 11th field is not defined.
Another way with perl:
perl -pe '$c=0; s/(\S+)/(++$c < 10 || $c > 11) ? $1 : substr($1,0,1)/eg' filename
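The question mentions a .tab file. If the fields are tab-separated and may contain spaces, the awk variants above would split on any whitespace and rejoin the line with single spaces. A minimal sketch that keeps the tabs, assuming tab is the only delimiter, is shown below; the helper name first_char_fields and its comma-separated column list are made up for the demo.

```shell
# Hypothetical helper: truncate the listed field numbers to their first
# character in tab-separated input, leaving the tabs in place.
first_char_fields() {
    awk -v cols="$1" 'BEGIN { FS = OFS = "\t"; n = split(cols, col, ",") }
    {
        for (k = 1; k <= n; k++)
            if (col[k] <= NF)            # skip fields the line does not have
                $(col[k]) = substr($(col[k]), 1, 1)
        print
    }'
}

# For the file in the question this would be: first_char_fields 10,11 < infile
printf 'one\ttwo words\tthree\n' | first_char_fields 2,3
```

Setting OFS to a tab matters here, because assigning to a field makes awk rebuild the line using OFS.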

Perform action on line range in sed/awk

How can I extract certain variables from a specific range of lines in sed/awk?
Example: I want to extract the host and port from .tnsnames.ora,
from this section that starts at line 105.
DB_CONNECTION=
(description=
(address=
(protocol=tcp)
(host=127.0.0.1)
(port=1234)
)
(connect_data=
(sid=ABCD)
(sdu=4321)
)
gawk can use a regular expression as the field separator (FS).
The assignment '$0=$2' evaluates as true whenever $2 is non-empty (and not 0), as it is on the host and port lines, so the script automatically prints $2.
$ gawk -F'[()]' 'NR>105&&NR<115&&(/host/||/port/)&&$0=$2' .tnsnames.ora
use:
sed '105,$< whatever sed code you want here >'
If you specifically want the host and the port you can do something like:
sed -n '105,115p' .tnsnames.ora | grep -e 'host=' -e 'port='
You can use address ranges to specify to which section to apply the regular expressions. If you leave the end line address out (keep the comma) it will match to the end of file. You can also 'chain' multiple expressions by using '-e' multiple times. The following expression will just print the port and host value to standard out. It uses back references (\1) in order to just print the matching parts.
sed -n -e '105,115s/(port=\([0-9].*\))/\1/p' -e '105,115s/(host=\([0-9\.].*\))/\1/p' .tnsnames.ora
@lk, to address the answer you posted:
You can write awk code like C, but it's more succinctly expressed as "pattern {action}" pairs.
If you have gawk or nawk, the field separator is an ERE as Hirofumi Saito said
gawk -F'[()=]' '
NR < 105 {next}
NR > 115 {exit}
$2 == "host" || $2 == "port" {
# do stuff with $2 and $3
print $2 "=" $3
}
' .tnsnames.ora
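To try the pattern {action} approach without a real tnsnames file, the sketch below runs it on a made-up sample in which the section starts at line 3 instead of line 105. The helper name host_port and its first/last arguments are assumptions for illustration.

```shell
# Hypothetical helper: print name=value for host and port entries found
# between lines "first" and "last" of a tnsnames-style file.
host_port() {
    awk -F '[()=]' -v first="$1" -v last="$2" '
        NR < first { next }     # not in the section yet
        NR > last  { exit }     # past the section, stop reading
        $2 == "host" || $2 == "port" { print $2 "=" $3 }
    ' "$3"
}

# Made-up sample file; the section occupies lines 3-10 here.
sample=$(mktemp)
printf '%s\n' '# comment' '' 'DB_CONNECTION=' ' (description=' '  (address=' \
    '   (protocol=tcp)' '   (host=127.0.0.1)' '   (port=1234)' '  )' ' )' > "$sample"
host_port 3 10 "$sample"
rm -f "$sample"
```

Splitting on (, ) and = puts the keyword in $2 and its value in $3, which is why no further string surgery is needed.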