Converting date format with sed - date

I have a question: how do I convert a date from, e.g., 'Jun 14 2012 5:00PM' to '2012-06-14' using sed? That means omitting '5:00PM' as well.
I'm trying something like s/(...)(\d\d)(\d\d\d\d)/$2.$1.$3/ but can't get it right.
Thanks!

You really do not need sed to do it - you need date:
$ date --date='Jun 14 2012 5:00PM' '+%Y-%m-%d'
2012-06-14
In this command, I am asking date to convert 'Jun 14 2012 5:00PM' to a format made of the four-digit year (%Y), a hyphen, the two-digit month (%m), another hyphen, and the two-digit day (%d). Note that --date is a GNU extension.
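When many dates need converting, starting one date process per line gets slow. A minimal sketch, assuming GNU date (its -f/--file option reads one date per line, and - stands for standard input); the function name convert_dates is only illustrative:

```shell
# Convert every input line to YYYY-MM-DD with a single date process.
# Assumes GNU date: -f - reads the dates from standard input, one per line.
convert_dates() {
  date -f - '+%Y-%m-%d'
}

printf '%s\n' 'Jun 14 2012 5:00PM' 'Jan 2 2013 9:15AM' | convert_dates
```

Each input line is parsed the same way --date would parse it, so anything --date accepts should work here too.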
Now, some notes about your sed command:
Sed uses a specific, somewhat limited regular expression syntax (POSIX basic regular expressions by default). One difference is that grouping parentheses must be preceded by backslashes. Without the backslash, a parenthesis is treated as a literal character. This is the opposite of regexes in most other languages, I know, but it is how it works. So, instead of
s/(...)(\d\d)(\d\d\d\d)/$2.$1.$3/
you should have
s/\(...\)\(\d\d\)\(\d\d\d\d\)/$2.$1.$3/
Also, there is no \d wildcard in sed. Instead, you should use [[:digit:]]:
s/\(...\)\([[:digit:]][[:digit:]]\)\([[:digit:]][[:digit:]][[:digit:]][[:digit:]]\)/$2.$1.$3/
or, better yet:
s/\(...\)\([[:digit:]]\{2\}\)\([[:digit:]]\{4\}\)/$2.$1.$3/
Last but not least, the reference to the matched groups is not marked by $, but instead by a backslash:
s/\(...\)\([[:digit:]]\{2\}\)\([[:digit:]]\{4\}\)/\2.\1.\3/
(Oh, uh, ok, you could use extended regexes with GNU sed, too, but what would be the fun?)
Anyway, this will not match your date either - I just mentioned the syntax errors, there are some semantic errors too (for example, your date is separated by spaces that are not present in the regex). Nonetheless, this is no problem because date can do it better.
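For completeness, here is a sketch of the same substitution written with extended regexes and with the spaces accounted for. It assumes a sed that supports -E (GNU and BSD sed both do), and the function name reorder_date is only illustrative. It reorders the fields but does not turn the month name into a number, which is why date remains the better tool:

```shell
# Reorder "Mon DD YYYY ..." into "DD.Mon.YYYY" using extended regexes (-E),
# where grouping uses plain ( ) and repetition uses plain { }.
# The spaces between the fields are matched explicitly; the trailing time is dropped.
reorder_date() {
  sed -E 's/^([[:alpha:]]{3}) +([[:digit:]]{1,2}) +([[:digit:]]{4}).*/\2.\1.\3/'
}

echo 'Jun 14 2012 5:00PM' | reorder_date
```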

This might work for you:
echo "Jun 14 2012 5:00PM" |
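sed 's/$/Jan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/;s/\(...\) \(..\) \(....\).*\1\(..\).*/\3-\4-\2/'
2012-06-14
This appends a month lookup table to the line, then uses a back reference to the month name to pick up the two-digit month number that follows it in the table.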
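The lookup approach expects a two-digit day. A sketch of a variant that first zero-pads a single-digit day, assuming the input always starts with a three-letter English month name; the function name to_iso is only illustrative:

```shell
# Convert "Mon D[D] YYYY ..." to YYYY-MM-DD with sed only.
# Step 1: zero-pad a single-digit day.
# Step 2: append a month lookup table to the line.
# Step 3: use a back reference (\1) to find the month number in the table.
to_iso() {
  sed -e 's/^\([A-Z][a-z][a-z]\) \{1,\}\([0-9]\) /\1 0\2 /' \
      -e 's/$/ Jan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/' \
      -e 's/^\(...\) \(..\) \(....\).*\1\(..\).*/\3-\4-\2/'
}

echo 'Jun 14 2012 5:00PM' | to_iso
echo 'Jun 4 2012 5:00PM' | to_iso
```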

You can try something like this -
echo "Jun 21 2011 5:00PM" |
awk '{"date \"+%Y/%m/%d\" -d \""$1" "$2" "$3" \"" | getline var; print var}'
Test:
$ echo "Jun 21 2011 5:00PM" |
awk '{"gdate \"+%Y/%m/%d\" -d \""$1" "$2" "$3" \"" | getline var; print var}'
2011/06/21
$

Related

How to regex today or previous days date using awk and $date?

Column 13 of my data contains date in YYMMDD format. I'm trying to regex using $date for today and previous days. Neither of the following code would work. Could someone give me some insights?
TODAY
awk -F, ($13~/$(date '+%Y%m%d')/) {n++} END {print n+0}' file.csv)
3 DAYS AGO
awk -F, ($13~/$(date -d "$date -3 days" '+%Y%m%d')/) {n++} END {print n+0}' file.csv
Your Awk attempts have rather severe quoting problems. You will generally want to single-quote your Awk script, and pass in any parameters as variables with -v.
awk -F, -v when="$(date -d "-3 days" '+%Y%m%d')" '$13~when {n++} END {print n+0}' file.csv
Perhaps notice also that $date is not defined anywhere. The notation $(cmd ...) is a command substitution which runs cmd ... and replaces the expression with its output.
Probably also notice that date -d is a GNU extension and is not portable, though it will work on Linux and other platforms where you have the GNU utilities installed.
More fundamentally, depending on what's in $13, you might want to implement a simple date parsing for that format, so that you can specify a range of acceptable values, rather than search for matches on static text.
This quoting is correct for Bourne-style Unix shells. If you are on Windows, the quoting rules are quite different, and quite likely often impossible to apply in useful ways.
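As a sketch of the range idea, assuming column 13 holds dates as YYYYMMDD (so that plain comparison orders them chronologically); the function name count_in_range and its arguments are only illustrative:

```shell
# Count CSV rows whose 13th column falls inside an inclusive date range.
# $1 = CSV file, $2 = first day, $3 = last day (both YYYYMMDD).
# Assumes column 13 contains only a YYYYMMDD date.
count_in_range() {
  awk -F, -v from="$2" -v to="$3" \
    '$13 >= from && $13 <= to {n++} END {print n+0}' "$1"
}

# Possible usage with GNU date (file.csv as in the question):
#   count_in_range file.csv "$(date -d '-3 days' '+%Y%m%d')" "$(date '+%Y%m%d')"
```

Passing the bounds with -v keeps the Awk script single-quoted, as recommended above.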
If you are using GNU AWK, you can use its time functions. To check that they work, run
awk 'END{print strftime("%y%m%d")}' emptyfile.txt
which should output the current day in YYMMDD format. If it does, you can get what you want the following way:
awk 'BEGIN{today=strftime("%y%m%d");threedago=strftime("%y%m%d",systime()-259200)}END{print today, threedago}' emptyfile.txt
output (as of today)
210809 210806
Explanation: the first argument of strftime is the time format, where %y is the year (00...99), %m is the month (01...12), and %d is the day (01...31). The second argument is optional and is a number of seconds since the start of the epoch; if it is omitted, the current time is used. systime() returns the number of seconds since the start of the epoch, and 259200 is 72 hours expressed in seconds.
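The same arithmetic can be sketched in the shell, assuming GNU date (the @SECONDS form of -d is a GNU feature). The function name yymmdd_days_before is only illustrative, and the result is in UTC, so it can differ from local time near midnight:

```shell
# Print the date N days before a given epoch time, as YYMMDD in UTC.
# $1 = seconds since the epoch, $2 = number of days to go back.
# 3 days = 3 * 24 * 60 * 60 = 259200 seconds.
yymmdd_days_before() {
  date -u -d "@$(( $1 - $2 * 24 * 60 * 60 ))" '+%y%m%d'
}

yymmdd_days_before 1628467200 3   # 1628467200 is 2021-08-09 00:00:00 UTC
```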
Example of usage as a regexp: say file.txt contains
210807 120
210808 150
210809 100
and want to retrieve content of 2nd column for today, then I can do
awk 'BEGIN{today=strftime("%y%m%d")}$1~today{print $2}' file.txt
getting output (as of today)
100
(tested in gawk 4.2.1)

unix yyyymmddhhmmss format conversion to specific date format

A bash script outputs folder names with a timestamp appended, e.g. logs_debug_20190213043348. I need to extract the date into a readable format, yyyy.mm.dd.hh.mm.ss, and maybe also convert it to the GMT timezone. I'm using the method below to extract it.
echo "${folder##*_}" | awk '{ print substr($0,1,4)"."substr($0,5,2)"."substr($0,7,2)"."substr($0,9,6)}'
Is there a better way to print the output without writing complex shell scripts?
The internal string conversion functions are too limited, so we use sed and tr when needed.
## The "readable" format yyyy.mm.dd.hh.mm.ss isn’t understood by date.
## yyyy-mm-dd hh:mm:ss is. So we first produce the latter.
# Note how to extract the last 14 characters of ${folder} and that, since
# we know (or should have checked somewhere else) that they are all digits,
# we match them with a simple dot instead of the more precise but less
# readable [0-9] or [[:digit:]]
# -E selects regexp dialect where grouping is done with simple () with no
# backslashes.
d="$(sed -Ee's/(....)(..)(..)(..)(..)(..)/\1-\2-\3 \4:\5:\6/'<<<"${folder:(-14)}")"
# Print the UTC date (for Linux and other systems with GNU date)
date -u -d "$d"
# Convert to your preferred "readable" format
# echo "${d//[: -]/.}" would have the same effect, avoiding tr
tr ': -' '...'<<<"$d"
For systems with BSD date (notably MacOS), use
date -juf'%Y-%m-%d %H:%M:%S' "$d"
instead of the date command given above. Of course, in this case the simplest way would be:
# Convert to readable
d="$(sed -Ee's/(....)(..)(..)(..)(..)(..)/\1.\2.\3.\4.\5.\6/'<<<"${folder:(-14)}")"
# Convert to UTC
date -juf'%Y.%m.%d.%H.%M.%S' "$d"
Here's a pipeline that does what you want. It certainly isn't simple looking, but taking each component it can be understood:
echo "20190213043348" | \
sed -e 's/\([[:digit:]]\{4\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)/\1-\2-\3 \4:\5:\6/' | \
xargs -d '\n' date -d | \
xargs -d '\n' date -u -d
The first line is simply printing the date string so sed can format it (so that it can easily be modified to fit the way you are passing in the string).
The second line with sed is converting the string from the format you give, to something like this, which can be parsed by date: 2019-02-13 04:33:48
Then, we pass the date to date using xargs, and it formats it with the timezone of the device running the script (CST in my case): Wed Feb 13 04:33:48 CST 2019
The final line converts the date string given by the first invocation of date to UTC time rather than being stuck in the local time: Wed Feb 13 10:33:48 UTC 2019
If you want it in a different format, you can modify the final invocation of date using the +FORMAT argument.
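If only the dotted format is needed, bash can do it without any external command. A sketch using substring expansion, assuming bash and a name that ends in an underscore followed by exactly 14 digits; the function name stamp_to_readable is only illustrative:

```shell
#!/usr/bin/env bash
# Turn the trailing yyyymmddhhmmss stamp of a name into yyyy.mm.dd.hh.mm.ss.
# Uses bash substring expansion ${var:offset:length}; no sed, awk or date.
stamp_to_readable() {
  local ts=${1##*_}   # keep only what follows the last underscore
  printf '%s.%s.%s.%s.%s.%s\n' \
    "${ts:0:4}" "${ts:4:2}" "${ts:6:2}" "${ts:8:2}" "${ts:10:2}" "${ts:12:2}"
}

stamp_to_readable logs_debug_20190213043348
```

This only reformats the digits. Converting between time zones still needs date, as shown above.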

How to have SED remove all characters between a hyphen and the file extension

I have been trying with no luck to change this
'Simple' week 1-117067638.mp3
into this
'Simple' week 1.mp3
However when I use the command sed 's/\(-\).*\(.mp3\)//' I get
'Simple' week 1
How do I keep my file extension? If you could explain the command you use it would be great so that I can learn from this instead of just getting an answer.
You don't need to have a capturing group.
$ echo "'Simple' week 1-117067638.mp3" | sed 's/-.*\.mp3/.mp3/g'
'Simple' week 1.mp3
OR
$ echo "'Simple' week 1-117067638.mp3" | sed 's/-.*\(\.mp3\)/\1/g'
'Simple' week 1.mp3
What's wrong with your code?
sed 's/\(-\).*\(.mp3\)//'
sed replaces everything the pattern matched with the replacement part. \(-\).*\(.mp3\) matches all the characters from - to .mp3 (the dot must be escaped to match a literal dot; unescaped, it matches any character). You are replacing all the matched characters with an empty string, so .mp3 is removed as well. To avoid this, add .mp3 to the replacement part.
In basic sed, capturing groups are represented by \(..\). This capturing group is used to capture characters which are to be referenced later.
This task can also be done just in bash without calling sed:
$ fname="'Simple' week 1-117067638.mp3"
$ fname="${fname/-*/}.mp3"
$ echo "$fname"
'Simple' week 1.mp3
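A sketch that avoids hard-coding the extension, using only POSIX parameter expansion. It assumes the name contains at least one hyphen and one dot, and it cuts at the last hyphen rather than the first; the function name strip_id is only illustrative:

```shell
# Drop everything from the last hyphen up to the extension, keeping the extension.
# ${name##*.} removes the longest prefix ending in a dot  -> the extension.
# ${name%-*}  removes the shortest suffix starting with - -> the base name.
strip_id() {
  name=$1
  ext=${name##*.}
  printf '%s.%s\n' "${name%-*}" "$ext"
}

strip_id "'Simple' week 1-117067638.mp3"
```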

How do I replace a substring by the output of a shell command with sed, awk or such?

I'd like to use sed or any command line tool to replace parts of lines by the output of shell commands. For example:
Replace linux epochs by human-readable timestamps, by calling date
Replace hexa dumps of a specific protocol packets by their decoded counterparts, by calling an in-house decoder
sed seems best fitted because it allows to match patterns and reformat other things too, like moving bits of matches around, but is not mandatory.
Here is a simplified example:
echo "timestamp = 1234567890" | sed "s/timestamp = \(.*\)/timestamp = $(date -u --d @\1 "+%Y-%m-%d %T")/g"
Of course, the $(...) thing does not work. As far as I understand, that's for environment variables.
So what would the proper syntax be? Is sed recommended in this case ? I've spent several hours searching... Is sed even capable of this ? Are there other tools better suited?
Edit
I need...
Pattern matching. The log is full of other things, so I need to be able to pinpoint the strings I want to replace based on context (text before and after, on the same line). This excludes column-position-based matching like awk '{$3...
In-place replacement, so that the rest of the line, "Timestamp = " or whatever, remains unchanged. This excludes sed's 'e' command.
To run an external command in sed you need to use e. See an example:
$ echo "timestamp = 1234567890" | sed "s#timestamp = \(.*\)#date -u --d @\1 "\+\%Y"#e"
2009
With the full format:
$ sed "s#timestamp = \(.*\)#echo timestamp; date -u --d @\1 '\+\%Y-\%m-\%d \%T'#e" <<< "timestamp = 1234567890"
timestamp
2009-02-13 23:31:30
This captures the timestamp and passes it to date, which prints it in the requested format.
From man sed:
e
This command allows one to pipe input from a shell command into
pattern space. If a substitution was made, the command that is found
in pattern space is executed and pattern space is replaced with its
output. A trailing newline is suppressed; results are undefined if the
command to be executed contains a nul character. This is a GNU sed
extension.
However, you see it is a bit "ugly". Depending on what you want to do, you'd better use a regular while loop to fetch the values and then use date normally. For example, if the file is like:
timestamp = 1234567890
Then you can say:
while IFS="=" read -r a b
do
echo "$b"
done < file
This leaves the timestamp in $b (with its leading space), and then you can perform a date ....
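A fuller sketch of the loop approach, assuming GNU date and that the lines of interest consist only of "timestamp = " followed by digits. Other lines pass through unchanged. The function name convert_timestamps is only illustrative:

```shell
# Rewrite lines of the form "timestamp = <epoch>" with a readable UTC time.
# All other lines are printed unchanged.
convert_timestamps() {
  while IFS= read -r line; do
    case $line in
      'timestamp = '[0-9]*)
        epoch=${line#'timestamp = '}
        printf 'timestamp = %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %T')"
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}

printf '%s\n' 'some other line' 'timestamp = 1234567890' | convert_timestamps
```

It starts one date process per matching line, which is fine for small logs but slow for large ones.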
As commented, use a language with built-in time functions. For example:
$ echo "timestamp = 1234567890" | gawk '{$3 = strftime("%F %T", $3)} 1'
timestamp = 2009-02-13 18:31:30
$ echo "timestamp = 1234567890" | perl -MTime::Piece -pe 's/(\d+)/ localtime($1)->strftime("%F %T") /e'
timestamp = 2009-02-13 18:31:30

sed rare-delimiter (other than & | / ?...)

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).
Is there a complex delimiter (with two characters?) that can fix the error:
sed: -e expression #1, char 22: unknown option to `s'
The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.
At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:
$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'
In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
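The same idea can be wrapped in a small function so the delimiter is built once. A sketch, assuming GNU sed and that neither argument contains the SOH character; the names soh and replace_any are only illustrative. The pattern is still a regular expression, so only the delimiter problem is solved:

```shell
# Use the SOH control character (octal 001) as the s command delimiter.
soh=$(printf '\001')

# Replace every match of regex $1 with replacement $2 on standard input.
# Slashes, pipes and hashes in either argument need no escaping.
replace_any() {
  sed "s${soh}$1${soh}$2${soh}g"
}

echo 'a/b|c' | replace_any '/b|' '#'
```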
Another way to do this is to use Shell Parameter Substitution.
${parameter/pattern/replace} # substitute replace for pattern once
or
${parameter//pattern/replace} # substitute replace for pattern everywhere
Here is a quite complex example that is difficult with sed:
$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\#]"
$ echo "${parameter//$pattern/$replace}"
result is:
Common sed delimiters: [/_%:\#]
However, this only works on bash parameters, not on files, where sed excels.
There is no such option for multi-character expression delimiters in sed, but I doubt you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.
You need the nested delimiter facility that Perl offers. It lets you match, substitute, and transliterate without worrying about the delimiter appearing in your contents. Since Perl can do essentially everything sed does, you should be able to use it for whatever you use sed for.
Consider this:
$ perl -nle 'print if /something/' inputs
Now if your something contains a slash, you have a problem. The way to fix this is to change the delimiter, preferably to a bracketing one. For example, you could have anything you like in the $WHATEVER shell variable (provided the brackets are balanced), which gets interpolated by the shell before Perl is even called here:
$ perl -nle "print if m($WHATEVER)" /usr/share/dict/words
That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.
If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator, so:
$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ $whatever) { ... }
With the help of Jim Lewis, I finally did a test before using sed :
if [ `echo $1 | grep '|'` ]; then
grep ".*$1.*:" $DB_FILE | sed "s#^.*$1*.*\(:\)## "
else
grep ".*$1.*:" $DB_FILE | sed "s|^.*$1*.*\(:\)|| "
fi
Thanks for help
Wow. I totally did not know that you could use any character as a delimiter.
At least half the time I use sed and BREs, it's on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes which I'm not even sure won't die on some combination I didn't think of. But if you can exclude just some character class (or even just one character):
echo '#01Y $#1+!' | sed -e 'sa$#1+ashita' -e 'su#01YuHolyug'
> > > Holy shit!
That's so much easier.
Escaping the delimiter inline for BASH to parse is cumbersome and difficult to read (although the delimiter does need escaping for sed's benefit when it's first used, per-expression).
To pull together thkala's answer and user4401178's comment:
DELIM=$(echo -en "\001");
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"
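This example prints every range of lines that starts at a line matching ${STARTING_SEARCH_TERM} and ends at the next line matching ${ENDING_SEARCH_TERM}. It works as long as neither search term contains the SOH (start of heading) character, ASCII code 001, which serves as the delimiter.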
There's no universal separator, but the separator can be escaped with a backslash so that sed does not treat it as one (unless you chose the backslash itself as the separator).
Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.
If you're in a bash environment, you can use bash substitution to escape sed separator, like this:
safe_replace () {
sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}
It's pretty self-explanatory, except for the bizarre part.
Explanation to that:
${1//\//\\\/}
${ - bash expansion starts
1 - first positional argument - the pattern
// - bash pattern substitution pattern separator "replace-all" variant
\/ - literal slash
/ - bash pattern substitution replacement separator
\\ - literal backslash
\/ - literal slash
} - bash expansion ends
example use:
$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta
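safe_replace escapes only the slash, so regex metacharacters in the pattern and & or \ in the replacement still keep their special meaning. A sketch that escapes those as well, assuming basic regular expressions and single-line arguments; the function names are only illustrative:

```shell
# Escape BRE metacharacters and the / delimiter in a search string.
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[][\\.*^$/]|\\&|g'
}

# Escape the characters that are special in an s command replacement: \ & /
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'
}

# Replace every literal occurrence of $1 with the literal text $2 on stdin.
literal_replace() {
  sed "s/$(escape_pattern "$1")/$(escape_replacement "$2")/g"
}

printf '%s\n' 'ka/pus/ta' | literal_replace '/pus/' '/re/'
```

The two escaping commands use | as their own delimiter so that the slash can sit inside the bracket expression without extra escaping.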