Replacing the date in a csv file with mm-yy

I have a csv file consisting of one column of dates. The date format is dd.mm.yyyy. How can I use sed to replace each date with the mm-yy format, i.e. dropping the day and keeping only a two-digit month and two-digit year? An example of the input csv file is:
11.12.2018
21.01.2019
07.02.2019
29.03.2019
01.04.2019
I would like the output to be:
12-18
01-19
02-19
03-19
04-19

With GNU sed:
sed -r 's/...(..)...(..)/\1-\2/' file
-r: use extended regular expressions in the script.
Output:
12-18
01-19
02-19
03-19
04-19
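If you want the pattern to be more explicit about what those dots stand for, an anchored variant with spelled-out digit classes gives the same result (a sketch; -E is the spelling BSD sed accepts as well):
sed -E 's/^[0-9]{2}\.([0-9]{2})\.[0-9]{2}([0-9]{2})$/\1-\2/' file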
See: The Stack Overflow Regular Expressions FAQ

Related

Convert timestamp (unix 13 digits) to datetime format of a complete column of a csv file using awk or sed

I have a csv file with multiple columns. The first column has timestamps like:
1529500027127
1529500027227
1529500027327
1529500027428
1529500027528
1529500027628
1529500027728
I know you can do something like that for a specific timestamp:
date -d @1529500027528
But how can I apply that to all values of the column? I tried the following command:
date -d "$(awk -F , -v OFS=, '$1/=1000')" file.csv
I am trying to understand how the date command works in combination with other commands.
Since a sample of the expected output is not given, this could only be tested against the given first-column values; it is written and tested in GNU awk. You could use awk's strftime function, and since the OP mentioned that Input_file is a csv file, FS and OFS are both set to , here.
awk 'BEGIN{FS=OFS=","} {$1=strftime("%Y/%m/%d %H:%M:%S",$1/1000)}1' Input_file
From man awk for strftime:
strftime([format [, timestamp[, utc-flag]]]) Format timestamp
according to the specification in format. If utc-flag is present and
is non-zero or non-null, the result is in UTC, otherwise the result is
in local time. The timestamp should be of the same form as returned
by systime(). If timestamp is missing, the current time of day is
used. If format is missing, a default format equivalent to
the output of date(1) is used. The default format is available in
PROCINFO["strftime"]. See the specification for the strftime()
function in ISO C for the format conversions that are guaranteed to be
available.
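For instance, passing a non-zero utc-flag as the third argument keeps the output independent of the local timezone (a sketch based on the command above):
awk 'BEGIN{FS=OFS=","} {$1=strftime("%Y/%m/%d %H:%M:%S",$1/1000,1)}1' Input_file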
If you want to use an external date -d @... command, you could do this:
awk -F, -v 'OFS=,' '{"date -d @" int($1/1000) | getline timestamp ; $1=timestamp; print}' filename
Obviously finding a builtin function to do the same job (in this case, the strftime function as suggested by another answer) is a more efficient solution in terms of execution time, but the above gives an example of how to call out to external programs that you may already be familiar with.
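If the file is large, it is also worth closing each pipe after reading from it so awk does not accumulate open file descriptors; a sketch of the same approach:
awk -F, -v OFS=, '{
  cmd = "date -d @" int($1/1000)   # @ = epoch seconds for GNU date; int() drops the milliseconds
  cmd | getline timestamp
  close(cmd)                       # close the pipe so file descriptors are not exhausted
  $1 = timestamp
  print
}' file.csv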

unix yyyymmddhhmmss format conversion to specific date format

There is a bash script running that outputs folder names with a timestamp appended, e.g. logs_debug_20190213043348. I need to extract the date into a readable yyyy.mm.dd.hh.mm.ss format and also maybe convert it to the GMT timezone. I'm using the method below to extract it.
echo "${folder##*_}" | awk '{ print substr($0,1,4)"."substr($0,5,2)"."substr($0,7,2)"."substr($0,9,6)}'
Is there a better way to print the output without writing complex shell scripts?
Bash's built-in string manipulation is too limited for this, so we use sed and tr where needed.
## The "readable" format yyyy.mm.dd.hh.mm.ss isn’t understood by date.
## yyyy-mm-dd hh:mm:ss is. So we first produce the latter.
# Note how to extract the last 14 characters of ${folder} and that, since
# we know (or should have checked somewhere else) that they are all digits,
# we match them with a simple dot instead of the more precise but less
# readable [0-9] or [[:digit:]]
# -E selects regexp dialect where grouping is done with simple () with no
# backslashes.
d="$(sed -Ee's/(....)(..)(..)(..)(..)(..)/\1-\2-\3 \4:\5:\6/'<<<"${folder:(-14)}")"
# Print the UTC date (for Linux and other systems with GNU date)
date -u -d "$d"
# Convert to your preferred "readable" format
# echo "${d//[: -]/.}" would have the same effect, avoiding tr
tr ': -' '.'<<<"$d"
For systems with BSD date (notably macOS), use
date -juf'%Y-%m-%d %H:%M:%S' "$d"
instead of the date command given above. Of course, in this case the simplest way would be:
# Convert to readable
d="$(sed -Ee's/(....)(..)(..)(..)(..)(..)/\1.\2.\3.\4.\5.\6/'<<<"${folder:(-14)}")"
# Convert to UTC
date -juf'%Y.%m.%d.%H.%M.%S' "$d"
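Putting the pieces together for the sample folder name from the question (a sketch assuming GNU date; the dotted +FORMAT is only an illustration):
folder='logs_debug_20190213043348'
d="$(sed -Ee's/(....)(..)(..)(..)(..)(..)/\1-\2-\3 \4:\5:\6/'<<<"${folder:(-14)}")"
# -u keeps date working in UTC; +FORMAT prints the readable dotted layout
date -u -d "$d" '+%Y.%m.%d.%H.%M.%S'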
Here's a pipeline that does what you want. It certainly isn't simple looking, but taking each component it can be understood:
echo "20190213043348" | \
sed -e 's/\([[:digit:]]\{4\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)/\1-\2-\3 \4:\5:\6/' | \
xargs -d '\n' date -d | \
xargs -d '\n' date -u -d
The first line is simply printing the date string so sed can format it (so that it can easily be modified to fit the way you are passing in the string).
The second line with sed is converting the string from the format you give, to something like this, which can be parsed by date: 2019-02-13 04:33:48
Then, we pass the date to date using xargs, and it formats it with the timezone of the device running the script (CST in my case): Wed Feb 13 04:33:48 CST 2019
The final line converts the date string given by the first invocation of date to UTC time rather than being stuck in the local time: Wed Feb 13 10:33:48 UTC 2019
If you want it in a different format, you can modify the final invocation of date using the +FORMAT argument.
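For example, a sketch of the same pipeline ending in the yyyy.mm.dd.hh.mm.ss layout asked for; since the +FORMAT argument has to come after the -d value, the last step uses an -I placeholder instead of letting xargs append the date string at the end:
echo "20190213043348" | \
sed -e 's/\([[:digit:]]\{4\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)\([[:digit:]]\{2\}\)/\1-\2-\3 \4:\5:\6/' | \
xargs -d '\n' date -d | \
xargs -d '\n' -I{} date -u -d {} '+%Y.%m.%d.%H.%M.%S'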

sed replace from csv: include last character of search term

I am trying to replace a list of words found in a csv file with index markup (docbook). The csv is in this format:
testword[ -?],testword<indexterm><primary>testword</primary></indexterm>
This finds all occurrences of the testword with punctuation at the end; this part works. However, I need the final punctuation mark to be carried over into the replacement part of the sed command.
sed -e 's/\(.*\)/s,\1,g/' index.csv > index.sed
sed -i -f index.sed file.xml
So e.g. This is a testword, in a test.
Would get replaced with This is a testword,<indexterm><primary>testword</primary></indexterm> in a test.
The problem is in the string in the csv file that drives the process: as written, the punctuation is lost there.
Replacing this:
testword[ -?],testword<indexterm><primary>testword</primary></indexterm>
by:
testword\([ -?]\),testword\1<indexterm><primary>testword</primary></indexterm>
would already solve your problem.
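A quick way to check the generated rule against the sentence from the question (a sketch; this is the sed command the modified csv line expands to after the s,\1,g wrapping):
echo 'This is a testword, in a test.' | sed 's,testword\([ -?]\),testword\1<indexterm><primary>testword</primary></indexterm>,g'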

Using grep to adjust timecode

I'm trying to change the timecodes found in a file from one format into another, basically removing the milliseconds from the end of each one and updating the file. This is to strip the extra milliseconds added by transcription timecode software and make the file look presentable for a client.
Input looks like this:
00:50:34.00>INTERVIEWER
Why was it ............... script?
00:50:35.13>JOHN DOE
Because of the quality.
So I'm trying to use grep to match the timecode and got it working with following expression.
grep [0-9][0-9][:][0-9][0-9][:][0-9][0-9]\.[0-9][0-9] -P -o transcriptionFile.txt
Output looks like this:
00:50:34.00
00:50:35.13
So now I'm trying to take timecode and update the file with updated values like:
00:50:34
00:50:35
How do I do that? Should I use a pipe to push it over to sed so I can update the values in the file?
I've also tried to use sed with following command:
sed 's/[0-9][0-9][:][0-9][0-9][:][0-9][0-9]\.[0-9][0-9]/[0-9][0-9][:][0-9][0-9][:][0-9][0-9]/g' transcriptionFile.txt > outtranscriptionFile.txt
I get output, but it puts my regexp where the timecode is supposed to be. Any ideas? Also, how can I trim the last 3 digits off the right-hand side of the timecode before I update the file?
Any tips or suggestions will be much appreciated.
Thanks :-)
With GNU sed:
$ sed -r 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\>\.[0-9]{2}/\1/' transcriptionFile.txt
00:50:34>INTERVIEWER
Why was it ............... script?
00:50:35>JOHN DOE
Because of the quality.
To edit the file in place, add the -i option:
sed -r -i 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\>\.[0-9]{2}/\1/' transcriptionFile.txt
Explanation:
([0-9]{2}:[0-9]{2}:[0-9]{2}) matches the hh:mm:ss timecode at the start of the line and captures it with the parentheses.
\> is GNU sed's end-of-word anchor (zero-width, so it consumes nothing), and \.[0-9]{2} then matches the dot and the two millisecond digits.
The backreference \1 replaces the whole match with the captured characters, i.e. the timecode without milliseconds.
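If you would rather not rely on the GNU-specific \> word-boundary anchor, a portable sketch matches the > explicitly and puts it back in the replacement:
sed -E 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\.[0-9]{2}>/\1>/' transcriptionFile.txt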

sed command to change date format

Here is a snippet of the file I'm working with:
709ENVUN07,SET1,FE10,GB0009252882,GB,GBX,NULL,S,O,LO,1510.00000000,173,N,F,28022007,07:51:15,3717
208ATNHG07,SET1,FE10,GB0009252882,GB,GBX,NULL,S,O,LO,1550.00000000,1800,N,F,18012007,15:48:21,654681
As you can see, the dates are in this format: 28022007, 18012007.
Using sed, I've successfully changed them to the format I want:
gzip -dc allGlaxoOrderHistory.CSV.gz |sed 's/\([0-9]\{2\}\)\([0-9]\{2\}\)\(2[0-9]\{3\}\)/\1-\2-\3/g' > newOrderHistory.csv
However, sed is also changing GB0009252882 to GB00-09-252882, as you can see below:
709ENVUN07,SET1,FE10,GB00-09-252882,GB,GBX,NULL,S,O,LO,1510.00000000,173,N,F,28-02-2007,07:51:15,3717
208ATNHG07,SET1,FE10,GB00-09-252882,GB,GBX,NULL,S,O,LO,1550.00000000,1800,N,F,18-01-2007,15:48:21,654681
The question is: how do I change 28022007 and 18012007 to 28-02-2007 and 18-01-2007 without also changing GB0009252882?
Your date field is the 15th from the start. You can write your pattern like this:
sed 's/\(\([^,]*,\)\{14\}..\)\(..\)/\1-\3-/'
where [^,]*, describes a field (with its trailing separator).
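Plugged into the original gzip pipeline, that gives (a sketch):
gzip -dc allGlaxoOrderHistory.CSV.gz | sed 's/\(\([^,]*,\)\{14\}..\)\(..\)/\1-\3-/' > newOrderHistory.csv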
You can also work by fields more easily with awk; you only need to set the input and output delimiters to ,.
With GNU awk, target the 15th field:
awk -F, -vOFS=, '{$15=gensub(/(..)(..)(....)/, "\\1-\\2-\\3", "g", $15)}1' yourfile
The parameter -F, sets the input delimiter and -vOFS=, the output delimiter. The 1 at the end is a shorthand for print.
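If gensub is not available (it is gawk-specific), a sketch with plain substr performs the same split on the 15th field:
awk 'BEGIN{FS=OFS=","} {$15 = substr($15,1,2) "-" substr($15,3,2) "-" substr($15,5,4)} 1' yourfile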