Replace inline date format with sed

I have a large file that contains data in a specific format. I want to convert the date format to a different one.
From: YYYY-MM-DD hh:mm:ss.sss
To: YYYY-MM-DDThh:mm:ss.sss000+00:00
I can print the desired format using awk:
awk -F " " '{print substr($1,1,4)"-"substr($1,6,2)"-"substr($1,9,2)"T"substr($2,1,2)":"substr($2,4,2)":"substr($2,7,6)"000+00:00"}' my-file.txt
but I am struggling to make the replacement in place. I have also heard that sed can do in-place replacement.
I tried the sed command below, but it's not working:
sed 's/\([0-9]{4}-[0-9]{2}-[0-9]{2}\s\) \([0-9]{2}:[0-9]{2}:[0-9]{2}\s\) \1/T\2 /g'
It says: unterminated `s' command
Any suggestions or pointers are appreciated.
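One possible fix, sketched under the assumption that every timestamp looks like YYYY-MM-DD hh:mm:ss.sss with a single space between date and time: switch to extended regular expressions with -E, where {4} and ( ) work without backslashes (in basic mode an unescaped {4} is literal text), and use the back-references only in the replacement. The function name convert_ts is made up for this example.

```shell
# Hypothetical helper: rewrites "YYYY-MM-DD hh:mm:ss.sss" as
# "YYYY-MM-DDThh:mm:ss.sss000+00:00" on every line read from stdin.
convert_ts() {
  # \1 = date, \2 = time with milliseconds. sed back-references are a single
  # digit, so "\2000" means group 2 followed by the literal text "000".
  sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3})/\1T\2000+00:00/g'
}

printf 'id=7 2020-07-11 19:29:29.123 ok\n' | convert_ts
```

With GNU sed, adding -i to the same sed invocation and naming the file applies the substitution in place; BSD and macOS sed expect -i '' instead.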

Related

How to convert in file csv date in specific column to unix date

I have a CSV file with these columns:
"Weight","Impedance","Units","User","Timestamp","PhysiqueRating"
"58.75","5.33","kg","7","2020-7-11 19:29:29","5"
Of course, I can convert the date with the date command:
date -d '2020-7-11 19:29:29' +%s
Results:
1594488569
How can I replace this date in the CSV file from a bash script?
With GNU sed
sed -E '2,$ s/(("[^"]*",){4})("[^"]+")(.*)/echo \x27\1"\x27$(date -d \3 +%s)\x27"\4\x27/e'
2,$ to skip header from getting processed
(("[^"]*",){4}) first four columns
("[^"]+") fifth column
(.*) rest of the line
echo \x27\1"\x27 and \x27"\4\x27 preserve first four columns and rest of line after fifth column, along with adding double quotes to result of date conversion
$(date -d \3 +%s) calling shell command with fifth column value
Note that this command will fail if input can contain single quotes. That can be worked around by using s/\x27/\x27\\&\x27/g.
You can see the command that gets executed by using -n option and pe flags
sed -nE '2,$ s/(("[^"]*",){4})("[^"]+")(.*)/echo \x27\1"\x27$(date -d \3 +%s)\x27"\4\x27/pe'
will give
echo '"58.75","5.33","kg","7","'$(date -d "2020-7-11 19:29:29" +%s)'","5"'
For input in the format 58.25,5.89,kg,7,2020/7/12 11:23:46,"5", try
sed -E '2,$ s/(([^,]*,){4})([^,]+)(.*)/echo \x27\1\x27$(date -d "\3" +%s)\x27\4\x27/e'
or (adapted from https://stackoverflow.com/a/62862416)
awk 'BEGIN{FS=OFS=","} NR>1{$5=mktime(gensub(/[:\/]/, " ", "g", $5))} 1'
Note: For the sed solution, if the input can come from outside source, you'll have to take care to avoid malicious intent as mentioned in the comments. One way is to match the fifth column using [0-9: -]+ or similar.
Using GNU awk:
$ gawk '
BEGIN {
    FS=OFS=","
}
{
    n=split($5,a,/[-" :]/)
    if(n==8)
        $5="\"" mktime(sprintf("%s %s %s %s %s %s",a[2],a[3],a[4],a[5],a[6],a[7])) "\""
}1' file
Output:
"Weight","Impedance","Units","User","Timestamp","PhysiqueRating"
"58.75","5.33","kg","7","1594484969","5"
With GNU awk for gensub() and mktime():
$ awk 'BEGIN{FS=OFS="\""} NR>1{$10=mktime(gensub(/[-:]/," ","g",$10))} 1' file
"Weight","Impedance","Units","User","Timestamp","PhysiqueRating"
"58.75","5.33","kg","7","1594513769","5"

How to truncate the first digit of a number?

For example, my file has the following data:
$ cat sample.txt
19999119999,string1,dddddd
18888135790,string2,dddddd
15555555500,string3,dddddd
This is sample data. How can I remove ONLY the first digit from each row? My output should be:
$ cat output.txt
9999119999,string1,dddddd
8888135790,string2,dddddd
5555555500,string3,dddddd
Is there any way to parse each line character wise using grep or sed?
Or any other way to get the desired output?
You just need to print from the second character on:
$ cut -c2- file
9999119999,string1,dddddd
8888135790,string2,dddddd
5555555500,string3,dddddd
Or, using sed, remove the first char:
$ sed 's/^.//' file
9999119999,string1,dddddd
8888135790,string2,dddddd
5555555500,string3,dddddd
Try this:
$ sed -r 's/^[0-9](.*)/\1/' sample.txt
Output:
9999119999,string1,dddddd
8888135790,string2,dddddd
5555555500,string3,dddddd
^[0-9] - The first digit of each line
(.*) - The content of each line except the first digit
\1 - Denote the content of (.*)
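grep can solve this with a lookbehind. For that you need the -P option (GNU grep):
grep -Po '(?<=^\d)(.+)' file
or, more concisely:
grep -Po '^\d\K.+' file
The (?<=^\d) part is a lookbehind that requires the first digit without including it in the match. In the second form, ^\d matches the digit and \K then drops it from the reported match.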

Parsing a string with sed

I have a string like prefix-2020.80-suffix-1
Here are all the possible forms of the input string:
"2020.80-suffix-1"
"2020.80-suffix"
"prefix-2020.80"
"prefix-2020.80-1"
I need to cut out 2020 and assign it to a variable, but I cannot get my desired output.
Here is what I have so far:
set var=`echo "prefix-2020.80-suffix-1" | sed "s/[[:alnum:]]*-*\([0-9]*\).*/\1/"`
My regexp does not work for the other cases and I cannot figure out why. It's more complicated than Python's regexp syntax.
This should work for all your inputs:
sed 's/.*\(^\|-\)\([0-9]*\)\..*/\2/' test
It matches either the start of the line or everything up to a -, then captures the number that precedes the dot.
The problem with your original command was that it didn't take into account the case where there is no prefix.
You can use this grep -oP:
echo "prefix-2020.80-suffix-1" | grep -oP '^([[:alnum:]]+-)?\K[0-9]+'
2020
Using sed (with extended regex):
echo "prefix-2020.80-suffix-1" |sed -r 's/^([^-]*-|)([0-9]+).*/\2/'
Using grep:
echo "prefix-2020.80-suffix-1" |grep -oP "^([^-]*-|)\K\d+"
2020
-P is for Perl regex.
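For the variable assignment itself, a minimal sketch in POSIX sh syntax (the question's set var= is csh style) could wrap a sed command like the ones above in command substitution. The function name leading_number is hypothetical, and the sketch assumes the wanted number is always followed by a dot, as in the four sample inputs.

```shell
# Hypothetical helper: prints the number that precedes the first "."
# after an optional "prefix-" part.
leading_number() {
  printf '%s\n' "$1" | sed -E 's/^([^-]*-)?([0-9]+)\..*/\2/'
}

for s in "2020.80-suffix-1" "2020.80-suffix" "prefix-2020.80" "prefix-2020.80-1"; do
  var=$(leading_number "$s")   # capture the result in a shell variable
  printf '%s -> %s\n' "$s" "$var"
done
```

Making the prefix group optional with ? rather than requiring it is what lets the same expression handle inputs with and without a prefix.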

sed: convert format of date

Fairly new to sed. I am trying to write a sed command that reverses the order of the fields in a date, but not when the date is part of another word.
So far I have:
sed 's/[0-9]\{1\}/[0-9]\{1\}/[0-9]\{4\}/SUBSTITUTE/g'
Trying to figure out the substitute part. Thank you!
You need to use word boundaries.
sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g' file
Example:
$ echo '04/13/1991hello' | sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g'
04/13/1991hello
$ echo '02/03/2001' | sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g'
2001/03/02

Using variables in sed -f (where sed script is in a file rather than inline)

We have a process which can use a file containing sed commands to alter piped input.
I need to replace a placeholder in the input with a variable value. For example, with a single -e command I can run:
$ echo "Today is XX" | sed -e "s/XX/$(date +%F)/"
Today is 2012-10-11
However, I can only specify the sed commands in a file (and then point the process at the file). For example, a file called replacements.sed might contain:
s/XX/Thursday/
So obviously:
$ echo "Today is XX" | sed -f replacements.sed
Today is Thursday
If I want to use an environment variable or shell value, though, I can't find a way to make it expand. For example, if replacements.sed contains:
s/XX/$(date +%F)/
Then:
$ echo "Today is XX" | sed -f replacements.sed
Today is $(date +%F)
Including double quotes in the text of the file just prints the double quotes.
Does anyone know a way to be able to use variables in a sed file?
This might work for you (GNU sed):
cat <<\! > replacements.sed
/XX/{s//'"$(date +%F)"'/;s/.*/echo '&'/e}
!
echo "Today is XX" | sed -f replacements.sed
If you don't have GNU sed, try:
cat <<\! > replacements.sed
/XX/{
s//'"$(date +%F)"'/
s/.*/echo '&'/
}
!
echo "Today is XX" | sed -f replacements.sed | sh
AFAIK, it's not possible. Your best bet would be a wrapper script:
INPUT FILE
aaa
bbb
ccc
SH SCRIPT
#!/bin/bash
# bash, because ${var//pattern/repl} is not POSIX sh
STRING="${1//\//\\/}" # using parameter expansion to prevent / collisions
shift
sed "
s/aaa/$STRING/
" "$@"
COMMAND LINE
./sed.sh "fo/obar" <file path>
OUTPUT
fo/obar
bbb
ccc
As others have said, you can't use variables in a sed script, but you might be able to "fake" it using extra leading input that gets added to your hold buffer. For example:
[ghoti@pc ~/tmp]$ cat scr.sed
1{;h;d;};/^--$/g
[ghoti@pc ~/tmp]$ sed -f scr.sed <(date '+%Y-%m-%d'; printf 'foo\n--\nbar\n')
foo
2012-10-10
bar
[ghoti@pc ~/tmp]$
In this example, I'm using process substitution to get input into sed. The "important" data is generated by printf. You could cat a file instead, or run some other program. The "variable" is produced by the date command, and becomes the first line of input to the script.
The sed script takes the first line, puts it in sed's hold buffer, then deletes the line. Then for any subsequent line, if it matches a double dash (our "macro replacement"), it substitutes the contents of the hold buffer. And prints, because that's sed's default action.
Hold buffers (g, G, h, H and x commands) represent "advanced" sed programming. But once you understand how they work, they open up new dimensions of sed fu.
Note: This solution only helps you replace entire lines. Replacing substrings within lines may be possible using the hold buffer, but I can't imagine a way to do it.
(Another note: I'm doing this in FreeBSD, which uses a different sed from what you'll find in Linux. This may work in GNU sed, or it may not; I haven't tested.)
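Substring replacement does seem workable with the same trick: append the hold buffer to the matching line with G, then use one substitution to move the appended text over the placeholder. The following is a sketch, assuming the placeholder is the literal XX and appears at most once per line; the file name subst.sed is made up for the example. It was reasoned through for GNU sed, and other implementations may differ in details.

```shell
# Write a sed script (hypothetical name) that treats input line 1 as the value.
cat > subst.sed <<'EOF'
1{
h
d
}
/XX/{
G
s/XX\(.*\)\n\(.*\)/\2\1/
}
EOF

# The first line of input is the "variable"; the rest is the real data.
{ date '+%Y-%m-%d'; printf 'Today is XX, really\nno placeholder here\n'; } | sed -f subst.sed
```

After G the pattern space holds the original line, a newline, and the saved value, so the substitution can put group 2 (the value) where XX was and keep group 1 (the rest of the line) after it.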
I am in agreement with sputnick. I don't believe that sed would be able to complete that task.
However, you could generate that file on the fly.
You could change the date to a fixed string, like __DAYOFWEEK__.
Create a temp file, using sed to replace __DAYOFWEEK__ with $(date +%A).
Then process your input with sed -f "$TEMPFILE".
sed is great, but it might be time to use something like perl that can generate the date on the fly.
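The temp-file approach described above can be sketched roughly as follows. The function name apply_template and the file name replacements.sed.in are hypothetical, and the sketch uses date +%A for the weekday name, assuming that name contains nothing special to sed such as / or &.

```shell
# Hypothetical helper: expands the __DAYOFWEEK__ placeholder in a sed script
# template ($1), then applies the generated script to stdin.
apply_template() {
  tmp=$(mktemp) || return 1
  # Fill in the placeholder and store the ready-to-run script in the temp file.
  sed "s/__DAYOFWEEK__/$(date +%A)/g" "$1" > "$tmp"
  sed -f "$tmp"
  rm -f "$tmp"
}

printf 's/XX/__DAYOFWEEK__/\n' > replacements.sed.in
echo "Today is XX" | apply_template replacements.sed.in
```

The template stays static and can be kept under version control, while the shell expansion happens once, just before the real sed run.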
To add a newline in the replacement expression in a sed script file, what finally worked for me is escaping a literal newline with a backslash. For example, this appends a newline after the string NewLineHere:
#! /usr/bin/sed -f
s/NewLineHere/NewLineHere\
/g
Not sure it matters but I am on Solaris unix, so not GNU sed for sure.