Using grep to adjust timecode


I'm trying to convert the timecodes in a file from one format into another: basically, to remove the milliseconds from the end of each timecode and update the file. The timecodes come from transcription software, and I want to drop the extra milliseconds so the file looks tidy for the client.
Input looks like this:
00:50:34.00>INTERVIEWER
Why was it ............... script?
00:50:35.13>JOHN DOE
Because of the quality.
So I'm trying to use grep to match the timecode, and got it working with the following expression (quoted so the shell leaves the brackets and backslash alone):
grep -P -o '[0-9][0-9][:][0-9][0-9][:][0-9][0-9]\.[0-9][0-9]' transcriptionFile.txt
Output looks like this:
00:50:34.00
00:50:35.13
So now I'm trying to take the timecodes and update the file with shortened values, like:
00:50:34
00:50:35
How do I do that? Should I use a pipe to push it over to sed so I can update the values in the file?
I've also tried sed with the following command:
sed 's/[0-9][0-9][:][0-9][0-9][:][0-9][0-9]\.[0-9][0-9]/[0-9][0-9][:][0-9][0-9][:][0-9][0-9]/g' transcriptionFile.txt > outtranscriptionFile.txt
I get output, but it puts my regexp in place of the timecode. Any ideas? Also, how can I trim the last three characters (the dot and two digits) off the right side of the timecode before I update the file?
Any tips or suggestions will be much appreciated.
Thanks :-)

With GNU sed:
$ sed -r 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\>\.[0-9]{2}/\1/' transcriptionFile.txt
00:50:34>INTERVIEWER
Why was it ............... script?
00:50:35>JOHN DOE
Because of the quality.
To edit the file in place, add the -i option:
sed -r -i 's/^([0-9]{2}:[0-9]{2}:[0-9]{2})\>\.[0-9]{2}/\1/' transcriptionFile.txt
Explanation:
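^([0-9]{2}:[0-9]{2}:[0-9]{2}) matches the three colon-separated pairs of digits (HH:MM:SS) at the start of the line; the parentheses capture them as group 1.
\> is not a literal >. It is GNU sed's end-of-word anchor: it matches no characters, only the boundary after the last seconds digit. \.[0-9]{2} then matches the dot and the two millisecond digits.
The backreference \1 replaces the whole match with the captured HH:MM:SS, so the milliseconds are dropped and the rest of the line (>INTERVIEWER) is left untouched.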
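The \> and -r above are GNU sed features. A minimal sketch of the same substitution written in POSIX basic regular expressions, which should also run on BSD/macOS sed; the function name strip_ms is made up for the example, and the input is the sample from the question.

```shell
#!/bin/sh
# Hypothetical helper: strip ".NN" from a leading HH:MM:SS.NN timecode.
# POSIX BRE: \( \) captures, and [0-9][0-9] instead of [0-9]{2}.
strip_ms() {
    sed 's/^\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)\.[0-9][0-9]/\1/'
}

# Sample input taken from the question; lines without a leading
# timecode are passed through unchanged because of the ^ anchor.
printf '%s\n' \
    '00:50:34.00>INTERVIEWER' \
    'Why was it ............... script?' \
    '00:50:35.13>JOHN DOE' \
    'Because of the quality.' | strip_ms
```

For the in-place edit, GNU sed's -i works as shown above; BSD/macOS sed expects a backup suffix argument, which may be empty (sed -i '' ...).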

Related

Using Two Regex Strings in SED

I have a text file in which I need to first find the string "BEGINNING" and then find the string "HERE" that comes after the first "BEGINNING", but only once. There can be any amount of text in between. This must be done with sed commands, so no awk. I know I can simply use /BEGINNING/ to find the first one, but I don't know how to put the two together in one sed command.

Something like this?
$ sed -n '/BEGINNING/,${/HERE/{p;q}}' file
GNU sed accepts this as written. BSD/macOS sed is stricter and wants a ; before each closing brace: sed -n '/BEGINNING/,${/HERE/{p;q;};}' file
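A quick way to check the command is to feed it a few lines. The sketch below wraps it in a hypothetical shell function and uses made-up input in which a HERE appears before BEGINNING, which should be ignored. It uses the ; before each } so that it should also run on BSD/macOS sed.

```shell
#!/bin/sh
# Print the first line matching HERE at or after the first line matching
# BEGINNING, then quit. With -n, q exits without printing again.
first_here_after_beginning() {
    sed -n '/BEGINNING/,${/HERE/{p;q;};}' "$@"
}

# Illustrative input: "HERE 0" is before BEGINNING and is not printed;
# only "HERE 1" is printed, because sed quits right after it.
printf '%s\n' 'HERE 0' 'x' 'BEGINNING' 'y' 'HERE 1' 'HERE 2' |
    first_here_after_beginning
```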

How to use sed to isolate only the first part of a file

I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the appended part, and I'd like to use GNU sed to isolate only the part of the file before the unique string, i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like is to see as output:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel

This should work for you:
sed -n '/unique string/q;p' file
It quits as soon as it reads the unique string line, before that line is printed; every earlier line is printed by p.
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that a sed range includes its boundary lines, so unique string has to be excluded from printing explicitly with !p.
As before, the -n option suppresses sed's automatic printing of each input line.
One more thing: if unique string can contain characters which are also regex syntax characters, like ...
test*
... sed might not be the right tool for the job any more, since it only matches regular expressions, not fixed strings (unless you escape those characters first).
In that case awk might be the tool of choice:
awk 'index($0, "unique string"){exit}1' file
index($0, "string") returns the position of string in the current line, or 0 if it is not found. A non-zero result makes awk stop processing input at that line, so that line is not printed either.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.
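Both approaches can be tried on the sample from the question. In this sketch the function names are made up, and the awk variant passes the marker through the environment (ENVIRON) because awk -v would interpret backslash escapes in it. Feeding it a marker like test* shows that awk's index() treats the * literally.

```shell
#!/bin/sh
# sed: quit on the marker line (so it is never printed); p prints the rest.
upto_sed() {
    sed -n '/unique string/q;p'
}

# awk: fixed-string search, so a marker such as "test*" needs no escaping.
# index($0, s) is 0 when s does not occur in the line; non-zero means exit.
upto_awk() {
    MARK="$1" awk 'index($0, ENVIRON["MARK"]) { exit } 1'
}

# Sample input from the question.
sample='line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion'

printf '%s\n' "$sample" | upto_sed
printf '%s\n' "$sample" | upto_awk 'unique string'
```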

Insert specific lines from file before first occurrence of pattern using Sed

I want to insert a range of lines from a file, say something like 210,221r before the first occurrence of a pattern in a bunch of other files.
As I am clearly not a GNU sed expert, I cannot figure how to do this.
I tried
sed '0,/pattern/{210,221r file
}' bunch_of_files
But apparently file is read from line 210 to EOF.

Try this:
sed -r 's/(FIND_ME)/PUT_BEFORE\1/' test.text
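-r enables extended regular expressions
the string you are looking for ("FIND_ME") is inside parentheses, which creates a capture group
\1 puts the captured text back into the replacement, after PUT_BEFORE.
About your second question: you can read the replacement from a file with command substitution*. The sed script has to be in double quotes, because the shell does not expand $(...) or backticks inside single quotes:
sed -r "s/(FIND_ME)/$(cat REPLACEMENT.TXT)\1/" test.text
This only works if REPLACEMENT.TXT is a single line and any characters that are special in the replacement (/, & and \) are escaped beforehand, for example with another sed.
*Command substitution is a shell feature, not a terminal emulator one; it works in bash and any POSIX sh.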

CodeGnome gave some "sed black magic" in https://stackoverflow.com/a/11246712/4328188:
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed '/pattern/ {
h
r file
g
N
}' in
However, to read specific lines from file, one may have to use a two-call solution similar to dummy's answer. I'd enjoy knowing of a one-call solution if it is possible, though.
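For the original question, here is a two-call sketch (not the one-call sed solution asked for): sed extracts the line range once, then awk, swapped in for the insertion step because it can print a file's contents before the matching line directly, handles each target file. The function name insert_before_first and its argument order are made up for illustration, the pattern is treated as an awk ERE, and each target is overwritten via a temporary copy.

```shell
#!/bin/sh
# Hypothetical helper: insert lines FROM..TO of SRC before the FIRST line
# matching PATTERN in each target file.
# Usage: insert_before_first SRC FROM TO PATTERN FILE...
insert_before_first() {
    src=$1 from=$2 to=$3 pattern=$4
    shift 4
    snippet=$(mktemp) || return 1
    # Call 1: extract the wanted lines once; "${to}q" stops reading early.
    sed -n "${from},${to}p;${to}q" "$src" > "$snippet"
    for f in "$@"; do
        # Call 2 (per file): dump the snippet before the first match only.
        # ENVIRON avoids the backslash processing that -v would apply.
        PAT="$pattern" SNIP="$snippet" awk '
            !done && $0 ~ ENVIRON["PAT"] {
                while ((getline line < ENVIRON["SNIP"]) > 0) print line
                close(ENVIRON["SNIP"])
                done = 1
            }
            { print }
        ' "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
    done
    rm -f -- "$snippet"
}
```

With the numbers from the question this would be called as insert_before_first file 210 221 'pattern' bunch_of_files. Files without a match are rewritten unchanged.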

How to have SED remove all characters between a hyphen and the file extension

I have been trying with no luck to change this
'Simple' week 1-117067638.mp3
into this
'Simple' week 1.mp3
However when I use the command sed 's/\(-\).*\(.mp3\)//' I get
'Simple' week 1
How do I keep my file extension? If you could explain the command you use it would be great so that I can learn from this instead of just getting an answer.

You don't need a capturing group.
$ echo "'Simple' week 1-117067638.mp3" | sed 's/-.*\.mp3/.mp3/g'
'Simple' week 1.mp3
OR
$ echo "'Simple' week 1-117067638.mp3" | sed 's/-.*\(\.mp3\)/\1/g'
'Simple' week 1.mp3
What's wrong with your code?
sed 's/\(-\).*\(.mp3\)//'
sed replaces all the matched characters with the replacement part. \(-\).*\(.mp3\) matches everything from the - to .mp3 (you need to escape the dot as \. to match a literal dot; an unescaped . matches any character). You're replacing all the matched characters with an empty string, so .mp3 is removed too. To avoid this, add .mp3 to the replacement part.
In sed's basic regular expressions, capturing groups are written \(..\). A capturing group captures characters so they can be referenced later as \1, \2, and so on.
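One caveat with -.*: it starts at the first hyphen, so a title that itself contains a hyphen loses more than the numeric suffix. A sketch comparing that with a pattern that only strips from the last hyphen; the second file name is made up to show the difference.

```shell
#!/bin/sh
# Greedy: -.* starts at the FIRST hyphen, so an earlier hyphen in the
# title is eaten too.
strip_first() { sed 's/-.*\.mp3$/.mp3/'; }

# Safer: [^-]* cannot cross a hyphen, so only the part after the LAST
# hyphen is removed; $ anchors .mp3 at the end of the name.
strip_last() { sed 's/-[^-]*\.mp3$/.mp3/'; }

name="'Simple' week 1-117067638.mp3"
echo "$name" | strip_first               # 'Simple' week 1.mp3
echo "$name" | strip_last                # 'Simple' week 1.mp3

# Hypothetical name with a hyphen in the title:
echo "Re-run week 2-555.mp3" | strip_first   # Re.mp3
echo "Re-run week 2-555.mp3" | strip_last    # Re-run week 2.mp3
```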

This task can also be done just in bash without calling sed:
$ fname="'Simple' week 1-117067638.mp3"
$ fname="${fname/-*/}.mp3"
$ echo "$fname"
'Simple' week 1.mp3
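${fname/-*/} is a bash extension; the POSIX ${fname%-*} form removes the shortest suffix matching -*, i.e. from the last hyphen on, and also works in plain sh. A sketch of using it to rename actual files in the current directory; the function name rename_mp3s and the *-*.mp3 glob are assumptions about how the files are named.

```shell
#!/bin/sh
# ${var%-*} cuts from the LAST hyphen; ${var%%-*} would cut from the first.
fname="'Simple' week 1-117067638.mp3"
echo "${fname%-*}.mp3"    # 'Simple' week 1.mp3

# Hypothetical helper: rename every "*-*.mp3" in the current directory.
rename_mp3s() {
    for f in *-*.mp3; do
        [ -e "$f" ] || continue              # glob matched nothing
        new="${f%-*}.mp3"
        if [ -e "$new" ]; then               # never overwrite a file
            echo "skipping $f: $new exists" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
}
```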

Manipulate characters with sed

I have a list of usernames and I would like to add possible combinations to it.
For example, let's say this is the list I have:
johna
maryb
charlesc
Is there a way to use sed to edit it so it looks like this:
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone help me with the commands to do so? Any explanation would be awesome as well; I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean. Anyway, thanks in advance.
EDIT: I changed the usernames.

sed 's/\(user\)\(.\)/\2\1/'
Breakdown:
sed 's/string/replacement/' replaces the first instance of string on each line with replacement (add the g flag to replace all instances).
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two parts: \(user\) and \(.\). Each of these is a capture group, bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to any string you want, e.g. \2_\1 will give a_user, b_user, c_user.
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
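Applied to the usernames actually in the question, the same pattern gives both requested formats: swap the groups for the prefix form, or keep them in order with an underscore between them for the john_a form. A sketch using the sample list:

```shell
#!/bin/sh
# \(.*\) is greedy, so it takes everything except the last character,
# which \(.\) then captures.
names='johna
maryb
charlesc'

# Last letter moved to the front: ajohn, bmary, ccharles
printf '%s\n' "$names" | sed 's/\(.*\)\(.\)/\2\1/'

# Last letter split off with an underscore: john_a, mary_b, charles_c
printf '%s\n' "$names" | sed 's/\(.*\)\(.\)/\1_\2/'
```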

Here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: It is finding the first instance of "user" on each line and replacing it with "user_"
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."

Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username beforehand.

Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
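By setting FS to the empty string, every character becomes a field in awk, which makes it easy to manipulate. (Splitting on an empty FS is a common extension, supported by GNU awk among others, but it is not required by POSIX.)
There is no need for capture groups etc., just plain field swapping.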
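If your awk does not split on an empty FS, a sketch of the same two transformations using only POSIX length() and substr(); the function names are made up for the example.

```shell
#!/bin/sh
# n is the line length; substr($0, n) is the last character and
# substr($0, 1, n - 1) is everything before it.
front_last()  { awk '{ n = length($0); print substr($0, n) substr($0, 1, n - 1) }'; }
suffix_last() { awk '{ n = length($0); print substr($0, 1, n - 1) "_" substr($0, n) }'; }

printf '%s\n' johna maryb charlesc | front_last    # ajohn, bmary, ccharles
printf '%s\n' johna maryb charlesc | suffix_last   # john_a, mary_b, charles_c
```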

This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any characters other than underscores (in the first back reference, \1), an optional underscore, and the last character (in the second back reference, \2), and swaps them around.
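Because the underscore is optional, the same command normalizes both list formats to the prefix form. A small check, written with -E (accepted by both GNU and BSD sed as the extended-regex switch, equivalent to GNU's -r); the function name is made up.

```shell
#!/bin/sh
# ([^_]*) grabs the name, _? swallows an optional underscore,
# (.)$ is the final character; \2\1 swaps them.
to_prefix() { sed -E 's/^([^_]*)_?(.)$/\2\1/'; }

printf '%s\n' johna john_a | to_prefix    # ajohn, then ajohn again
```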
