Extracting a string between two strings - sed

I have a file, ABDC.DELTA00.TS.D20161022.TS_BAR99.DAT.DOCC.
I want to cut the text between two strings: the first TS and DOCC. I tried
efvar4=$(echo $filename | sed -n "s/.*TS//;s/DOCC.*//p")
resulting in _BAR99.DAT – matching the second TS in the filename.
Desired result: .TS.D20161022.TS_BAR99.DAT.
How do I modify my sed command to achieve the desired result?

echo "ABDC.DELTA00.TS.D20161022.TS_BAR99.DAT.DOCC" | sed 's/^.*\.TS\./.TS./;s/\.DOCC/./'
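The original attempt fails because .* is greedy, so s/.*TS// deletes everything through the last TS rather than the first. As a minimal sketch of an alternative, assuming a POSIX shell and the sample filename from the question, parameter expansion can do the same job without sed (the variable name rest is only illustrative):

```shell
filename='ABDC.DELTA00.TS.D20161022.TS_BAR99.DAT.DOCC'

# ${var#pattern} strips the SHORTEST matching prefix,
# so this stops at the first ".TS" instead of the last one.
rest=${filename#*.TS}

# ${var%pattern} strips the shortest matching suffix, here the trailing "DOCC".
# The leading ".TS" is put back because the desired result includes it.
efvar4=".TS${rest%DOCC}"

printf '%s\n' "$efvar4"
```

For the sample filename this prints .TS.D20161022.TS_BAR99.DAT. and it avoids spawning echo and sed for every file.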

Related

Using grep and sed to extract file name and number

I have a list of files in the current directory. Some of them contain the keyword "speed", and the line that contains the keyword also contains a number.
For example, in the file "filename.txt", I have the following lines:
some text
speed: this is the keyword, and equals 150
some text
I want to use a combination of grep and sed to get the following output:
filename: 150
Currently, I can only extract file names and the line that contains the keyword using grep, but I don't know how to form the output as above using a combination of grep and sed. The grep command I have so far is:
grep -r "speed"
which gives me:
filename.txt:speed: this is the keyword, and equals 150
Any help would be appreciated!

As Wiktor Stribiżew mentioned in the comments, the command below will give you the desired output:
awk '/speed/{print FILENAME": "$NF}' filename.txt
Explanation
/speed/ selects the lines that contain the keyword.
{print FILENAME": "$NF} is the action that runs for each of those lines.
FILENAME holds the name of the file currently being read. It includes the extension, so this prints filename.txt: 150.
NF is the number of fields on the line, so $NF is the last field, which for this text file is 150.

Assuming the filenames do not contain colon : character, would you please try the following:
grep -r "speed" | sed -E 's/^([^:]+):[^0-9]*([0-9]+)/\1: \2/'
In the sed command:
^([^:]+) matches the filename and the 1st capture group is assigned to it.
[^0-9]* matches non-digits to be skipped.
([0-9]+) matches digits and the 2nd capture group is assigned to it.
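A minimal end-to-end sketch of this pipeline, assuming a throwaway directory created with mktemp and the sample file from the question. The two extra substitutions, which strip the leading ./ and the file extension, are an addition of this sketch and assume the filename has an extension and contains no colon:

```shell
# Build a scratch directory containing the sample file from the question.
workdir=$(mktemp -d)
printf 'some text\nspeed: this is the keyword, and equals 150\nsome text\n' > "$workdir/filename.txt"

# 1st s: keep "name: number" (first run of digits after the filename).
# 2nd s: drop the "./" that grep prints when searching ".".
# 3rd s: drop the extension that sits right before the colon.
result=$(cd "$workdir" && grep -r "speed" . |
  sed -E 's/^([^:]+):[^0-9]*([0-9]+).*/\1: \2/; s|^\./||; s/\.[^.:]*:/:/')

printf '%s\n' "$result"
rm -rf "$workdir"
```

With the sample file this prints filename: 150. If the line contains several numbers, the first one after the filename is taken.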

How to use sed for substituting 2nd column in shell

I have a file that looks like this:
1,2,3,4
5,6,7,8
I want to replace the value in the 2nd column with 89 wherever it is 6. The desired output is
1,2,3,4
5,89,7,8
But if I type
index=2
cat file | sed 's/[^,]*/89/'$index
I get
1,89,3,4
5,89,7,8
and if I type
index=2
cat file | sed 's/[^,]6/89/'$index
nothing changes.
Why is it like this? How can I fix this? Thank you.

Since you want to change the second column containing a 6 and you have a comma as field separator it is actually very easy with sed:
sed 's/^\([^,]*\),6,/\1,89,/'
Here we make use of back-referencing to remember the first column.
If you want to replace the 6 in the 5th column, you can do something like:
sed 's/^\(\([^,]*,\)\{4\}\)6,/\189,/'
It is, however, much more comfortable using awk:
awk 'BEGIN{FS=OFS=","}($2==6){$2=89}1'
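As for why the attempts in the question behave that way: a numeric flag such as 2 on s/// replaces the second match on every line, whatever its value, and [^,]6 needs a non-comma character immediately before the 6, which never occurs here because the 6 follows a comma. Below is a minimal sketch that keeps the column number in a shell variable, assuming the sample data from the question. The names col, old and new are illustrative; index itself is the name of a built-in awk function, so it cannot be used as an awk variable.

```shell
index=2
input=$(printf '1,2,3,4\n5,6,7,8\n')

# -v passes shell values into awk. Field "col" is rewritten only when it
# equals "old"; the trailing 1 is an always-true pattern that prints each line.
output=$(printf '%s\n' "$input" |
  awk -v col="$index" -v old=6 -v new=89 \
      'BEGIN { FS = OFS = "," } $col == old { $col = new } 1')

printf '%s\n' "$output"
```

For the sample input this prints 1,2,3,4 followed by 5,89,7,8.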

I solved this by using awk
awk 'BEGIN{FS=OFS=","} {if ($2==6) $2=89}1' file >file1

AWK/Sed string manipulation

I have a string in the following format and I want to convert it to CSV format (note that the separator is the underscore character "_"):
Title_YYYYMMDD_emailname convert to Title,YYYYMMDD,emailname
This is simple enough using sed ...
echo "Report_20131107_jlsmith" | sed 's/_/,/g'
Output:
Report,20131107,jlsmith
But there are complications trying to parse a string that contains underscores in the title field ..
I want to retain the underscores in the title (if any) but change the underscores to commas for the date and emailname ...
For instance:
Report_Title_20131107_jlsmith convert to: Report_Title,20131107,jlsmith
And a related question: is there a way to compress multiple repeating instances of the underscore character for the entire string?
Report_Title____20131107_jlsmith convert to: Report_Title,20131107,jlsmith

Last request first:
echo "Report_Title____20131107_jlsmith" | awk '{gsub(/_+/,"_")}1'
Report_Title_20131107_jlsmith
First request (using GNU awk, since gensub is a gawk extension):
echo "Report_Title_more_20131107_jlsmith" | awk '{print gensub(/_([0-9]+)_/,",\\1,","g")}'
Report_Title_more,20131107,jlsmith
All in one command
echo "Report_Title___more_20131107_jlsmith" | awk '{gsub(/_+/,"_");print gensub(/_([0-9]+)_/,",\\1,","g")}'
Report_Title_more,20131107,jlsmith

With the format you have shown, you could replace ____YYYYMMDD_ with ,YYYYMMDD, using sed as follows
echo 'Report_Title____20131107_jlsmith' | sed 's/__*\([0-9]\{8\}\)__*/,\1,/g'
Report_Title,20131107,jlsmith

Using sed
sed -r -e 's/_+/_/g' -e 's/_([^_]+)_([^_]+)$/,\1,\2/'
Or, more robustly, with a stricter regex:
sed -r -e 's/_+/_/g' -e 's/^(.+)_([0-9]{8})_(\w+)$/\1,\2,\3/'
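A minimal sketch that wraps the stricter variant in a shell function, assuming a sed that accepts -E (GNU and BSD sed both do), an 8-digit date, and an emailname without underscores. The function name to_csv is hypothetical:

```shell
# Hypothetical helper: squeeze runs of underscores, then turn the separators
# around the 8-digit date into commas. Underscores inside the title survive
# because the greedy (.+) keeps everything up to the final "_date_name" part.
to_csv() {
  printf '%s\n' "$1" |
    sed -E 's/_+/_/g; s/^(.+)_([0-9]{8})_([^_]+)$/\1,\2,\3/'
}

to_csv 'Report_20131107_jlsmith'
to_csv 'Report_Title_20131107_jlsmith'
to_csv 'Report_Title____20131107_jlsmith'
```

The three calls print Report,20131107,jlsmith and then Report_Title,20131107,jlsmith twice. A string that does not end in _YYYYMMDD_name only has its underscore runs squeezed.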

How do I get rid of lines not matching a timestamp via sed?

I am not sure why sed is not working as expected in this particular instance. I have lines of the form:
12:42:46.675 token
where I expect the timestamp to always have that format. Unfortunately, every now and then there are lines in the file which do not begin with a timestamp, and I want to get rid of those. I tried filtering out everything that does not match the above with:
sed -n /^\d{2}:\d{2}:\d{2}.\d{3}/p
but the above filters everything out, even if I give sed the -r option. What is the correct way of doing that with sed? And is there an alternative with grep?

Using grep to only display lines starting with timestamp format:
grep -E '^([0-9]{2}:){2}[0-9]{2}\.[0-9]{3} ' file

sed doesn't accept \d; use [0-9] instead. Also, in sed's default (basic) regular expressions { and } are literal characters, so you need to escape them as \{ and \} to get the repetition behaviour. The dot before the milliseconds should be escaped as well, so that it only matches a literal dot. The result looks like:
sed -n '/^[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{3\}/p' infile
EDIT: Also enclose the expression in quotes (single quotes rather than double) to avoid shell expansion.
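The same filter can be sketched with extended regular expressions, assuming a sed that accepts -E (GNU and BSD sed both do). The sample lines other than the first are made up for illustration:

```shell
input=$(printf '%s\n' \
  '12:42:46.675 token' \
  'stray line without a timestamp' \
  '9:42:46.675 hour has one digit' \
  '13:00:01.002 other')

# -E turns on extended regexes, so {2} needs no backslashes.
# -n suppresses automatic printing; p prints only the matching lines.
kept=$(printf '%s\n' "$input" |
  sed -n -E '/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} /p')

printf '%s\n' "$kept"
```

Only the first and last sample lines are printed. The trailing space in the pattern requires the timestamp to be followed by a token, as in the grep answer.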

Sed - pattern matching with binary value as separator?

Is it possible to use binary values in sed pattern matching?
I have one-line strings that contain plain-text fields separated by the binary value 1.
Is it possible to use sed to match everything up to the binary separator 1?
Or should I use awk?
Example string where \x1 represents binary value 1:
key1=value1\x1key2=value2\x1key3=value3
Example expected output, values for key1 and key2:
value1 value2

Edit: Here are a couple of options for printing the values based on a list of keys. I couldn't figure out a more concise way with awk, but one probably exists:
$ echo -e 'key1=value1\001key2=value2\001key3=value3' > test
$ sed 's/\x01/\n/g' test | awk -F= '{if ($1 == "key1" || $1 == "key2") print $2}'
value1
value2
$ sed 's/\x01/\n/g' test | perl -pe 's/((key1|key2)=(.*)|.*)/\3/'
value1
value2
sed has no non-greedy quantifiers, so a pattern such as .*\x01 runs to the last \x1 rather than the first. Your options are to match on a negated bracket expression instead (for example [^\x01]* with GNU sed), to use a different language, or something like the following:
$ sed 's/\x01/\n/g' test | head -n 1
key1=value1
The answer to the following question has a good example of using a Perl regex for non-greedy matches:
Non greedy regex matching in sed?

If your sed doesn't parse \x1 itself, you have to find a way to get the literal byte into the command. For example, to convert them all to newlines:
sed -e "s/$(echo -e \\001)/\n/g" filename

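Type Control-A at the point where you want the character \001 to appear (in many shells and editors you need to press Control-V first so that the character is inserted literally).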
I would find this a lot easier than handling all the necessary escaping to get echo to produce the correct string if there are any backslashes in the regex - and I find there often are such backslashes.
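A minimal sketch that avoids both typing the control character and relying on sed escape support, assuming POSIX printf, tr, awk and paste and the sample keys from the question (sep, line and values are illustrative names):

```shell
# Produce the separator byte with printf instead of typing it.
sep=$(printf '\001')
line="key1=value1${sep}key2=value2${sep}key3=value3"

# tr turns every \001 into a newline (one key=value pair per line),
# awk keeps the values of the wanted keys, paste joins them with spaces.
values=$(printf '%s\n' "$line" |
  tr '\001' '\n' |
  awk -F= '$1 == "key1" || $1 == "key2" { print $2 }' |
  paste -sd' ' -)

printf '%s\n' "$values"
```

For the sample string this prints value1 value2. Values that themselves contain = would be cut at the first =, so this only fits simple key=value fields.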
