grep and/or sed to match a path from a string which has different patterns

I have a big file made up of many different lines that share only one common keyword, storaged.
PROC:storage123:0702:2108:0,1,2,3,4,5:storage:vers:storaged:storage123:Storage
123:storage123:-R /etc/orc/storage123 -e emr123#localhost -p Xxx::
PROC:storageabc:0606:2108:0,1,2,3,4,5:storage:vers:storaged:storageabc:Storage
abc:storageabc: -e emabc#localhost -R /etc/orc/storageabc -p 654::
What I need to do is grep for the path that comes after -R on every line containing the storaged keyword. I only want the path, nothing after it. -R can appear at different positions in the line, so there is no fixed pattern to it.
I created an expression that seems to work, but I think I made it much more complex than it has to be (and I am not 100% sure it always matches).
[root:~/scripts/] <conf.txt grep -o 'R *[^ ]*' | grep -o '[^ ]*$' | sed 's/.*R\///'
/etc/orc/storage123
/etc/orc/storageabc
The expression is also hard to use in a bash script, so something simpler would be great. I need these paths later on in the script.
Cheers

Your attempt is nice, but you can simplify it by using a look-behind:
$ grep -Po '(?<=-R )[^ ]*' file
/etc/orc/storage123
/etc/orc/storageabc
It looks for the string "-R " (note the trailing space) and prints everything after it up to the next space. The -P option (Perl-compatible regular expressions) requires GNU grep.

$ sed 's/.*-R \([^ ]*\).*/\1/' file
/etc/orc/storage123
/etc/orc/storageabc
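
Since the paths are needed later in a script, here is a minimal sketch of capturing them in a variable. The sample lines and the temporary file are made up for illustration (the third line, which has no -R, is hypothetical); the real input would be conf.txt. Unlike the plain s command above, sed -n with the p flag prints only the lines where the substitution succeeded, so lines without -R are not passed through unchanged.

```shell
#!/bin/sh
# Hypothetical sample data modelled on the question; the real file would be conf.txt.
conf=$(mktemp)
cat > "$conf" <<'EOF'
PROC:storage123:0702:2108:0,1,2,3,4,5:storage:vers:storaged:storage123:Storage 123:storage123:-R /etc/orc/storage123 -e emr123#localhost -p Xxx::
PROC:storageabc:0606:2108:0,1,2,3,4,5:storage:vers:storaged:storageabc:Storage abc:storageabc: -e emabc#localhost -R /etc/orc/storageabc -p 654::
PROC:other:0606:2108:0:other:vers:storaged:other:Other:other: -e emx#localhost -p 1::
EOF

# /storaged/ restricts the command to lines containing the keyword.
# -n plus the p flag prints only lines where "-R <path>" was found.
paths=$(sed -n '/storaged/s/.*-R \([^ ]*\).*/\1/p' "$conf")
rm -f "$conf"

# Use the captured paths one per line (assumes the paths contain no newlines).
printf '%s\n' "$paths" | while IFS= read -r path; do
    printf 'found path: %s\n' "$path"
done
```

Keeping the result in a plain variable, one path per line, avoids bash-only features; in bash the same sed output could be read into an array instead.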

Related

Using sed to eliminate a specific string

I appreciate your help with this problem. I would like to remove everything from a string that does not match a specific pattern.
For example, below I would like to remove everything that is not "5TTGTC".
But as seen here, ^5TTGTC is not right. I tried different combinations of ^(), ^{}, ^[], but none gave me what I am looking for. I would appreciate your feedback!
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed 's/^5TTGTC//g'
Thanks in advance

You may use the following command if you want case sensitivity:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|[,.A-Za-z+0-9]/\1/g'
The code above prints:
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
The regular expression used above uses alternation to capture what you are interested in.
We match and capture what we are interested in (5TTGTC), and we match without capturing everything that is not the substring, in this case the characters ,.A-Za-z+0-9.
You can check the behaviour of the regex here.
As pointed out by @EdMorton, the command can be simplified to:
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | sed -r 's/(5TTGTC)|./\1/g'
You can try this here.
For compatibility across sed versions the -r flag can be replaced by the -E flag.

You don't make it very clear what you are trying to achieve.
One way to get where you are trying to go could be the -o option in grep.
echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" | grep -o '5TTGTC'
Output:
5TTGTC
5TTGTC
5TTGTC
5TTGTC
5TTGTC
You can then change 5TTGTC into a pattern, e.g. grep -o '[0-9]TT[AG]GTC'
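
If the goal is a single joined string or a count, the grep -o output can be post-processed. This is a small sketch, assuming a grep that supports -o (GNU and BSD grep both do); the variable names are made up for illustration.

```shell
#!/bin/sh
input='.,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC'

# Join the matches back into one line by deleting the newlines grep -o adds.
joined=$(printf '%s\n' "$input" | grep -o '5TTGTC' | tr -d '\n')

# Count the matches; tr strips the padding some wc implementations emit.
count=$(printf '%s\n' "$input" | grep -o '5TTGTC' | wc -l | tr -d ' ')

# With -i the lowercase occurrence is counted as well.
count_nocase=$(printf '%s\n' "$input" | grep -oi '5TTGTC' | wc -l | tr -d ' ')

printf '%s\n%s %s\n' "$joined" "$count" "$count_nocase"
```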

With any sed:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
sed 's/#//g; s/5TTGTC/#/g; s/[^#]//g; s/#/5TTGTC/g'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC
With any awk:
$ echo ".,..,...+5TTGTC...+5TTGCC.+5TTGTC,,.,.,,.,+5ttgtc,.,,.,.+5TTGTC.+5TTGTC,..+5TTGTC" |
awk -v str='5TTGTC' '{gsub(str,"\n"); gsub(/[^\n]/,""); gsub(/\n/,str)}1'
5TTGTC5TTGTC5TTGTC5TTGTC5TTGTC

sed with filename from pipe

In a folder I have many files with several parameters in their filenames, e.g. (with just one parameter) file_a1.0.txt, file_a1.2.txt etc.
These are generated by a C++ program and I need to take the most recently generated one. I don't know a priori what the value of this parameter will be when the program terminates. After that I need to copy the 2nd line of this last file.
To copy the 2nd line of any file, I know that this sed command works:
sed -n 2p filename
I know also how to find the last generated file:
ls -rtl file_a*.txt | tail -1
Question:
How do I combine these two operations? It is certainly possible to pipe the second operation into the sed command, but I don't know how to pass the filename from the pipe as input to sed.

You can use this,
ls -rt1 file_a*.txt | tail -1 | xargs sed -n '2p'
(OR)
sed -n '2p' `ls -rt1 file_a*.txt | tail -1`
sed -n '2p' $(ls -rt1 file_a*.txt | tail -1)

Typically you can put a command in back ticks to put its output at a particular point in another command - so
sed -n 2p `ls -rt name*.txt | tail -1 `
Alternatively - and preferred, because it is easier to nest etc -
sed -n 2p $(ls -rt name*.txt | tail -1)

-r in ls is reverse order.
-r, --reverse
reverse order while sorting
But it is not a good idea to combine it with tail -1.
With the change below (head -1, without the -r option to ls), performance will be better, because you don't need to wait for all files to be listed and then piped to tail:
sed -n 2p $(ls -t1 name*.txt | head -1 )
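
A slightly more defensive sketch of the same idea: it stores the newest filename in a variable, quotes it, and handles the case where no file matches. The scratch directory and file contents are hypothetical, and like every ls-based approach it assumes the filenames contain no newlines.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
printf 'old line 1\nold line 2\n' > "$dir/file_a1.0.txt"
printf 'new line 1\nnew line 2\n' > "$dir/file_a1.2.txt"
touch -t 202001010000 "$dir/file_a1.0.txt"   # older modification time
touch -t 202101010000 "$dir/file_a1.2.txt"   # newer modification time

# ls -t lists newest first; head -n 1 keeps only the newest name.
latest=$(ls -t "$dir"/file_a*.txt 2>/dev/null | head -n 1)

if [ -n "$latest" ]; then
    second_line=$(sed -n '2p' "$latest")
    printf '%s\n' "$second_line"
else
    echo "no matching file" >&2
fi
rm -rf "$dir"
```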

I was looking for a similar solution: taking the file names from a pipe of grep results to feed to sed. I've copied my answer here for the search & replace, but perhaps this example can help as it calls sed for each of the names found in the pipe:
This command simply finds all the files:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that were not modified. Feeding sed the grep result means only the files containing the target text are touched (which is likely to improve speed as well).
Be sure to back up your files and test before using this. It may not work in some environments for files with embedded spaces. (?)
FWIW, I had some problems using the tail method; it seems that the entire dataset was generated before tail was called on just the last item.
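
Regarding the concern about embedded spaces, one possible variant passes NUL-terminated names from grep to xargs. This is a sketch that assumes GNU grep (-Z), GNU xargs (-0, -r) and GNU sed (-i without a suffix argument); the scratch directory and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory with one file name containing a space.
dir=$(mktemp -d)
printf 'foo here\n' > "$dir/with space.txt"
printf 'nothing here\n' > "$dir/plain.txt"

# -r recurse, -l list names only, -Z end each name with a NUL byte (GNU grep).
# xargs -0 splits on NUL; -r skips running sed when nothing matched (GNU xargs).
grep -rlZ foo "$dir" | xargs -0 -r sed -i 's/foo/bar/g'

changed=$(cat "$dir/with space.txt")
untouched=$(cat "$dir/plain.txt")
printf '%s\n%s\n' "$changed" "$untouched"
rm -rf "$dir"
```

Only files that grep reports are handed to sed, so files without the target text keep their timestamps, as in the loop above.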

In-place replacement

I have a CSV. I want to edit the 35th field of the CSV and write the change back to the 35th field. This is what I am doing on bash:
awk -F "," '{print $35}' test.csv | sed -i 's/^0/+91/g'
So I am pulling the 35th field using awk and then replacing the "0" at the start of the string with "+91". This works perfectly and I get the desired output on the console.
Now I want this new value written back to the file. I am thinking of sed's "in-place" replacement feature, but this feature needs an input file. In the above command I cannot provide an input file, because my primary command is awk and sed is taking its input from awk.
Thanks.

You should choose one of the two tools. As for sed, it can be done as follows:
sed -ri 's/^(([^,]*,){34})0([^,]*)/\1+91\3/' test.csv
Not sure about awk, but @shellter's comment might help with that.
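
For the awk side, a minimal sketch might look like this. It is not taken from the comment mentioned above; it assumes a simple CSV with no quoted fields containing commas, and it uses a hypothetical 4-column sample where n=3 stands in for the 35th field of the real file.

```shell
#!/bin/sh
# Hypothetical 4-column sample; n=3 stands in for the 35th field of the real file.
csv=$(mktemp)
printf '%s\n' 'a,b,09876543210,d' 'a,b,19876543210,d' '0x,b' > "$csv"

# FS and OFS are both a comma so the record is rebuilt with commas after the edit.
# NF >= n guards against short rows; the trailing 1 prints every record.
awk -v n=3 'BEGIN { FS = OFS = "," } NF >= n { sub(/^0/, "+91", $n) } 1' "$csv" > "$csv.tmp" &&
    mv "$csv.tmp" "$csv"

result=$(cat "$csv")
printf '%s\n' "$result"
rm -f "$csv"
```

For the file in the question the call would use -v n=35 and test.csv; awk writes to a temporary file first because redirecting straight back to the input would truncate it before it is read.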

The in-place feature of sed is misnamed, as it does not edit the file in place. Instead, it creates a new file with the same name. eg:
$ echo foo > foo
$ ln -f foo bar
$ ls -i foo bar # These are the same file
797325 bar 797325 foo
$ echo new-text > foo # Changes bar
$ cat bar
new-text
$ printf '/new/s//newer\nw\nq\n' | ed foo # Edit foo "in-place"; changes bar
9
newer-text
11
$ cat bar
newer-text
$ ls -i foo bar # Still the same file
797325 bar 797325 foo
$ sed -i s/new/newer/ foo # Does not edit in-place; creates a new file
$ ls -i foo bar
797325 bar 792722 foo
Since sed is not actually editing the file in place, but writing a new file and then renaming it to the old file, you might as well do the same.
awk ... test.csv | sed ... > test.csv.1 && mv test.csv.1 test.csv
There is the misperception that using sed -i somehow avoids the creation of the temporary file. It does not. It just hides the fact from you. Sometimes abstraction is a good thing, but other times it is unnecessary obfuscation. In the case of sed -i, it is the latter. The shell is really good at file manipulation. Use it as intended. If you do need to edit a file in place, don't use the streaming version of ed; just use ed

So, it turned out there are numerous ways to do it. I got it working with sed as below:
sed -i 's/0\([0-9]\{10\}\)/\+91\1/g' test.csv
But this is a little tricky, as it will edit any entry that matches the criteria. However, in my case it is working fine.
Similar implementation of above logic in perl:
perl -p -i -e 's/\b0(\d{10})\b/\+91$1/g;' test.csv
Again, same caveat as mentioned above.
A more precise way of doing it, as shown by Lev Levitsky, because it operates specifically on the 35th field:
sed -ri 's/^(([^,]*,){34})0([^,]*)/\1+91\3/g' test.csv
For more complex situations, I will have to consider using any of the csv modules of perl.
Thanks everyone for your time and input. I surely know more about sed/awk after reading your replies.

This might work for you:
sed -i 's/[^,]*/+91/35' test.csv
EDIT:
To replace the leading zero in the 35th field:
sed 'h;s/[^,]*/\n&/35;/\n0/!{x;b};s//+91/' test.csv
or more simply:
sed 's/^\(\([^,]*,\)\{34\}\)0/\1+91/' test.csv

If you have moreutils installed, you can simply use the sponge tool:
awk -F "," '{print $35}' test.csv | sed 's/^0/+91/g' | sponge test.csv
sponge soaks up the input, closes the input pipe (stdin) and, only then, opens and writes to the test.csv file.
As of 2015, moreutils is available in package repositories of several major Linux distributions, such as Arch Linux, Debian and Ubuntu.

Another perl solution to edit the 35th field in-place:
perl -i -F, -lane '$F[34] =~ s/^0/+91/; print join ",",@F' test.csv
These command-line options are used:
-i edit the file in-place
-n loop around every line of the input file
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e execute the perl code
-F autosplit modifier, in this case splits on ,
@F is the array of words in each line, indexed starting with 0
$F[34] is the 35th element of the array
s/^0/+91/ does the substitution

How do I push `sed` matches to the shell call in the replacement pattern?

I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.
What I'm trying is this:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt
This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?

The accepted answer is just plain wrong. Proof:
Make an executable script foo.sh:
#! /bin/bash
echo $* 1>&2
Now run it:
$ echo foo | sed -e "s/\\(foo\\)/$(./foo.sh \\1)/"
\1
$
The $(...) is expanded before sed is run.
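
Because the command substitution runs before sed ever sees the pattern, one way around it is to let the shell read the file line by line and run the command once the URL is known. The sketch below is only an illustration: fetch_title is a hypothetical stand-in for curl -s "$1" | head -n 1 so that it runs without network access, and the sample file is made up.

```shell
#!/bin/sh
# fetch_title is a hypothetical stand-in for: curl -s "$1" | head -n 1
fetch_title() {
    printf 'title of %s' "$1"
}

input=$(mktemp)
output=$(mktemp)
printf '%s\n' 'some text' 'URL=http://example.com/a' 'more text' > "$input"

# Read line by line so the command runs once per URL, after the URL is known.
while IFS= read -r line; do
    case $line in
        URL=*) printf 'TITLE=%s\n' "$(fetch_title "${line#URL=}")" ;;
        *)     printf '%s\n' "$line" ;;
    esac
done < "$input" > "$output"

result=$(cat "$output")
printf '%s\n' "$result"
rm -f "$input" "$output"
```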

So you are trying to call an external command from inside the replacement pattern of a sed substitution. I don't think it can be done; the $... inside a pattern only lets you use an already existing (constant) shell variable.
I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).
UPDATE: I was wrong, sed plays nicely with the shell, and it allows you to do that. But then the backslash in \1 should be escaped. Try instead:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt

Try this:
sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s#URL=\($url\)#TITLE=$(curl -s $url | head -n 1)#" file.txt; done
If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The # as a delimiter depends on the URLs not including that character.

Late reply, but making sure people don't get thrown off by the answers here: this can be done in GNU sed using the e flag of the s command, which executes the resulting pattern space as a shell command. The following, for example, decrements a number at the beginning of a line:
echo "444 foo" | sed "s/\([0-9]*\)\(.*\)/expr \1 - 1 | tr -d '\n'; echo \"\2\";/e"
will produce:
443 foo

Single sed command for multiple substitutions?

I use sed to substitute text in files.
I want to give sed a file which contains all the strings to be searched and replaced in a given file.
It goes over .h and .cpp files. In each file it searches for file names which are included in it. If found, it substitutes for example "a.h" with "<a.h>" (without the quotes).
The script is this:
For /F %%y in (all.txt) do
for /F %%x in (allFilesWithH.txt) do
sed -i s/\"%%x\"/"\<"%%x"\>"/ %%y
all.txt - List of files to do the substitution in them
allFilesWithH.txt - All the include names to be searched
I don't want to run sed several times (once for each file name in input.txt); instead I want to run a single sed command and pass it input.txt as input.
How can I do it?
P.S I run sed from VxWorks Development shell, so it doesn't have all the commands that the Linux version does.

You can eliminate one of the loops so that sed only needs to be called once per file. Use the -f option to supply a script file containing all the substitutions:
For /F %%y in (all.txt) do
sed -i -f allFilesWithHAsSedScript.sed %%y
allFilesWithHAsSedScript.sed derives from allFilesWithH.txt and would contain:
s/\"file1\"/"\<"file1"\>"/
s/\"file2\"/"\<"file2"\>"/
s/\"file3\"/"\<"file3"\>"/
s/\"file4\"/"\<"file4"\>"/
(In the article Common threads: Sed by example, Part 3 there are many examples of sed scripts with explanations.)
Don't get confuSed (pun intended).

sed itself has no capability to read filenames from a file. I'm not familiar with the VxWorks shell, and I imagine this is something to do with the lack of answers... So here are some things that would work in bash - maybe VxWorks will support one of these things.
sed -i 's/.../.../' `cat all.txt`
sed -i 's/.../.../' $(cat all.txt)
cat all.txt | xargs sed -i 's/.../.../'
And really, it's no big deal to invoke sed several times if it gets the job done:
cat all.txt | while read -r file; do sed -i 's/.../.../' "$file"; done
for file in $(cat all.txt); do # or `cat all.txt`
sed -i 's/.../.../' "$file"
done

What I'd do is change allFilesWithH.txt into a sed command using sed.
(When forced to use sed. I'd actually use Perl instead, it can also do the search for *.h files.)
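
A rough sketch of that idea, generating one s command per include name and then applying the generated script with -f. The file contents and the includes.sed name are hypothetical, it assumes the names contain no characters special to sed other than dots, and whether the VxWorks shell's sed supports all of this is not known.

```shell
#!/bin/sh
# Hypothetical inputs: a list of include names and one source file.
dir=$(mktemp -d)
printf '%s\n' 'a.h' 'b.h' > "$dir/allFilesWithH.txt"
printf '%s\n' '#include "a.h"' '#include "b.h"' '#include "axh"' > "$dir/main.cpp"

# For each name: keep a copy in the hold space (h), escape dots for the
# pattern side, append the untouched copy (G), then emit one s command.
sed 'h; s/\./\\./g; G; s/\(.*\)\n\(.*\)/s|"\1"|<\2>|/' \
    "$dir/allFilesWithH.txt" > "$dir/includes.sed"

script=$(cat "$dir/includes.sed")
result=$(sed -f "$dir/includes.sed" "$dir/main.cpp")
printf '%s\n' "$result"
rm -rf "$dir"
```

Escaping the dot on the pattern side keeps "a.h" from also matching a name such as "axh"; the replacement side uses the unescaped copy from the hold space.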