I need to replace all occurrences of a string after nth occurrence in every line of a Unix file.
My file data:
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
My output data:
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
tried using sed: sed 's/://3g' test.txt
Unfortunately, the g option with the occurrence is not working as expected. instead, it is replacing all the occurrences.
Another approach using awk
awk -v c=':' -v n=2 'BEGIN{
FS=OFS=""
}
{
j=0;
for(i=0; ++i<=NF;)
if($i==c && j++>=n)$i=""
}1' file
$ cat file
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
$ awk -v c=':' -v n=2 'BEGIN{FS=OFS=""}{j=0;for(i=0; ++i<=NF;)if($i==c && j++>=n)$i=""}1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
With GNU awk, using gensub please try following. This is completely based on your shown samples, where OP wants to remove : from 3rd occurrence onwards. Using gensub to segregate parts of matched values and removing all colons from 2nd part(from 3rd colon onwards) in it as per OP's requirement.
awk -v regex="^([^:]*:)([^:]*:)(.*)" '
{
firstPart=restPart=""
firstPart=gensub(regex, "\\1 \\2", "1", $0)
restPart=gensub(regex,"\\3","1",$0)
gsub(/:/,"",restPart)
print firstPart restPart
}
' Input_file
I have inferred based on the limited data you've given us, so it's possible this won't work. But I wouldn't use regex for this job. What you have there is colon delimited fields.
So I'd approach it using split to extract the data, and then some form of string formatting to reassemble exactly what you like:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ( undef, $first, #rest ) = split /:/;
print ":$first:", join ( "", #rest ),"\n";
}
__DATA__
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
This gives you the desired result, whilst IMO being considerably clearer for the next reader than a complicated regex.
You can use the perl solution like
perl -pe 's~^(?:[^:]*:){2}(*SKIP)(?!)|:~~g if /^:account_id:/' test.txt
See the online demo and the regex demo.
The ^(?:[^:]*:){2}(*SKIP)(?!)|: regex means:
^(?:[^:]*:){2}(*SKIP)(?!) - match
^ - start of string (here, a line)
(?:[^:]*:){2} - two occurrences of any zero or more chars other than a : and then a : char
(*SKIP)(?!) - skip the match and go on to search for the next match from the failure position
| - or
: - match a : char.
And only run the replacement if the current line starts with :account_id: (see if /^:account_id:/').
Or an awk solution like
awk 'BEGIN{OFS=FS=":"} /^:account_id:/ {result="";for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result}' test.txt
See this online demo. Details:
BEGIN{OFS=FS=":"} - sets the input/output field separator to :
/^:account_id:/ - line must start with :account_id:
result="" - sets result variable to an empty string
for (i=1; i<=NF; ++i) { result = result (i > 2 ? $i : $i OFS)}; print result} - iterates over the fields and if the field number is greater than 2, just append the current field value to result, else, append the value + output field separator; then print the result.
I would use GNU AWK following way if n fixed and equal 2 following way, let file.txt content be
:account_id:12345:6789:Melbourne:Aus
:account_id:98765:43210:Adelaide:Aus
then
awk 'BEGIN{FS=":";OFS=""}{$2=FS $2 FS;print}' file.txt
output
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
Explanation: use : as field separator and nothing as output field separator, this itself does remove all : so I add : which have to be preserved: 1st (before second column) and 2nd (after second column). Beware that I tested it solely for this data, so if you would want to use it you should firstly test it with more possible inputs.
(tested in gawk 4.2.1)
This might work for you (GNU sed):
sed 's/:/\n/3;h;s/://g;H;g;s/\n.*\n//' file
Replace the third occurrence of : by a newline.
Make a copy of the line.
Delete all occurrences of :'s.
Append the amended line to the copy.
Join the two lines by removing everything from third occurrence of the copy to the third occurrence of the amended line.
N.B. The use of the newline is the best delimiter to use in the case of sed, as the line presented to seds commands are initially devoid of newlines. However the important property of the delimiter is that it is unique and therefore can be any such character as long as it is not found anywhere in the data set.
An alternative solution uses a loop to remove all :'s after the first two:
sed -E ':a;s/^(([^:]*:){2}[^:]*):/\1/;ta' file
With GNU awk for the 3rd arg to match() and gensub():
$ awk 'match($0,/(:[^:]+:)(.*)/,a){ $0=a[1] gensub(/:/,"","g",a[2]) } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
and with any awk in any shell on every Unix box:
$ awk 'match($0,/:[^:]+:/){ tgt=substr($0,1+RLENGTH); gsub(/:/,"",tgt); $0=substr($0,1,RLENGTH) tgt } 1' file
:account_id:123456789MelbourneAus
:account_id:9876543210AdelaideAus
My input is:
"INTC_KEY,ABC1|OBJID,ABC2"
And I want to send the output to a file like:
DDS.INTC_KEY = REPL.OBJID AND DDS.ABC1 = REPL.ABC2
Here is what I've tried so far:
sed 's/^/DDS./g' | sed 's/|/=REPL./g' | tr '\n' '~' | sed 's/~/_N~/g' | sed 's/~$/\n/g' | sed 's/~/~\n/g' | sed 's/~/ AND/g' > ${LOG_DIR}/JOIN.tmp
Based on the single line of input, the following regular expression will transform the input into the desired output:
s/"\([^,]*\),\([^|]*\)|\([^,]*\),\(.*\)"/DDS.\1 = REPL.\3 AND DDS.\2 = REPL.\4/
This shell command shows it working:
$ echo '"INTC_KEY,ABC1|OBJID,ABC2"' | sed 's/"\([^,]*\),\([^|]*\)|\([^,]*\),\(.*\)"/DDS.\1 = REPL.\3 AND DDS.\2 = REPL.\4/'
DDS.INTC_KEY = REPL.OBJID AND DDS.ABC1 = REPL.ABC2
The regular expression basically matches four pieces of text (using the escaped parentheses), delimited by the commas and vertical bar and made available as the \1-\4 back references for the substitution.
Note: I’ve tried to stick to using the features of standard sed and I tested using GNU sed with the POSIXLY_CORRECT environment variable set to 1 to emulate standard sed.
How to print only string figure with the following line :
\begin{figure}[h!]
I tried :
firstLine='\begin{figure}[h!]'
echo $firstLine | sed -n 's/\\begin{\(.*\)}/\1/p'
but returns :
figure[h!] instead of figure
It seems that issue comes from [] or ! character.
firstLine='\begin{figure}[h!]'
echo "$firstLine" | sed 's/.*{\(.*\)}.*/\1/'
Output:
figure
With your code (add .*):
echo $firstLine | sed -n 's/\\begin{\(.*\)}.*/\1/p'
This might work for you (GNU sed):
sed 's/.*{\(.*\)}.*/\1/' file
This assumes there is only one {...} expression and one line.
A more rigorous solution would be:
sed -n 's/.*\\begin{\([^}]*\)}.*/\1/p' file
However nothing would be output if no match was found.
i want to replace lines which contains a string that has some special characters.
i used \ and \ for escape special characters but nothing changes in file.
i use sed like this:
> sed -i '/pnconfig\[\'dbhost\'\] = \'localhost\'/c\This line is removed.' tco.php
i just want to find lines that contains :
$pnconfig['dbhost'] = 'localhost';
and replace that line with:
$pnconfig['dbhost'] = '1.1.1.1';
Wrap the sed in double quotes as
sed -i "s/\(pnconfig\['dbhost'\] = \)'localhost'/\1'1.1.1.1'/" filename
Test
$ echo "\$pnconfig['dbhost'] = 'localhost';" | sed "s/\(pnconfig\['dbhost'\] = \)'localhost'/\1'1.1.1.1'/"
$pnconfig['dbhost'] = '1.1.1.1';
Use as below:
sed -i.bak '/pnconfig\[\'dbhost\'\] = \'localhost\'/pnconfig\[\'dbhost\'\] = \'1.1.1.1\'/' tco.php
Rather than modifying the file for the first time, create back up and then search for your pattern and then replace it with the other as above in your file tco.php
You don't have to worry about backslashing single quotes by using double quotes for sed.
sed -i.bak "/pnconfig\['dbhost'\] = 'localhost'/s/localhost/1.1.1.1/g" File
Try this one.
sed "/$pnconfig\['dbhost']/s/localhost/1.1.1.1/"
I want to search for keyword "mykey = " in a file and print out the string that is following the keyword.
I cannot do a "grep", because each line is very long. I just want to extract the string following the keyword.
Here's what I came up with. Not first, but works, and without the final grep.
grep 'mykey = ' file | sed 's/.*\(mykey = [A-Za-z]*\).*/\1/'
Assuming the keyword is a single word and a space follows it, like this:
mykey = myCoolValue
grep 'mykey' /your/file/here | sed -r 's/.*mykey = (^[ ]*) .*/\1/g' | grep .
If you have pcregrep at hand, you can issue this command in terminal or in a script to get only desired text after mykey =
$ pcregrep -o '(?<=mykey = ).+' file
The regex uses a positive lookbehind, where -o returns only the matched text, not the whole line.