Sed Pattern Match then Append to Line - sed

I have some lines down below and I'm trying to append "Check" to the line that starts with Apples. Does someone know how I can get "Check" on the same line as Apples, not a new one and print the output? I wasn't getting anywhere on my own.
Thanks
What I have:
Grocery store bank and hardware store
Apples Bananas Milk
What I want:
Grocery store bank and hardware store
Apples Bananas Milk Check
What I tried:
sed -i '/^Apples/a Check' file
What I got:
Grocery store bank and hardware store
Apples Bananas Milk
Check

This might work for you (GNU sed):
sed '/^Apples/s/$/ Check/' file
If a line starts with Apples, append the string " Check" to it. Here $ is an anchor that matches the end of the line.

The problem is that you are using the a command, which appends a whole new line. From this reference:
The "a" command appends a line after the range or pattern.
What you want is a plain substitution. Depending on how strict the match should be, here are some suggestions:
sed -i 's/Apples/& Check/g' file # Adds ' Check' after each 'Apples'
sed -i 's/\<Apples\>/& Check/g' file # Only adds ' Check' after 'Apples' as whole word
sed -i -E 's/\<Apples(\s+Check)?\>/Apples Check/g' file # Adds ' Check' after 'Apples', replacing any ' Check' already there
Note these suggestions are for GNU sed only. \< and \> are word boundaries in GNU sed patterns, \s+ matches one or more whitespace characters (a GNU extension), and -E enables the POSIX ERE pattern syntax.
See the online demo:
#!/bin/bash
s='Grocery store bank and hardware store
Apples Bananas Milk'
sed 's/Apples/& Check/g' <<< "$s"
sed 's/\<Apples\>/& Check/g' <<< "$s"
sed -E 's/\<Apples(\s+Check)?\>/Apples Check/g' <<< "$s"
Output in each case is:
Grocery store bank and hardware store
Apples Check Bananas Milk
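To see where the plain and the whole-word suggestions differ, here is a small sketch on a made-up input line (not from the question) in which Apples also appears inside a longer word. It assumes GNU sed, since \< and \> are GNU extensions.

```shell
# Hypothetical input: "Apples" also occurs inside "GreenApples".
line='Apples GreenApples'

# Plain match: every occurrence of the string gets " Check".
plain=$(printf '%s\n' "$line" | sed 's/Apples/& Check/g')

# Whole-word match (GNU sed): only the standalone word gets " Check".
whole=$(printf '%s\n' "$line" | sed 's/\<Apples\>/& Check/g')

printf '%s\n' "$plain"   # Apples Check GreenApples Check
printf '%s\n' "$whole"   # Apples Check GreenApples
```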

Using sed
$ sed '/^Apples/s/.*/& Check/' input_file
Grocery store bank and hardware store
Apples Bananas Milk Check
This matches lines that begin with Apples and replaces the whole line (.*) with itself (&) followed by " Check".
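If the command may be run more than once on the same data, a guard can keep " Check" from piling up at the end of the line. This is only a sketch of that idea, not something the question asked for; append_check is a hypothetical helper name.

```shell
# Sample input from the question.
input='Grocery store bank and hardware store
Apples Bananas Milk'

# Append " Check" to lines starting with Apples,
# unless the line already ends with " Check".
append_check() {
  sed '/^Apples/{
/ Check$/!s/$/ Check/
}'
}

once=$(printf '%s\n' "$input" | append_check)
twice=$(printf '%s\n' "$once" | append_check)

printf '%s\n' "$twice"
```

Running the helper a second time leaves the text unchanged, because the inner address / Check$/! skips lines that already carry the suffix.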

Related

Can I avoid duplicate strings with the sed "a\" command?

I added the word "apple" under "true" in my file.txt.
The problem is that every time I run the command "apple" is appended.
$ sed -i '/true/a\apple' file.txt   # executed 3 times
$ cat file.txt
true
apple
apple
apple
If the word "apple" already exists, I do not want repeated runs of the sed command to add it again.
I have no idea how to do this, please help.
This is the result I want, however many times the sed command is executed:
$ cat file.txt
true
apple
It seems you don't want to append the line apple if the line following the true line is already apple. If so, this sed command should do the trick:
sed -i.backup '
/true/!b
$!{N;/\napple$/!s/\n/&apple&/;p;d;}
a\
apple
' file.txt
Explanation of sed commands:
If the line doesn't contain true, jump to the end of the script, which prints the line that was read (/true/!b).
Otherwise the line contains true:
If it isn't the last line ($!):
• Read the next line (N).
• If the next line isn't apple (/\napple$/!), insert apple between the two lines (s/\n/&apple&/).
• Print the pattern space (p) and start a new cycle (d).
Otherwise it is the last line (and contains true):
• Append apple (a\ apple).
Edit:
The above sed script won't work properly if two consecutive true lines occur in the file, as pointed out by @potong. The version below should fix this, if I haven't overlooked something.
sed -i.backup ':a
/true/!b
a\
apple
n
/^apple$/d
ba
' file.txt
Explanation:
/true/!b: If the line doesn't contain true, no further processing is required. Jump to the end of the script. This will print the current pattern space.
a\ apple: Otherwise, the line contains true. Append apple.
n: Print the current pattern space and the appended line (apple), then replace the pattern space with the next line. If there is no next line, the script ends here.
/^apple$/d: If the line read consists of string apple then delete it and start a new cycle (because it is already appended before)
ba: Jump to the start of the script (label a) without reading an input line.
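For comparison, the same idea (add apple after a true line only when the next line is not already apple) can be sketched in awk. This is an alternative technique, not the sed script above; add_apple is a hypothetical helper name, and it reads standard input rather than editing the file in place.

```shell
# add_apple: filter from stdin to stdout (hypothetical helper).
add_apple() {
  awk '
    # Previous line contained "true" and this line is already "apple".
    pending && $0 == "apple" { pending = 0; print; next }
    # Previous line contained "true" but "apple" is missing: add it first.
    pending { print "apple"; pending = 0 }
    { print }
    /true/ { pending = 1 }
    # "true" was on the last line of the input.
    END { if (pending) print "apple" }
  '
}

first=$(printf 'true\n' | add_apple)
second=$(printf '%s\n' "$first" | add_apple)

printf '%s\n' "$second"
```

To update a file with it, one option is to write to a temporary file and move it over the original, for example add_apple < file.txt > file.tmp && mv file.tmp file.txt.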
There is no general solution for sed unless the file is sorted. If sorted, the following deletes the duplicate lines:
sed '$!N; /^\(.*\)\n\1$/!P; D'
This was taken from this link: https://www.unix.com/shell-programming-and-scripting/146404-command-remove-duplicate-lines-perl-sed-awk.html
Great answer by M. Nejat Aydin but to make things simpler just add grep:
grep -q apple file.txt || sed -i '/true/a\apple' file.txt
This might work for you (GNU sed):
sed -e ':a;/true/!b;$a apple' -e 'n;/apple/b;i apple' -e 'ba' file
If a line does not contain true just print it.
Otherwise, if it is the last line, append the line apple.
Otherwise, print that line and fetch the next.
If that line contains apple just print it.
Otherwise, insert a line apple and jump to the first sed instruction since the fetched line might be one containing true.
N.B. This uses both the a command (for end of file condition) and the i command for when there is a following line.

Remove whitespaces till we find comma, but this should start skipping first comma in each line of a file

I am still learning sed and awk and trying some more complicated logic, but I couldn't find a solution for the following.
File contents:
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
When I try with
$ cat sample.txt | sed -e 's/[[:space:]][^,]*,/,/g'
,apple.com,apple2.com,apple3.com,
,banana.com,banannna.com,
,grape5.com,
This is OK, but I want the substitution to skip the first comma in each line, so the expected output is
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
Any help is appreciated.
If you are using GNU sed, you can do something like
sed -e 's/[[:space:]][^,]*,/,/2g' file
Here the 2 tells sed to start substituting at the 2nd match, and the g makes it carry on with every later match on the line. This combined behaviour is specific to GNU sed; POSIX leaves the mix of a number and g unspecified.
The output of the above command:
sed -e 's/[[:space:]][^,]*,/,/2g' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
An excerpt from the man page of GNU sed
g
Apply the replacement to all matches to the regexp, not just the first.
number
Only replace the numberth match of the regexp.
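A quick way to compare the flags is to run them on a short made-up string (not from the question). The sketch below assumes GNU sed for the combined 2g form.

```shell
s='a-b-c-d'

# number only: replace just the 2nd match
only_second=$(printf '%s\n' "$s" | sed 's/-/+/2')

# g only: replace every match
all=$(printf '%s\n' "$s" | sed 's/-/+/g')

# number plus g (GNU sed): replace the 2nd match and all later ones
from_second=$(printf '%s\n' "$s" | sed 's/-/+/2g')

printf '%s\n' "$only_second"   # a-b+c-d
printf '%s\n' "$all"           # a+b+c+d
printf '%s\n' "$from_second"   # a-b+c+d
```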
awk '{gsub(/[ ]+/," "); gsub(/com [0-9]+/,"com")}1' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
The first gsub collapses each run of spaces into a single space, and the second removes the number between com and the comma.

Substituting everything except an ID with sed

I want to keep the first id and remove everything afterwards with sed.
My line looks like
CAM_READ_0623233309 /library_id=CAM_LIB_002149 /sample_id=CAM_SMPL_003380 raw_id=G9ALM7U02F5HAW length=383 /IP_notice=?This genetic information downloaded from CAMERA may be considered to be part of the genetic patrimony of Denmark, the country from which the sample was obtained. Users of this information agree to: 1) acknowledge Denmark as the country of origin in any country where the genetic information is presented and 2) contact the CBD focal point identified on the CBD website (http://www.cbd.int/countries/) if they intend to use the genetic information for commercial purposes.?
and I just want :
CAM_READ_0623233309
Capturing specific sequence:
sed -r 's/.*(CAM_READ_[0-9]+).*/\1/' input.txt
or
sed -e 's/.*\(CAM_READ_[0-9]\+\).*/\1/' input.txt
Capturing everything at the front, except whitespace characters:
sed -r 's/^(\S+).*/\1/' input.txt
Nice and easy sed statement:
sed 's/ .*$//'
s substitute
/ .*$/ match everything from the first space to the end of the line
// replace it with nothing
Command example:
echo "CAM_READ_0623233309 /library_id=CAM_LIB_002149 blah blah" | sed 's/ .*$//'
Command example output:
CAM_READ_0623233309
Now, of course, if you have multiple different types of lines within the same file that you're dealing with this will not work for you. But, your question above does not indicate this.
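Under the same assumption (the ID is the first space-delimited field of the line), cut or awk can pick that field directly. A short sketch, using a shortened copy of the sample line:

```shell
# Shortened copy of the sample line from the question.
line='CAM_READ_0623233309 /library_id=CAM_LIB_002149 /sample_id=CAM_SMPL_003380'

# cut: fields are separated by a single space, keep field 1
id_cut=$(printf '%s\n' "$line" | cut -d ' ' -f 1)

# awk: fields are separated by runs of blanks, print field 1
id_awk=$(printf '%s\n' "$line" | awk '{ print $1 }')

printf '%s\n' "$id_cut"   # CAM_READ_0623233309
printf '%s\n' "$id_awk"   # CAM_READ_0623233309
```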

grep or awk - how to return line if column 1 and 3 have the same value

I have a tab delimited file and I want the output to have the entire line in my file if values in column 1 are the same as the values in column 3. Having very limited knowledge in perl and linux, this is as close as I came to a solution.
File example
Apple Sugar Apple
Apple Butter Orange
Raisins Flour Orange
Orange Butter Orange
The results would be:
Apple Sugar Apple
Orange Butter Orange
Code:
#!/bin/sh
awk '{
prev=$0; f1=$1; f3=$3;
getline
if ($1 == $3) {
print prev
print
}
}' myfilename
I am sure that there is an easier solution to it. Maybe even a grep or awk on the command line. But that was the only code I could find that seemed to give me my solution.
Thanks!
It's easy with awk:
awk '$1 == $3' myfile
The default action is to print out the record, so if fields 1 and 3 are equal, that's what will happen.
Using awk
awk is the tool for the job:
awk '$1 == $3'
If your fields in the data are strictly tab separated and may contain blanks, then you will need to specify the field separator explicitly:
awk -F'\t' '$1 == $3'
(where \t represents a tab; you may have to type Tab (or even Control-V Tab) to get it into the string).
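The difference shows up as soon as a field contains a blank. The sketch below uses made-up data (the "Green Apple" row is not from the question) built with printf so the tabs are explicit.

```shell
# Tab-separated rows; the first row has a blank inside fields 1 and 3.
data=$(printf 'Green Apple\tSugar\tGreen Apple\nApple\tButter\tOrange\nOrange\tButter\tOrange\n')

# Default splitting on any whitespace: "Green Apple" is split in two,
# so the first row compares "Green" with "Sugar" and is dropped.
default_split=$(printf '%s\n' "$data" | awk '$1 == $3')

# Tab as the only separator: the first row is kept.
tab_split=$(printf '%s\n' "$data" | awk -F'\t' '$1 == $3')

printf '%s\n' "$default_split"
printf '%s\n' "$tab_split"
```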
Using grep
You can do it with grep, but you don't want to do it with grep:
grep -E '([A-Za-z]+)\t[A-Za-z]+\t\1'
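The key part of the regex is the \1, which means 'the same value as the first captured string'. Note that grep -E does not itself treat \t as a tab, so the pattern needs real tab characters.
In bash you can supply them with ANSI-C quoting ($'...'), at the cost of doubling the backslash in \1: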
grep -E $'([A-Za-z]+)\t[A-Za-z]+\t\\1'
You could simplify life by noting (assuming) there are no spaces within fields:
grep -E '([A-Za-z]+)[[:space:]]+[A-Za-z]+[[:space:]]+\1'
As noted in one of the comments, I didn't put a $ at the end of the search pattern; it would be feasible (though the data would have to be cleaned up to contain tabs and drop trailing blanks), so that 'Good Noise GoodBad' would not be picked up. There are other ways to do it, and you can make the regex more and more complex to handle more possible situations. But those only go to emphasize that the awk solution is better; awk deals with the details automatically.
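To illustrate the anchoring point, here is a sketch with the question's rows plus the 'Good Noise GoodBad' case, tab-separated and with no trailing blanks. It assumes a grep that supports back-references with -E (GNU grep does; this is an extension to POSIX ERE).

```shell
# Question data plus one row whose third field merely starts with the first.
data=$(printf 'Apple\tSugar\tApple\nApple\tButter\tOrange\nOrange\tButter\tOrange\nGood\tNoise\tGoodBad\n')

# Anchored at both ends: field 3 must equal field 1 exactly.
anchored=$(printf '%s\n' "$data" |
  grep -E '^([^[:space:]]+)[[:space:]]+[^[:space:]]+[[:space:]]+\1$')

# Unanchored pattern from above: also counts the GoodBad row.
unanchored_count=$(printf '%s\n' "$data" |
  grep -cE '([A-Za-z]+)[[:space:]]+[A-Za-z]+[[:space:]]+\1')

printf '%s\n' "$anchored"
printf '%s\n' "$unanchored_count"   # 3
```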
Using grep:
grep -P "([^\t]+)\t[^\t]+\t\1" inFile

looking a way to extract pattern from a text file in linux

I am using Linux and bash. I have a text file whose content is generated at run time by another program. The length, number of lines and content of the file change from time to time, but some patterns in the text stay the same. A typical example is
123098230984LD#2e3
123098230984LD#23234
XER_3424324_23424
33: 34: 35: node:9-72-1408 &82 &34
$1231313
*3435322
link to port:323
3424242424LD#2234
332424LD#23424234
Here, I want to extract the patterns "node:NUMBER-NUMBER-NUMBER" and "port:NUMBER", but where they occur in the text also varies from time to time. At the moment I extract the information manually. I am wondering if there is a way to extract it automatically. What makes it really difficult is that the content changes every time the file is generated.
You can use sed to extract the desired fields by getting rid of the undesired bits:
pax> echo 'junk node:9-72-1408 more junk port:323 last junk' |
sed -E 's/^.*(node:[0-9]+-[0-9]+-[0-9]+).*(port:[0-9]+).*$/\1 \2/'
node:9-72-1408 port:323
The .* bits simply represent any junk and the parentheses are used to "capture" the matching text so it can be used in the replacement (as \1 and \2).
Sidebar:
If your version of sed doesn't support -E for extended regexes, it may support -r, as with certain versions of GNU sed.
Otherwise, you'll need to escape the parentheses and + characters:
pax> echo 'junk node:9-72-1408 more junk port:323 last junk' |
sed 's/^.*\(node:[0-9]\+-[0-9]\+-[0-9]\+\).*\(port:[0-9]\+\).*$/\1 \2/'
node:9-72-1408 port:323
The source code for GNU sed contains this little snippet:
/* Undocumented, for compatibility with BSD sed. */
case 'E':
case 'r':
but this appears to have been introduced in 4.2 (i.e., it's in 4.2 but not in 4.1.5, the last of the 4.1 series). See here for details.
And, if you need the actual values in variables, you can use something like:
pax> inpstr='junk-here node:9-72-1408 more-junk port:323 last-junk'
pax> node=$(echo "$inpstr" | sed -E 's/^.*node:([0-9]+-[0-9]+-[0-9]+).*$/\1/')
pax> port=$(echo "$inpstr" | sed -E 's/^.*port:([0-9]+).*$/\1/')
pax> echo $inpstr
junk-here node:9-72-1408 more-junk port:323 last-junk
pax> echo $node
9-72-1408
pax> echo $port
323
(taking into account the earlier comments about using -r or adding extra escaping for "lesser" sed implementations).
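In the sample from the question, node: and port: sit on different lines, so a single-line sed substitution would not see both at once. One way around that, sketched here as an alternative and not as part of the answer above, is grep -o, which prints only the matching text wherever it occurs. The -o option is an extension found in GNU and BSD grep, and the temporary file stands in for the generated file.

```shell
# Stand-in for the generated file (shortened copy of the question's sample).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
123098230984LD#2e3
XER_3424324_23424
33: 34: 35: node:9-72-1408 &82 &34
$1231313
*3435322
link to port:323
332424LD#23424234
EOF

# -o prints only the part of each line that matches the pattern.
node=$(grep -oE 'node:[0-9]+-[0-9]+-[0-9]+' "$tmp")
port=$(grep -oE 'port:[0-9]+' "$tmp")
rm -f "$tmp"

printf '%s\n' "$node"           # node:9-72-1408
printf '%s\n' "$port"           # port:323
printf '%s\n' "${node#node:}"   # 9-72-1408 (prefix stripped by the shell)
```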