I am using the following command to append a string after every AMP line, but now I want to append it only after the AMP that comes after #Set2, i.e. after line number 9. Can this command be modified to append the string only after #Set2 / line number 9? And if instead I want to append only after the SET1 AMPs, i.e. before line number 9, could someone help me with that command as well? Thanks.
$ sed -i '/AMP/a Target4' test.txt
$ cat test.txt
#SET1
AMP
Target 1
Target 2
AMP
Target 3
Target 4
Target 5
#Set2
AMP
Target 11
Target 12
Note: there are no blank lines in the file above.
Would you please try the following:
sed -i '
/^#Set2/,${ ;# if the line starts with "#Set2", execute the {block} until the last line $
/AMP/a Target4 ;# append the string after "AMP"
} ;# end of the block
' test.txt
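For reference, assuming test.txt is exactly the sample shown in the question, the file should end up like this (the appended Target4 appears only after the AMP that follows #Set2):
$ cat test.txt
#SET1
AMP
Target 1
Target 2
AMP
Target 3
Target 4
Target 5
#Set2
AMP
Target4
Target 11
Target 12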
If you want to append the string before the #Set2 line, please try:
sed -i '
1,/^#Set2/ { ;# execute the {block} from line 1 until a line matches the pattern /^#Set2/
/AMP/a Target4
}
' test.txt
The expression address1,address2 is a flip-flop (range) operator. Once address1 (a line number, regular expression, or other condition) matches, the operator keeps returning true until address2 matches. The following command or block is therefore executed for every line from address1 through address2.
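As a quick way to see which lines a range covers on the sample test.txt above, you can print just the selected lines with -n and p (this only illustrates the range; it does not modify the file):
$ sed -n '/^#Set2/,$ p' test.txt
#Set2
AMP
Target 11
Target 12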
If you want to append after the AMP that comes after #Set2 or after line number 9,
I think it is better to process the lines up to the 8th line and from the 9th line onward separately.
For example, the command is below:
sed '
1,8{
/^#Set2/,${
/AMP/a Target4
}
}
9,${
/AMP/a Target4
}' test.txt
How can I remove the line "name['todo']['remove'] = 3456" from a text file?
[test.txt]
name['myname']['test'] = 12
name['todo']['remove'] = 3456
name['todo']['remove']['inspection'] = 34
My current approach is not working as expected. The line is still in my file.
sed -i "name\['todo'\]\['remove'\]" test.txt
The error message is "sed: -e expression #1, char 2: extra characters after command"
A simple grep -vF would work fine here: it matches a fixed string, so there is no need to escape special regex characters:
grep -ivF "name['todo']['remove'] " file
[test.txt]
name['myname']['test'] = 12
name['todo']['remove']['inspection'] = 34
You can use
sed -i "/name\['todo']\['remove'] =/d" test.txt
Note that the pattern is wrapped with / regex delimiters, and the d means the matched line will get removed.
See an online demo:
s="[test.txt]
name['myname']['test'] = 12
name['todo']['remove'] = 3456
name['todo']['remove']['inspection'] = 34"
sed "/name\['todo']\['remove'] =/d" <<< "$s"
yielding
[test.txt]
name['myname']['test'] = 12
name['todo']['remove']['inspection'] = 34
If you want to make sure you only match a whole line with digits after the =, you can use "/^name\['todo']\['remove'] = [0-9]*$/d" as the sed command.
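For example, reusing the $s sample from the demo above, the anchored version removes the same line (a quick check, not part of the original demo):
sed "/^name\['todo']\['remove'] = [0-9]*$/d" <<< "$s"
[test.txt]
name['myname']['test'] = 12
name['todo']['remove']['inspection'] = 34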
I have a file with multiple lines, but one specific line contains a huge amount of information, with several repeated expressions. I am trying to extract some specific values. I first tried some sed commands, with no success, so I was wondering if you could give me some insights.
Here is a fragment of that single line from the document I mentioned:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My aim here is first to extract all the values that are between the brackets after "habitat.set.prob={" and put them on a single line in a text file.
It would also be important to extract the numbers that appear just before the expression "[&length_range=", which in this case are "6" and "10". They are the labels of the sets of numbers after "prob={".
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always right before "[&length_range="; but what comes before the label is not a fixed expression; it is actually a random number.
The goal, then, is to end up with a file with the following characteristics:
6 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command, at least to try to extract the set of numbers, but it didn't work:
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
Well... I made some progress.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
Many thanks in advance.
Cheers,
Luiz
Something like this:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
The first part, '/a/,/b/', searches from a line matching pattern a to a line matching pattern b, so the commands apply across multiple lines. The -n option tells sed not to print lines by default.
In '/a/,/b/{s/c/d/p; s/e/f/p}'
there are two substitution commands, each printed with the p flag, inside the curly braces.
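The same command, split over several lines purely for readability (GNU sed accepts a script written this way; the regular expressions are unchanged):
sed -rn '
  /.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/ {
    # print the label: the number captured right before "[&length_range"
    s/.*\b([0-9]+)\[&length_range.*/\1/p
    # print the values captured between "habitat.set.prob={" and "},DLOOP.rate"
    s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p
  }
' habitat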
I am not sure whether you have really dug into this yourself, so I am not providing the complete answer, but let's hope this helps.
For the first part, getting the number (which you call the label): you didn't mention whether there is any specific pattern, so try this (data is the file which contains the actual input); you will need to work out how to pick the right number and tweak the RE a bit:
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part, which gives the numbers between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now, try to take this as a starter and work on your output to get your desired result!
To explain a bit:
In the first section, I am trying to capture the number between anything (.*) and length_range (you can escape the characters [ and & by putting \ in front of them).
In the second section, I am capturing the pattern between habitat.set.prob and DLOOP and then doing a tr to remove the braces.
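If your grep supports Perl-compatible regular expressions (-P), another sketch that pairs each label with its set of values could look like the following. This assumes the label is immediately followed by "[&length_range=", that labels and probability sets occur in matching order in the file, and that the input file is named data (adjust as needed):
# extract the labels (the number right before "[&length_range=")
grep -oP '[0-9]+(?=\[&length_range=)' data > labels.txt
# extract the value sets between "habitat.set.prob={" and the closing "}"
grep -oP 'habitat\.set\.prob=\{\K[^}]+' data > probs.txt
# pair them up, one "label values" line each
paste -d' ' labels.txt probs.txt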
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string p = "1:2:3:4";   // input string
    int arr[4] = {};        // empty integer array to hold the extracted digits
    // loop over the string and extract the integers
    for (int i = 0, j = 0; i < (int)p.length(); i++) {
        if (p[i] == ':') { continue; }  // skip the ':' separators
        arr[j] = p[i] - '0';            // convert the digit character to an integer
        j++;
    }
    cout << "String={" << arr[0] << " " << arr[1] << " "
         << arr[2] << " " << arr[3] << "}";  // print the array
    return 0;
}
I have extracted the following data from a file using grep and sed pipes, and now I want to perform a calculation on the last two numbers of each line, delete them, and replace them with a single number.
Mathematical operations:
Add the two numbers together
Divide by 2
Multiply by 141
Round up to a whole number
File Data
AJ29 IO_0_VRN_10 77.234 78.011
AJ30 IO_L1P_T0_100M 89.886 90.789
AJ31 IO_L1N_T0_100S 101.388 102.406
AK29 IO_L2P_T0_101M 66.163 66.828
AL29 IO_L2N_T0_101S 63.626 64.266
So the line starting AJ29 should appear as:
AJ29 IO_0_VRN_10 10945
I could put it in MS Excel / OpenOffice Calc and do this, but I want to avoid that and keep it in a single Linux script if possible. Hope you can help. The script I have so far is below; ideally I'd like to add a few more pipes to achieve this.
grep IOB xc7vx690tffg1930.pkg | sed 's/pin//g' | sed 's/IOB_[A-Za-z0-9]*//g' | sed 's/ /-/g' | sed 's/\t//g' | sed 's/^[-]*//g' | sed 's/-/ /g' | sed 's/ [0-9][0-9] //g' | sed 's/[[:space:]]\+/,/g' | sed 's/,X[0-9A-Z]*,//g' | sed 's/,[0-9]*[A-Z],//g' | sed 's/N\.A\.,/,/g' | sed 's/,$//g' | sed 's/,/ /g'
For calculations, use awk!
$ awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 141); NF--}1' file
AJ29 IO_0_VRN_10 10945
AJ30 IO_L1P_T0_100M 12738
AJ31 IO_L1N_T0_100S 14367
AK29 IO_L2P_T0_101M 9376
AL29 IO_L2N_T0_101S 9016
This replaces the penultimate field with the result of (penultimate + last)/2 * 141. To make it round, we use the %.0f format, as indicated in Awk printf number in width and round it up.
Also, it looks to me like you are piping way too many things: I counted one call to grep and 13 (!) to sed. You can probably use a single sed -e 'first block' -e 'second block' ... instead.
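For example, a sketch of the idea with just the first few substitutions from your pipeline (not tested against the real xc7vx690tffg1930.pkg file; the remaining expressions would be added as further -e blocks in the same way):
grep IOB xc7vx690tffg1930.pkg \
  | sed -e 's/pin//g' \
        -e 's/IOB_[A-Za-z0-9]*//g' \
        -e 's/ /-/g' \
        -e 's/\t//g'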
Explanation
In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
{...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
($(NF-1) + $NF)/2 * 141 performs the calculation: (penultimate + last) / 2 * 141.
$(NF-1)=sprintf( ... ) assigns the result of that calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above (see the quick check after this list).
{...; NF--} once the calculation is done, its result is in the penultimate field. To remove the last column, we just decrease the number of fields so that the last one gets "removed".
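As a quick sanity check of the rounding (not part of the original answer), the AJ29 row can be reproduced by hand:
$ awk 'BEGIN { printf "%.0f\n", (77.234 + 78.011)/2 * 141 }'
10945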
I need to format files for a miRNA-identifying tool (miREAP).
I have a fasta file in the following format:
>seqID_1
CCCGGCCGTCGAGGC
>seqID_2
AGGGCACGCCTGCCTGGGCGTCACGC
>seqID_3
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA
>seqID_4
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA
>seqID_5
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA
>seqID_6
AGGGCACGCCTGCCTGGGCGTCACGC
I want to count the number of times each sequence occurs and append that number to the seqID line. The count for each sequence, and one original ID referring to that sequence, need only appear once in the file, like this:
>seqID_1 1
CCCGGCCGTCGAGGC
>seqID_2 2
AGGGCACGCCTGCCTGGGCGTCACGC
>seqID_3 3
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA
Fastx_collapser does the trick nearly as I'd like (http://hannonlab.cshl.edu/fastx_toolkit/index.html). However, rather than maintaining the seqIDs, it returns:
>1 1
CCCGGCCGTCGAGGC
>2 2
AGGGCACGCCTGCCTGGGCGTCACGC
>3 3
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA
This means that the link between my sequence, seqID, and genome mapping location is lost. (Each seqID corresponds to a sequence in my fasta file and a genome mapping spot in a separate Bowtie2-generated .sam file)
Is there a simple way to do the desired deduplication at the command line?
Thanks!
Linearize the FASTA, then sort and uniq -c:
awk '/^>/ {if(N>0) printf("\n"); ++N; printf("%s ",$0);next;} {printf("%s",$0);} END { printf("\n");}' input.fa | \
sort -t ' ' -k2,2 | uniq -f 1 -c |\
awk '{printf("%s_%s\n%s\n",$2,$1,$3);}'
>seqID_2_2
AGGGCACGCCTGCCTGGGCGTCACGC
>seqID_1_1
CCCGGCCGTCGAGGC
>seqID_3_3
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA
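If you would rather keep the ID and the count separated by a space, as in the desired output in the question, changing the format string of the final awk should be enough (a sketch along the same lines, only tested against the sample above):
awk '/^>/ {if(N>0) printf("\n"); ++N; printf("%s ",$0);next;} {printf("%s",$0);} END { printf("\n");}' input.fa | \
sort -t ' ' -k2,2 | uniq -f 1 -c |\
awk '{printf("%s %s\n%s\n",$2,$1,$3);}'
>seqID_2 2
AGGGCACGCCTGCCTGGGCGTCACGC
>seqID_1 1
CCCGGCCGTCGAGGC
>seqID_3 3
CCGCATCAGGTCTCCAAGGTGAACAGCCTCTGGTCGA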