How to repeat a pattern and modify with a "tab" with sed

My input looks like this (separated by tabs):
Yadda yaddabla blubb_1234 extremlylongtext, with commata
awesomo sappa dwarf_775 extremlylongbutdifferenttext, with commata
The output should be:
Yadda yaddabla S23 blubb_1234 1234 extremlylongtext, with commata
awesomo sappa y5 dwarf_775 775 extremlylongbutdifferenttext, with commata
So I want to repeat only the numbers after a "_" character, separated by a tab. Any suggestions? : )

sed 's/_\([[:digit:]]\{1,\}\)/_\1\t\1/g'
I have shown this with a \t indicating a tab in the output. If you're not using GNU sed, you may need to replace it with a literal tab.
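If your sed does not understand \t, one way to get a literal tab into the expression is to let printf produce it. A minimal sketch, assuming the four columns of the sample are separated by single tabs (the variable names tab, line and out are only for illustration):

```shell
# Let printf produce a literal tab so the command does not rely on sed's \t support.
tab=$(printf '\t')

# Sample line; where exactly the tabs sit is an assumption about the real input.
line=$(printf 'Yadda\tyaddabla\tblubb_1234\textremlylongtext, with commata')

# Repeat the digits that follow "_" after a tab. The double quotes let the shell
# expand ${tab}; \( \) and \1 are passed through to sed unchanged.
out=$(printf '%s\n' "$line" | sed "s/_\([0-9][0-9]*\)/_\1${tab}\1/g")

printf '%s\n' "$out"
```

Using [0-9][0-9]* instead of \{1,\} is only a matter of taste here; both mean "one or more digits".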

An awk solution for a tab-separated file (gensub requires GNU awk):
awk -F"\t" 'BEGIN{OFS="\t";}{$2 = gensub(/_([0-9]+)/,"_\\1\t\\1","g",$2);}1' temp.txt

Related

Extract substrings between strings

I have a file with text as follows:
###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###
I want to extract all strings between ### .
My desired output would be something like this:
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
I have tried the following:
grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'
This almost works but only seems to grab the first instance per line, so the first line in my output only grabs
interest1 moreinterest1
rather than
interest1 moreinterest1
interest2
Here is a single awk command to achieve this. It makes ### the field separator and prints each even-numbered field:
awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
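To see why the even-numbered fields are the interesting ones, it can help to print every field together with its index. A small sketch using the first line of the sample input (out is just an illustrative variable name):

```shell
# Show how awk splits one sample line when ### is the field separator.
out=$(printf '%s\n' '###interest1 moreinterest1### sometext ###interest2###' |
    awk -F '###' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')

printf '%s\n' "$out"
# 1:[]
# 2:[interest1 moreinterest1]
# 3:[ sometext ]
# 4:[interest2]
# 5:[]
```

The text before the first ### and after the last ### becomes an (empty) odd-numbered field, which is why the loop starts at 2 and stops before NF.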
Here is an alternative grep + sed solution:
grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'
This assumes there are no # characters in between ### markers.
With GNU awk for multi-char RS:
$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
You can use pcregrep:
pcregrep -o1 '###(.*?)###' file
The regex ###(.*?)### matches ###, then captures into Group 1 any zero or more characters other than line-break characters, as few as possible, and then matches the closing ###.
The -o1 option outputs the Group 1 value only.
sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file
Replacing "###" with newline, D, then conditionally branching to P if a second replacement of "###" is successful.
This might work for you (GNU sed):
sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file
Replace all occurrences of ### with newlines.
If a line contains a newline, remove any characters before and including the first newline, print the details up to and including the following newline, delete those details and repeat.

How to replace # using sed command?

I have the following header :
#SRR1561197.1/1
#SRR1561197.2/1
#SRR1561197.3/1
#SRR1561197.4/1
I want to add a few letters after # and before SRR, like this:
#MexD1SRR1561197.1/1
#MexD1SRR1561197.2/1
#MexD1SRR1561197.3/1
#MexD1SRR1561197.4/1
I tried:
sed 's/#/#MexD1/File,fastq > change.fastq
This results in an empty file.
Use sed with the in-place editing option (-i). The g at the end makes the substitution global, so every # on a line is replaced.
sed -i 's/#/#MexD1/g' file
To fix your command:
sed 's/#/#MexD1/g' File.fastq > change.fastq
Alternatively, without quotes, escape the # so the shell passes it through literally:
sed s/\#/\#MexD1/g source-file-name > change.fastq
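Since the # in these headers is always the first character, anchoring the pattern to the start of the line is another option; it leaves any # further along a line alone. A sketch, with made-up sample lines standing in for File.fastq:

```shell
# Two header lines from the question plus a hypothetical line with a # in the middle.
out=$(printf '%s\n' '#SRR1561197.1/1' '#SRR1561197.2/1' 'ACGT#ACGT' |
    sed 's/^#/#MexD1/')

printf '%s\n' "$out"
# #MexD1SRR1561197.1/1
# #MexD1SRR1561197.2/1
# ACGT#ACGT
```

No g flag is needed in this form, because ^ can match only once per line.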

Divide each line into equal parts

I would be happy if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts, for example into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
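This should work using GNU sed. Note that it inserts a space after every four characters, rather than splitting the line into four parts: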
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} captures every four characters
\1 refers to the captured group, which is enclosed in the parentheses ( ), and the replacement adds a space after this group
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk you can omit --posix, but older releases of GNU awk need it to enable interval expressions such as {5}; since gawk seems to be the most commonly used implementation, I've given the solution in terms of gawk.
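If interval expressions are a concern, substr() avoids them entirely. A rough sketch that should run in any POSIX awk; parts, width and line are names chosen here for illustration, and the chunk width is rounded up so that at most parts pieces are produced:

```shell
parts=4
out=$(printf '%s\n' 'ATGCATHLMNPHLNTPLML' |
    awk -v parts="$parts" '{
        len = length($0)
        width = int((len + parts - 1) / parts)   # ceiling of len / parts
        line = ""
        # Walk the line in steps of width, joining the chunks with single spaces.
        for (pos = 1; pos <= len; pos += width)
            line = line (pos > 1 ? " " : "") substr($0, pos, width)
        print line
    }')

printf '%s\n' "$out"
# ATGCA THLMN PHLNT PLML
```

Unlike the gsub version, this does not leave a trailing space when the line length is an exact multiple of the chunk width.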
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a define the loop label a
/^\n/bb if we reach a newline we are done and branch to label b
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for any length of line; however, if the length is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile

sed - comment a matching line and x lines after it

I need help using sed to comment out a matching line and the 4 lines that follow it in a text file.
My text file looks like this:
[myprocess-a]
property1=1
property2=2
property3=3
property4=4
[anotherprocess-b]
property1=gffgg
property3=gjdl
property2=red
property4=djfjf
[myprocess-b]
property1=1
property4=4
property2=2
property3=3
I want to prefix # to every line containing the text '[myprocess' and to the 4 lines that follow it.
Expected output:
#[myprocess-a]
#property1=1
#property2=2
#property3=3
#property4=4
[anotherprocess-b]
property1=gffgg
property3=gjdl
property2=red
property4=djfjf
#[myprocess-b]
#property1=1
#property4=4
#property2=2
#property3=3
Greatly appreciate your help on this.
You can do this by applying a regular expression to a set of lines:
sed -e '/myprocess/,+4 s/^/#/'
This matches each line containing 'myprocess' and the 4 lines after it, and inserts a '#' at the beginning of each of those five lines.
(The addr,+N address form is a GNU sed extension, which is probably why it is not in any of the "sed one liner" cheatsheets I know.)
sed '/\[myprocess/ { N;N;N;N; s/^/#/gm }' input_file
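The m flag used above is also a GNU extension. A variant that avoids it, as a sketch that assumes every matching header really is followed by at least four more lines (the sample input is cut down from the question, and out is an illustrative name):

```shell
# Pull the four following lines into the pattern space, then put a # at the
# very start and after each embedded newline.
out=$(printf '%s\n' '[myprocess-a]' 'property1=1' 'property2=2' 'property3=3' \
    'property4=4' '[anotherprocess-b]' 'property1=gffgg' |
    sed '/\[myprocess/{
N;N;N;N
s/^/#/
s/\n/&#/g
}')

printf '%s\n' "$out"
# #[myprocess-a]
# #property1=1
# #property2=2
# #property3=3
# #property4=4
# [anotherprocess-b]
# property1=gffgg
```

Putting the commands on separate lines inside the braces keeps the script acceptable to sed implementations that are strict about what may follow { and }.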
Using string concatenation and default action in awk.
http://www.gnu.org/software/gawk/manual/html_node/Concatenation.html
awk '/myprocess/{f=1} f>5{f=0} f{f++; $0="#" $0} 1' foo.txt
or, if the block always ends with an empty line:
awk '/myprocess/{f=1} !NF{f=0} f{$0="#" $0} 1' foo.txt

Brocade alishow merge two consecutive lines awk sed

How can I join two lines using awk or sed?
For example, I have data like below:
abcd
12:12:12:12:12:12:12:12
efgh001_01
45:45:45:45:45:45:45:45
ijkl7464746
78:78:78:78:78:78:78:78
and I need output like below:
abcd 12:12:12:12:12:12:12:12
efgh001_01 45:45:45:45:45:45:45:45
ijkl7464746 78:78:78:78:78:78:78:78
Running this almost works, but I need the space or tab:
awk '!(NR%2){print$0p}{p=$0}'
You're almost there:
awk '(NR % 2 == 0) {print p, $0} {p = $0}'
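If you would rather have a tab between the two halves, setting OFS should be enough, since print p, $0 joins its arguments with OFS. A short sketch with part of the sample data from the question (out is an illustrative name):

```shell
# Join each pair of lines with a tab instead of the default single space.
out=$(printf '%s\n' 'abcd' '12:12:12:12:12:12:12:12' \
    'efgh001_01' '45:45:45:45:45:45:45:45' |
    awk -v OFS='\t' 'NR % 2 == 0 { print p, $0 } { p = $0 }')

printf '%s\n' "$out"
```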
With sed you can do that as follows:
sed -n 'N;s/\n/ /p' file
where:
N reads next line
s replaces the new line character with a space to join both lines properly
p prints the result
This might work for you:
sed '$!N;s/\n/ /' file
or this:
paste -sd' \n' file
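The two sed commands above give the same result for the sample data, but they differ when the file has an odd number of lines. A small sketch with three made-up lines; the variable names are only for illustration:

```shell
# Three made-up lines: the last one has no partner.
with_n=$(printf '%s\n' a b c | sed -n 'N;s/\n/ /p')
with_last_guard=$(printf '%s\n' a b c | sed '$!N;s/\n/ /')

printf '%s\n' "$with_n"           # a b
printf '%s\n' "$with_last_guard"  # a b, then c on its own line
```

With -n and an explicit p, the unpaired last line is silently dropped, because N finds no further input and nothing is printed automatically. The $!N form skips N on the last line, so the leftover line is still printed.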