Sed: Print lines between string and another string in one line


I have 100 html files in a directory
I need to print a line from each file that matches a regex and at the same time print the lines between 2 regex.
The commands below produce the correct results:
sed -n '/string1/p' *.html >result.txt
sed -n '/string2/,/string3/p' *.html > result2.txt
but I need them in one result.txt file, in the format
string1
string2
string3
I have been trying with grep, awk and sed and have searched but I have not found the answer.
Any help would be appreciated.

This might work for you:
sed -n '/string1/p;/string2/,/string3/p' INPUTFILE > OUTPUTFILE
Or here's an awk solution:
awk '/string1/ { print }
/string2/ { p = 1 }
p == 1 { print }
/string3/ { p = 0 }' INPUTFILE > OUTPUTFILE

Simply put both sed expressions in one invocation:
echo $'a\nstring1\nb\nstring2\nc\nstring3\nd\n' | \
sed -n -e '/string1/p' -e '/string2/,/string3/p'
Input is:
a
string1
b
string2
c
string3
d
Output is:
string1
string2
c
string3
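The question mentions 100 HTML files. When several files are passed to one sed invocation, sed reads them as one continuous stream, so a range opened by string2 in a file that lacks string3 would run on into the next file. A minimal sketch that avoids this by running sed once per file; the function name extract_matches is made up for illustration:

```shell
# Hypothetical helper: run the combined expression on each file separately,
# so an unterminated /string2/,/string3/ range cannot spill into the next file.
extract_matches() {
    for f in "$@"; do
        sed -n -e '/string1/p' -e '/string2/,/string3/p' "$f"
    done
}

# Usage (assumes the HTML files are in the current directory):
# extract_matches *.html > result.txt
```

With GNU sed, the -s (--separate) option should give the same per-file behaviour in a single invocation.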

Related

sed - add new line with matched groups value

I have input text matching the pattern '([\w_]+TAG) = "\w+";'. When a line matches, I want to append a new line containing the matched group, like 'found \1'. For example:
input text:
abcTAG = "hello";
efgTAG = "world";
expected output:
abcTAG = "hello";
found abcTAG
efgTAG = "world";
found efgTAG
I tried the sed command below, but it does not work:
sed -E '/(\w+TAG) = "\w+";/a found \1' a.txt
Current output:
abcTAG = "hello";
found 1
efgTAG = "world";
found 1

You cannot use the backreference \1 in the text of the a (append) command; backreferences are only expanded in the replacement part of an s command. Please try instead:
sed -E 's/(\w+TAG) = "\w+";/&\nfound \1/' a.txt
Output:
abcTAG = "hello";
found abcTAG
efgTAG = "world";
found efgTAG
Please note it assumes GNU sed which supports \w and \n.
[Edit]
If you want to match the line endings with the input file, please try:
sed -E 's/(\w+TAG) = "\w+";(\r?)/&\nfound \1\2/' a.txt
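For sed implementations that lack \w or \n in the replacement, a sketch using a POSIX character class and an escaped literal newline may be more portable. It assumes only that the sed at hand accepts -E (GNU and BSD sed both do); the sample input is the one from the question:

```shell
# [[:alnum:]_] stands in for GNU's \w; the backslash followed by a real
# newline in the replacement inserts a line break.
result=$(printf '%s\n' 'abcTAG = "hello";' 'efgTAG = "world";' |
    sed -E 's/([[:alnum:]_]+TAG) = "[[:alnum:]_]+";/&\
found \1/')
printf '%s\n' "$result"
```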

using command line tools to extract and replace texts for translations

For an application, I have a language file in the way
first_identifier = English words
second_identifier = more English words
and need to translate it to further languages. In a first step I'm required to extract the right side of those texts resulting in a file like ...
English words
more English words
... How can I achieve that? Using grep maybe?
Next I'd use a translation tool and receive something like
German words
more German words
that needs to be inserted into the first file again (replacing the English words with the German ones). I was thinking about using sed, but I don't know how to use it for this purpose. Or do you have other recommendations?

To do it as you describe would be:
$ cat tst.sh
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"; exit' 0
sed 's/[^ =]* = //' "${@:--}" > "$tmp" &&
tr 'a-z' 'A-Z' < "$tmp" |
awk '
BEGIN { OFS = " = " }
NR == FNR {
ger[NR] = $0
next
}
{
sub(/ = .*/,"")
print $0, ger[FNR]
}
' - "$@"
$ ./tst.sh file
first_identifier = ENGLISH WORDS
second_identifier = MORE ENGLISH WORDS
but you don't need a temp file for that:
$ cat tst.sh
#!/usr/bin/env bash
sed 's/[^ =]* = //' "$@" |
tr 'a-z' 'A-Z' |
awk '
BEGIN { OFS = " = " }
NR == FNR {
ger[NR] = $0
next
}
{
sub(/ = .*/,"")
print $0, ger[FNR]
}
' - "$@"
$ ./tst.sh file
first_identifier = ENGLISH WORDS
second_identifier = MORE ENGLISH WORDS
and I think this might be what you really want anyway, so that your translation tool can translate one line at a time instead of the whole input at once, which might produce different results:
$ cat tst.sh
#!/usr/bin/env bash
while IFS= read -r line; do
id="${line%% = *}"
eng="${line#* = }"
ger="$(tr 'a-z' 'A-Z' <<<"$eng")"
printf '%s = %s\n' "$id" "$ger"
done < "${1:-/dev/stdin}"
$ ./tst.sh file
first_identifier = ENGLISH WORDS
second_identifier = MORE ENGLISH WORDS
Just replace tr 'a-z' 'A-Z' < "$tmp" or tr 'a-z' 'A-Z' <<<"$eng" with the call to whatever translation tool you have in mind.
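If the translation tool works on whole files rather than on a pipe, the extract and merge steps can be kept separate, as the question describes. The following is only a sketch: the file names lang.txt, english.txt and german.txt are hypothetical, and it assumes the tool returns exactly one translated line per input line, in the same order.

```shell
# Hypothetical file names in a scratch directory; lang.txt holds the
# "identifier = text" lines from the question.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'first_identifier = English words' \
              'second_identifier = more English words' > "$workdir/lang.txt"

# Step 1: keep only the text to the right of the first " = ".
sed 's/^[^=]* = //' "$workdir/lang.txt" > "$workdir/english.txt"

# Stand-in for the external translation tool (assumed to preserve line order).
printf '%s\n' 'German words' 'more German words' > "$workdir/german.txt"

# Step 2: pair each identifier with the translated line of the same number.
merged=$(awk 'NR == FNR { ger[FNR] = $0; next }
              { sub(/ = .*/, ""); print $0 " = " ger[FNR] }' \
             "$workdir/german.txt" "$workdir/lang.txt")
printf '%s\n' "$merged"
```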

How to transform file (using SED)?

I have a file (10k lines) in format:
line_number string
i.e.
1 string1
2 string2
...
10000 string10000
How can I transform it to a format like this:
the_same_constant_for_all_lines string
i.e.
101 string1
101 string2
...
101 string10000
The file is on Windows, but I can use sed if that is easier.

If you want to use sed, try this
sed 's/^\([0-9]*\) \(.*\)/101 \2/g' <file>

Try this (it assumes the string contains no whitespace):
awk '{print "101 " $2}' file
or, if the string may contain spaces, replace only the leading number:
awk '{sub(/^[0-9]+/, "101")} 1' file
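Printing "101 " $2 keeps only the second whitespace-separated field, so a string that itself contains spaces would be truncated. A small sketch that rewrites just the leading number and leaves the rest of the line untouched (the two sample rows are invented for illustration):

```shell
# The second sample row has a string containing a space.
result=$(printf '%s\n' '1 string1' '2 two words' |
    sed 's/^[0-9][0-9]* /101 /')
printf '%s\n' "$result"
```

Because only the prefix is replaced, whatever follows it on the line, including any Windows carriage return at the end, passes through unchanged.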

SED code for removing newline

I am looking for a sed command that will transform the following lines:
>AT1G01020.6 | ARV1 family protein | Chr1:6788-8737 REVERSE LENGTH=944 | 201606
AGACCCGGACTCTAATTGCTCCGTATTCTTCTTCTCTTGAGAGAGAGAGAGAGAGAGAGA
GAGAGAGAGCAATGGCGGCGAGTGAACACAGATGCGTGGGATGTGGTTTTAGGGTAAAGT
CATTGTTCATTCAATACTCTCCGGGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAG
TGTGAACGCATGATTATTTTCATCGATTTAATCCTTCACAGACCAAAGGTATATAGACAC
into
>AT1G01020.6 | ARV1 family protein | Chr1:6788-8737 REVERSE LENGTH=944 | 201606
AGACCCGGACTCTAATTGCTCCGTATTCTTCTTCTCTTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGCAATGGCGGCGAGTGAACACAGATGCGTGGGATGTGGTTTTAGGGTAAAGTCATTGTTCATTCAATACTCTCCGGGGAAATTGCAAGGAAGTAGCAGATGAGTACATCGAGTGTGAACGCATGATTATTTTCATCGATTTAATCCTTCACAGACCAAAGGTATATAGACAC
That is, the newline after a line starting with > stays unchanged, while all other newlines are removed so that the sequence lines are joined.
I have tried with the following line, but it is not working:
sed s/^!>\n$// <in.fasta>out.fasta
I have a 28MB fasta file which I need to transform.

sed is not a particularly good tool for this.
awk '/^>/ { if(prev) printf "\n"; print; next }
{ printf "%s", $0; prev = 1; }
END { if(prev) printf "\n" }' in.fasta >out.fasta

Using awk:
awk '/^>/{print (l?l ORS:"") $0;l="";next}{l=l $0}END{print l}' file
Sequence lines are buffered in the variable l, and the buffer is printed when the next > header or the end of the file is reached.

The following awk commands may also help; they use no arrays or variables.
awk 'BEGIN{ORS=""} /^>/{if(FNR==1){print $0 RS} else {print RS $0 RS};next}1' Input_file
OR
awk 'BEGIN{ORS=""} /^>/{printf("%s",FNR==1?$0 RS:RS $0 RS);next}1' Input_file
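Since the question asked for sed specifically, here is one possible sketch using the N/P/D idiom. The function name unwrap_fasta is made up, and the approach assumes header lines start with > and sequence lines never do. On records with very long sequences it is likely to be slower than the awk versions, because the growing pattern space is rescanned on every join.

```shell
# Join wrapped sequence lines while leaving ">" header lines alone.
#   :a ... ta  loop for as long as a join succeeded
#   $!N        append the next input line (unless already on the last line)
#   s          on a non-header line, drop the newline before a non-header line
#   P; D       print and delete the first line, then restart with the remainder
unwrap_fasta() {
    sed -e ':a' -e '$!N' -e '/^>/!s/\n\([^>]\)/\1/' -e 'ta' -e 'P' -e 'D' "$@"
}

# Usage:
# unwrap_fasta in.fasta > out.fasta
```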

Unix - Removing everything after a pattern using sed

I have a file which looks like below:
memory=500G
brand=HP
color=black
battery=5 hours
For every line, I want to remove everything after = and also the =.
Eventually, I want to get something like:
memory:brand:color:battery:
(All on one line with colons after every word)
Is there a one-line sed command that I can use?

sed -e ':a;N;$!ba;s/=.\+\n\?/:/mg' /my/file
Adapted from this fine answer.
To be frank, however, I'd find something like this more readable:
cut -d = -f 1 /my/file | tr \\n :

Here's one way using GNU awk:
awk -F= '{ printf "%s:", $1 } END { printf "\n" }' file.txt
Result:
memory:brand:color:battery:
If you don't want a colon after the last word, you can use GNU sed like this:
sed -n 's/=.*//; H; $ { g; s/\n//; s/\n/:/g; p }' file.txt
Result:
memory:brand:color:battery

This might work for you (GNU sed):
sed -i ':a;$!N;s/=[^\n]*\n\?/:/;ta' file

perl -F= -ane '{print $F[0].":"}' your_file
tested below:
> cat temp
abc=def,100,200,dasdas
dasd=dsfsf,2312,123,
adasa=sdffs,1312,1231212,adsdasdasd
qeweqw=das,13123,13,asdadasds
dsadsaa=asdd,12312,123
> perl -F= -ane '{print $F[0].":"}' temp
abc:dasd:adasa:qeweqw:dsadsaa:

My approach takes two steps.
First step (with -E, so that the parentheses and + are treated as regex operators):
sed -E 's/([a-z]+)=.*/\1:/' Filename > a
cat a
memory:
brand:
color:
battery:
Second step (each sed pass joins pairs of lines, so two passes collapse these four lines into one):
sed -e 'N;s/\n//' a | sed -e 'N;s/\n//'
My output is
memory:brand:color:battery:
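The pairwise joining above depends on the number of input lines. A sketch that should work for any number of lines combines sed with paste; the sample data is the file from the question, fed through a pipe for illustration:

```shell
# Strip "=" and everything after it, then join all lines with ":".
# paste -s serialises the lines of its input; -d: sets the joining character.
result=$(printf '%s\n' 'memory=500G' 'brand=HP' 'color=black' 'battery=5 hours' |
    sed 's/=.*//' | paste -s -d: -)
printf '%s\n' "$result"
```

This version produces no colon after the last word; if the trailing colon is wanted, piping the result through sed 's/$/:/' adds it.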