Add lines not beginning with specific regex to previous line - sed

sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"xxxxxxxxxxxxx
xxxxxxxx,
xxxxxxxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxx"
I need any line that doesn't begin with "sms;deliver;" to be appended to the previous line, i.e. to get a line like this:
sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"xxxxxxxxxxxxx
xxxxxxxx, xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx"
^ That is a single line. It would also be helpful to remove or replace any double quotes in the xxxxx (content) part.
sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"xxxxxxxxxxxxx xxxxxxxx, xxxxxxxxxxxx xxxxxxxxxxxx "xxxx"xxxxxxxx"
So the above line would get converted to this (double quotes converted to single quotes):
sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"xxxxxxxxxxxxx xxxxxxxx, xxxxxxxxxxxx xxxxxxxxxxxx 'xxxx'xxxxxxxx"

The following sed command seems to do what you need (edited: a short sed command in the beginning to filter quotes):
sed '/^sms;deliver;/!'"y/\"/'/" yourfile | sed -n '/^sms;deliver;/!b;:r;${p;b};N;/\nsms;deliver;/!{s/\n//;br};P;s/.*\n//;br'
A short explanation:
sed -n '# not print by default
/^sms;deliver;/!b # if line not starting with the pattern, goto end
:r #label r
${p;b} # if last line, print & exit
N # read new line, append to pattern space
/\nsms;deliver;/!{s/\n//;br} # if appended line doesn't start with pattern,
# remove newline & goto r
P # print everything up to the newline
s/.*\n//;br # remove what was just printed, goto r'
The first sed command only changes " to ' on lines that do not start with sms;deliver;
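For comparison, the joining step can also be sketched in awk. This is an alternative technique, not the sed command above: the function name join_sms is hypothetical, continuation lines are joined with a single space (the sed command joins them with nothing in between), and quote conversion is left out entirely.

```shell
# Hypothetical helper: append every line that does not start with
# "sms;deliver;" to the most recent line that does.
join_sms() {
  awk '
    /^sms;deliver;/ {                 # a new record starts here
      if (have) print rec             # flush the previous record, if any
      rec = $0; have = 1
      next
    }
    have { rec = rec " " $0; next }   # continuation line: glue it on with a space
    { print }                         # lines before the first record pass through
    END { if (have) print rec }       # flush the last record
  ' "$@"
}

printf '%s\n' 'sms;deliver;"+1";"first' 'second,' 'third"' 'sms;deliver;"+2";"only"' | join_sms
```

The record is only printed once the next "sms;deliver;" line (or the end of input) is seen, which is the same idea as the N/P loop in the sed version.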

This might work for you:
sed ':a;$!N;/\nsms;deliver;/!s/\n//;ta;:b;s/\(;".*\)"\([^";]*\)"\([^";]*"\)$/\1'\''\2'\''\3/;tb;P;D' file
EDIT:
Test data for the double-quote issue:
echo 'sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"xxxxxxxxxxxxx xxxxxxxx, xxxxxxxxxxxx xxxxxxxxxxxx "xxxx"xxxxxxxx"' >/tmp/a
sed ':a;$!N;/\nsms;deliver;/!s/\n//;ta;:b;s/\(;".*\)"\([^";]*\)"\([^";]*"\)$/\1'\''\2'\''\3/;tb;P;D' /tmp/a
sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"xxxxxxxxxxxxx xxxxxxxx, xxxxxxxxxxxx xxxxxxxxxxxx 'xxxx'xxxxxxxx"
sed 's/xx/"&"/g' /tmp/a >/tmp/b
sed ':a;$!N;/\nsms;deliver;/!s/\n//;ta;:b;s/\(;".*\)"\([^";]*\)"\([^";]*"\)$/\1'\''\2'\''\3/;tb;P;D' /tmp/b
sms;deliver;"+99999999999";"";"";"2012.06.23 09:21";"";"'xx''xx''xx''xx''xx''xx'x 'xx''xx''xx''xx', 'xx''xx''xx''xx''xx''xx' 'xx''xx''xx''xx''xx''xx' ''xx''xx'''xx''xx''xx''xx'"

sed output first match only between brackets

Using sed, I would like to extract the first match between square brackets.
I couldn't come up with a matching regex, since sed seems to be greedy in its regex matching. For instance, given the regex \[.*\], sed will match everything between the first opening bracket and the last closing bracket, which is not what I am after (I would appreciate your help on this).
Until I can come up with a regex for that, I have assumed that there is always a space after the closing bracket, which gives me a regex that lets me continue my work: \[[^ ]*\].
I have tried it with grep, e.g.
$ echo '++ *+ ++ + [SPAM] foo(): z.y.o ## [x.y.z]----- ' | grep -oE '\[[^ ]*\]'
[SPAM]
[x.y.z]
I would like to use the regex in sed (not in grep) and output the first match (i.e. [SPAM]). I have tried the following, but without success:
$ echo '++ *+ ++ + [SPAM] foo(): z.y.o ## [x.y.z]----- ' | sed 's/\[[^ ]*\]/\1/'
sed: 1: "s/\[[^ ]*\]/\1/": \1 not defined in the RE
$ echo '++ *+ ++ + [SPAM] foo(): z.y.o ## [x.y.z]----- ' | sed 's/\(\[[^ ]*\]\)/\1/'
++ *+ ++ + [SPAM] foo(): z.y.o ## [x.y.z]-----
I would appreciate help with:
constructing a regex to match all text between every opening and closing square bracket (see the grep example above)
using the regex in sed to output only the first occurrence of the match
You can use this sed:
s='++ *+ ++ + [SPAM] foo(): z.y.o ## [x.y.z]----- '
sed -E 's/[^[]*(\[[^]]*\]).*/\1/' <<< "$s"
[SPAM]
Here:
[^[]* match 0 or more of any non-[ character
(\[[^]]*\]) matches a [...] substring and captures in group #1
.* matches rest of the string till end
\1 in substitution puts value captured in group #1 back in output
An awk solution would be nice as well:
awk 'match($0, /\[[^]]*\]/){print substr($0, RSTART, RLENGTH)}' <<< "$s"
[SPAM]
You can use
grep -o '\[[^][]*]' <<< "$text"
sed -n 's/^[^[]*\(\[[^][]*]\).*/\1/p' <<< "$text"
Details:
grep -o '\[[^][]*]' - outputs only matching substrings that meet the pattern: [, then zero or more chars other than [ and ], and then a ] char
sed -n 's/^[^[]*\(\[[^][]*]\).*/\1/p':
-n - suppresses default line output
^[^[]*\(\[[^][]*]\).* - matches start of string, then zero or more chars other than [, then captures into Group 1 a [, then any zero or more chars other than [ and ] and then a ] char, and then matches the rest of the string
\1 - replaces the match with Group 1 value
p - prints the result of the replacement.
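As a small sketch of how -n and p work together, the sed command above can be wrapped in a function and fed a line that has no brackets at all. The function name first_bracket and the sample lines are made up for illustration.

```shell
# Hypothetical wrapper around the sed command from the answer above.
# Lines without a [...] group produce no output, because -n suppresses
# the default print and p only fires when the substitution succeeded.
first_bracket() {
  sed -n 's/^[^[]*\(\[[^][]*]\).*/\1/p'
}

printf '%s\n' 'no brackets here' 'a [one] b [two]' | first_bracket
# prints: [one]
```

Without -n and p, the first line would be echoed back unchanged, which is usually not wanted when extracting matches.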

Extract substrings between strings

I have a file with text as follows:
###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###
I want to extract all strings between ### .
My desired output would be something like this:
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
I have tried the following:
grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'
This almost works but only seems to grab the first instance per line, so the first line in my output only grabs
interest1 moreinterest1
rather than
interest1 moreinterest1
interest2
Here is a single awk command that does this by making ### the field separator and printing each even-numbered field:
awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
Here is an alternative grep + sed solution:
grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'
This assumes there are no # characters in between ### markers.
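To make that assumption concrete, here is a small sketch comparing the grep + sed pipeline with the awk field-separator approach on a line whose content contains a single #. The function names and the sample line are made up for illustration.

```shell
# Hypothetical wrappers around the two commands shown above.
extract_grep() { grep -oE '###[^#]*###' | sed -E 's/^###|###$//g'; }
extract_awk()  { awk -F '###' '{for (i=2; i<NF; i+=2) print $i}'; }

line='a ###x#y### b'

# [^#]* cannot cross the lone "#", so grep finds no match here.
printf '%s\n' "$line" | extract_grep

# awk splits only on the full "###" separator, so this prints: x#y
printf '%s\n' "$line" | extract_awk
```

If the content between markers can contain #, the awk variant is the safer of the two.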
With GNU awk for multi-char RS:
$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
You can use pcregrep:
pcregrep -o1 '###(.*?)###' file
The regex - ###(.*?)### - matches ###, then captures into Group 1 any zero or more chars other than line break chars, as few as possible, and then matches ###.
The -o1 option outputs the Group 1 value only.
sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file
This replaces the first "###" with a newline and uses D to delete everything up to it, then branches to P (print) only if a second "###" is successfully replaced.
This might work for you (GNU sed):
sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file
Replace all occurrences of ###'s by newlines.
If a line contains a newline, remove any characters before and including the first newline, print the details up to and including the following newline, delete those details and repeat.

How to extract a specific character inside a parentheses using sed command?

I want to extract the atomic symbol inside parentheses using sed.
The data I have is in the form C(X12), and I only want the X symbol.
For example, this test command:
echo "C(Br12)" | sed 's/[0-9][0-9])$//g'
gives me C(Br.
You can use
sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p'
Example:
sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p' <<< "c(Br12)"
# => Br
Details
-n - suppresses the default line output
.*(\(.*\)[0-9]\{2\})$ - a regex that matches
.* - any text
( - a ( char
\(.*\) - Capturing group 1: any text up to the last....
[0-9]\{2\} - two digits
)$ - a ) at the end of string
\1 - replaces with Group 1 value
p - prints the result of the substitution.
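The command above expects exactly two digits before the closing parenthesis. As a hedged sketch, assuming the symbol consists of ASCII letters only and the number of digits may vary, a slightly looser variant could look like this (the function name atomic_symbol is hypothetical):

```shell
# Hypothetical variant: capture only the letters after "(" and allow
# any number of digits before the final ")".
atomic_symbol() {
  sed -n 's/.*(\([A-Za-z]*\)[0-9]*)$/\1/p'
}

echo 'C(Br12)' | atomic_symbol   # prints: Br
echo 'C(H1)'   | atomic_symbol   # prints: H
```

Restricting the group to letters also keeps digits out of the captured symbol, which the greedy \(.*\) would otherwise absorb when more than two digits are present.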
For example:
echo "C(Br12)" | sed 's/C(\(.\).*/\1/'
C( - matches the literal text C(
. - matches any single character
\(.\) - matches any single character and "remembers" it in backreference \1
.* - matches everything after it, so that it is discarded
\1 - replaces the whole match with what was remembered, i.e. the first character only (B for the input above).
Research sed, regex and backreferences for more information.
Try using the following command
echo "C(BR12)" | cut -d "(" -f2 | cut -d ")" -f1 | sed 's/[0-9]*//g'
The cut commands split the input and extract the string between the parentheses. That string is then passed to sed, which removes the digits.
Not a pure sed solution, but it will get you the output.

delete string for each line with sed

My file contains an arbitrary number of lines. On each line, I would like to remove the strings that appear before and after the reference string.
The reference string and the strings to remove are separated by spaces.
The file contains:
test.user.passs
test.user.location
global.user
test.user.tel
global.pass
test.user.email string_err
#ttt...> test.user.car ->
test.user.address
è_ 788 test.user.housse
test.user.child
{kl78>&é} global.email
global.foo
test.user.foo
How can I use sed to remove the strings at the start and end of each line that contains the "test" string, where those strings are separated from it by a space or tab?
The desired result is:
test.user.passs
test.user.location
global.user
test.user.tel
global.pass
test.user.email
test.user.car
test.user.address
test.user.housse
test.user.child
{kl78>&é} global.email
global.foo
test.user.foo
I interpret your question as: find the first word that consists of "word characters and at least one dot".
Tcl:
echo '
set fh [open [lindex $argv 1] r]
while {[gets $fh line] != -1} {puts [regexp -inline {\w+(?:\.\w+)+} $line]}
' | tclsh - file
sed
sed -r 's/.*\<([[:alpha:]]+(\.[[:alpha:]]+)).*/\1/' file
perl
perl -nE '/(\w+(\.\w+)+)/ and say $1' file
Using sed, for example:
sed -r 's/^[^ ]+[ ]+([^ ]+)[ ]+[^ ]*/\1/' file
This might work for you (GNU sed):
sed -r 's/.*(test\S+).*/\1/' file
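Since \S and -r are GNU extensions, here is a sketch of the same idea using only POSIX basic regular expressions. It assumes, as the command above does, that the wanted key starts with "test" and contains no whitespace; the function name keep_test_key is hypothetical.

```shell
# Hypothetical portable variant: [^[:space:]]* stands in for \S+,
# and \( \) replace the -r style ( ) group.
keep_test_key() {
  sed 's/.*\(test[^[:space:]]*\).*/\1/'
}

printf '%s\n' '#ttt...> test.user.car ->' 'test.user.email string_err' 'global.user' | keep_test_key
# prints:
# test.user.car
# test.user.email
# global.user
```

Lines without "test" are left untouched, because the substitution simply does not match them.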

Wrap each line in a text file in apostrophes and add comma to end of lines

My actual text document contains the following lines.
san.20140226.sbc.UTM
san.201402261.UTM
san.2014022613.UTM
I want the below output:
'san.20140226.sbc.UTM',
'san.201402261.UTM',
'san.2014022613.UTM',
You could try this sed command:
sed "s/.*/'&',/g" file
Example:
$ echo 'san.20140226.sbc.UTM' | sed "s/.*/'&',/g"
'san.20140226.sbc.UTM',
OR
$ echo 'san.20140226.sbc.UTM' | sed "s/^/'/;s/$/',/"
'san.20140226.sbc.UTM',
^ matches the start of a line and $ matches the end of a line.
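In the first form, & in the replacement stands for the entire text matched by .*, which is why the line reappears between the apostrophes. A small sketch that runs both forms on several lines and shows they agree (the function names are made up for illustration):

```shell
# Hypothetical wrappers around the two commands shown above.
# \$ keeps the shell from treating $ specially inside double quotes.
quote_lines_amp()     { sed "s/.*/'&',/"; }
quote_lines_anchors() { sed "s/^/'/;s/\$/',/"; }

printf '%s\n' 'san.20140226.sbc.UTM' 'san.201402261.UTM' | quote_lines_amp
printf '%s\n' 'san.20140226.sbc.UTM' 'san.201402261.UTM' | quote_lines_anchors
# both print:
# 'san.20140226.sbc.UTM',
# 'san.201402261.UTM',
```

The g flag used earlier is harmless but not needed, since .* already matches the whole line in one go.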