Extracting first and last name from contacts with sed

So I have a few first and last names in this format:
firstname=John
lastname=Smith
adress=...
firstname=Whatever
lastname=Random
adress=...
How would I extract them in firstname-lastname format?
John-Smith

With GNU sed:
sed -n '/^firstname=/{s///;h};/^lastname=/{s///;H;x;s/\n/-/;p}' file
Output:
John-Smith
Whatever-Random
The first part (/^firstname=/{s///;h}) deletes firstname= (an empty regex in s/// reuses the last regex used, here the address) and copies the remaining value into sed's hold space.
The longer second part (/^lastname=/{s///;H;x;s/\n/-/;p}) deletes lastname= the same way and appends the remaining value to sed's hold space. The hold space now contains, e.g., "John", a newline, and "Smith". It then swaps (x) the contents of sed's hold space and pattern space, replaces (s/\n/-/) the embedded newline (\n) in the pattern space with -, and prints (p) the pattern space.
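For readability, the same GNU sed script can be spread over several lines with comments. A minimal sketch; join_names is a hypothetical wrapper name, and the sample input simply mirrors the question's format.

```shell
# Hypothetical wrapper around the one-liner above; reads files or stdin.
join_names() {
  sed -n '
    # firstname= line: delete the key (an empty regex reuses the last one)
    # and copy the remaining value into the hold space
    /^firstname=/ {
      s///
      h
    }
    # lastname= line: delete the key, append newline + value to the hold
    # space, swap hold and pattern space, join the names with "-", print
    /^lastname=/ {
      s///
      H
      x
      s/\n/-/
      p
    }
  ' "$@"
}

# Sample input in the question's format
printf '%s\n' firstname=John lastname=Smith adress=... \
  firstname=Whatever lastname=Random adress=... | join_names
# prints:
# John-Smith
# Whatever-Random
```

Because each firstname= line overwrites the hold space with h, a contact's name cannot leak into the next one.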

Related

Sed command to delete "\" which causes "*** multiple target patterns. Stop." error

In a file, I have lines like this:
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
Here you can see one extra \ (backslash) at the end of line 4 (/usr/lib/toy.c). How can I use sed to remove this \ (backslash)? Because of it I'm getting the "*** multiple target patterns. Stop." error.
P.S. - I have this extra \ (backslash) in multiple places in my file, so deleting it by line number with sed isn't feasible. I need something that checks for .lo .o and, if the line before ends with a \ (backslash), removes it.

Maybe not the simplest but this should work:
sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' input_file
The main idea is to concatenate input lines in the pattern space (a sed internal text buffer), such that it always contains 2 consecutive lines, separated by a newline character. We then just delete the last \ before a :, if any, print the first of the 2 lines and remove it from the pattern space before continuing with the next line.
sed commands are separated by semicolons (;) and grouped with curly braces ({...}). They are optionally preceded by a line(s) specification, for instance $, which stands for the last line of the input. So, in our case, ${s/\\$//;p;} applies only to the last line while the rest (N;s/\\([^\\]*:)/\1/;P;D) applies to all lines.
The -n option suppresses the default output. We need this to control the output ourselves with the p (print) command.
The -E option enables the use of extended regular expressions.
Let's first explain the tricky part: N;s/\\([^\\]*:)/\1/;P;D. It is a list of 4 commands that are run for each line of the input because there is no line(s) specification before the commands.
When sed starts processing the input the pattern space already contains the first line (a.lo a.o: abc/util.c \ in your example). This is how sed works: by default it puts the current line in the pattern space, applies the commands and restarts with the next line.
N appends the next input line (/usr/lib/def.h) to the pattern space with a newline character as separator. The pattern space now contains:
a.lo a.o: abc/util.c \
/usr/lib/def.h
N also increments the current line number which becomes 2.
s/\\([^\\]*:)/\1/ deletes the last \ before the first : in the pattern space, if there is one. In our example the only \ is after the first :. The pattern space is not modified.
P prints the first part of the pattern space, up to the first newline character. In our example what is printed is:
a.lo a.o: abc/util.c \
D deletes the first part of the pattern space, up to the first newline character (what has just been printed). The pattern space contains:
/usr/lib/def.h
D also starts a new cycle but, unlike normal sed processing, it does not read the next line and leaves the pattern space and the current line number unchanged. So when restarting, the pattern space contains line 2 and the current line number is still 2.
By induction we see that, each time sed restarts executing the list of commands, the pattern space contains the current line, as normal. When processing line number 4 of your example it contains:
/usr/lib/toy.c \
After N it contains:
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
And there, the substitution command (s/\\([^\\]*:)/\1/) matches and deletes the first \:
/usr/lib/toy.c
c.lo c.o: high/scan.c \
It is thus:
/usr/lib/toy.c
that is printed and removed from the pattern space. Exactly what you want.
The last line needs a special treatment. When we start processing it the pattern space contains:
high/scan_f.c
If we don't do anything special N does not change it (there is no next line to concatenate) and terminates the processing. The last line is never printed.
This is why another list of commands is needed, just for the last line: ${s/\\$//;p;}. It applies only to the last line because it is preceded by a line(s) specification ($ for last line). The first command in the list (substitute s/\\$//) removes a trailing \, if there is one. The second (p) prints the pattern space.
Note: if you know that the last line does not end with a trailing backslash you can simplify a bit:
sed -nE '$p;N;s/\\([^\\]*:)/\1/;P;D' input_file
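The main one-liner can also be written one command per line, which makes the walkthrough above easier to follow. A sketch assuming GNU sed (for -E); strip_stray_backslash is a hypothetical wrapper name so the script can be run on files or stdin.

```shell
# Hypothetical wrapper around the answer's script (GNU sed assumed for -E)
strip_stray_backslash() {
  sed -nE '
    # Last line: remove a trailing backslash, if any, and print it
    $ {
      s/\\$//
      p
    }
    # Append the next line: the pattern space now holds two lines
    N
    # Delete a backslash that is followed by a colon with no other
    # backslash in between (the newline may lie between them)
    s/\\([^\\]*:)/\1/
    # Print the first line, delete it, restart with the second one
    P
    D
  ' "$@"
}

# The question's sample input (the stray backslash follows toy.c)
printf '%s\n' \
  'a.lo a.o: abc/util.c \' \
  '/usr/lib/def.h' \
  'b.lo b.o: hash/imp.h \' \
  '/usr/lib/toy.c \' \
  'c.lo c.o: high/scan.c \' \
  'high/scan_f.c' | strip_stray_backslash
```

Only the backslash is removed, so the space in front of it stays: the fixed line is "/usr/lib/toy.c " with a trailing space, which make ignores.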

I agree with @G.M. in general, but this will work.
sed captures the text before a trailing "\" on lines that end with a space followed by "\" and prints only that text on those lines, i.e. it removes the trailing backslash from every such line. All other lines are printed unchanged.
sed -e 's/\(.* \)\\$/\1/' input_file
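To see what this does on the question's input, here is a quick sketch (strip_trailing_backslashes is a hypothetical wrapper name). As written, the substitution applies to every line ending in a space and a backslash, so it also removes the continuation backslashes that the other rules still need, not only the stray one.

```shell
# Hypothetical wrapper around the answer's command
strip_trailing_backslashes() {
  sed -e 's/\(.* \)\\$/\1/' "$@"
}

printf '%s\n' 'a.lo a.o: abc/util.c \' '/usr/lib/def.h' \
  'b.lo b.o: hash/imp.h \' '/usr/lib/toy.c \' | strip_trailing_backslashes
# every line that ended in " \" now ends in a plain space
```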

The question is a bit unclear about how to identify the lines from which a trailing backslash should be removed, but inasmuch as the input looks like set of a makefile-format prerequisite lists from which some lines have been removed, I take the objective to be to remove backslashes where they appear after the last (remaining) prerequisite in a list. That requires looking ahead to the next line, so it will be helpful to make use of sed's hold space to store data while you look ahead at the next line to figure out what to do with it.
This would be a pretty robust solution for that problem:
sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' input
That builds up each prerequisite list in the hold space, with backslashes and newlines embedded, then dumps it when the next target list or the end of the input arrives.
Details:
the -n option turns off automatically printing the pattern space after each line
the -E option turns on extended regular expressions
the sed expression contains several sub-expressions, joined by semicolons:
s/\s*(\\){0,1}$/ \\/ : ensure that the current line in the pattern space ends with a space and backslash, without adding a second backslash to lines that already have one
:a : labels that point in the script as 'a'
/:/ { x; s/\s*\\$//; p; d; } : on lines that contain a colon, swap the pattern and hold spaces, remove the trailing backslash from (the new contents of) the pattern space, print the result, then start the next cycle
H : (if control reaches this point) append a newline and the contents of the pattern space to the hold space
$ { s/.*/:/; b a } : on the last line of input trigger dumping the hold space by putting a colon in the pattern space and jumping to label 'a'
[end of expression] : read the next line into the pattern space and start over
Alternatively, it would more exactly follow your request, and avoid introducing a leading blank line, to do this:
sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\$//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' input
That also assembles pieces in the hold space before ultimately printing them, but it goes about it in a different way:
it starts (at label a) by checking whether the line in the pattern space ends with a backslash. If not (/\\$/!), then it prints the pattern space and starts the next cycle.
otherwise, it replaces the current contents of the hold space with the contents of the pattern space (which must already end with a backslash), then
(at label b) if the current line is the last then it retrieves the contents of the hold space, strips the trailing backslash, and prints the result ($ { x; s/\\$//; p; }). Either way,
it attempts to read the next input line, and terminates if there are no more (n).
if that results in the pattern space containing a colon within, then the contents of the hold space are printed, less trailing backslash, and control is sent back to label a to process the colon-containing line as a new first line (/:/ { x; s/\\$//; p; x; b a; }).
otherwise, a newline and the contents of the pattern space are appended to the hold space (H).
if the pattern space ends with a backslash then control branches back to label b to consider reading another line (/\\$/ b b).
otherwise, the hold space is printed and cleared (s/.*//; x; p), and
if there are any more lines then the next is read and a new cycle started.
That makes fewer assumptions about the nature of the input, but it is a bit more complicated.
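A quick way to try the alternative command on the question's data; join_continuations is a hypothetical wrapper name, and GNU sed is assumed. In the last-line branch this sketch uses the anchored s/\\$//, so only the trailing backslash of the accumulated text is removed (an unanchored s/\\// would remove the first backslash, which may belong to an earlier line).

```shell
# Hypothetical wrapper; the last-line branch uses the anchored s/\\$//
join_continuations() {
  sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\$//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' "$@"
}

printf '%s\n' \
  'a.lo a.o: abc/util.c \' \
  '/usr/lib/def.h' \
  'b.lo b.o: hash/imp.h \' \
  '/usr/lib/toy.c \' \
  'c.lo c.o: high/scan.c \' \
  'high/scan_f.c' | join_continuations
```

On this input the output matches the first answer's: only the backslash after /usr/lib/toy.c is dropped.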

Append specific character at the end of each line

I have a file and I want to append a specific text, \0A, to the end of each of its lines.
I used this command,
sed -i s/$/\0A/ file.txt
but that didn't work with backslash \0A.

In its default operation, sed cyclically reads a line of input, minus its terminating <newline> character, into its pattern space.
The OP wants to use sed to append the character \0A at the end of each line. 0A is the hexadecimal code of the <newline> character (cf. http://www.asciitable.com/). So, read this way, the OP is trying to double-space a file. This can easily be done with:
sed G file
The G command appends a newline followed by the contents of the hold space to the pattern space. Since nothing is ever copied into the hold space here, it stays empty, so G just appends a newline character to the pattern space. sed's default action is to print the pattern space, so this double-spaces the file.
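A minimal sketch of the double-spacing reading, run on stdin instead of a file; double_space is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the answer's command
double_space() {
  # G appends a newline plus the hold space; the hold space is never
  # filled here, so every line just gets one extra newline
  sed G "$@"
}

printf 'one\ntwo\n' | double_space   # one, empty line, two, empty line
```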

Your command should be fixed by simply enclosing s/$/\0A/ in single quotes (') and escaping the backslash (with another backslash):
sed -i 's/$/\\0A/' file.txt
Notice that the surrounding 's protect the string from being processed by the shell, but the backslash still needs escaping to protect it from sed itself.
Obviously, it's still possible to avoid the single quotes if you escape enough:
sed -i s/$/\\\\0A/ file.txt
In this case there are no single quotes to protect the string, so we need to write \\ in the shell to feed sed a single \, and we need two of those, i.e. \\\\, so that sed receives \\, which is an escaped \.
That said, I'd never suggest the second alternative.
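A minimal sketch of both forms on stdin, so the result can be checked without -i; append_literal_0A is a hypothetical wrapper name.

```shell
# Hypothetical wrapper; -i is left out so the effect shows on stdout
append_literal_0A() {
  # Single quotes keep the shell from touching the backslashes;
  # \\ is how sed spells one literal backslash in the replacement
  sed 's/$/\\0A/' "$@"
}

printf 'foo\nbar\n' | append_literal_0A   # foo\0A and bar\0A
printf 'foo\n' | sed s/$/\\\\0A/          # unquoted form, also foo\0A
```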

sed pattern matching, stop at first found character

Take the string "hello_world 1 2 3"
I want the output to be "hello_world"
My attempt is "s/\(.*\) .*/\1/g"
But I get "hello_world 1 2"
Instead of stopping at the first space after the sequence, it gets the last space on the line.
I want to take any length of characters \(.*\) followed by a space ' ' and remove anything that comes after it .*
How can I do it?

Could you please try the following:
echo "hello_world 1 2 3" | sed 's/\([^ ]*\).*/\1/'
Explanation:
This uses sed's ability to store the text matched by a \(...\) group, which can later be referenced as \1, \2 and so on (depending on how many groups the regex defines).
Here we capture everything up to the first space into the first group and then match the rest of the line with .*. In the replacement, \1 replaces the whole line with the contents of the first group (hello_world).
Why the OP's code does not work: .* is greedy, so \(.*\) captures as much of the line as it can while still leaving a space for the rest of the regex to match, i.e. everything up to the last space. That is why \1 yields hello_world 1 2.
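The difference between the two regexes can be checked side by side. A small sketch; first_field is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the answer's command
first_field() {
  # [^ ]* cannot match a space, so \1 holds only the leading run of
  # non-space characters; the trailing .* swallows the rest of the line
  sed 's/\([^ ]*\).*/\1/' "$@"
}

echo "hello_world 1 2 3" | first_field              # hello_world
# The greedy version from the question, for comparison:
echo "hello_world 1 2 3" | sed 's/\(.*\) .*/\1/'    # hello_world 1 2
```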

This might work for you (GNU sed):
sed 's/\s.*//' file
Matches the first whitespace character and everything after it and removes them, leaving whatever comes before it, i.e. the leading non-whitespace characters.
Essentially the same as (for lines that do not start with whitespace):
sed -E 's/^(\S+).*/\1/' file
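On the question's string both commands print hello_world; they differ when a line starts with whitespace, because \s then matches at the very beginning while ^(\S+) does not match at all. A small sketch assuming GNU sed (\s and \S are GNU extensions); the function names are hypothetical.

```shell
# Hypothetical wrappers around the two commands (GNU sed)
cut_at_space()    { sed 's/\s.*//' "$@"; }
keep_first_word() { sed -E 's/^(\S+).*/\1/' "$@"; }

echo "hello_world 1 2 3" | cut_at_space      # hello_world
echo "hello_world 1 2 3" | keep_first_word   # hello_world
echo "  indented 1"      | cut_at_space      # empty line
echo "  indented 1"      | keep_first_word   # unchanged: "  indented 1"
```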

gsed add _single_ newline at end of file

I'm attempting to create a single newline at the end of a file.
My command is this:
gsed -i '$a\\r' outfiles/*.txt
Somehow this creates two newlines, and I cannot figure out what I am doing wrong.
Any thoughts?

My first thought was to substitute the end of the last line with a newline:
sed '$s/$/\n/'
But on second thought, this is nicer:
sed '$G'
G appends a newline to the pattern space and then appends the hold space to the pattern space. Because the hold space is empty, this effectively adds only the newline.
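Both variants can be compared on stdin; add_trailing_newline is a hypothetical wrapper name, and GNU sed is assumed (for \n in the replacement). When GNU sed is run with -i on several files, as with outfiles/*.txt, -i implies -s, so $ refers to the last line of each file.

```shell
# Hypothetical wrapper around the answer's preferred command
add_trailing_newline() {
  # $G: on the last line only, append a newline plus the (empty) hold space
  sed '$G' "$@"
}

printf 'a\nb\n' | add_trailing_newline   # a, b, then one empty line
printf 'a\nb\n' | sed '$s/$/\n/'         # same result via substitution
```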

Keep it clear and simple, just use gawk:
gawk -i inplace 'ENDFILE{print ""}' outfiles/*.txt

Can I use the sed command to replace multiple empty line with one empty line?

I know there is a similar question on SO (How can I replace multiple empty lines with a single empty line in bash?), but my question is: can this be done using only the sed command?
Thanks

Give this a try:
sed '/^$/N;/^\n$/D' inputfile
Explanation:
/^$/N - match an empty line and append the next input line to the pattern space, separated by a newline.
; - command delimiter, allows multiple commands on one line; can be used instead of separating commands into multiple -e clauses in versions of sed that support it.
/^\n$/D - if the pattern space contains only a newline (i.e. the empty line was followed by another empty line), delete the beginning of the pattern space up to and including that newline and restart the cycle without reading a new line. More generally, D deletes up to and including the first embedded newline.
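A small sketch of the command on a run of blank lines; squeeze_blank_lines is a hypothetical wrapper name, and GNU sed is assumed.

```shell
# Hypothetical wrapper around the answer's command
squeeze_blank_lines() {
  # /^$/N   : on an empty line, append the next line to the pattern space
  # /^\n$/D : if that line was empty too, drop the first empty line and
  #           rerun the script on the rest without reading new input
  sed '/^$/N;/^\n$/D' "$@"
}

printf 'a\n\n\n\nb\n' | squeeze_blank_lines   # a, one empty line, b
```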

You can do this by removing empty lines first and then appending an empty line after each remaining line with the G command:
sed '/^$/d;G' text.txt
Edit2: the above command adds an empty line after every line. If this is not desired, you could do:
sed -n '1{/^$/p};{/./,/^$/p}'
The first expression (1{/^$/p}) only looks at the first line and prints it if it is blank.
Or, if you don't mind all leading empty lines being stripped, it can be written as:
sed -n '/./,/^$/p'
Here: the -n option suppresses automatic printing of the pattern space, /./,/^$/ defines a range from a line containing at least one character to the next empty line, and p prints that range.
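A small sketch of the range-based version, including a leading run of blank lines; squeeze_blank_lines_range is a hypothetical wrapper name, and GNU sed is assumed.

```shell
# Hypothetical wrapper around the answer's command
squeeze_blank_lines_range() {
  # 1{/^$/p}  : print line 1 if it is empty (keeps one leading blank line)
  # /./,/^$/p : print from each non-empty line through the next empty line
  sed -n '1{/^$/p};{/./,/^$/p}' "$@"
}

printf '\n\na\n\n\nb\n' | squeeze_blank_lines_range   # empty, a, empty, b
```

Blank lines after the first one in a run fall outside every range, which is why they are dropped.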
