Concatenate Lines in Bash - sed

Most command-line programs just operate on one line at a time.
Can I use a common command-line utility (echo, sed, awk, etc) to concatenate every set of two lines, or would I need to write a script/program from scratch to do this?
$ cat myFile
line 1
line 2
line 3
line 4
$ cat myFile | __somecommand__
line 1line 2
line 3line 4

sed 'N;s/\n/ /;'
Grab next line, and substitute newline character with space.
seq 1 6 | sed 'N;s/\n/ /;'
1 2
3 4
5 6
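One caveat with a bare N: POSIX says sed quits without printing when N finds no next line, so an input with an odd number of lines can lose its last line on some implementations (GNU sed prints it anyway). A minimal sketch that guards N with $!; the function name join_pairs_sed is only for illustration:

```shell
# join_pairs_sed is a hypothetical helper name, not part of the original answer.
join_pairs_sed() {
  # $!N: append the next line unless we are already on the last one,
  # then turn the embedded newline into a space.
  sed '$!N;s/\n/ /'
}

# Five lines: the unpaired last line should still be printed.
printf '%s\n' 1 2 3 4 5 | join_pairs_sed
```

Replace the space in the substitution with nothing (s/\n//) to get the separator-free output shown in the question.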

$ awk 'ORS=(NR%2)?" ":"\n"' file
line 1 line 2
line 3 line 4
$ paste - - < file
line 1 line 2
line 3 line 4
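The question's expected output has no separator between the joined lines, whereas paste inserts a tab by default. A hedged sketch: POSIX defines \0 in the -d list as an empty string, so the following should join pairs with nothing in between (join_pairs_paste is an illustrative name):

```shell
# join_pairs_paste is a hypothetical helper name.
join_pairs_paste() {
  # "- -" makes paste take two consecutive stdin lines per output row;
  # -d '\0' asks for an empty delimiter instead of the default tab.
  paste -d '\0' - -
}

printf 'line 1\nline 2\nline 3\nline 4\n' | join_pairs_paste
```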

Not a particular command, but this snippet of shell should do the trick:
cat myFile | while read line; do echo -n $line; [ "${i}" ] && echo && i= || i=1 ; done

You can also use Perl as:
$ perl -pe 'chomp;$i++;unless($i%2){$_.="\n"};' < file
line 1line 2
line 3line 4

Here's a shell script version that doesn't need to toggle a flag:
while read line1; do read line2; echo $line1$line2; done < inputfile
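The loop above leaves its variables unquoted, so the shell word-splits and glob-expands them, and read without -r interprets backslashes. A more defensive sketch of the same idea, assuming a POSIX shell (the function name is illustrative):

```shell
# join_pairs_loop is a hypothetical helper name.
join_pairs_loop() {
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  while IFS= read -r first; do
    second=''
    # On an odd-length input the second read hits EOF; ignore that failure.
    IFS= read -r second || true
    printf '%s%s\n' "$first" "$second"
  done
}

printf 'line 1\nline 2\nline 3\n' | join_pairs_loop
```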

xargs and sed to extract specific lines

I want to extract lines that have a particular pattern, in a certain column. For example, in my 'input.txt' file, I have many columns. I want to search the 25th column for 'foobar', and extract only those lines that have 'foobar' in the 25th column. I cannot do:
grep foobar input.txt
because other columns may also have 'foobar', and I don't want those lines. Also:
the 25th column will have 'foobar' as part of a string (i.e. it could be 'foobar ; muller' or 'max ; foobar ; john', or 'tom ; foobar35')
I would NOT want 'tom ; foobar35'
The word in column 25 must be an exact match for 'foobar', but it may be surrounded by other ';'-separated entries, so using awk $25=='foobar' is not an option.
In other words, if column 25 had the following lines:
foobar ; muller
max ; foobar ; john
tom ; foobar35
I would want only lines 1 & 2.
How do I use xargs and sed to extract these lines? I am stuck at:
cut -f25 input.txt | grep -nw foobar | xargs -I linenumbers sed ???
thanks!
Do not use xargs and sed, use the other tool common on so many machines and do this:
awk '{if($25=="foobar"){print NR" "$0}}' input.txt
print NR prints the line number of the current match so the first column of the output will be the line number.
print $0 prints the current line. Change it to print $25 if you only want the matching column. If you only want the output, use this:
awk '{if($25=="foobar"){print $0}}' input.txt
EDIT1 to match extended question:
Use what @shellter and @Jotne suggested but add string delimiters.
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' '$25~/foobar/' input.txt
[^ ]* matches all characters that are not a space.
'[^']*' matches everything inside single quotes.
EDIT2 to exclude everything but foobar:
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$25~/[;' ]foobar[;' ]/" input.txt
[;' ] only allows ;, ' and a space directly before and after foobar.
Tested with this file:
1 "1 ; 1" 4
2 'kom foobar' 33
3 "ll;3" 3
4 '1; foobar' asd
7 '5 ;foobar' 2
7 '5;foobar' 0
2 'kom foobar35' 33
2 'kom ; foobar' 33
2 'foobar ; john' 33
2 'foobar;paul' 33
2 'foobar1;paul' 33
2 'foobarli;paul' 33
2 'afoobar;paul' 33
and this command awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$2~/[;' ]foobar[;' ]/" input.txt
To get the lines where the 25th field is exactly foobar:
awk '$25=="foobar"' input.txt
$25 25th field
== equal to
"foobar" the string to compare against
Since no action is specified, the default action is to print the complete line, the same as {print $0}.
Or
awk '$25~/^foobar$/' input.txt
This might work for you (GNU sed):
sed -En 's/\S+/\n&\n/25;s/\n(.*foobar.*)\n/\1/p' file
Surround the 25th field by newlines and pattern match for foobar between newlines.
If you only want to match the word foobar use:
sed -En 's/\S+/\n&\n/25;s/\n(.*\<foobar\>.*)\n/\1/p' file
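Since the question uses cut -f25, the columns are presumably tab-separated, and the entries inside a column are separated by " ; ". Under those two assumptions, a portable awk sketch can split the column on semicolons and compare each entry exactly (match_col and its two parameters are hypothetical names):

```shell
# match_col COLUMN WORD: print stdin lines whose tab-separated COLUMN
# contains WORD as one complete ';'-separated entry.
match_col() {
  awk -F '\t' -v col="$1" -v word="$2" '
    {
      # Split the chosen column on ";" with optional surrounding spaces.
      n = split($col, parts, /[ ]*;[ ]*/)
      for (i = 1; i <= n; i++)
        if (parts[i] == word) { print; next }
    }'
}

# Two-column demo; with the real data you would call: match_col 25 foobar < input.txt
printf 'a\tfoobar ; muller\nb\tmax ; foobar ; john\nc\ttom ; foobar35\n' | match_col 2 foobar
```

Because the comparison is a plain string equality, 'foobar35' is rejected without any regexp anchoring.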

sed - Remove previous line and current line based on pattern

I want to delete both the previous line and the current line when the current line matches a pattern.
This is my sample.txt
This is test line 11
This is test line 999
This is test line 12
This is test line 13
This is test line 16
This is test line 999
This is test line 17
This is test line 18
I want to match the pattern 999 and delete both the matching line and the line before it.
I am trying this command but I get no output:
sed -Ene ':a;N;/999/{d;}; ba; P' sample.txt
This might work for you (GNU sed):
sed 'N;/\n.*999/d;P;D' file
Open a running window of two lines throughout the length of the file.
If the second line of the window contains 999 delete both lines.
Otherwise, print the first line of the window, delete the first line and repeat.
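A runnable sketch of that two-line window on the question's sample. It uses $!N rather than plain N, a small adjustment so that the last line is still printed by sed implementations that follow POSIX strictly; the function name is illustrative:

```shell
# drop_match_and_prev is a hypothetical helper name.
drop_match_and_prev() {
  # $!N            append the next line (unless on the last line)
  # /\n.*999/d     second line of the window matches: drop both lines
  # P;D            otherwise print the first line and slide the window
  sed '$!N;/\n.*999/d;P;D'
}

# printf repeats its format once per argument, rebuilding sample.txt.
printf 'This is test line %s\n' 11 999 12 13 16 999 17 18 | drop_match_and_prev
```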
An alternative that also copes with 999 on line 1 and with two or more contiguous lines containing 999:
sed -n ':a;$!N;/\n.*999/{:b;n;/999/bb;ba};/999/!P;D' file
Tried on GNU sed:
sed -Ez 's/[^\n]*\n[^\n]*999\n//g' sample.txt
Could you please try the following (if you are OK with awk):
awk 'prev && $NF!=999{print prev ORS FNR,$0;prev="";next} $NF==999{prev=""} $NF!=999{prev=FNR FS $0}' Input_file
Or, if you have an odd number of lines and want to make sure the last unpaired line is still printed:
awk 'prev && $NF!=999{print prev ORS FNR,$0;prev="";next} $NF==999{prev=""} $NF!=999{prev=FNR FS $0} END{if(prev){print prev}}' Input_file
With a more comprehensive sample input of:
$ cat file
This is test line 999
This is test line 11
This is test line 999
This is test line 12
This is test line 13
This is test line 999
This is test line 999
This is test line 999
This is test line 14
This is test line 15
This is test line 16
This is test line 999
This is test line 17
This is test line 18
Try this:
$ cat tst.awk
$NF == 999 {
    prev = ""
    next
}
{
    printf "%s", prev
    prev = $0 ORS
}
END {
    printf "%s", prev
}
$ awk -f tst.awk file
This is test line 12
This is test line 14
This is test line 15
This is test line 17
This is test line 18
or if you favor brevity over clarity:
$ awk '$NF==999{p="";next} {printf "%s",p; p=$0 ORS} END{printf "%s",p}' file
This is test line 12
This is test line 14
This is test line 15
This is test line 17
This is test line 18
Notice that the above will work even if some part of your line other than the last field contained 999, or if the last field was 9999 instead of your target 999. It doesn't require 999 to be written/tested multiple times in the script. If you wanted to test, say, the 3rd field in the line instead of the last field, you could just change $NF to $3. If you WANTED to test the whole line for a regexp, you'd just change $NF==999 to /999/. It'll work even if your target string contains regexp metacharacters, and it will work in any awk in any shell on any UNIX box.
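As a sketch of how little needs to change to make the target configurable, the value can be passed in with -v instead of being written into the script. The function name and the tgt variable are assumptions made for this example:

```shell
# drop_target_and_prev VALUE: drop lines whose last field equals VALUE,
# together with the line directly before each of them.
drop_target_and_prev() {
  awk -v tgt="$1" '
    $NF == tgt { prev = ""; next }        # match: forget the held line, skip this one
    { printf "%s", prev; prev = $0 ORS }  # no match: emit the held line, hold this one
    END { printf "%s", prev }             # emit whatever is still held
  '
}

printf 'This is test line %s\n' 999 11 999 12 13 999 999 999 14 15 16 999 17 18 |
  drop_target_and_prev 999
```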
For a sed [1] solution that handles all edge cases (999 in the first 2 rows or consecutive rows of 999):
sed '
1{
/999/d # Special case needed for line 1. Delete if it contains 999.
}
$!N # Append next line. $!N stops exit w/o printing at EOF.
/999/d # If pattern space contains 999, d & begin next cycle.
P # If we get to here, there is no 999. Print to first newline.
D # Delete to first newline.
' FILE
Output:
This is test line 12
This is test line 13
This is test line 17
This is test line 18
[1] Tested on both BSD (Mac OS X) & GNU sed.

Delete \n characters from line range in text file

Let's say we have a text file with 1000 lines.
How can we delete new line characters from line 20 to 500 (replace them with space for example)?
My try:
sed '20,500p; N; s/\n/ /;' #better not to say anything
All other lines (1-19 && 501-1000) should be preserved as-is.
I'm familiar with sed; awk or perl solutions are welcome too, but please give an explanation with them as I'm a perl and awk newbie.
You could use something like this (my example is on a slightly smaller scale :-)
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '{printf "%s%s", $0, (2<=NR&&NR<=5?FS:RS)}' file
1
2 3 4 5 6
7
8
9
10
The second %s in the printf format specifier is replaced by either the Field Separator (a space by default) or the Record Separator (a newline) depending on whether the Record Number is within the range.
Alternatively:
$ awk '{ORS=(2<=NR&&NR<=5?FS:RS)}1' file
1
2 3 4 5 6
7
8
9
10
Change the Output Record Separator depending on the line number and print every line.
You can pass variables for the start and end if you want, using awk -v start=2 -v end=5 '...'
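Spelled out, that could look like the sketch below. The wrapper function join_range and the variable names start and end are illustrative; the awk logic is the ORS variant from above with the separators written literally:

```shell
# join_range START END: replace the newline at the end of lines START..END
# with a space; every other line is passed through unchanged.
join_range() {
  awk -v start="$1" -v end="$2" '
    { ORS = (start <= NR && NR <= end ? " " : "\n") }  # pick the line terminator
    1                                                  # print every record
  '
}

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | join_range 2 5
```

For the file in the question the call would be join_range 20 500 < file.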
This might work for you (GNU sed):
sed -r '20,500{N;s/^(.*)(\n)/\2\1 /;D}' file
or perhaps more readably:
sed ':a;20,500{N;s/\n/ /;ta}' file
Using a perl one-liner to strip the newline:
perl -i -pe 'chomp if 20..500' file
Or to replace it with a space:
perl -i -pe 's/\R/ / if 20..500' file
Explanation:
Switches:
-i: Edit <> files in place (makes backup if extension supplied)
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
chomp: Remove new line
20..500: The range operator .. is true while the current line number ($.) is between 20 and 500, so the statement only runs for those lines
Here's a perl version:
my $min = 5; my $max = 10;
while (<DATA>) {
    if ($. > $min && $. < $max) {
        chomp;
        $_ .= " ";
    }
    print;
}
__DATA__
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Output:
1
2
3
4
5
6 7 8 9 10
11
12
13
14
15
It reads in DATA (which you can set to being a filehandle or whatever your application requires), and checks the line number ($.). While the line number is strictly between $min and $max, the line ending is chomped off and a space added to the end of the line; otherwise, the line is printed as-is.

Get multi-line text in between horizontal delimiter with sed / awk

I would like to get multi-line text in between horizontal delimiter and ignore anything else before and after the delimiter.
An example would be:-
Some text here before any delimiter
----------
Line 1
Line 2
Line 3
Line 4
----------
Line 1
Line 2
Line 3
Line 4
----------
Some text here after last delimiter
And I would like to get
Line 1
Line 2
Line 3
Line 4
Line 1
Line 2
Line 3
Line 4
How do I do this with awk / sed with regex? Thanks.
You can try this.
file: a.awk:
BEGIN { RS = "-+" }
{
    if ( NR > 1 && RT != "" )
    {
        print $0
    }
}
run: awk -f a.awk data_file
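Note that RT is a GNU awk extension, so the script above needs gawk. A sketch that should work in any POSIX awk instead buffers the lines seen since the last delimiter and only prints them once a closing delimiter arrives; it assumes a delimiter is a line made up entirely of hyphens, and the function name is illustrative:

```shell
# between_rules is a hypothetical helper name.
between_rules() {
  awk '
    /^-+$/ {
      if (seen) printf "%s", buf   # this delimiter closes a block: flush it
      seen = 1                     # from now on we are inside the delimited area
      buf = ""
      next
    }
    seen { buf = buf $0 "\n" }     # hold lines until we know a delimiter follows
  '
}

printf '%s\n' 'Some text here before any delimiter' '----------' \
  'Line 1' 'Line 2' '----------' 'Line 3' 'Line 4' '----------' \
  'Some text here after last delimiter' | between_rules
```

Text after the last delimiter is buffered but never flushed, which is what drops it from the output.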
If you can comfortably fit the entire file into memory, and if Perl is acceptable instead of awk or sed,
perl -0777 -pe 's/\A.*?\n-{10}\n//s;
s/(.*\n)-{10}\n.*?\Z/\1/s;
s/\n-{10}\n/\n\n\n/g' file >newfile
The main FAQs here are the -0777 option (slurp mode) and the /s (dot matches newlines) regex flag.
This might work for you:
sed '1,/^--*$/d;:a;$!{/\(^\|\n\)--*$/!N;//!ba;s///p};d' file

sed + awk + verify line in file

I have the following example file
/etc/sysconfig/network/script.sh = -exe $Builder
run_installation 123 44 556 4 = run_installation arg1 arg2 arg3 948
EXE=somthing
EXE somthing
I have three questions (I write bash script)
1. How can I verify with sed or awk that the string "-exe" exists after the "=" character?
2. How can I verify with sed or awk that the string run_installation appears both as the first word of the line and after the "=" character, as in the example file?
3. The string EXE in the file can appear as "EXE" or as "EXE="; how can I delete either form with sed?
I do:
sed s'/EXE//g' | sed s'/EXE=//g'
but it's not a nice way to do it in my bash script
• I need three different answers!
Lidia
You did not give further criteria on what to do if conditions 1 and 2 are not found...
awk '/=.*-exe/{f=1;}
/^run_installation.*=.*run_installation/{g=1}
/^EXE/{ gsub(/EXE=|EXE/,"") }
f && g{ print "ok" ;exit }
' file
The above code checks for condition 1 and condition 2 and prints "ok" when both are found. The substitution of EXE for condition 3 is added for illustration purposes. State more clearly what you want to do and show your expected output next time.
To verify them separately,
awk '/= -exe/{print "found"}' file
awk '/^run_installation.*=.*run_installation/{print "found"}' file
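If the checks are meant to drive an if in a bash script, an exit status is often handier than printed text. The sketch below gives one small function per question, using grep -q for the two checks and a single sed for the deletion. The function names are made up for this example. It also sidesteps a problem in the two-step sed pipeline from the question: the first sed already removes EXE, so the second never finds EXE= and a stray "=" is left behind.

```shell
# All three function names are illustrative, not from the original post.

# 1. Succeeds if some line has "-exe" anywhere after an "=".
has_exe_after_eq() { grep -q -e '=.*-exe'; }

# 2. Succeeds if a line starts with run_installation and repeats it after "=".
has_run_installation_twice() { grep -q -e '^run_installation.*=.*run_installation'; }

# 3. Removes "EXE=" or "EXE" in one pass; \{0,1\} makes the "=" optional.
strip_exe() { sed 's/EXE=\{0,1\}//g'; }

printf 'EXE=somthing\nEXE somthing\n' | strip_exe
```

In a script the checks would read, for example, if has_exe_after_eq < file; then ...; fi.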