Sed/Awk - remove blank spaces / join lines in LDIF dump

I have some entries in my LDIF file that break the dump for the next import: long values are folded onto continuation lines, each starting with a single space.
sambaPasswordHistory: 712BC301C488FD2651BEF5AA11899950547B9ED3C059FF83CE39049B
 BAEECB31692629A94A3C1F4737E3EA854C001704793DB9A67EB977563CE601DF98E7E23C2851F
 082D3D695C8655378629DCCDAF125ACA63141B361190ABC750AF403FDEF000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 00000000000000000000000000000000000000000000000000000000000000000000000000000
 000000000000000000000000000000000000000000000000000000000
homeDirectory: /home_nfs/
How can I use sed/awk/etc. to change it to
sambaPasswordHistory: 712BC301C488FD2651BEF5AA11899950547B9ED3C059FF83CE39049BBAEECB31692629A94A3C1F4737E3EA854C001704793DB9A67EB977563CE601DF98E7E23C2851F082D3D695C8655378629DCCDAF125ACA63141B361190ABC750AF403FDEF000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
homeDirectory: /home_nfs/
In other words, keep each attribute's value on a single line.

One way using GNU sed:
sed -n 'H; ${ x; s/\n//; s/\n //g; p}' file.txt
Result:
sambaPasswordHistory: 712BC301C488FD2651BEF5AA11899950547B9ED3C059FF83CE39049BBAEECB31692629A94A3C1F4737E3EA854C001704793DB9A67EB977563CE601DF98E7E23C2851F082D3D695C8655378629DCCDAF125ACA63141B361190ABC750AF403FDEF000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
homeDirectory: /home_nfs/

$ cat file
sambaPasswordHistory: abc
 def
 12345
 67
homeDirectory: /home_nfs/
$
$ awk 'NR>1 && !sub(/^ /,""){print s; s=""} {s = s $0} END{print s}' file
sambaPasswordHistory: abcdef1234567
homeDirectory: /home_nfs/

This might work for you (GNU sed):
sed ':a;N;s/\n //;ta;P;D' file
Open a window of two lines. Remove a newline followed by a space and repeat until the substitution fails. Finally, print the first line (P) and, if there is still a second line in the pattern space, delete the first (D) and repeat.
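A quick way to check this loop is to run it on a small made-up sample. The sketch below assumes GNU sed; the attribute values and the function name unfold are invented for illustration and are not part of the original dump.

```shell
# Hypothetical helper: join LDIF continuation lines (lines starting with one space).
unfold() {
    sed ':a;N;s/\n //;ta;P;D'
}

# Made-up sample: two folded attributes followed by an unfolded one.
sample=$(printf '%s\n' 'cn: abc' ' def' 'sn: 123' ' 45' ' 67' 'homeDirectory: /home_nfs/')

result=$(printf '%s\n' "$sample" | unfold)
printf '%s\n' "$result"
```

Each folded value should come out on one line, with the unfolded homeDirectory line left untouched.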

One way to do using sed:
sed ':a;$!N;s/\n //;ta' file
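sed appends the next line (N) on every line except the last ($!). After joining, a newline followed by a space (\n ) is removed. ta branches back to label a until the substitution fails. This handles the sample, but once the substitution fails both lines in the pattern space are printed, so a folded value that starts on the second of those lines is not joined; appending ;P;D to the script avoids that.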

If the only occurrences of "\n " (a newline followed by a space) are where the lines need to be joined, you could use bbe like this:
<file bbe -e 's/\n //'

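Another solution (tailored to the sample: every line not containing "home" is concatenated, and the last line is printed on its own):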
awk 'ORS="";!/home/{$1=$1; print}{RS="\n"}END{print "\n" $0 "\n"}' file

Related

Selecting records on base of first character

I have a file which contains the following records
+aaaa
+bbbb
cccc-123
-dddd
eeee+789
-fff+456
ggg
Now I want to keep only the records if the first character is a "+" or "-" sign
so the (new) file should look like this
+aaaa
+bbbb
-dddd
-fff+456
Can this be done with a grep or sed command?
Try this:
grep '^[+-]' myfile.txt
The brackets must not be escaped: grep '^\[+-\]' would match lines beginning with the literal text "[+-]" instead.
This might work for you (GNU sed):
sed '/^[+-]/!d' file
or:
sed -n '/^[+-]/p' file
In the first solution: if the first character of the line is not + or - delete the line.
In the second solution: if the first character of the line is + or - print it.
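The filters above can be compared on the records from the question. This is only a sketch: the variable names are made up, and the awk line is an extra equivalent that the answers do not mention. In the bracket expression the hyphen is placed last so that it is taken literally rather than as a range.

```shell
# Sample mirroring the records in the question.
records=$(printf '%s\n' '+aaaa' '+bbbb' 'cccc-123' '-dddd' 'eeee+789' '-fff+456' 'ggg')

# Keep only lines whose first character is + or -.
with_grep=$(printf '%s\n' "$records" | grep '^[+-]')
with_sed=$(printf '%s\n' "$records" | sed -n '/^[+-]/p')
with_awk=$(printf '%s\n' "$records" | awk '/^[+-]/')

printf '%s\n' "$with_grep"
```

All three variables should hold the same four lines.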

Remove range of words with sed

I'm trying to remove a range of words in Unix command line with sed from a file and I just can't figure it out. For example, how can I remove the words at positions 2-4?
If the file contains: "This is a file created by me." I want it to be: "This created by me."
Thanks a lot!
Try this with GNU sed (to print word 1 and word 5 to last word):
echo "This is a file created by me." | sed 'y/ /\n/' | sed -n '1p;5,$p' | sed 'N;N;N;y/\n/ /'
Output:
This created by me.
You can also use awk for this:
echo "This is a file created by me." | awk '{for (i=1;i<=NF;i++) if (i<2||i>4) printf "%s ",$i;print ""}'
This created by me.
This might work for you (GNU sed):
sed -r 's/(\s+\S+){3}//' file
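When the words are separated by single spaces, the same selection can also be written with cut, or generalized with a small awk helper. This is only a sketch: drop_words is a hypothetical name, and the awk version rebuilds the line with single spaces between the kept words.

```shell
line='This is a file created by me.'

# Keep field 1 and fields 5 onward, treating a single space as the delimiter.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f1,5-)

# Hypothetical helper: drop words FROM..TO (1-based) and rebuild the line.
drop_words() {
    awk -v from="$1" -v to="$2" '{
        out = ""
        for (i = 1; i <= NF; i++)
            if (i < from || i > to)
                out = (out == "" ? $i : out " " $i)
        print out
    }'
}
with_awk=$(printf '%s\n' "$line" | drop_words 2 4)

printf '%s\n' "$with_cut" "$with_awk"
```

Unlike the printf-based awk one-liner, the helper does not leave a trailing space at the end of the line.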

sed - insert lines when text found / not found

I have an issue with sed. I need to accomplish two things with a CSV file:
in front of each line that does not start with UNES, I need to add the tag "BF2;"
at the start of the file (after UNES; if present), I need to add the tag "UNH;"
Example (no UNES;)
50000024;IE15;041111;113901;verstuurd;Aangift;
50000024;IE15;041111;113901;verstuurd;Aangifte;
50000024;IE15;041111;113901;verstuurd;Aangifte;
Example (with UNES;)
UNES;
50000024;IE15;041111;113901;verstuurd;Aangift;
50000024;IE15;041111;113901;verstuurd;Aangifte;
50000024;IE15;041111;113901;verstuurd;Aangifte;
so far I have this:
sed -e 's/^\([^"UNES"]\)/BF2;\1/' | sed '/UNES/ a\UNH;'
This works as long as a UNES; tag is present. I can't figure out how to insert the UNH; when UNES is not present!
Any help much appreciated
Sample output:
UNES;
UNH;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
Here's how you could do it using awk:
awk 'NR==1 {if(f=/^UNES;/)print; print "UNH;"} !f{print "BF2;" $0} {f=0}' file
On the first line, if /^UNES;/ matches, print the line and set the flag f; either way, print "UNH;". The second action prepends "BF2;" to every line unless f is set, so the UNES; line itself is skipped. The last action resets f to 0 on every line, so all lines after the first get the "BF2;" prefix.
Testing it out:
$ cat file
UNES;
50000024;IE15;041111;113901;verstuurd;Aangift;
50000024;IE15;041111;113901;verstuurd;Aangifte;
50000024;IE15;041111;113901;verstuurd;Aangifte;
$ awk 'NR==1 {if(f=/^UNES;/)print; print "UNH;"} !f{print "BF2;" $0} {f=0}' file
UNES;
UNH;
BF2;50000024;IE15;041111;113901;verstuurd;Aangift;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
$ cat file2
50000024;IE15;041111;113901;verstuurd;Aangift;
50000024;IE15;041111;113901;verstuurd;Aangifte;
50000024;IE15;041111;113901;verstuurd;Aangifte;
$ awk 'NR==1 {if(f=/^UNES;/)print; print "UNH;"} !f{print "BF2;" $0} {f=0}' file2
UNH;
BF2;50000024;IE15;041111;113901;verstuurd;Aangift;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
BF2;50000024;IE15;041111;113901;verstuurd;Aangifte;
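You can use this sed command:
sed '/^UNES;$/{a\
UNH;
n};s/^/BF2;/;' file.txt
Details:
/^UNES;$/{a\
UNH; queues the line UNH; for output after the current line when UNES; is the whole line (i\ would put it before UNES; instead).
n prints UNES;, followed by the queued UNH;, and replaces the pattern space with the next line, which then gets the BF2; prefix like every other line.
Note that this only adds UNH; when a UNES; line is present.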
Try this; it works for me:
sed '/^UNES;$/{a\
UNH;
n};s/^[0-9]*/BF2;&/;' file
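Neither sed command above inserts UNH; when the file has no UNES; line, which is the case the question asks about. One possible way to cover both cases in a single GNU sed script is sketched below; the function name tag_lines and the shortened sample records are made up for illustration.

```shell
# Sketch (GNU sed): put UNH; after a leading UNES; line, or at the very top
# when there is none, and prefix every other line with BF2;.
tag_lines() {
    sed '1{
/^UNES;$/{
a\
UNH;
b
}
i\
UNH;
}
s/^/BF2;/'
}

# Shortened, made-up records for a quick check of both cases.
with_unes=$(printf '%s\n' 'UNES;' '50000024;IE15;' '50000025;IE15;' | tag_lines)
without_unes=$(printf '%s\n' '50000024;IE15;' '50000025;IE15;' | tag_lines)

printf '%s\n' "$with_unes" '---' "$without_unes"
```

On line 1, a\ queues UNH; to follow UNES; and b skips the BF2; substitution; otherwise i\ emits UNH; first and the line falls through to the substitution.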

Sed or awk: how to call line addresses from separate file?

I have 'file1' with (say) 100 lines. I want to use sed or awk to print lines 23, 71 and 84 (for example) to 'file2'. Those 3 line numbers are in a separate file, 'list', with each number on a separate line.
When I use either of these commands, only line 84 gets printed:
for i in $(cat list); do sed -n "${i}p" file1 > file2; done
for i in $(cat list); do awk 'NR==x {print}' x=$i file1 > file2; done
Can a for loop be used in this way to supply line addresses to sed or awk?
This might work for you (GNU sed):
sed 's/.*/&p/' list | sed -nf - file1 >file2
Use list to build a sed script.
You need to put the > redirection after the loop (after done) to capture everything. Because it is inside the loop, file2 is truncated and overwritten on every iteration. If you keep the redirection inside the loop, it must be >>.
Good practice is to use > outside the loop, so that the file is not reopened for writing on every iteration.
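A minimal sketch of the loop with the redirection moved after done. The temporary directory and the generated 100-line stand-in for file1 are assumptions made only so the example is self-contained.

```shell
workdir=$(mktemp -d)
# Stand-in for the 100-line file1 from the question.
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$workdir/file1"
printf '%s\n' 23 71 84 > "$workdir/list"

# Redirect once, after "done", so file2 is opened a single time.
while IFS= read -r n; do
    sed -n "${n}p" "$workdir/file1"
done < "$workdir/list" > "$workdir/file2"

result=$(cat "$workdir/file2")
rm -rf "$workdir"
printf '%s\n' "$result"
```

Reading the list with while read also avoids the word splitting that for i in $(cat list) relies on.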
However, you can do everything in awk without for loop.
awk 'NR==FNR{a[$1]++;next}FNR in a' list file1 > file2
You have to use >> (append to the file); with > you overwrite file2 on every iteration, which is why only line 84 ends up in it.
Try:
for i in $(cat list); do sed -n "${i}p" file1 >> file2; done
With sed:
sed -n $(sed -e 's/^/-e /' -e 's/$/p/' list) file1 > file2
Given the example list, the inner command creates a string like this:
-e 23p
-e 71p
-e 84p
The unquoted command substitution splits this into separate arguments, so the outer sed prints the given lines.
You can avoid running sed/awk in a for/while loop altogether:
# store all lines numbers in a variable using pipe
lines=$(echo $(<list) | sed 's/ /|/g')
# print lines of specified line numbers and store output
awk -v lineS="^($lines)$" 'NR ~ lineS' file1 > out
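The $(<list) form is a bash feature. As a sketch of the same idea that does not depend on it, paste can join the numbers directly; the temporary files and the variable names pat and picked are assumptions for the sake of a self-contained example.

```shell
workdir=$(mktemp -d)
# Stand-in for the 100-line file1 from the question.
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$workdir/file1"
printf '%s\n' 23 71 84 > "$workdir/list"

# Join the line numbers with "|" to form an alternation such as 23|71|84.
lines=$(paste -sd'|' "$workdir/list")

# Anchor the pattern so that 23 does not also match line 123 or 230.
picked=$(awk -v pat="^($lines)\$" 'NR ~ pat' "$workdir/file1")
rm -rf "$workdir"
printf '%s\n' "$picked"
```

The ^ and $ anchors matter here: without them the record number is matched as a substring.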

Command to trim the first and last character of a line in a text file

I am looking for a one-liner, hopefully, that can trim the first and last character of each line in a multi-line file, e.g. test.txt
Before:
xyyyyyyyyyyyyyyyyyyyx
pyyyyyyyyyyyyyyyyyyyz
After:
yyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyy
$ cat /tmp/txt
xyyyyyyyyyyyyyyyyyyyx
pyyyyyyyyyyyyyyyyyyyz
$ sed 's/^.\(.*\).$/\1/' /tmp/txt
yyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyy
Here is a little trick :) To change the file itself, write to a temporary file and move it back:
sed 's/^.\(.*\).$/\1/' file > file1 && mv file1 file
sed -ne 's,^.\(.*\).$,\1,p'
This command also deletes all lines that have fewer than two characters, since one cannot really strip the first and last character from them.
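The difference between the plain substitution and the -n ... p form shows up only on short lines. A small sketch with a made-up sample, in which the third line has a single character:

```shell
sample=$(printf '%s\n' 'xyyyx' 'ab' 'q' 'pyyyz')

# Without -n: lines that do not match (fewer than two characters) pass through unchanged.
kept=$(printf '%s\n' "$sample" | sed 's/^.\(.*\).$/\1/')

# With -n and the p flag: only lines where the substitution succeeded are printed.
dropped=$(printf '%s\n' "$sample" | sed -n 's/^.\(.*\).$/\1/p')

printf '%s\n' "$kept" '---' "$dropped"
```

The two-character line ab matches and becomes an empty line in both cases; only the one-character line q is treated differently.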