Trim the last character of a stream - sed

I use sed -e '$s/.$//' to trim the last character of a stream. Is this the correct way to do it? Are there better ways to do it with other command-line tools?
$ builtin printf 'a\nb\0' | sed -e '$s/.$//' | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
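EDIT: It seems that this command is not robust. When the stream ends in a newline, sed removes the character before that newline instead of the newline itself. The expected output for the following example is a\nb. I am looking for a better method that is not too verbose.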
$ builtin printf 'a\nb\n' | sed -e '$s/.$//' | od -c -t x1 -Ax
000000 a \n \n
61 0a 0a
000003

You can use head -c -1 (the negative byte count is a GNU coreutils extension):
printf 'a\nb\0' | head -c -1 | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
printf 'a\nb\n' | head -c -1 | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003

It seems you can't rely on any line-oriented tools (like sed) that automatically remove and re-add newlines.
Perl can slurp the whole stream into a string and can remove the last char:
$ printf 'a\nb\0' | perl -0777 -pe chop | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
$ printf 'a\nb\n' | perl -0777 -pe chop | od -c -t x1 -Ax
000000 a \n b
61 0a 62
000003
The tradeoff is that you need to hold the entire stream in memory.
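If holding the whole stream in memory is a concern, a line-at-a-time variant is possible. The sketch below is not from the answer above; it assumes perl is available and that it is acceptable to buffer one line at a time. The function name trim_last_char is made up for illustration.

```shell
# Hypothetical helper: drop the final character of stdin, reading line by line.
trim_last_char() {
    # -p prints every line after the code has run on it.
    # eof is true only while the last line is being processed,
    # so chop removes just the final character of the stream.
    perl -pe 'chop if eof'
}

printf 'a\nb\n' | trim_last_char | od -An -c
printf 'a\nb\0' | trim_last_char | od -An -c
```

Memory use is bounded by the longest line rather than by the whole stream, so an input without any newlines still ends up fully buffered.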

Related

How to replace newline characters with a string in sed

First, this is not a duplicate of, e.g., How can I replace each newline (\n) with a space using sed?
What I want is to exactly replace every newline (\n) in a string, like so:
printf '%s' $'' | sed '...; s/\n/\\&/g'
should result in the empty string
printf '%s' $'a' | sed '...; s/\n/\\&/g'
should result in a (not followed by a newline)
printf '%s' $'a\n' | sed '...; s/\n/\\&/g'
should result in
a\
(the trailing \n of the final line should be replaced, too)
A solution like :a;N;$!ba; s/\n/\\&/g from the other question doesn't do that properly:
printf '%s' $'' | sed ':a;N;$!ba; s/\n/\\&/g' | hd
works;
printf '%s' $'a' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 |a|
00000001
works;
printf '%s' $'a\nb' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 5c 0a 62 |a\.b|
00000004
works;
but when there's a trailing \n on the last line
printf '%s' $'a\nb\n' | sed ':a;N;$!ba;s/\n/\\&/g' | hd
00000000 61 5c 0a 62 0a |a\.b.|
00000005
it doesn't get quoted.
It's easier to use perl than sed here, since perl has (by default, at least) a more straightforward treatment of the newlines in its input:
printf '%s' '' | perl -pe 's/\n/\\\n/' # Empty string
printf '%s' a | perl -pe 's/\n/\\\n/' # a
printf '%s\n' a | perl -pe 's/\n/\\\n/' # a\<newline>
printf '%s\n' a b | perl -pe 's/\n/\\\n/' # a\<newline>b\<newline>
# etc
If your inputs aren't huge, you could use
perl -0777 -pe 's/\n/\\\n/g'
instead, which reads the entire input at once rather than line by line and can be more efficient.
How to replace newline characters with a string in sed
It's not possible. From the sed script's point of view, whether the trailing newline is missing or not makes no difference and is undetectable.
Aaaanyway, use GNU sed with sed -z:
sed -z 's/\n/\\\n/g'
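As a quick check of the sed -z approach against the cases from the question, here is a small sketch. It assumes GNU sed (the -z option is not in POSIX) and input without NUL bytes; the function name escape_newlines is made up for illustration.

```shell
# Hypothetical helper; requires GNU sed for -z.
escape_newlines() {
    # With -z the record separator is NUL, so newlines are ordinary
    # characters inside the pattern space and can be substituted.
    sed -z 's/\n/\\\n/g'
}

printf '%s' 'a'   | escape_newlines | od -An -c
printf 'a\nb\n'   | escape_newlines | od -An -c
```

Because the whole NUL-free input is a single record, this also holds the entire stream in memory.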
GNU awk can use the RT variable to detect a missing record terminator:
$ printf 'a\nb\n' | gawk '{ORS=(RT != "" ? "\\" : "") RT} 1'
a\
b\
$ printf 'a\nb' | gawk '{ORS=(RT != "" ? "\\" : "") RT} 1'
a\
b$
This adds a "\" before each non-empty record terminator.
Using any awk:
$ printf 'a\nb\n\n' | awk '{printf "%s%s", sep, $0; sep="\\\n"}'
a\
b\
$ printf 'a\nb\n' | awk '{printf "%s%s", sep, $0; sep="\\\n"}'
a\
b$
Or use { cat file; echo; } | awk ... to always append a newline to the input, so that the awk script above also escapes the original final newline.
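Combining the two ideas, the appended newline and the separator-before-record awk script, can be sketched as a single function. The name escape_newlines_awk is made up for illustration, and the sketch assumes the input contains no NUL bytes.

```shell
# Hypothetical helper: escape every newline on stdin, using any POSIX awk.
escape_newlines_awk() {
    # echo appends one extra newline, so an original trailing newline
    # shows up as an extra empty record. awk prints the separator
    # (backslash + newline) before every record except the first.
    { cat; echo; } | awk '{ printf "%s%s", sep, $0; sep = "\\\n" }'
}

printf 'a\nb\n' | escape_newlines_awk | od -An -c
printf 'a\nb'   | escape_newlines_awk | od -An -c
```

With the extra newline in place, input that ends in a newline and input that does not produce different output, which plain awk cannot do on the raw stream.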

Split results of du command by new line

I have a list of the top 20 files/folders that take up the most space on my hard drive. I would like to print each entry on its own line in the form size path/to/file. Below is what I have done so far.
I am using: var=$(du -a -g /folder/ | sort -n -r | head -n 20). It returns the following:
120 /path/to/file
115 /path/to/another/file
110 /file/path/
etc.
I have tried the following code to split it up into single lines.
for i in $(echo $var | sed "s/\n/ /g")
do
echo "$i"
done
The result I would like is as follows:
120 /path/to/file,
115 /path/to/another/file,
110 /file/path/,
etc.
This however is the result I am getting:
120,
/path/to/file,
115,
/path/to/another/file,
110,
/file/path/,
etc.
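The loop splits on every space because the unquoted $(echo $var ...) undergoes word splitting, so the size and the path become separate words. I think awk is easier, and it can be appended to the original pipeline: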
du -a -g /folder/ | sort -n -r | head -n 20 | awk '{ print $1, $2 "," }'
If you cannot use a single pipeline and have to use $var, quote it so the newlines are preserved:
echo "$var" | awk '{ print $1, $2 "," }'
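If the paths may contain spaces, printing only $1 and $2 cuts them short. A shell-only alternative is to read each line into two variables, as in the sketch below. The function name format_du is made up for illustration, and the sample input stands in for real du output.

```shell
# Hypothetical helper: turn "size<TAB>path" lines into "size path," lines.
format_du() {
    # read puts the first field in size and the rest of the line in path,
    # so spaces inside the path are kept. -r leaves backslashes alone.
    while read -r size path; do
        printf '%s %s,\n' "$size" "$path"
    done
}

printf '120\t/path/to/file\n115\t/path/with a space\n' | format_du
```

With the variable from the question this would be used as printf '%s\n' "$var" | format_du.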

Perl pack integer

Considering this value:
my $value = hex('0x12345678');
And I would like my hexdump to look like this (same byte order):
0000000 1234 5678
I used this method but it mixes up my value:
open(my $out, '>:raw', 'foo') or die "Unable to open: $!";
print $out pack('l', $value); # Test in little endian
print $out pack('l>', $value); # Test in big endian
Here's what I get:
0000000 5678 1234 3412 7856
How can I get the bytes in order?
EDIT
So the problem might come from my hexdump, because I get the same output with the suggested answer.
$ perl -e 'print pack $_, 0x12345678 for qw( l> N )' | hexdump
0000000 3412 7856 3412 7856
I got the correct result with hexdump -C:
$ perl -e 'print pack $_, 0x12345678 for qw( l> N )' | hexdump -C
00000000 12 34 56 78 12 34 56 78 |.4Vx.4Vx|
And I found the explanation here:
hexdump confusion
The 'l>' option works for me (note there's no call to hex, though). Also, N as the template works:
perl -e 'print pack $_, 0x12345678 for qw( l> N )' | xxd
0000000: 1234 5678 1234 5678

Removing matching text from line

I have an example cut down from a log file.
112 172.172.172.1#50912 (ssl.bing.com):
I would like to remove the # and the digits after it, as well as the (): around the hostname.
I would like this result:
112 172.172.172.1 ssl.bing.com
Here is the sed oneliner I have been working on.
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com

Pattern extraction using SED or AWK

How do I extract 68 from v1+r0.68?
Using awk, this returns everything after the last '.':
echo "v1+r0.68" | awk -F. '{print $NF}'
Using sed to get the number after the last dot:
echo 'v1+r0.68' | sed 's/.*[.]\([0-9][0-9]*\)$/\1/'
grep is good at extracting things:
kent$ echo " v1+r0.68"|grep -oE "[0-9]+$"
68
Match the digit string before the end of the line using grep:
$ echo 'v1+r0.68' | grep -Eo '[0-9]+$'
68
Or match any digits after a . (-P needs GNU grep with PCRE support):
$ echo 'v1+r0.68' | grep -Po '(?<=\.)\d+'
68
Print everything after the . with awk:
echo "v1+r0.68" | awk -F. '{print $NF}'
68
Substitute everything before the . with sed:
echo "v1+r0.68" | sed 's/.*\.//'
68
Type man grep and you will see:
...
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
Then type echo 'v1+r0.68' | grep -o '68'.
If you want to save the result to a file, redirect it:
echo 'v1+r0.68' | grep -o '68' > anyWhereSpecial.file_ending