Perl do input one char from stdin - perl

How can Perl do input from stdin, one char like
readline -N1
does?

You can do that with the base perl distribution, no need to install extra packages:
use strict;
sub IO::Handle::icanon {
my ($fh, $on) = @_;
use POSIX;
my $ts = new POSIX::Termios;
$ts->getattr(fileno $fh) or die "tcgetattr: $!";
my $f = $ts->getlflag;
$ts->setlflag($on ? $f | ICANON : $f & ~ICANON);
$ts->setattr(fileno $fh) or die "tcsetattr: $!";
}
# usage example
# a key like `Left` or `á` may generate multiple bytes
STDIN->icanon(0);
sysread STDIN, my $c, 256;
STDIN->icanon(1);
# the read key is in $c
Reading just one byte may not be a good idea because it will just leave garbage to be read later when pressing a key like Left or F1. But you can replace the 256 with 1 if you want just that, no matter what.

<STDIN> will read stdin one byte at a time if the record separator is set to a reference to the number 1. Here a byte means the C char type, which is not the same as a character: these days characters are typically made of several bytes, except for those in the US-ASCII charset.
$ echo perl | perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'
<p>
Note that underneath, it may read (consume) more than one byte from the input. Above, the next <STDIN> within perl would return <e>, but possibly from some large buffer that was read beforehand.
$ echo perl | (perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<p>
0
Above, you'll notice that wc didn't receive any input as it had all already been consumed by perl.
$ echo perl | (PERLIO=raw perl -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<p>
4
This time, wc got 4 bytes (e, r, l, \n) as we told perl to use raw I/O so the <STDIN> translates to a read(0, buf, 1).
Instead of <STDIN>, you can use perl's read with the same caveat:
$ echo perl | (perl -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
0
$ echo perl | (PERLIO=raw perl -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
4
Or use sysread which is the true wrapper for the raw read():
$ echo perl | (perl -le 'sysread STDIN, $a, 1; print "<$a>"'; wc -c)
<p>
4
To read one character at a time, you need to read one byte at a time until the end of the character.
You can do it for UTF-8 encoded input (in locales using that encoding) in perl with <STDIN> or read (not sysread) with the -C option, including with raw PERLIO:
$ echo été | (PERLIO=raw perl -C -le '$/ = \1; $a = <STDIN>; print "<$a>"'; wc -c)
<é>
4
$ echo été | (PERLIO=raw perl -C -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<é>
4
With strace, you'd see perl does two read(0, buf, 1) system calls underneath to read that 2-byte é character.
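To illustrate the "read one byte at a time until the end of the character" idea without -C, here is a minimal sketch. The helper name read_utf8_char is made up for this example, and it assumes well-formed UTF-8 of at most 4 bytes; it works out the sequence length from the lead byte and then reads the continuation bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): read the bytes of one UTF-8
# character from $fh. Assumes well-formed UTF-8 of at most 4 bytes.
sub read_utf8_char {
    my ($fh) = @_;
    read($fh, my $buf, 1) or return;          # first byte, or nothing at EOF
    my $lead  = ord $buf;
    my $extra = $lead >= 0xF0 ? 3             # 11110xxx: 4-byte sequence
              : $lead >= 0xE0 ? 2             # 1110xxxx: 3-byte sequence
              : $lead >= 0xC0 ? 1             # 110xxxxx: 2-byte sequence
              :                 0;            # ASCII (or a stray byte)
    read($fh, $buf, $extra, 1) if $extra;     # append the continuation bytes
    return $buf;                              # raw bytes, not decoded
}

# Demo on an in-memory handle holding the UTF-8 bytes of "été".
my $bytes = "\xC3\xA9t\xC3\xA9";
open my $in, '<', \$bytes or die "open: $!";
while (defined(my $ch = read_utf8_char($in))) {
    printf "%d byte(s)\n", length $ch;
}
```

Passing \*STDIN instead of the in-memory handle should behave the same way, subject to the buffering caveats described above.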
Like with ksh93 / bash's read -N (or zsh's read -k), you can get surprises if the input is not properly encoded in UTF-8:
$ printf '\375 12345678' | (PERLIO=raw perl -C -le 'read STDIN, $a, 1; print "<$a>"'; wc -c)
<� 1234>
4
\375 (\xFD) would normally be the first byte of the encoding of a 6 byte character in UTF-8¹, so perl reads all 6 bytes here even though the second to sixth can't possibly be part of that character as they don't have the 8th bit set.
Note that when stdin is a tty device, read() will not return until the terminal at the other end sends a LF (eol), CR (which is by default converted to LF), or eof (usually ^D) or eol2 (usually not defined) character as configured in the tty line discipline (like with the stty command) as the tty driver implements its own internal line editor allowing you to edit what you type before pressing enter.
If you want to read the byte(s) that is(are) sent for each key pressed by the user there, you'd need to disable that line editor (which bash/ksh93's read -N or zsh's read -k do when stdin is a tty), see @guest's answer for details on how to do that.
¹ While now Unicode restricts codepoints to up to 0x10FFFF which means UTF-8 encodings have at most 4 bytes, UTF-8 was originally designed to encode code points up to 0x7fffffff (up to 6 byte encoding) and perl extends it to up to 0x7FFFFFFFFFFFFFFF (13 byte encoding)

Related

Why is there a 0 on a new line when I print in perl?

I'm trying to get the inode alone of a file that is passed through as an argument.
When I extract the inode, however, there is a 0 printed on a new line. I've tried to get rid of it with regex but I can't. I'm passing the script /usr/bin/vim The 0 isn't there when I run the command (ls -i /usr/bin/vim | awk '{print $1}'), but it is when I run my script.
How can I get rid of this 0?
my $filepath = $ARGV[0];
chomp $filepath;
$filepath =~ s/\s//g;
print ("FILEPATH: $filepath\n"); #looks good
my $inode = system("ls -i $filepath | awk '{print \$1}'");
$inode =~ s/[^a-zA-Z0-9]//g;
$inode =~ s/\s//g;
print ("$inode\n");
So my result is
137699967
0
When you invoke system you run the command provided as its argument, and that's what's outputting the inode number.
The return value of system is the exit code of the command run, which in this case is 0, and that's what your subsequent print call outputs.
To run an external program and capture its output, use the qx operator, like so:
my $inode = qx/ls -i $filepath | awk '{print \$1}'/;
However, as Sobrique explained in their answer, you don't actually need to call an external program, you can use Perl's built-in stat function instead.
my $inode = (stat($filepath))[1];
stat returns a list containing a variety of information about a file - index 1 holds its inode. This code won't handle if the file doesn't exist, of course.
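For the missing-file case, a minimal sketch could look like the following. The helper name inode_of is made up for this example; it returns undef when stat fails, so the caller can report $! instead of printing an empty line.

```perl
use strict;
use warnings;

# Hypothetical helper: inode number of $path, or undef if stat fails
# (for example because the file does not exist).
sub inode_of {
    my ($path) = @_;
    my @st = stat($path) or return;   # stat returns an empty list on failure
    return $st[1];                    # index 1 of the stat list is the inode
}

# Use the first argument if given, otherwise the current directory.
my $path  = @ARGV ? $ARGV[0] : '.';
my $inode = inode_of($path);
print defined $inode ? "$inode\n" : "cannot stat $path: $!\n";
```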
Don't, just use the stat builtin instead
print +(stat($filepath))[1], "\n";
print join("\n", map { (stat $_)[1] } @ARGV), "\n";

How to add blank line after every grep result using Perl?

How to add a blank line after every grep result?
For example, grep -o "xyz" may give something like -
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I want the output to be like this -
file1:xyz

file2:xyz
file2:xyz2

file3:xyz
I would like to do something like
grep "xyz" | perl (code to add a new line after every grep result)
This is the direct answer to your question:
grep 'xyz' | perl -pe 's/$/\n/'
But this is better:
perl -ne 'print "$_\n" if /xyz/'
EDIT
Ok, after your edit, you want (almost) this:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++'
If you don’t like the blank line at the beginning, make it:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++ && $. > 1'
NOTE: This won’t work right on filenames with colons in them. :)
If you want to use perl, you could do something like
grep "xyz" | perl -p -e 's/(.*)/$1\n/'
If you want to use sed (where I seem to have gotten better results), you could do something like
grep "xyz" | sed 's/.*/\0\n/g'
This prints a newline after every single line of grep output:
grep "xyz" | perl -pe 'print "\n"'
This prints a newline in between results from different files. (Answering the question as I read it.)
grep 'xyz' * | perl -pe '/(.*?):/; if ($f ne $1) {print "\n"; $f=$1}'
Use a state machine to determine when to print a blank line:
#!/usr/bin/env perl
use strict;
use warnings;
# state variable to determine when to print a blank line
my $prev_file = '';
# change DATA to the appropriate input file handle
while( my $line = <DATA> ){
# did the state change?
if( my ( $file ) = $line =~ m{ \A ([^:]*) \: .*? xyz }msx ){
# blank lines between states
print "\n" if $file ne $prev_file && length $prev_file;
# set the new state
$prev_file = $file;
}
# print every line
print $line;
}
__DATA__
file1:xyz
file2:xyz
file2:xyz2
file3:xyz

Printing reverse complement of DNA in single-line Perl

I want to write a quick single-line perl script to produce the reverse complement of a sequence of DNA. The following isn't working for me, however:
$ cat sample.dna.sequence.txt | perl -ne '{while (<>) {$seq = $_; $seq =~ tr /atcgATCG/tagcTAGC/; $revComp = reverse($seq); print $revComp;}}'
Any suggestions? I'm aware that
tr -d "\n " < input.txt | tr "[ATGCatgcNn]" "[TACGtacgNn]" | rev
works in bash, but I want to do it with perl for the practice.
Your problem is that you're using both -n and while (<>) { }, so you end up with while (<>) { while (<>) { } }.
If you know how to do <file.txt, why did you switch to cat file.txt|?!
perl -0777ne's/\n //g; tr/ATGCatgcNn/TACGtacgNn/; print scalar reverse $_;' input.txt
or
perl -0777pe's/\n //g; tr/ATGCatgcNn/TACGtacgNn/; $_ = reverse $_;' input.txt
Or if you don't need to remove the newlines:
perl -pe'tr/ATGCatgcNn/TACGtacgNn/; $_ = reverse $_;' input.txt
If you need to use cat, the following one liner should work for you.
ewolf@~ $ cat foo.txt
atNgNt
gatcGn
ewolf@~ $ cat foo.txt | perl -ne '$seq = $_; $seq =~ tr/atcgATCG/tagcTAGC/;print reverse( $seq )'
taNcNa
ctagCn
Considering the DNA sequences in single-line format in a multifasta file:
cat multifasta_file.txt | while IFS= read L; do if [[ $L == >* ]]; then echo "$L"; else echo $L | rev | tr "ATGCatgc" "TACGtacg"; fi; done > output_file.txt
If your multifasta file is not in single-line format, you can transform your file to single-line before using the command above, like this:
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' <multifasta_file.txt >multifasta_file_singleline.txt
Then,
cat multifasta_file_singleline.txt | while IFS= read L; do if [[ $L == >* ]]; then echo "$L"; else echo $L | rev | tr "ATGCatgc" "TACGtacg"; fi; done > output_file.txt
Hope it is useful for someone. It took me some time to build it.
The problem is that you're using -n in the perl flag, yet you've written your own loop. -n wraps your supplied code in a while loop like while(<STDIN>){...}. So the STDIN file handle has already been read from and your code does it again, getting EOF (end of file) or rather 'undefined'. You either need to remove the n from -ne or remove the while loop from your code.
Incidentally, a complete complement tr pattern, including ambiguous bases, is:
tr/ATGCBVDHRYKMatgcbvdhrykm/TACGVBHDYRMKtacgvbhdyrmk/
Ambiguous bases have complements too. For example, a V stands for an A, C, or G. Their complements are T, G, and C, which is represented by the ambiguous base B. Thus, V and B are complementary.
You don't need to include any N's or n's in your tr pattern (as was demonstrated in another answer) because the complement is the same and leaving them out will leave them untouched. It's just extra processing to put them in the pattern.
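As a small sketch of how that complete tr pattern could be wrapped up, here is a reverse-complement function. The name revcomp and the sample sequence are made up for illustration; characters outside the pattern, such as N, pass through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement, including ambiguous bases.
# Characters not listed in the tr pattern (such as N or n) are left as-is.
sub revcomp {
    my ($seq) = @_;
    $seq =~ tr/ATGCBVDHRYKMatgcbvdhrykm/TACGVBHDYRMKtacgvbhdyrmk/;
    return scalar reverse $seq;
}

print revcomp('ATGCNV'), "\n";   # prints BNGCAT
```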

Counting lines ignored by grep

Let me try to explain this as clearly as I can...
I have a script that at some point does this:
grep -vf ignore.txt input.txt
This ignore.txt has a bunch of lines with things I want my grep to ignore, hence the -v (meaning I don't want to see them in the output of grep).
Now, what I want to do is I want to be able to know how many lines of input.txt have been ignored by each line of ignore.txt.
For example, if ignore.txt had these lines:
line1
line2
line3
I would like to know how many lines of input.txt were ignored by ignoring line1, how many by ignoring line2, and so on.
Any ideas on how can I do this?
I hope that made sense... Thanks!
Note that the sum of the ignored lines plus the shown lines may NOT add up to the total number of lines... "line1 and line2 are here" will be counted twice.
#!/usr/bin/perl
use warnings;
use strict;
local @ARGV = 'ignore.txt';
chomp(my @pats = <>);
foreach my $pat (@pats) {
print "$pat: ", qx/grep -c $pat input.txt/;
}
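The same per-pattern counting can be sketched without running grep once per pattern. This is only an illustration: count_ignored is a made-up name, the sample data is invented, and the patterns are applied as Perl regexes, which only approximates grep's default syntax. As noted above, a line matching two patterns is counted under both.

```perl
use strict;
use warnings;

# Hypothetical helper: for each pattern, count the lines it matches.
# Patterns are used as Perl regexes, which only approximates grep's syntax.
sub count_ignored {
    my ($patterns, $lines) = @_;
    my %count = map { $_ => 0 } @$patterns;
    for my $line (@$lines) {
        for my $pat (@$patterns) {
            $count{$pat}++ if $line =~ /$pat/;
        }
    }
    return \%count;
}

# Made-up sample data standing in for ignore.txt and input.txt.
my @patterns = ('line1', 'line2');
my @lines    = ('line1 and line2 are here', 'only line2', 'something else');
my $count    = count_ignored(\@patterns, \@lines);
print "$_: $count->{$_}\n" for @patterns;
```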
According to unix.stackexchange
grep -o pattern file | wc -l
counts the total number of a given pattern in the file. A solution, given this and the information, that you already use a script, is to use several grep instances to filter and count the patterns, which you want to ignore.
However, I'd try to build a more comfortable solution involving a scripting language like e.g. python.
This script will count the matched lines by hash lookup and save the lines to be printed in @result, where you may process them as you will. To emulate grep, just print them.
I made the script so it can print out an example. To use with the files, uncomment the code in the script, and comment the ones marked # example line.
Code:
use strict;
use warnings;
use v5.10;
use Data::Dumper; # example line
# Example data.
my @ignore = ('line1' .. 'line9'); # example line
my @input = ('line2' .. 'line9', 'fo' .. 'fx', 'line2', 'line3'); # example line
#my $ignore = shift; # first argument is ignore.txt
#open my $fh, '<', $ignore or die $!;
#chomp(my @ignore = <$fh>);
#close $fh;
my @result;
my %lookup = map { $_ => 0 } @ignore;
my $rx = join '|', map quotemeta, @ignore;
#while (<>) { # This processes the remaining arguments, input.txt etc
for (@input) { # example line
chomp; # Required to avoid bugs due to missing newline at eof
if (/($rx)/) {
$lookup{$1}++;
} else {
push @result, $_;
}
}
#say for @result; # This will emulate grep
print Dumper \%lookup; # example line
Output:
$VAR1 = {
'line6' => 1,
'line1' => 0,
'line5' => 1,
'line2' => 2,
'line9' => 1,
'line3' => 2,
'line8' => 1,
'line4' => 1,
'line7' => 1
};
while IFS= read -r pattern ; do
printf '%s:' "$pattern"
grep -c -v "$pattern" input.txt
done < ignore.txt
grep with -c counts matching lines, but with -v added it counts non-matching lines. So, simply loop over the patterns and count once for each pattern.
This will print the number of ignored matches along with the matching pattern:
grep -of ignore.txt input.txt | sort | uniq -c
For example:
$ perl -le 'print "Coroline" . ++$s for 1 .. 21' > input.txt
$ perl -le 'print "line2\nline14"' > ignore.txt
$ grep -of ignore.txt input.txt | sort | uniq -c
1 line14
3 line2
I.e., A line matching "line14" was ignored once. A line matching "line2" was ignored 3 times.
If you just wanted to count the total ignored lines this would work:
grep -cof ignore.txt input.txt
Update: modified the example above to use strings so that the output is a little clearer.
This might work for you:
# seq 1 15 | sed '/^1/!d' | sed -n '$='
7
Explanation:
Delete all lines except those that match. Pipe these matching (ignored) lines to another sed command. Delete all these lines but show the line number only of the last line. So in this example 1 thru 15, lines 1,10 thru 15 are ignored - a total of 7 lines.
EDIT:
Sorry misread the question (still a little confused!):
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt| uniq -c,' ignore.txt | sh
This shows the number of matches for each pattern in the ignore.txt
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,' ignore.txt | sh
This shows the number of non-matches for each pattern in the ignore.txt
If using GNU sed, these should work too:
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt | uniq -c,;e' ignore.txt
or
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,;e' ignore.txt
N.B. Your success with patterns may vary i.e. check for meta characters beforehand.
On reflection I thought this can be improved to:
sed 's,.*,/&/i\\matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
or
sed 's,.*,/&/!i\\non-matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
But NO, on large files this is actually slower.
Are both ignore.txt and input.txt sorted?
If so, you can use the comm command!
$ comm -12 ignore.txt input.txt
How many lines are ignored?
$ comm -12 ignore.txt input.txt | wc -l
Or, if you want to do more processing, combine comm with awk:
$ comm ignore.txt input.txt | awk '
END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}
{
if ($0 !~ /^\t/) {uniqtotal+=1}
if ($0 ~ /^\t[^\t]/) {commtotal+=1}
if ($0 ~ /^\t\t/) {igtotal+=1}
}'
Here I'm taking advantage of the tabs that are placed in the output by the comm command:
* If there are no tabs, the line is in ignore.txt only.
* If there is a single tab, it is in input.txt only
* If there are two tabs, the line is in both files.
By the way, not all the lines in ignore.txt are ignored. If the line isn't also in input.txt, the line can't really be said to be ignored.
With Dennis Williamson's Suggestion
comm ignore.txt input.txt | awk '
!/^\t/ {uniqtotal++}
/^\t[^\t]/ {commtotal++}
/^\t\t/ {igtotal++}
END {print "Ignored lines = " igtotal " Lines not ignored = "commtotal " Lines unique to Ignore file = " uniqtotal}'

How can I use a Perl one-liner to decode an ASCII string encoded in hex?

I’d like to write a Perl one-liner to decode a line of ASCII characters encoded as hexadecimal numbers (for example the line 48 54 54 50 should be decoded as HTTP). I came up with this:
perl -nE 'say map(chr, map { qq/0x$_/ } split)'
It prints an empty line. What am I doing wrong and how would you write it?
It's your qq/0x$_/ trick that doesn't work. chr expects a number as argument, but gets the string literal "0x48". Use the hex function to convert 48 to a decimal number, like datageist does in his answer.
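To spell that out: Perl numifies the string "0x48" as 0, so chr returns a NUL character, whereas hex("48") returns 72. A minimal sketch of the working approach as a function (the name decode_hex_line is made up for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: decode a line of space-separated hex byte values.
sub decode_hex_line {
    my ($line) = @_;
    # hex turns "48" into 72; chr turns 72 into "H".
    return join '', map { chr hex $_ } split ' ', $line;
}

print decode_hex_line('48 54 54 50'), "\n";   # prints HTTP
```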
This works for me:
echo '48 54 54 50' | perl -nE 'say map(chr, map { hex } split)'
This works:
echo '48 54 54 50' | perl -nE 'say map{chr(hex)} split'
I’m assuming you want to feed the data from STDIN.
As always with Perl TIMTOWTDI.
I thought I would submit several options, and show what they would look like if they were written normally. If you want to know more about the command line options perldoc perlrun is a useful resource.
These all output the same thing. With the exception that some of them don't print a newline on the end.
echo '48 54 54 50' | perl -0x20 -pe'$_=chr hex$_'
echo '48 54 54 50' | perl -0x20 -ne'print chr hex$_'
echo '48 54 54 50' | perl -0777 -anE'say map chr,map hex,@F'
echo '48 54 54 50' | perl -0777 -anE'say map{chr hex$_}@F'
echo '48 54 54 50' | perl -0apple'$_=chr hex$_' -0x20
echo '48 54 54 50' | perl -apple'$_=join"",map{chr hex}@F'
echo '48 54 54 50' | perl -lanE'say map{chr hex}@F'
The following is what some of the examples would look like if they were written normally. If you want to figure out what the rest of them do, definitely look at perldoc perlrun.
perl -0x20 -pe'$_=chr hex$_'
This one is fairly straightforward. It is perhaps the best example here, and is also the shortest one. It pretends that spaces are used to separate lines, so that there is only one letter to deal with inside of the loop.
# perl -0x20 -pe'$_=chr hex$_'
$/ = " "; # -0 ( input separator )
while( <> ){
$_ = chr hex $_;
} continue {
print $_;
}
perl -0apple'$_=chr hex$_' -0x20
This one has a few command line options that don't do anything useful.
The first -0 option (no digits) sets the input separator to the null character, so that -l copies the null character, rather than a newline, into the output separator.
A terminal normally displays nothing for that character, so the output looks the same.
There are two -p options where one would have sufficed.
The -a option sets up the @F array, but we don't actually use it.
Basically I used -a -l and a second -p so that the options would spell apple. Otherwise this one is the same as the last example.
echo '48 54 54 50' | perl -0x20 -pe'$_=chr hex$_'
# perl -0apple'$_=chr hex$_' -0x20
$/ = "\0"; # -0 ( input separator )
$\ = $/; # -l ( output separator )
$/ = " "; # -0x20 ( input separator )
while( <> ){
@F = split " ", $_; # -a ( unused )
$_ = chr hex $_;
} continue {
print $_;
}
perl -lanE'say map{chr hex}@F'
I figured I already spelled apple, I might as well spell lanE.
-l isn't really useful, because we already are using say.
Used -E instead of -e so that we could use say.
# perl -lanE'say map{chr hex}@F'
$\ = $/; # -l ( output separator set to "\n" )
while( <> ){
@F = split " ", $_; # -a
say map { chr hex $_ } @F;
}
Play perlgolf?
-ple y/0-9A-Fa-f//cd;$_=pack"H*",$_
-ple $_=pack"H*",$_,join"",split
-nE say map chr hex,split
-naE say map chr hex,@F