Can I construct in Perl a network-type message that has an odd byte length?

$str = "0xa";              # my hex number
$m = pack("n", hex($str)); # the output is 000a
$m = pack("c", hex($str)); # the output is 0a
I need the result to be only a. The bottom line is that, with pack, I can send messages on a socket that are a whole number of bytes (like A675). But if I try to send A675B, with pack I end up with A6750B.

A675 is two bytes. A675B is two and a half bytes. Sockets don't support sending anything smaller than a byte. You could send a flag that tells the receiver to ignore one nybble of the message, but that's about it.
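One way to sketch the flag approach (the framing here is an assumption of my own, not any standard): pack the hex digits with 'H*', which zero-pads the final nibble, and prefix the message with the nibble count so the receiver knows how much of the last byte to keep.

```perl
use strict;
use warnings;

my $hex   = "A675B";                         # 5 nibbles
my $bytes = pack("H*", $hex);                # A6 75 B0 on the wire
my $msg   = pack("C", length $hex) . $bytes; # 1-byte nibble-count prefix

# Receiver side: trim the padding nibble back off.
my ($nibbles, $payload) = unpack("C a*", $msg);
my $recovered = substr(unpack("H*", $payload), 0, $nibbles); # "a675b"
```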

Related

perl: utf8 &lt;something&gt; does not map to Unicode while &lt;something&gt; doesn't seem to be present

I'm using MARC::Lint to lint some MARC records, but every now and then I'm getting an error (on about 1% of the files):
utf8 "\xCA" does not map to Unicode at /usr/lib/x86_64-linux-gnu/perl/5.26/Encode.pm line 212.
The problem is that I've tried different methods but cannot find "\xCA" in the file...
My script is:
#!perl -w
use MARC::File::USMARC;
use MARC::Lint;
use utf8;
use open OUT => ':utf8';

my $lint = new MARC::Lint;
my $filename = shift;
my $file = MARC::File::USMARC->in( $filename );

while ( my $marc = $file->next() ) {
    $lint->check_record( $marc );

    # Print the errors that were found
    print join( "\n", $lint->warnings ), "\n";
} # while
and the file can be downloaded here: http://eroux.fr/I14376.mrc
Is "\xCA" hidden somewhere? Or is this a bug in MARC::Lint?
The problem has nothing to do with MARC::Lint. Remove the lint check, and you'll still get the error.
The problem is a bad data file.
The file contains a "directory" of where the information is located in the file. The following is a human-readable rendition of the directory for the file you provided:
tagno|offset|len # Offsets are from the start of the data portion.
001|00000|0017 # Lengths include the single-byte field terminator.
006|00017|0019 # Offsets and lengths are in bytes.
007|00036|0015
008|00051|0041
035|00092|0021
035|00113|0021
040|00134|0018
050|00152|0022
066|00174|0009
245|00183|0101
246|00284|0135
264|00419|0086
300|00505|0034
336|00539|0026
337|00565|0026
338|00591|0036
546|00627|0016
500|00643|0112
505|00755|9999 <--
506|29349|0051
520|29400|0087
533|29487|0115
542|29602|0070
588|29672|0070
653|29742|0013
710|29755|0038
720|29793|0130
776|29923|0066
856|29989|0061
880|30050|0181
880|30231|0262
Notice the length of the field with tag 505: 9999. This is the maximum value supported (because the length is stored as four decimal digits). The catch is that the value of that field is far larger than 9,999 bytes; it's actually 28,594 bytes in size.
What happens is that the module extracts 9,999 bytes rather than 28,594. This happens to cut a UTF-8 sequence in half. (The specific sequence is CA BA, the encoding of ʼ.) Later, when the module attempts to decode that text, an error is thrown. (CA must be followed by another byte to be valid.)
Are these records you are generating? If so, you need to make sure that no field requires more than 9,999 bytes.
Still, the module should handle this better. When it finds no end-of-field marker where it expects one, it could read until it finds one instead of relying on the directory length, and/or it could handle decoding errors in a non-fatal manner. It already has a mechanism for reporting such problems ($marc->warnings).
In fact, if it hadn't died (say, if the cut had happened to occur between characters instead of in the middle of one), $marc->warnings would have returned the following message:
field does not end in end of field character in tag 505 in record 1
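A sketch of the more robust extraction the module could do, assuming $data holds the record's data portion and $offset comes from the directory (the MARC field terminator is the byte \x1E):

```perl
# Scan for the end-of-field terminator instead of trusting the
# directory's 4-digit length, which maxes out at 9999.
my $end = index($data, "\x1E", $offset);
my $field = $end >= 0
    ? substr($data, $offset, $end - $offset)
    : substr($data, $offset);   # no terminator found: take the rest
```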

Perl SMTP: can't send email with non-ascii characters in body

Code, sending email (working good):
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Email::Sender::Simple qw(sendmail);
use Email::Sender::Transport::SMTP ();
use Email::Simple ();
use open ':std', ':encoding(UTF-8)';

sub send_email
{
    my $email_from = shift;
    my $email_to   = shift;
    my $subject    = shift;
    my $message    = shift;

    my $smtpserver = 'smtp.gmail.com';
    my $smtpport   = 465;
    my $smtpuser   = 'user@gmail.com';
    my $password   = 'secret';

    my $transport = Email::Sender::Transport::SMTP->new({
        host          => $smtpserver,
        port          => $smtpport,
        sasl_username => $email_from,
        sasl_password => $password,
        debug         => 1,
        ssl           => 1,
    });

    my $email = Email::Simple->create(
        header => [
            To      => $email_to,
            From    => $email_from,
            Subject => $subject,
        ],
        body => $message,
    );

    $email->header_set( 'Content-Type' => 'text/html' );
    $email->header_set( 'charset' => 'UTF-8' );

    sendmail($email, { transport => $transport });
}

send_email('user@gmail.com', 'user@gmail.com', 'Hello', 'test email');
As soon as I add non-ascii characters to the body:
send_email('user@gmail.com', 'user@gmail.com', 'Hello', 'test email. Русский текст');
it hangs with the last message in debug output:
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> charset: UTF-8
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>>
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> test email. Русский текст
Net::SMTP::_SSL=GLOB(0x8d41fa0)>>> .
How to fix?
TL;DR: the fix is simple, but the problem itself is complex. To fix the issue, add:
$email = Encode::encode('utf-8', $email->as_string);
before handing the mail to sendmail(...). But note the warning at the end of this answer about the problems of sending 8-bit data like this inside a mail in the first place.
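In the context of the send_email sub from the question, the fix would look roughly like this (a sketch; sendmail also accepts a plain string, which Email::Abstract handles):

```perl
use Encode ();

# Flatten the mail to octets before it reaches the socket layer.
my $octets = Encode::encode('UTF-8', $email->as_string);
sendmail($octets, { transport => $transport });
```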
To actually understand the problem and the fix, one has to look deeper into the handling of characters vs. octets in Perl sockets:
Email::Sender::Transport::SMTP uses Net::SMTP, which itself uses the syswrite method of the underlying IO::Socket::SSL or IO::Socket::IP (or IO::Socket::INET) socket, depending on whether SSL is used.
syswrite expects octets, and it returns the number of octets written to the socket.
But the mail you construct with Email::Simple is returned not as octets but as a string with the UTF8 flag set. In such a string the number of characters can differ from the number of octets: the Russian word текст is 5 characters, but 10 octets once encoded as UTF-8.
Email::Sender::Transport::SMTP just forwards the UTF8 string of the email to Net::SMTP, which uses it inside a syswrite. The length is computed with length, which returns the number of characters, a different value here from the number of octets. The socket side, however, takes octets, not characters, out of the string and treats the given length as a number of octets.
Because the given length is treated as octets and not characters, less data ultimately gets sent to the server than the upper layers of the program expect.
This way the end-of-mail marker (a line with a single dot) never gets sent, so the server keeps waiting for the client to send more data while the client is not aware of any more data to send.
As an example, take a mail that consists only of the two Russian characters 'ий'. With line endings and the end-of-mail marker it consists of 7 characters:
ий\r\n.\r\n
But these 7 characters are actually 9 octets, because the first 2 characters are two octets each:
и й \r \n . \r \n
d0 b8 d0 b9 0d 0a 2e 0d 0a
Now, a syswrite($fd,"ий\r\n.\r\n",7) will write only the first 7 octets of this 7-character, 9-octet string:
и й \r \n .
d0 b8 d0 b9 0d 0a 2e
This means that the end-of-mail marker is incomplete, so the mail server waits for more data while the mail client is not aware of any more data it needs to send. This is what causes the application to hang.
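The character/octet mismatch is easy to reproduce on its own (a minimal demonstration, separate from the mail code):

```perl
use strict;
use warnings;
use utf8;                  # the source contains UTF-8 literals
use Encode qw(encode);

my $body = "ий\r\n.\r\n";
printf "characters: %d\n", length($body);                  # 7
printf "octets:     %d\n", length(encode('UTF-8', $body)); # 9
```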
Now, who is to blame for this?
One could argue that IO::Socket::SSL::syswrite should deal with UTF8 data in a sane way, and this is what was requested in RT#98732. But the documentation for syswrite in IO::Socket::SSL clearly says that it works on bytes, and since it is practically impossible to create sane character-based behavior when considering non-blocking sockets, the request was rejected. Non-SSL sockets have problems with UTF8 strings too: without SSL, the program would not hang but would instead crash with Wide character in syswrite ....
The next layer up would be to expect Net::SMTP to handle such UTF8 strings properly. Only, the documentation of Net::SMTP::data explicitly says:
DATA may be a reference to a list or a list and must be encoded by the caller to octets of whatever encoding is required, e.g. by using the Encode module's encode() function.
Now one could argue that either Email::Transport should handle UTF8 strings properly or that Email::Simple::as_string should not return a UTF8 string in the first place.
But one could go another layer up still: to the developer. Mail is traditionally ASCII-only, and sending non-ASCII characters inside a mail is a bad idea, since it only works reliably with mail servers that support the 8BITMIME extension. If mail servers are involved that don't support this extension, the results are unpredictable: the mail can be transformed (which might break signatures), rendered unreadable, or lost somewhere along the way. Thus it is better to use a more capable module like Email::MIME and set an appropriate content transfer encoding.
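With Email::MIME, the mail from the question could be built roughly like this (a sketch reusing the question's variables; quoted-printable is one reasonable choice of transfer encoding):

```perl
use Email::MIME;

my $email = Email::MIME->create(
    header_str => [
        To      => $email_to,
        From    => $email_from,
        Subject => $subject,
    ],
    attributes => {
        content_type => 'text/html',
        charset      => 'UTF-8',
        encoding     => 'quoted-printable',
    },
    body_str => $message,   # a character string; Email::MIME encodes it
);

sendmail($email, { transport => $transport });
```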

Missing either data or entire lines on serial port read

I am trying to read streaming serial data at 115200 baud and can't seem to pick it all up.
Using the input method ($data = $Port->input), I get only the first 14 or 15 characters of every line. I need the whole line.
Using the read method ($data = $Port->read(4096)) and adjusting the read_interval, I can either get partials of every line using
$Port->read_interval(1);
or every third line in full using
$Port->read_interval(2);
I need all of every line.
Here is the code:
my $App_Main_Port = Win32::SerialPort->start($Test_cfgfile);
$App_Main_Port->read_interval(1);
$App_Main_Port->read_char_time(1);

for ($i = 0 ;;) {
    # $data = $App_Main_Port->input;
    $data = $App_Main_Port->read(4096);
    print "$data\n";
}
By adjusting the read interval I get the results mentioned above. I started with the default values of 100 for the interval and char_time parameters, but then I only get every third line.
Thanks for any insight!
Chris
Serial port reads often just return what's buffered in the serial port at that moment. A 16550 UART only buffers 16 bytes total, and that's what most PCs emulate.
From the Win32::SerialPort page, it appears you need to call ->read() in list context so you can find out how many bytes were actually read. If you really do want to block until you've received all 4096 bytes, try a loop like this:
# read in 4K bytes
my ($data, $temp, $count_in, $total_in);
$total_in = 0;
while ($total_in < 4096) {
    ($count_in, $temp) = $App_Main_Port->read(4096 - $total_in);
    $data .= $temp;
    $total_in += $count_in;
}
That said, your real problem sounds like it could be a flow control issue. Even with such a loop, you might still get dropped characters if you don't have proper flow control. You might experiment with handshake("dtr") or handshake("rts").
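A sketch of enabling flow control with Win32::SerialPort (whether "rts" or "dtr" is right depends on what the device at the other end actually honors; "xoff" selects software flow control instead):

```perl
my $port = Win32::SerialPort->start($Test_cfgfile);
$port->handshake("rts");     # or "dtr", or "xoff"
$port->write_settings or die "cannot apply port settings";
```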

Convert bit vector to binary in Perl

I'm not sure of the best way to describe this.
Essentially I am attempting to write to a buffer that requires a certain protocol. The first two bytes I would like to write are "10000001" and "11111110" (bit by bit). How can I write these two bytes to a filehandle in Perl?
To convert spelled-out binary to actual bytes, you want the pack function with either B or b (depending on the order you have the bits in):
print FILE pack('B*', '1000000111111110');
However, if the bytes are constant, it's probably better to convert them to hex values and use the \x escape with a string literal:
print FILE "\x81\xFE";
How about
# open my $fh, ...
print $fh "\x81\xFE"; # 10000001 and 11111110
Since version 5.6.0 (released in March 2000), perl has supported binary literals as documented in perldata:
Numeric literals are specified in any of the following floating point or integer formats:
12345
12345.67
.23E-10 # a very small number
3.14_15_92 # a very important number
4_294_967_296 # underscore for legibility
0xff # hex
0xdead_beef # more hex
0377 # octal (only numbers, begins with 0)
0b011011 # binary
You are allowed to use underscores (underbars) in numeric literals between digits for legibility. You could, for example, group binary digits by threes (as for a Unix-style mode argument such as 0b110_100_100) or by fours (to represent nibbles, as in 0b1010_0110) or in other groups.
You may be tempted to write
print $fh 0b10000001, 0b11111110;
but the output would be
129254
because 10000001₂ = 129₁₀ and 11111110₂ = 254₁₀.
You want a specific representation of the literals’ values, namely as two unsigned bytes. For that, use pack with a template of "C2", i.e., octet times two. Adding underscores for readability and wrapping it in a convenient subroutine gives
sub write_marker {
    my($fh) = @_;
    print $fh pack "C2", 0b1000_0001, 0b1111_1110;
}
As a quick demo, consider
binmode STDOUT or die "$0: binmode: $!\n"; # we'll send binary data
write_marker *STDOUT;
When run as
$ ./marker-demo | od -t x1
the output is
0000000 81 fe
0000002
In case it’s unfamiliar, the od utility is used here for presentational purposes because the output contains a control character and þ (Latin small letter thorn) in my system’s encoding.
The invocation above commands od to render in hexadecimal each byte from its input, which is the output of marker-demo. Note that 10000001₂ = 81₁₆ and 11111110₂ = FE₁₆. The numbers in the left-hand column are offsets: the special marker bytes start at offset zero (that is, immediately), and there are exactly two of them.

What's the simplest way of adding one to a binary string in Perl?

I have a variable that contains a 4 byte, network-order IPv4 address (this was created using pack and the integer representation). I have another variable, also a 4 byte network-order, subnet. I'm trying to add them together and add one to get the first IP in the subnet.
To get the ASCII representation, I can do inet_ntoa($ip & $netmask) to get the base address, but it's an error to do inet_ntoa(($ip & $netmask) + 1); I get a message like:
Argument "\n\r&\0" isn't numeric in addition (+) at test.pm line 95.
So what's happening, as best as I can tell, is that it's looking at the 4 bytes, seeing that they don't represent a numeric string, and refusing to add 1.
Another way of putting it: what I want is to add 1 to the least significant byte, which I know is the 4th byte. That is, I want to take the string \n\r&\0 and end up with the string \n\r&\1. What's the simplest way of doing that?
Is there a way to do this without having to unpack and re-pack the variable?
What's happening is that you make a byte string with $ip & $netmask and then try to treat it as a number. This is not going to work as such. What you have to feed to inet_ntoa is:
pack("N", unpack("N", $ip&$netmask) + 1)
I don't think there is a simpler way to do it.
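For what it's worth, the bytes in the error message, "\n\r&\0", are 0x0A 0x0D 0x26 0x00, i.e. 10.13.38.0, so the whole computation looks like this (the host part of the address is made up for the example):

```perl
use strict;
use warnings;
use Socket;

my $ip      = inet_aton("10.13.38.66");
my $netmask = inet_aton("255.255.255.0");

# Mask to the base address, bump the integer form, and repack.
my $first = pack("N", unpack("N", $ip & $netmask) + 1);
print inet_ntoa($first), "\n";   # 10.13.38.1
```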
You're confusing integers and strings. Perhaps the following code will help:
use Socket;

$ip = pack("C4", 192,168,250,66);   # why not inet_aton("192.168.250.66")?
$netmask = pack("C4", 255,255,255,0);
$ipi = unpack("N", $ip);
$netmaski = unpack("N", $netmask);
$ip1 = pack("N", ($ipi & $netmaski) + 1);
print inet_ntoa($ip1), "\n";
Which outputs:
192.168.250.1