Chunked transfer encoding browser experience - perl

Why does the output of this simple Perl script:
print "Content-type: text/plain\n";
print "Transfer-Encoding: chunked\n\n";
print "11\n\n";
print "0123456789ABCDEF\n";
print "11\n\n";
print "0123456789ABCDEF\n";
print "0\n\n";
work in Chrome but not in IE10?

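You've implemented the chunked transfer coding incorrectly. Each chunk consists of the chunk size in bytes in hexadecimal notation, followed by a CRLF sequence, followed by the chunk data and another CRLF. Line breaks must be CRLF (\r\n), not a bare \n, and the chunk size counts only the data bytes (16 bytes is 10 in hexadecimal):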
chunk = chunk-size [ chunk-extension ] CRLF
chunk-data CRLF
chunk-size = 1*HEX
last-chunk = 1*("0") [ chunk-extension ] CRLF
chunk-data = chunk-size(OCTET)
So your code should look like this:
print "Content-type: text/plain\r\n";
print "Transfer-Encoding: chunked\r\n";
print "\r\n";
# first chunk
print "10\r\n";
print "0123456789ABCDEF\r\n";
# second chunk
print "10\r\n";
print "0123456789ABCDEF\r\n";
# last chunk
print "0\r\n";
print "\r\n";
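To avoid counting bytes by hand, the chunk size can be computed from the data. The following is a minimal sketch, not a definitive implementation: the helper names http_chunk and chunked_body are made up for illustration, and the data is assumed to be a byte string (so length counts bytes).

```perl
use strict;
use warnings;

# Hypothetical helper: frame one chunk as "<hex size>\r\n<data>\r\n".
# Assumes $data is a byte string, so length() counts bytes.
sub http_chunk {
    my ($data) = @_;
    return sprintf("%x\r\n%s\r\n", length($data), $data);
}

# Hypothetical helper: build a complete chunked body from a list of pieces.
sub chunked_body {
    my @pieces = grep { length } @_;   # an empty piece would look like the last chunk
    return join('', map { http_chunk($_) } @pieces) . "0\r\n\r\n";
}

binmode STDOUT;   # avoid newline translation on platforms that perform it
print "Content-type: text/plain\r\n";
print "Transfer-Encoding: chunked\r\n";
print "\r\n";
print chunked_body("0123456789ABCDEF", "0123456789ABCDEF");
```

For the two 16-byte pieces above, this produces the same body as the hand-written version: two chunks of size 10 followed by the terminating 0 chunk.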

Related

Perl - How to calculate CRC16 of N bytes from array M-dimensional (with N<M) using Digest::CRC

I need to calculate the CRC16 of N bytes (5 in this example, for simplicity) extracted from a binary file of size M (a couple of KB, which is not important here).
printf "offset\tvalue\tcrc16\n";
#Read N bytes from file and copy in the container
for my $counter (0 .. 5- 1)
{
my $oneByte;
read(FH, $oneByte, 1) or die "Error reading $inFile!";
my $ctx2 = Digest::CRC->new( type => 'crc16' );
my $digest2 = ($ctx2->add($oneByte))->hexdigest;
# PRINT for debugging
printf "0x%04X\t0x%02X\t0x", $counter, ord $oneByte;
print $digest2, "\n";
}
Considering this binary input
I obtain the result:
The script computes a CRC16 for each byte individually (those values are correct), but I need the CRC16 of the whole 5-byte stream (the expected value is 0x6CD6).
What is wrong in the script?
Calling hexdigest or digest or b64digest clears the buffer and begins the next digest from scratch. (If you were computing digests of several files/streams, you wouldn't want the data from one stream to affect the digest of a separate stream).
So create the Digest::CRC object once, before the loop, and wait until the stream is completely read before calling digest:
... {
...
$ctx2->add($oneByte);
}
print "digest = ", $ctx2->hexdigest, "\n";
Or, to help in debugging, save the stream and redigest the whole stream after each new byte:
my $manyBytes = "";
... {
...
$manyBytes .= $oneByte;
$digest2 = $ctx2->add($manyBytes)->hexdigest;
...
}
You can use ->add. You can either pass the whole string at once, chunk by chunk, or character by character.
$ perl -M5.010 -MDigest::CRC -e'
my $d = Digest::CRC->new( type => "crc16" );
$d->add("\x49\x34\x49\x31\x31");
say $d->hexdigest;
'
6cd6
$ perl -M5.010 -MDigest::CRC -e'
my $d = Digest::CRC->new( type => "crc16" );
$d->add($_) for "\x49", "\x34", "\x49", "\x31", "\x31";
say $d->hexdigest;
'
6cd6
As shown, use a single object, and add every byte before calling ->digest (or ->hexdigest, etc.), since calling it resets the object.

splitting text from a file between [ ] perl

I am reading a log file and finding lines that contain a certain error, and I can already write each matching line to a file. However, I don't want to write the whole line, just the date on which the error occurred. The log file format looks like this:
"[5/13/14 0:00:31:444 EDT] some other text"
I want to write just the date to another file, but I am having trouble doing this with split. Here is what I have:
if (/WSVR0605W/)
{
my @vals =~ split(/\[/, $string);
$vals[0] =~ s/\\//g;
print ERRORFILE "$vals[0]\n";
}
Thanks
You might want to split on ] and then remove the leading [:
my @vals = split(/\]/, $string);
$vals[0] =~ s/\[//;
although this may be a better job for a regex:
my ($date) = $string =~ /\[(.+?)\]/;
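Put together with the error filter from the question, the regex approach might look like the sketch below. The helper name bracketed_stamp and the sample log lines are made up for illustration; only the bracketed prefix format and the WSVR0605W code come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first "[" and the next "]",
# or undef when the line has no bracketed part.
sub bracketed_stamp {
    my ($line) = @_;
    my ($stamp) = $line =~ /\[(.+?)\]/;
    return $stamp;
}

# Made-up sample lines, modelled on the format shown in the question.
my @log = (
    "[5/13/14 0:00:31:444 EDT] 0000001a ThreadMonitor W WSVR0605W: thread may be hung",
    "[5/13/14 0:00:32:010 EDT] 0000001b SystemOut O unrelated message",
);

for my $line (@log) {
    next unless $line =~ /WSVR0605W/;   # keep only lines with the error code
    my $stamp = bracketed_stamp($line);
    next unless defined $stamp;
    my ($date) = split ' ', $stamp;     # first whitespace-separated field
    print "$date\n";
}
```

If the time and zone are wanted as well, print $stamp instead of $date.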

Perl - Format text data before writing to a file

I am writing data in a file. The file will look like this.
[section1] [section2] [section3]
[section1] [section2] [section3]
I am not writing data to the file directly.
I am first appending rows in a string and then writing to a file.
$str .= "section1_data section2_data section3_data\n";
$str .= "section1_more_data section2_more_data section3_more_data\n";
Now what I want is that all the sections should be 30 chars long.
The data inside all sections will always be less than or equal to 30 chars.
Is there a way to do this in perl?
I am using following syntax to write to file
open FH,">>filename";
print FH $str;
close FH;
$str .= sprintf("[%-30s] [%-30s] [%-30s]\n",
$section1_data,
$section2_data,
$section3_data,
);
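As a small self-contained sketch of the same idea, the sprintf call can be wrapped in a helper and applied to each row before appending to $str. The helper name format_row is hypothetical, and the data values are the placeholders from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: format one row as three left-justified,
# 30-character columns, each wrapped in brackets.
sub format_row {
    my @sections = @_;
    return sprintf("[%-30s] [%-30s] [%-30s]\n", @sections[0 .. 2]);
}

my $str = '';
$str .= format_row('section1_data', 'section2_data', 'section3_data');
$str .= format_row('section1_more_data', 'section2_more_data', 'section3_more_data');
print $str;
```

Note that %-30s only pads shorter values. If a value could ever exceed 30 characters, %-30.30s would also truncate it to keep the columns aligned.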

Perl converting binary stream to hex

The problem I am having is that when my Perl script reads data (a PE executable) via STDIN and the stream contains the line terminator "0A", the conversion to hex misses it. When I then convert the hex data back, the result is corrupted (the 0A bytes are missing from the hex output). How can I detect the "Windows" version of the line feed, "0A", in Perl?
Note: Perl is running on Linux and reading a Windows PE file.
#!/usr/bin/perl
while($line = <STDIN>)
{
chomp($line);
@bytes = split //, $line;
foreach (@bytes)
{
printf "%02lx", ord $_;
}
}
Usage example:
[root@mybox test]# cat test.exe | perl encoder.pl > output
In your loop, you are running chomp on each input line. This is removing whatever value is currently in $/ from the end of your line. Chances are this is 0x0a, and that's where the value is going. Try removing chomp($line) from your loop.
In general, using line oriented reading doesn't make sense for binary files that are themselves not line oriented. You should take a look at the lower level read function which allows you to read a block of bytes from the file without caring what those bytes are. You can then process your data in blocks instead of lines.
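A block-oriented version using read might look like the following sketch. The helper name hex_encode_stream and the 4096-byte block size are arbitrary choices for illustration, and the demonstration uses an in-memory filehandle so that it is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: hex-encode everything readable from a filehandle,
# reading fixed-size blocks instead of lines.
sub hex_encode_stream {
    my ($in, $out, $block_size) = @_;
    $block_size ||= 4096;          # arbitrary block size
    binmode $in;                   # no newline or encoding translation
    my $buf;
    while (1) {
        my $n = read($in, $buf, $block_size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;           # end of input
        print {$out} unpack('H*', $buf);
    }
    return;
}

# Demonstration on an in-memory handle; in a filter script,
# pass \*STDIN and \*STDOUT instead.
my $sample = "MZ\x0a\x00";
open my $in, '<', \$sample or die "open: $!";
hex_encode_stream($in, \*STDOUT);
print "\n";   # prints 4d5a0a00
```

Because nothing is chomped and no line boundaries are involved, the 0a byte is preserved in the output.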
#With split
cat magic.exe | perl -e 'print join("", map { sprintf("\\x%02x", ord($_)) } split(//, join("", <STDIN>)))' > hex_encoded_binary
#With unpack
cat magic.exe| perl -e 'print join("", map { "\\x" . $_ } unpack("H*", join("", <STDIN>)) =~ /.{2}/gs)' > hex_encoded_binary

Perl: utf8::decode vs. Encode::decode

I am having some interesting results trying to discern the differences between using Encode::decode("utf8", $var) and utf8::decode($var). I've already discovered that calling the former multiple times on a variable will eventually result in an error "Cannot decode string with wide characters at..." whereas the latter method will happily run as many times as you want, simply returning false.
What I'm having trouble understanding is how the length function returns different results depending on which method you use to decode. The problem arises because I am dealing with "doubly encoded" utf8 text from an outside file. To demonstrate this issue, I created a text file "test.txt" with the following Unicode characters on one line: U+00e8, U+00ab, U+0086, U+000a. These Unicode characters are the double-encoding of the Unicode character U+8acb, along with a newline character. The file was encoded to disk in UTF8. I then run the following perl script:
#!/usr/bin/perl
use strict;
use warnings;
require "Encode.pm";
require "utf8.pm";
open FILE, "test.txt" or die $!;
my @lines = <FILE>;
my $test = $lines[0];
print "Length: " . (length $test) . "\n";
print "utf8 flag: " . utf8::is_utf8($test) . "\n";
my @unicode = (unpack('U*', $test));
print "Unicode:\n@unicode\n";
my @hex = (unpack('H*', $test));
print "Hex:\n@hex\n";
print "==============\n";
$test = Encode::decode("utf8", $test);
print "Length: " . (length $test) . "\n";
print "utf8 flag: " . utf8::is_utf8($test) . "\n";
@unicode = (unpack('U*', $test));
print "Unicode:\n@unicode\n";
@hex = (unpack('H*', $test));
print "Hex:\n@hex\n";
print "==============\n";
$test = Encode::decode("utf8", $test);
print "Length: " . (length $test) . "\n";
print "utf8 flag: " . utf8::is_utf8($test) . "\n";
@unicode = (unpack('U*', $test));
print "Unicode:\n@unicode\n";
@hex = (unpack('H*', $test));
print "Hex:\n@hex\n";
This gives the following output:
Length: 7
utf8 flag:
Unicode:
195 168 194 171 194 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 4
utf8 flag: 1
Unicode:
232 171 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 2
utf8 flag: 1
Unicode:
35531 10
Hex:
e8ab8b0a
This is what I would expect. The length is originally 7 because perl thinks that $test is just a series of bytes. After decoding once, perl knows that $test is a series of characters that are utf8-encoded (i.e. instead of returning a length of 7 bytes, perl returns a length of 4 characters, even though $test is still 7 bytes in memory). After the second decoding, $test contains 4 bytes interpreted as 2 characters, which is what I would expect since Encode::decode took the 4 code points and interpreted them as utf8-encoded bytes, resulting in 2 characters. The strange thing is when I modify the code to call utf8::decode instead (replace all $test = Encode::decode("utf8", $test); with utf8::decode($test))
This gives almost identical output, only the result of length differs:
Length: 7
utf8 flag:
Unicode:
195 168 194 171 194 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 4
utf8 flag: 1
Unicode:
232 171 139 10
Hex:
c3a8c2abc28b0a
==============
Length: 4
utf8 flag: 1
Unicode:
35531 10
Hex:
e8ab8b0a
It seems like perl first counts the bytes before decoding (as expected), then counts the characters after the first decoding, but then counts the bytes again after the second decoding (not expected). Why would this switch happen? Is there a lapse in my understanding of how these decoding functions work?
Thanks, Matt
You are not supposed to use the functions from the utf8 pragma module. Its documentation says so:
Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.
Always use the Encode module, and also see the question Checklist for going the Unicode way with Perl. unpack is too low-level; it does not even give you error-checking.
You are going wrong with the assumption that the octets E8 AB 86 0A are the result of UTF-8 double-encoding the characters 諆 and newline. This is the representation of a single UTF-8 encoding of these characters. Perhaps the whole confusion on your side stems from that mistake.
length is inappropriately overloaded: at certain times it determines the length in characters, at others the length in octets. Use better tools such as Devel::Peek.
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Devel::Peek qw(Dump);
use Encode qw(decode);
my $test = "\x{00e8}\x{00ab}\x{0086}\x{000a}";
# or read the octets without implicit decoding from a file, does not matter
Dump $test;
# FLAGS = (PADMY,POK,pPOK)
# PV = 0x8d8520 "\350\253\206\n"\0
$test = decode('UTF-8', $test, Encode::FB_CROAK);
Dump $test;
# FLAGS = (PADMY,POK,pPOK,UTF8)
# PV = 0xc02850 "\350\253\206\n"\0 [UTF8 "\x{8ac6}\n"]
Turns out this was a bug: https://rt.perl.org/rt3//Public/Bug/Display.html?id=80190.
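For comparison, here is a minimal sketch of the behaviour the question expected, using only Encode::decode. It starts from the seven octets shown in the question's first "Hex:" dump (an assumption about what the file contained) and decodes them twice.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The seven octets from the question's first "Hex:" dump.
my $octets = "\xc3\xa8\xc2\xab\xc2\x8b\x0a";

# First pass: four characters, U+00E8 U+00AB U+008B U+000A.
my $once = decode('UTF-8', $octets);

# Second pass: those code points are treated as octets again,
# giving two characters, U+8ACB U+000A.
my $twice = decode('UTF-8', $once);

printf "%d %d %d\n", length($octets), length($once), length($twice);   # 7 4 2
printf "U+%04X\n", ord($twice);                                        # U+8ACB
```

The reported lengths of 7, 4 and 2 match the first listing in the question, where Encode::decode was called twice.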