Chilkat encryption doesn't work as expected - perl

I was trying to test file encryption using the chilkat functionality. Based on code found on this example page, I replaced the last part with this:
# Encrypt a string...
# The input string is 44 ANSI characters (i.e. 44 bytes), so
# the output should be 48 bytes (a multiple of 16).
# Because the output is a hex string, it should
# be 96 characters long (2 chars per byte).
my $input = "sample.pdf";
# create file handle for the pdf file
open my $fh, '<', $input or die $!;
binmode ($fh);
# the output should be sample.pdf.enc.dec
open my $ffh, '>', "$input.enc.dec" or die $!;
binmode $ffh;
my $encStr;
# read 16 bytes at a time
while (read($fh,my $block,16)) {
# encrypt the 16 bytes block using encryptStringEnc sub provided by chilkat
$encStr = $crypt->encryptStringENC($block);
# Now decrypt:
# decrypt the encrypted block
my $decStr = $crypt->decryptStringENC($encStr);
# print it in the sample.pdf.enc.dec file
print $ffh $decStr;
}
close $fh;
close $ffh;
Disclaimer:
I know the CBC mode is not recommended for file encryption because if one block is lost, the other blocks are lost too.
The output file is corrupted, and when I look at the two files with Beyond Compare, there are chunks of the file which match and chunks which don't. What am I doing wrong?

You're trying to use character string encryption (encryptStringENC(), decryptStringENC()) for what is, at least partly, a binary file.
This worked for me:
my $input = "sample.pdf";
# create file handle for the pdf file
open my $fh, '<', $input or die $!;
binmode $fh;
# the output should be sample.pdf.enc.dec
open my $ffh, '>', "$input.enc.dec" or die $!;
binmode $ffh;
my $inData = chilkat::CkByteData->new;
my $encData = chilkat::CkByteData->new;
my $outData = chilkat::CkByteData->new;
# read 16 bytes at a time
while ( my $len = read( $fh, my $block, 16 ) ) {
$inData->clear;
$inData->append2( $block, $len );
$crypt->EncryptBytes( $inData, $encData );
$crypt->DecryptBytes( $encData, $outData );
print $ffh $outData->getData;
}
close $fh;
close $ffh;
You're likely better off perusing the Chilkat site further, though; there are code samples for binary data.

I'm going to write and post a link to a sample that is much better than the examples posted here. The examples posted here are not quite correct. There are two important Chilkat Crypt2 properties that one needs to be aware of: FirstChunk and LastChunk. By default, both of these properties are true (or the value 1 in Perl). This means that for a given call to encrypt/decrypt, such as EncryptBytes, DecryptBytes, etc. it assumes the entire amount of data was passed. For CBC mode, this is important because the IV is used for the first chunk, and for the last chunk, the output is padded to the block size of the algorithm according to the value of the PaddingScheme property.
One can instead feed the input data to the encryptor chunk-by-chunk by doing the following:
For the 1st chunk, set FirstChunk=1, LastChunk=0.
For middle chunks, set FirstChunk=0, LastChunk=0.
For the final chunk (even if a 0-byte final chunk), set FirstChunk=0, LastChunk=1. This causes a final padded output block to be emitted.
When passing chunks using FirstChunk/LastChunk, one doesn't need to worry about passing chunks matching the block size of the algorithm. If a partial block is passed in, or if the bytes are not an exact multiple of the block size (16 bytes for AES), then Chilkat will buffer the input and the partial block will be added to the data passed in the next chunk. For example:
FirstChunk=1, LastChunk=0, pass in 23 bytes, output is 16 bytes, 7 bytes buffered.
FirstChunk=0, LastChunk=0, pass in 23 bytes, output is 16 bytes (46 total in, 32 out), 14 bytes buffered.
FirstChunk=0, LastChunk=1, pass in 5 bytes, output is 32 bytes (14 buffered bytes + 5 more = 19 bytes; that is one full block (16 bytes) plus a 3-byte remainder, which is padded to 16, so the output is 32 bytes and the CBC stream is ended).
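That buffering arithmetic is easy to get wrong, so here is a plain-Perl simulation of just the byte counts (no real encryption happens here; it assumes PKCS#7-style padding, where data that already fills a whole block still gains one full padding block):

```perl
use strict;
use warnings;

# Simulate how a chunked CBC encryptor with a 16-byte block cipher
# buffers partial blocks between calls.
my $block    = 16;
my $buffered = 0;

sub feed {
    my ($n, $last_chunk) = @_;
    my $total = $buffered + $n;
    my $out;
    if ($last_chunk) {
        # PKCS#7-style: pad up to the next full block; an exact
        # multiple of the block size still emits a full padding block
        $out      = (int($total / $block) + 1) * $block;
        $buffered = 0;
    }
    else {
        # only whole blocks are emitted; the remainder is buffered
        $out      = int($total / $block) * $block;
        $buffered = $total % $block;
    }
    return $out;
}

print feed(23, 0), "\n";   # 16 bytes out, 7 buffered
print feed(23, 0), "\n";   # 16 bytes out, 14 buffered
print feed(5, 1), "\n";    # 14 buffered + 5 = 19 -> padded to 32 out
```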

This example demonstrates using FirstChunk/LastChunk. Here's the example: https://www.example-code.com/perl/encrypt_file_chunks_cbc.asp


Perl - How to calculate CRC16 of N bytes from array M-dimensional (with N<M) using Digest::CRC

I need to calculate a CRC16 of N bytes (5 in the example, for simplicity) extracted from a binary file of size M (a couple of KB, not really relevant here).
printf "offset\tvalue\tcrc16\n";
# Read N bytes from file and copy in the container
for my $counter (0 .. 5 - 1)
{
    my $oneByte;
    read(FH, $oneByte, 1) or die "Error reading $inFile!";
    my $ctx2 = Digest::CRC->new( type => 'crc16' );
    my $digest2 = ($ctx2->add($oneByte))->hexdigest;
    # PRINT for debugging
    printf "0x%04X\t0x%02X\t0x", $counter, ord $oneByte;
    print $digest2, "\n";
}
Considering the binary input (bytes 0x49 0x34 0x49 0x31 0x31), I obtain a separate CRC16 for each byte.
The script is performing byte-by-byte CRC16 (each value is correct on its own), but I need the CRC16 of the full binary stream of 5 bytes (the expected value is 0x6CD6).
Where am I wrong in the script?
Calling hexdigest or digest or b64digest clears the buffer and begins the next digest from scratch. (If you were computing digests of several files/streams, you wouldn't want the data from one stream to affect the digest of a separate stream).
So wait until the stream is completely read to call digest
... {
...
$ctx2->add($oneByte);
}
print "digest = ", $ctx2->hexdigest, "\n";
Or, to help in debugging, save the stream so far and re-digest it after each new byte:
my $manyBytes = "";
... {
...
$manyBytes .= $oneByte;
$digest2 = $ctx2->add($manyBytes)->hexdigest;
...
}
You can use ->add, passing the whole string at once, chunk by chunk, or character by character.
$ perl -M5.010 -MDigest::CRC -e'
my $d = Digest::CRC->new( type => "crc16" );
$d->add("\x49\x34\x49\x31\x31");
say $d->hexdigest;
'
6cd6
$ perl -M5.010 -MDigest::CRC -e'
my $d = Digest::CRC->new( type => "crc16" );
$d->add($_) for "\x49", "\x34", "\x49", "\x31", "\x31";
say $d->hexdigest;
'
6cd6
As shown, use a single object, and add every byte before calling ->digest (etc) as this resets the process.
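The accumulation can also be sketched in pure Perl, without creating a new digest object per chunk. The sub below implements CRC-16/ARC (reflected polynomial 0xA001, zero initial value), which I believe is the variant Digest::CRC labels crc16; treat that mapping as an assumption. The point it demonstrates is that feeding the stream in chunks yields the same CRC as a single pass:

```perl
use strict;
use warnings;

# CRC-16/ARC: reflected poly 0x8005 (0xA001 reflected), init 0, no xorout.
# Passing the running CRC back in lets you accumulate chunk by chunk.
sub crc16_add {
    my ($crc, $data) = @_;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xA001) : ($crc >> 1);
        }
    }
    return $crc;
}

my $whole = crc16_add(0, "\x49\x34\x49\x31\x31");

my $incremental = 0;
$incremental = crc16_add($incremental, $_) for "\x49\x34", "\x49", "\x31\x31";

printf "%04x %04x\n", $whole, $incremental;   # the two must match
```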

Can someone explain this loop to me?

I have the following Perl code. I know what the end result is: if I run it and pass in an x9.37 file, it will spit out each field of text. That's great, but I am trying to port this to another language, and I can't read Perl at all. Could someone turn this into some form of pseudocode? (I don't need working Java; I can write that part.) I just need someone to explain what is going on in the Perl below!
#!/usr/bin/perl -w
use strict;
use Encode;
my $tiff_flag = 0;
my $count = 0;
open(FILE,'<',$ARGV[0]) or die 'Error opening input file';
binmode(FILE) or die 'Error setting binary mode on input file';
while (read (FILE,$_,4)) {
    my $rec_len = unpack("N",$_);
    die "Bad record length: $rec_len" unless ($rec_len > 0);
    read (FILE,$_,$rec_len);
    if (substr($_,0,2) eq "\xF5\xF2") {
        $_ = substr($_,0,117);
    }
    print decode ('cp1047', $_) . "\n";
}
close FILE;
read (FILE,$_,4) : read 4 bytes from FILE input stream and load into the variable $_
$rec_len = unpack("N",$_): interpret the first 4 bytes of the variable $_ as an unsigned 32-bit integer in big-endian order, assign to the variable $rec_len
read (FILE,$_,$rec_len): read $rec_len bytes from FILE stream into variable $_
substr($_,0,2): the first two characters of the variable $_
"\xF5\xF2": a two-character string consisting of the bytes 245 and 242
$_ = substr($_,0,117): set $_ to the first 117 characters of $_
print decode ('cp1047', $_): interpret the contents of $_ with "code page 1047", i.e. EBCDIC, and output to standard output (decode comes from the Encode module loaded at the top)
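The length-prefix step can be checked in isolation. For example, a 4-byte big-endian prefix of 0x0000012C decodes to a record length of 300:

```perl
use strict;
use warnings;

# "N" = unsigned 32-bit integer in big-endian ("network") order,
# the format x9.37 uses for its 4-byte record-length prefix
my $rec_len = unpack 'N', "\x00\x00\x01\x2C";
print "$rec_len\n";   # 0x0000012C == 300
```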
-w is the old way of enabling warnings.
my declares a lexically scoped variable.
open with < opens a file for reading; the filename is taken from the @ARGV array, i.e. the program's arguments. FILE is the file handle associated with the file.
read reads four bytes into the $_ variable. unpack interprets them as a big-endian unsigned 32-bit integer (so the following condition can fail only when it's 0).
The next read reads that many bytes to $_ again. substr extracts a substring, and if the first two bytes there are "\xf5\xf2", it shortens the string to the first 117 bytes. It then converts the string to the code page 1047.

PERL: Jumping to lines in a huge text file

I have a very large text file (~4 GB).
It has the following structure:
S=1
3 lines of metadata of block where S=1
a number of lines of data of this block
S=2
3 lines of metadata of block where S=2
a number of lines of data of this block
S=4
3 lines of metadata of block where S=4
a number of lines of data of this block
etc.
I am writing a Perl program that reads another file;
for each line of that file (each must contain a number),
it searches the huge file for an S-value equal to that number minus 1,
and then analyzes the lines of data in the block belonging to that S-value.
The problem is, the text file is HUGE, so processing each line with a
foreach $line {...} loop
is very slow. As the S-values are strictly increasing, is there any method to jump to the line with the required S-value?
are there any methods to jump to a particular line of the required S-value?
Yes, if the file does not change then create an index. This requires reading the file in its entirety once and noting the positions of all the S=# lines using tell. Store it in a DBM file with the key being the number and the value being the byte position in the file. Then you can use seek to jump to that point in the file and read from there.
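A minimal sketch of that index idea follows (the file layout and names are made up for the demo; a real run would store %offset in a DBM file so the full scan happens only once):

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Build a tiny demo file in the question's block format.
my ($tmp, $file) = tempfile(UNLINK => 1);
print $tmp "S=$_\nmeta1\nmeta2\nmeta3\ndata for block $_\n" for 1, 2, 4;
close $tmp;

# Pass 1: record the byte offset of every "S=#" line using tell.
my %offset;
open my $in, '<', $file or die $!;
while (1) {
    my $pos  = tell $in;           # position BEFORE reading the line
    my $line = <$in>;
    last unless defined $line;
    $offset{$1} = $pos if $line =~ /^S=(\d+)/;
}

# Later: jump straight to block S=4 without rereading the file.
seek $in, $offset{4}, SEEK_SET;
my $header = <$in>;
print $header;                     # "S=4"
```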
But if you're going to do that, you're better off exporting the data into a proper database such as SQLite. Write a program to insert the data into the database and add normal SQL indexes. This will probably be simpler than writing your own index. Then you can query the data efficiently with normal SQL, and make complex queries. If the file changes you can either redo the export, or use normal insert and update SQL statements to update the database. And it will be easy to work with for anyone who knows SQL, as opposed to a bunch of custom indexing and search code.
I know the OP has already accepted an answer, but a method that's served me well is to slurp the file into an array, based on changing the "record separator" ($/).
If you do something like this (not tested, but this should be close):
$/ = "S=";
my @records = <$fh>;
print $records[4];
The output should be the entire fifth record (the array starts at 0, but your data starts at 1), starting with the record number (5) on a line by itself (you might need to strip that out later), followed by all the remaining lines in that record.
It's very simple and fast, although it is a memory pig...
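Here is a tested version of that idea on an in-memory sample. Two details matter: chomp must run while $/ is still "S=", and the empty first record (the text before the first separator) needs discarding:

```perl
use strict;
use warnings;

# Made-up sample in the question's block format
my $sample = "S=1\nmeta\ndata one\nS=2\nmeta\ndata two\nS=4\nmeta\ndata four\n";

my @records;
{
    local $/ = "S=";                 # records now end at the next "S="
    open my $fh, '<', \$sample or die $!;
    @records = <$fh>;
    chomp @records;                  # strips the trailing "S=" separator
}
shift @records if @records and $records[0] eq '';   # text before first "S="

print scalar(@records), "\n";        # 3 records
print $records[1];                   # number, metadata, and data of block 2
```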
If the blocks of text are of the same length (in bytes or characters) you can calculate the position of the needed S-value in the file and seek there, then read. Otherwise, in principle you need to read lines to find the S value.
However, if there are only a few S-values to find you can estimate the needed position and seek there, then read enough to capture an S-value. Then analyze what you read to see how far off you are, and either seek again or read lines with <> to get to the S-value.
use warnings;
use strict;
use feature 'say';
use Fcntl qw(:seek);

my ($file, $s_target) = @ARGV;
die "Usage: $0 filename\n" if not $file or not -f $file;
$s_target //= 5;  # default: S=5

open my $fh, '<', $file or die $!;

my $est_text_len = 1024;
my $jump_by = $est_text_len * $s_target;  # to seek forward in file
my ($buff, $found);

seek $fh, $jump_by, SEEK_CUR;  # get in the vicinity

while (1) {
    my $rd = read $fh, $buff, $est_text_len;
    warn "error reading: $!" if not defined $rd;
    last if $rd == 0;

    while ($buff =~ /S=([0-9]+)/g) {
        my $s_val = $1;
        # Analyze $s_val and $buff:
        # (1) if overshot $s_target adjust $jump_by and seek back
        # (2) if in front of $s_target read with <> to get to it
        # (3) if $s_target is in $buff extract needed text
        if ($s_val == $s_target) {
            say "--> Found S=$s_val at pos ", pos $buff, " in buffer";
            seek $fh, - $est_text_len + pos($buff) + 1, SEEK_CUR;
            while (<$fh>) {
                last if /S=[0-9]+/;  # next block
                print $_;
            }
            $found = 1;
            last;
        }
    }
    last if $found;
}
Tested with your sample, enlarged and cleaned up (I changed the S=n occurrences inside the data lines, since they are identical to the block headers the regex searches for!), with $est_text_len and $jump_by set to 100 and 20.
This is a sketch. A full implementation needs to negotiate over and under seeking as outlined in comments in code. If text-block sizes don't vary much it can get in front of the needed S-value in two seek-and-reads, and then read with <> or use regex as in the example.
Some comments
The "analysis" sketched above needs to be done carefully. For one, a buffer may contain multiple S-value lines. Also note that the code keeps reading if an S-value isn't in the buffer.
Once you are close enough and in front of $s_target read lines by <> to get to it.
A read may return less than requested, so you should really put it in a loop; there are recent posts showing that.
Change to sysread from read for efficiency. In that case use sysseek, and don't mix with <> (which is buffered).
The code above presumes one S-value to find; adjust for more. It absolutely assumes that S-values are sorted.
This is clearly far more complex than reading lines but it does run much faster, with a very large file and only a few S-values to find. If there are many values then this may not help.
The foreach (<$fh>), indicated in the question, would cause the whole file to be read first (to build the list for foreach to go through); use while (<$fh>) instead.
If the file doesn't change (or the same file need be searched many times) you can first process it once to build an index of exact positions of S-values. Thanks to Danny_ds for a comment.
Binary search of a sorted list is an O(log N) operation. Something like this using seek:
open my $fh, '<', $big_file or die $!;
$target = 123_456_789;
$low  = 0;
$high = -s $big_file;
while ($high - $low > 0.01 * -s $big_file) {
    $mid = int( ($low + $high) / 2 );
    seek $fh, $mid, 0;
    <$fh>;   # discard the (likely partial) line we landed in
    while (<$fh>) {
        if (/^S=(\d+)/) {
            if ($1 < $target) { $low = $mid }
            else              { $high = $mid }
            last;
        }
    }
}
seek $fh, $low, 0;
while (<$fh>) {
    # now you are searching through the 1% of the file that contains
    # your target S
}
Sort the numbers in the second file. Now you can proceed thru the huge file in order, processing each S-value as needed.

Unpack fields from IBM data file

I have an EBCDIC-encoded data file from an IBM mainframe source that needs to be parsed and converted to ASCII. I was able to convert it by reading it byte by byte in hexadecimal and looking up the corresponding ASCII matches.
My issue is that in the EBCDIC-encoded file there are 30 bytes that are packed and need to be unpacked to get the actual values. I have tried ways using PHP's pack/unpack functions as well as Perl, but had no luck. The value that I am getting doesn't seem to be the exact value that I am looking for. I tried unpacking it with C c H h N.
Assuming the file holds EBCDIC-encoded data:
the packed fields are at positions 635-664, 30 bytes long:
data1 = 9 bytes
data2 = 9 bytes
data3 = 3 bytes
data4 = 3 bytes
data5 = 3 bytes
data6 = 3 bytes
PHP:
$datafile = fopen("/var/www/data/datafile", "rb");
$regebcdicdata = fread($datafile, 634);
$packfields = fread($datafile, 30);
$arr= unpack('c9data1/c9bdata2/c3data3/C3data4/C3data5/C3data6',$packfields);
print_r($arr);
PERL:
open my $fh, '<:raw', '/var/www/html/PERL/test';
my $bytes_read = read $fh, my $bytes, 634;
my $bytes_read2 = read $fh, my $bytes2, 30;
my ($data1,$data2,$data3,$data4,$data5,$data6) = unpack 'C9 C9 C3 C3 C3 C3', $bytes2;
UPDATE:
Already found a solution. Those 30 bytes were packed in a specified format, so I just unpacked them using PHP's unpack function.
For the EBCDIC conversion, I read the file per byte, get the hexadecimal value using the bin2hex() function, find the matching ASCII hexadecimal value, and get the ASCII representation (so the user sees it in readable form) using the chr() function.
I used conversion table at https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_ASCII_and_EBCDIC_Conversion_Tables.html.
I can't possibly help you to unpack those thirty bytes without knowing how they have been packed. Surely you must have some idea?
As for the regular EBCDIC text, you need to establish exactly which code page your document uses, and then you can simply use Perl IO to decode it.
Suppose you are dealing with code page 37; then you can open your file like this
open my $fh, '<:encoding(cp37)', 'ebcdic_file' or die $!;
and then you can read the data as normal. It will be retrieved as Unicode characters.
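Encode ships with the EBCDIC code pages (via Encode::EBCDIC), so the decoding is easy to sanity-check; in code page 37, for example, the bytes C8 85 93 93 96 spell "Hello":

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# EBCDIC code page 37: 0xC8 0x85 0x93 0x93 0x96 decodes to "Hello"
my $text = decode('cp37', "\xC8\x85\x93\x93\x96");
print "$text\n";

# and back again, to confirm the round trip
my $bytes = encode('cp37', $text);
print unpack('H*', $bytes), "\n";   # c885939396
```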
This is a wild guess, as I know neither which EBCDIC code page you are using nor how the thirty bytes are packed, but there is a slim chance that it will do what you want.
Please try running this program and tell us the results
use strict;
use warnings 'all';
use feature 'say';
my @data = do {
    open my $fh, '<:encoding(cp37)', '/var/www/html/PERL/test' or die $!;
    local $/;
    my $data = <$fh>;
    unpack '@634 A9 A9 A3 A3 A3 A3', $data;
};

say for @data;

How to use file as a negative mask for reading another file, in Perl?

I want to extract only the junk data from the free space of a raw (EXT4) partition image.
So I got this idea: zero out the free space, and then use the result as a mask.
I have raw partition image (14GB) containing data and free space and the same raw partition image, with free space zeroed.
I want to perform the following operation between these two files in Perl, for each byte, in order to obtain a processed raw partition image that contains only the junk data from the free space.
RPM - raw partition image
RPMz - raw partition image with free space zeroed
RPMp - raw partition image processed, will contain only junk data from free space
for each byte: RPM & !RPMz => RPMp
Can someone help me out with a Perl script or a starting point for this?
This is what I wrote for inverting the bytes, in order to obtain !RPMz. But it's slow, and with 100MB chunks I run out of memory. I need some help.
use strict;
use warnings;
use bignum;
my $buffer = "";
my $path1="F:/data-lost-workspace/partition-for-zerofree/mmcblk0p12.raw";
my $path2="F:/data-lost-workspace/partition-for-zerofree/mmcblk0p12_invert.raw";
open(FILE_IN, "<$path1");
binmode(FILE_IN);
my $offset=0;
my $remaining = -s $path1;
my $length=1024*1024*100;
my $index=1;
unlink $path2;
while($remaining>0)
{
    my $line=read(FILE_IN, $buffer, $length);
    print $index." ".$line."\r\n";
    $index++;
    $remaining=$remaining-$length;
    my $buffer_invert=();
    my @c = split('', $buffer);
    for(my $i=0;$i<$length;$i++)
    {
        if(ord($c[$i])==0x0)
        {
            $c[$i]=chr(0xFF);
        }
        else
        {
            $c[$i]=chr(0x00);
        }
    }
    $buffer_invert=join('', @c);
    open(FILE_OUT, ">>$path2");
    binmode(FILE_OUT);
    print FILE_OUT $buffer_invert;
    close(FILE_OUT);
}
close(FILE_IN);
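For what it's worth, the per-character loop isn't needed at all: a tr/// range builds the bytewise boolean NOT in one pass, and Perl's string & operator ANDs two buffers byte by byte. A sketch on in-memory demo data follows (with the real 14 GB images, open the files with '<:raw' instead of scalar references and loop over fixed-size chunks exactly as below):

```perl
use strict;
use warnings;

# Demo data standing in for the images; the real files would be
# opened '<:raw' / '>:raw' and processed in bounded-size chunks.
my $rpm_data  = "\x41\x42\x00\xFF\x43";   # RPM:  full image
my $rpmz_data = "\x41\x42\x00\x00\x00";   # RPMz: free space zeroed

open my $rpm,  '<:raw', \$rpm_data  or die $!;
open my $rpmz, '<:raw', \$rpmz_data or die $!;
my $rpmp = '';
open my $out,  '>:raw', \$rpmp      or die $!;

my $chunk = 4 * 1024 * 1024;   # 4 MB at a time keeps memory bounded
while (read($rpm, my $raw, $chunk)) {
    read($rpmz, my $zeroed, $chunk);
    # bytewise boolean NOT of RPMz: 0x00 -> 0xFF, everything else -> 0x00
    # (tr pads the short replacement list with its last character)
    (my $mask = $zeroed) =~ tr/\x00-\xFF/\xFF\x00/;
    print $out $raw & $mask;   # string & ANDs corresponding bytes
}
close $out;

print unpack('H*', $rpmp), "\n";   # only bytes zeroed in RPMz survive
```

Only the bytes that RPMz zeroed out (the free-space junk) come through the mask; everything that still holds live data is blanked.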