Stuck on "Cookie "mojolicious" is bigger than 4KiB" - perl

It seems I can't get rid of this "Cookie "myname" is bigger than 4KiB" error
in my startup sub:
my $app = $self->sessions(Mojolicious::Sessions->new);
my $sessions = $app->sessions;
$sessions->cookie_name('cookie_name');
# $sessions->max_cookie_size(4096*2); <= that's what I'm looking for
Any ideas on how to get around this?
Is there a way to increase and check the cookie size?
Help would be appreciated.

RFC 6265 suggests (SHOULD) that browsers allow at least 4096 bytes for the entire collection of cookie data (not just the value):
At least 4096 bytes per cookie (as measured by the sum of the
length of the cookie's name, value, and attributes).
That's the number that the popular browsers support. If they get a bigger cookie, they likely ignore it.
Note that if you set a large cookie, the client has to send it back with every request. That may force a request, even one that doesn't need the cookie, to be split over several packets, and so on. Small wins matter, so the RFC goes on to say:
Servers SHOULD use as few and as small cookies as possible to avoid
reaching these implementation limits and to minimize network
bandwidth due to the Cookie header being included in every request.
Beyond that, realize that storing a bunch of information in the cookie means that the client side has the opportunity to change that information in some way that allows them to do things you don't intend.
On the technical side, there are various things that you can do to shorten a string by using your 4096 octets more efficiently. Things like uuencode (7-bit encoding), Perl's pack for numbers, and other binary techniques (Sereal, msgpack, whatever) can squeeze more info in.
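For instance, here is a hedged sketch of the pack idea: a few integers stored as packed binary and then base64-encoded take far fewer octets than the same values written out as text (the field names and values are made up for illustration):
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
# Hypothetical session data: a user id, a timestamp, and a small flag set.
my ($uid, $ts, $flags) = (123456789, time(), 0b1010);
my $as_text   = "uid=$uid&ts=$ts&flags=$flags";   # human-readable form
my $as_binary = pack 'N N C', $uid, $ts, $flags;  # 4 + 4 + 1 = 9 bytes
my $packed    = encode_base64($as_binary, '');    # cookie-safe encoding
printf "text: %d octets, packed+base64: %d octets\n",
    length($as_text), length($packed);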

Related

PlayWS calculate the size of a http call without consuming the stream

I'm currently using the PlayWS HTTP client, which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream and I can't use it anymore. Any way around this?
I think there are two different aspects related to the question.
You want to know the size of the server response in advance in order to prepare a buffer. Unfortunately there is no guaranteed way to do this. The HTTP/1.1 spec explicitly allows a transfer mode in which the server does not know the size of the response in advance: chunked transfer encoding. See also this quote from 3.3.1. Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance.
Section 3.3.3. Message Body Length specifies how the length of a message body is determined; besides the aforementioned chunked transfer encoding, it also contains this rather unhelpful case:
Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.
This is kept for backward compatibility and its use is discouraged, but it is still allowed.
Still, in many real-world scenarios you can use the Content-Length header field that the server may return. However, there is a catch here as well: if gzip Content-Encoding is used, then Content-Length will contain the size of the compressed body.
To sum up: in the general case you can't get the size of the message body before you have fully received the server response, i.e., in code terms, performed a blocking call on the response. You may try to use Content-Length, and it may or may not help in your specific case.
You already have a fully downloaded response (or you are OK with blocking on your StreamedResponse) and you want to process it by first getting the size and only then processing the actual data. In that case you can first use the getBodyAsBytes method, which returns an IndexedSeq[Byte] and thus has a size, and then convert it into a new Source using Source.single, which is exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.
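Purely to illustrate the first point (the question is about PlayWS, but the HTTP mechanics are the same in any client; the URL below is a placeholder), checking whether a length is declared might look like this in Perl with LWP:
use strict;
use warnings;
use LWP::UserAgent;
my $ua  = LWP::UserAgent->new;
my $res = $ua->head('https://example.com/large-resource');   # placeholder URL
if (defined(my $clen = $res->header('Content-Length'))) {
    my $enc = $res->header('Content-Encoding') // '';
    print $enc =~ /gzip/i
        ? "Content-Length $clen is the compressed size\n"
        : "Body will be $clen octets\n";
}
else {
    print "No Content-Length declared (likely chunked); size unknown in advance\n";
}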

Ensure Completeness of HTTP Messages

I am currently working on an application that is supposed to get a web page and extract information from its content.
As I learned from my research (or as it seems to me at least), there is no ideal way to determine the end of an HTTP message.
Generally, I found two different ways to do so:
Set the O_NONBLOCK flag on the socket and fetch data with recv() in a while loop. Assume that the message is complete and break as soon as a recv() call finds no bytes waiting in the stream.
Rely on the HTTP Content-Length header and determine the end of the message with it.
Neither way seems completely safe to me. Solution (1) could break out of the recv() loop before the message is complete. On the other hand, solution (2) requires the Content-Length header to be set correctly.
What's the best way to proceed in this case? Can I always rely on the Content-Length header to be set?
Let me start here:
Can I always rely on the Content-Length header to be set?
No, you can't. Content-Length is an optional header. However, HTTP messages absolutely must feature a way to determine their body length if they are to be RFC-compliant (cf. RFC 7230, sec. 3.3.3). That being said, get ready to parse chunked encoding whenever a Content-Length isn't specified.
As for your original problem: ensuring the completeness of a message is actually something that should be TCP's job. But as there are complications like message pipelining around, it is best to check for two things in practice:
Have all reads from the network buffer been successful?
Is the number of the received bytes identical to the predicted message length?
Oh, and as @MartinJames noted, non-blocking probably isn't the best idea here.
The end of an HTTP response is defined:
By the final (empty) chunk in case Transfer-Encoding chunked is used.
By reaching the given length if a Content-Length header is given and no chunked transfer encoding is used.
By the end of the TCP connection if neither chunked transfer encoding is used nor Content-Length is given.
In the first two cases you have a well-defined end, so you can verify that the data was fully received. Only in the last case (end of TCP connection) can you not tell whether the connection was closed before all the data was sent. But usually you get either case 1 or case 2.
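A minimal Perl sketch of that decision (not production code: it assumes the header block has already been parsed into a lowercase-keyed %headers hash, that $sock is a connected socket handle, and it ignores chunk extensions and trailers):
use strict;
use warnings;

sub read_body {
    my ($sock, %headers) = @_;
    my $body = '';
    if (($headers{'transfer-encoding'} // '') =~ /chunked/i) {
        # Case 1: read chunks until the final zero-length chunk.
        while (defined(my $size_line = <$sock>)) {
            my ($hex) = $size_line =~ /^([0-9a-fA-F]+)/ or last;
            my $size = hex $hex;
            last if $size == 0;                 # final chunk marks the end
            read $sock, my $chunk, $size;       # real code would loop on short reads
            $body .= $chunk;
            <$sock>;                            # consume the CRLF after the chunk
        }
    }
    elsif (defined $headers{'content-length'}) {
        # Case 2: read exactly Content-Length octets.
        read $sock, $body, $headers{'content-length'};
    }
    else {
        # Case 3: read until the server closes the connection.
        local $/;
        $body = <$sock>;
    }
    return $body;
}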
To make your life easier, you might want to provide a
Connection: close
header when making the HTTP request - then the web server will close the connection after giving you the full page requested, and you will not have to deal with chunks.
This is only a viable option if you are interested in this single page and will not request additional resources (script files, images, etc.) - in the latter case it would be a very inefficient solution for both your app and the server.
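For illustration, a raw-socket sketch of that approach (host and path are placeholders):
use strict;
use warnings;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'example.com',      # placeholder host
    PeerPort => 80,
) or die "connect failed: $!";

print $sock "GET / HTTP/1.1\r\n",
            "Host: example.com\r\n",
            "Connection: close\r\n",
            "\r\n";

local $/;                           # slurp mode
my $response = <$sock>;             # headers and body, read until the server closes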

security code permutations; security methodology

I'm writing a Perl email subscription management app, based on a url containing two keycode parameters. At the time of subscription, a script will create two keycodes for each subscriber that are unique in the database (see below for script sample).
The codes will be created using Digest::SHA qw(sha256_hex). My understanding is that one way to ensure that codes are not duplicated in the database is to include a unique prefix in the raw data to be hashed (see below, also).
Once a person is subscribed, I then have a database record of a person with two "code" fields, each containing values that are unique in the database. Each value is a string of alphanumeric characters that is 64 characters long, using lower case (only?) a-z and 0-9, e.g:
code1: ae7518b42b0514d69ae4e87d7d9f888ad268f4a398e7b88cbaf1dc2542858ba3
code2: 71723cf0aecd27c6bbf73ec5edfdc6ac912f648683470bd31debb1a4fbe429e8
These codes are sent in newsletter emails as parameters in a subscription management url. Thus, the person doesn't have to log in to manage their subscription; but simply click the url.
My question is:
If a subscriber tried to guess the values of the pair of codes for another person, how many possible combinations would there be to not only guess code1 correctly, but also guess code2? I suppose, like the lottery, a person could get lucky and just guess both; but I want to understand the odds against that, and its impact on security.
If the combo is guessed, the person would gain access to the database; thus, I'm trying to determine the level of security this method provides, compared to a more normal method of a username and 8 character password (which generically speaking could be considered two key codes themselves, but much shorter than the 64 characters above.)
I also welcome any feedback about the overall security of this method. I've noticed that many, many email newsletters seem to use similar keycodes, and don't require logging in to unsubscribe, etc. To me, the primary issue (besides ease of use) is that a person should not be able to unsubscribe someone else.
Thanks!
Peter (see below for the code generation snippet)
Note that each ID and email would be unique.
The password is a 'system' password, and would be the same for each person.
#
#!/usr/bin/perl
use Digest::SHA qw(sha256_hex);
$clear = `clear`;
print $clear;
srand;
$id = 1;
$email = 'someone@domain.com';
$tag = ':!:';
$password = 'z9.4!l3tv+qe.p9#';
$rand_str = '9' x 15;
$rand_num = int(rand( $rand_str ));
$time = time() * $id;
$key_data = $id . $tag . $password . $rand_num . $time;
$key_code = sha256_hex($key_data);
$email_data = $email . $tag . $password . $time . $rand_num;
$email_code = sha256_hex($email_data);
print qq~
ID: $id
EMAIL: $email
KEY_DATA: $key_data
KEY_CODE: $key_code
EMAIL_DATA: $email_data
EMAIL_CODE: $email_code
~;
exit;
#
This seems like a lot of complexity to guard against a 3rd party unsubscribing someone. Why not generate a random code for each user, and store it in the database alongside the username? The method you are using creates a long string of digits, but there isn't actually much randomness in it. SHA is a deterministic algorithm that thoroughly scrambles bits, but it doesn't add entropy.
For an N bit truly random number, an attacker will only have a 1/(2^N) chance of guessing it right each time. Even with a small amount of entropy, say, 64 bits, your server should be throttling unsubscribe requests from the attacking IP address long before the attacker gets significant odds of succeeding. They'd have better luck guessing the user's email password, or intercepting the unencrypted email in transit.
That is why the unsubscribe codes are usually short. There's no need for a long code, and a long URL is more likely to be truncated or mistyped.
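A hedged sketch of the suggestion above (the DBI handle, table, and column names are made up for illustration; it also assumes a Unix-style /dev/urandom):
use strict;
use warnings;
use DBI;

sub assign_unsub_code {
    my ($dbh, $user_id) = @_;
    # 16 random bytes = 128 bits of entropy, hex-encoded to 32 characters.
    open my $rnd, '<:raw', '/dev/urandom' or die "urandom: $!";
    read $rnd, my $bytes, 16;
    close $rnd;
    my $code = unpack 'H*', $bytes;
    $dbh->do('UPDATE subscribers SET unsub_code = ? WHERE id = ?',
             undef, $code, $user_id);
    return $code;
}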
If you're asking how difficult it would be to "guess" two 256-bit "numbers", getting the one specific person you want to hack, that'd be 2^512:1 against. If there are, say, 1000 users in the database, and the attacker doesn't care which one s/he gets, that's 2^512:1000 against - not a significant change in likelihood.
However, it's much simpler than that if your attacker is either in control of (hacked in is close enough) one of the mail servers from your machine to the user's machine, or in control of any of the routers along the way, since your email goes out in plain text. A well-timed hacker who saw the email packet go through would be able to see the URL you've embedded no matter how many bits it is.
As with many security issues, it's a matter of how much effort to put in vs the payoff. Passwords are nice in that users expect them, so it's not a significant barrier to send out URLs that then need a password to enter. If your URL were even just one SHA key combined with the password challenge, this would nearly eliminate a man-in-the-middle attack on your emails. Up to you whether that's worth it. Cheap, convenient, secure. Pick one. :-)
More effort would be to gpg-encrypt your email with the client's public key (not your private one). The obvious downside is that gpg (or pgp) is apparently so little used that average users are unlikely to have it set up. Again, this would entirely eliminate MITM attacks, and wouldn't need a password, as it basically uses the client-side gpg private key password.
You've essentially got 1e15 different possible hashes generated for a given user email id (once combined with other information that could be guessed). You might as well just supply a hex-encoded random number of the same length and require the 'unsubscribe' link to include the email address or user id to be unsubscribed.
I doubt anyone would go to the lengths required to guess a number from 1 to 1e15, especially if you rate limit unsubscribe requests, send a 'thanks, someone unsubscribed you' email if anyone is unsubscribed, and put a new resubscription link into that.
A quick way to generate the random string is:
my $hex = join '', map { unpack 'H*', chr(rand(256)) } 1..8;
print $hex, "\n";
b4d4bfb26fddf220
(This gives you 2^64, or about 2*10^19 combinations. Or 'plenty' if you rate limit.)

Safe non-tamperable URL component in Perl using symmetric encryption?

OK, I'm probably just having a bad Monday, but I have the following need and I'm seeing lots of partial solutions but I'm sure I'm not the first person to need this, so I'm wondering if I'm missing the obvious.
$client has 50 to 500 bytes worth of binary data that must be inserted into the middle of a URL and roundtrip to their customer's browser. Since it's part of the URL, we're up against the 1K "theoretical" limit of a GET URL. Also, $client doesn't want their customer decoding the data, or tampering with it without detection. $client would also prefer not to store anything server-side, so this must be completely standalone. Must be Perl code, and fast, in both encoding and decoding.
I think the last step can be base64. But what are the steps for encryption and hashing that make the most sense?
I have some code in a Cat App that uses Crypt::Util to encode/decode a user's email address for an email verification link.
I set up a Crypt::Util model using Catalyst::Model::Adaptor with a secret key. Then in my Controller I have the following logic on the sending side:
my $cu = $c->model('CryptUtil');
my $token = $cu->encode_string_uri_base64( $cu->encode_string( $user->email ) );
my $url = $c->uri_for( $self->action_for('verify'), $token );
I send this link to the $user->email and when it is clicked on I use the following.
my $cu = $c->model('CryptUtil');
if ( my $id = $cu->decode_string( $cu->decode_string_uri_base64($token) ) ) {
# handle valid link
} else {
# invalid link
}
This is basically what edanite just suggested in another answer. You'll just need to make sure that, whatever data you use to form the token, the final $url doesn't exceed your arbitrary limit.
Create a secret key and store it on the server. If there are multiple servers and requests aren't guaranteed to come back to the same server, you'll need to use the same key on every server. This key should be rotated periodically.
If you encrypt the data in CBC (Cipher Block Chaining) mode (See the Crypt::CBC module), the overhead of encryption is at most two blocks (one for the IV and one for padding). 128 bit (i.e. 16 byte) blocks are common, but not universal. I recommend using AES (aka Rijndael) as the block cipher.
You need to authenticate the data to ensure it hasn't been modified. Depending on the security of the application, just hashing the message and including the hash in the plaintext that you encrypt may be good enough. This depends on attackers being unable to change the hash to match the message without knowing the symmetric encryption key. If you're using 128-bit keys for the cipher, use a 256-bit hash like SHA-256 (you can use the Digest module for this). You may also want to include some other things like a timestamp in the data to prevent the request from being repeated multiple times.
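A hedged sketch of that hash-then-encrypt idea, using Crypt::CBC with the Rijndael (AES) cipher and SHA-256 (it assumes Crypt::Rijndael is installed, and key handling is deliberately simplified):
use strict;
use warnings;
use Crypt::CBC;
use Digest::SHA qw(sha256);

my $key    = 'server-side secret key';     # stored only on the server
my $cipher = Crypt::CBC->new(-key => $key, -cipher => 'Rijndael');

sub seal {
    my ($data) = @_;
    # Prepend a 32-byte SHA-256 of the payload, then encrypt both together.
    return $cipher->encrypt(sha256($data) . $data);
}

sub open_sealed {
    my ($blob) = @_;
    my $plain = $cipher->decrypt($blob);
    my ($hash, $data) = (substr($plain, 0, 32), substr($plain, 32));
    return sha256($data) eq $hash ? $data : undef;   # undef means tampered
}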
I see three steps here. First, try compressing the data. With so little data, bzip2 might save you maybe 5-20%. I'd throw in a guard to make sure it doesn't make the data larger. This step may not be worthwhile.
use Compress::Bzip2 qw(:utilities);
$data = memBzip $data;
You could also try reducing the length of any keys and values in the data manually. For example, first_name could be reduced to fname.
Second, encrypt it. Pick your favorite cipher and use Crypt::CBC. Here I use Rijndael because it's good enough for the NSA. You'll want to do benchmarking to find the best balance between performance and security.
use Crypt::CBC;
my $key = "SUPER SEKRET";
my $cipher = Crypt::CBC->new($key, 'Rijndael');
my $encrypted_data = $cipher->encrypt($data);
You'll have to store the key on the server. Putting it in a protected file should be sufficient, securing that file is left as an exercise. When you say you can't store anything on the server I presume this doesn't include the key.
Finally, Base 64 encode it. I would use the modified URL-safe base 64 which uses - and _ instead of + and / saving you from having to spend space URL encoding these characters in the base 64 string. MIME::Base64::URLSafe covers that.
use MIME::Base64::URLSafe;
my $safe_data = urlsafe_b64encode($encrypted_data);
Then stick it onto the URL however you want. Reverse the process for reading it in.
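Reversing the steps might look like this (a sketch, reusing the $cipher from above and assuming the data was bzip2-compressed on the way out):
use MIME::Base64::URLSafe;
use Compress::Bzip2 qw(:utilities);

my $encrypted_data = urlsafe_b64decode($safe_data);
my $data           = memBunzip $cipher->decrypt($encrypted_data);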
You should be safe on size. Encrypting will increase the size of the data, but probably by less than 25%. Base 64 will increase the size of the data by a third (encoding as 2^6 instead of 2^8). This should leave encoding 500 bytes comfortably inside 1K.
How secure does it need to be? Could you just xor the data with a long random string then add an MD5 hash of the whole lot with another secret salt to detect tampering?
I wouldn't use that for banking data, but it'd probably be fine for most web things...
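A hedged sketch of that idea ($pad and $salt are made-up server-side secrets, and the pad must be at least as long as the data):
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $pad  = 'a long random server-side secret at least as long as the data...';
my $salt = 'another secret';

sub obscure {
    my ($data) = @_;
    my $xored = $data ^ substr($pad, 0, length $data);
    return $xored . md5_hex($xored . $salt);          # append the tamper check
}

sub reveal {
    my ($blob) = @_;
    my $mac   = substr($blob, -32);                   # md5_hex is 32 hex chars
    my $xored = substr($blob, 0, -32);
    return undef unless md5_hex($xored . $salt) eq $mac;
    return $xored ^ substr($pad, 0, length $xored);
}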

Compressing HTTP request with LWP, Apache, and mod_deflate

I have a client/server system that performs communication using XML transferred using HTTP requests and responses with the client using Perl's LWP and the server running Perl's CGI.pm through Apache. In addition the stream is encrypted using SSL with certificates for both the server and all clients.
This system works well, except that periodically the client needs to send really large amounts of data. An obvious solution would be to compress the data on the client side, send it over, and decompress it on the server. Rather than implement this myself, I was hoping to use Apache's mod_deflate's "Input Decompression" as described here.
The description warns:
If you evaluate the request body yourself, don't trust the Content-Length header! The Content-Length header reflects the length of the incoming data from the client and not the byte count of the decompressed data stream.
So if I provide a Content-Length value which matches the compressed data size, the data is truncated. This is because mod_deflate decompresses the stream, but CGI.pm only reads to the Content-Length limit.
Alternatively, if I try to outsmart it and override the Content-Length header with the decompressed data size, LWP complains and resets the value to the compressed length, leaving me with the same problem.
Finally, I attempted to hack the part of LWP which does the correction. The original code is:
# Set (or override) Content-Length header
my $clen = $request_headers->header('Content-Length');
if (defined($$content_ref) && length($$content_ref)) {
    $has_content = length($$content_ref);
    if (!defined($clen) || $clen ne $has_content) {
        if (defined $clen) {
            warn "Content-Length header value was wrong, fixed";
            hlist_remove(\@h, 'Content-Length');
        }
        push(@h, 'Content-Length' => $has_content);
    }
}
elsif ($clen) {
    warn "Content-Length set when there is no content, fixed";
    hlist_remove(\@h, 'Content-Length');
}
And I changed the push line to:
push(@h, 'Content-Length' => $clen);
Unfortunately this causes some problem where content (truncated or not) doesn't even get to my CGI script.
Has anyone made this work? I found this which does compression on a file before uploading, but not compressing a generic request.
Although you said you didn't want to do the compression yourself, there are lots of perl modules which will do both sides for you, Compress::Zlib for example.
I have a cheat (with a .NET part of the company) where I get passed the XML as a separate parameter posted in; then I can handle it as if it were a string rather than faffing about with SOAP-like stuff.
I don't think you can change the Content-Length like that. It would confuse Apache, because mod_deflate wouldn't know how much compressed data to read. What about having the client add an X-Uncompressed-Length header, and then use a modified version of CGI.pm that uses X-Uncompressed-Length (if present) instead of Content-Length? (Actually, you probably don't need to modify CGI.pm. Just set $ENV{'CONTENT_LENGTH'} to the appropriate value before initializing the CGI object or calling any CGI functions.)
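A hedged sketch of that idea (the endpoint URL is a placeholder, Apache is assumed to have the mod_deflate input filter enabled for that location, and it relies on CGI.pm reading $ENV{'CONTENT_LENGTH'} only when the object is created):
# Client side (LWP): send a gzip-compressed body and advertise the original size.
use strict;
use warnings;
use LWP::UserAgent;
use Compress::Zlib;                            # provides memGzip

my $xml = '<data>...</data>';                  # the large payload
my $gz  = memGzip($xml);
my $ua  = LWP::UserAgent->new;
my $res = $ua->post(
    'https://example.com/upload',              # placeholder endpoint
    'Content-Type'          => 'application/xml',
    'Content-Encoding'      => 'gzip',
    'X-Uncompressed-Length' => length($xml),
    Content                 => $gz,
);

# Server side (the CGI script): mod_deflate has already inflated the body,
# so point CGI.pm at the real length before it reads STDIN.
#
#   use CGI ();
#   $ENV{'CONTENT_LENGTH'} = $ENV{'HTTP_X_UNCOMPRESSED_LENGTH'}
#       if $ENV{'HTTP_X_UNCOMPRESSED_LENGTH'};
#   my $q = CGI->new;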
Or, use a lower-level module that uses the bucket brigade to tell how much data to read.
I am not sure if I am following what you want, but I have a custom get/post module that I use to do some non-standard stuff. The code below will read in anything sent via POST from STDIN.
read(STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
Instead of using $ENV's value, use yours. I hope this helps, and sorry if it doesn't.