How can I encode a string for HTML? - perl

I'm looking for a simple way to HTML encode a string/object in Perl. The fewer additional packages used the better.

HTML::Entities is your friend here.
use HTML::Entities;
my $encoded = encode_entities( "foo & bar & <baz>" );
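If you want to avoid extra packages entirely, the five reserved characters can be escaped with a single substitution in core Perl. This is a minimal sketch, not the HTML::Entities or HTML::Escape implementation. The html_escape name is hypothetical. Unlike encode_entities, it leaves wide characters alone, so it assumes your page is served as UTF-8.

```perl
use strict;
use warnings;

# Replacement text for each HTML-reserved character.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical helper: escape only the five reserved characters.
# A single pass means the '&' produced by one replacement is never
# escaped a second time.
sub html_escape {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

print html_escape(q{foo & bar & <baz>}), "\n";
# prints: foo &amp; bar &amp; &lt;baz&gt;
```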

When this question was first answered, HTML::Entities was probably the module most people used. It's pure Perl and by default will escape the HTML reserved characters ><'"& and wide characters.
Recently, HTML::Escape showed up. It has both XS and pure-Perl implementations. If you're using the XS version, it's roughly twenty times faster than HTML::Entities in the benchmark below. However, it only escapes ><'"& and has no way to change the defaults. Here's the difference with the XS version:
Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (14.09 usr + 0.01 sys = 14.10 CPU) # 709.22/s (n=10000)
html_escape: 1 wallclock secs ( 0.68 usr + 0.00 sys = 0.68 CPU) # 14705.88/s (n=10000)
And here's the fair fight with pure Perl versions on each side:
Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (13.79 usr + 0.01 sys = 13.80 CPU) # 724.64/s (n=10000)
html_escape: 7 wallclock secs ( 7.57 usr + 0.01 sys = 7.58 CPU) # 1319.26/s (n=10000)
You can get these benchmarks in Surveyor::Benchmark::HTMLEntities. I explain how I distribute benchmarks using Surveyor::App.

Which do you need to encode, a string or an object? If it's just a string, then you should just have to worry about encoding issues such as UTF-8, and CGI::escape will probably do the trick for you. If it's an object, you'll need to serialize it first, which opens up a whole new set of issues, but you might want to consider JSON-encoding it.
PS. CGI::escape is actually imported from CGI::Util, is marked as "internal", and I can't find any recent documentation for it. It also performs URL (percent) encoding rather than HTML escaping, so you should use escapeHTML() instead, as daxim points out in his comment: http://search.cpan.org/perldoc?CGI#AUTOESCAPING_HTML

Related

What is the recommended way in Julia to create a shared module?

After some experimenting and searching, I found two ways of creating a shared module that holds some constant values.
SCHEME A:
# in file sharedconstants.jl:
module sharedconstants
kelvin = 273.15
end
# -------------------------
# in file main.jl:
include("./sharedconstants.jl");
using .sharedconstants
print(sharedconstants.kelvin, "\n");
# -------------------------
SCHEME B:
# in file sharedconstants.jl:
module sharedconstants
kelvin = 273.15
end
# -------------------------
# in file main.jl:
import sharedconstants
print(sharedconstants.kelvin, "\n");
# -------------------------
Scheme B does not always work; when it fails, it throws an error saying that sharedconstants is not found in the current path. Scheme B also requires the module name (sharedconstants) to match the base name of the file. I wonder which of the approaches above is better in terms of compilation and execution. Is there any other approach to do the job? I come from FORTRAN, where I'm used to simply writing use sharedconstants in my code.
For performance reasons, this should be a const (by the way, module names conventionally use CamelCase):
module SharedConstants2
const kelvin = 273.15
end
Writing it this way makes it type-stable, which results in a huge performance difference:
julia> using BenchmarkTools
julia> @btime sharedconstants.kelvin * 3
18.574 ns (1 allocation: 16 bytes)
819.4499999999999
julia> @btime SharedConstants2.kelvin * 3
0.001 ns (0 allocations: 0 bytes)
819.4499999999999
Regarding the question of where to place it, I would recommend creating a Julia package; start reading here: https://pkgdocs.julialang.org/v1/creating-packages/
Finally, you might have a look at the PhysicalConstants.jl package https://github.com/JuliaPhysics/PhysicalConstants.jl

How to generate a good seed

I'm looking for a method to generate a good seed for generating different series of random numbers in processes that start at the same time.
I would like to avoid using one of the math or crypto libraries because I'm picking random numbers very frequently and my CPU resources are very limited.
I found a few examples of setting seeds. I tested them using the following method:
A short program picks 100 random numbers out of 5000 options, so each value has about a 2% chance of being selected in a single run.
Run this program 100 times, so in theory, in a truly random environment, all possible values should be picked at least once.
Count the number of values that were not selected at all.
This is the Perl code I used. In each test I enabled only one of the seed-generation methods:
#!/usr/bin/perl
#$seed=5432;
#$seed=(time ^ $$);
#$seed=($$ ^ unpack "%L*", `ps axww | gzip -f`);
$seed=(time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);
srand ($seed);
for ($i=0 ; $i< 100; $i++) {
printf ("%03d \n", rand (5000)+1000);
}
I ran the program 100 times and counted the values NOT selected using:
# run the program 100 times
for i in `seq 0 99`; do /tmp/rand_test.pl ; done > /tmp/list.txt
# test 1000 values (out of 5000). It should be good-enough representation.
for i in `seq 1000 1999`; do echo -n "$i "; grep -c $i /tmp/list.txt; done | grep " 0" | wc -l
The table shows the result of the tests (Lower value is better):
count Seed generation method
114 default - the line: "srand ($seed);" is commented out
986 constant seed (5432)
122 time ^ $$
125 $$ ^ unpack "%L*", `ps axww | gzip -f`
163 time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`
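As a reference point (an added calculation, not part of the original test): with truly uniform, independent picks, each of the 100 x 100 = 10,000 picks hits one particular value with probability 1/5000. That value is therefore never picked with probability (1 - 1/5000)^10000, which is about e^-2, or roughly 0.135. A minimal sketch computing the expected count for the 1000 values checked:

```perl
use strict;
use warnings;

# Parameters of the experiment described above.
my $runs    = 100;    # times the program is run
my $picks   = 100;    # numbers picked per run
my $options = 5000;   # possible values (1000..5999)
my $tested  = 1000;   # values checked (1000..1999)

# Probability that one particular value is never picked in any run.
my $p_missed = (1 - 1 / $options) ** ($runs * $picks);

# Expected number of tested values that are never picked.
my $expected_missed = $tested * $p_missed;
printf "expected never-selected: %.1f\n", $expected_missed;
```

This prints about 135. Even an ideal generator would leave roughly that many of the tested values unpicked, so the counts in the table are better compared with that baseline than with zero.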
The constant seed method showed 986 of 1000 values not selected. In other words, only 1.4% of the possible values were selected. This is close enough to the 2% that was expected.
However, I expected the last option, which was recommended in a few places, to be significantly better than the default.
Is there any better method to generate a seed for each of the processes?
I'm picking random numbers very frequently and my CPU resources are very limited.
You're worrying before you have even made a measurement.
Is there any better method to generate a seed for each of the processes?
Yes. You have to leave user space, which is prone to manipulation. Simply use Crypt::URandom.
It is safe for any purpose, including fetching a seed.
It will use the kernel CSPRNG for each operating system (see source code) and hence avoid the problems shown in the article above.
It does not suffer from the documented rand weakness.
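If you want the kernel's randomness for the seed without installing Crypt::URandom, a minimal sketch is to read a few bytes from /dev/urandom yourself. This assumes a Unix-like system that provides /dev/urandom. The urandom_seed name is hypothetical and is not Crypt::URandom's API. The read happens once per process, and the cost of each later rand call is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: read 4 bytes from the kernel CSPRNG
# (assumes /dev/urandom exists) and turn them into a 32-bit seed.
sub urandom_seed {
    open my $fh, '<:raw', '/dev/urandom'
        or die "cannot open /dev/urandom: $!";
    my $got = read($fh, my $bytes, 4);
    die "short read from /dev/urandom" unless defined $got && $got == 4;
    close $fh;
    return unpack 'L', $bytes;    # unsigned 32-bit integer
}

my $seed = urandom_seed();
srand($seed);
printf "%d\n", 1000 + int rand 5000;
```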
Don't generate a seed. Let Perl do it for you. Don't call srand (or call it without a parameter if you do).
Quoting the srand documentation:
If srand is not called explicitly, it is called implicitly without a parameter at the first use of the rand operator
and
When called with a parameter, srand uses that for the seed; otherwise it (semi-)randomly chooses a seed.
It doesn't simply use the time as the seed.
$ perl -M5.014 -E'say for srand, srand'
2665271449
1007037147
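Following that advice, here is a minimal sketch. Let Perl pick the seed. Since srand returns the seed it used (Perl 5.14 and later), log it so that a particular run can be replayed if you ever need to. The 1000..5999 range mirrors the test program in the question.

```perl
use strict;
use warnings;

# Let Perl choose a seed; srand returns it (Perl 5.14+).
my $seed = srand();

# Log the seed so this run could be reproduced later.
print STDERR "seed: $seed\n";

# Pick 100 numbers in 1000..5999, as in the test program above.
my @picks = map { 1000 + int rand 5000 } 1 .. 100;

# Re-seeding with the logged value replays exactly the same sequence.
srand($seed);
my @again = map { 1000 + int rand 5000 } 1 .. 100;

print "replayed identically\n" if "@picks" eq "@again";
```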
Your goal seems to be generating random numbers rather than generating seeds. In most cases, just use a cryptographic RNG (such as Crypt::URandom in Perl) to generate the random numbers you want, rather than generating seeds for another RNG. (In general, cryptographic RNGs take care of seeding and other issues for you.) You should not use a weaker RNG unless:
the random values you generate aren't involved in information security (e.g., the random values are neither passwords nor nonces nor encryption keys), and
either:
you care about repeatable "randomness" (which is not the case here), or
you have measured the performance of your application and found random number generation to be a performance bottleneck.
Since you will generate random names for the purpose of querying a database, which may be in a remote location, it is highly unlikely that random number generation itself will be the performance bottleneck.

How to get system time in nano seconds in Perl?

I want to get the system time in nanoseconds in Perl. I tried the Time::HiRes module, but it only supports microseconds.
The Time::HiRes module supports up to microseconds. As @MarcoS answered, on today's common hardware it is nonsense to use nanosecond precision counted by software.
Two consecutive calls, getting the current time in microseconds, then printing both:
perl -MTime::HiRes=time -E '$t1=time; $t2=time; printf "%.6f\n", $_ for($t1, $t2)'
results (on my system)
1411630025.846065
1411630025.846069
i.e., just getting the current time twice, with nothing in between, costs 3-4 microseconds.
If you want some "nanosecond numbers", simply print the time with 9-digit precision, like:
perl -MTime::HiRes=time -E '$t1=time;$t2=time; printf "%.9f\n", $_ for($t1, $t2)'
and you will get something like:
1411630910.582282066
1411630910.582283974
pretty nanosecond times ;)
Anyway, you can sleep with reasonable nanosecond precision. From the documentation:
nanosleep ( $nanoseconds )
Sleeps for the number of nanoseconds (1e9ths of a second) specified.
Returns the number of nanoseconds actually slept (accurate
only to microseconds, the nearest thousand of them).
...
Do not expect nanosleep() to be exact down to one nanosecond.
Getting even accuracy of one thousand nanoseconds is good.
The time resolution depends on the hardware clock frequency, of course.
For example, an AMD 5200 has a 2.6 GHz clock, which has a 0.4 ns interval. The cost of gettimeofday with RDTSCP is 221 cycles, which equals 88 ns at best. The minimal cost of a Perl routine will be hundreds of times that...
So, the final answer is:
On today's hardware, forget nanoseconds, with Perl and with any other high-level language. You can get close to that only with assembler, and even then, forget about counting single nanoseconds in software...
Perl get Time
Using Time::HiRes
c:\Code>perl -MDateTime::HiRes -E "while (1) {say DateTime::HiRes->now()->strftime('%F %T.%N');}"
or
use Time::HiRes qw(time);
use POSIX qw(strftime);
my $t = time;
my $date = strftime "%F %T", localtime $t;  # POSIX strftime has no %N; fraction appended below
$date .= sprintf ".%03d", ($t-int($t))*1000; # without rounding
print $date, "\n";
The fundamental issue is that Perl's Time::HiRes uses an ordinary floating point value to represent the timestamp, usually implemented as a native C double, which on many platforms is a 64-bit IEEE float, with a 53-bit mantissa.
That means that timestamps are recorded to a resolution that varies with how far from 1970 they are:
approximate date range           resolution
13 Nov 1969 ~ 18 Feb 1970        less than 0.93 ns (nanosecond resolution available)
25 Sep 1969 ~ 08 Apr 1970        1.86 ns
20 Jun 1969 ~ 14 Jul 1970        3.73 ns
08 Dec 1968 ~ 24 Jan 1971        7.45 ns
16 Nov 1967 ~ 16 Feb 1972        14.9 ns
Jul 1961 ~ Jun 1978              29.8 ns
1978 ~ 1986 & 1953 ~ 1961        59.6 ns
1987 ~ 2003 & 1936 ~ 1952        0.119 µs
2004 ~ 2037 & 1902 ~ 1935        0.238 µs   ◄ today
2038 ~ 2105 & 1834 ~ 1901        0.477 µs
2106 ~ 2242 & 1698 ~ 1833        0.954 µs
before 1697 or after 2243        worse than microsecond resolution
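The values in the table follow from the 53-bit mantissa. Near a timestamp t, adjacent doubles are 2^(floor(log2 t) - 52) seconds apart. Below is a minimal sketch of that calculation. It assumes perl's NV is a 64-bit IEEE double (the usual build; perls built with long doubles or quadmath behave differently), and it uses a hypothetical double_resolution helper.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: spacing between adjacent 64-bit doubles near $t,
# i.e. the finest time step a float timestamp can represent there.
sub double_resolution {
    my ($t) = @_;
    my $exp = floor(log($t) / log(2));    # binary exponent of $t
    return 2 ** ($exp - 52);              # one unit in the last place
}

# 1_700_000_000 is an epoch time in November 2023 (an arbitrary example).
my $res = double_resolution(1_700_000_000);
printf "%.3f microseconds\n", $res * 1e6;    # 0.238, matching the table
```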
The "forget nanoseconds" people are all wrong: Perl on ordinary machines now often returns the same microsecond value in consecutive calls, so while actual nanosecond resolution might not (yet) be achievable, you absolutely do need nanoseconds now, because microseconds are too coarse.
The answers above are wrong: squeezing more digits out of imprecise floats does NOT give more precision:
perl -MTime::HiRes=time -E 'while(1){my $now=sprintf("%.9f",time); die if($now==$last);$last=$now}'
That code does this:
Died at -e line 1.
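A related workaround, offered as a hedged sketch rather than something the answers above propose: float resolution depends on the size of the number, so measuring intervals with a clock whose readings are small keeps more sub-microsecond digits. Time::HiRes exposes clock_gettime and clock_getres where the system supports them. This sketch assumes a POSIX system with CLOCK_MONOTONIC, such as Linux, where readings typically count from boot rather than from 1970.

```perl
use strict;
use warnings;
use Time::HiRes qw(clock_gettime clock_getres CLOCK_MONOTONIC);

# Reported resolution of the monotonic clock, in seconds
# (on many Linux systems this reports 1 ns).
my $res = clock_getres(CLOCK_MONOTONIC);

# Two back-to-back readings; a monotonic clock never goes backwards.
my $t1 = clock_gettime(CLOCK_MONOTONIC);
my $t2 = clock_gettime(CLOCK_MONOTONIC);

printf "clock resolution: %.0f ns\n", $res * 1e9;
printf "elapsed:          %.0f ns\n", ($t2 - $t1) * 1e9;
```

The reported resolution is what the kernel advertises, not a guarantee of accuracy. The earlier points about the cost of each call still apply.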

Perl script shows different behavior inside cron job

I'm executing the following commands in a perl script.
#!/usr/bin/perl
my $MPSTAT="/usr/bin/mpstat";
my $GREP="/bin/grep";
my $FREE = "/usr/bin/free";
my $AWK = "/bin/awk";
my $cpu = `$MPSTAT | $GREP all | $AWK '{print (100 - \$12)}'`;
print "CPU is $cpu";
When I run this Perl script manually, it executes properly and reports the correct CPU usage in % (100 - idle CPU).
But when I run it as a cron job, it always prints 100, and it appears that awk's $12 is 0. Any pointers on why it behaves differently under cron would be helpful.
The main differences when running as a child of cron are:
The user ID might be different (root vs normal user)
The environment is nearly empty, or at least quite different
The second part often means that programs might output in a different language or number format due to the values of the LANG and LC_* environment variables which might be set for the normal user but not when run under cron (or vice versa).
Found the solution using the hint provided by @WinnieNicklaus:
mpstat is giving different results in cron.
Normal Execution:
04:53:18 PM all 49.51 0.00 4.79 2.67 0.02 0.34 0.00 0.00 42.68
Inside Cron:
16:54:01 all 49.51 0.00 4.79 2.67 0.02 0.34 0.00 0.00 42.68
Since PM is not printed under cron, changing the awk field from $12 to $11 made it work.
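Here is a sketch of a more robust alternative to switching between $11 and $12. In the mpstat output above, %idle is the last column in both formats, so taking the last field works whether or not an AM/PM column is present (in awk, the same idea is $NF). The cpu_usage_from_mpstat_line helper is hypothetical. It assumes your sysstat version keeps %idle as the last column, as in the lines shown.

```perl
use strict;
use warnings;

# Hypothetical helper: CPU usage (100 - %idle) from an mpstat "all" line.
# %idle is the last column, so the AM/PM field does not shift it.
sub cpu_usage_from_mpstat_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return 100 - $fields[-1];
}

# Sample lines from the interactive run and from the cron run above.
my $interactive = '04:53:18 PM all 49.51 0.00 4.79 2.67 0.02 0.34 0.00 0.00 42.68';
my $from_cron   = '16:54:01 all 49.51 0.00 4.79 2.67 0.02 0.34 0.00 0.00 42.68';

printf "%.2f\n", cpu_usage_from_mpstat_line($_) for $interactive, $from_cron;

# If the timestamp format itself matters, running mpstat with a fixed
# locale (e.g. local $ENV{LC_ALL} = 'C'; before the backticks) should
# make interactive and cron output match (an assumption, not tested here).
```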

Perl module for parsing natural language time duration specifications (similar to the "at" command)?

I'm writing a perl script that takes a "duration" option, and I'd like to be able to specify this duration in a fairly flexible manner, as opposed to only taking a single unit (e.g. number of seconds). The UNIX at command implements this kind of behavior, by allowing specifications such as "now + 3 hours + 2 days". For my program, the "now" part is implied, so I just want to parse the stuff after the plus sign. (Note: the at command also parses exact date specifications, but I only want to parse durations.)
Is there a perl module for parsing duration specifications like this? I don't need the exact syntax accepted by at, just any reasonable syntax for specifying time durations.
Edit: Basically, I want something like DateTime::Format::Flexible for durations instead of dates.
Take a look at DateTime::Duration and DateTime::Format::Duration:
use DateTime;
use DateTime::Duration;
use DateTime::Format::Duration;
my $formatter = DateTime::Format::Duration->new(
pattern => '%e days, %H hours'
);
my $dur = $formatter->parse_duration('2 days, 5 hours');
my $dt = DateTime->now->add_duration($dur);
Time::ParseDate has pretty flexible syntax for relative times. Note that it always returns an absolute time, so you can't tell the difference between "now + 3 hours + 2 days" and "3 hours + 2 days" (both are valid inputs to parsedate, and will return the same value). You could subtract the current time from the result if you want a duration instead.
Also, it doesn't return DateTime objects, just a UNIX epoch time. I don't know if that's a problem for your application.
I ended up going with Time::Duration::Parse.
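For comparison, if you only need a few fixed units and want to avoid a dependency, a minimal hand-rolled sketch of the "3 hours + 2 days" style is short. The duration_to_seconds helper is hypothetical and far less flexible than Time::Duration::Parse or the DateTime modules. It deliberately skips months and years, whose length varies.

```perl
use strict;
use warnings;

# Seconds per supported unit (months and years omitted on purpose).
my %SECONDS_PER = (
    second => 1, minute => 60, hour => 3600, day => 86400, week => 604800,
);

# Hypothetical helper: parse "3 hours + 2 days" into a number of seconds.
sub duration_to_seconds {
    my ($spec) = @_;
    my $total = 0;
    for my $term (split /\s*\+\s*/, $spec) {
        my ($n, $unit) = $term =~ /^\s*(\d+)\s*(second|minute|hour|day|week)s?\s*$/i
            or die "cannot parse duration term '$term'\n";
        $total += $n * $SECONDS_PER{lc $unit};
    }
    return $total;
}

my $secs = duration_to_seconds('3 hours + 2 days');
print "$secs\n";    # 183600
```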