I am confused by one Perl problem; does anyone have an idea?
I use a hash structure to store keys and values, like:
$hash{1} = 'a';
$hash{2} = 'b';
$hash{3} = 'c';
$hash{4} = 'd';
....
There are more than 1000 such lines, and I give the hash a name like %hash.
Then I plan to have one loop that goes through all the keys and checks whether each key matches a line read from a file.
For example, below is the file content:
first line 1
second line 2
nothing
another line 3
My logic is:
while (my $line = <$fh>) {
    while (my ($key, $value) = each %hash) {
        if ($line =~ /$key/i) {
            print "found";
        }
    }
}
So my expectation is:
first line 1   -> prints "found"
second line 2  -> prints "found"
nothing
another line 3 -> prints "found"
....
However, during my testing, only the first and second lines print "found"; for 'another line 3' the program does not print 'found'.
Note: the hash has more than 1000 records.
So I tried to debug it by adding a counter inside the loop. For the lines that are found, the inner loop runs 600 or 700 times, but for the 'another line 3' case it only runs around 300 times, then exits the loop and never prints 'found'.
Any idea why it happens like that?
One more test I have done: if my hash is small, say only 10 keys, the logic works.
I also tried foreach, and foreach does not seem to have this kind of issue.
The pseudo code you give should work fine, but there might be a subtle problem.
If you leave the inner while loop after finding your key and printing it (for example with last), then the next time each is called it continues where you left off. In other words, each is an iterator that stores its state in the hash it iterates over.
In http://blogs.perl.org/users/rurban/2014/04/do-not-use-each.html the author explains this in more detail. His conclusion:
So each should be treated as in php: Avoid it like a plague. Only use it in optimized cases where you know what you are doing.
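Two ways around this, sketched below under the assumption that the loop looks like the one in the question: reset the iterator with keys %hash before scanning, or drop each entirely and iterate with foreach.

use strict;
use warnings;

my %hash = (1 => 'a', 2 => 'b', 3 => 'c', 4 => 'd');    # abbreviated sample data

while (my $line = <STDIN>) {
    # Option 1: calling keys() in void context resets each's internal iterator,
    # so leaving the inner loop early (e.g. with last) cannot strand it halfway.
    keys %hash;
    while (my ($key, $value) = each %hash) {
        if ($line =~ /$key/i) {
            print "found\n";
            last;
        }
    }

    # Option 2 (simpler): foreach over keys has no hidden iterator state at all.
    # foreach my $key (keys %hash) {
    #     if ($line =~ /$key/i) { print "found\n"; last; }
    # }
}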
The problem is not very well articulated by the OP, and the provided sample data are poor for demonstration purposes.
The following sample code is an attempt based on the problem description provided by the OP.
It recreates the filter hash from the DATA block, composes $re_filter from the filter hash keys, and walks through a file given as an argument on the command line, printing the lines that match $re_filter.
use strict;
use warnings;

my $data = do { local $/; <DATA> };       # slurp the DATA block
my %hash = split ' ', $data;              # build key/value pairs from it
my $re_filter = join('|', keys %hash);    # alternation of all the keys

/$re_filter/ && print for <>;             # print matching lines from the input file
__DATA__
1 a
2 b
3 c
4 d
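Assuming the script is saved as, say, filter.pl (the name is just a placeholder), it is run with the data file as an argument:
perl filter.pl input.txt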
Input data file content
first line 1
second line 2
nothing
another line 3
Output
first line 1
second line 2
another line 3
I have this one-line Unix shell script
for i in 1 2 3 4; do sed "$(tr -dc '0-9' < /dev/urandom | fold -w 5 |
awk '$0>=35&&$0<=65570' | head -1)q;d" "$0"; done | perl -p00e
's/\n(?!\Z)/ /g'
The script has 65K words in it, one per line, from line 35 to 65570. The code and the data are in the same file.
This script outputs 4 space-separated random words from this list with a newline at the end. For example
first fourth third second
How can I make this one-liner much shorter with Perl, keeping the
tr -dc '0-9' < /dev/urandom
part?
Keeping it is important since it provides Cryptographically Secure Pseudo-Random Numbers (CSPRNs) for all Unix OSs. Of course, if Perl can get numbers from /dev/urandom then the tr can be replaced with Perl too, but the numbers from urandom need to stay.
For convenience, I shared the base script with 65K words:
65kwords.txt
Please use only core modules. It would be used for generating "human-memorable passwords".
Later, the (hashing) iteration count used when these passwords are stored would be extremely high, so brute force would be very slow, even with many, many GPUs/FPGAs.
You mention needing a CSPRN, which makes this a non-trivial exercise. If you need cryptographic randomness, then using built-in stuff (like rand) is not a good choice, as the implementation is highly variable across platforms.
But you've got Rand::Urandom which looks like it does the trick:
By default it uses the getentropy() (only available in > Linux 3.17) and falls back to /dev/arandom then /dev/urandom.
#!/usr/bin/env perl
use strict;
use warnings;
use Rand::Urandom;

chomp( my @words = <DATA> );
print $words[ rand @words ], " " for 1 .. 4;
print "\n";
__DATA__
yarn
yard
wound
worst
worry
work
word
wool
wolf
wish
wise
wipe
winter
wing
wind
wife
whole
wheat
water
watch
walk
wake
voice
Failing that though - you can just read bytes from /dev/urandom directly:
#!/usr/bin/env perl
use strict;
use warnings;

my $number_of_words = 4;

chomp( my @words = <DATA> );

open( my $urandom, '<:raw', '/dev/urandom' ) or die $!;
my $bytes;
read( $urandom, $bytes, 2 * $number_of_words );    # 2 bytes each: 0 - 65535

# for testing:
# unpack 'n' is an unsigned short (16-bit, big-endian);
# unpack 'n*' in list context returns a list of these.
foreach my $value ( unpack( "n*", $bytes ) ) {
    print $value, "\n";
}

# actually print the words.
# note - this assumes that you have the right number of words in your list.
# you could add a % @words to the map, e.g. $words[$_ % @words],
# but that will mean wrapping occurs, and will alter the frequency distribution.
# a more robust solution would be to fetch additional bytes if the 'slot' is
# empty.
print join( " ", map { $words[$_] } unpack( "n*", $bytes ) ), "\n";
__DATA__
yarn
yard
wound
worst
#etc.
Note: the above relies on your word list having exactly 65536 entries, the full range of two bytes (16 bits). If this assumption isn't true, you'll need to deal with 'missed' words. A crude approach would be to take a modulo, but that would mean some wrapping and therefore a not quite even distribution of word picks. Otherwise you can bit-mask and reroll, as indicated below:
On a related point though - have you considered not using a wordlist, and instead using consonant-vowel-consonant groupings?
E.g.:
#!/usr/bin/env perl
use strict;
use warnings;

# uses /dev/urandom to fetch bytes.
# generates consonant-vowel-consonant groupings.
# each group is 11.22 bits of entropy, meaning a 4-group string is ~45 bits.
# ( 20 * 6 * 20 = 2400; log2 2400 = 11.22; log2(2400 ** 4) = 44.91 )
# but because it's generated 'true random' it's a known-entropy string.

my $num    = 4;
my $format = "CVC";

my %letters = (
    V => [qw( a e i o u y )],
    C => [ grep { not /[aeiouy]/ } "a" .. "z" ],
);

my %bitmask_for;
foreach my $type ( keys %letters ) {
    # find the next power of 2 for the number of 'letters' in the set.
    # So - for the 20-letter group, that's 31 (0x1F).
    # And for the 6-letter group, that's 7 (0x07).
    $bitmask_for{$type} = ( 2 << log( @{ $letters{$type} } ) / log 2 ) - 1;
}

open( my $urandom, '<:raw', '/dev/urandom' ) or die $!;

for ( 1 .. $num ) {
    for my $type ( split //, $format ) {
        my $value;
        while ( not defined $value or $value >= @{ $letters{$type} } ) {
            my $byte;
            read( $urandom, $byte, 1 );
            # byte is 0-255. Our key space is 20 or 6.
            # So rather than modulo, which would lead to an uneven distribution,
            # we just bitmask and discard anything 'too high'.
            $value = ( unpack "C", $byte ) & $bitmask_for{$type};
        }
        print $letters{$type}[$value];
    }
    print " ";
}
print "\n";
close($urandom);
This generates 3-character CVC symbols with a known entropy level (11.22 bits per 'group') for making reasonably robust passwords. (45 bits as opposed to the 64 bits of your original, although obviously you can add extra 'groups' to gain 11.22 bits each time.)
This answer is not cryptographically safe!
I would do this completely in Perl. No need for a one-liner. Just grab your word-list and put it into a Perl program.
use strict;
use warnings;
my @words = qw(
first
second
third
fourth
);
print join( q{ }, map { $words[int rand @words] } 1 .. 4 ), "\n";
This grabs four random words from the list and outputs them.
rand @words evaluates @words in scalar context, which gives the number of elements, and creates a random floating-point value between 0 and (not including) that number. int cuts off the decimals. This is used as the index to grab an element out of @words. We repeat this four times with the map statement, where the 1 .. 4 is the same as passing a list of (1, 2, 3, 4) into map as an argument. This argument is ignored; instead our random word is picked. map returns a list, which we join on one space. Finally we print the resulting string and a newline.
The word list is created with the quoted words qw() operator, which returns a list of quoted words. It's shorthand so you don't need to type all the quotes ' and commas ,.
If you'd want to have the word list at the bottom you could either put the qw() in a sub and call it at the top, or use a __DATA__ section and read from it like a filehandle.
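For instance, a minimal sketch of the __DATA__ variant (word list abbreviated):

use strict;
use warnings;

# read the word list from the __DATA__ section, one word per line
chomp( my @words = <DATA> );

print join( q{ }, map { $words[ int rand @words ] } 1 .. 4 ), "\n";

__DATA__
first
second
third
fourth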
The particular method using tr and fold on /dev/urandom is a lot less efficient than it could be, so let's fix it up a little bit, while keeping the /dev/urandom part.
Assuming that available memory is enough to contain your script (including wordlist):
chomp(@words = <DATA>);
open urandom, "/dev/urandom" or die;
read urandom, $randbytes, 4 * 2 or die;
print join(" ", map $words[$_], unpack "S*", $randbytes), "\n";
__DATA__
word
list
goes
here
This goes for brevity and simplicity without outright obfuscation — of course you could make it shorter by removing whitespace and such, but there's no reason to. It's self-contained and will work with several decades of perls (yes, those bareword filehandles are deliberate :-P)
It still expects exactly 65536 entries in the wordlist, because that way we don't have to worry about introducing bias to the random number choice using a modulus operator. A slightly more ambitious approach might be to read 48 bits (6 bytes) from urandom for each word, turning them into a floating-point value between 0 and 1 (portable to most systems) and multiplying it by the size of the word list, allowing for a word list of any reasonable size.
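A rough sketch of that idea, under the same __DATA__ word-list layout (6 bytes per word become a fraction in [0, 1), which is then scaled by the list size; this is my illustration, not the answerer's code):

use strict;
use warnings;

chomp( my @words = <DATA> );

open my $urandom, '<:raw', '/dev/urandom' or die $!;

my @picks;
for ( 1 .. 4 ) {
    read $urandom, my $six, 6 or die "short read: $!";
    # 48 random bits -> integer 0 .. 2**48-1 -> fraction in [0, 1)
    my ( $hi, $lo ) = unpack 'nN', $six;            # 16 bits + 32 bits, big-endian
    my $fraction = ( $hi * 2**32 + $lo ) / 2**48;
    push @picks, $words[ int( $fraction * @words ) ];
}
print "@picks\n";

__DATA__
word
list
goes
here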
A lot of nonsense is talked about password strength, and I think you're overestimating the worth of several of your requirements here
I don't understand your preoccupation with making your code "much shorter with perl". (Why did you pick Perl?) Savings here can only really be useful to make the script quicker to read and compile, but they will be dwarfed by the half megabyte of data following the code which must also be read
In this context, the usefulness to a hacker of a poor random number generator depends on prior knowledge of the construction of the password together with the passwords that have been most recently generated. With a sample of only 65,000 words, even the worst random number generator will show insignificant correlation between successive passwords
In general, a password is more secure if it is longer, regardless of its contents. Forming a long password out of a sequence of English words is purely a way of making the sequence more memorable
"Of course later, the (hashing) iteration count ... would be extreme high, so brute-force [hacking?] would be very slow"
This doesn't follow at all. Cracking algorithms won't try to guess the four words you've chosen: they will see only a thirty-character (or so) string consisting only of lower-case letters and spaces, and whose origin is insignificant. It will be no more or less crackable than any other password of the same length with the same character set
I suggest that you should rethink your requirements and so make things easier for yourself. I don't find it hard to think of four English words, and don't need a program to do it for me. Hint: pilchard is a good one: they never guess that!
If you still insist, then I would write something like this in Perl. I've used only the first 18 lines of your data for the example.
use strict;
use warnings 'all';
use List::Util 'shuffle';
my @s = map /\S+/g, ( shuffle( <DATA> ) )[ 0 .. 3 ];
print "@s\n";
__DATA__
yarn
yard
wound
worst
worry
work
word
wool
wolf
wish
wise
wipe
winter
wing
wind
wife
whole
wheat
output
wind wise winter yarn
You could use Data::Random::rand_words()
perl -MData::Random -E 'say join $/, Data::Random::rand_words(size => 4)'
This is Perl code I use for compiling pressure data.
# (the earlier part of the script, which fills @level, @uwind, @vwind,
#  $num_levels and $data_ct, is omitted here)
use Math::Trig;    # provides pi, which the calculation below uses

$data_ct--;
mkdir "365Days", 0777 unless -d "365Days";

my $file_no = 1;
my $j = $num_levels;
for ($i = 0; $i < $data_ct; $i++) {
    if ($j == $num_levels) {
        close OUT;
        $j = 0;
        my $file = "365Days/wind$file_no";
        $file_no++;
        open OUT, "> $file" or die "Can't open $file: $!";
    }
    $wind_direction = (270 - atan2($vwind[$i], $uwind[$i]) * (180 / pi)) % 360;
    $wind_speed = sqrt($uwind[$i] * $uwind[$i] + $vwind[$i] * $vwind[$i]);
    printf OUT "%.0f %.0f %.1f\n", $level[$i], $wind_direction, $wind_speed;
    $j++;
}
$file_no--;
print STDERR "Wrote out $file_no wind files.\n";
print STDERR "Done\n";
The problem I am having is when it prints out the numbers, I want it to be in this format
Level Wind direction windspeed
250 320 1.5
870 56 4.6
Right now when I run the script the column names do not show up, just the numbers. Can someone direct me as to how to rectify the script?
There are several ways to do this in Perl. First, Perl has a built-in form ability. It's been a part of Perl since version 3.0 (about 20 years old). However, it is rarely used. In fact, it is so rarely used that I am not even going to attempt to write an example with it, because I'd have to spend way too much time relearning it. It's there and documented.
You can try to figure it out for yourself. Or, maybe some old timer Perl programmer might wake up from his nap and help out. (All bets are off if it's meatloaf night at the old age home, though).
Perl has evolved greatly in the last few decades, and this old forms bit represents a much older way of writing Perl programs. It just isn't pretty.
Another way this can be done, and a more popular one, is to use the printf function. If you're not familiar with C and its printf, it can be a bit intimidating to use. It depends upon formatting codes (the things that start with %) to specify what you want to print (strings, integers, floating-point numbers, etc.) and how you want those values formatted.
Fortunately, printf is so useful that most programming languages have their own version of it. Once you learn it, your knowledge is transferable to other places. There's an equivalent sprintf for setting variables with formats.
# Define header and line formats
my $header_fmt = "%-5.5s %-14.14s %-9.9s\n";
my $data_fmt = "%5d %14d %9.1f\n";
# Print my table header
printf $header_fmt, "Level", "Wind direction", "windspeed";
my $level = 230;
my $direction = 120;
my $speed = 32.3;
# Print my table data
printf $data_fmt, $level, $direction, $speed;
This prints out:
Level Wind direction windspeed
  230            120      32.3
I like defining the formats of my printed lines all together, so I can tweak them to get what I want. It's a great way to make sure your data lines line up with your header.
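As a small illustration of the sprintf variant mentioned above (same formats, but the result is kept in variables instead of being printed directly; this is just a sketch):

# build each line as a string first, then decide what to do with it
my $header = sprintf "%-5.5s %-14.14s %-9.9s\n", "Level", "Wind direction", "windspeed";
my $row    = sprintf "%5d %14d %9.1f\n", 230, 120, 32.3;

print $header, $row;    # or write them to a file handle, log them, etc.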
Okay, Matlock wasn't on tonight, so this crusty old Perl programmer has plenty of time.
In my previous answer, I said there was an old way of doing forms in Perl, but I didn't remember how it went. Well, I spent some time and got you an example of how it works.
Basically, you sort of need globalish variables. I thought you needed our variables for this to work, but I can get my variables to work if I define them on the same level as my format statements. It's not pretty.
You use GLOBS to define your formats with _TOP appended for your headers. Since I'm printing these on STDOUT, I define STDOUT_TOP for my heading and STDOUT for my data lines.
The format definition must start at the beginning of a line (column one). The lone . on the end ends the format definition. You'll notice I write the entire thing with just a single write statement. How does it know to print out the heading? Perl tracks the number of lines printed and automatically writes a form-feed character and a new heading when it thinks it's at the bottom of a page. By default Perl uses 60-line pages.
In Perl you can set your own format names via select. Perl uses $= as the number of lines on a page, and $- as the number of lines left on the current page. These variables are global, but they apply to the currently selected file handle, which is what the select statement changes. You can use IO::Handle for better variable naming.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
my @data = (
    {
        level     => 250,
        direction => 320,
        speed     => 1.5,
    },
    {
        level     => 870,
        direction => 55,
        speed     => 4.5,
    },
);
my $level;
my $direction;
my $speed;
for my $item_ref ( @data ) {
    $level     = $item_ref->{level};
    $direction = $item_ref->{direction};
    $speed     = $item_ref->{speed};
    write;
}
format STDOUT_TOP =
Level Wind Direction Windspeed
===== ============== =========
.
format STDOUT =
@#### @############# @#####.##
$level, $direction, $speed
.
This prints:
Level Wind Direction Windspeed
===== ============== =========
  250            320      1.50
  870             55      4.50
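As a brief sketch of the format-naming idea mentioned earlier, here is roughly how the IO::Handle accessors can replace the punctuation variables (the format names are just illustrative):

use strict;
use warnings;
use IO::Handle;

# the format blocks below refer to these package variables
our ( $level, $direction, $speed );

# associate custom format names with STDOUT instead of the default
# STDOUT / STDOUT_TOP pair
STDOUT->format_name("WIND_LINE");
STDOUT->format_top_name("WIND_HEADER");
STDOUT->format_lines_per_page(60);    # same effect as setting $=

( $level, $direction, $speed ) = ( 250, 320, 1.5 );
write;

format WIND_HEADER =
Level Wind Direction Windspeed
===== ============== =========
.

format WIND_LINE =
@#### @############# @#####.##
$level, $direction, $speed
.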
@Gunnerfan: Can you replace the line from your code as shown below?
Your line of code:
printf OUT "%.0f %.0f %.1f\n", $level[$i], $wind_direction, $wind_speed;
Replacement code:
if ($i == 0) {
    printf OUT "\n\t%-10s %-20s %-12s\n", 'Level', 'Wind direction', 'windspeed';
}
printf OUT "\t%-10.0f %-20.0f %-12.1f\n", $level[$i], $wind_direction, $wind_speed;
Perl experts: my attempt to solve my problem is turning into a lot of code, which in Perl suggests I'm approaching this incorrectly. Here is my problem:
I have a block of text (example below) which can have a variable amount of whitespace between the column data. I was using a simple split, but the problem now is that the "code" column contains spaces in its data (I had only accounted for that in the last column). What seems to be constant (although I don't have access to, or control of, the source structure) is that there is a minimum of 3 spaces between columns (maybe more, but never less).
So, I'd like to say my column delimiter token is "3 spaces" and then trim the data within each field to get my actual columnar data.
COL0 COL1 COL2 COL3 COL4 COL5
- 4 0.2 1 416489 463455 554
1 0.9 1 E1
0 3 1.4 14 E97-TEST 1
- 1 97.5 396 PASS Good
I'm just trying to get the values into 6 variables.
NOTE: COL0 may not have a value. COL4 may contain spaces in its data. COL5 may contain no value, or data with spaces. All fixed formatting is done with spaces (no tabs or other special characters). To clarify: the columns are NOT consistently sized. One file might have COL4 as 13 characters wide, another might have COL4 as 21 characters wide. So the layout is not as strict as another SO member stated.
You'll need to figure out where the columns are. As a really quite disgusting hack, you can read the whole file in and then string-or the lines together:
my @file = <file>;
chomp @file;
my $t = "";
$t |= $_ foreach (@file);
$t will then contain space characters in columns only where there were always space characters in that column; other columns will contain binary junk. Now split it with a zero-width match at each space-to-non-space boundary:
my @cols = split /(?<= )(?=[^ ])/, $t;
We actually want the widths of the columns to generate an unpack() format:
@cols = map length, @cols;
my $format = join '', map "A$_", @cols;
Now process the file! :
foreach my $line (@file) {
    my ($field1, $field2, ...) = unpack $format, $line;
    # your code here...
}
(This code has only been lightly tested.)
If you're dealing with strict columnar data like this, unpack is probably what you want:
#!perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my $data = <<EOD;
COL0 COL1 COL2 COL3 COL4 COL5
- 4 0.2 1 416489 463455 554
1 0.9 1 E1
0 3 1.4 14 E97-TEST 1
- 1 97.5 396 PASS Good
EOD
my @lines = split '\n', $data;

for my $line ( @lines ) {
    my @values = unpack("a5 A7 A7 A7 A13 A*", $line);
    print Dumper \@values;
}
This appears to dump out your values into the @values array as you wish, but they'll have leading spaces that you'll have to trim off.
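For instance, one way to strip those (just a sketch):
s/^\s+// for @values;    # remove leading whitespace from each unpacked field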
I would use two passes: in the first, find those character columns that have a space in every line; then, split or unpack with those indices. Whitespace trimming is done afterwards (a sketch follows the example below).
Your example:
COL0 COL1 COL2 COL3 COL4 COL5
- 4 0.2 1 416489 463455 554
1 0.9 1 E1
0 3 1.4 14 E97-TEST 1
- 1 97.5 396 PASS Good
000011100001110000111000011100000000001110000000000
The 1s in the last line mark the character columns that contain a space in every line.
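A minimal sketch of that two-pass idea (this is my illustration, not the answerer's code; the sample rows live in a __DATA__ block and their spacing is only indicative):

use strict;
use warnings;

# pass 1: find the character positions that hold a space in every line
my @lines = <DATA>;
chomp @lines;

my $width = 0;
$width = length > $width ? length : $width for @lines;

my @all_space = (1) x $width;
for my $line (@lines) {
    for my $i (0 .. $width - 1) {
        my $ch = $i < length($line) ? substr($line, $i, 1) : ' ';
        $all_space[$i] = 0 if $ch ne ' ';
    }
}

# pass 2: a column starts wherever a non-space position follows an all-space one;
# cut each line at those offsets and trim the pieces
my @starts = grep { !$all_space[$_] && ($_ == 0 || $all_space[$_ - 1]) } 0 .. $width - 1;

for my $line (@lines) {
    my @fields;
    for my $n (0 .. $#starts) {
        my $from = $starts[$n];
        my $to   = $n < $#starts ? $starts[$n + 1] : $width;
        my $f    = $from < length($line) ? substr($line, $from, $to - $from) : '';
        $f =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        push @fields, $f;
    }
    print join('|', @fields), "\n";
}

__DATA__
COL0   COL1   COL2   COL3   COL4             COL5
-      4      0.2    1      416489 463455    554
       1      0.9    1      E1
0      3      1.4    14     E97-TEST         1
-      1      97.5   396    PASS             Good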
I know CanSpice already answered (possibly with a much better solution), but you can also set the input record separator, "$/". This must be done in a local scope (probably a sub), as it is a global variable, or you may see side effects. Ex:
local $/ = " ";
$input = <DATAIN>; # assuming DATAIN is the filehandle
You can trim whitespace using a nice little regex. See Wikipedia for an example.
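Something along these lines (a common idiom, not the specific example the Wikipedia article uses):
$value =~ s/^\s+//;    # strip leading whitespace
$value =~ s/\s+$//;    # strip trailing whitespace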
I have an image file name that consists of four parts:
$Directory (the directory where the image exists)
$Name (for a art site, this is the paintings name reference #)
$File (the images file name minus extension)
$Extension (the images extension)
$example 100020003000.png
Which I desire to be broken down accordingly:
$dir=1000 $name=2000 $file=3000 $ext=.png
I was wondering if substr was the best option in breaking up the incoming $example so I can do stuff with the 4 variables like validation/error checking, grabbing the verbose name from its $Name assignment or whatever. I found this post:
is unpack faster than substr?
So, in my beginner's "stone tool" approach:
my $example = "100020003000.png";
my $dir  = substr($example, 0, 4);
my $name = substr($example, 4, 4);
my $file = substr($example, 8, 4);
my $ext  = substr($example, 13, 3); # will add the "." later #
So, can I use unpack, or maybe even another approach that would be more efficient?
I would also like to avoid loading any modules unless doing so would use fewer resources for some reason. Mods are great tools, I luv 'em, but I think they're not necessary here.
I realize I should probably push the vars into an array/hash, but I am really a beginner here and I would need further instruction on how to do that and how to pull them back out.
Thanks to everyone at stackoverflow.com!
Absolutely:
my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = unpack 'A4' x 4, $example;
print "$dir\t$name\t$file\t$ext\n";
Output:
1000 2000 3000 .png
I'd just use a regex for that:
my ($dir, $name, $file, $ext) = $path =~ m:(.*)/(.*)/(.*)\.(.*):;
Or, to match your specific example:
my ($dir, $name, $file, $ext) = $example =~ m:^(\d{4})(\d{4})(\d{4})\.(.{3})$:;
Using unpack is good, but since the elements are all the same width, the regex is very simple as well:
my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = $example =~ /(.{4})/g;
It isn't unpack, but since you have groups of 4 characters, you could use a limited split, with a capture:
my ($dir, $name, $file, $ext) = grep length, split /(....)/, $filename, 4;
This is pretty obfuscated, so I probably wouldn't use it, but the capture in a split is an often overlooked ability.
So, here's an explanation of what this code does:
Step 1. split with capturing parentheses adds the values captured by the pattern to its output stream. The stream contains a mix of fields and delimiters.
qw( a 1 b 2 c 3 ) == split /(\d)/, 'a1b2c3';
Step 2. split with 3 args limits how many times the string is split.
qw( a b2c3 ) == split /\d/, 'a1b2c3', 2;
Step 3. Now, when we use a delimiter pattern that matches pretty much anything /(....)/, we get a bunch of empty (0 length) strings. I've marked delimiters with D characters, and fields with F:
( '', 'a', '', '1', '', 'b', '', '2' ) == split /(.)/, 'a1b2';
F D F D F D F D
Step 4. So if we limit the number of fields to 3 we get:
( '', 'a', '', '1', 'b2' ) == split /(.)/, 'a1b2', 3;
F D F D F
Step 5. Putting it all together we can do this (I used a .jpeg extension so that the extension would be longer than 4 characters):
( '', 1000, '', 2000, '', 3000, '.jpeg' ) = split /(....)/, '100020003000.jpeg',4;
F D F D F D F
Step 6. Step 5 is almost perfect, all we need to do is strip out the null strings and we're good:
( 1000, 2000, 3000, '.jpeg' ) = grep length, split /(....)/, '100020003000.jpeg',4;
This code works, and it is interesting. But it's not any more compact than any of the other solutions. I haven't benchmarked, but I'd be very surprised if it wins any speed or memory efficiency prizes.
But the real issue is that it is too tricky to be good for real code. Using split to capture delimiters (and maybe one final field), while throwing out the field data is just too weird. It's also fragile: if one field changes length the code is broken and has to be rewritten.
So, don't actually do this.
At least it provided an opportunity to explore some lesser known features of split.
Both substr and unpack bias your thinking toward fixed-layout, while regex solutions are more oriented toward flexible layouts with delimiters.
The example you gave appeared to be fixed layout, but directories are usually separated from file names by a delimiter (e.g. slash for POSIX-style file systems, backslash for MS-DOS, etc.). So you might actually have a case for both: a regex solution to split directory and file name apart (or even directory/name/extension), and then a fixed-length approach for the name part by itself.
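A small sketch of that combination, using a made-up path purely for illustration:

use strict;
use warnings;

# hypothetical example path: a real directory separator plus a fixed-width base name
my $path = "/art/originals/100020003000.png";

# regex pass: split off directory, base name and extension
my ( $dir, $base, $ext ) = $path =~ m{^(.*)/([^/]+)\.([^./]+)$}
    or die "unexpected path format";

# fixed-width pass: carve the 12-digit base name into its three 4-digit parts
my ( $code_dir, $code_name, $code_file ) = unpack 'A4 A4 A4', $base;

print "$dir | $code_dir | $code_name | $code_file | .$ext\n";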