Unique Character Count

Unique Character Count - perl

Hi, I am an extreme novice, and I need help with what to type so that the unique character count is displayed, based on what the user types at the keyboard.
I already have it set up to show the total character count of the string.
Here is the Code:
#!C:\Strawberry\perl\bin\perl
use strict;
use warnings;
print "Input Username";
my $str = <>;
chomp ($str);
print "You have typed: $str\n";
my $str_length = length($str);
print "Total Characters = " . $str_length . "\n";
exit;

You can use this function to get what you need:
sub func($) { my ($str, %hash) = shift; $hash{$_}++ for split //, $str; (length $str, scalar keys %hash) }
and this if you need the count of a particular character:
sub uniq_ch_count($$) { my ($ch, $str, %hash) = @_; $hash{$_}++ for split //, $str; $hash{$ch} }
EXAMPLE 1:
my ($chars_count, $uniq_chars_count) = func('one two three four');
print $chars_count . " " . $uniq_chars_count . "\n";
OUTPUT:
18 10
EXAMPLE 2:
print uniq_ch_count('d', "asdjkasdjd sdfj d ") . " " . uniq_ch_count(' ', "asdjkasdjd sdfj d ") . "\n";
OUTPUT:
5 3

The simplest method would be to use a hash:
# split the string into an array of characters
my @chars = split //, $str;
# lists of values can be assigned to multiple indexes at once
# here we assign each character an empty value, but since hash
# keys are unique in nature, every subsequent assignment overwrites
# the first.
my %uniq;
@uniq{@chars} = ();
# next get the list of keys from the hash and treat that list as
# a scalar which gives you the count
my $count = scalar keys %uniq;
See: http://perldoc.perl.org/perldata.html#Slices
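A minimal sketch wrapping the hash-slice idea in a reusable sub, so it can be called on whatever the user typed. The name `unique_count` is hypothetical, chosen for illustration; note that the count is case-sensitive, so 'a' and 'A' are two distinct characters.

```perl
use strict;
use warnings;

# unique_count is a hypothetical helper name wrapping the hash-slice idea.
sub unique_count {
    my ($str) = @_;
    my %uniq;
    @uniq{ split //, $str } = ();    # one key per distinct character
    return scalar keys %uniq;        # number of keys == number of distinct characters
}

my $n = unique_count('one two three four');
print "$n\n";
```

For 'one two three four' this yields 10 (o, n, e, space, t, w, h, r, f, u), matching the OUTPUT of the first answer's EXAMPLE 1.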

OK, so the magic keyword here, as far as Perl is concerned, is 'unique', because that usually means a hash is the right tool for the job.
In Perl, a hash is a set of key-value pairs, which makes it great for counting unique items.
So take your string, split it into characters, and count each one:
my %count_of;
foreach my $character ( split ( '', $str ) ) {
$count_of{$character}++;
}
You can then print out %count_of:
foreach my $character ( keys %count_of ) {
print "$character = $count_of{$character}\n";
}
But you don't need a loop just to get the total: keys %count_of returns the list of keys, and in scalar context keys returns the number of keys, which is exactly the number of unique characters. So you can do:
print scalar keys %count_of, " unique characters in $str\n";
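A sketch of the same counting approach that also prints the tally in a predictable order (most frequent first, ties broken alphabetically), since plain `keys` returns keys in no particular order. The helper name `char_frequencies` and the sample string are illustrative assumptions.

```perl
use strict;
use warnings;

# char_frequencies is a hypothetical helper: returns character => count pairs.
sub char_frequencies {
    my ($str) = @_;
    my %count_of;
    $count_of{$_}++ for split //, $str;
    return %count_of;
}

my $str = 'hello';
my %count_of = char_frequencies($str);

# most frequent first; ties broken alphabetically for stable output
for my $character (sort { $count_of{$b} <=> $count_of{$a} or $a cmp $b } keys %count_of) {
    print "$character = $count_of{$character}\n";
}
print scalar(keys %count_of), " unique characters in $str\n";
```

For 'hello' this lists l = 2 first, then e, h and o with 1 each, followed by "4 unique characters in hello".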

Related

Perl list all keys in hash with identical values

If I have a colon-delimited file name FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{@F[0]} = @F[2]'
to assign the first and third tokens as the key => value pairs of the hash.
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam

Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destroyed for every record in the file. You could get around that by not declaring the hash, which makes it a persistent package variable that retains all the values stored in it.
You can then write push @{ $hash{$F[2]} }, $F[0]. Notice that it should be $F[0] etc. and not @F[0], and that I have used push to build a list of column 1 values for each column 3 value, instead of a one-to-one mapping from each column 1 value to its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this:
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our @ARGV = 'myfile.csv' unless @ARGV;
my %data;
while (<>) {
    my @fields = split /:/;
    push @{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
    next unless @$v > 1;
    printf qq{Keys with value "%s": %s\n}, $k, join ', ', @$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David
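A variant sketch of the same hash-of-arrays approach, assuming the records are already in memory (the sample rows from the question) rather than read from a file. Because `each` visits hash entries in no guaranteed order, this version sorts the keys so the report is stable between runs. The helper name `group_by_third_field` is hypothetical.

```perl
use strict;
use warnings;

# group_by_third_field is a hypothetical helper: builds
# third-field => [ first-field, ... ] from colon-delimited lines.
sub group_by_third_field {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /:/, $line;
        push @{ $data{ $fields[2] } }, $fields[0];
    }
    return \%data;
}

my @records = (
    "Mike:34:Apple:Male",  "Don:23:Corn:Male",
    "Jared:12:Apple:Male", "Beth:56:Maize:Female",
    "Sam:34:Apple:Male",   "David:34:Apple:Male",
);
my $data = group_by_third_field(@records);

# sort the keys so the output order does not depend on hash ordering
for my $k (sort keys %$data) {
    my $v = $data->{$k};
    next unless @$v > 1;
    printf qq{Keys with value "%s": %s\n}, $k, join ', ', @$v;
}
```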

use strict;
use warnings;
open my $in, '<', 'in.txt' or die "Cannot open in.txt: $!";
my %data;
while(<$in>){
    chomp;
    my @split = split /:/;
    $data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";
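The search loop above can also be written as a single `grep` over the keys. A minimal sketch, assuming the name => category hash has already been built (hard-coded here from the question's sample data); the helper name `keys_with_value` is hypothetical, and the result is sorted because hash key order is not guaranteed.

```perl
use strict;
use warnings;

# keys_with_value is a hypothetical helper: returns, sorted, every key of
# the name => category hash whose value equals $query.
sub keys_with_value {
    my ($data, $query) = @_;
    return sort { $a cmp $b } grep { $data->{$_} eq $query } keys %$data;
}

my %data = (
    Mike => 'Apple', Don => 'Corn',  Jared => 'Apple',
    Beth => 'Maize', Sam => 'Apple', David => 'Apple',
);
my @names = keys_with_value(\%data, 'Apple');
print "Keys with value Apple = @names\n";
```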

Arrays are used to hold lists of values, so use an array.
perl -F: -lane'
push @{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if @{ $h{$fruit} } < 2;
print "$fruit: ", join(",", @{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we print the key followed by the contents of the array referenced by the hash element.

Here is another way:
perl -F: -lane'
push @{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", @{ $h{$_} }) for grep { @{$h{$_}} > 1 } keys %h;
' file
We read each line and build a hash of arrays, using the third column as the key and the first column as the list of values for that key. The }{ closes the while loop that -n wraps around the code, so the last statement runs once after all input has been read, much like an END block. There we use grep to keep only the keys whose arrays have more than one element, and print each key followed by its array elements.

It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push @{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my @values = @{ $hash{David} }; # The values of David...
print "David is in categories " . join( ", ", @values ) . "\n";
This will print out: David is in categories Apple, Wheat
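The other duplicate-handling strategies from the list above can be sketched in the same loop. This is an illustration, not the answer's code: the helper name `build_hash` and the policy names 'last', 'first', 'error' and 'all' are hypothetical labels for the four options.

```perl
use strict;
use warnings;

# build_hash is a hypothetical helper; $policy picks one of the four
# duplicate-key strategies: 'last', 'first', 'error' or 'all'.
sub build_hash {
    my ($policy, @lines) = @_;
    my %hash;
    for my $line (@lines) {
        chomp $line;
        my ($name, $age, $category, $sex) = split /:/, $line;
        if ($policy eq 'last') {
            $hash{$name} = $category;                      # later values overwrite
        }
        elsif ($policy eq 'first') {
            $hash{$name} = $category unless exists $hash{$name};
        }
        elsif ($policy eq 'error') {
            die "Duplicate entry for $name\n" if exists $hash{$name};
            $hash{$name} = $category;
        }
        elsif ($policy eq 'all') {
            push @{ $hash{$name} }, $category;             # keep every value
        }
        else {
            die "Unknown policy '$policy'\n";
        }
    }
    return \%hash;
}

my @lines = ('David:34:Apple:Male', 'David:35:Wheat:Male');
my $first = build_hash('first', @lines);
print "first: $first->{David}\n";
```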
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
$indexed_hash{$category} = [] if not exists $indexed_hash{$category};
push @{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my @names = @{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join( ", ", @names ) . "\n";
Since we're getting into references, we could take things a step further and store all of the values from your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: @{[$hash{$name}->{SEX}]}\n\n";
}
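The last two statements are easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second, @{[...]}, is a neat little trick: [...] builds an anonymous array reference from the expression, and @{...} dereferences it, so the result interpolates like an ordinary array.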
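A small sketch, with made-up data, showing where the `@{[ ... ]}` trick earns its keep: a plain hash element interpolates without help, but a function call such as `uc` does not.

```perl
use strict;
use warnings;

my %hash = (David => { AGE => 34, SEX => 'Male' });
my $name = 'David';

# A nested hash element interpolates directly...
my $direct = "Sex: $hash{$name}->{SEX}";

# ...but a function call does not. [ ... ] evaluates uc() and wraps the
# result in an anonymous array; @{ } dereferences it for interpolation.
my $called = "Sex: @{[ uc $hash{$name}->{SEX} ]}";

print "$direct\n$called\n";
```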

What have you tried?
If you reverse the hash into a list of value => key pairs and then use List::Util's pairs() on that list, you can transform the hash into a hash of value => arrayref-of-keys, i.e. ( foo => [ 'bar', 'baz' ] ). Then grep { @{$hash{$_}} > 1 } keys %hash and print the results.
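A sketch of that reverse-and-pairs approach, assuming the name => category hash already exists (hard-coded from the question's sample data). `pairs` needs List::Util 1.29 or later; the output is sorted only to make it stable.

```perl
use strict;
use warnings;
use List::Util qw(pairs);    # pairs() requires List::Util 1.29 or later

my %hash = (
    Mike => 'Apple', Don => 'Corn',  Jared => 'Apple',
    Beth => 'Maize', Sam => 'Apple', David => 'Apple',
);

# reverse flattens %hash to (key, value, ...) and reverses it, so every
# pair that pairs() hands back is (value, key)
my %by_value;
for my $pair (pairs reverse %hash) {
    my ($value, $key) = @$pair;
    push @{ $by_value{$value} }, $key;
}

# keep only values shared by more than one key; sort for stable output
for my $value (sort { $a cmp $b } grep { @{ $by_value{$_} } > 1 } keys %by_value) {
    print "$value: ", join(',', sort @{ $by_value{$value} }), "\n";
}
```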

How to skip splitting for some part of the line

Say I have a line lead=george wife=jane "his boy"=elroy. I want to split it on spaces, but without splitting the "his boy" part; it should be treated as one.
With a normal split, "his boy" is also split, with "his" as one part and "boy" as the second. How can I avoid this?
I tried this:
split " ", $_
I just found out that this works:
use strict; use warnings;
my $string = q(hi my name is 'john doe');
my @parts = $string =~ /'.*?'|\S+/g;
print map { "$_\n" } @parts;
But it does not look nice. Is there anything simpler using split itself?

You could use Text::ParseWords for this
use Text::ParseWords;
my $list = "lead=george wife=jane \"his boy\"=elroy";
my @words = quotewords('\s+', 0, $list);
my $i = 0;
foreach (@words) {
    print "$i: <$_>\n";
    $i++;
}
output:
0: <lead=george>
1: <wife=jane>
2: <his boy=elroy>
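Taking quotewords one step further, a sketch that turns the line into a key => value hash by splitting each word on its first '='. This assumes keys never contain '=' themselves; values may, since the split is limited to two fields.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = q{lead=george wife=jane "his boy"=elroy};

# quotewords splits on whitespace outside quotes; with $keep = 0 the quotes
# are removed, so '"his boy"=elroy' comes back as 'his boy=elroy'.
# Splitting each word on its first '=' then yields key/value pairs.
my %pairs = map { my ($k, $v) = split /=/, $_, 2; ($k, $v) }
            quotewords('\s+', 0, $line);

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```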

sub split_space {
my ( $text ) = @_;
while (
$text =~ m/
( # group ($1)
\"([^\"]+)\" # first try find something in quotes ($2)
|
(\S+?) # else minimal non-whitespace run ($3)
)
=
(\S+) # then maximum non-whitespace run ($4)
/xg
) {
my $key = defined($2) ? $2 : $3;
my $value = $4;
print( "key=$key; value=$value\n" );
}
}
split_space( 'lead=george wife=jane "his boy"=elroy' );
Outputs:
key=lead; value=george
key=wife; value=jane
key=his boy; value=elroy

PP posted a good solution. Just to show that there is another neat way to do it, here is mine:
my $string = q~lead=george wife=jane "his boy"=elroy~;
my @split = split / (?=")/,$string;
my @split2;
foreach my $sp (@split) {
    if ($sp !~ /"/) {
        push @split2, $_ foreach split / /, $sp;
    } else {
        push @split2,$sp;
    }
}
use Data::Dumper;
print Dumper @split2;
Output:
$VAR1 = 'lead=george';
$VAR2 = 'wife=jane';
$VAR3 = '"his boy"=elroy';
I use a lookahead here to first split off the parts whose keys are inside quotes. After that, I loop through the whole array and split all the other parts, which are plain key=value pairs, on spaces.

You can get the required result with a single regexp that extracts the keys and the values and puts them into a hash.
(\w+|"[\w ]+") matches both single-word keys and quoted multi-word keys.
The regexp captures only the key and the value, so the match returns a list of the form: key #1, value #1, key #2, value #2, etc.
Assigning that list to a hash populates it with the matching keys and values.
Here is the code:
my $str = 'lead=george wife=jane "hello boy"=bye hello=world';
my %hash = ($str =~ m/(?:(\w+|"[\w ]+")=(\w+)(?:\s|$))/g);
## outputs the hash content
foreach my $key (keys %hash) {
    print "$key => $hash{$key}\n";
}
and here is the output of this script (hash order may differ between runs):
lead => george
wife => jane
hello => world
"hello boy" => bye

sorting an array on the first number found in each element

I'm looking for help sorting an array where each element is made up of "a number, then a string, then a number". I would like to sort on the first number of each element, descending (so that the higher numbers are listed first), while keeping the rest of the text.
I am still a beginner, so alternatives to the code below are also welcome.
use strict;
use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my @count2;
foreach my $i (1..49) {
my @count = join(',', @arr) =~ m/$i,/g; # maybe try to make a string only once then search through it... ???
my $count1 = scalar(@count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
push(@count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
}
#for (@count2) {print "$_\n";}
# try to add up all numbers in the first column to make sure they == 100
#sort @count2 and print the top 7
@count2 = sort {$b <=> $a} @count2; # try to stop printout of this, or sort on =~ m/^anumber/ ??? or just on the first one or two \d
foreach my $i (0..6) {
print $count2[$i] ."\n"; # seems to be sorted right anyway
}

First, store your data in an array, not in a string:
# inside the first loop, replace your line with the push() with this one:
push(@count2, [$count1, $i]);
Then you can easily sort by the first element of each subarray:
my #sorted = sort { $b->[0] <=> $a->[0] } #count2;
And when you print it, construct the string:
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
See also: http://perldoc.perl.org/perlreftut.html, perlfaq4
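A complete sketch of the array-of-arrays approach. One deliberate swap: the counting uses a hash instead of the question's string search, because matching m/$i,/ against the joined string also counts "11," when looking for "1,". The helper name `top_counts` is hypothetical.

```perl
use strict;
use warnings;

# top_counts is a hypothetical helper: returns the $n most frequent
# numbers as [count, number] pairs, highest count first.
sub top_counts {
    my ($arr, $n) = @_;
    my %count;
    $count{$_}++ for @$arr;
    my @count2 = map { [ $count{$_}, $_ ] } keys %count;
    # descending by count; ties broken by the number so output is stable
    my @sorted = sort { $b->[0] <=> $a->[0] or $a->[1] <=> $b->[1] } @count2;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

my @arr = map { int(rand(49)) + 1 } 1 .. 100;
for my $pair (top_counts(\@arr, 7)) {
    printf "%d times for %d\n", $pair->[0], $pair->[1];
}
```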

Taking your requirements as is. You're probably better off not embedding count information in a string. However, I'll take it as a learning exercise.
Note, I am trading memory for brevity and likely speed by using a hash to do the counting.
However, the sort could be optimized by using a Schwartzian Transform.
EDIT: Create results array using only numbers that were drawn
#!/usr/bin/perl
use strict; use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 );
my %counts;
++$counts{$_} for @arr;
my @result = map sprintf('%d times for %d', $counts{$_}, $_),
sort {$counts{$b} <=> $counts{$a}} keys %counts; # highest count first, as requested
print "$_\n" for @result;
However, I'd probably have done something like this:
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my #arr;
$#arr = 99; # initialize @arr capacity to 100 elements
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1); # pick a number
$arr[ $i ] = $n; # store it
++$counts{ $n }; # update count
}
# sort keys according to counts, keys of %counts has only the numbers drawn
# for each number drawn, create an anonymous array ref where the first element
# is the number drawn, and the second element is the number of times it was drawn
# and put it in the @result array
my @result = map [$_, $counts{$_}],
sort {$counts{$b} <=> $counts{$a} } # highest count first
keys %counts;
print Dump \@result;
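The Schwartzian Transform mentioned above matters most when the sort key has to be extracted from each element, as in the question's "N times for M" strings. A sketch applied to such strings (the three sample strings are made up): the leading number is extracted once per element, the sort compares only the cached numbers, and the original strings are mapped back out.

```perl
use strict;
use warnings;

my @count2 = ('3 times for 7', '12 times for 2', '5 times for 40');

my @sorted =
    map  { $_->[1] }                   # 3. keep just the original string
    sort { $b->[0] <=> $a->[0] }       # 2. sort on the cached number, descending
    map  { [ (/^(\d+)/)[0], $_ ] }     # 1. pair each string with its leading number
    @count2;

print "$_\n" for @sorted;
```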

Perl - Summarize Data in File

What's the best way to summarize data from a file with around 2 million records in Perl?
For example, a file like this:
ABC|XYZ|DEF|EGH|100
ABC|XYZ|DEF|FGH|200
SDF|GHT|WWW|RTY|1000
SDF|GHT|WWW|TYU|2000
needs to be summarized on the first 3 columns like this:
ABC|XYZ|DEF|300
SDF|GHT|WWW|3000
Chris

Assuming there are always five columns, the fifth of which is numeric, and you always want the first three columns to be the key...
use warnings;
use strict;
my %totals_hash;
while (<>)
{
chomp;
my @cols = split /\|/;
my $key = join '|', @cols[0..2];
$totals_hash{$key} += $cols[4];
}
foreach (sort keys %totals_hash)
{
print $_, '|', $totals_hash{$_}, "\n";
}
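The same hash-based summary, wrapped in a sub that takes any open filehandle. The in-memory filehandle and sample text stand in for the real 2-million-line file, and the name `summarize` is hypothetical. Since the hash holds one entry per distinct three-column key rather than one per line, memory use depends on the number of groups, not the number of records.

```perl
use strict;
use warnings;

# summarize is a hypothetical helper: sums column 5 per key of columns 1-3.
sub summarize {
    my ($fh) = @_;
    my %totals;
    while (my $line = <$fh>) {
        chomp $line;
        my @cols = split /\|/, $line;
        $totals{ join '|', @cols[0 .. 2] } += $cols[4];
    }
    return \%totals;
}

my $sample = <<'EOT';
ABC|XYZ|DEF|EGH|100
ABC|XYZ|DEF|FGH|200
SDF|GHT|WWW|RTY|1000
SDF|GHT|WWW|TYU|2000
EOT

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $totals = summarize($fh);
close $fh;
print "$_|$totals->{$_}\n" for sort keys %$totals;
```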

You can use a hash:
my %hash;
while (<DATA>) {
chomp;
my @tmp = split/\|/; # split each line on |
my $value = pop @tmp; # last ele is the value
pop @tmp; # pop unwanted entry
my $key = join '|',@tmp; # join the remaining ele to form key
$hash{$key} += $value; # add value for this key
}
# print hash key-values.
for(sort keys %hash) {
print $_ . '|'.$hash{$_}."\n";
}
__DATA__
ABC|XYZ|DEF|EGH|100
ABC|XYZ|DEF|FGH|200
SDF|GHT|WWW|RTY|1000
SDF|GHT|WWW|TYU|2000

Presuming your input file has one record per line:
perl -n -e 'chomp;@a=split/\|/;$h{join"|",splice@a,0,3}+=pop@a;END{print map{"$_: $h{$_}\n"}keys%h}' < inputfile

1-2-3-4 I declare A CODE-GOLF WAR!!! (Okay, a reasonably readable code-golf dust-up.)
my %sums;
m/([^|]+\|[^|]+\|[^|]+).*?\|(\d+)/ and $sums{ $1 } += $2 while <>;
print join( "\n", ( map { "$_|$sums{$_}" } sort keys %sums ), '' );

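Sort to put all records with the same first three fields next to each other. Iterate through and kick out a subtotal whenever a different set of fields appears.
my $prevKey  = "";
my $subtotal = 0;
open(my $in, '<', $inFile) or die "Cannot open $inFile: $!";
my @lines = <$in>;
close($in);
open(my $out, '>', $outFile) or die "Cannot open $outFile: $!";
my @sorted = sort @lines;
foreach my $line (@sorted) {    # iterate over the sorted lines, not the originals
    chomp $line;
    my @parts = split /\|/, $line;
    my $value = pop @parts;
    $value += 0;    # coerce string to a number
    my $key = join '|', @parts[0 .. 2];
    if ($key ne $prevKey) {
        # flush the previous group (skip the empty "group" before the first record)
        print $out "$prevKey|$subtotal\n" if $prevKey ne "";
        $prevKey  = $key;
        $subtotal = 0;
    }
    $subtotal += $value;
}
print $out "$prevKey|$subtotal\n" if $prevKey ne "";    # flush the last group
close($out);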
If sorting 2 million chokes your box then you may have to put each record into a file based on the group and then do the subtotal for each file.

Reformulate a string query in perl

How do I reformulate a string in Perl?
For example, consider the string "Where is the Louvre located?"
How can I generate strings like the following:
"the is Louvre located"
"the Louvre is located"
"the Louvre located is"
These are being used as queries to do a web search.
I was trying to do something like this:
Get rid of punctuation and split the sentence into words.
my @words = split / /, $_[0];
I don't need the first word in the string, so I get rid of it.
shift(@words);
Then I need to move the next word through the array; I'm not sure how to do this!
Finally convert the array of words back to a string.

How can I generate all permutations of an array in Perl?
Then use join to glue each permutation array back together into a single string.
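A dependency-free sketch of that permutation approach, using a hypothetical recursive helper `permutations` rather than any particular CPAN module. Note it produces every ordering of the words (24 for four words), which is broader than the three variants in the question.

```perl
use strict;
use warnings;

# permutations is a hypothetical helper: returns every ordering of its
# arguments as a list of array references.
sub permutations {
    my @items = @_;
    return [] unless @items;                  # exactly one (empty) ordering
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;   # take item $i out
        # prepend it to every ordering of the remaining items
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

my @words = qw(is the Louvre located);
print join(' ', @$_), "\n" for permutations(@words);
```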

Somewhat more verbose example:
use strict;
use warnings;
use Data::Dumper;
my $str = "Where is the Louvre located?";
# split into words and remove the punctuation
my @words = map {s/\W+//; $_} split / /, $str;
# remove the first two words while storing the second
my $moving = splice @words, 0 ,2;
# generate the variations: insert the moved word after each remaining word,
# including after the last one
my @variants;
foreach my $position (1 .. @words) {
my @temp = @words;
splice @temp, $position, 0, $moving;
push @variants, \@temp;
}
print Dumper(\@variants);

# assumes @words holds the remaining words, e.g. qw(is the Louvre located)
my @head;
my ($x, @tail) = @words;
while (@tail) {
push @head, shift @tail;
print join(" ", @head, $x, @tail), "\n";
};
Or you can just "bubble" $x through the array by swapping $words[$n-1] and $words[$n]:
foreach my $n (1 .. $#words) {
($words[$n-1], $words[$n]) = ($words[$n], $words[$n-1]);
print join(" ", @words), "\n";
};
