Reformulate a string query in Perl

How do I reformulate a string in Perl?
For example, consider the string "Where is the Louvre located?"
How can I generate strings like the following:
"the is Louvre located"
"the Louvre is located"
"the Louvre located is"
These are being used as queries for a web search.
I was trying to do something like this:
Get rid of punctuation and split the sentence into words.
my @words = split / /, $_[0];
I don't need the first word in the string, so I get rid of it.
shift(@words);
Then I need to move the next word through the array, and I'm not sure how to do this.
Finally, convert the array of words back to a string.

See: How can I generate all permutations of an array in Perl?
Then use join to glue each permutation array back together into a single string.
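If you would rather not pull in a module, the permute-then-join idea can be sketched with a small recursive routine. This is only an illustration: the sub name permutations is made up here, and the word list is the example sentence with "Where" and the question mark already removed.

```perl
use strict;
use warnings;

# Return every ordering of the given list, each as an array reference.
# Plain recursion: n words produce n! orderings, so keep the list short.
sub permutations {
    my @items = @_;
    return ([]) unless @items;          # one (empty) ordering of nothing
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;   # take element $i out of the copy
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

# Glue each permutation back together into a single query string.
my @queries = map { join ' ', @$_ } permutations(qw(is the Louvre located));
print "$_\n" for @queries;
```

Four words already give 24 queries, so for longer sentences you may want to restrict which words are allowed to move.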

Somewhat more verbose example:
use strict;
use warnings;
use Data::Dumper;
my $str = "Where is the Louvre located?";
# split into words and remove the punctuation
my @words = map {s/\W+//; $_} split / /, $str;
# remove the first two words while storing the second
my $moving = splice @words, 0, 2;
# generate the variations
my @variants;
foreach my $position (0 .. $#words) {
    my @temp = @words;
    splice @temp, $position, 0, $moving;
    push @variants, \@temp;
}
print Dumper(\@variants);

my @head;
my ($x, @tail) = @words;
while (@tail) {
    push @head, shift @tail;
    print join(" ", @head, $x, @tail), "\n";
}
Or you can just "bubble" $x through the array by swapping $words[$n-1] and $words[$n]:
foreach my $n (1 .. $#words) {
    ($words[$n-1], $words[$n]) = ($words[$n], $words[$n-1]);
    print join(" ", @words), "\n";
}

Related

Unique Character Count

Hi, I am an extreme novice and I need help with what to type so that the unique character count is displayed for whatever the user enters from the keyboard.
I already have it set up to show the character count of the string.
Here is the code:
#!C:\Strawberry\perl\bin\perl
use strict;
use warnings;
print "Input Username";
my $str = <>;
chomp ($str);
print "You have typed: $str\n";
my $str_length = length($str);
print "Total Characters = " . $str_length . "\n";
exit;
You can use this function to get what you need:
sub func($) { my ($str, %hash) = shift; $hash{$_}++ for split //, $str; (length $str, scalar keys %hash) }
and this if you need to get count of certain char:
sub uniq_ch_count($$) { my ($ch, $str, %hash) = @_; $hash{$_}++ for split //, $str; $hash{$ch} }
EXAMPLE 1:
my ($chars_count, $uniq_chars_count) = func('one two three four');
print $chars_count . " " . $uniq_chars_count . "\n";
OUTPUT:
18 10
EXAMPLE 2:
print uniq_ch_count('d', "asdjkasdjd sdfj d ") . " " . uniq_ch_count(' ', "asdjkasdjd sdfj d ") . "\n";
OUTPUT:
5 3
The simplest method would be to use a hash:
# split the string into an array of characters
my @chars = split //, $str;
# a hash slice can assign to several keys at once
# here we assign each character an empty value, but since hash
# keys are unique, assigning a repeated character just overwrites
# the existing entry.
my %uniq;
@uniq{@chars} = ();
# next ask for the keys of the hash; in scalar context,
# keys returns the number of keys, which is the count
my $count = scalar keys %uniq;
See: http://perldoc.perl.org/perldata.html#Slices
OK, so the magic keyword here, as far as Perl is concerned, is 'unique', because that usually means a hash is the tool for the job.
In perl, a hash is a set of key-value pairs, which means it's great for counting numbers of unique items.
So if you take your string, and split it into characters:
my %count_of;
foreach my $character ( split ( '', $str ) ) {
$count_of{$character}++;
}
You can then print out %count_of:
foreach my $character ( keys %count_of ) {
print "$character = $count_of{$character}\n";
}
keys %count_of gives you a list containing each key, and one of the nice tricks in Perl is that keys in scalar context returns just the number of keys. So you can do:
print scalar keys %count_of, " unique characters in $str\n";
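Putting the pieces above together, here is a minimal sketch. The sub name char_counts and the sample string are assumptions made for illustration; in the original script $str would come from the keyboard instead.

```perl
use strict;
use warnings;

# Count how often each character occurs in a string.
# Returns a hash reference mapping character => number of occurrences.
sub char_counts {
    my ($str) = @_;
    my %count_of;
    $count_of{$_}++ for split //, $str;
    return \%count_of;
}

my $str    = 'one two three four';   # stand-in for the user's input
my $counts = char_counts($str);

printf "Total characters  = %d\n", length $str;
printf "Unique characters = %d\n", scalar keys %$counts;

# Sorted so the listing is stable from run to run.
printf "'%s' = %d\n", $_, $counts->{$_} for sort keys %$counts;
```

Note that the space counts as a character here; strip whitespace from $str first if you do not want that.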

Efficient way to read columns in a file using Perl

I have an input file like so, separated by newline characters.
AAA
BBB
BBA
What would be the most efficient way to read the columns (vertically), joining first character with first, second with second, and so on?
Sample OUTPUT:
ABB
ABB
ABA
I have been using the following, but am unable to figure out how to remove the scalar context from it. Any hints are appreciated:
while (<@seq_prot>){
    chomp;
    my @sequence = map substr (@seq_prot, 1, 1), $start .. $end;
    @sequence = split;
}
My idea was to use the substring to get the first letter of the input (A in this case), and it would cycle for all the other letters (The second A and B). Then I would increment the cycle number + 1 so as to get the next line, until I reached the end. Of course I can't seem to get the first part going, so any help is greatly appreciated, am stumped on this one.
Basically, you're trying to transpose an array.
This can be done easily using Array::Transpose
use warnings;
use strict;
use Array::Transpose;
die "Usage: $0 filename\n" if @ARGV != 1;
for (transpose([map {chomp; [split //]} <>])) {
    print join("", map {$_ // " "} @$_), "\n";
}
For an input file:
ABCDEFGHIJKLMNOPQRS
12345678901234
abcdefghijklmnopq
ZYX
Will output:
A1aZ
B2bY
C3cX
D4d
E5e
F6f
G7g
H8h
I9i
J0j
K1k
L2l
M3m
N4n
O o
P p
Q q
R
S
You'll have to read in the file once for each column, or store the information and go through the data structure later.
I was originally thinking in terms of arrays of arrays, but I don't want to get into References.
I'm going to make the assumption that each line is the same length. Makes it simpler that way. We can use split to split your line into individual letters:
my $line = "ABC";
my @split_line = split //, $line;
This will give us:
$split_line[0] = "A";
$split_line[1] = "B";
$split_line[2] = "C";
What if we now took each letter and appended it to the matching element of a @vertical_array?
my @vertical_array;
for my $index ( 0..$#split_line ) {
    $vertical_array[$index] .= $split_line[$index];
}
Now let's do this with the next line:
$line = "123";
@split_line = split //, $line;
for my $index ( 0..$#split_line ) {
    $vertical_array[$index] .= $split_line[$index];
}
This will give us:
$vertical_array[0] = "A1";
$vertical_array[1] = "B2";
$vertical_array[2] = "C3";
As you can see, I'm building @vertical_array with each iteration:
use strict;
use warnings;
use autodie;
use feature qw(say);
my @vertical_array;
while ( my $line = <DATA> ) {
    chomp $line;
    my @split_line = split //, $line;
    for my $index ( 0..$#split_line ) {
        $vertical_array[$index] .= $split_line[$index];
    }
}
#
# Print out your vertical lines
#
for my $line ( @vertical_array ) {
    say $line;
}
__DATA__
ABC
123
XYZ
BOY
FOO
BAR
This prints out:
A1XBFB
B2YOOA
C3ZYOR
If I had used references, I could probably have built an array of arrays and then flipped it. That's probably more efficient, but more complex. However, that may be better at handling lines of different lengths.
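For comparison, the array-of-arrays variant mentioned above might look roughly like this. It is a sketch rather than a tuned implementation: the sub name transpose_lines and the padding argument are assumptions, and it needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Transpose a list of strings: column N of the input becomes line N of the
# output. Short lines are padded with $pad so later columns stay aligned.
sub transpose_lines {
    my ($pad, @lines) = @_;
    my @rows = map { [ split //, $_ ] } @lines;   # array of arrays

    # The widest row decides how many output lines there are.
    my $width = 0;
    for my $row (@rows) {
        $width = @$row if @$row > $width;
    }

    my @columns;
    for my $col (0 .. $width - 1) {
        # A missing cell reads as undef and is replaced by the padding.
        push @columns, join '', map { $_->[$col] // $pad } @rows;
    }
    return @columns;
}

print "$_\n" for transpose_lines(' ', qw(AAA BBB BBA));
```

Because every row is kept in memory, lines of different lengths are no longer a problem, at the cost of holding the whole file at once.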

How to Split on three different delimiters then ucfirst each result[]?

I am trying to figure out how to split a string that has three possible delimiters (or none) without a million lines of code, while keeping the code legible to a guy like me.
Many possible combinations in the string.
this-is_the.string
this.is.the.string
this-is_the_string
thisisthestring
There are no spaces in the string and none of these characters:
~`!@#$%^&*()+=\][{}|';:"/?>,<.
The string is already stripped of all but:
0-9
a-Z
-
_
.
There are also no sequential dots, dashes or underscores.
I would like the result to be displayed like this:
This Is The String
I am really having a difficult time trying to get this going.
I believe I will need to use a hash and I just have not grasped the concept even after hours of trial and error.
I am bewildered at the fact I could possibly split a string on multiple delimiters where the delimiters could be in any order AND/OR three different types (or none at all) AND maintain the order of the result!
Any possibilities?
Split the string into words, capitalise the words, then join the words while inserting spaces between them.
It can be coded quite succinctly:
my $clean = join ' ', map ucfirst lc, split /[_.-]+/, $string;
If you just want to print out the result, you can use
use feature qw( say );
say join ' ', map ucfirst lc, split /[_.-]+/, $string;
or
print join ' ', map ucfirst lc, split /[_.-]+/, $string;
print "\n";
It is simple to use a global regular expression to gather all sequences of characters that are not a dot, dash, or underscore.
After that, lc will lower-case each string and ucfirst will capitalise it. Stringifying an array will insert spaces between the elements.
for ( qw/ this-is_the.string this.is.the.string this-is_the_string / ) {
    my @string = map {ucfirst lc } /[^-_.]+/g;
    print "@string\n";
}
output
This Is The String
This Is The String
This Is The String
" the delimiters could be anywhere AND/OR three different types (or none at all)" ... you need a delimiter to split a string, you can define multiple delimiters with a regular expression to the split function
my @parts = split(/[-_\.]/, $string);
print ucfirst "$_ " foreach @parts;
print "\n";
Here's a solution that will work for all but your last test case. It's extremely hard to split a string without delimiters, you'd need to have a list of possible words, and even then it would be prone to error.
#!/usr/bin/perl
use strict;
use warnings;
my @strings = qw(
this-is_the.string
this.is.the.string
this-is_the_string
thisisthestring
);
foreach my $string (@strings) {
    print join(q{ }, map {ucfirst($_)} split(m{[_.-]}smx,$string)) . qq{\n};
}
And here's an alternative for the loop that splits everything into separate statements to make it easier to read:
foreach my $string (@strings) {
    my @words = split m{[_.-]}smx, $string;
    my @upper_case_words = map {ucfirst($_)} @words;
    my $string_with_spaces = join q{ }, @upper_case_words;
    print $string_with_spaces . qq{\n};
}
And to prove that just because you can, doesn't mean you should :P
$string =~ s{([A-Za-z]+)([_.-]*)?}{ucfirst(lc("$1")).($2?' ':'')}ge;
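For the delimiter-free case (thisisthestring), a word list is needed, as noted above. The following is only a sketch of that idea: the %known word list and the sub name segment are hypothetical, and with a realistic dictionary the split can easily be ambiguous or wrong.

```perl
use strict;
use warnings;

# Hypothetical word list; a real one would have to cover every expected word.
my %known = map { $_ => 1 } qw(this is the string);

# Try to split $text into known words, longest prefix first, backtracking
# when a choice leads to a dead end.
# Returns an array reference of words, or undef if no split exists.
sub segment {
    my ($text) = @_;
    return [] if $text eq '';
    for my $len (reverse 1 .. length $text) {
        my $head = substr $text, 0, $len;
        next unless $known{$head};
        my $rest = segment(substr $text, $len);
        return [ $head, @$rest ] if $rest;
    }
    return undef;   # no known word fits here
}

my $words = segment(lc 'thisisthestring');
print join(' ', map { ucfirst $_ } @$words), "\n" if $words;
```

Strings that contain a word missing from the list are left alone, since segment returns undef for them.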
For all but last possibility:
use strict;
use warnings;
my $file;
my $newline;
open $file, "<", "testfile" or die "Cannot open testfile: $!";
while (<$file>) {
chomp;
$newline = join ' ', map ucfirst lc, split /[-_\.]/, $_;
print $newline . "\n";
}

How to remove all . from string except the last?

I would like to remove all . from a string except from the last.
It can be done in JavaScript like so
var s='1.2.3.4';
s=s.split('.');
s.splice(s.length-1,0,'.');
s.join('');
but when I try the same in Perl
my @parts = split /./, $s;
my @a = splice @parts, $#parts-1,0;
$s = join "", @a;
I get
Modification of non-creatable array value attempted, subscript -2 at ./test.pl line 15.
Question
Can anyone figure out how to do this in Perl?
I would use a regexp with positive look-ahead in perl for the task:
perl -pe 's/\.(?=.*\.)//g' <<<"1.2.3.4"
Result:
123.4
EDIT to add a fix to your solution using split:
use warnings;
use strict;
my $s = '1.2.3.4';
my @parts = split /\./, $s;
$s = join( "", @parts[0 .. $#parts-1] ) . '.' . $parts[$#parts];
print "$s\n";
First of all, escape the dot in the split instruction: my @parts = split /\./, $s;
Your split is using a regex /./, in which case . is considered the wild card character. If you want to split on a literal period, you need to escape it:
... split /\./, $s;
splice takes arguments ARRAY or EXPR, OFFSET, LENGTH, LIST (perl v5.14). If LENGTH is 0, nothing is removed, and so nothing is returned.
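A small demonstration of that splice behaviour, as a sketch with a hard-coded @parts standing in for the result of a correct split:

```perl
use strict;
use warnings;

my @parts = qw(1 2 3 4);

# LENGTH 0: nothing is removed, so the returned list is empty
# and the array is left untouched.
my @removed = splice @parts, $#parts, 0;
print scalar(@removed), " removed, array is still (@parts)\n";

# A LIST given after LENGTH is inserted at OFFSET instead,
# here just before the last element.
splice @parts, $#parts, 0, '.';
print join('', @parts), "\n";
```

So the return value of splice is what was taken out, not the modified array, which is why joining it in the question's code could never produce the wanted string.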
Your code is contradictory to what you say you are trying to do, so I'm not quite sure what it really is you're trying to do, but assuming you want to remove all periods except the last, I would expect you'd do something like:
my @parts = split /\./, $s;
my $end = pop @parts;
$s = join "", @parts, ".$end";
Or perhaps manipulate the split:
my @parts = split /\./, $s;
my $limit = @parts - 1; # the field count for split
$s = join "", split /\./, $s, $limit;
So basically, find out how many fields your string will be split into, subtract one, then perform a new split and set the LIMIT to that.
When in doubt, use diagnostics;
$ perl -Mdiagnostics -le " splice @ARGV, -1 ,0 "
Modification of non-creatable array value attempted, subscript -1 at -e line 1 (#1)
(F) You tried to make an array value spring into existence, and the
subscript was probably negative, even counting from end of the array
backwards.
Uncaught exception from user code:
Modification of non-creatable array value attempted, subscript -1 at -e line 1.
at -e line 1.
$ perl -Mdiagnostics -le " splice @ARGV, -1 ,0 " argv now not empty
I doubt you want to use negative offsets. I think you want to use offset 0 and the size of the array minus one (also known as the last index):
$ perl -le " print for splice @ARGV, 0, $#ARGV-1 " a b c
a
Oops. $#ARGV is the last index, not $#ARGV-1, so
$ perl -le " print for splice @ARGV, 0, $#ARGV " a b c
a
b
But if you still want some arithmetic, you can use @ARGV, because in scalar context it's the size of the array:
$ perl -le " print for splice @ARGV, 0, @ARGV-1 " a b c
a
b
A side benefit of using non-negative offsets with splice? It doesn't die when the array is empty:
$ perl -le " print for splice @ARGV, 0, 10 "
This looks more like what you were trying to do in Perl
my @parts = split /\./, $s;
$s = join('', splice(@parts, 0, -1)) . '.' . $parts[-1];
You missed the '.' off your splice call. Here's how it should look
use strict;
use warnings;
my $s = '1.2.3.4';
my @parts = split /\./, $s;
splice @parts, -1, 0, '.';
$s = join "", @parts;
The first argument of split is a regular expression. In regular expressions, "." means "match any character" (with /s) or "match any character except LF" (without /s). You need to escape it to match a literal ".".
my @parts = split(/\./, $s, -1); # "-1" to handle "1.2.3.4."
splice(@parts, -1, 0, '.') if @parts >= 2; # "if" to handle "1234"
$s = join('', @parts);
A substitution could do it as well:
$s =~ s/\.(?=.*\.)//sg;

How to copy the contents of array into a single variable in Perl?

I have a data in an array as below. I want to copy all the content in a single variable. How can I do this ?
IFLADK
FJ
FAILED
FNKS
FKJ
FAILED
You could assign a reference to the array
my $scalar = \@array;
… or join all the strings in the array together
my $scalar = join "\n", @array;
With reference to the previous question, How to read n lines above the matched string in perl?, here is how to store multiple hits in an array:
while (<$fh>) {
    push @array, $_;
    shift @array if @array > 4;
    if (/script/) {
        print @array;
        push @found, join "", @array; # <----- this line
    }
}
You could just use a scalar, e.g. $found = join "", @array, but then you would only store the last match in the loop.
Suppose the loop is finished, and now you have all the matches in the array @found. If you want them in a scalar, just join again:
my $found = join "", @found;
Or you can just append them all to a scalar inside the loop:
$found .= join "", @array;
It all depends on what you intend to do with the data. Having the data in a scalar is rarely more beneficial than having it in an array. For example, if you are going to print it, there is no difference, as print $found is equivalent to print @found, because print takes a list of arguments.
If your intent is to interpolate the matches into a string:
print "Found matches: $found";
print "Found matches: ", @found;
$whole = join(' ', @lines);
But if you're reading the text from a file, it's easier to just read it all in one chunk, by (locally) undefining the record delimiter:
local $/ = undef;
$whole = <FILE>;
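To keep the change to $/ from leaking into the rest of the program, the slurp can be wrapped in a sub so that local is undone on return. This is a sketch: the sub name slurp is an assumption, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Read everything from an open filehandle into one scalar.
sub slurp {
    my ($fh) = @_;
    local $/;              # undef record separator, restored when the sub returns
    return scalar <$fh>;   # a single read now returns the whole content
}

# An in-memory filehandle replaces a real file in this sketch.
my $data = "IFLADK\nFJ\nFAILED\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $whole = slurp($fh);
close $fh;

print $whole;
```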
Depends on what you are trying to do, but if you are wanting to package up an array into a scalar so that it can be retrieved later, then you might want Storable.
use Storable qw(freeze thaw);
my @array = qw{foo bar baz};
my $stored_array = freeze \@array;
...
my @retrieved_array = @{ thaw($stored_array) };
Then again it could be that your needs may be served by just storing a reference to the array.
my @array = qw{foo bar baz};
my $stored_array = \@array;
...
my @retrieved_array = @$stored_array;
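The practical difference between the two shows up when the original array changes later. A minimal sketch, with variable names chosen for illustration:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @array = qw(foo bar baz);

my $ref    = \@array;          # points at the same array
my $frozen = freeze \@array;   # serialized snapshot, independent of @array

push @array, 'qux';            # change the original afterwards

my @via_ref    = @$ref;                # sees the change
my @via_frozen = @{ thaw($frozen) };   # still the old contents

print "reference: @via_ref\n";
print "frozen:    @via_frozen\n";
```

A reference is cheap and stays in sync with the array, while a frozen copy is a plain string that can also be written to a file or sent elsewhere.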