MongoDB Perl '$in' not working

I wrote up a Perl line for querying in MongoDB, but it doesn't seem to be working.
my $cursor = $collection->find({'genes.symbol'=>{'$in' => [@gene_list]}});
The elements (gene symbols) in the @gene_list array are separated by spaces (" "). I don't know if this is the issue, because in the MongoDB shell the elements should be separated by commas. If @gene_list has to be an array with the elements separated by commas, how should I do it?
This @gene_list array includes 10 genes:
"RAD51C","FRAS1","GRIP1","FREM2","CHMP1A","WRAP53","VAX1","ACTG2","RNASEH2A","CTC1"
So, when I do
my $count = $myCursor->count;
print $count;
I assumed it would print '10' as the count; however, with my Perl line it always prints '0', which means the query was not successful.

It's a shame you're so reluctant to give us any more information about your problem. As it is, all I can do is offer you an example program that inserts 13 documents into a collection and then uses find in the same way as your own code does to retrieve a subset.
The only thing I can spot that may be wrong in your own code is that you are using genes.symbol as a field name, which is a little bit odd. Are you sure the collection isn't genes and the field symbol?
use strict;
use warnings 'all';
use feature 'say';
use MongoDB;
my $dbh        = MongoDB->connect;
my $collection = $dbh->ns('test.test');
$collection->delete_many({});    # Empty the collection
for my $val ( 'A' .. 'M' ) {
    $collection->insert_one({ data => $val });
}
my @filter = qw/ A F N Z /;
my $curs   = $collection->find({ data => { '$in' => [ @filter ] } });
my $n = 0;
while ( my $doc = $curs->next ) {
    printf "%2d: %s\n", ++$n, $doc->{data};
}
output
1: A
2: F
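The space-separated look described in the question is only how Perl displays an array interpolated into a double-quoted string: the elements are joined with $", which defaults to a single space. The array itself holds separate strings, and [ @gene_list ] is already the Perl equivalent of the shell's comma-separated list. A minimal sketch, using only core modules and no database connection, that builds the same filter document as the question and dumps it:

```perl
use strict;
use warnings;
use Data::Dumper;

my @gene_list = qw(RAD51C FRAS1 GRIP1 FREM2 CHMP1A WRAP53 VAX1 ACTG2 RNASEH2A CTC1);

# Interpolation joins the elements with $" (a single space by default);
# this affects only how the array is printed, not what it contains
print "@gene_list\n";

# [ @gene_list ] copies the elements into an anonymous array reference,
# which the driver sends as an array, like [ "RAD51C", "FRAS1", ... ] in the shell
my $filter = { 'genes.symbol' => { '$in' => [ @gene_list ] } };

local $Data::Dumper::Sortkeys = 1;
print Dumper($filter);
printf "%d values in the \$in list\n", scalar @{ $filter->{'genes.symbol'}{'$in'} };
```

If the dump looks right and the count is still 0, the mismatch is more likely in the data (the field path or the stored symbol values) than in the array syntax. Note also that count reports matching documents, not matched genes, so even a working query need not return exactly 10.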

Related

Find nearest option in a Perl Hash

I have a hashref that has data tied to days of the calendar year, for example:
my $calendarEntries = { '1' => 'Entry 1', '5' => 'Entry 2', '15' => 'Entry 3' };
I can obtain the day of the year using DateTime:
state $moduleDateTime = require DateTime;
my $dt = DateTime->now('time_zone' => 'America/Chicago');
my $dayOfTheYear = $dt->strftime('%j');
However, I'm trying to figure out the most efficient way to handle situations where the current day does not match any of the days in the hash. I'd like to always "round down" in those situations. E.g. today (which is the 7th day of the year), I'd like to load the entry with the key '5', since it is the most "recent" entry.
Is there a way to select a key in a hashref that is the closest candidate for being <= $dayOfTheYear? If I were using DBD, I could do a query like this:
'SELECT entry WHERE `key` <= ' . $dayOfTheYear . ' ORDER BY `key` DESC LIMIT 1'
But, I'd rather avoid needing to create a database and call it, if I can do something natively in Perl.
One way, expecting many searches
use List::MoreUtils qw(last_value);
my @entries = sort { $a <=> $b } keys %$calendarEntries;
my $nearest_le = last_value { $day >= $_ } @entries;
This returns the last element that is less than or equal to $day, for any input, so it is the key of interest.
The drawback of using a plain hash is that one has to build an extra data structure. Any library that offers this sort of lookup must do that as well, of course, but those then come with other goodies and may perform considerably better (depending on how often this is done).
If this 'rounding' needs to be done a lot for a given hash, then it makes sense to build a lookup table for days, associating each with its nearest key in the hash.† ‡
If @entries is sorted in descending order ($b <=> $a), then the core List::Util::first does it.
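A sketch of the core-module variant just mentioned, assuming the keys are sorted once in descending order; the sub name nearest_le is made up for illustration. Because the list runs from largest to smallest, the first key that is <= the day is also the largest such key.

```perl
use strict;
use warnings;
use List::Util qw(first);

my $calendarEntries = { '1' => 'Entry 1', '5' => 'Entry 2', '15' => 'Entry 3' };

# Sort the keys once, largest first
my @desc = sort { $b <=> $a } keys %$calendarEntries;

# Hypothetical helper: first() stops at the first key that is <= $day,
# i.e. the largest such key; it returns undef when every key is greater
sub nearest_le {
    my ($day) = @_;
    return first { $_ <= $day } @desc;
}

for my $day (7, 15, 200, 0) {
    my $k = nearest_le($day);
    printf "%3d => %s\n", $day, defined $k ? "$k ($calendarEntries->{$k})" : 'none';
}
```

Unlike last_value, first stops scanning as soon as it finds a hit, and the undef result for day 0 gives a natural place to hook in error handling.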
† For example
my %nearest_le;
my @keys = sort { $a <=> $b } keys %$calendarEntries;
for my $day (1 .. 366) {
    for my $k (@keys) {
        if ($k <= $day) {
            $nearest_le{$day} = $k;
        }
        else { last }
    }
}
This enumerates days of the year, as specified in the question.
‡ If this were needed for things other than days (366 at most), where long lists may be expected, binary search on a sorted list gives better algorithmic behavior (O(log n)).
The library used above, List::MoreUtils, also has lower_bound, which is O(log n):
Returns the index of the first element in LIST which does not compare less than val.
So this needs a few adjustments for our purpose:
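use List::MoreUtils qw(lower_bound);
my @keys = sort { $a <=> $b } keys %$calendarEntries;
# lower_bound finds the first key >= $day, so step back one index.
# Caveat: if $day is below the smallest key the index becomes -1, which
# would wrongly pick the largest key, so check $day >= $keys[0] first
my $nearest_le = exists $calendarEntries->{$day}
    ? $day
    : $keys[ -1 + lower_bound { $_ <=> $day } @keys ];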
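For readers without List::MoreUtils, the binary search from the ‡ footnote can also be written by hand. A sketch, with a made-up sub name nearest_le_bsearch, that returns the largest element <= the target or undef when there is none, which also sidesteps the index -1 case:

```perl
use strict;
use warnings;

# Hypothetical helper: binary search over an ascending numeric list.
# Returns the largest element <= $target, or undef if there is none.
sub nearest_le_bsearch {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    my $best;
    while ($lo <= $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        if ($sorted->[$mid] <= $target) {
            $best = $sorted->[$mid];    # a candidate; look right for a larger one
            $lo   = $mid + 1;
        }
        else {
            $hi = $mid - 1;             # too large; search the left half
        }
    }
    return $best;
}

my $calendarEntries = { '1' => 'Entry 1', '5' => 'Entry 2', '15' => 'Entry 3' };
my @keys = sort { $a <=> $b } keys %$calendarEntries;

for my $day (7, 15, 0) {
    my $k = nearest_le_bsearch(\@keys, $day);
    print "$day => ", (defined $k ? $k : 'none'), "\n";
}
```

Each step halves the remaining range, so a lookup costs O(log n) comparisons once the keys are sorted.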
A nice simple solution.
use List::Util qw( max );
max grep { $_ <= $dayOfTheYear } keys %$calendarEntries
Notes:
Best to make sure $calendarEntries->{ $dayOfTheYear } doesn't exist first.
You'll need to handle the case where there is no matching key.
It's faster than sorting unless you perform many searches. But even then, we're only dealing with at most 366 keys, so simplicity is key here.
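The notes above can be folded into one small sub. A sketch, with a made-up name entry_for_day, that checks for an exact key first and relies on max returning undef for an empty list to signal "no key <= day":

```perl
use strict;
use warnings;
use List::Util qw(max);

my $calendarEntries = { '1' => 'Entry 1', '5' => 'Entry 2', '15' => 'Entry 3' };

# Hypothetical helper: returns the entry for $day, rounding down to the
# nearest earlier key; returns undef when no key is <= $day
sub entry_for_day {
    my ($entries, $day) = @_;
    # exact hit: no need to scan the keys at all
    return $entries->{$day} if exists $entries->{$day};
    # max() of an empty list is undef, which signals "no key <= $day"
    my $key = max grep { $_ <= $day } keys %$entries;
    return defined $key ? $entries->{$key} : undef;
}

for my $day (5, 7, 366, 0) {
    my $entry = entry_for_day($calendarEntries, $day);
    print "$day: ", (defined $entry ? $entry : 'no entry'), "\n";
}
```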
The simplest solution is to look up the value for your date and, if it is not found, step down until you find one. In this sample, I included rudimentary error handling.
use strict;
use warnings;
use feature 'say';
my $calendarEntries = { '1' => 'Entry 1', '5' => 'Entry 2', '15' => 'Entry 3' };
my $find = shift // 7; # for testing purposes
my $date = get_nearest_below($calendarEntries, $find);
if (defined $date) {
    say "Nearest date below to '$find' is '$date'";
} else {    # error handling
    warn "Nearest date below not found for '$find'";
}
sub get_nearest_below {
    my ($href, $n) = @_;
    while ($n > 0) {                         # valid dates are > 0
        return $n if defined $href->{$n};    # found a defined value
        $n--;                                # or go to the next key below
    }
    return undef;    # or return an error if nothing is found before 0
}
Output:
$ foo.pl
Nearest date below to '7' is '5'
$ foo.pl 12
Nearest date below to '12' is '5'
$ foo.pl 123
Nearest date below to '123' is '15'
$ foo.pl 0
Nearest date below not found for '0' at foo.pl line 13.

Find matching list item where another list has the match strings

I have a list of text strings and a hash of RE / match strings + tags and I need, unsurprisingly, to match them up. I also need to know if there is any item in the first list that isn't matched by one of the items in the second, for example:
@strings = qw(red orange yellow blue);
%matches = (
re => "apple",
or => "orange",
ye => "banana",
);
I need to match up each of the $strings with a $hash_value (red <-> apple for example) and raise an error for 'blue'.
I can do this with two nested loops, iterating through @strings and the $hash_keys looking for a match and getting the hash value. I think I need to use a flag variable to identify whether the inner $hash_keys loop finished without matching anything.
I can probably also do it by iterating through the $hash_keys and grepping @strings, but I don't know how to spot unmatched $strings.
Both of those options seem really clunky, and I feel like there must be a 'cleaner' way. Am I missing something obvious?
EDIT
Apologies, the code I have is something like:
foreach $string (@strings) {
    $match = "false";
    foreach $key (keys %matches) {
        if ( $string =~ /$key/ ) {
            do_my_thing($string, $matches{$key});
            $match = "true";
            last;    # Perl's loop exit is last, not break
        }
    }
    ( $match eq "false" ) && raise_unmatched_error($string);
}
This does what I want, identifies which mask (key) matched the string and also flags unmatched strings, but it just looks inelegant. It seems to me that it should be possible to use map, maybe with grep, to a) match the hash keys with the strings, and b) identify unmatched strings.
I might be overthinking it, though; it works, it's probably maintainable, and maybe it doesn't need to be anything else.
Take all of the keys in your hash and turn them into a regex (using the alternation operator so that you can match on any of them). You can then do one regex match against each string in @strings. If you put capturing parentheses around the regex, the string that matches will end up in $1. And if the string doesn't match the regex (as with "blue"), you can display the required warning.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my @strings = qw[red orange yellow blue];
my %matches = (
    re => 'apple',
    or => 'orange',
    ye => 'banana',
);
my $match_re = join '|', keys %matches;
say "DEBUG: $match_re";
for (@strings) {
    if (/($match_re)/) {
        # $1 holds the matched text, so this lookup assumes plain-string keys
        say "$_ -> $matches{$1}";
    } else {
        say "No match for '$_'";
    }
}
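Since the question describes the hash keys as regexes, $1 holds the matched text rather than the key whenever a key contains metacharacters. A sketch of the map/grep-style approach the question asks about, swapping in the core List::Util first to find the matching key per string and a grep to collect the unmatched ones:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @strings = qw(red orange yellow blue);
my %matches = (
    re => 'apple',
    or => 'orange',
    ye => 'banana',
);

# For each string, first() returns the first key whose pattern matches it
# (or undef), so the key itself is known even when keys are real regexes
my (@paired, @unmatched);
for my $string (@strings) {
    my $key = first { $string =~ /$_/ } keys %matches;
    if (defined $key) {
        push @paired, "$string -> $matches{$key}";
    }
    else {
        push @unmatched, $string;
    }
}
print "$_\n" for @paired;
warn "No match for '$_'\n" for @unmatched;

# grep-only variant that just collects the unmatched strings
my @unmatched_too = grep { my $s = $_; !grep { $s =~ /$_/ } keys %matches } @strings;
```

If a string could match several keys, which key wins depends on hash order; sort the keys, or keep them in an array of pairs, when that matters.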

How can I compare two arrays with alphanumeric elements?

I have two arrays that I want to compare
@array1 = ( aaa, bbb, aaabbb, aaa23bbb, ddd555, 430hd9789);
@array2 = ( 34322hh2, jjfjr78, uuu7shv, ddd555, hjkdjroo);
I have to compare these two arrays, find duplicates, and do something about them.
Conditions:
Length of each element in array can be different. There is no such fixed pattern.
Elements can be just numeric, e.g. 334343, just letters, e.g. "somewordexample", or alphanumeric, e.g. wewe83493
There can be more such elements in the array.
Now I know the following about comparison operators == and eq:
== is for comparing numbers
eq is for string comparison
How can I compare alphanumeric values?
This is my code so far
for (my $i = 0 ; $i <= $#array1 ; $i++ ) {
for (my $j = 0 ; $j <= $#array2 ; $j++ ) {
if ( $array1[$i] == $arra2[$j] ) {
print "duplicate";
}
}
}
Your manner is indolent, and you seem to be looking for a quick fix without caring whether you understand the solution. The posts on Stack Overflow are primarily for people other than the originator who may have a similar problem.
You should read perlfaq4. Specifically:
perldoc -q intersection - "How do I compute the difference of two arrays? How do I compute the intersection of two arrays?"
perldoc -q contained - "How can I tell whether a certain element is contained in a list or array?"
perldoc -q duplicate - "How can I remove duplicate elements from a list or array?"
Thank you for posting your misbehaving code.
There are a few problems
You must always use strict and use warnings at the top of every Perl program, and declare each variable as close as possible to its first point of use. That simple measure will reveal many faults that you may otherwise overlook.
I have used qw to define the array data
It is much better to use the Perl foreach than the C-style for
As you appear to have discovered, the == operator is for comparing numbers. You have strings so you need eq
Apart from that, all I have changed in your code is to mention the text of the duplicate entry instead of just printing "duplicate"
use strict;
use warnings;
my @array1 = qw( aaa bbb aaabbb aaa23bbb ddd555 430hd9789 );
my @array2 = qw( 34322hh2 jjfjr78 uuu7shv ddd555 hjkdjroo );
for my $i (0 .. $#array1) {
    for my $j (0 .. $#array2) {
        if ( $array1[$i] eq $array2[$j] ) {
            print "Duplicate '$array1[$i]'\n";
        }
    }
}
output
Duplicate 'ddd555'
Your alphanumeric values can still be treated as strings. If you want to find elements that are in both your lists, you can use the get_intersection method provided by the List::Compare module:
use strict;
use warnings;
use List::Compare;
my @array1 = qw(aaa bbb aaabbb aaa23bbb ddd555 430hd9789);
my @array2 = qw(34322hh2 jjfjr78 uuu7shv ddd555 hjkdjroo);
my $comp = List::Compare->new(\@array1, \@array2);
my @duplicates = $comp->get_intersection();
if (@duplicates > 0) {
    print "@duplicates\n";
}
Output:
ddd555
Alphanumeric values are just strings. Numeric values are the subset of strings that Perl considers numeric (i.e. Scalar::Util::looks_like_number() returns true for them). In this case, you could use eq or any other string-related function for comparison (such as the less commonly used index).
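A short sketch of why == misbehaves on the question's data: it checks the sample values with looks_like_number, then shows that == converts both operands to numbers, so two different words both become 0 and compare equal, while eq compares the actual text.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Which of the question's sample values would Perl treat as numbers?
my %is_numeric = map { $_ => (looks_like_number($_) ? 1 : 0) }
                 qw(334343 somewordexample wewe83493 ddd555);
printf "%-16s %s\n", $_, $is_numeric{$_} ? 'numeric' : 'not numeric'
    for sort keys %is_numeric;

# == converts both operands to numbers; strings without a leading number
# become 0, so two different words compare equal (the warning is silenced here)
my $numeric_equal = do { no warnings 'numeric'; 'aaa' == 'bbb' };
my $string_equal  = 'aaa' eq 'bbb';
print "aaa == bbb: ", ($numeric_equal ? 'true' : 'false'), "\n";
print "aaa eq bbb: ", ($string_equal  ? 'true' : 'false'), "\n";
```

Without the no warnings line, use warnings reports "Argument ... isn't numeric", which is usually the first hint that == is the wrong operator.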
To find exact duplicates in O(n) time
my %seen;
for my $duplicate (grep { ++$seen{$_} > 1 } (@array1, @array2))
{
    # Do what you need to do to the duplicates
}
If you just want to get rid of the elements of @array1 that are duplicated in @array2,
my %seen = map { $_ => 1 } @array2;
@array1 = grep { not $seen{$_} } @array1;
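The %seen counter in the O(n) snippet also fires when a value repeats inside a single array, and reports a value once per extra occurrence. If only values present in both arrays are wanted, each reported once, a hash-lookup sketch (the second 'ddd555' in @array1 is added to the question's data purely to show the effect):

```perl
use strict;
use warnings;

my @array1 = qw( aaa bbb aaabbb aaa23bbb ddd555 430hd9789 ddd555 );
my @array2 = qw( 34322hh2 jjfjr78 uuu7shv ddd555 hjkdjroo );

# Lookup hash of everything in @array2 (exact string keys, so eq semantics)
my %in_array2 = map { $_ => 1 } @array2;

# Keep elements of @array1 that also occur in @array2, each reported once,
# in @array1's order; %reported suppresses repeats within @array1
my %reported;
my @common = grep { $in_array2{$_} && !$reported{$_}++ } @array1;

print "Duplicate '$_'\n" for @common;
```

Both passes are linear, so this stays O(n + m) instead of the O(n * m) nested loop.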
You can also do this using an exact-match regex:
if("4lph4" =~ /^4lph4$/)
{ .... }

Simplest way to match array of strings to search in perl?

What I want to do is check an array of strings against my search string and get the corresponding key so I can store it. Is there a magical way of doing this with Perl, or am I doomed to using a loop? If so, what is the most efficient way to do this?
I'm relatively new to Perl (I've only written 2 other scripts), so I don't know a lot of the magic yet, just that Perl is magic =D
Reference Array: (1 = 'Canon', 2 = 'HP', 3 = 'Sony')
Search String: Sony's Cyber-shot DSC-S600
End Result: 3
UPDATE:
Based on the results of the discussion in this question, depending on your intent/criteria of what constitutes "not using a loop", the map-based solution below (see "Option #1") may be the most concise solution, provided that you don't consider map a loop (the short version of the answers is: it's a loop as far as implementation/performance goes, but not a loop from a language-theoretical point of view).
Assuming you don't care whether you get "3" or "Sony" as the answer, you can do it without a loop in a simple case, by building a regular expression with "or" logic (|) from the array, like this:
my #strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|",#strings);
my #which_found = ($search_in =~ /($combined_search)/);
print "$which_found[0]\n";
Result from my test run: Sony
The regular expression will (once the variable $combined_search is interpolated by Perl) take the form /(Canon|HP|Sony)/ which is what you want.
This will NOT work as-is if any of the strings contain regex special characters (such as | or ) ); in that case you need to escape them, for example with quotemeta.
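A sketch of that escaping step, assuming quotemeta is applied to each string before joining. The brand 'A+B Optics' is made up for illustration because it contains the metacharacter +, which would otherwise mean "one or more A".

```perl
use strict;
use warnings;

# 'A+B Optics' is a made-up brand containing the regex metacharacter +
my @strings = ('Canon', 'HP', 'Sony', 'A+B Optics');

# quotemeta backslash-escapes every non-word character, so each name is
# matched literally inside the alternation
my $combined_search = join '|', map { quotemeta($_) } @strings;

my @found;
for my $search_in ("Sony's Cyber-shot DSC-S600", "A+B Optics Zoom 3", "Nikon D3") {
    my ($found) = $search_in =~ /($combined_search)/;
    push @found, $found;
    print defined $found ? "$found\n" : "no match\n";
}
```

Without the map, the pattern piece A+B would match "AB" or "AAB" but not the literal text "A+B".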
NOTE: I personally consider this somewhat cheating, because in order to implement join(), Perl itself must loop somewhere inside the interpreter. So this answer may not satisfy your desire to remain loop-less, depending on whether you wanted to avoid a loop for performance reasons or to have cleaner or shorter code.
P.S. To get "3" instead of "Sony", you will have to use a loop - either in an obvious way, by doing 1 match in a loop underneath it all; or by using a library that saves you from writing the loop yourself but will have a loop underneath the call.
I will provide 3 alternative solutions.
#1 option: - my favorite. Uses "map", which I personally still consider a loop:
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|", @strings);
my @which_found = ($search_in =~ /($combined_search)/);
die "Not found" unless @which_found;    # check before using $which_found[0]
print "$which_found[0]\n";
my $strings_index = 0;
my %strings_indexes = map { $_ => $strings_index++ } @strings;
my $index = 1 + $strings_indexes{ $which_found[0] };
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#2 option: Uses a loop hidden behind a nice CPAN library method:
use List::MoreUtils qw(firstidx);
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
my $combined_search = join("|", @strings);
my @which_found = ($search_in =~ /($combined_search)/);
die "Not Found!" unless @which_found;
print "$which_found[0]\n";
my $index_of_found = 1 + firstidx { $_ eq $which_found[0] } @strings;
# Need to add 1 since arrays in Perl are zero-index-started and you want "3"
#3 option: Here's the obvious loop way:
my $found_index = -1;
my @strings = ("Canon", "HP", "Sony");
my $search_in = "Sony's Cyber-shot DSC-S600";
foreach my $index (0 .. $#strings) {
    next if $search_in !~ /\Q$strings[$index]\E/;    # \Q...\E matches the name literally
    $found_index = $index;
    last;    # quit the loop early, which is why I didn't use "map" here
}
# Check $found_index against -1; and if you want "3" instead of "2" add 1.
Here is a solution that builds a regular expression with embedded code to increment the index as perl moves through the regex:
my @brands = qw( Canon HP Sony );
my $string = "Sony's Cyber-shot DSC-S600";
use re 'eval'; # needed to use the (?{ code }) construct
my $index = -1;
my $regex = join '|' => map "(?{ \$index++ })\Q$_" => @brands;
print "index: $index\n" if $string =~ $regex;
# prints 2 (since Perl's array indexing starts with 0)
The string that is prepended to each brand first increments the index, and then tries to match the brand (escaped with quotemeta (as \Q) to allow for regex special characters in the brand names).
When the match fails, the regex engine moves past the alternation | and then the pattern repeats.
If you have multiple strings to match against, be sure to reset $index before each. Or you can prepend (?{$index = -1}) to the regex string.
An easy way is just to use a hash and regex:
my $search = "your search string";
my %translation = (
    'canon' => 1,
    'hp'    => 2,
    'sony'  => 3
);
# return only works inside a sub, so this loop is meant to sit in one
for my $key ( keys %translation ) {
    if ( $search =~ /$key/i ) {
        return $translation{$key};
    }
}
Naturally, the return can just as easily be a print. You can also wrap the entire thing in a while loop:
while(my $search = <>) {
    # $search now gets each line from STDIN (e.g. text piped to this script); chomp it before matching
}
Please also take a look at Perl's regex features in perlre,
and at Perl's data structures in perlref.
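The hash-and-regex loop above uses return, so it has to live in a sub. A self-contained sketch with a made-up sub name brand_id, which also adds \Q...\E so each key is matched literally and sorts the keys so the check order is predictable:

```perl
use strict;
use warnings;

my %translation = ( canon => 1, hp => 2, sony => 3 );

# Hypothetical helper: the hash-and-regex lookup wrapped in a sub so that
# return is legal; \Q...\E matches each key literally and /i ignores case
sub brand_id {
    my ($search) = @_;
    for my $key ( sort keys %translation ) {
        return $translation{$key} if $search =~ /\Q$key\E/i;
    }
    return;    # nothing matched: undef in scalar context
}

for my $search ("Sony's Cyber-shot DSC-S600", "canon powershot", "Nikon D3") {
    my $id = brand_id($search);
    print "$search => ", (defined $id ? $id : 'no match'), "\n";
}
```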
EDIT
As was just pointed out to me, you were trying to steer away from using a loop. Another method would be to use Perl's map function (see perldoc -f map).
You can also take a look at Regexp::Assemble, which will take a collection of sub-regexes and build a single super-regex from them that can then be used to test for all of them at once (and gives you the text which matched the regex, of course). I'm not sure that it's the best solution if you're only looking at three strings/regexes that you want to match, but it's definitely the way to go if you have a substantially larger target set - the project I initially used it on has a library of some 1500 terms that it's matching against and it performs very well.

Converting code to perl sub, but not sure I'm doing it right

I'm working from a question I posted earlier and trying to convert the answer into a sub so I can use it multiple times. I'm not sure that it's done right, though. Can anyone provide a better or cleaner sub?
I have a good deal of experience programming, but my primary language is PHP. It's frustrating to know how to execute in one language, but not be able to do it in another.
sub search_for_key
{
    my ($args) = @_;
    foreach $row (@{$args->{search_ary}}) {
        print "@$row[0] : @$row[1]\n";
    }
    my $thiskey = NULL;
    my @result = map { $args->{search_ary}[$_][0] }                         # Get the 0th column...
                 grep { @$args->{search_in} =~ /$args->{search_ary}[$_][1]/ } # ... of rows where the
                 0 .. $#array;                                               # first row matches
    $thiskey = @result;
    print "\nReturning: " . $thiskey . "\n";
    return $thiskey;
}
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
---Edit---
From the answers so far, I've cobbled together the function below. I'm new to Perl, so I don't really understand much of the syntax. All I know is that it throws an error (Not an ARRAY reference at line 26.) about that grep line.
Since I seem to not have given enough info, I will also mention that:
I am calling this function like this (which may or may not be correct):
search_for_key({
'search_ary' => $ref_cam_make,
'search_in' => 'Canon EOS Rebel XSi'
});
And $ref_cam_make is an array I collect from a database table like this:
$ref_cam_make = $sth->fetchall_arrayref;
And it is in the structure like this (if I understood how to make the associative fetch work properly, I would like to use it like that instead of by numeric keys):
Reference Array
Associative
row[1][cam_make_id]: 13, row[1][name]: Sony
Numeric
row[1][0]: 13, row[1][1]: Sony
row[0][0]: 19, row[0][1]: Canon
row[2][0]: 25, row[2][1]: HP
sub search_for_key
{
    my ($args) = @_;
    foreach my $row (@{$args->{search_ary}}) {
        print "@$row[0] : @$row[1]\n";
    }
    print grep { $args->{search_in} =~ @$args->{search_ary}[$_][1] } @$args->{search_ary};
}
You are moving in the direction of a 2D array, where the [0] element is some sort of ID number and the [1] element is the camera make. Although reasonable in a quick-and-dirty way, such approaches quickly lead to unreadable code. Your project will be easier to maintain and evolve if you work with richer, more declarative data structures.
The example below uses hash references to represent the camera brands. An even nicer approach is to use objects. When you're ready to take that step, look into Moose.
use strict;
use warnings;
demo_search_feature();
sub demo_search_feature {
    my @camera_brands = (
        { make => 'Canon', id => 19 },
        { make => 'Sony',  id => 13 },
        { make => 'HP',    id => 25 },
    );
    my @test_searches = (
        "Sony's Cyber-shot DSC-S600",
        "Canon cameras",
        "Sony HPX-32",
    );
    for my $ts (@test_searches) {
        print $ts, "\n";
        my @hits = find_hits($ts, \@camera_brands);
        print ' => ', cb_stringify($_), "\n" for @hits;
    }
}
sub cb_stringify {
    my $cb = shift;
    return sprintf 'id=%d make=%s', $cb->{id}, $cb->{make};
}
sub find_hits {
    my ($search, $camera_brands) = @_;
    return grep { $search =~ $_->{make} } @$camera_brands;
}
This whole sub is really confusing, and I'm a fairly regular perl user. Here are some blanket suggestions.
Do not ever create your own undef (such as NULL); use undef, and then at the bottom return $var // 'NULL'.
Do not ever do this: foreach $row. foreach my $row is less prone to create problems; lexically scoping variables is good.
Do not needlessly concatenate, for it offends the style god: not print "\nReturning: " . $thiskey . "\n";, but print "\nReturning: $thiskey\n";, or, if you don't need the first \n, say "Returning: $thiskey"; (Perl 5.10 and later, with use feature 'say').
grepping over 0 .. $#array is categorically lame; just grep over the array: grep {} @{$foo[0]}. With code this complex you almost certainly don't want grep anyway (though, to be honest, I don't understand what you're doing). Check out perldoc -q first; in short, grep doesn't stop until the end.
Lastly, do not assign an array to a scalar: $thiskey = @result; is an implicit $thiskey = scalar @result; (see perldoc -q scalar for more info). What you probably want is to return the array reference. Something like this (which eliminates $thiskey):
printf "\nReturning: %s\n", join ', ', @result;
@result ? \@result : 'NULL';
If you're intending to return whether a match is found, this code should work (inefficiently). If you're intending to return the key, though, it won't: the scalar value of @result (which is what you get when you say $thiskey = @result;) is the number of items in the list, not the first entry.
$thiskey = @result; should probably be changed to $thiskey = $result[0]; if you want mostly equivalent functionality to the code you based this on. Note that it won't account for multiple matches anymore, though, unless you return @result in its entirety, which kind of makes more sense anyway.
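Pulling those suggestions together: grep over the rows themselves rather than over indices, and return the ids rather than the count. A sketch using sample rows built from the ids and names listed in the question; it swaps the regex for index(), which does a case-sensitive literal substring test, and the wantarray split between list and scalar context is a design choice, not something the question requires.

```perl
use strict;
use warnings;

# Sample rows shaped like $sth->fetchall_arrayref output: [ id, name ]
my $ref_cam_make = [ [19, 'Canon'], [13, 'Sony'], [25, 'HP'] ];

sub search_for_key {
    my ($args) = @_;
    my $rows = $args->{search_ary};    # array reference of [ id, name ] rows
    # keep rows whose name (column 1) occurs in search_in, then take their ids
    # (column 0); index() is a literal substring test, so no regex escaping
    my @result = map  { $_->[0] }
                 grep { index($args->{search_in}, $_->[1]) >= 0 } @$rows;
    # list context gets every id, scalar context gets the first (or undef)
    return wantarray ? @result : $result[0];
}

my $id = search_for_key({
    search_ary => $ref_cam_make,
    search_in  => 'Canon EOS Rebel XSi',
});
print defined $id ? "Returning: $id\n" : "No match\n";
```

Calling it in list context, as in my @ids = search_for_key(...), returns every matching id, which covers the multiple-match case mentioned above.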