How can I filter a Perl DBIx recordset with 2 conditions on the same column?

I'm getting my feet wet in DBIx::Class — loving it so far.
One problem I am running into is that I want to query records, filtering out records that aren't in a certain date range.
It took me a while to find out how to do a <= type of match instead of an equality match:
my $start_criteria = ">= $start_date";
my $end_criteria = "<= $end_date";
my $result = $schema->resultset('MyTable')->search(
{
'status_date' => \$start_criteria,
'status_date' => \$end_criteria,
});
The obvious problem with this is that since the filters are in a hash, I am overwriting the value for "status_date", and am only searching where the status_date <= $end_date. The SQL that gets executed is:
SELECT me.* from MyTable me where status_date <= '9999-12-31'
I've searched CPAN, Google and SO and haven't been able to figure out how to apply 2 conditions to the same column. All documentation I've been able to find shows how to filter on more than 1 column, but not 2 conditions on the same column.
I'm sure I'm missing something obvious. Can someone here point it out to me?

IIRC, you should be able to pass an array reference of multiple search conditions (each in its own hashref.) For example:
my $result = $schema->resultset('MyTable')->search(
[ { 'status_date' => \$start_criteria },
{ 'status_date' => \$end_criteria },
]
);
Edit: Oops, never mind. That does an OR, as opposed to an AND.
It looks like the right way to do it is to supply a hashref for a single status_date:
my $result = $schema->resultset('MyTable')->search(
{ status_date => { '>=' => $start_date, '<=' => $end_date } }
);
This stuff is documented in SQL::Abstract, which DBIC uses under the hood.
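The overwrite described in the question, and the reason the nested hashref avoids it, can be seen with plain Perl hashes. This is only a sketch of the hash behaviour, with no DBIx::Class involved; the date values are hypothetical.

```perl
use strict;
use warnings;

my ($start_date, $end_date) = ('2010-01-01', '9999-12-31');   # hypothetical values

# Both pairs use the key 'status_date', so the second silently replaces the first.
my %flat = (
    status_date => ">= '$start_date'",
    status_date => "<= '$end_date'",
);
print scalar(keys %flat), " key: $flat{status_date}\n";   # 1 key: <= '9999-12-31'

# Nesting the operators under a single key keeps both conditions.
my %nested = (
    status_date => { '>=' => $start_date, '<=' => $end_date },
);
print join(', ', sort keys %{ $nested{status_date} }), "\n";   # <=, >=
```

The nested form works because the two operators are distinct keys of the inner hash, so neither replaces the other.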

SQL has BETWEEN, and DBIx::Class supports it:
my $result = $schema->resultset('MyTable')
->search({status_date => {between => [$start_date,$end_date]}});

Related

How to take the inverse results of a MongoDB query?

I have a query,
SomeCollection.collection.find(:$or => [{tags:'cool'},{tags:'awesome'},{tags:'neat'},...])
I'd like to modify the opposite collection. So all the elements except the ones found in the previous statement. So in this example, I'd like all the results that are not tags:'cool' || tags:'awesome' || tags:'neat' || ....
I tried SomeCollection.collection.find(:$not => {:$or => [{tags:'cool'},{tags:'awesome'},{tags:'neat'},...]}) which gave syntax errors. I am also considering SomeCollection.collection.find(:$and => [{tags:{:$ne => 'cool'}}, {tags:{:$ne => 'awesome'}}, {tags:{:$ne => 'neat'}}]) which is the explicit inverse and gives me the correct answer. Is there a short hand way to get the inverse collection?
You can use $ne in the query, combining the conditions with $and as in your explicit inverse (with $or, a document that has only some of the tags would still match).
SomeCollection.collection.find({$and : [ {tags: {$ne: 'cool'}}, ..]})
Since you are comparing values for the same field, you can even use $nin. The query would now look like,
SomeCollection.collection.find({tags : {$nin : ['cool','awesome',..]}})

How do I sort an array of strings given an arbitrary ordering of those strings?

I wish to sort an array of strings so that the strings wind up in the following order:
@set = ('oneM', 'twoM', 'threeM', 'sixM', 'oneY', 'twoY', 'oldest');
As you may notice, these represent time periods so oneM is the first month, etc. My problem is that I want to sort by the time period, but with the strings as they are I can't just use 'sort', so I created this hash to express how the strings should be ordered:
my %comparison = (
oneM => 1,
twoM => 2,
threeM => 3,
sixM => 6,
oneY => 12,
twoY => 24,
oldest => 25,
);
This I was hoping would make my life easier where I can do something such as:
foreach my $s (@set) {
foreach my $k (keys %comparison) {
if ($s eq $k) {
something something something
I'm getting the feeling that this is a long winded way of doing things and I wasn't actually sure how I would actually sort it once I've found the equivalent... I think I'm missing my own plot a bit so any help would be appreciated
As requested, the expected output would be in the order shown in @set above. I should have mentioned that the values in @set will be part of that set, but not necessarily all of them and not in the same order.
You've chosen a good strategy in precomputing the data into a form that is easy to sort. You could calculate the sort key inside the sort block itself, but then you'd waste time recalculating it every time sort compares a value, which happens more than once per element. The drawback of a cache is that it needs additional memory, which might slow the sort down under low-memory conditions despite doing fewer calculations overall.
With your current setup, sorting is as easy as:
my @sorted = sort { $comparison{$a} <=> $comparison{$b} } @set;
If you want to save memory at the expense of CPU, it would be:
my @sorted = sort { calculate_integer_based_on_input($a) <=> calculate_integer_based_on_input($b) } @set;
with a separate calculate_integer_based_on_input function that converts oneY and the like to 12 or the corresponding value on the fly, or with the conversion inlined in the sort block.
You might also want to check out the common idioms for sorting with cached computations, such as the Schwartzian transform and the Guttman-Rosler transform.
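For reference, a minimal sketch of the Schwartzian transform applied to this data. Here rank_of is a hypothetical stand-in for an expensive key computation (it just looks the value up in %comparison), and the input list is made up.

```perl
use strict;
use warnings;

my %comparison = (
    oneM => 1,  twoM => 2,  threeM => 3,
    sixM => 6,  oneY => 12, twoY   => 24,
    oldest => 25,
);

# Hypothetical stand-in for a costly sort-key computation.
sub rank_of {
    my ($label) = @_;
    return $comparison{$label};
}

my @set = qw(oneY oldest twoM sixM oneM);   # a subset, in arbitrary order

my @sorted =
    map  { $_->[1] }                 # 3. unwrap the original string
    sort { $a->[0] <=> $b->[0] }     # 2. compare the cached keys
    map  { [ rank_of($_), $_ ] }     # 1. compute each key exactly once
    @set;

print "@sorted\n";   # oneM twoM sixM oneY oldest
```

Each key is computed once per element instead of once per comparison, and the cache lives only for the duration of the sort.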
Giving an example with the input and your expected result would help. I guess that this is what you are looking for:
my @data = ( ... );
my %comparison = (
oneM => 1, twoM => 2, threeM => 3,
sixM => 6, oneY => 12, twoY => 24,
oldest => 25,
);
my @sorted = sort { $comparison{$a} <=> $comparison{$b} } @data;
There are plenty of examples in the documentation for the sort function in the perlfunc manual page. ("perldoc -f sort")

Cassandra: get_range_slices of TimeUUID super column?

I have a schema of Row Keys 1-n. In each row there are a variable number of supercolumns with a TimeUUID 'name'. I'm hoping to be able to query this data by a time range.
Two issues have come up:
First issue: in KeyRange, the values that I put in for 'start_key' and 'end_key' are getting misunderstood (for lack of a better term) by Thrift. Experimenting with different groups of values, I'm not seeing what I expect and often get back something completely unexpected.
Example: my row keys are running from 1-1000 with lots of random gaps. I put start_key = 50 and end_key = 20 .. and I get back rows with keys ranging from 99 to 414.
Example: I have a known row with key = 13. Putting this value into start_key and end_key gives me no results.
Second issue: even when I do get results the 'columns' portion of the 'keyslice' is always empty. I have checked via cassandra-cli and I know there is data.
I'm using Perl as follows:
my $slice_range = new Cassandra::SliceRange();
$slice_range->{ start } = create_UUID( UUID::Tiny::UUID_TIME, "2010-12-24 00:00:00" );
$slice_range->{ finish } = create_UUID( UUID::Tiny::UUID_TIME, "2011-12-25 00:00:00" );
my $slice_predicate = new Cassandra::SlicePredicate();
$slice_predicate->{ slice_range } = $slice_range;
my $key_range = new Cassandra::KeyRange();
$key_range->{ start_key } = 13;
$key_range->{ end_key } = 13;
my $result = $client->get_range_slices( $column_parent, $slice_predicate, $key_range, $consistency_level );
print Dumper( $result );
Clearly I'm misunderstanding some basic precept.
EDIT: It turns out that the Perl library I'm using is not properly documented. The UUID creation was not working as advertised. I opened it up, fixed it, and now it's all going a bit more as I was expecting. I can slice my supercolumns by date/time range. Still working on getting the key range portion to work.
http://wiki.apache.org/cassandra/FAQ#range_rp covers why you're not seeing what you expect with key ranges.
You need to specify a SlicePredicate that contains the actual range of what you're trying to select. The default of no column_names and no slice_range will result in the empty columns list that you see.
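As a rough illustration of the key-range point: assuming the cluster uses a hash-based partitioner (the situation that FAQ entry addresses), rows are ordered by a hash of the key rather than by the key itself, so a start_key/end_key pair does not select a numeric interval. The sketch below uses plain Perl and the core Digest::MD5 module to list some made-up keys in hash order. It is not Cassandra's actual token computation, only a way to see that hash order generally bears no relation to numeric order.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my @keys = (13, 20, 50, 99, 414);   # hypothetical row keys

# Order the keys by the hex digest of their MD5 hash instead of numerically.
my @by_hash = sort { md5_hex($a) cmp md5_hex($b) } @keys;

# Print each key next to the digest that determined its position.
printf "%-4s %s\n", $_, md5_hex($_) for @by_hash;
```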

In Linq to EF 4.0, I want to return rows matching a list or all rows if the list is empty. How do I do this in an elegant way?

This sort of thing:
Dim MatchingValues() As Integer = {5, 6, 7}
Return From e in context.entity
Where MatchingValues.Contains(e.Id)
...works great. However, in my case, the values in MatchingValues are provided by the user. If none are provided, all rows ought to be returned. It would be wonderful if I could do this:
Return From e in context.entity
Where (MatchingValues.Length = 0) OrElse (MatchingValues.Contains(e.Id))
Alas, the array length test cannot be converted to SQL. I could, of course, code this:
If MatchingValues.Length = 0 Then
Return From e in context.entity
Else
Return From e in context.entity
Where MatchingValues.Contains(e.Id)
End If
This solution doesn't scale well. My application needs to work with 5 such lists, which means I'd need to code 32 queries, one for every situation.
I could also fill MatchingValues with every existing value when the user doesn't want to use the filter. However, there could be thousands of values in each of the five lists. Again, that's not optimal.
There must be a better way. Ideas?
Give this a try: (Sorry for the C# code, but you get the idea)
IQueryable<T> query = context.Entity;
if (matchingValues.Length > 0) {
query = query.Where(e => matchingValues.Contains(e.Id));
}
You could do this with the other lists as well.
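The same idea of adding a filter only when its list is non-empty, sketched in Perl (for consistency with the other examples on this page) rather than in LINQ: each non-empty list contributes one filtering step, so five lists need five optional steps rather than 32 queries. The rows, field names, and filter lists are hypothetical, and this filters in memory rather than building SQL.

```perl
use strict;
use warnings;

my @rows = (
    { id => 5, region => 1 },
    { id => 6, region => 2 },
    { id => 8, region => 1 },
);

# One list of allowed values per field; an empty list means "no filter".
my %allowed = (
    id     => [5, 6, 7],
    region => [],
);

my @result = @rows;
for my $field (sort keys %allowed) {
    my @values = @{ $allowed{$field} };
    next unless @values;                       # skip filters the user left empty
    my %wanted = map { $_ => 1 } @values;
    @result = grep { $wanted{ $_->{$field} } } @result;
}

print join(', ', map { $_->{id} } @result), "\n";   # 5, 6
```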

Best way to store data in to hash for flexible "pivot-table" like calculations

I have a data set with following fields.
host name, model, location, port number, activated?, up?
I would convert them into a hash structure (perhaps similar to below)
my %switches = (
    a => {
        "hostname" => "SwitchA",
        "model" => "3750",
        "location" => "Building A",
        "total_ports" => 48,
        "configured_ports" => 30,
        "used_ports" => 24,
    },
    b => {
        "hostname" => "SwitchB",
        "model" => "3560",
        "location" => "Building B",
        "total_ports" => 48,
        "configured_ports" => 36,
        "used_ports" => 20,
    },
);
In the end I want to generate statistics such as:
No. of switches per building,
No. of switches of each model per building
Total no. of up ports per building
The statistics may not be just restricted to building wise, may be even switch based (i.e, no. of switches 95% used etc.,). With the given data structure how can I enumerate those counters?
Conversely, is there a better way to store my data? I can think of at least one format:
<while iterating over records>
{
    $hash{$location}{$model_name}{count}++;
    $hash{$location}{up_ports}{count}++ if $State eq 'Active';
}
What would be the better way to go about this? If I chose the first format (where all information is intact inside the hash) how can I mash the data to produce different statistics? (some example code snippets would be of great help!)
If you want querying flexibility, a "database" strategy is often good. You can do that directly, by putting the data into something like SQLite. Under that approach, you would be able to issue a wide variety of queries against the data without much coding of your own.
Alternatively, if you're looking for a pure Perl approach, the way to approximate a database table is by using an array-of-arrays or, even better for code readability, an array-of-hashes. The outer array is like the database table. Each hash within that array is like a database record. Your Perl-based queries would end up looking like this:
my @query_result = grep {
    $_->{foo} == 1234 and
    $_->{bar} eq 'fubb'
} @data;
If you have so many rows that query performance becomes a bottleneck, you can create your own indexes, using a hash.
my %data_by_switch = (
'SwitchA' => [0, 4, 13, ...], # Subscripts to @data.
'SwitchB' => [1, 12, ...],
...
);
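As a minimal sketch of how the array-of-hashes layout supports the statistics asked for (switches per building, models per building, ports in use per building), assuming one hash per switch. The records are hypothetical and modelled on the fields in the question.

```perl
use strict;
use warnings;

# Hypothetical records: one hash per switch.
my @switches = (
    { hostname => 'SwitchA', model => '3750', location => 'Building A', used_ports => 24 },
    { hostname => 'SwitchB', model => '3560', location => 'Building B', used_ports => 20 },
    { hostname => 'SwitchC', model => '3750', location => 'Building A', used_ports => 13 },
);

my (%switch_count, %model_count, %used_ports);
for my $sw (@switches) {
    my $loc = $sw->{location};
    $switch_count{$loc}++;                   # switches per building
    $model_count{$loc}{ $sw->{model} }++;    # switches of each model per building
    $used_ports{$loc} += $sw->{used_ports};  # ports in use per building
}

for my $loc (sort keys %switch_count) {
    print "$loc: $switch_count{$loc} switch(es), $used_ports{$loc} ports in use\n";
}
```

Because the raw records stay intact in @switches, a different statistic only needs another pass with different counters.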
My answer is based on answers I received for this question, which has some similarities with your question.
As far as I can see you have a list of tuples, for the sake of the discussion it is enough to consider objects with 2 attributes, for example location and ports_used. So, for example:
(["locA", 23], ["locB", 42], ["locA", 13]) # just the values as tuples, no keys
And you want a result like:
("locA" => 36, "locB" => 42)
Is this correct? If so, what is the problem you are facing?