returning multiple dissimilar data structures from R function in PL/R - perl

I have been looking at various discussions here on SO and elsewhere, and the general consensus seems to be that if one is returning multiple dissimilar data structures from an R function, they are best returned as list(a, b) and then accessed by index. However, when an R function is called via PL/R from a Perl program, the list comes back flattened, and even the numbers are stringified. For example:
my $res = $sth->fetchrow_arrayref;
# now, $res is a single, flattened, stringified list
# even though the R function was supposed to return
# list([1, "foo", 3], [2, "bar"])
#
# instead, $res looks like c(\"1\", \""foo"\", \"3\", \"2\", \""bar"\")
# or some such nonsense
Using a data.frame doesn't work because the two arrays being returned are not symmetrical, and the function croaks.
So, how do I return a single data structure from an R function that is made up of an arbitrary set of nested data structures, and still be able to access each individual bundle from Perl as simply $res->[0], $res->[1] or $res->{'employees'}, $res->{'pets'}? update: I am looking for an R equiv of Perl's [[1, "foo", 3], [2, "bar"]] or even [[1, "foo", 3], {a => 2, b => "bar"}]
addendum: The main thrust of my question is how to return multiple dissimilar data structures from a PL/R function. However, the stringification noted above, though secondary, is also problematic because I convert the data to JSON, and all those extra quotes just add useless data to what is transferred between the server and the user.

I think you have a few problems here. The first is you can't just return an array in this case because it won't pass PostgreSQL's array checks (arrays must be symmetric, all of the same type, etc). Remember that if you are calling PL/R from PL/Perl across a query interface, PostgreSQL type constraints are going to be an issue.
You have a couple of options.
You could return setof text[], with one data type per row.
You could return some sort of structured data using types PostgreSQL understands, like:
CREATE TYPE ab AS (
a text,
b text
);
CREATE TYPE r_retval AS (
labels text[],
my_ab ab
);
This would allow you to return something like:
{labels => [1, "foo", 3], my_ab => {a => 'foo', b => 'bar'} }
But at any rate you have to put it into a data structure that the PostgreSQL planner can understand and that is what I think is missing in your example.


getting list of values with list of keys for dictionary in Matlab

Suppose I use containers.Map to create a dictionary in MATLAB which has the following mapping:
1-A;
2-B;
3-C;
Denote the dictionary as D.
Now I have an input list [2,1,3], and what I am expecting is [B,A,C]. The problem is, I can't just use [2,1,3] as the input list for D, but only input 2,1 and 3 one by one for D and get B, A, C each time.
This can get the job done but as you can see, it's a bit less efficient.
So my question is: is there anything else I can do to let the dictionary return the whole list at the same time?
As far as I can find, there is no one-step solution like Python's dict.items. You can, however, get there in a few lines. mydict.keys() gives you the keys of the map as a cell array, and mydict.values() gives you the values as a cell array, so you can (in theory) combine those:
>> mykeys = mydict.keys();
>> myvals = mydict.values();
>> mypairs = [mykeys',myvals']
mypairs =
3×2 cell array
'A' [1]
'B' [2]
'C' [3]
However, in principle maps are unordered, and I can't find anything in the MATLAB documentation that says that the order returned by keys and the order returned by values are necessarily consistent (unlike Python). So if you want to be extra safe, you can call values with a cell array of the keys you want, which in this case would be all the keys:
>> mykeys = mydict.keys();
>> myvals = mydict.values(mykeys);
>> mypairs = [mykeys',myvals']
mypairs =
3×2 cell array
'A' [1]
'B' [2]
'C' [3]

How do I turn the Result type into something useful?

I wanted a list of numbers:
auto nums = iota(0, 5000);
Now nums is of type Result. It cannot be cast to int[], and it cannot be used as a drop-in replacement for int[].
It's not very clear from the docs how to actually use an iota as a range. Am I using the wrong function? What's the way to make a "range" in D?
iota, like many functions in Phobos, is lazy. Result is a promise to give you what you need when you need it but no value is actually computed yet. You can pass it to a foreach statement for example like so:
import std.range : iota;
import std.stdio : writeln;
foreach (i ; iota(0, 5000)) {
writeln(i);
}
You don't need it for a simple foreach though:
foreach (i ; 0..5000) {
writeln(i);
}
That aside, it is hopefully clear that iota is useful by itself. Being lazy also allows for costless chaining of transformations:
import std.algorithm : map;
import std.range : iota;
import std.stdio : writeln;
/* values are computed only once, in writeln */
iota(5).map!(x => x*3).writeln;
// [0, 3, 6, 9, 12]
If you need a "real" list of values use array from std.array to delazify it:
import std.array : array;
int[] myArray = iota(0, 5000).array;
As a side note, be warned that the word range has a specific meaning in D that isn't "range of numbers" but describes a model of iterators much like generators in Python. iota is a range (so an iterator) that produces a range (common meaning) of numbers.

output a structure from a input cell

I have a cell that has different data types (cell, logical, double, char) except structure. Now I have to write a function that will sort out different data types and output a structure with the field of those data types. The fields have to appear according to their appearance in the cell. So, if the first 'n' element(s) of the cell is double and the (n+1)th element is a char then the first field of the output structure will be double and second field will be char.
Below is an example where buildStructure is the function header. sa is the output structure.
ca = {'Moriarty', [true, false], false, {'Pink Suitcase'}}
sa = buildStructure(ca)
sa=>
char: {'Moriarty'}
logical: {[true, false] [false]}
cell: {{'Pink Suitcase'}}
I tried it writing a for loop to store different data types in different cells. However, then I am feeling so lost. How can I figure out which data type appeared when? To do that I stored all the classes in a huge string then used 'strfind' to find the place (thus time) of particular data type. But it is making things only complex. Any help will be appreciated! Thanks.
There are tests for all the data types. See iscell, ischar, islogical and so on. Their results can be used to index the input.
You can complete this example code:
function out = magicfun(varargin)
il = cellfun(@islogical,varargin);
out = struct('logical',{varargin(il)});
You can use class(), isa(), and unique() to do it generically. This is like bdecaf's approach, but his requires you to write a separate test for every type and to use a variety of functions. Using class and isa generalizes to data of any type with a single test, and is shorter to write.
Exact Types Only
By comparing class names from class(), you can partition the input into types based on the exact (most specific) type of each input. The 'stable' option for unique() keeps the output fields in the order of the first occurrences of the types in the input. (In production code I would probably omit the 'stable' so the output ordering is canonicalized based on the type name, but it depends on your requirements.)
function out = break_types(in)
%BREAK_TYPES Partition a cell array based on the types of its contents
inTypes = cellfun(@class, in, 'UniformOutput',false);
[types,ax,bx] = unique(inTypes, 'stable');
out = struct;
for i = 1:numel(types)
ix = (bx == i);
out.(types{i}) = in(ix);
end
This is pretty complete and should work with anything that didn't do something silly like override class() or isa().
>> ca = {'Moriarty', [true, false], false, {'Pink Suitcase'}};
>> break_types(ca)
ans =
char: {'Moriarty'}
logical: {[1 0] [0]}
cell: {{1x1 cell}}
>>
Considering Inheritance
If you use isa(), you'll also pick up inheritance relationships for classes. For basic Matlab types, this will give you the same answer as the other implementation. But for classes that inherit from other types, it will categorize them into all the types they match in the input and required lists.
function out = break_types(in)
%BREAK_TYPES Partition a cell array based on the types of its contents
inTypes = cellfun(@class, in, 'UniformOutput',false);
types = unique(inTypes, 'stable');
out = struct;
for i = 1:numel(types)
ix = cellfun(@(x) isa(x, types{i}), in);
out.(types{i}) = in(ix);
end
If you want to ensure that the output struct has an entry for some types even if there are no inputs of that type (so its field would contain an empty array), just append those type names to types before passing them to unique:
requiredTypes = { 'cell' 'int8', 'double', 'float' };
types = unique([inTypes requiredTypes], 'stable');

How to get a comma-separated list directly from table.Properties.VariableNames?

Example:
>> A = table({}, {}, {}, {}, {}, ...
'VariableNames', {'Foo', 'Bar', 'Baz', 'Frobozz', 'Quux'});
>> vn = A.Properties.VariableNames;
>> isequal(vn, A.Properties.VariableNames)
ans =
1
So far so good, but even though vn and A.Properties.VariableNames appear to be the same, they behave very differently when one attempts to get a "comma-separated list" from them (using {:}):
>> {'Frobnitz', vn{:}}
ans =
'Frobnitz' 'Foo' 'Bar' 'Baz' 'Frobozz' 'Quux'
>> {'Frobnitz', A.Properties.VariableNames{:}}
ans =
'Frobnitz' 'Foo'
Is there a way to get a "comma-separated list" from A.Properties.VariableNames directly (that is, without having to create an intermediate variable like vn)?
(Also, is there a more reliable function than isequal to test for equality of cell arrays? In the example above vn and A.Properties.VariableNames are clearly not equal enough!)
For those who don't have a version of MATLAB that supports the (rather new) table objects, it's the same story if one uses dataset objects (from the Statistics toolbox) instead. The example above would then translate to:
clear('A', 'vn');
A = dataset({}, {}, {}, {}, {}, ...
'VarNames', {'Foo', 'Bar', 'Baz', 'Frobozz', 'Quux'});
vn = A.Properties.VarNames;
isequal(vn, A.Properties.VarNames)
{'Frobnitz', vn{:}}
{'Frobnitz', A.Properties.VarNames{:}}
(Note the change from VariableNames to VarNames; output omitted, as it is identical to the output shown above.)
There's no problem with isequal. vn and A.Properties.VariableNames are in fact equal. The problem is something else...
If you type help dataset.subsref, you will get an explanation of why this is happening which should be the same explanation as for the table class:
LIMITATIONS:
Subscripting expressions such as A.CellVar{1:2}, A.StructVar(1:2).field,
or A.Properties.ObsNames{1:2} are valid, but result in subsref
returning multiple outputs in the form of a comma-separated list. If
you explicitly assign to output arguments on the LHS of an assignment,
for example, [cellval1,cellval2] = A.CellVar{1:2}, those variables will
receive the corresponding values. However, if there are no output
arguments, only the first output in the comma-separated list is
returned.
In short, when you invoke the line A.Properties.VarNames{:}, you are making a call to the dataset.subsref method and the curly-brace subscript {:} is being passed to it right along with the other . subscripts, as opposed to being applied separately after the call to the dataset.subsref method.
Because of this, it doesn't look like you can get a comma-separated list directly from A without using an intermediate variable. However, if your goal (as in your example) is to concatenate the strings together with another string into a new cell array, you could do this:
>> [{'Frobnitz'} A.Properties.VarNames]
ans =
'Frobnitz' 'Foo' 'Bar' 'Baz' 'Frobozz' 'Quux'
No, I don't think there is anything you can do except create the temporary variable vn. It's long been a troubling shortcoming of user-defined classes that they cannot do comma separated list expansion. I do find it strange, though, that TMW chose to implement the table class in the user-defined class framework.
As for isequal, there is no issue there. The behavior you see has nothing to do with vn and A.Properties.VariableNames not being equal.

What is the best way to store 1key - 3 value in Perl?

I have a situation where I have 3 different values for each key. I have to print the data like this:
K1 V1 V2 V3
K2 V1 V2 V3
…
Kn V1 V2 V3
Is there a more efficient and easier way to achieve this than those listed below? I am thinking of 2 approaches:
Maintain 3 hashes for 3 different values for each key.
Iterate through one hash based on the key and get the values from other 2 hashes
and print it.
Hash 1 - K1-->V1 ...
Hash 2 - K1-->V2 ...
Hash 3 - K1-->V3 ...
Maintain a single hash with key to reference to array of values.
Here I need to iterate and read only 1 hash.
K1 --> Ref{V1,V2,V3}
EDIT1:
The main challenge is that, the values V1, V2, V3 are derived at different places and cannot be pushed together as the array. So if I make the hash value as a reference to array, I have to dereference it every time I want to add the next value.
E.g., I am in subroutine1 - I populated Hash1 - K1-->[V1]
I am in subroutine2 - I have to de-reference [V1], then push V2. So now the hash
becomes K1-->[V1 V2], V3 is added in another routine. K1-->[V1 V2 V3]
EDIT2:
Now I am facing another challenge. I have to sort the hash based on the V3.
Still is it feasible to store the hash with key and list reference?
K1-->[V1 V2 V3]
It really depends on what you want to do with your data, although I can't imagine your option 1 being convenient for anything.
Use a hash of arrays if you are happy referring to your V1, V2, V3 using indexes 0, 1, 2 or if you never really want to handle their values separately.
my %data;
$data{K1}[0] = V1;
$data{K1}[1] = V2;
$data{K1}[2] = V3;
or, of course
$data{K1} = [V1, V2, V3];
As an additional option, if your values mean something nameable you could use a hash of hashes, so
my %data;
$data{K1}{name} = V1;
$data{K1}{age} = V2;
$data{K1}{height} = V3;
or
@{$data{K1}}{qw/ name age height /} = (V1, V2, V3);
Finally, if you never need access to the individual values, it would be fine to leave them as they are in the file, like this
my %data;
$data{K1} = "V1 V2 V3";
But as I said, the internal storage is mostly dependent on how you want to access your data, and you haven't told us about that.
Edit
Now that you say
The main challenge is that, the values V1, V2, V3 are derived at
different places and cannot be pushed together as the array
I think perhaps the hash of hashes is more appropriate, but I wouldn't worry at all about dereferencing as it is an insignificant operation as far as execution time is concerned. But I wouldn't use push as that restricts you to adding the data in the correct order.
Depending which you prefer, you have the alternatives of
$data{K1}[2] = V3;
or
$data{K1}{height} = V3;
and clearly the latter is more readable.
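A minimal sketch of the point about dereferencing, assuming three hypothetical subroutines (set_name, set_age, set_height) that each learn only one value at a different point in the program. The names and sample values are made up for illustration. Perl creates the inner hash automatically on first assignment, so no explicit dereference or push is needed, and the order of the calls does not matter:

```perl
use strict;
use warnings;

my %data;

# Hypothetical subroutines; each one knows only one of the three values.
sub set_name   { my ($key, $value) = @_; $data{$key}{name}   = $value }
sub set_age    { my ($key, $value) = @_; $data{$key}{age}    = $value }
sub set_height { my ($key, $value) = @_; $data{$key}{height} = $value }

# The calls can arrive in any order; the inner hash is autovivified
# the first time a field is assigned.
set_height(K1 => 64);
set_name(K1 => 'ABC');
set_age(K1 => 99);

# A hash slice pulls the fields out in the order we want to print them.
printf "%s %s %s %s\n", 'K1', @{ $data{K1} }{qw/ name age height /};
# prints "K1 ABC 99 64"
```

With the array form, the same idea works by assigning to fixed indexes ($data{$key}[2] = $value) rather than using push.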
Edit 2
As requested, to sort a hash of hashes by the third value (height in my example) you would write
use strict;
use warnings;
my %data = (
K1 => { name => 'ABC', age => 99, height => 64 },
K2 => { name => 'DEF', age => 12, height => 32 },
K3 => { name => 'GHI', age => 56, height => 9 },
);
for (sort { $data{$a}{height} <=> $data{$b}{height} } keys %data) {
printf "%s => %s %s %s\n", $_, @{$data{$_}}{qw/ name age height /};
}
or, if the data was stored as a hash of arrays
use strict;
use warnings;
my %data = (
K1 => [ 'ABC', 99, 64 ],
K2 => [ 'DEF', 12, 32 ],
K3 => [ 'GHI', 56, 9 ],
);
for (sort { $data{$a}[2] <=> $data{$b}[2] } keys %data) {
printf "%s => %s %s %s\n", $_, @{$data{$_}};
}
The output for both scripts is identical
K3 => GHI 56 9
K2 => DEF 12 32
K1 => ABC 99 64
In terms of readability/maintainability the second seems superior to me. The danger with the first is that you could end up with keys present in one hash but not the others. Also, if I came across the first approach, I'd have to think about it for a while, whereas the second seems "natural" (or a more common idiom, or more practical, or something else which means I'd understand it more readily).
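The danger of keeping three parallel hashes can be seen in a small sketch. The hash names and sample values below are made up for illustration; the point is only that nothing ties the three hashes together, so a key can silently go missing from one of them:

```perl
use strict;
use warnings;

# Three parallel hashes (the first approach). K2 was never added to %height.
my %name   = (K1 => 'ABC', K2 => 'DEF');
my %age    = (K1 => 99,    K2 => 12);
my %height = (K1 => 64);

for my $key (sort keys %name) {
    my @row;
    for my $hash (\%name, \%age, \%height) {
        # Every lookup has to be guarded, because the key sets can drift apart.
        push @row, exists $hash->{$key} ? $hash->{$key} : '(missing)';
    }
    print join(' ', $key, @row), "\n";
}
# prints:
# K1 ABC 99 64
# K2 DEF 12 (missing)
```

With a single hash keyed once per record, there is only one set of keys to keep consistent.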
The second approach (one array reference for each key) is:
In my experience, far more common,
Easier to maintain, since you only have one data structure floating around instead of three, and
More in line with the DRY principle: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." Represent a key once, not three times.
Sure, it's better to maintain only one data structure:
%data = ( K1=>[V1, V2, V3], ... );
You can use Data::Dump for a fast view/debug of your data structure.
The choice really depends on the usage pattern. Specifically, it depends on whether you use procedural programming or object-oriented programming.
This is a philosophical difference, and it's unrelated to whether language-level classes and objects are used or not. Procedural programming is organised around work flow; procedures access and transform whatever data they need. OOP is organised around records of data; methods access and transform one particular record only.
The second approach is closely aligned with object-oriented programming. Object-oriented programming is by far the most common programming style in Perl, so the second approach is almost universally the preferred structure these days (even though it takes more memory).
But your edit implied you might be using a more procedural approach. As you discovered, the first approach is more convenient for procedural programming. It was very commonly used when procedural programming was in vogue.
Take whatever suits your code's organisation best.