Exact type information about multidimensional arrays in Postgres - postgresql

I noticed that the type information Postgres reports is a bit inaccurate for multidimensional arrays:
create table test (xid int[][] primary key);
insert into test values (array[[2, 4, 2, 7], [1, 5, 6, 0]]);
select pg_typeof(xid) from test; -- returns integer[]
How do I get back the exact type integer[][]?

This is the standard behavior, according to the manual:
The current implementation does not enforce the declared number of
dimensions either. Arrays of a particular element type are all
considered to be of the same type, regardless of size or number of
dimensions. So, declaring the array size or number of dimensions in
CREATE TABLE is simply documentation; it does not affect run-time
behavior.

Related

Swift Set - Difference between "randomElement" and "first"

As Set is "An unordered collection of unique elements.", the "first" element is kind of random. So, what is the difference between the functions randomElement and first?
While the interface of Set is an unordered collection, internally it has an order that depends on its implementation. This is partly because nothing can be stored on a computer in a truly unordered way, and partly because Set conforms to Collection, which comes with the following guarantee:
Iterating over the elements of a collection by their positions yields the same elements in the same order as iterating over that collection using its iterator.
That means that it needs to have some kind of internal ordering to allow the consistency between different methods of iterating.
So while it is not defined which value you'll get back from first, the result is consistent until a value is inserted into or removed from the Set. randomElement will always return a randomly chosen element, independent of whatever the underlying order is.
In this case, the main difference is: getting the first element of a given set always yields the same value within the same execution, while randomElement returns a "random" element each time.
"Same execution" means the set keeps the same element ordering once it is declared; however, if the code declaring the set is somehow re-executed, the ordering might differ. Example:
let mySet: Set<Int> = [1, 2, 3, 4, 5]
print(mySet) // let's consider it's: [5, 1, 2, 3, 4]
At this point, the order of elements stays the same each time you iterate; first should always give 5 in this case, while randomElement should give a random integer from 1 to 5. When the code declaring mySet is re-executed, the order might differ, but within each execution first keeps returning the same element.
As an example of re-executing the code: if you're working on an iOS app and declare a set in a certain view controller, each pop/push of that controller on the navigation stack causes the set's declaration to be executed again.
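For comparison, Python's built-in set shows the same pattern: iteration order is an implementation detail, but it is stable for a given set object until it is mutated. This is a CPython illustration of the same idea, not Swift semantics:

```python
import random

s = {"apple", "banana", "cherry"}

# Implementation-defined, but stable for this set object within a run:
first = next(iter(s))
assert first == next(iter(s))  # repeated reads give the same "first" element

# Analogous to Swift's randomElement():
pick = random.choice(list(s))
assert pick in s
```

Across different runs (or with hash randomization), the "first" element may differ, just as re-executing the Swift declaration may reorder the set.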

How to store a multi array of tuples in PostgreSQL

I have an array that looks like this [[(Double,Double)]]. It's a multi-dimensional array of tuples.
This is data that I will never query on, as it doesn't need to be queried; it only makes sense in this shape on the client side. I'm thinking of storing the entire thing as a string and then parsing it back into the multi-dimensional array.
Would that be a good approach, and would the parsing be very expensive, considering I can have at most 20 arrays, each with at most 4 inner arrays, each holding a tuple of 2 Doubles?
How would I check to see which is a better approach and if storing it as multi-dimensional array in PostgreSQL is the better approach?
How would I store it?
To store an array of composite type (with any nesting level), you need a registered base type to work with. You could have a table defining the row type, or just create the type explicitly:
CREATE TYPE dd AS (a float8, b float8);
Here are some ways to construct that 2-dimensional array of yours:
SELECT ARRAY [['(1.23,23.4)'::dd]]
, (ARRAY [['(1.23,23.4)']])::dd[]
, '{{"(1.23,23.4)"}}'::dd[]
, ARRAY[ARRAY[dd '(1.23,23.4)']]
, ARRAY(SELECT ARRAY (SELECT dd '(1.23,23.4)'));
Related:
How to pass custom type array to Postgres function
Pass array from node-postgres to plpgsql function
Note that the Postgres array type dd[] can store values with any level of nesting. See:
Mapping PostgreSQL text[][] type and Java type
Whether that's more efficient than just storing the string literal as text very much depends on details of your use case.
Array types occupy an overhead of 24 bytes, plus the usual storage size of the element values.
float8 (= double precision) occupies 8 bytes. The text string '1' occupies 2 bytes on disk and 4 bytes in RAM. text '123.45678' occupies 10 bytes on disk and 12 bytes in RAM.
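Back-of-the-envelope arithmetic for the asker's worst case (20 outer arrays, 4 inner arrays, 2 doubles per tuple), using the figures above. Note this is only a lower bound: per-composite tuple headers and padding add more in practice.

```python
ARRAY_OVERHEAD = 24   # bytes of overhead per array value
FLOAT8_SIZE = 8       # bytes per double precision value

pairs = 20 * 4                       # number of (Double, Double) tuples
payload = pairs * 2 * FLOAT8_SIZE    # raw float8 payload: 1280 bytes
lower_bound = ARRAY_OVERHEAD + payload

print(lower_bound)  # 1304 bytes, before per-composite overhead
```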
Simple text will be read and written a bit faster than an array type of equal size.
Large text values are compressed (automatically), which can benefit storage size (especially with repetitive patterns) - but adds compression / decompression cost.
An actual Postgres array is cleaner in any case, as Postgres validates the values: malformed array literals cannot be stored.
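As for the asker's fallback of storing the whole structure as one string: with a payload this small (at most 20 x 4 pairs of doubles), parsing cost is negligible. A sketch of the round-trip in Python, using JSON as the text encoding (the client is Swift, so this only illustrates the idea):

```python
import json

# Worst case from the question: 20 outer arrays, 4 inner pairs each.
data = [[(1.5 * i, 2.5 * j) for j in range(4)] for i in range(20)]

text = json.dumps(data)      # store this in a text column
restored = json.loads(text)  # JSON has no tuples; pairs come back as lists
roundtrip = [[tuple(p) for p in row] for row in restored]

assert roundtrip == data
```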

Do MATLAB tables remove the need for dictionaries?

MATLAB tables let you index into any column/field using the row name, e.g., MyTable.FourthColumn('SecondRowName'). Compared to this, dictionaries (containers.Map) seem primitive; a Map serves the role of a one-column table. It also has its own dedicated syntax, which slows down thinking about how to write the code.
I'm beginning to think that I can forget the use of dictionaries. Are there typical situations for which that would not be advisable?
TL;DR: No. containers.Map has uses that cannot be replaced with a table. And I would not choose a table for a dictionary.
containers.Map and table have many differences worth noting. They each have their use. A third container we can use to create a dictionary is a struct.
To use a table as a dictionary, you'd define only one column, and specify row names:
T = table(data,'VariableNames',{'value'},'RowNames',names);
Here are some notable differences between these containers when used as a dictionary:
Speed: The struct has the fastest access by far (10x). containers.Map is about twice as fast as a table when used in an equivalent way (i.e. a single-column table with row names).
Keys: A struct is limited to keys that are valid variable names, the other two can use any string as a key. The containers.Map keys can be scalar numbers as well (floating-point or integer).
Data: They all can contain heterogeneous data (each value has a different type), but a table changes how you index if you do this (T.value(name) for homogeneous data, T.value{name} for heterogeneous data).
Syntax: To look up a key, containers.Map provides the most straightforward syntax: M(name). A table turned into a dictionary requires the pointless use of the column name: T.value(name). A struct, if the key is given by the contents of a variable, looks a little awkward: S.(name).
Construction: (See the code below.) containers.Map has the most straightforward method for building a dictionary from given data. The struct is not meant for this purpose, and therefore it gets complicated.
Memory: This is hard to compare, as containers.Map is implemented in Java and therefore whos reports only 8 bytes (i.e. a pointer). A table can be more memory efficient than a struct, if the data is homogeneous (all values have the same type) and scalar, as in this case all values for one column are stored in a single array.
Other differences:
A table obviously can contain multiple columns, and has lots of interesting methods to manipulate data.
A struct is actually a struct array, and can be indexed as S(i,j).(name). Of course name can be fixed rather than a variable, leading to S(i,j).name. Of the three, this is the only built-in type, which is why it is so much more efficient.
Here is some code that shows the difference between these three containers for constructing a dictionary and looking up a value:
% Create names
names = cell(1,100);
for ii=1:numel(names)
names{ii} = char(randi(+'az',1,20));
end
name = names{1};
% Create data
values = rand(1,numel(names));
% Construct
M = containers.Map(names,values);
T = table(values.','VariableNames',{'value'},'RowNames',names);
S = num2cell(values);
S = [names;S];
S = struct(S{:});
% Lookup
M(name)
T.value(name)
S.(name)
% Timing lookup
timeit(@()M(name))
timeit(@()T.value(name))
timeit(@()S.(name))
Timing results (microseconds):
M: 16.672
T: 23.393
S: 2.609
You can go simpler: you can access struct fields using a string:
clear
% define
mydata.('vec')=[2 4 1];
mydata.num=12.58;
% get
select1='num';
value1=mydata.(select1); %method 1
select2='vec';
value2=getfield(mydata,select2) %method 2

Recurrence relation, how to handle fractional terms?

So I need to find a_30 for a recurrence relation defined by:
a_n = 2*a_(n/2) + 1
a_1=1
Underscores dictate subscripts.
The dilemma I run into: in order to find a_30, I must find a_15, but to find that I need a_7.5, which simply doesn't exist. How do I handle this? I also tried running it in MATLAB, but it predictably terminated, citing the same kind of nonexistent index.
If you assume that your domain is the natural numbers, the recurrence relation is only defined for n values that are powers of 2; for any other n you eventually hit a fractional index, which is undefined. If you instead define the relation as a_(n//2), where n//2 means floor(n/2), the recurrence is defined for all natural numbers n: (1, 2, 3, 4, ...)
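With the floor-division variant, the value the asker wants can be computed directly. A short Python sketch (rather than the asker's MATLAB):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def a(n):
    """a_1 = 1; a_n = 2*a_(n//2) + 1, using floor division for odd n."""
    if n == 1:
        return 1
    return 2 * a(n // 2) + 1

print(a(30))  # 31, via a_3 = 3, a_7 = 7, a_15 = 15
```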

SQLAlchemy postgresql.ARRAY size

How do I declare the size of a Postgres array column in SQLAlchemy?
Like the SQL type integer[2]:
column = Column(postgresql.ARRAY(Integer), size=2)
In PostgreSQL you just define the column as ARRAY of some base type, like integer[]. You can include dimensions in the type declaration, like integer[3][3], but they are without effect, as they are not enforced. I quote the manual here:
However, the current implementation ignores any supplied array size
limits, i.e., the behavior is the same as for arrays of unspecified
length.
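Since Postgres won't enforce the declared dimensions, one option is to validate the shape in application code before inserting. A minimal sketch; the helper name check_shape is my own, not part of SQLAlchemy or Postgres:

```python
def check_shape(arr, dims):
    """Recursively verify that a nested list matches the expected dimensions."""
    if not dims:
        # No dimensions left: this must be a scalar element.
        return not isinstance(arr, list)
    if not isinstance(arr, list) or len(arr) != dims[0]:
        return False
    return all(check_shape(sub, dims[1:]) for sub in arr)

# A 2x3 integer matrix passes; a ragged one fails.
assert check_shape([[1, 2, 3], [4, 5, 6]], [2, 3])
assert not check_shape([[1, 2], [3]], [2, 3])
```

(For what it's worth, SQLAlchemy's postgresql.ARRAY does accept a dimensions argument, but if I recall correctly it affects how SQLAlchemy handles the Python values, not what Postgres enforces.)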