I have an array that looks like this: [[(Double,Double)]]. It's a multi-dimensional array of tuples.
This is data that I will never query on; it only makes sense in that shape on the client side. I'm thinking of storing the entire thing as a string and then parsing it back into the multi-dimensional array.
Would that be a good approach, and would the parsing be very expensive, considering I can have at most 20 arrays, each with at most 4 inner arrays, each holding a tuple of two Doubles?
How would I check which approach is better, and whether storing it as a multi-dimensional array in PostgreSQL is preferable?
How would I store it?
To store an array of composite type (with any nesting level), you need a registered base type to work with. You could have a table defining the row type, or just create the type explicitly:
CREATE TYPE dd AS (a float8, b float8);
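(Alternatively, creating a table registers a composite row type of the same name as a side effect; the table name below is made up:)

CREATE TABLE dd_tbl (a float8, b float8);   -- also registers the composite type dd_tbl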
Here are some ways to construct that 2-dimensional array of yours:
SELECT ARRAY [['(1.23,23.4)'::dd]]
, (ARRAY [['(1.23,23.4)']])::dd[]
, '{{"(1.23,23.4)"}}'::dd[]
, ARRAY[ARRAY[dd '(1.23,23.4)']]
, ARRAY(SELECT ARRAY (SELECT dd '(1.23,23.4)'));
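For context, here's a minimal sketch (table and column names are made up) of how such a column could be declared, filled, and read back:

CREATE TABLE measurements (
  id   serial PRIMARY KEY
, vals dd[]                      -- e.g. up to 20 x 4 elements of (float8, float8)
);

INSERT INTO measurements (vals)
VALUES ('{{"(1.23,23.4)","(2.5,3.75)"},{"(0.1,0.2)","(0.3,0.4)"}}');

SELECT (vals[1][2]).b FROM measurements;  -- field access on a single element: 3.75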
Related:
How to pass custom type array to Postgres function
Pass array from node-postgres to plpgsql function
Note that the Postgres array type dd[] can store values with any level of nesting. See:
Mapping PostgreSQL text[][] type and Java type
Whether that's more efficient than just storing the string literal as text very much depends on details of your use case.
Array types occupy an overhead of 24 bytes plus the usual storage size of the element values.
float8 (= double precision) occupies 8 bytes. The text string '1' occupies 2 bytes on disk and 4 bytes in RAM. text '123.45678' occupies 10 bytes on disk and 12 bytes in RAM.
Simple text will be read and written a bit faster than an array type of equal size.
Large text values are compressed (automatically), which can benefit storage size (especially with repetitive patterns) - but adds compression / decompression cost.
An actual Postgres array is cleaner in any case, as Postgres does not allow illegal strings to be stored.
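If in doubt, you can measure both representations on your own data. pg_column_size() reports the number of bytes a value occupies; the literal below is just a small made-up sample:

SELECT pg_column_size('{{"(1.23,23.4)"},{"(2.5,3.75)"}}'::dd[]) AS array_bytes
     , pg_column_size('{{"(1.23,23.4)"},{"(2.5,3.75)"}}'::text) AS text_bytes;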
I have a table with about 200 million rows and multiple columns of data type DECIMAL(p,s) with varying precisions/scales.
Now, as far as I understand, DECIMAL(p,s) is a fixed-size column, with the size depending on the precision; see:
https://learn.microsoft.com/en-us/sql/t-sql/data-types/decimal-and-numeric-transact-sql?view=sql-server-ver16
Now, when altering the table and changing a column from DECIMAL(15,2) to DECIMAL(19,6), for example, I would have expected there to be almost no work for SQL Server to do, since the bytes required to store the value are the same. Yet the ALTER itself takes a long time, so what exactly does the server do when I execute the ALTER statement?
Also, is there any benefit (other than having constraints on a column) to storing a DECIMAL(15,2) instead of, for example, a DECIMAL(19,2)? It seems to me the storage requirements would be the same, but I could store larger values in the latter.
Thanks in advance!
The precision and scale of a decimal / numeric type matter considerably.
As far as SQL Server is concerned, decimal(15,2) is a different data type from decimal(19,6), and is stored differently. You therefore cannot assume that just because the overall storage requirements do not change, nothing else does.
SQL Server stores decimal data types in a byte-reversed (little-endian) format, with the scale as the first incrementing value, so changing the definition can require the data to be rewritten. SQL Server will use an internal worktable to safely convert the data and update the values on every page.
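As a rough illustration (hypothetical table, T-SQL): the declared storage size stays the same here, yet the ALTER still rewrites the column data page by page.

CREATE TABLE dbo.prices (amount DECIMAL(15,2));
INSERT INTO dbo.prices VALUES (12345.67);

SELECT DATALENGTH(amount) AS bytes_before FROM dbo.prices;   -- 9 bytes (precision 10-19)

ALTER TABLE dbo.prices ALTER COLUMN amount DECIMAL(19,6);    -- still a size-of-data operation despite the identical byte count

SELECT DATALENGTH(amount) AS bytes_after FROM dbo.prices;    -- still 9 bytes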
I have the following table:
create type size_type as enum('tiny', 'small', 'tall');
CREATE TABLE IF NOT EXISTS people (
size size_type NOT NULL
);
Imagine it has tons of data. If I index the size field, will the length of the strings in the enum affect the performance of the database when executing queries? For example, will ('ti', 's', 'ta') be more performant than ('tiny', 'small', 'tall'), or doesn't it matter?
It does not matter at all. Under the hood, enum values are stored as fixed 4-byte values (OIDs referencing pg_enum), regardless of the length of the label text.
The documentation offers the following explanation:
An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.
The translations from internal enum values to textual labels are kept in the system catalog pg_enum. Querying this catalog directly can be useful.
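You can verify both points against the size_type enum above: pg_column_size() returns 4 bytes for any label, and pg_enum holds the label mapping.

SELECT pg_column_size('tiny'::size_type)  AS tiny_bytes     -- 4
     , pg_column_size('small'::size_type) AS small_bytes;   -- 4

SELECT enumlabel, enumsortorder
FROM   pg_enum
WHERE  enumtypid = 'size_type'::regtype;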
I'm creating a custom data type in PostgreSQL. The column of this type is going to hold an enormous amount of data in each row, and the data will increase over time. If you know about custom types in PostgreSQL, then you also know that we have to specify the internal length as variable in the CREATE TYPE statement if the type's size can't be determined beforehand. I have done the same for my type. So, if I create a column of this data type, how much data can possibly be stored in the column of a single row?
Maximum field size: 1 GB
TOASTed fields can be at most 1 GB.
This comes straight from the PostgreSQL documentation.
I want to store a 15-digit number in a table.
In terms of lookup speed should I use bigint or varchar?
Additionally, if it's populated with millions of records will the different data types have any impact on storage?
In terms of space, a bigint takes 8 bytes, while a 15-character string will take up to 19 bytes (a 4-byte header and up to another 15 bytes for the data), depending on its value. Also, generally speaking, equality checks should be slightly faster on the numeric type. Other than that, it widely depends on what you're intending to do with this data. If you intend to use numerical/mathematical operations or query by ranges, you should use a bigint.
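If you want to check the size difference yourself, pg_column_size() shows the footprint of the same 15-digit value in both representations (PostgreSQL; the value below is just an example):

SELECT pg_column_size(123456789012345::bigint) AS bigint_bytes   -- 8
     , pg_column_size('123456789012345'::text) AS text_bytes;    -- 19 here (4-byte header + 15 digits); as little as 16 on disk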
You can store it as BIGINT, as comparisons on integer types are faster than on varchar. I would also advise creating an index on the column if you are expecting millions of records, to make data retrieval faster.
You can also check this forum:
Indexing is fastest on integer types. Char types are probably stored underneath as integers (guessing.) VARCHARs are variable allocation and when you change the string length it may have to reallocate. Generally integers are fastest, then single characters, then CHAR[x] types, and then VARCHARs.
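If you go the bigint route, a minimal sketch of the suggested setup (table and column names are made up):

CREATE TABLE accounts (
  account_no bigint NOT NULL     -- the 15-digit identifier
);
CREATE INDEX accounts_account_no_idx ON accounts (account_no);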
I am creating a new table on a KDB database as a parted splay (parted by date). The new table schema has a column called CCYY, which has a lot of repeating values. I am unsure whether I should save it as char or symbol. My main goal is to use the least amount of memory.
Which one should I use, and what are the benefits/disadvantages of saving repeating values as either a char array or a symbol in a parted splayed setup?
It sounds like you should use symbol.
There's a guide to symbols/enumerations here: http://www.timestored.com/kdb-guides/strings-symbols-enumeration#when-to-use. To quote:
Typically you should follow the guidelines:
If the column is used in where clause equality comparisons e.g.
select from t where sym in AB -> Symbol
Short, often repeated strings -> Symbol
Else Long, Non-repeated strings -> String
When evaluating whether or not to use symbol for a column, the cardinality of that column is key. The length of individual values matters less and, if anything, longer values might be better off as symbol, as they will be stored only once in the sym file but repeated in the char vector. That consideration is pretty much moot if you compress your data on disk, though.
If your values are short enough, don't forget about the possibility of using .Q.j10, .Q.x10, .Q.j12 and .Q.x12. This will use less space than a char vector. And it doesn't rely on a sym file, which in complex environments can save you from having to re-enumerate if you are, say, copying tables between HDBs whose sym files are not in sync.
If space is a concern, always compress the data on disk.