What are the pros and cons of using prefixes and suffixes for timestamp columns in the PostgreSQL dialect?

I have analysed several articles about naming conventions for Date/Time types in SQL data models.
Most of them suggest a database design where the timestamp type is used only for registered event values, literally timestamping the event at the moment it happens. They naturally suggest the datetime type for any other need to record an instant in time. And they suggest avoiding suffixes and prefixes that match known data types, such as date and time, at all costs, to prevent confusion with data types where only the purpose of the column is meant to be conveyed by its name.
But the PostgreSQL dialect does not have a datetime type at all, so timestamp is the only type available whenever date or time alone are not enough for a column that is expected to store a past or future instant of time.
So, basically, what prefixes or suffixes, if any, would you suggest for PostgreSQL columns, given that some of them would store past, present and future time instants? And why, for what benefits or because of what limitations?
Should we use timestamp and datetime as prefixes or suffixes to distinguish the purpose of different timestamp columns by their names? Or would that be bad practice, since there actually is a data type named timestamp and no data type named datetime in the PostgreSQL dialect?
Or should we perhaps use something very neutral, like the noun instant, as a prefix or suffix to denote the purpose of the column?
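For concreteness, here is a minimal sketch of the two styles I am weighing (all table and column names are made up, just to illustrate the question):

-- Option A: suffixes that echo type names (what the articles warn against)
CREATE TABLE meeting_a (
    id                bigserial PRIMARY KEY,
    created_timestamp timestamptz NOT NULL DEFAULT now(),  -- instant registered when the event happened
    starts_datetime   timestamptz NOT NULL                 -- planned future instant
);

-- Option B: a neutral, purpose-oriented suffix such as "_at" (or "_instant")
CREATE TABLE meeting_b (
    id         bigserial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    starts_at  timestamptz NOT NULL
);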

Related

Binary to binary cast with JSONb

How to avoid the unnecessary CPU cost?
See this historic question with failing tests. Example: j->'x' is a JSONb value representing a number and j->'y' a boolean. From the first versions of JSONb (released in 2014 with 9.4) until today (6 years!), with PostgreSQL v12, it seems that we need to enforce a double conversion:
1. Discard the j->'x' "binary JSONb number" information and transform it into the printable string j->>'x'; discard the j->'y' "binary JSONb boolean" information and transform it into the printable string j->>'y'.
2. Parse the string to obtain a "binary SQL float" by casting it: (j->>'x')::float AS x; parse the string to obtain a "binary SQL boolean" by casting it: (j->>'y')::boolean AS y.
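A minimal sketch of that double conversion, assuming a hypothetical table t with a jsonb column j such as {"x": 1.5, "y": true}:

SELECT (j->>'x')::float   AS x,  -- jsonb -> text -> float
       (j->>'y')::boolean AS y   -- jsonb -> text -> boolean
FROM t;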
Is there no syntax or optimized function that lets the programmer enforce the direct conversion?
I don't see it in the manual... Or was it never implemented: is there a technical barrier to it?
NOTES about a typical scenario where we need it
(responding to comments)
Imagine a scenario where your system needs to store many, many small datasets (a real example!) with minimal disk usage, managing them all with centralized control/metadata/etc. JSONb is a good solution and offers at least two good alternatives for storing them in the database:
Metadata (with the schema descriptor) and the whole dataset in an array of arrays;
Separating metadata and table rows into two tables.
(and variations where the metadata is translated to a cache of text[], etc.) Alternative 1, the monolithic one, is the best for the "minimal disk usage" requirement and is faster for full information retrieval. Alternative 2 can be the choice for random access or partial retrieval, when the table Alt2_DatasetLine also has one more column, like time, for time series.
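A rough sketch of the two alternatives (it reuses the names Alt1_AllDataset and Alt2_DatasetLine mentioned here; the remaining columns are assumptions for illustration):

-- Alternative 1: metadata plus the whole dataset in a single row
CREATE TABLE Alt1_AllDataset (
    dataset_id int PRIMARY KEY,
    metadata   jsonb,   -- schema descriptor
    j_alldata  jsonb    -- array of arrays with all dataset rows
);

-- Alternative 2: metadata and dataset lines in two tables
CREATE TABLE Alt2_Dataset (
    dataset_id int PRIMARY KEY,
    metadata   jsonb
);
CREATE TABLE Alt2_DatasetLine (
    dataset_id int REFERENCES Alt2_Dataset,
    line_time  timestamptz,  -- e.g. for time series
    j          jsonb         -- one dataset row
);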
You can create all the SQL VIEWs in a separate schema, for example:
CREATE VIEW mydatasets.t1234 AS
  SELECT (j->>'d')::date AS d, j->>'t' AS t, (j->>'b')::boolean AS b,
         (j->>'i')::int AS i, (j->>'f')::float AS f
  FROM (
    SELECT jsonb_array_elements(j_alldata) AS j
    FROM Alt1_AllDataset
    WHERE dataset_id = 1234
  ) t
  -- or FROM alt2...
;
And the CREATE VIEW statements can all be generated automatically, running the SQL string dynamically ... we can reproduce the above "stable schema casting" with simple formatting rules extracted from the metadata:
SELECT string_agg( CASE
         WHEN x[2] != 'text' THEN format(E'(j->>\'%s\')::%s AS %s', x[1], x[2], x[1])
         ELSE format(E'j->>\'%s\' AS %s', x[1], x[1])
       END, ',' ) AS x2
FROM (
  SELECT regexp_split_to_array(trim(x), '\s+') AS x
  FROM regexp_split_to_table('d date, t text, b boolean, i int, f float', ',') t1(x)
) t2;
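Putting it together, here is a minimal sketch of running that generated column list through dynamic SQL (it reuses the hypothetical names mydatasets.t1234, Alt1_AllDataset and j_alldata from above, and is only an illustration):

DO $$
DECLARE
  cols text;
BEGIN
  -- build the "stable schema casting" column list from the metadata string
  SELECT string_agg( CASE
           WHEN x[2] != 'text' THEN format('(j->>%L)::%s AS %I', x[1], x[2], x[1])
           ELSE format('j->>%L AS %I', x[1], x[1])
         END, ', ' )
    INTO cols
  FROM (
    SELECT regexp_split_to_array(trim(x), '\s+') AS x
    FROM regexp_split_to_table('d date, t text, b boolean, i int, f float', ',') t1(x)
  ) t2;

  -- create the view dynamically from the generated column list
  EXECUTE format(
    'CREATE VIEW mydatasets.t1234 AS
       SELECT %s
       FROM (SELECT jsonb_array_elements(j_alldata) AS j
             FROM Alt1_AllDataset
             WHERE dataset_id = 1234) t',
    cols);
END
$$;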
... It's a "real life scenario", and this (apparently ugly) model is surprisingly fast for small-traffic applications. There are other advantages besides the reduced disk usage: flexibility (you can change the dataset schema without any change to the SQL schema) and scalability (2, 3, ... 1 billion different datasets in the same table).
Returning to the question: imagine a dataset with ~50 or more columns; the SQL VIEW would be faster if PostgreSQL offered a "binary to binary" cast.
Short answer: No, there is no better way in PostgreSQL to extract a jsonb number than (for example)
CAST(j ->> 'attr' AS double precision)
A JSON number happens to be stored as a PostgreSQL numeric internally, so that wouldn't work “directly” anyway. But there is no reason in principle why there could not be a more efficient way to extract such a value as numeric.
So, why don't we have that?
Nobody has implemented it. That is often an indication that nobody thought it worth the effort. I personally think that this would be a micro-optimization: if you want to go for maximum efficiency, you extract that column from the JSON and store it directly as a column in the table.
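One way to follow that advice on PostgreSQL 12 or later is a stored generated column, so the value is extracted once at write time; the table and column names below are assumptions for illustration:

-- assumes a table t with a jsonb column j containing an attribute "x"
ALTER TABLE t
    ADD COLUMN x double precision
    GENERATED ALWAYS AS (CAST(j ->> 'x' AS double precision)) STORED;

-- queries and indexes can then use the plain column
CREATE INDEX t_x_idx ON t (x);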
It is not necessary to modify the PostgreSQL source to do this. It is possible to write your own C function that does exactly what you envision. If many people thought this was beneficial, I'd expect that somebody would already have written such a function.
PostgreSQL has just-in-time compilation (JIT). So if an expression like this is evaluated for a lot of rows, PostgreSQL will build executable code for that on the fly. That mitigates the inefficiency and makes it less necessary to have a special case for efficiency reasons.
It might not be quite as easy as it seems for many data types. JSON standard types don't necessarily correspond to PostgreSQL types in all cases. That may seem contrived, but look at this recent thread on the Hackers mailing list that deals with the differences between the numeric types of JSON and PostgreSQL.
None of the above are reasons that such a feature could never exist; I just wanted to give reasons why we don't have it.

Postgresql jsonb vs datetime

I need to store two dates valid_from, and valid_to.
Is it better to use two datetime fields, like valid_from:datetime and valid_to:datetime?
Or would it be better to store the data in a jsonb field validity: {"from": "2001-01-01", "to": "2001-02-02"}?
There are many more reads than writes to the database.
DB: PostgreSQL 9.4
You can use the daterange type.
For example:
'[2001-01-01, 2001-02-02]'::daterange means from 2001-01-01 to 2001-02-02, bounds inclusive.
'(2001-01-01, 2001-02-05)'::daterange means from 2001-01-01 to 2001-02-05, bounds exclusive.
Also:
Special values like infinity can be used.
lower(anyrange) gives the lower bound of the range.
And there are many other things, like the overlap operator; see the docs ;-)
Range Type
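A minimal sketch of the daterange approach (table and column names are made up):

CREATE TABLE offer (
    id       serial PRIMARY KEY,
    validity daterange NOT NULL   -- e.g. '[2001-01-01,2001-02-02]'
);

-- rows whose validity period contains a given date
SELECT *
FROM offer
WHERE validity @> DATE '2001-01-15';

-- a GiST index makes such containment queries fast
CREATE INDEX offer_validity_idx ON offer USING gist (validity);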
Use two timestamp columns (there is no datetime type in Postgres).
They can efficiently be indexed and they protect you from invalid timestamp values - nothing prevents you from storing "2019-02-31 28:99:00" in a JSON value.
If you very often need to use those two values to check whether another timestamp value lies in between, you could also consider a range type that stores both values in a single column.
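A small sketch of both variants mentioned in this answer (table and column names are invented):

-- two plain timestamp columns
CREATE TABLE contract (
    id         bigserial PRIMARY KEY,
    valid_from timestamptz NOT NULL,
    valid_to   timestamptz NOT NULL,
    CHECK (valid_from <= valid_to)
);

-- "does this instant lie in between?"
SELECT *
FROM contract
WHERE timestamptz '2001-01-15 12:00+00' BETWEEN valid_from AND valid_to;

-- alternative: a single range column, queried with the containment operator
-- validity tstzrange NOT NULL  ...  WHERE validity @> timestamptz '2001-01-15 12:00+00'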

Storing ZonedDateTime in Postgres database

I am using ZonedDateTime in my JPA entities. I am also using a Postgres DB. By default the generated table schema uses the bytea column type.
Two questions I have:
Is it OK to store this info in bytea format?
If I were to do an SQL comparison, let's say I want entries greater than a given date, how do I write this comparison in SQL, meaning how do I convert the bytea back to a date?
First of all, JPA maps only basic and the most common types.
bytea is the type that Postgres ends up using when you declare a field of an unsupported type and there isn't an annotation saying how to treat it.
See https://www.postgresql.org/docs/9.0/static/datatype-binary.html.
The reasoning for using bytea is that when the type can't be mapped, it is assumed that you want the object to be stored serialized.
At this time JPA doesn't support the Java 8 date/time types; more info on the JPA standard at:
https://vladmihalcea.com/whats-new-in-jpa-2-2-java-8-date-and-time-types/
You can create a custom type with Hibernate; some time ago I posted about this in a Play Framework group:
https://groups.google.com/d/msg/play-framework/3AtNiMf_WBM/LBMeztlXBAAJ
Bye
Hans

Are there disadvantages on using as partition column a non-primitive column (date) in Hive?

Is there any reason why I shouldn't use a column formatted as date as the partitioning column in a table in Apache Hive?
The official documentation says:
Although currently there is not restriction on the data type of the partitioning column, allowing non-primitive columns to be partitioning column probably doesn't make sense. The dynamic partitioning column's type should be derived from the expression. The data type has to be able to be converted to a string in order to be saved as a directory name in HDFS.
https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions#DynamicPartitions-Designissues
I don't see why columns of the date type would create any issue, since by design these can be converted to a string.
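For what it's worth, a minimal Hive sketch of partitioning by a date column (table and column names are made up); each partition simply becomes a directory named with the string form of the date, e.g. dt=2020-01-31:

CREATE TABLE events (
    user_id BIGINT,
    action  STRING
)
PARTITIONED BY (dt DATE);

-- dynamic partitioning derives the partition value from the query
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE events PARTITION (dt)
SELECT user_id, action, CAST(event_ts AS DATE) AS dt
FROM raw_events;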

Best type for JPA version field for Optimistic locking

I have doubts about which is the best type for a field annotated with @Version for optimistic locking in JPA.
The API javadoc (http://docs.oracle.com/javaee/7/api/javax/persistence/Version.html) says:
"The following types are supported for version properties: int, Integer, short, Short, long, Long, java.sql.Timestamp."
Another page (http://en.wikibooks.org/wiki/Java_Persistence/Locking#Optimistic_Locking) says:
"JPA supports using an optimistic locking version field that gets updated on each update. The field can either be numeric or a timestamp value. A numeric value is recommended as a numeric value is more precise, portable, performant and easier to deal with than a timestamp."
"Timestamp locking is frequently used if the table already has a last updated timestamp column, and is also a convenient way to auto update a last updated column. The timestamp version value can be more useful than a numeric version, as it includes the relevant information on when the object was last updated."
The questions I have are:
Is a Timestamp type better if you are going to have a lastUpdated field, or is it better to have a numeric version field and the timestamp in another field?
Among the numeric types (int, Integer, short, Short, long, Long), which is the best to choose (considering the size of each type)? I mean, I think the best is Long, but it requires more space per row.
What happens when the version field reaches the maximum value of its numeric type (for example 32,767 for a Short field)? Will it start from 1 again on the next increment?
Just go with Long or Integer.
BUT don't go with int or long.
Contrary to another comment here, a null value is expected when the entity has never been persisted yet.
Having int or long might make Hibernate think that the entity is already persisted and in a detached state, since the version value will be 0 when unset.
Just finished debugging a FK violation where "int" was the cause, so save your time and just go with Long or Integer.
First, know that locking is used to manage concurrent transactions.
1. Separate your concerns. If the lastUpdated field is business-model specific, it should be separate from your versioning field, which is there for versioning.
2. Primitives and objects are usually mapped to your DB as the same type, except that Boolean will be nullable by default while boolean will be 'not null'. However, enforce nullability explicitly. In this case you want to use a primitive, as the version field can't be nullable.
Integer or long are better than timestamp. Hibernate recommends numeric versioning, and numbers don't take up that much space.
As for reaching the maximum value: if you use long, you might not live long enough to find out.
Use this and you should be fine.
private long version;

@Version
public long getVersion() {
    return version;
}

public void setVersion(long version) {
    this.version = version;
}
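For context, a numeric @Version boils down to a guarded UPDATE; roughly this kind of statement is issued by the JPA provider (table and column names are illustrative):

UPDATE account
SET    balance = 100.00,
       version = version + 1   -- bump the version on every update
WHERE  id = 42
  AND  version = 7;            -- the version the entity was read with

-- if another transaction updated the row first, zero rows match
-- and the provider throws an OptimisticLockException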
Don't use a time value like Timestamp (or derivatives like Instant or LocalDateTime etc.).
Especially if you have a Java < 15 application and hope to ever migrate to Java >= 15. They changed the precision of timestamps within Java to nanoseconds, but your database probably only stores up to microseconds, so it truncates the value, which will make you run into an OptimisticLockException all the time (1).
Don't use a primitive value either; see the answer from @Piotr: the version field must be null for new entities.
Just go with Long.