Reasons to use COUNT instead of COUNT_BIG - tsql

According to the documentation, COUNT_BIG behaves exactly the same as COUNT but has two advantages:
It returns a bigger data type (bigint instead of int), so COUNT_BIG won't fail where COUNT does.
It allows the creation of clustered indexes on views.
Why ever use COUNT? Why not just always use COUNT_BIG?

Probably for the same reason you would declare a variable as tinyint, smallint, or int instead of bigint?
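To make both points from the question concrete, here is a minimal T-SQL sketch (table and view names are hypothetical):

-- COUNT returns int and fails with an arithmetic overflow error once the
-- count exceeds 2,147,483,647; COUNT_BIG returns bigint and keeps going.
SELECT COUNT(*) AS cnt, COUNT_BIG(*) AS cnt_big
FROM dbo.BigTable;
GO

-- An indexed view with GROUP BY must use COUNT_BIG(*); COUNT(*) is rejected.
CREATE VIEW dbo.SalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductId, COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY ProductId;
GO
CREATE UNIQUE CLUSTERED INDEX IX_SalesByProduct ON dbo.SalesByProduct (ProductId);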

Related

Use Spark when function without otherwise but keep column values

I frequently find myself replacing values in columns using
when($"myCol".isNull,myCrazyFunction).otherwise($"myCol")
To me the .otherwise($"myCol") feels kind of redundant.
Is there a better way to replace some values under some condition and otherwise just leave everything as it is without using the otherwise?
I think you can use coalesce() for that.
select(coalesce($"myCol", myCrazyFunction))
Just remember that myCrazyFunction should return a Column type.
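For example, a minimal sketch (assuming a DataFrame df with a nullable column myCol; lit("fallback") stands in for myCrazyFunction):

import org.apache.spark.sql.functions.{coalesce, col, lit}

// coalesce returns the first non-null argument per row, so existing values
// of myCol pass through unchanged and only nulls get the replacement.
val myCrazyFunction = lit("fallback") // any expression of type Column works here
val result = df.withColumn("myCol", coalesce(col("myCol"), myCrazyFunction))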

PostgreSQL queries treat ints as string datatypes

I store the following rows in my table ('DataScreen') under a JSONB column ('Results')
{"Id":11,"Product":"Google Chrome","Handle":3091,"Description":"Google Chrome"}
{"Id":111,"Product":"Microsoft Sql","Handle":3092,"Description":"Microsoft Sql"}
{"Id":22,"Product":"Microsoft OneNote","Handle":3093,"Description":"Microsoft OneNote"}
{"Id":222,"Product":"Microsoft OneDrive","Handle":3094,"Description":"Microsoft OneDrive"}
In these JSON objects, "Id" and "Handle" are integer properties, while the others are string properties.
When I query my table like below
Select Results->>'Id' From DataScreen
order by Results->>'Id' ASC
I get incorrect results because the ->> operator returns text, so PostgreSQL orders the values as text and not as integers.
Hence it gives the result as
11,111,22,222
instead of
11,22,111,222.
I don't want to use explicit casting to retrieve the values, like below:
Select Results->>'Id' From DataScreen order by CAST(Results->>'Id' AS INT) ASC
because I cannot be sure of the column's datatype: the JSON structure is dynamic, and the keys and values may change next time. The same problem could occur with another JSON document that mixes integer and string values.
I want integers in the JSON structure of the JSONB column to be treated as integers only, and not as text (strings).
How do I write my query so that Id and Handle are retrieved as integer values and not as strings, without explicit casting?
I think your assumptions about the Id field don't make sense. You said:
(a) Either id contains integers only or
(b) it contains strings and integers.
I'd say,
If (a) then numerical ordering is correct.
If (b) then lexical ordering is correct.
But if (a) holds for some time and then (b) takes over, the correct order changes, too. And that doesn't make sense. Imagine:
For the current database you expect the order 11,22,111,222. Then you add a row
{"Id":"aa","Product":"Microsoft OneDrive","Handle":3095,"Description":"Microsoft OneDrive"}
and suddenly the correct order of the other rows changes to 11,111,22,222,aa. That sudden change is what bothers me.
So I would either expect a lexical ordering ab initio, or restrict my Id field to integers and use explicit casting.
Every other option I can think of is just not practical. You could, for example, create a custom < and > implementation for your Id field which results in 11,22,111,222,aa ("order all integers by numerical value and all strings by lexical order, and put all integers before the strings").
But that is a lot of work (it involves a custom data type, a custom cast function and a custom operator function) and yields some counterintuitive results, e.g. 11,22,111,222,0a,1a,2a,aa (note the position of 0a and so on: they come after 222).
Hope that helps ;)
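For completeness, the mixed "integers first, then strings" ordering described above can be approximated per query, without a custom type. A rough sketch, using the table from the question and a regular expression to detect all-digit ids:

SELECT Results->>'Id' AS id
FROM DataScreen
ORDER BY (Results->>'Id') !~ '^[0-9]+$',          -- numeric ids first (false sorts before true)
         CASE WHEN Results->>'Id' ~ '^[0-9]+$'
              THEN (Results->>'Id')::bigint END,  -- numeric ids by value
         Results->>'Id';                          -- remaining ids lexically

Note that this reproduces exactly the counterintuitive behavior described above: 0a, 1a, and 2a sort after 222.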
If Id is always an integer, you can cast it in the SELECT part and just use ORDER BY 1:
select (Results->>'Id')::int From DataScreen order by 1 ASC

Is there a MAX_INT constant in Postgres?

In Java I can say Integer.MAX_VALUE to get the largest number that the int type can hold.
Is there a similar constant/function in Postgres? I'd like to avoid hard-coding the number.
Edit: the reason I am asking is this. There is a legacy table with an ID of type integer, backed by a sequence. There are a lot of rows coming into this table. I want to calculate how much time is left before the integer runs out, so I need to know "how many IDs are left" divided by "how fast we are spending them".
There's no constant for this, but I think it's more reasonable to hard-code the number in Postgres than it is in Java.
In Java, the philosophical goal is for Integer to be an abstract value, so it makes sense that you'd want to behave as if you don't know what the max value is.
In Postgres, you're much closer to the bare metal and the definition of the integer type is that it is a 4-byte signed integer.
There is a legacy table with an ID of type integer, backed by a sequence.
In that case, you can get the max value of the sequence by:
select seqmax from pg_sequence where seqrelid = 'your_sequence_name'::regclass;
This might be better than using MAX_INT, because the sequence may have been created or altered with a specific max value that differs from MAX_INT.
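Putting the two ideas together, a sketch of the "IDs left divided by spend rate" estimate from the question (the sequence name is hypothetical; the pg_sequences view requires PostgreSQL 10 or later):

-- Head-room left on the sequence backing the id column.
SELECT max_value - COALESCE(last_value, 0) AS ids_left
FROM pg_sequences
WHERE schemaname = 'public'
  AND sequencename = 'your_sequence_name';
-- Divide ids_left by your observed insert rate (rows per day) to estimate
-- how many days remain before the integer column runs out.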

Calculate hash for java.sql.ResultSet

I need to know whether the results of an SQL query have changed between two executions.
The solution I came up with is to calculate and compare some hash value based on the ResultSet content.
What is the preferred way?
There is no special hashCode method for ResultSet that is calculated from all the retrieved data, and you definitely cannot use the default hashCode method.
To be 100% sure that you take every change in the data into account, you have to retrieve all columns of all rows from the ResultSet one by one and calculate a hash code over them in some way (for example, put everything into a single String and take its hashCode).
But that is a very time-consuming operation. I would propose executing an extra query that calculates a hash sum by itself; for example, it could return the count of rows and a sum over the columns/rows, or something like that.
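A minimal sketch of the row-by-row approach (the class name is hypothetical; it uses SHA-256 instead of String.hashCode to make accidental collisions far less likely, and note that it consumes the ResultSet):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public final class ResultSetHasher {

    // Consumes the ResultSet and returns a digest over every column of every row.
    public static byte[] hash(ResultSet rs) throws SQLException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                Object value = rs.getObject(i);
                // Encode NULL distinctly so ("a", null) and ("anull", ...) differ.
                String token = (value == null) ? "\u0000NULL\u0000" : value.toString();
                digest.update(token.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0x1F); // unit separator between columns
            }
            digest.update((byte) 0x1E); // record separator between rows
        }
        return digest.digest();
    }
}

Comparing the returned byte arrays of two runs (java.util.Arrays.equals) then tells you whether the result content changed.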

Is it possible to use a stable function in an index in Postgres?

I've been working on a project at work and have come to the realization that I must invoke a function in several of the queries' WHERE clauses. The performance isn't terrible exactly, but I would love to improve it. So I looked at the docs for indexes which mentioned that:
An index field can be an expression computed from the values of one or more columns of the table row.
Awesome. So I tried creating an index:
CREATE INDEX idx_foo ON foo_table (stable_function(foo_column));
And received an error:
ERROR: functions in index expression must be marked IMMUTABLE
So then I read about Function Volatility Categories which had this to say about stable volatility:
In particular, it is safe to use an expression containing such a function in an index scan condition.
Based on the phrasing "index scan condition" I'm guessing it doesn't mean an actual index. So what does it mean? Is it possible to utilize a stable function in an index? Or do we have to go all the way and ensure this would work as an immutable function?
We're using Postgres v9.0.1.
An "index scan condition" is a search condition, and can use a volatile function, which will be called for each row processed. An index definition can only use a function if it is immutable -- that is, that function will always return the same value when called with any given set of arguments, and has no user-visible side effects. If you think about it a little, you should be able to see what kind of trouble you could get into if the function might return a different value than what it did when the index entry was created.
You might be tempted to lie to the database and declare a function as immutable which isn't really; but if you do, the database will probably do surprising things that you would rather it didn't.
9.0.1 has bugs for which fixes are available. Please upgrade to 9.0.somethingrecent.
http://www.postgresql.org/support/versioning/