PostgreSQL (Aurora): sum from jsonb gives odd error

I'm trying to sum values from a jsonb type column in a table in an Aurora/Postgres database but it doesn't seem to work.
select (payload->>'loanAmount')::int from rfqs limit 1;
Gives a result of 10000 (int4).
select sum((payload->>'loanAmount')::int) from rfqs limit 1;
Gives a result of: ERROR: invalid input syntax for integer: "2000.5"
It seems like this is something to do with the way the ->> operator converts the json to a string, but it's like something is wrong with that string which prevents it from being correctly typecast to an int.
As a test I did select SUM(('10000'::int)); which worked fine and returned 10000 as expected.
Any ideas?

This should help you see what the problem with "::int" is:
select sum(payload->>'loanAmount') from rfqs
which is the same as:
select sum(payload->>'loanAmount') from rfqs limit 1
(An aggregate without GROUP BY returns only one row, so "limit 1" is superfluous.)
Try
SELECT sum(to_number((payload->>'loanAmount'),'999999999D9999')) from rfqs
see http://www.sqlfiddle.com/#!17/9c30a/8

Some of your "loanAmount" properties do not hold integer values, although the first record does.
To find the offending records (the text has to be cast to numeric before trunc can be applied):
SELECT payload FROM rfqs WHERE (payload->>'loanAmount')::numeric <> trunc((payload->>'loanAmount')::numeric)
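As a rough analogy outside the database, the same failure can be reproduced in Python. This is only a sketch: the list loan_amounts is made up for illustration (only "2000.5" comes from the error message), and Python's int() stands in for the ::int cast.

```python
from decimal import Decimal

# Hypothetical values of payload->>'loanAmount'; "2000.5" is the value
# quoted in the error message, "10000" is the first row from the question.
loan_amounts = ["10000", "2000.5"]


def is_integer_text(text):
    """Return True if the text parses as an integer, roughly like ::int."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# Looking at one row hides the problem; scanning every row reveals it.
offending = [value for value in loan_amounts if not is_integer_text(value)]

# Summing as an exact decimal type works for both kinds of value.
total = sum(Decimal(value) for value in loan_amounts)

print(offending)
print(total)
```

The per-row query with limit 1 only ever touches the first value, while sum() has to cast every value, which is why only the aggregate fails.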

Related

Why does this SQL unnest query result in 2 rows rather than 4?

Relatively new SQL user question....
If my postgresql query looks like this:
select
to_timestamp((unnest(enrolled_ranges) ->> 'start_time')::float) as start_time
, to_timestamp((unnest(enrolled_ranges) ->> 'end_time')::float) as end_time
from student_inclusions
where student_id = '123456'
And the initial enrolled_ranges json data is this:
{"{\"start_time\":1536652800.00007,\"end_time\":1563981839.966626}","{\"start_time\":1563982078.624668,\"end_time\":1563989693.830777}"}
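Why does SQL return two rows, each start_time paired with its matching end_time,
instead of four rows, with every start_time combined with every end_time?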
The first answer is what I want, I just don't understand how sql knows from the query to associate the matching start and end times. Do you have any insight?
The documentation on set-returning functions describes the behavior you observed:
For each row from the underlying query, there is an output row using the first result from each set-returning function, then an output row using the second result, and so on.
See also What is the expected behaviour for multiple set-returning functions in SELECT clause?
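The lockstep behaviour quoted above can be pictured with a small Python sketch. It is an analogy, not what PostgreSQL runs internally: each list stands for the output of one unnest() call, and zip_longest mimics the pairing (in PostgreSQL 10 and later a shorter set is padded with NULLs, here None).

```python
import json
from itertools import zip_longest

# The two array elements from the question, each a JSON document as text.
enrolled_ranges = [
    '{"start_time":1536652800.00007,"end_time":1563981839.966626}',
    '{"start_time":1563982078.624668,"end_time":1563989693.830777}',
]

# Each unnest() in the select list yields its own sequence of values.
start_times = [json.loads(item)["start_time"] for item in enrolled_ranges]
end_times = [json.loads(item)["end_time"] for item in enrolled_ranges]

# Lockstep pairing: first result with first result, second with second.
rows = list(zip_longest(start_times, end_times))

for start_time, end_time in rows:
    print(start_time, end_time)
```

Two set-returning functions with two results each therefore give two output rows, not a four-row cross product.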

Why array_agg() is returning empty array in postgresql?

I have an integer column named start. I want to build an array from the values of this column. It seemed very easy, so I used array_agg(), but it gives an empty array as output. This is my column data:
start
1
2
11
5
.
.
. (and so on)
And following is my query used to make the array:
select array_agg(start) as start_array from table1;
Why is it giving empty array?
It's not
There is no way that this can return empty unless there are no rows. Perhaps a JOIN or a WHERE clause is wrong and you have 0-rows?
Also as a micro-optimization if your query is this simple,
select array_agg(start) as start_array from table1;
Then it's probably better written with the ARRAY() constructor...
SELECT ARRAY(SELECT start FROM table1) AS start_array;

PostgreSQL use function result in ORDER BY

Is there a way to use the results of a function call in the order by clause?
My current attempt (I've also tried some slight variations).
SELECT it.item_type_id, it.asset_tag, split_part(it.asset_tag, 'ASSET', 2)::INT as tag_num
FROM serials.item_types it
WHERE it.asset_tag LIKE 'ASSET%'
ORDER BY split_part(it.asset_tag, 'ASSET', 2)::INT;
While my general assumption is that this can't be done, I wanted to know if there was a way to accomplish this that I wasn't thinking of.
EDIT: The query above gives the following error [22P02] ERROR: invalid input syntax for integer: "******"
Your query is generally OK, the problem is that for some row the result of split_part(it.asset_tag, 'ASSET', 2) is the string ******. And that string cannot be cast to an integer.
You may want to remove the order by and the cast in the select list and add a where split_part(it.asset_tag, 'ASSET', 2) = '******', for instance, to narrow down that data issue.
Once that is resolved, having such a function in the order by list is perfectly fine. The quoted section of the documentation in the comments on the question is referring to applying an order by clause to the results of UNION, INTERSECT, etc. queries. In other words, the order by found in this query:
(select column1 as result_column1 from table1
union
select column2 from table2)
order by result_column1
can only refer to the accumulated result columns, not to expressions on individual rows.
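To illustrate the data issue rather than the ORDER BY rule, here is a small Python sketch. The tag values are invented for the example (only the ****** suffix comes from the error message), and tag_suffix is a hypothetical helper that imitates split_part(asset_tag, 'ASSET', 2).

```python
# Hypothetical asset tags; "ASSET******" stands in for the bad row.
asset_tags = ["ASSET10", "ASSET2", "ASSET******", "ASSET7"]


def tag_suffix(tag):
    """Imitate split_part(tag, 'ASSET', 2): second field when splitting on 'ASSET'."""
    parts = tag.split("ASSET")
    return parts[1] if len(parts) > 1 else ""


# Rows whose suffix cannot be cast to an integer are the ones to clean up.
bad_tags = [tag for tag in asset_tags if not tag_suffix(tag).isdigit()]

# Once those are excluded, ordering by the function result works as expected.
good_tags = [tag for tag in asset_tags if tag_suffix(tag).isdigit()]
ordered = sorted(good_tags, key=lambda tag: int(tag_suffix(tag)))

print(bad_tags)
print(ordered)
```

The sort key is the converted suffix, so ASSET10 sorts after ASSET7 instead of before ASSET2 as it would in a plain string sort.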

SqlAlchemy: count of distinct over multiple columns

I can't do:
>>> session.query(
func.count(distinct(Hit.ip_address, Hit.user_agent))).first()
TypeError: distinct() takes exactly 1 argument (2 given)
I can do:
session.query(
func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent)))).first()
Which is fine (count of unique users in a 'pageload' db table).
This isn't correct in the general case, e.g. will give a count of 1 instead of 2 for the following table:
col_a | col_b
----------------
xx | yy
xxy | y
Is there any way to generate the following SQL (which is valid in postgresql at least)?
SELECT count(distinct (col_a, col_b)) FROM my_table;
distinct() accepts more than one argument when appended to the query object:
session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()
It should generate something like:
SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1
which is even a bit closer to what you wanted.
The exact query can be produced using the tuple_() construct:
session.query(
func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
Looks like sqlalchemy distinct() accepts only one column or expression.
Another workaround is to use group_by and count. This should be more efficient than concatenating the two columns, because with GROUP BY the database can use indexes if they exist:
session.query(Hit.ip_address, Hit.user_agent).\
group_by(Hit.ip_address, Hit.user_agent).count()
Generated query would still look different from what you asked about:
SELECT count(*) AS count_1
FROM (SELECT hittable.user_agent AS hittableuser_agent, hittable.ip_address AS sometable_column2
FROM hittable GROUP BY hittable.user_agent, hittable.ip_address) AS anon_1
You can add a separator character in the concat call so that different column pairs are less likely to produce the same string. Taking your example as reference it would be:
session.query(
func.count(distinct(func.concat(Hit.ip_address, "-", Hit.user_agent)))).first()
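The difference between the approaches can be checked with plain Python sets, independent of SQLAlchemy. This is a minimal sketch: the first pair of rows is the table from the question, while dashed_rows is an invented example showing that a separator only helps if it never occurs in the data.

```python
# The two rows from the question's example table.
rows = [("xx", "yy"), ("xxy", "y")]

# Plain concatenation: both rows collapse into the same string.
concatenated = {col_a + col_b for col_a, col_b in rows}

# Comparing the pair itself, as count(distinct (col_a, col_b)) does.
as_tuples = set(rows)

# With a "-" separator the question's rows stay apart...
with_separator = {col_a + "-" + col_b for col_a, col_b in rows}

# ...but values that contain the separator can still collide.
dashed_rows = [("a-", "b"), ("a", "-b")]
dashed_concatenated = {col_a + "-" + col_b for col_a, col_b in dashed_rows}

print(len(concatenated), len(as_tuples), len(with_separator))
print(len(dashed_concatenated), len(set(dashed_rows)))
```

Counting distinct tuples (or grouping by both columns) avoids the collision problem entirely, whereas any concat-based count depends on the separator being absent from the column values.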

T-SQL conversion failed error message

I am getting the following error:
Conversion failed when converting the varchar value '2010-01-10' to data type int.
While running the following command in a query window in Management Studio:
SELECT * FROM (
SELECT CAST(group_id AS INT) AS qqq from view_jct_snapshot_group_items
) AS X
WHERE qqq = CAST(14290 AS INT)
However, when I run just this portion of the query, I get results and no errors:
SELECT CAST(group_id AS INT) AS qqq from view_jct_snapshot_group_items
Similarly with the following:
SELECT * FROM (
SELECT CAST(group_id AS INT) AS qqq from view_jct_snapshot_group_items
) AS X
What is going on, and how can I use my where clause without getting an error?
The table that is the source data for the view most likely has the value '2010-01-10' in the group_id column and this is excluded by a filter in the View itself.
Views are inline constructs expanded out by the query optimiser and it looks like your query is pushing the cast down to before the filter occurs.
Try
SELECT *
FROM (SELECT CAST(CASE
WHEN group_id NOT LIKE '%[^0-9]%' THEN group_id
END AS INT) AS qqq
FROM view_jct_snapshot_group_items) AS X
WHERE qqq = 14290
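The idea behind the CASE guard, convert only values made of digits and let everything else become NULL, can be sketched in Python. The group_ids list is hypothetical apart from '2010-01-10' and 14290, and to_int_or_none is an illustrative helper. One difference is deliberate: the regular expression below requires at least one digit, whereas NOT LIKE '%[^0-9]%' also lets an empty string through.

```python
import re

# Hypothetical group_id values; '2010-01-10' is the value from the error.
group_ids = ["14290", "2010-01-10", "77"]


def to_int_or_none(value):
    """Convert digit-only text to int; return None (like NULL) otherwise."""
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    return None


# The guard runs per value, so the order of filtering no longer matters.
converted = [to_int_or_none(value) for value in group_ids]

# None never equals 14290, just as NULL never satisfies qqq = 14290.
matches = [value for value in converted if value == 14290]

print(converted)
print(matches)
```

Because the guard lives inside the conversion itself, it stays safe even if the optimiser evaluates the cast before the view's own filter.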
SELECT CAST(group_id AS INT) AS qqq from view_jct_snapshot_group_items
This will need to be able to cast all values in the view to int. Clearly at least one of the values is not an int. Try putting your WHERE clause in the derived table not outside of it.
In general it is a bad idea to store data of different types in one field. This causes many problems when writing queries and requires a lot of otherwise unnecessary work (at a performance cost system-wide) to cast the data to the correct type. Personally, I would revisit your design to see why integers and dates are stored in the same field.