My DB table has one primary key, a number of integer columns, and one boolean column called paused which is not part of the primary key. The table will only ever hold a few hundred rows, but I need to query the boolean column very regularly: if any row has paused = true I will return true; if every row has paused = false I will return false.
Should I create an index on the boolean column, and if so, what would that syntax look like? Or is there another way to optimize this query?
CREATE TABLE IF NOT EXISTS pause_metrics (
consumer TEXT NOT NULL,
timestamp TIMESTAMP NOT NULL,
idle_counter INTEGER NOT NULL,
paused BOOLEAN DEFAULT FALSE NOT NULL,
PRIMARY KEY(consumer)
);
To support the following query:
SELECT paused
FROM pause_metrics
WHERE paused
LIMIT 1;
A partial (filtered) index would be the most efficient option:
CREATE INDEX idx_paused ON pause_metrics (paused)
WHERE paused;
The actual column in the index doesn't really matter; the important part is the WHERE paused clause, which indexes only the rows that have paused = true.
To find out whether all rows have paused = false, you can use an EXISTS query:
SELECT NOT EXISTS (SELECT 1 FROM pause_metrics WHERE paused LIMIT 1) AS all_active;
This will make use of the partial index and should be quite quick.
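As a quick sanity check (a sketch, assuming PostgreSQL and two made-up rows):
INSERT INTO pause_metrics (consumer, timestamp, idle_counter, paused)
VALUES ('consumer-a', now(), 0, false),
       ('consumer-b', now(), 3, true);

-- false, because consumer-b is paused; true once every row has paused = false
SELECT NOT EXISTS (SELECT 1 FROM pause_metrics WHERE paused LIMIT 1) AS all_active;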
Related
I'm populating a database from a .csv file, and somewhere in my code I refer to a foreign key that might be invalid (so when I select that id it returns NULL). In that case I want to log the offending row and skip its insertion, but the transaction should not stop; it should continue with the remaining rows. How do I achieve this?
for index, row in df.iterrows():
    # Pass the values as query parameters (e.g. psycopg2-style %s placeholders)
    # instead of interpolating them; only the table names need the f-string.
    insert_query = f"""INSERT INTO {table_name}
        (travel_date, train_id, delay)
        VALUES (%s,
                (SELECT id FROM {train_table}
                 WHERE {train_table}.train_no = %s LIMIT 1),
                %s);"""
    cursor.execute(insert_query, (row[0], row[1], row[2]))
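One way to get the skip-and-continue behaviour (a sketch, not code from the question; travel_delays and trains are hypothetical stand-ins for {table_name} and {train_table}) is to turn the VALUES clause into an INSERT ... SELECT, so the statement inserts nothing when the train number cannot be resolved:
-- Inserts one row when train_no 123 exists, zero rows otherwise,
-- so the surrounding transaction simply carries on.
INSERT INTO travel_delays (travel_date, train_id, delay)
SELECT DATE '2020-01-01', t.id, 5
FROM trains t
WHERE t.train_no = 123;
In the Python loop, cursor.rowcount will then be 0 for rows that found no match, which is the signal to log the row and move on.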
I have a composite type:
CREATE TYPE mydata_t AS
(
user_id integer,
value character(4)
);
Also, I have a table that uses this composite type as an array of mydata_t:
CREATE TABLE tbl
(
id serial NOT NULL,
data_list mydata_t[],
PRIMARY KEY (id)
);
Here I want to update the mydata_t element in data_list whose mydata_t.user_id is 100000, but I don't know which array element that is. So I first have to search for the element whose user_id equals 100000, and that's my problem: I don't know how to write that query. In short, I want to update the value of the array element whose user_id is 100000 (and where the id of tbl is, for example, 1). What would my query be?
Something like this (I know it's wrong!!!):
UPDATE "tbl" SET "data_list"[i]."value"='YYYY'
WHERE "id"=1 AND EXISTS (SELECT ROW_NUMBER() OVER() AS i
FROM unnest("data_list") "d" WHERE "d"."user_id"=10000 LIMIT 1)
For example, this is my tbl data:
Row1 => id = 1, data = ARRAY[ROW(5,'YYYY'),ROW(6,'YYYY')]
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'YYYY')]
Now I want to update tbl where id is 2, setting the value of the data element whose user_id equals 11 to 'XXXX'.
In fact, the final result of Row2 will be this:
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'XXXX')]
If you know the current value of the value field, you can use the array_replace() function to make the change:
UPDATE tbl
SET data_list = array_replace(data_list, (11, 'YYYY')::mydata_t, (11, 'XXXX')::mydata_t)
WHERE id = 2
If you do not know the current value, the situation becomes more complex:
UPDATE tbl SET data_list = data_arr
FROM (
-- UPDATE doesn't allow aggregate functions so aggregate here
SELECT array_agg(new_data) AS data_arr
FROM (
-- For the id value, get the data_list values that are NOT modified
SELECT (user_id, value)::mydata_t AS new_data
FROM tbl, unnest(data_list)
WHERE id = 2 AND user_id != 11
UNION
-- Add the values to update
VALUES ((11, 'XXXX')::mydata_t)
) x
) y
WHERE id = 2
You should keep in mind, though, that there is an awful lot of work going on in the background that cannot be optimised. The array of mydata_t values has to be examined from start to finish, and no index can help with that. Furthermore, an UPDATE actually inserts a new row in the underlying file on disk, and if your array has more than a few entries this involves substantial work. It gets even more problematic when your arrays are larger than the page size of your PostgreSQL server, typically 8kB. It all happens behind the scenes, so it will work, but at a performance penalty.

Even though array_replace() sounds like the change is made in place (and in memory it indeed is), the UPDATE command writes a completely new tuple to disk. So if you have 4,000 array elements, at least 40kB of data has to be read (8 bytes for the mydata_t type on a typical system x 4,000 = 32kB in a TOAST file, plus the main page of the table, 8kB) and then written back to disk after the update. A real performance killer.
As #klin pointed out, this design may be more trouble than it is worth. Should you make data_list a table of its own (as I would), the update query becomes:
UPDATE data_list SET value = 'XXXX'
WHERE id = 2 AND user_id = 11
This will have MUCH better performance, especially if you add the appropriate indexes. You could then still create a view to publish the data in an aggregated form with a custom type if your business logic so requires.
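A minimal sketch of that design (an assumption, not code from the question; it supposes tbl keeps only its id column):
CREATE TABLE data_list
(
    id      integer NOT NULL REFERENCES tbl (id),
    user_id integer NOT NULL,
    value   character(4),
    PRIMARY KEY (id, user_id)  -- also serves the WHERE id = ... AND user_id = ... lookup
);

-- Optional: republish the data in the old aggregated shape
CREATE VIEW tbl_data_view AS
SELECT id, array_agg((user_id, value)::mydata_t) AS data_list
FROM data_list
GROUP BY id;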
CREATE TABLE orders
(
id bigint NOT NULL,
...
created_on date NOT NULL,
quantity int NOT NULL,
...
CONSTRAINT orders_pkey PRIMARY KEY (id)
)
SELECT DATE(o.created_on) AS date, sum(quantity)
FROM orders o
GROUP BY date
ordersItemsQuery.groupBy(_.createdOn).map {
  case (created, group) => (created, group.map(_.quantity).sum)
}
Notice that quantity is a NOT NULL column, yet group.map(_.quantity).sum returns Rep[Option[Int]] rather than Rep[Int]. Why?
The Slick method sum evaluates to an Option[T], and shouldn't be confused with the standard Scala collections method sum, which returns a non-optional value.
Slick's sum is optional because a query may produce no results. That is, if you run SELECT SUM(column) FROM table against an empty table, you do not get back zero from the database: an ungrouped SUM yields a single row containing NULL, and a grouped SUM, as in your query, yields no rows at all. Slick is being consistent with this behaviour. Or rather: the sum happens in SQL, on the database server, and there is no value for it to produce when there are no rows.
In contrast to the way a database works, Scala's sum does allow you to sum an empty list (List[Int]().sum) and get back zero.
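You can see the SQL behaviour directly (a quick sketch against the orders table above):
-- Ungrouped: always one row, containing NULL when orders is empty
SELECT sum(quantity) FROM orders;

-- Grouped: no rows at all when orders is empty
SELECT created_on, sum(quantity) FROM orders GROUP BY created_on;

-- The usual fix when you want zero instead of NULL
SELECT coalesce(sum(quantity), 0) FROM orders;
On the Slick side, if the business logic needs a plain Rep[Int], mapping the optional sum through getOrElse (e.g. group.map(_.quantity).sum.getOrElse(0)) supplies the default.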
Nothing I try on a character varying field will return anything but NULL when there isn't a value... it's really frustrating.
CASE WHEN COALESCE(NULLIF(e.name,''),'unassigned') IS NULL THEN 'unassigned' ELSE a.name END
was my final test, and it still simply returns NULL unless the field has a value.
The column is character varying(255).
COALESCE(a.name, 'unassigned')               -- won't work
NULLIF(a.name, '')                           -- won't work
NULLIF(a.name, NULL)                         -- won't work
COALESCE(NULLIF(a.name, ''), 'unassigned')   -- won't work
However, the instant I use 0 it works...
What's up with that?
It's a character varying(255) field and it is set to default to NULL.
For the record, the column is defined as
name character varying(255) DEFAULT NULL
so I know it's entering NULL.
And I've already run
SELECT * FROM <tbl> WHERE name IS NULL;
and, of course, I get back all the rows where a.name is NULL... so what's the deal?
OK... to everyone deciding to answer me with
COALESCE(NULLIF(e.name,''),'unassigned') IS NULL...
this method will never work on a return of "no records". This is a stored procedure that creates a materialized view, and I'm polling via nested queries; where it is possible for a column to have 0 (as the default id = other_id), a nested query can simply return no rows. When no row is returned, COALESCE and NULLIF never execute: a row has to be returned before those functions can act on its values. And since I've never heard of a table whose auto-incremented PK starts at 0 (they generally start at 1), a result of "no records returned" will always land a NULL value in the materialized view column.
The query I run afterwards to poll rows from that materialized view does, however, work with COALESCE(etc.), because by then there is an actual NULL value in that column.
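The behaviour is easy to reproduce (a sketch; people is a hypothetical table, and WHERE false stands in for a lookup that matches nothing):
-- No rows come back, so COALESCE never runs:
SELECT COALESCE(name, 'unassigned')
FROM people
WHERE false;

-- Wrapped as a scalar subquery, the empty result collapses to a single
-- NULL, which COALESCE can then replace:
SELECT COALESCE((SELECT name FROM people WHERE false), 'unassigned');
Applying COALESCE around the nested query rather than inside it is therefore one way to get 'unassigned' into the materialized view in the first place.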
I have a table with goods:
CREATE TABLE public.goods (
"id" bigserial NOT NULL,
title varchar(250) NOT NULL,
cost numeric(10,2),
PRIMARY KEY ("id")
);
Now I want to sort this table by title but put all goods with cost 0 at the end of the list. Is this possible?
If I try to use:
ORDER BY
cost DESC,
title ASC
I get an incorrect order by title.
One way to do this is to use a CASE expression in the ORDER BY that places the block of records with zero cost at the bottom. Within each block (zero cost or non-zero cost), the records are then sorted alphabetically by title.
SELECT cost, title
FROM public.goods
ORDER BY CASE WHEN cost = 0 THEN 1 ELSE 0 END,
title
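In PostgreSQL the boolean expression can also be used directly in the ORDER BY, since false sorts before true, giving a slightly terser equivalent (a sketch; note that rows with a NULL cost sort differently here, because cost = 0 evaluates to NULL for them and NULLs sort last):
SELECT cost, title
FROM public.goods
ORDER BY cost = 0, title;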