PostgreSQL SELECT DISTINCT ON running very slow

This query ran instantly:
mydb=# SELECT reports.* FROM reports WHERE reports.id = 9988 ORDER BY time DESC LIMIT 1;
This query took 33 seconds to run (and I only selected reports for three unit_ids here; I will potentially have hundreds, if not thousands):
(UPDATE: these are the results of EXPLAIN ANALYZE):
mydb=# EXPLAIN ANALYZE SELECT DISTINCT ON (unit_id) r.* FROM reports r WHERE r.unit_id IN (3007, 3011, 6193) ORDER BY unit_id, time DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1377569.23..1381106.10 rows=11 width=155) (actual time=97175.381..97710.369 rows=3 loops=1)
  ->  Sort  (cost=1377569.23..1379337.66 rows=707375 width=155) (actual time=97175.379..97616.039 rows=764509 loops=1)
        Sort Key: unit_id, "time"
        Sort Method: external merge  Disk: 92336kB
        ->  Bitmap Heap Scan on reports r  (cost=20224.85..1142005.76 rows=707375 width=155) (actual time=12396.930..94097.890 rows=764509 loops=1)
              Recheck Cond: (unit_id = ANY ('{3007,3011,6193}'::integer[]))
              ->  Bitmap Index Scan on index_reports_on_unit_id  (cost=0.00..20048.01 rows=707375 width=0) (actual time=12382.176..12382.176 rows=764700 loops=1)
                    Index Cond: (unit_id = ANY ('{3007,3011,6193}'::integer[]))
Total runtime: 97982.363 ms
(9 rows)
The schema of reports table is as follows:
mydb=# \d+ reports
Table "public.reports"
Column | Type | Modifiers | Storage | Description
----------------+-----------------------------+------------------------------------------------------+----------+-------------
id | integer | not null default nextval('reports_id_seq'::regclass) | plain |
unit_id | integer | not null | plain |
time_secs | integer | not null | plain |
time | timestamp without time zone | | plain |
latitude | numeric(15,10) | not null | main |
longitude | numeric(15,10) | not null | main |
speed | integer | | plain |
io | integer | | plain |
msg_type | integer | | plain |
msg_code | integer | | plain |
signal | integer | | plain |
cellid | integer | | plain |
lac | integer | | plain |
processed | boolean | default false | plain |
created_at | timestamp without time zone | | plain |
updated_at | timestamp without time zone | | plain |
street | character varying(255) | | extended |
county | character varying(255) | | extended |
state | character varying(255) | | extended |
postal_code | character varying(255) | | extended |
country | character varying(255) | | extended |
distance | numeric | | main |
gps_valid | boolean | default true | plain |
city | character varying(255) | | extended |
street_number | character varying(255) | | extended |
address_source | integer | | plain |
source | integer | default 0 | plain |
driver_id | integer | | plain |
Indexes:
"reports_pkey" PRIMARY KEY, btree (id)
"reports_uniqueness_index" UNIQUE, btree (unit_id, "time", latitude, longitude)
"index_reports_on_address_source" btree (address_source DESC)
"index_reports_on_driver_id" btree (driver_id)
"index_reports_on_time" btree ("time")
"index_reports_on_time_secs" btree (time_secs)
"index_reports_on_unit_id" btree (unit_id)
Foreign-key constraints:
"reports_driver_id_fkey" FOREIGN KEY (driver_id) REFERENCES drivers(id)
"reports_unit_id_fkey" FOREIGN KEY (unit_id) REFERENCES units(id)
Referenced by:
TABLE "alerts" CONSTRAINT "alerts_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id)
TABLE "pressure_transmitters" CONSTRAINT "pressure_transmitters_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id)
TABLE "thermoking" CONSTRAINT "thermoking_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id)
Has OIDs: no
Why is SELECT DISTINCT ON running so slow?

The relatively slow part is the Bitmap Heap Scan with its Recheck Cond, and the sort spills to disk ("Sort Method: external merge Disk: 92336kB"). Please try increasing work_mem.
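Both tweaks can be sketched concretely against the schema above (the index name below is illustrative): raising work_mem lets the ~90 MB sort happen in memory, and a composite index matching the ORDER BY can let the planner avoid the sort entirely:

```sql
-- Session-level bump so the external merge sort fits in memory:
SET work_mem = '256MB';

-- Composite index matching ORDER BY unit_id, time DESC; with it the
-- planner can read each unit's newest report straight off the index:
CREATE INDEX CONCURRENTLY index_reports_on_unit_id_and_time
    ON reports (unit_id, "time" DESC);

EXPLAIN ANALYZE
SELECT DISTINCT ON (unit_id) r.*
FROM reports r
WHERE r.unit_id IN (3007, 3011, 6193)
ORDER BY unit_id, time DESC;
```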

Related

Any way to find and delete almost similar records with SQL?

I have a table in Postgres DB, that has a lot of almost identical rows. For example:
1. 00Zicky_-_San_Pedro_Danilo_Vigorito_Remix
2. 00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__
3. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__
4. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_
5. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
6. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
I'm thinking about writing a little Go script to remove the duplicates, but maybe SQL can do it?
Table definition:
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
I tried several methods to find duplicates, but those methods only detect exact matches.
There is no operator that is essentially "A almost = B". (Well, there is full-text search, but that seems a little excessive here.) If the only difference is the number of - and _ characters, just strip them out and compare the resulting strings; if they are equal, one row is a duplicate. You can use the replace() function to remove them. So something like this:
delete
from songs s2
where exists ( select null
               from songs s1
               where s1.song_id < s2.song_id
                 and replace(replace(s1.song_name, '_',''),'-','') =
                     replace(replace(s2.song_name, '_',''),'-','')
             );
If your table is large this will not be fast, but a functional index may help:
create index song_name_idx on songs
    (replace(replace(song_name, '_',''),'-',''));
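Before deleting anything, it may be worth previewing which pairs the normalization actually matches; this is the same replace() expression as in the delete, just run as a select:

```sql
select s1.song_id, s1.song_name, s2.song_id, s2.song_name
from songs s1
join songs s2
  on  s1.song_id < s2.song_id
  and replace(replace(s1.song_name, '_',''),'-','') =
      replace(replace(s2.song_name, '_',''),'-','');
```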

Postgresql index is not used for slow queries >30s

POSTGRESQL VERSION: 10
HARDWARE: 4 workers / 16 GB RAM / 50% used
I'm not a PostgreSQL expert; I have just read a lot of documentation and run a lot of tests.
I have some PostgreSQL queries which take a long time (> 30 s) because one table has 10 million rows.
Column | Type | Collation | Nullable | Default
------------------------------+--------------------------+-----------+----------+----------------------------------------------------------
id | integer | | not null |
cveid | character varying(50) | | |
summary | text | | not null |
published | timestamp with time zone | | |
modified | timestamp with time zone | | |
assigner | character varying(128) | | |
vulnerable_products | character varying(250)[] | | |
cvss | double precision | | |
cvss_time | timestamp with time zone | | |
cvss_vector | character varying(250) | | |
access | jsonb | | not null |
impact | jsonb | | not null |
score | integer | | not null |
is_exploitable | boolean | | not null |
is_confirmed | boolean | | not null |
is_in_the_news | boolean | | not null |
is_in_the_wild | boolean | | not null |
reflinks | jsonb | | not null |
reflinkids | jsonb | | not null |
created_at | timestamp with time zone | | |
history_id | integer | | not null | nextval('vulns_historicalvuln_history_id_seq'::regclass)
history_date | timestamp with time zone | | not null |
history_change_reason | character varying(100) | | |
history_type | character varying(1) | | not null |
Indexes:
"vulns_historicalvuln_pkey" PRIMARY KEY, btree (history_id)
"btree_varchar" btree (history_type varchar_pattern_ops)
"vulns_historicalvuln_cve_id_850876bb" btree (cve_id)
"vulns_historicalvuln_cwe_id_2013d697" btree (cwe_id)
"vulns_historicalvuln_history_user_id_9e25ebf5" btree (history_user_id)
"vulns_historicalvuln_id_773f2af7" btree (id)
--- TRUNCATE
Foreign-key constraints:
"vulns_historicalvuln_history_user_id_9e25ebf5_fk_custusers" FOREIGN KEY (history_user_id) REFERENCES custusers_user(id) DEFERRABLE INITIALLY DEFERRED
Example of queries:
SELECT * FROM vulns_historicalvuln WHERE history_type <> '+' order by id desc fetch first 10000 rows only; -> 30s without cache
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.43..31878.33 rows=10000 width=1736) (actual time=0.173..32839.474 rows=10000 loops=1)
  ->  Index Scan Backward using vulns_historicalvuln_id_773f2af7 on vulns_historicalvuln  (cost=0.43..26346955.92 rows=8264960 width=1736) (actual time=0.172..32830.958 rows=10000 loops=1)
        Filter: ((history_type)::text <> '+'::text)
        Rows Removed by Filter: 296
Planning time: 19.514 ms
Execution time: 32845.015 ms
SELECT DISTINCT "vulns"."id", "vulns"."uuid", "vulns"."feedid", "vulns"."cve_id", "vulns"."cveid", "vulns"."summary", "vulns"."published", "vulns"."modified", "vulns"."assigner", "vulns"."cwe_id", "vulns"."vulnerable_packages_versions", "vulns"."vulnerable_products", "vulns"."vulnerable_product_versions", "vulns"."cvss", "vulns"."cvss_time", "vulns"."cvss_version", "vulns"."cvss_vector", "vulns"."cvss_metrics", "vulns"."access", "vulns"."impact", "vulns"."cvss3", "vulns"."cvss3_vector", "vulns"."cvss3_version", "vulns"."cvss3_metrics", "vulns"."score", "vulns"."is_exploitable", "vulns"."is_confirmed", "vulns"."is_in_the_news", "vulns"."is_in_the_wild", "vulns"."reflinks", "vulns"."reflinkids", "vulns"."created_at", "vulns"."updated_at", "vulns"."id" AS "exploit_count", false AS "monitored", '42' AS "org" FROM "vulns" WHERE ("vulns"."score" >= 0 AND "vulns"."score" <= 100) ORDER BY "vulns"."updated_at" DESC LIMIT 10
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=315191.32..315192.17 rows=10 width=1691) (actual time=3013.964..3013.990 rows=10 loops=1)
  ->  Unique  (cost=315191.32..329642.42 rows=170013 width=1691) (actual time=3013.962..3013.986 rows=10 loops=1)
        ->  Sort  (cost=315191.32..315616.35 rows=170013 width=1691) (actual time=3013.961..3013.970 rows=10 loops=1)
              Sort Key: updated_at DESC, id, uuid, feedid, cve_id, cveid, summary, published, modified, assigner, cwe_id, vulnerable_packages_versions, vulnerable_products, vulnerable_product_versions, cvss, cvss_time, cvss_version, cvss_vector, cvss_metrics, access, impact, cvss3, cvss3_vector, cvss3_version, cvss3_metrics, score, is_exploitable, is_confirmed, is_in_the_news, is_in_the_wild, reflinks, reflinkids, created_at
              Sort Method: external merge  Disk: 277648kB
              ->  Seq Scan on vulns  (cost=0.00..50542.19 rows=170013 width=1691) (actual time=0.044..836.597 rows=169846 loops=1)
                    Filter: ((score >= 0) AND (score <= 100))
Planning time: 3.183 ms
Execution time: 3070.346 ms
I created the index "btree_varchar" on history_type with varchar_pattern_ops like this:
CREATE INDEX CONCURRENTLY btree_varchar ON vulns_historicalvuln (history_type varchar_pattern_ops);
I also created an index on the vulns score column for my second query:
CREATE INDEX CONCURRENTLY ON vulns (score);
I have read a lot of posts and documentation about slow queries and indexes. I'm sure an index is the usual fix for slow queries, but the PostgreSQL planner doesn't use the index I created. It estimates that a sequential scan is faster than using the index...
SELECT relname, indexrelname, idx_scan FROM pg_catalog.pg_stat_user_indexes;
relname | indexrelname | idx_scan
-------------------------------------+-----------------------------------------------------------------+------------
vulns_historicalvuln | btree_varchar | 0
Could you tell me if my index is well designed? How can I debug this? Feel free to ask for more information if needed.
Thanks
After some research, I understand that an index is not the solution to my problem here.
The low cardinality (repeated values) of this column makes the index useless.
The query time here is normal, because most of the table's rows match the filter.
I'm closing this question because there is no problem with the index.
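One way to confirm the low cardinality before giving up on the index is to check the planner's own column statistics in the pg_stats view (a sketch; run ANALYZE first if the statistics may be stale):

```sql
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'vulns_historicalvuln'
  AND attname = 'history_type';
-- If one value accounts for nearly all rows, a btree index on the
-- column will rarely beat a sequential scan for a <> filter.
```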

Postgresql - not using parallelism

I am executing a select query using full outer join across 2 tables which are in 2 different databases. I'm using Postgresql 9.6.
The query does not run in parallel even though we set the parameters below:
work_mem=256MB,
max_worker_process=40,
force_parallel_mode=on,
max_parallel_workers_per_gather=4,
parallel_tuple_cost=0.1,
parallel_setup_cost=1000,
min_parallel_relation_size=8MB
This is the query:
SELECT mea.ocs_cdr_type,
mea.ocs_time_stamp,
mea.sum_ocs_call_cost,
ctr.ctr_name
FROM mea_req_54 mea
FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;
This is the definition of mea_req_54:
Table "public.mea_req_54"
Column | Type | Modifiers
----------------------------+-----------------------------+-----------
mer_id | numeric(19,0) | not null
mer_from_dttm | timestamp without time zone | not null
mer_to_dttm | timestamp without time zone | not null
fng_id | numeric(19,0) |
ocs_imsi_number_norm | character varying(255) |
ocs_account_number | character varying(255) |
ocs_charging_id | character varying(255) |
ocs_cdr_type | character varying(255) |
ocs_bit_description | character varying(255) |
ocs_time_stamp_raw | timestamp without time zone |
ocs_time_stamp | timestamp without time zone |
ocs_duration | numeric(10,0) |
ocs_duration_str | character varying(255) |
ocs_upload_volume | numeric(19,0) |
ocs_download_volume | numeric(19,0) |
sum_ocs_total_volume | numeric(19,0) |
sum_ocs_call_cost | numeric(19,0) |
ocs_plmn_identifier | character varying(255) |
ocs_imei | character varying(255) |
ocs_user_loc_info | character varying(255) |
ocs_bp_id | character varying(255) |
ocs_ref_spec_from_contract | character varying(255) |
ocs_subapp_in_contract_acc | character varying(255) |
ocs_baseline_date_bill | timestamp without time zone |
ocs_target_date_bill | timestamp without time zone |
ocs_date_of_origin_bill | timestamp without time zone |
ctr_id | numeric(10,0) |
ctr_iso_cd | character varying(255) |
ctr_name | character varying(255) |
dblink_run | numeric(10,0) |
Indexes:
"mea_req_54_pk" UNIQUE, btree (mer_id)
This is the definition of country:
Table "public.country"
Column | Type | Modifiers
---------------------+------------------------+-----------
ctr_id | numeric(10,0) | not null
ctr_iso_cd | character varying(255) | not null
ctr_name | character varying(255) | not null
system_generated_fl | character(1) |
ctr_delete_fl | character(1) | not null
ctr_dial_code | character varying(255) | not null
ctr_version_id | numeric(10,0) | not null
ptn_id | numeric(10,0) | not null
Indexes:
"country_ak" UNIQUE, btree (ctr_name)
"country_pk" UNIQUE, btree (ctr_id)
"country_ss1" UNIQUE, btree (ctr_iso_cd)
This is the execution plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Hash Full Join  (cost=482.50..14564568.50 rows=300000000 width=29) (actual time=8.810..305863.949 rows=300015000 loops=1)
  Hash Cond: ((mea.ocs_imei)::text = (ctr.ctr_name)::text)
  ->  Seq Scan on mea_req_54 mea  (cost=0.00..10439086.00 rows=300000000 width=19) (actual time=0.005..131927.791 rows=300000000 loops=1)
  ->  Hash  (cost=295.00..295.00 rows=15000 width=13) (actual time=8.784..8.784 rows=15000 loops=1)
        Buckets: 16384  Batches: 1  Memory Usage: 791kB
        ->  Seq Scan on country ctr  (cost=0.00..295.00 rows=15000 width=13) (actual time=0.008..4.138 rows=15000 loops=1)
Planning time: 0.085 ms
Execution time: 355065.791 ms
(8 rows)
The documentation is silent about that, but in backend/optimizer/path/joinpath.c, function hash_inner_and_outer, I find the following enlightening comment:
/*
* If the joinrel is parallel-safe, we may be able to consider a
* partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
* because the outer path will be partial, and therefore we won't be
* able to properly guarantee uniqueness. Similarly, we can't handle
* JOIN_FULL and JOIN_RIGHT, because they can produce false null
* extended rows. Also, the resulting path must not be parameterized.
*/
This makes sense – a parallel worker that scans part of mea_req_54 has no way to know if there is a row in country that does not match any of the rows in mea_req_54.
Now nested loop joins cannot be used for full outer joins, so all that remains is a parallel merge join.
I can't say if a merge join is an option here, but you may try and create an index on mea_req_54(ocs_imei) and see if that helps the optimizer choose a parallel plan.
Otherwise, you are probably out of luck.
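The suggested experiment, spelled out (the index name is illustrative; CREATE INDEX CONCURRENTLY avoids blocking writes on the 300M-row table while it builds):

```sql
CREATE INDEX CONCURRENTLY mea_req_54_ocs_imei_idx
    ON mea_req_54 (ocs_imei);

-- Then re-check the plan for a Merge Join under a Gather node:
EXPLAIN
SELECT mea.ocs_cdr_type, mea.ocs_time_stamp,
       mea.sum_ocs_call_cost, ctr.ctr_name
FROM mea_req_54 mea
FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;
```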
After reducing the parallel_tuple_cost and parallel_setup_cost parameters from their default values, the query ran in parallel.
But I would like to know what exactly these parameters are.
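In short, both are planner cost estimates, not hard limits: parallel_setup_cost models the one-time cost of launching worker processes, and parallel_tuple_cost models the cost of passing each tuple from a worker back to the leader. Lowering them makes parallel plans look cheaper to the optimizer. A session-level sketch:

```sql
-- Defaults are 1000 and 0.1; lower values bias the planner
-- toward choosing a parallel plan:
SET parallel_setup_cost = 100;
SET parallel_tuple_cost = 0.01;
-- Re-run EXPLAIN on the query and look for a Gather node.
```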

Cannot modify PostgreSql query, how to speed it up

I'm looking to speed up a query (PostgreSQL 9.5), but I cannot change it, because it is executed by an application I cannot modify.
So I captured the query from the PostgreSQL logs; here it is:
SELECT Count(*)
FROM (SELECT ti.idturnosistemaexterno,
ti.historiaclinica_hp,
p.patientname,
CASE
WHEN ( ti.creationdate :: VARCHAR IS NOT NULL ) THEN
Date_trunc('SECOND', ti.creationdate) :: VARCHAR
ELSE 'NO EXISTE' :: VARCHAR
END AS creationdate,
CASE
WHEN ( st.idstudy :: VARCHAR IS NOT NULL ) THEN 'SI' :: VARCHAR
ELSE 'NO' :: VARCHAR
END AS idstudy,
st.institutionname,
CASE
WHEN ( st.created_time :: VARCHAR IS NOT NULL ) THEN
Date_trunc('SECOND', st.created_time) :: VARCHAR
ELSE 'NO EXISTE' :: VARCHAR
END AS created_time,
ti.enviado,
st.accessionnumber,
st.modality
FROM study st
right join turnointegracion ti
ON st.accessionnumber = ti.idturnosistemaexterno
left join patient p
ON st.idpatient = p.idpatient
ORDER BY ti.creationdate DESC) AS foo;
The explain analyze output is this:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=231136.16..231136.17 rows=1 width=0) (actual time=32765.883..32765.883 rows=1 loops=1)
  ->  Sort  (cost=230150.04..230314.39 rows=65741 width=8) (actual time=32754.992..32761.780 rows=64751 loops=1)
        Sort Key: ti.creationdate DESC
        Sort Method: external merge  Disk: 1648kB
        ->  Hash Right Join  (cost=219856.39..224889.28 rows=65741 width=8) (actual time=26653.007..32714.961 rows=64751 loops=1)
              Hash Cond: ((st.accessionnumber)::text = (ti.idturnosistemaexterno)::text)
              ->  Seq Scan on study st  (cost=0.00..4086.26 rows=77126 width=12) (actual time=12.983..6032.251 rows=77106 loops=1)
              ->  Hash  (cost=219048.95..219048.95 rows=64595 width=16) (actual time=26639.722..26639.722 rows=64601 loops=1)
                    Buckets: 65536  Batches: 1  Memory Usage: 3602kB
                    ->  Seq Scan on turnointegracion ti  (cost=0.00..219048.95 rows=64595 width=16) (actual time=17.259..26611.806 rows=64601 loops=1)
Planning time: 25.519 ms
Execution time: 32766.710 ms
(12 rows)
Here are the related table definitions:
Table "public.turnointegracion"
Column | Type | Modifiers
---------------------------+-----------------------------+--------------------------------------------------------------------
idturnosistemaexterno | character varying(50) |
historiaclinica_hp | integer |
matriculaprofrealiza | character varying(10) |
matriculaprofinforma | character varying(10) |
obrasocial | character varying(20) |
planobrasocial | character varying(20) |
nroafiliado | character varying(20) |
nroautorizacion | character varying(20) |
matriculaprofprescribe | character varying(10) |
codigonomenclador | character varying(10) |
cantidadcodigonomenclador | integer |
importeunitariohonorarios | money |
importeunitarioderechos | money |
nrodefacturacion | character varying(15) |
informe | bytea |
titulodelestudio | character varying(250) |
fechaprescripcion | timestamp without time zone |
fechahora | timestamp without time zone |
enviado | boolean | not null default false
enviadofechahora | timestamp without time zone |
procesado_hp | timestamp without time zone |
modalidad | character varying(6) |
orden | integer | not null default nextval('turnointegracion_orden_seq'::regclass)
idturno | integer | not null default nextval('seq_turnointegracion_idturno'::regclass)
creationdate | timestamp without time zone | default now()
informetxt | text |
informedisponible | timestamp without time zone |
informeprocesado | timestamp without time zone |
Indexes:
"turnointegracion_pkey" PRIMARY KEY, btree (idturno)
"idx_fechahora" btree (fechahora)
"idx_historiaclinicahp" btree (historiaclinica_hp)
"idx_idturnosistemaexterno" btree (idturnosistemaexterno)
"idx_informedisponible" btree (informedisponible)
"idx_turnointegracion_creationdate" btree (creationdate DESC)
"idx_turnointegracion_idturnosistext_text" btree ((idturnosistemaexterno::text))
Table "public.study"
Column | Type | Modifiers
------------------------------+-----------------------------+---------------------------------------------------------
idstudy | integer | not null default nextval('study_idstudy_seq'::regclass)
studydate | date |
studytime | time without time zone |
studyid | character varying(20) |
studydescription | character varying(255) |
modality | character varying(2) |
modalityaetitle | character varying(50) |
nameofphysiciansreadingstudy | character varying(255) |
accessionnumber | character varying(20) |
performingphysiciansname | character varying(255) |
referringphysiciansname | character varying(255) |
studyinstanceuid | character varying(255) |
status | status_ |
institutionname | character varying(100) |
idpatient | integer |
created_time | timestamp without time zone |
Indexes:
"study_pkey" PRIMARY KEY, btree (idstudy)
"study_studyinstanceuid_key" UNIQUE CONSTRAINT, btree (studyinstanceuid)
"idx_study_accession_text" btree ((accessionnumber::text))
"idx_study_accessionnumber" btree (accessionnumber)
"idx_study_idstudy" btree (idstudy)
Foreign-key constraints:
"study_idpatient_fkey" FOREIGN KEY (idpatient) REFERENCES patient(idpatient)
Referenced by:
TABLE "series" CONSTRAINT "series_idstudy_fkey" FOREIGN KEY (idstudy) REFERENCES study(idstudy)
As you can see, I've added indexes on the affected columns, but the planner is still doing sequential scans. Is there a way to improve this?
There is no WHERE condition, so because of this join:
right join turnointegracion ti
ON st.accessionnumber = ti.idturnosistemaexterno
you are reading all records from turnointegracion. An index on creationdate can speed up the sort, but all records are still returned.
Filtering by creationdate would reduce the final time.
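Since the query itself cannot be touched, server-side settings are the remaining lever. The plan shows the sort spilling to disk ("external merge Disk: 1648kB"); one option, sketched here with a placeholder role name, is to raise work_mem only for the application's database user so the sort stays in memory:

```sql
-- "app_user" is a placeholder for the role the application connects as:
ALTER ROLE app_user SET work_mem = '16MB';
-- New sessions for that role pick the setting up automatically.
```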

Left outer join - how to return a boolean for existence in the second table?

In PostgreSQL 9 on CentOS 6 there are 60000 records in pref_users table:
# \d pref_users
Table "public.pref_users"
Column | Type | Modifiers
------------+-----------------------------+--------------------
id | character varying(32) | not null
first_name | character varying(64) | not null
last_name | character varying(64) |
login | timestamp without time zone | default now()
last_ip | inet |
(... more columns skipped...)
And another table holds around 500 ids of users which are not allowed to play anymore:
# \d pref_ban2
Table "public.pref_ban2"
Column | Type | Modifiers
------------+-----------------------------+---------------
id | character varying(32) | not null
first_name | character varying(64) |
last_name | character varying(64) |
city | character varying(64) |
last_ip | inet |
reason | character varying(128) |
created | timestamp without time zone | default now()
Indexes:
"pref_ban2_pkey" PRIMARY KEY, btree (id)
In a PHP script I am trying to display all 60000 users from pref_users in a jQuery DataTable, and I would like to mark the banned users (the users found in pref_ban2).
That means I need a column named ban for each record in my query, holding true or false.
So I am trying a left outer join query:
# select
b.id, -- how to make this column a boolean?
u.id,
u.first_name,
u.last_name,
u.city,
u.last_ip,
to_char(u.login, 'DD.MM.YYYY') as day
from pref_users u left outer join pref_ban2 b on u.id=b.id
limit 10;
id | id | first_name | last_name | city | last_ip | day
----+----------+-------------+-----------+-------------+-----------------+------------
| DE1 | Alex | | Bochum | 2.206.0.224 | 21.11.2014
| DE100032 | Княжна Мэри | | London | 151.50.61.131 | 01.02.2014
| DE10011 | Aлександр Ш | | Симферополь | 37.57.108.13 | 01.01.2014
| DE10016 | Semen10 | | usa | 69.123.171.15 | 25.06.2014
| DE10018 | Горловка | | Горловка | 178.216.97.214 | 25.09.2011
| DE10019 | -Дмитрий- | | пермь | 5.140.81.95 | 21.11.2014
| DE10047 | Василий | | Cумы | 95.132.42.185 | 25.07.2014
| DE10054 | Maedhros | | Чикаго | 207.246.176.110 | 26.06.2014
| DE10062 | ssergw | | москва | 46.188.125.206 | 12.09.2014
| DE10086 | Вадим | | Тула | 109.111.26.176 | 26.02.2012
(10 rows)
As you can see, the b.id column above is empty, because these 10 users aren't banned.
How can I get a false value in that column instead of a string?
I am not after some coalesce or case expression; I am looking for "the proper" way to do such a query.
"IS NULL" and "IS NOT NULL" return a boolean, so this should make it easy.
I think this is all you need?
SELECT
b.id IS NOT NULL as is_banned, -- The value of "is_banned" will be a boolean
Not sure if you need the "NOT" or not, but you'll get a bool either way.
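Spelled out against the query in the question, that suggestion becomes (a sketch):

```sql
select
    b.id IS NOT NULL as is_banned,  -- true only when a pref_ban2 row matched
    u.id,
    u.first_name,
    u.last_name,
    u.city,
    u.last_ip,
    to_char(u.login, 'DD.MM.YYYY') as day
from pref_users u
left outer join pref_ban2 b on u.id = b.id
limit 10;
```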
A CASE or COALESCE statement with an outer join IS the proper way to do this.
select
CASE
WHEN b.id IS NOT NULL THEN true
ELSE false
END AS banned,
u.id,
u.first_name,
u.last_name,
u.city,
u.last_ip,
to_char(u.login, 'DD.MM.YYYY') as day
from pref_users u
left outer join pref_ban2 b
on u.id=b.id
limit 10;