Why Tableau full outer join is behaving like left join?

Why Tableau full outer join is behaving like left join? - amazon-redshift

I want to join two tables on Tableau. I don't want to lose any entries so I'm using a full outer join:
The data looks like this
Table 1:
Name1 Status Storage Certificate
gbo001 Running 16GB on
gbo003 Running 16GB on
gbo005 Running 16GB on
gbo006 Running 16GB on
Table2
Name2 Price
gbo001 10000
gbo002 12000
gbo003 12000
gbo004 16000
gbo006 11000
gbo007 14000
So I am using a full outer join, this is my query:
SELECT "Table_1"."name1" AS "name1",
"Table_1"."Status" AS "Status",
"Table_1"."Storage" AS "Storage",
"Table_1"."Certificate" AS "Certificate",
"Table_2"."Name2" AS "Name2",
"Table_2"."Price" AS "Price",
FROM "public"."Table1" "Table1"
FULL JOIN "public"."Table2" "Table2" ON ("Table1"."Name1" = "Table2"."Name2")
Strangely enough it is giving me same results as left join would:
Name1 Status Storage Certificate Name2 Price
gbo001 Running 16GB on gbo001 10000
gbo003 Running 16GB on gbo001 12000
gbo005 Running 16GB on null null
gbo006 Running 16GB on gbo006 11000
And these are the results I'd be expecting with a full outer join:
Name1 Status Storage Certificate Name2 Price
gbo001 Running 16GB on gbo001 10000
null null null null gbo002 12000
gbo003 Running 16GB on gbo003 12000
null null null null gbo004 16000
gbo005 Running 16GB on null null
gbo006 Running 16GB on gbo006 11000
null null null null gbo007 14000
Is it possible to adjust my query accordingly so I can see all existing entries from both tables?

In case somebody is having a similar problem on Tableau, this happened to me a few days ago and after several hours trying to figure out why tableau was creating a left join instead of a full join I decided to just close and reopen Tableau and that actually worked. So I guess there's some kind of bug around this.
Hope this saves someone some time.

Related

Speed query very slow on TimescaleDB

I created a table with the command:
CREATE TABLE public.g_dl (
id bigint NOT NULL,
id_tram integer,
site_id character varying(40),
user_id character varying(40),
app_code character varying(40),
time TIMESTAMP WITHOUT TIME ZONE NOT NULL
);
SELECT create_hypertable('g_dl','time');
Then I insert an amount of about 34 million records.
The query speed is very good, the amount of Ram used by the docker container is about 500MB-1.2GB. But query speed gets a problem after I insert amount of 1.8 million records, the time field is out of order before I had inserted it.
I use DBbeaver and get the message "You might need to increase max_locks_per_transaction". Then I change that value up
increase max_locks_per_transaction = 1000
Query speed is very slow and the amount of ram used by docker is 10GB - 12GB. What I'm doing wrong way. Pls let me know.
Outout when explain:
EXPLAIN analyze select * from g_dlquantracts gd where id_tram = 300
Raw JSON explain:
https://drive.google.com/file/d/1bA5EwcWMUn5oTUirHFD25cSU1cYGwGLo/view?usp=sharing
Formatted execution plan: https://explain.depesz.com/s/tM57

Weird behavior with Google Cloud PostgreSQL

I'm running the Google Cloud SQL Proxy to connect to a remote PostgreSQL (v9.6) instance on Google Cloud (GCP).
I'm having a very weird issue with the number of returned rows from these 2 queries:
select "id", "original_image"
from "tableA"
where "image_id" is NULL
order by "id" asc
limit 581
returns 0 rows
select "id", "original_image"
from "tableA"
where "image_id" is NULL
order by "id" asc
limit 580
returns 73 rows
If I remove the WHERE clause (where "image_id" is NULL) both return the limit amount (581 and 580, respectively).
Any ideas of what could be causing this?

pgAdmin slow when displaying data?

I am seeing a huge degradation in performance after moving some tables from SQL Server 2008 to Postgres, and I'm wondering if I'm missing a configuration step, or it is normal for postgres to behave this way.
The query used is a simple SELECT from the table. No joins, no ordering, nothing.
The table itself has only about 12K rows.
I have tried this on 3 machines:
Machine A hardware: 50GB RAM, SSD disks, CPU: Xeon® E5-2620v3 (OS:
Ubuntu Server 16), DBMS: Postgres 9.5
Machine B hardware: 8GB RAM, Sata disks, CPU: Xeon E5-4640 (OS:
Ubuntu Server 12), DBMS: Postgres 9.4
Machine C hardware: 4GB RAM, IDE disks, CPU: Xeon E3-1220v2 (OS:
Windows Server 2008), DBMS: SQL Server 2008 R2
The performance I am seeing is similar between the 2 Postgres databases, despite the vast difference in hardware and configuration. How can this be?
Machine A query. Notice that I'm excluding the geometry column in order to work with "pure" datatypes:
EXPLAIN ANALYZE VERBOSE SELECT id, "OID", type, name, orofos, xrisi_orofoy, area_sqm,
perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc,
xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY"
FROM public."korydallos_Land_Uses";
Results:
"Seq Scan on public."korydallos_Land_Uses" (cost=0.00..872.41 rows=12841 width=209) (actual time=0.025..13.450 rows=12841 loops=1)"
" Output: id, "OID", type, name, orofos, xrisi_orofoy, area_sqm, perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc, xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY""
"Planning time: 0.137 ms"
"Execution time: 14.788 ms"
This is 14 seconds for a simple select!! Wtf? Compare this with SQL Server:
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 0
Rows affected by INSERT, DELETE, or UPDATE statements 0
Number of SELECT statements 1
Rows returned by SELECT statements 12840
Number of transactions 0
Network Statistics
Number of server roundtrips 1
TDS packets sent from client 1
TDS packets received from server 1040
Bytes sent from client 1010
Bytes received from server 2477997
Time Statistics
Client processing time 985
Total execution time 1022
Wait time on server replies 37
I am at a loss at what could be happening. I also tried:
Checking for dead rows: 0
Vacuuming
Simply querying the primary key (!). This takes 500ms to execute.
With each column I add to the select, around 500 more ms are added to
the query.
Machine A Postgres performance settings:
max_connections = 200
shared_buffers = 12800MB
effective_cache_size = 38400MB
work_mem = 32MB
maintenance_work_mem = 2GB
min_wal_size = 4GB
max_wal_size = 8GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
Machine B Postgres performance settings:
max_connections = 200
shared_buffers = 128MB
#effective_cache_size = 4GB
#work_mem = 4MB
#maintenance_work_mem = 64MB
#min_wal_size = 80MB
#max_wal_size = 1GB
#checkpoint_completion_target = 0.5
#wal_buffers = -1
#default_statistics_target = 100
Table definition in Postgres:
CREATE TABLE public."korydallos_Land_Uses"
(
id integer NOT NULL DEFAULT nextval('"korydallos_Land_Uses_id_seq"'::regclass),
wkb_geometry geometry(Polygon,4326),
"OID" integer,
type character varying(255),
name character varying(255),
orofos character varying(255),
xrisi_orofoy character varying(255),
area_sqm numeric,
perimeter_m numeric,
syn1_ numeric,
syn1_id numeric,
str_name character varying(255),
str_no character varying(255),
katanomh numeric,
linkc numeric,
xrcode character varying(255),
kat numeric,
ot character varying(255),
use character varying(255),
syn numeric,
notes character varying(255),
"MinX" numeric,
"MinY" numeric,
"MaxX" numeric,
"MaxY" numeric,
CONSTRAINT "korydallos_Land_Uses_pkey" PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public."korydallos_Land_Uses"
OWNER TO root;
CREATE INDEX "sidx_korydallos_Land_Uses_wkb_geometry"
ON public."korydallos_Land_Uses"
USING gist
(wkb_geometry);
EDIT: Removed the irrelevant SQL Server definition as suggested in the comments. Keeping the time as I think it's still relevant.
As per the comments, more info using:
explain (analyze, verbose, buffers, timing) SELECT id, "OID", type, name, orofos, xrisi_orofoy, area_sqm,
perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc,
xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY"
FROM public."korydallos_Land_Uses"
Results:
"Seq Scan on public."korydallos_Land_Uses" (cost=0.00..872.41 rows=12841 width=209) (actual time=0.019..11.207 rows=12841 loops=1)"
" Output: id, "OID", type, name, orofos, xrisi_orofoy, area_sqm, perimeter_m, syn1_, syn1_id, str_name, str_no, katanomh, linkc, xrcode, kat, ot, use, syn, notes, "MinX", "MinY", "MaxX", "MaxY""
" Buffers: shared hit=744"
"Planning time: 1.073 ms"
"Execution time: 12.269 ms"
PG Admin shows me this in the "Explain tab":
How I measure the 14 seconds:
Status window of PG Admin 3, bottom right corner, when running the query. (It says 14.3 secs for the trolls here).

https://www.postgresql.org/docs/current/static/using-explain.html
Note that the “actual time” values are in milliseconds of real time,
so in your case
actual time=0.019..11.207
means running query took 11 milliseconds.
pgadmin "explain tab" says the same... Now if you see 14.3 sec in right bottom corner and the time it took is indeed 14 seconds (measured with watches) I assume it is some awful delay on network level or pgadmin itself. Try running this in psql for instance:
select clock_timestamp();
explain analyze select * FROM public."korydallos_Land_Uses";
select clock_timestamp();
this will show time intervals server side + time needed to send command from psql to server - if it takes still 14 seconds - talk to you network admin, if not, try upgrading pgadmin

Reading pg_buffercache output

I am using postgres-9.3 (in CenOS 6.9) and trying to understand the pg_buffercache table output.
I ran this:
SELECT c.relname,count(*) AS buffers FROM pg_class c INNER JOIN
pg_buffercache b ON b.relfilenode=c.relfilenode INNER JOIN
pg_database d ON (b.reldatabase=d.oid AND
d.datname=current_database()) GROUP BY c.relname
ORDER BY 2 DESC LIMIT 5;
and the output below showed one of the tables using 6594 buffers. This was during when I had tons of INSERT followed by SELECT and UPDATE in the data_main table).
relname | buffers
------------------+---------
data_main | 6594
objt_main | 1897
objt_access | 788
idx_data_mai | 736
I also ran "select * from pg_buffercache where is dirty" which showed around 50 entries.
How should I interpret these numbers? Does the buffer count correspond to all the transactions since I created the extension or the recent ones. How can I find out if my specific operation using the proper amount of buffers?
Here's my setting:
# show shared_buffers;
shared_buffers
----------------
1GB
# show work_mem;
work_mem
----------
128kB
# show maintenance_work_mem;
maintenance_work_mem
----------------------
64GB
And the current free mem (I have 64GM memory in this machine). And I have a mixed workload machine with period bursts of INSERTS and lots of SELECTS. Currently the database and tables are small but will grow to at least 2 million rows.
$ free -m
total used free shared buffers cached
Mem: 64375 33483 30891 954 15 15731
/+ buffers/cache: 18097 46278
Swap: 32767 38 32729
Basically, I am trying to understand how to properly use this pg_buffercache table. Should I ran this query periodically? And do I need to change my shared_buffers accordingly.

I did some reading and testing and this is what I have found. Found a userful query here: How large is a "buffer" in PostgreSQL
Here are a few notes for others that have similar questions.
You will need to create the extension for each database. So "\c db_name" then "create extension pg_buffercache".
Same for running the queries.
Restarting the database clears the queries.

Postgres performance not increasing with increase in number of core

I was trying out postgres google-cloud-sql and loaded a simple school schema
CREATE TABLE school (
id SERIAL NOT NULL PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE class (
id SERIAL NOT NULL PRIMARY KEY,
name TEXT,
school_id INTEGER NOT NULL REFERENCES school
);
CREATE TABLE student (
id SERIAL NOT NULL PRIMARY KEY,
name TEXT,
class_id INTEGER NOT NULL REFERENCES class
);
-- ALL id and foreign keys have indexs
Loaded ~15 millions row in total with 1500 school, 500 class per school, 200 student per class.
After that create a simple pgbench script
\setrandom sId1 1 20000000
\setrandom sId2 1 20000000
\setrandom sId3 1 20000000
select count(*) from school s
join class c on s.id=c.school_id
join student stu on c.id=stu.class_id where s.id=:sId1;
select count(*) from school s
join class c on s.id=c.school_id
join student stu on c.id=stu.class_id where s.id=:sId2;
select count(*) from school s
join class c on s.id=c.school_id
join student stu on c.id=stu.class_id where s.id=:sId3;
Now running the the script with
pgbench -c 90 -f ./sql.sql -n -t 1000
2 cores, 7.5 GB, 90 client --
OUTPUT:
number of transactions actually processed: 90000/90000
tps = 1519.690555 (including connections establishing)
tps = 2320.408683 (excluding connections establishing
26 cores, 30 GB, 90 client-
number of transactions actually processed: 90000/90000
tps = 1553.721286 (including connections establishing)
tps = 2405.664795 (excluding connections establishing)
Question:
Why do we have only 80 tps increase from 2 core to 26 cores ?

I asked same question on the postgres irc.
Community was sure that i was maxing out the client pgbench , they suggested to use -j4 in pgbench and tps increased to 23k per sec.

Because an individual SELECT will only operate in one process running on one core. What adding extra cores will do is to allow multiple simultaneous operations to be performed. So if you were to throw (say) 1,000 simultaneous queries at the database, they would execute more quickly on 26 cores rather than 2 cores.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Why Tableau full outer join is behaving like left join? - amazon-redshift

Related

Speed query very slow on TimescaleDB

Weird behavior with Google Cloud PostgreSQL

pgAdmin slow when displaying data?

Reading pg_buffercache output

Postgres performance not increasing with increase in number of core

Categories

Resources