Large Queries Hanging with the MongoDB BI Connector (and Tableau) - mongodb

I’m hoping that someone with experience using the MongoDB BI Connector and Tableau will be able to help me out. I am totally lost and have no idea how to debug this issue I’m having with the BI Connector.
Currently I am running both MongoDB and Tableau inside a Kubernetes cluster, along with a pod running the BI Connector. Tableau successfully connects to MongoDB via the BI Connector, and I am able to create workbooks and visualizations in Tableau from multiple MongoDB collections.
The problem is that some large queries simply hang, never completing and never returning an error. I’ve seen this in Tableau as well as via a MySQL CLI client connected to the BI Connector; in both cases the BI Connector never completes the request. I know the query is valid SQL, so I am totally stumped.
Is there perhaps some kind of limitation that I am facing because I am not using MongoDB’s Atlas product?
I will include some queries that definitely work, along with the query that does not. Any help, or any insight into what might be causing the long hang, would be greatly appreciated.
Below are two example queries that work fine:
SELECT
market,
CONVERT(date, date) as date,
clicks,
conversions,
cost,
impressions
FROM ad_metrics
SELECT
go.customer_id,
go.id as lead_id,
CONVERT(DATE_SUB(go.created_at, INTERVAL 7 HOUR), date) as date,
go.market as market,
go.make,
go.model,
go.no_way,
go.repair_location,
go.source as website_source,
go.utm_content,
go.utm_source,
go.post_tax_amount_requested
FROM god_objects go
WHERE (go.is_a_test = 0 OR go.is_a_test IS NULL)
AND (go.carparts = 0 OR go.carparts IS NULL)
AND (go.no_way = 0 OR go.no_way IS NULL)
And below is the query that hangs:
WITH tableau_ads as (
SELECT
market,
CONVERT(date, date) as date,
clicks,
conversions,
cost,
impressions
FROM ad_metrics
), tableau_leads as (
SELECT
go.customer_id,
go.id as lead_id,
CONVERT(DATE_SUB(go.created_at, INTERVAL 7 HOUR), date) as date,
go.market as market,
go.make,
go.model,
go.no_way,
go.repair_location,
go.source as website_source,
go.utm_content,
go.utm_source,
go.post_tax_amount_requested
FROM god_objects go
WHERE (go.is_a_test = 0 OR go.is_a_test IS NULL)
AND (go.carparts = 0 OR go.carparts IS NULL)
AND (go.no_way = 0 OR go.no_way IS NULL)
), tableau_sales as (
SELECT
q.id as quote_id,
go.id as lead_id,
j.id as job_id,
go.market,
CONVERT(DATE_SUB(go.sold_at, INTERVAL 7 HOUR), date) as date,
CONVERT(DATE_SUB(go.initial_job_date, INTERVAL 7 HOUR), date) as initial_job_date,
go.post_tax_amount_requested,
go.amount_collected,
go.customer_id,
go.make,
go.model,
go.repair_location,
go.source as website_source,
go.utm_content,
go.utm_source,
q.balance_amount_due,
q.assigned_technician_id,
q.payment_status,
q.quote_grand_total,
q.total_transaction_amount,
j.is_active,
j.technician_id
FROM quotes q
LEFT JOIN god_objects go ON go.id = q.lead_id
LEFT JOIN jobs j ON go.id = j.lead_id
WHERE (go.is_a_test = 0 OR go.is_a_test IS NULL)
AND (go.carparts = 0 OR go.carparts IS NULL)
AND go.initial_job_date IS NOT NULL
AND go.post_tax_amount_requested >= 200.0
) SELECT
tableau_leads.market,
tableau_leads.date,
sum(tableau_ads.clicks) as ad_clicks,
sum(tableau_ads.conversions) as ad_conversions,
sum(tableau_ads.cost) as ad_cost,
sum(tableau_ads.impressions) as ad_impressions,
COUNT(tableau_leads.lead_id) as lead_count,
COUNT(tableau_sales.quote_id) as sale_count,
SUM(tableau_leads.post_tax_amount_requested) as lead_amount_requested,
SUM(tableau_sales.post_tax_amount_requested) as sale_amount_requested
FROM tableau_leads
LEFT JOIN tableau_ads ON (tableau_leads.market = tableau_ads.market AND tableau_leads.date = tableau_ads.date)
LEFT JOIN tableau_sales ON (tableau_leads.market = tableau_sales.market AND tableau_leads.date = tableau_sales.date)
GROUP BY market, date
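For what it's worth, below is a cut-down variant I can experiment with: it pre-aggregates each side to a single row per (market, date) before joining, so the final join is one-to-one instead of many-to-many. This is just a sketch against the same tables and columns as above; I haven't confirmed it avoids the hang.
SELECT
    l.market,
    l.date,
    a.ad_clicks,
    a.ad_conversions,
    a.ad_cost,
    a.ad_impressions,
    l.lead_count,
    l.lead_amount_requested
FROM (
    -- one row per (market, date) for leads
    SELECT market,
           CONVERT(DATE_SUB(created_at, INTERVAL 7 HOUR), date) AS date,
           COUNT(id) AS lead_count,
           SUM(post_tax_amount_requested) AS lead_amount_requested
    FROM god_objects
    WHERE (is_a_test = 0 OR is_a_test IS NULL)
      AND (carparts = 0 OR carparts IS NULL)
      AND (no_way = 0 OR no_way IS NULL)
    GROUP BY market, CONVERT(DATE_SUB(created_at, INTERVAL 7 HOUR), date)
) l
LEFT JOIN (
    -- one row per (market, date) for ad metrics
    SELECT market,
           CONVERT(date, date) AS date,
           SUM(clicks) AS ad_clicks,
           SUM(conversions) AS ad_conversions,
           SUM(cost) AS ad_cost,
           SUM(impressions) AS ad_impressions
    FROM ad_metrics
    GROUP BY market, CONVERT(date, date)
) a ON a.market = l.market AND a.date = l.date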
Is there something wrong with my SQL? Is there any other kind of issue? Again, any help would be greatly appreciated!

Related

oracle merge query in postgres

I have this MERGE query in Oracle and it was working fine. Now we are migrating to Postgres 10 and trying to find an equivalent for it in Postgres.
MERGE INTO s.act_pack C
USING ((SELECT A.jid, A.pid, B.pcode, B.mc, A.md, A.hd
        FROM s.act_pack A
        INNER JOIN s.act_pack B
           ON A.pid = B.pid
          AND A.pcode = B.mc
          AND (A.hd <> B.hd OR A.md <> B.md))
       ORDER BY A.upd_ts DESC) D
ON (C.pid = D.pid AND C.pcode = D.pcode AND C.jid = D.jid)
WHEN MATCHED THEN UPDATE
   SET C.md = D.md, C.hd = D.hd;
I have seen some forum posts on the web saying Postgres doesn't support MERGE and that INSERT ... ON CONFLICT should be used instead, but with no background in Postgres I am not able to understand how this complex query could be written that way.
Others say Postgres 9.5 and above support the MERGE statement. Since we are using Postgres 10, I tried the same Oracle query in Postgres but received ERROR: syntax error at or near "MERGE".
Any help is highly appreciated.
You don't need an "UPSERT" as you are not doing an INSERT, so a regular UPDATE is enough:
update s.act_pack C
   -- Postgres does not allow the target's alias in SET, so the columns are unqualified
   SET md = D.md,
       hd = D.hd
from (
   SELECT A.jid, A.pid, B.pcode, B.mc, A.md, A.hd
   FROM s.act_pack A
      INNER JOIN s.act_pack B
         ON A.pid = B.pid
        AND A.pcode = B.mc
        AND (A.hd <> B.hd OR A.md <> B.md)
) D
where C.pid = D.pid
  AND C.pcode = D.pcode
  AND C.jid = D.jid
This is a direct "translation" of your code. The fact that the same table is used three times is a bit strange, but without more information it's hard to say where exactly this could be made more efficient.
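For completeness: if you ever do need true upsert behaviour (insert the row when it is missing, update it when it exists), Postgres 9.5+ spells that INSERT ... ON CONFLICT. A generic sketch with made-up values, assuming a unique constraint on (pid, pcode, jid):
-- requires a unique index or constraint on (pid, pcode, jid)
INSERT INTO s.act_pack (pid, pcode, jid, md, hd)
VALUES (1, 'X', 42, 'new-md', 'new-hd')
ON CONFLICT (pid, pcode, jid)
DO UPDATE SET md = EXCLUDED.md,
              hd = EXCLUDED.hd;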

Can somebody help me translate this into postgresql?

I am very new to SQL and I do not know much about writing code for different DBMSs. I am trying to write a report in our school's MOODLE platform, which uses postgresql, using a configurable report found here. However, the code does not work in postgresql. In particular, how do I rewrite the lines with variable assignments like #prevtime := to make the code work in postgresql?
Here is the complete code from the link.
SELECT
l.id,
l.timecreated,
DATE_FORMAT(FROM_UNIXTIME(l.timecreated),'%d-%m-%Y') AS dTime,
#prevtime := (SELECT MAX(timecreated) FROM mdl_logstore_standard_log
WHERE userid = %%USERID%% AND id < l.id ORDER BY id ASC LIMIT 1) AS prev_time,
IF (l.timecreated - #prevtime < 7200, #delta := #delta + (l.timecreated-#prevtime),0) AS sumtime,
l.timecreated-#prevtime AS delta,
"User" AS TYPE
FROM prefix_logstore_standard_log AS l,
(SELECT #delta := 0) AS s_init
# CHANGE UserID
WHERE l.userid = %%USERID%% AND l.courseid = %%COURSEID%%
%%FILTER_STARTTIME:l.timecreated:>%% %%FILTER_ENDTIME:l.timecreated:<%%
This is supposed to report the time spent by students in courses in MOODLE.
I assume the original query was written for MySQL. You haven't explained what the query actually does, but the #prevtime hack is usually a workaround for missing window functions, so most probably this can be done using lag() in Postgres, something along these lines:
select l.id,
       l.timecreated,
       to_char(to_timestamp(l.timecreated), 'dd-mm-yyyy') as dtime,
       lag(l.timecreated) over w as prev_time,
       l.timecreated - lag(l.timecreated) over w as delta,
       'User' as type
from prefix_logstore_standard_log as l
where l.userid = %%USERID%%
  and l.courseid = %%COURSEID%%
window w as (partition by l.userid order by l.id)
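The running #delta total (sumtime in the original) can be recovered the same way by wrapping the gap in a windowed sum; a sketch, keeping the 7200-second cutoff from the original IF():
-- running total of gaps under two hours, mirroring the #delta accumulator
select id,
       timecreated,
       delta,
       sum(case when delta < 7200 then delta else 0 end)
           over (order by id) as sumtime
from (
    select l.id,
           l.timecreated,
           l.timecreated - lag(l.timecreated)
               over (partition by l.userid order by l.id) as delta
    from prefix_logstore_standard_log as l
    where l.userid = %%USERID%%
      and l.courseid = %%COURSEID%%
) t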

Trying to isolate the hours given a date range (YYYY-MM-DD HH:MM:SS) in SQL and group them by specific hour intervals regardless of the date

I am struggling to extract information out of the database I created in SQL. The views work great and all data is displayed, but I am trying to isolate the following:
Isolate time frames from 07:00:00 to 09:00:00.
Still new to coding, so help is appreciated.
SELECT ch.name,
t.date,
t.amount,
t.card AS "Credit Card",
t.id_merchant,
m.name AS "Merchant",
mc.name AS "merchant category"
FROM transaction AS t
JOIN credit_card AS cc
ON (t.card = cc.card)
JOIN card_holder AS ch
ON (cc.cardholder_id = ch.id)
JOIN merchant AS m
ON (t.id_merchant = m.id)
JOIN merchant_category AS mc
ON (m.id_merchant_category = mc.id);
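For the time-of-day part specifically: EXTRACT(HOUR FROM ...) pulls the hour out of a timestamp, which can then be filtered and grouped on independently of the date. A sketch, assuming a Postgres database and that t.date is a timestamp column:
-- keep only rows whose time of day falls in [07:00, 09:00),
-- regardless of the calendar date, then bucket them by hour
SELECT EXTRACT(HOUR FROM t.date) AS hour_of_day,
       COUNT(*) AS transaction_count
FROM transaction AS t
WHERE EXTRACT(HOUR FROM t.date) BETWEEN 7 AND 8
GROUP BY EXTRACT(HOUR FROM t.date)
ORDER BY hour_of_day;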

Optimizing Postgres query with timestamp filter

I have a query:
SELECT DISTINCT ON (analytics_staging_v2s.event_type, sent_email_v2s.recipient, sent_email_v2s.sent) sent_email_v2s.id, sent_email_v2s.user_id, analytics_staging_v2s.event_type, sent_email_v2s.campaign_id, sent_email_v2s.recipient, sent_email_v2s.sent, sent_email_v2s.stage, sent_email_v2s.sequence_id, people.role, people.company, people.first_name, people.last_name, sequences.name as sequence_name
FROM "sent_email_v2s"
LEFT JOIN analytics_staging_v2s ON sent_email_v2s.id = analytics_staging_v2s.sent_email_v2_id
JOIN people ON sent_email_v2s.person_id = people.id
JOIN sequences on sent_email_v2s.sequence_id = sequences.id
JOIN users ON sent_email_v2s.user_id = users.id
WHERE "sent_email_v2s"."status" = 1
AND "people"."person_type" = 0
AND (sent_email_v2s.sequence_id = 1888) AND (sent_email_v2s.sent >= '2016-03-18')
AND "users"."team_id" = 1
When I run EXPLAIN ANALYZE on it, I get:
Then, if I change that to the following (just removing the (sent_email_v2s.sent >= '2016-03-18') filter):
SELECT DISTINCT ON (analytics_staging_v2s.event_type, sent_email_v2s.recipient, sent_email_v2s.sent) sent_email_v2s.id, sent_email_v2s.user_id, analytics_staging_v2s.event_type, sent_email_v2s.campaign_id, sent_email_v2s.recipient, sent_email_v2s.sent, sent_email_v2s.stage, sent_email_v2s.sequence_id, people.role, people.company, people.first_name, people.last_name, sequences.name as sequence_name
FROM "sent_email_v2s"
LEFT JOIN analytics_staging_v2s ON sent_email_v2s.id = analytics_staging_v2s.sent_email_v2_id
JOIN people ON sent_email_v2s.person_id = people.id
JOIN sequences on sent_email_v2s.sequence_id = sequences.id
JOIN users ON sent_email_v2s.user_id = users.id
WHERE "sent_email_v2s"."status" = 1
AND "people"."person_type" = 0
AND (sent_email_v2s.sequence_id = 1888) AND "users"."team_id" = 1
when I run EXPLAIN ANALYZE on this query, the results are:
EDIT:
The results above from today are about as I expected. When I ran this last night, however, the difference created by including the timestamp filter was about 100x slower (0.5s -> 59s). The EXPLAIN ANALYZE from last night showed all of the time increase to be attributed to the first unique/sort operation in the query plan above.
Could there be some kind of caching issue here? I am worried now that there might be something else going on (transiently) that might make this query take 100x longer since it happened at least once.
Any thoughts are appreciated!
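For reference, the usual first remedy for a slow equality-plus-range filter like this one is a composite index with the equality column first; a sketch with a hypothetical index name, assuming no such index already exists:
-- equality column (sequence_id) first, range column (sent) second,
-- so the planner can range-scan within a single sequence
CREATE INDEX idx_sent_email_v2s_sequence_sent
    ON sent_email_v2s (sequence_id, sent);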

DB2 V9 ZOS - Performance tuning

Background
Currently I am using DB2 V9. One of my stored procedures is taking a long time to execute. I looked in BMC AppTune and found the following SQL.
There are three tables used in the following query:
The ACCOUNT table has 3,413 records
EXCHANGE_RATE has 1,267K records
BALANCE has 113M records
Someone recently added the following piece of code to the query, and I think this is what caused the problem.
AND (((A.ACT <> A.EW_ACT)
AND (A.EW_ACT <> ' ')
AND (C.ACT = A.EW_ACT))
OR (C.ACT = A.ACT))
Query
SELECT F1.CLO_LED
INTO :H :H
FROM (SELECT A.ACT, A.BNK, A.ACT_TYPE,
CASE WHEN :H = A.CUY_TYPE THEN DEC(C.CLO_LED, 21, 2)
ELSE DEC(MULTIPLY_ALT(C.CLO_LED, COALESCE(B.EXC_RATE, 0)), 21, 2)
END AS CLO_LED
FROM ACCOUNT A
LEFT OUTER JOIN EXCHANGE_RATE B
ON B.EFF_DATE = CURRENT DATE - 1 DAY
AND B.CURCY_FROM = A.CURNCY_TYPE
AND B.CURCY_TO = :H
AND B.STA_TYPE = 'A'
, BALANCE C
WHERE A.CUSR_ID = :DCL.CUST-ID
AND A.ACT = :DCL.ACT
AND A.EIG_RTN = :WS-BNK-ID
AND A.ACT_TYPE = :DCL.ACT-TYPE
AND A.ACT_CAT = :DCL.ACT-CAT
AND A.STA_TYPE = 'A'
AND (((A.ACT <> A.EW_ACT)
AND (A.EW_ACT <> ' ')
AND (C.ACT = A.EW_ACT))
OR (C.ACT = A.ACT))
AND C.BNK = :WS-BNK-ID
AND C.ACT_TYPE = :DCL.ACT-TYPE
AND C.BUS_DATE = :WS-DATE-FROM) F1
WITH UR
There are a number of weird things going on in this query. The most twitchy is the mixing of explicit joins with the implicit-join (comma) syntax; frankly, I'm not certain how the system interprets it. You also appear to be using the same host variable for both input and output; please don't.
Also, why are your column names so short? DB2 (that version, at least) supports column names that are much longer. Please save people's sanity, if at all possible.
We can't completely say why things are slow - we may need to see access plans. In the meantime, here's your query, restructured to what may be a faster form:
SELECT CASE WHEN :inputType = a.cuy_type THEN DEC(b.clo_led, 21, 2)
            ELSE DEC(MULTIPLY_ALT(b.clo_led, COALESCE(c.exc_rate, 0)), 21, 2) END
INTO :amount :amountIndicator -- if you get results, do you need the indicator?
FROM Account as a
JOIN Balance as b -- This is assumed to not be a 'left', given COALESCE not used
  ON b.bnk = a.eig_rtn
 AND b.act_type = a.act_type
 AND b.bus_date = :ws-date-from
 AND ((a.act <> a.ew_act     -- something feels wrong here, but
       AND a.ew_act <> ' '   -- without knowing the data, I don't
       AND b.act = a.ew_act) -- want to muck with it.
      OR b.act = a.act)
LEFT JOIN Exchange_Rate as c
  ON c.eff_date = current date - 1 day
 AND c.curcy_from = a.curncy_type
 AND c.sta_type = a.sta_type
 AND c.curcy_to = :destinationCurrency
WHERE a.cusr_id = :dcl.cust-id
  AND a.act = :dcl.act
  AND a.eig_rtn = :ws-bnk-id
  AND a.act_type = :dcl.act-type
  AND a.act_cat = :dcl.act-cat
  AND a.sta_type = 'A'
FETCH FIRST 1 ROW ONLY
WITH UR
A few other notes:
Only specify exactly those columns needed - under certain circumstances, this permits index-only access, where otherwise a followup table-access may be needed. However, this probably won't help here.
COALESCE(c.exc_rate, 0) feels off somehow - if no exchange rate is present, you return an amount of 0, which could otherwise be a valid amount. You may need to return some sort of indicator, or make it a normal join, not an outer one.
Also, try both this version, and possibly a version where host variables are specified in addition to the conditions between tables. The optimizer should be able to automatically commute the values, but may not under some conditions (implementation detail).
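One more thing worth testing: an OR inside a join predicate is a classic optimizer blocker, so splitting the two match conditions into a UNION sometimes lets each branch use its own index. A sketch of just the Account-to-Balance join skeleton under that rewrite (the host-variable predicates and the exchange-rate join would be repeated in each branch):
-- branch 1: match balances via the EW account when it differs and is non-blank
SELECT a.act, b.clo_led
FROM Account AS a
JOIN Balance AS b ON b.act = a.ew_act
WHERE a.act <> a.ew_act
  AND a.ew_act <> ' '
UNION
-- branch 2: match balances via the account itself
SELECT a.act, b.clo_led
FROM Account AS a
JOIN Balance AS b ON b.act = a.act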