AWS Athena Last 60 Days from Today Query - date

I am using the query below and Athena does not like it:
where captured_date (between to_number(dateadd(month,-3,getdate()),'99999999') and to_number(dateadd(month,-1,getdate()),'99999999'))
Error Message: line 17:22: mismatched input 'between'. Expecting: <expression>
I am new to using Athena and I am not sure why it rejects the queries I normally use. I was expecting the last three months of data to be returned.
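The failure makes sense once you know that Athena runs Presto/Trino SQL: dateadd, getdate, and to_number are Redshift functions that do not exist in Athena, and between must follow the column directly rather than an opening parenthesis. A minimal sketch of the same filter in Athena syntax, assuming captured_date is stored as a yyyyMMdd integer (implied by the '99999999' format mask):
-- Sketch only, assuming captured_date holds yyyyMMdd integers:
-- date_add shifts current_date back, date_format renders it as yyyyMMdd,
-- and the CAST turns that string into a comparable integer.
WHERE captured_date
  BETWEEN CAST(date_format(date_add('month', -3, current_date), '%Y%m%d') AS integer)
      AND CAST(date_format(date_add('month', -1, current_date), '%Y%m%d') AS integer)
For the literal "last 60 days from today" in the title, date_add('day', -60, current_date) drops in the same way.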

Related

Quicksight dataset incomplete after refresh

I have a dataset in Quicksight that is connected to a Redshift table but pulls from SPICE. The dataset is scheduled to refresh from Redshift to SPICE daily. It is a small table, I am using only a fraction of my SPICE capacity, and this method has been working fine for almost two years.
For some reason, the SPICE table is suddenly not refreshing completely and I can't figure out why.
There are 183 records in the table, but only 181 in the SPICE dataset: querying the table from SQL Workbench/J returns 183 records, while the QS dataset shows only 181.
I have tried refreshing multiple times and have also set the dataset to query directly, bypassing SPICE, and still cannot get those other two rows returned.
Nothing has changed in our permissions or anything about the Redshift-Quicksight IAM config.
Any ideas about what could possibly be going on here?
Thanks for any help!
UPDATE: As I mentioned, if I select * from the table with SQL Workbench/J, I get the 183 rows that I expect. However, if I select * directly from the AWS query editor v2, I only get 181 rows. Can anyone explain to me what is causing this discrepancy?
SOLVED: The difference is that my processing now requires a COMMIT, where it did not require the COMMIT statement before.
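That explanation fits the symptoms: rows inserted in a transaction that is never committed are visible to the session that wrote them, but not to other readers such as the QuickSight/SPICE refresh or Query Editor v2. A minimal sketch (schema, table, and column names are illustrative, not from the post):
-- Illustrative only: until COMMIT runs, the two new rows exist solely in
-- this session's open transaction, so QuickSight and Query Editor v2 keep
-- seeing 181 rows while this connection sees 183.
BEGIN;
INSERT INTO my_schema.my_table (id, val) VALUES (182, 'a'), (183, 'b');
COMMIT;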

Inaccurate COUNT DISTINCT Aggregation with Date dimension in Google Data Studio

When I aggregate values in Google Data Studio with a date dimension on a PostgreSQL Connector, I see buggy behaviour: performing COUNT(DISTINCT) returns the same value as COUNT().
My theory is that it has something to do with the aggregation on the data occurring after the count has already happened. If I attempt the exact same aggregation on the same data in an exported CSV instead of directly from a PostgreSQL Connector Data Source, the issue does not reproduce.
My PostgreSQL Connector is connecting to Amazon Redshift (jdbc:postgresql://*******.eu-west-1.redshift.amazonaws.com) with the following custom query:
SELECT
userid,
submissionid,
date
FROM mytable
Workaround
If I stop using the default date field for the Date Dimension and aggregate my own dates directly within the SQL query (date_byweek), the COUNT(DISTINCT) aggregation works as expected:
SELECT
userid,
submissionid,
to_char(date,'YYYY-IW') as date_byweek
FROM mytable
While this workaround solves my immediate problem, it sucks because I miss out on all the date functionality provided by Data Studio (Hierarchy Drill Down, Date Range filtering, etc.), not to mention that it reduces my confidence in what else may be "buggy" within the product 😞
How to Reproduce
If you'd like to re-create the issue, using the following data as a PostgreSQL Data Source should suffice:
> SELECT * FROM mytable
userid  submissionid
------  ------------
1       1
2       2
1       3
1       4
3       5
> COUNT(DISTINCT userid) -- ERROR: Returns 5 when data source is PostgreSQL
> COUNT(DISTINCT userid) -- EXPECTED: Returns 3 when data source is CSV (exported from same PostgreSQL query above)
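If you want to stand the sample up quickly, a minimal table definition might look like the following (the post's queries also reference a date column, so one is added here with arbitrary, made-up dates):
-- Hypothetical setup; only the userid/submissionid values come from the post.
CREATE TABLE mytable (
  userid       integer,
  submissionid integer,
  date         date
);
INSERT INTO mytable (userid, submissionid, date) VALUES
  (1, 1, DATE '2020-01-01'),
  (2, 2, DATE '2020-01-01'),
  (1, 3, DATE '2020-01-02'),
  (1, 4, DATE '2020-01-02'),
  (3, 5, DATE '2020-01-03');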
I'm happy to report that as of Sep 17 2020, there's a workaround.
Data Studio added the DATETIME_TRUNC function (see https://support.google.com/datastudio/answer/9729685), which lets you add a custom field that truncates the original date to whatever granularity you want, without triggering the distinct bug.
Attempting to set the display granularity in the report still causes the bug (i.e., you'll still see Oct 1 2020 12:00:00 instead of Oct 2020).
This can be solved by creating a SECOND custom field, which just returns the first; add that one to the report, change the display granularity, and everything will work OK.
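As a concrete sketch, the two calculated fields might look like this (the names date_week and date_week_display are mine, not from the answer). Calculated field 1, e.g. date_week, truncating to the grain you need:
DATETIME_TRUNC(date, WEEK)
Calculated field 2, e.g. date_week_display, which simply returns the first field; add this one to the report and change the display granularity on it:
date_week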
I have the same issue with the MySQL Connector, but my problem was solved when I changed the date field format in the DB from DATETIME (YYYY-MM-DD HH:MM:SS) to INT (Unix timestamp). After connecting this table to Google Data Studio I set the type for this field to Date (YYYYMMDD) and everything works as expected. Hope this may help you :)
In this Google forum there is a curious solution by Damien Choizit that involves combining your data source with itself. It works well for me.
https://support.google.com/datastudio/thread/13600719?hl=en&msgid=39060607
It says:
I figured out a solution in my case: I used a Blend Data joining the same data source twice with corresponding join key(s), then I specified a date range dimension only on the left side and selected the columns I wanted to CTD-aggregate as "dimensions" (and not metrics!) on the right side.

$P{LoggedInUsername} and data from a user in a WITH-clause

I am using JasperSoft Reports v6.2.1, and when I run a report within the Studio preview the output comes back after 2 seconds.
Running the same report (output xlsx) on the server takes more than half a minute, though there is no data-volume issue (crosstab, 500 lines, 17 columns in Excel, "ignore pagination" = true).
I am using $P{LoggedInUsername} to filter data (based on the user's rights) in the WHERE-part of a WITH-clause. When I run the report with a fixed value (the user's id as a string) in place of the parameter, execution speed is good.
The same holds against the Oracle DB from SQL Developer: with a user's id string, the query result set is back in 2 seconds.
Also the output of $P{LoggedInUsername} in a TextField produces a String.
Once I switch back to the $P{LoggedInUsername} parameter in the query, the report takes ages again or runs out of heap memory in Studio/on the server.
What could be the issue?
Finally, my problem was solved by using the expression user_id = '$P!{LoggedInUsername}' instead of $P{LoggedInUsername} in the WHERE-part of my query.
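A minimal sketch of what that looks like in the report query (table and column names here are illustrative, not from the post). The difference is that $P!{...} splices the value into the SQL as literal text before the statement is parsed, so Oracle sees a plain string literal instead of a bind variable and can plan the WITH-clause the same way it does for a hard-coded id:
-- Hypothetical query shape; only the $P!{LoggedInUsername} usage is from the post.
WITH user_rights AS (
    SELECT r.object_id
    FROM rights r
    WHERE r.user_id = '$P!{LoggedInUsername}'  -- literal substitution, not a bind variable
)
SELECT d.*
FROM report_data d
JOIN user_rights u ON u.object_id = d.object_id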

KDB+/Q query too heavy to handle

I want to grab data from a KDB+ database for roughly 200 days within the last two years. The 200 days follow no particular pattern.
I only need the data from 09:29:00.000 to 09:31:00.000 each day.
My first approach was to query all of the last two years data that have time stamp between 09:29:00.000 and 09:31:00.000, because I didn't see a way to just query the particular 200 days that I need.
However this proved to be too much for my server to handle.
Then I tried to summarize the 2 minute data for each date into an average and just print out the average, so now I will only have 200 rows of data as output. But somehow this still turns out to be too much. I'm not sure if this is because I'm not selecting the data correctly.
My other suspicion is that the query is grabbing all the data first and then averaging each date, which means the averaging is not making it any easier to handle.
Here's the code that I have:
select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty) by date from dms where date within(2015.01.01, 2016.06.10), time within(09:29:00.000, 09:31:00.000), sym = `ZF6
poms is the table that the data is in
ZFU6 is the symbol that I'm looking for.
I tried adding the keyword distinct after select.
I want to know if there's any way to break up the query, or make the query lighter for the server to handle.
Thank you!
If you use 32-bit kdb+ and get the infamous 'wsfull error, you may try processing one day at a time like this:
raze{select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty)
    from dms where date=x,sym=`ZF6,time within 09:29:00.000 09:31:00.000}each 2015.01.01+til 1+2016.06.10-2015.01.01
The expression 2015.01.01+til 1+2016.06.10-2015.01.01 builds every date from 2015.01.01 through 2016.06.10 inclusive (q evaluates right to left). Because date is normally the partition column of an on-disk kdb+ database, selecting one date at a time keeps each query inside a single partition, so the working set stays small.

User-defined date range for Access query pulling outside date range

I have an Access 2007 database that requires a query be run every week to gather every record that was worked on the prior week. The current setup is:
I have a query where the date range's WHERE criteria is: Between [Forms]![frm_Menu]![txt_fromdate] And [Forms]![frm_Menu]![txt_todate]+"1"
"fromdate" and "todate" text boxes in frm_Menu are unbound text boxes. "fromdate" has an AfterUpdate event that fills in "todate" with the date chosen in "fromdate" plus 6 days.
frm_Menu has a button for running the query after "fromdate" and "todate" are filled in.
The issue is, when I run the query for a week's worth of records, I get entries outside the selected date range. For example, if I have "3/1/2015" in the "fromdate" text box and "3/7/2015" in the "todate" text box, I'll receive the results from 3/1 to 3/7, but I'm also getting results from 3/10, 3/11, and 3/12.
At first, I thought it might be reading "3/1/2015" as "3/1x/2015," but that doesn't explain why I'm ONLY getting extra results from 3/10 through 3/12 and not 3/13 through 3/19 as well.
Does anyone know what might be causing this? To work around the problem, I've just been running a query that gathers EVERYTHING and then filtering out what's needed in Excel before sending it over. Ideally, I'd like the person who needs this report to be able to open the database themselves, pick the date range they need, and then export the query results from Access.
It turned out I had the data type of the Dates column set to Text instead of Date/Time. I retried the same query after changing the column to Date/Time and it worked perfectly. Just a PEBKAC error.
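That also explains the stray rows: with the column typed as Text, Between compares strings, and "3/10/2015" through "3/12/2015" all sort below "3/7/2015" because the comparison stops at '1' < '7' in the day position. "3/13/2015" through "3/19/2015" would have matched too, so most likely no records that late existed yet when the query was run. Once the column is Date/Time, a sketch of the corrected query, with tbl_Work and WorkDate as stand-in table and field names (Access SQL has no comment syntax, so the assumptions are stated here instead):
SELECT *
FROM tbl_Work
WHERE WorkDate >= [Forms]![frm_Menu]![txt_fromdate]
AND WorkDate < DateAdd("d", 1, [Forms]![frm_Menu]![txt_todate]);
The half-open upper bound (less than the day after "todate") replaces the old +"1" string trick and also stays correct if the field ever stores a time component.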