AS transaction_date created in SELECT but not found in WHERE statement, why?

I'm looking to get order data from the past 30 rolling days. The goal, eventually, is to get this to pull some DISTINCTs so I can measure new orders/customers and order/customer churn along with one-time sales (there are some subscription and some one-time products in the database).
For starters, I'm just trying to pull all orders for the past 30 days.
Here's the query.
SELECT
  CAST(creation_date_transactions_orders AS DATE) AS transaction_date,
  email_contact_transactions_orders,
  title_transactions_orders,
  total_paid_transactions_orders,
  status_transactions_orders
FROM
  `nla-analytics.NLA_Keap_Keap_Keap.transactions_orders`
WHERE total_paid_transactions_orders IS NOT NULL
  AND status_transactions_orders LIKE "PAID"
  AND transaction_date BETWEEN today() AND today() - 30
The problem is that BigQuery isn't recognizing "transaction_date" in the WHERE clause: "Unrecognized name: transaction_date".
Why doesn't BQ recognize the field created a few lines earlier, and how do I write this correctly?

An alias created in the SELECT list is not available to the WHERE clause of the same statement. From the docs:
The WHERE clause only references columns available via the FROM clause; it cannot reference SELECT list aliases.
You can either repeat the CAST in the WHERE clause or use a subquery/CTE:
WITH transactions AS (
  SELECT
    CAST(creation_date_transactions_orders AS DATE) AS transaction_date,
    email_contact_transactions_orders,
    title_transactions_orders,
    total_paid_transactions_orders,
    status_transactions_orders
  FROM
    `nla-analytics.NLA_Keap_Keap_Keap.transactions_orders`
  WHERE
    total_paid_transactions_orders IS NOT NULL
    AND status_transactions_orders LIKE "PAID"
)
SELECT
  *
FROM transactions
-- BigQuery has no today(); use CURRENT_DATE(), and put the lower bound first in BETWEEN
WHERE transaction_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
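And here is the other option mentioned above, repeating the CAST directly in the WHERE clause so no CTE is needed; a minimal sketch against the same table and columns:
SELECT
  CAST(creation_date_transactions_orders AS DATE) AS transaction_date,
  email_contact_transactions_orders,
  title_transactions_orders,
  total_paid_transactions_orders,
  status_transactions_orders
FROM
  `nla-analytics.NLA_Keap_Keap_Keap.transactions_orders`
WHERE
  total_paid_transactions_orders IS NOT NULL
  AND status_transactions_orders LIKE "PAID"
  -- the SELECT alias is not visible here, so the expression is repeated
  AND CAST(creation_date_transactions_orders AS DATE)
      BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()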

Related

How to execute SELECT DISTINCT ON query using SQLAlchemy

I have a requirement to display spend estimation for the last 30 days. SpendEstimation is calculated multiple times a day. This can be achieved using a simple SQL query:
SELECT DISTINCT ON (date) date(time) AS date, resource_id, time
FROM spend_estimation
WHERE resource_id = '<id>'
  AND time > now() - interval '30 days'
ORDER BY date DESC, time DESC;
Unfortunately I can't seem to do the same using SQLAlchemy. It always creates SELECT DISTINCT on all columns; the generated query does not contain DISTINCT ON.
query = session.query(
    func.date(SpendEstimation.time).label('date'),
    SpendEstimation.resource_id,
    SpendEstimation.time
).distinct(
    'date'
).order_by(
    'date',
    SpendEstimation.time
)
SELECT DISTINCT
date(time) AS date,
resource_id,
time
FROM spend
ORDER BY date, time
It is missing the ON (date) bit. If I use query.group_by, then SQLAlchemy adds DISTINCT ON, though I can't think of a solution to the given problem using GROUP BY.
I also tried using the function in the distinct and order_by parts:
query = session.query(
    func.date(SpendEstimation.time).label('date'),
    SpendEstimation.resource_id,
    SpendEstimation.time
).distinct(
    func.date(SpendEstimation.time).label('date')
).order_by(
    func.date(SpendEstimation.time).label('date'),
    SpendEstimation.time
)
Which resulted in this SQL:
SELECT DISTINCT
date(time) AS date,
resource_id,
time,
date(time) AS date -- only difference
FROM spend
ORDER BY date, time
Which is still missing DISTINCT ON.
Your SQLAlchemy version might be the culprit; see this related question:
Sqlalchemy with postgres. Try to get 'DISTINCT ON' instead of 'DISTINCT'
It links to this bug report:
https://bitbucket.org/zzzeek/sqlalchemy/issues/2142
The fix wasn't backported to 0.6; it looks like it was fixed in 0.7.
Stupid question: have you tried distinct on SpendEstimation.date instead of 'date'?
EDIT: It just struck me that you're trying to use the named column from the SELECT. SQLAlchemy is not that smart. Try passing the func expression into the distinct() call.

Calculate previous order date and status in Postgres

I have a simple table of orders, and I need to calculate some stats for each order. Essentially I have a Postgres db with fields:
Order_ID (unique), User_ID, Created_at (date), City, Total
I want to write a query that will generate, for each Order_ID:
1) the Created_at date of the user's most recent order prior to the current Order_ID (so if a customer placed order with Order_ID=200005b on 9/20/14, what is the date of that user's most recent previous order?)
2) another field showing a user's "Status" based on this date, given the following cases:
-- if this is user's first order, Status="new";
-- if most recent previous order date <= 60 days before the given/current order, Status="active";
-- if most recent previous order date > 60 days before the given/current order, Status="reactivated"
I think there's a way to write this query using some nested SELECTs, and maybe a self-join, but I don't know PostgreSQL well enough to work out the ordering of the queries. I have been able to generate an "Order_N" field using the following query, which I could use to look up (Order_N)-1 to find the date, but I get stuck when trying to use that in nesting.
SELECT
  user_id,
  order_id,
  created_at,
  row_number() OVER (PARTITION BY user_id ORDER BY created_at) AS order_n
FROM orders  -- table name assumed; the question doesn't give it
ORDER BY user_id, created_at;
Does anyone have any ideas?
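For what it's worth, a single pass with the LAG() window function can produce both the previous order date and the status; a minimal sketch, assuming the table is named orders (the real table name isn't given) and that Created_at is a DATE, so subtracting two values yields a number of days:
SELECT order_id,
       user_id,
       created_at,
       prev_order_date,
       CASE
         WHEN prev_order_date IS NULL THEN 'new'                 -- first order for this user
         WHEN created_at - prev_order_date <= 60 THEN 'active'   -- previous order within 60 days
         ELSE 'reactivated'                                      -- previous order more than 60 days ago
       END AS status
FROM (
  SELECT order_id,
         user_id,
         created_at,
         lag(created_at) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_order_date
  FROM orders  -- table name assumed
) o;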

Postgres (Redshift) query including to_char and GROUP BY returns an error

I'm using Redshift now, and I'd like to run a query like:
SELECT to_char(created_at, 'HH24') AS hour , to_char(created_at, 'YYYY-MM-DD HH24') AS tmp FROM log GROUP BY tmp;
This returns an error; when I run it in MySQL, it works fine.
The error is:
ERROR: column "log.created_at" must appear in the GROUP BY clause or be used in an aggregate function
When I change the GROUP BY clause to "group by created_at", it returns results, but the list contains duplicates.
Is this due to Redshift?
If you're using a GROUP BY clause, every column in your query must either appear in that clause or be used in an aggregate function.
In your case, you seem to be trying to aggregate your log entries by hour. I suggest using the Postgres date manipulation functions, for example:
SELECT created_at::date AS date,
extract('HOUR' FROM created_at) as hour
FROM log
GROUP BY date, hour;
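If you want to keep the original to_char expressions, the error also goes away once every non-aggregated expression in the SELECT list appears in the GROUP BY; a minimal sketch, grouping by ordinal position (repeating the full expressions works too):
SELECT to_char(created_at, 'HH24')            AS hour,
       to_char(created_at, 'YYYY-MM-DD HH24') AS tmp
FROM log
GROUP BY 1, 2;  -- both to_char expressions are grouped, so nothing is left unaggregated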

How to get the total count for each date in PostgreSQL?

How do I get the total count for a particular date when selecting different dates?
For example:
Records exist from '2014-04-01' till date. Each date contains multiple records with different IST times.
How do I get the total count for each date?
Depending on your table structure and the result you want, the query should look somewhat like this:
SELECT DATE(date_column), COUNT(*)
FROM tablename
WHERE date_column IN (your_date_list)
GROUP BY date(date_column);
Have a look at the following SQL (I have not tested it):
SELECT date_column::date AS date, COUNT(*)
FROM tablename
WHERE date_column BETWEEN '2014-04-01' AND current_date  -- start date taken from the question
GROUP BY date_column::date

How do I order my query by a field and still group by a subset of that field in db2?

Sorry if the title is confusing. Here is the query I have
SELECT MONTH(DATE(TIMESTAMP)), SUM(FIELD1), SUM(FIELD2)
FROM TABLE
WHERE TIMESTAMP BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59'
GROUP BY MONTH(DATE(TIMESTAMP))
This lets me get the month number out of the query. The problem is that it sorts the months 1, 2, 3, 4, ... even when the range spans two separate years. I need to be able to sort this query by year, then month.
If I add "ORDER BY TIMESTAMP" at the end of my query I get this error:
Column TIMESTAMP or expression in SELECT list not valid. SQLCODE=-122
Also, I changed the field names for this question to keep things clear; the field isn't actually called TIMESTAMP.
You need to group by year, then month:
SELECT YEAR(YourField),
       MONTH(YourField),
       SUM(Field1),
       SUM(Field2)
FROM Table
WHERE ...
GROUP BY YEAR(YourField),
         MONTH(YourField)
ORDER BY YEAR(YourField),
         MONTH(YourField)
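If you'd prefer a single sortable column, one option along the same lines is to combine year and month into one numeric key; a minimal sketch using the placeholder names from the question:
SELECT YEAR(DATE(TIMESTAMP)) * 100 + MONTH(DATE(TIMESTAMP)) AS year_month,  -- e.g. 200907 ... 201002
       SUM(FIELD1),
       SUM(FIELD2)
FROM TABLE
WHERE TIMESTAMP BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59'
GROUP BY YEAR(DATE(TIMESTAMP)) * 100 + MONTH(DATE(TIMESTAMP))
ORDER BY year_month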