How to select the latest entry from database with Ecto/Phoenix - postgresql

I'm just starting out with databases and queries (coming from front-end work) and I would like to select the latest entry among the rows that have a specific title, say Page1. I know I have two fields that can help me:
1) created_at - I can order by it and select only the latest one;
2) id - I know that every entry gets a new id, so I can take the biggest id.
I want to apply it to the "standard" edit controller action:
def edit(conn, %{"id" => id}) do
  user = Repo.get(User, id)
  changeset = User.update_changeset(user)
  render(conn, "edit.html", user: user, changeset: changeset)
end
So I would like to know which method is better and to see an example of the query.

Using the id field depends on your primary key being an integer. This may be the case currently, but will it always be the case? You can be sure that inserted_at will always be a time field - so I would use that. Regardless of which you use, be sure to order it in descending order so that the most recent item is first.
title = "Page 1"
query = Ecto.Query.from(e in Entry,
  where: e.title == ^title,
  order_by: [desc: e.inserted_at],
  limit: 1)
most_recent = Repo.one(query)
It is worth noting that in Ecto, by default, the insertion time is captured as inserted_at and not created_at - if you have a created_at field then you can substitute it for inserted_at in my example.
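For reference, the query above compiles to SQL roughly like this (a sketch that assumes the schema maps to an entries table; Ecto's actual output lists the columns explicitly and binds the title as a parameter):
SELECT e0.*
FROM entries AS e0
WHERE e0.title = 'Page 1'
ORDER BY e0.inserted_at DESC
LIMIT 1;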

It's also possible to accomplish this with Ecto's expression syntax and Ecto.Query.last/2, like so:
alias App.Entry
import Ecto.Query
Entry |> where(title: "Page 1") |> last(:inserted_at) |> Repo.one
See the docs for Ecto.Query.last/2 for details.

Related

Aggregation on updating order data in Druid

I have data streaming from Kafka into Druid. It's de-normalized eCommerce order event data, where the status and a few other fields get updated with every event.
I need to run an aggregate query based on timestamp, using only the most recent entry per order.
For example: If data sample is:
{"orderId":"123","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,timestamp:"2021-03-05T01:02:33Z"}
{"orderId":"abc","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,timestamp:"2021-03-05T01:03:33Z"}
{"orderId":"123","status":"Shipped","items":"item","qty":1,"paymentId":null,"shipmentId":null,timestamp:"2021-03-07T02:03:33Z"}
Now if I want to query for all orders stuck in "Initiated" status for more than 2 days, then for the above data it should only show orderId "abc".
But if I query something like
SELECT orderId, qty, paymentId FROM order
WHERE status = 'Initiated' AND "timestamp" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
This query will return both orders "123" and "abc", but "123" has a later event ("Shipped"), so its earlier events should not be included in the result.
Is there any good, optimized way to handle this kind of scenario in Apache Druid?
One way I was considering is to use a separate lookup table to store each orderId with its latest status, and to join that lookup with the aggregation query above on orderId and status.
EDIT 1:
This query works, but it joins on the whole table, which can raise a resource limit exception for big datasets:
WITH maxOrderTime (orderId, "__time") AS
(
    SELECT orderId, MAX("__time")
    FROM inline_data
    GROUP BY orderId
)
SELECT inline_data.orderId
FROM inline_data
JOIN maxOrderTime
  ON inline_data.orderId = maxOrderTime.orderId
  AND inline_data."__time" = maxOrderTime."__time"
WHERE inline_data.status = 'Initiated'
  AND inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
EDIT 2:
Tried with:
SELECT
inline_data.orderID,
MAX(LOOKUP(status, 'status_as_number')) as last_status
FROM inline_data
WHERE
inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderID
HAVING last_status = 1
But gives this error:
Error: Unknown exception
Error while applying rule DruidQueryRule(AGGREGATE), args
[rel#1853:LogicalAggregate.NONE.,
rel#1863:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"inline_data"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/2021-03-14T09:57:05.000Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"lookup("status",'status_as_number')","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["orderID","v0"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"fbc167be-48fc-4863-b3a8-b8a7c45fb60f"},"descending":false,"granularity":{"type":"all"}},signature={orderID:LONG,
v0:STRING})]
java.lang.RuntimeException
I think this can be done more simply. If you replace the status with a numeric representation, it becomes easier to work with.
First, use an inline lookup to replace the status. See this page for how to define a lookup: https://druid.apache.org/docs/0.20.1/querying/lookups.html
Now, we have for example these values in a lookup named status_as_number:
Initiated = 1
Shipped = 2
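Once the lookup is registered, you can sanity-check it from SQL with the LOOKUP function (a quick check; the output values assume the mapping above):
SELECT status, LOOKUP(status, 'status_as_number') AS status_number
FROM inline_data
LIMIT 5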
Since we now have a numeric representation, you can simply do a group by query and see the max status number. A query like this would be sufficient:
SELECT
inline_data.orderId,
MAX(LOOKUP(status, 'status_as_number')) as last_status
FROM inline_data
WHERE
inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderId
HAVING last_status = 1
Note: this query is not tested. The HAVING part makes sure that you only see orders which are Initiated.
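Note also that LOOKUP returns strings, so if MAX complains about a non-numeric type, cast the lookup result to a number first (also untested, in the same spirit as the query above):
SELECT
  inline_data.orderId,
  MAX(CAST(LOOKUP(status, 'status_as_number') AS BIGINT)) AS last_status
FROM inline_data
WHERE
  inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderId
HAVING last_status = 1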
I hope this solves your problem.

SphinxQL - how to filter behind match

I'm working on a project where I use the Sphinx search engine. But, as I have realized, the Sphinx documentation is big yet hard to understand.
So I was not able to find any information on how to use the WHERE clause to filter alongside a MATCH statement. What I have tried so far is:
"SELECT *, country FROM all_gebrauchte_products WHERE MATCH('#searchtext (".$searchQuery.")') AND country='".$where."' ORDER BY WEIGHT() DESC LIMIT ".$page.", ".$limit." OPTION ranker=expr('sum(lcs)')"
If I use it without the country=$where clause, I get back many GUIDs, but from different countries. So somehow I have to filter on the country column.
If I use the above statement, I get error:
Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42000]: Syntax error or access violation: 1064 index all_gebrauchte_products: parse error: unknown column: country'
But I set the index like this:
sql_query_range = SELECT MIN(gebr_id), MAX(gebr_id) FROM all_gebrauchte_products
sql_range_step = 10000
sql_query = \
SELECT a.gebr_id AS guid, 'products' AS data_type, a.gebr_products AS products, a.gebr_user AS username, a.gebr_date AS datadate, CONCAT(a.gebr_hersteller,' ', a.gebr_modell,' ', a.gebr_ukat,' ', a.gebr_kat,' ', a.gebr_bemerkung) AS searchtext, a.gebr_bild1 AS image1, a.gebr_bild2 AS image2, a.gebr_bild3 AS image3, a.gebr_bild4 AS image4, a.gebr_bild5 AS image5, b.h_land AS country, b.h_web AS weblink, b.h_firmenname AS company, b.h_strasse AS street, b.h_plz AS zipcode, b.h_ort AS city, a.gebr_aktiv AS active \
FROM all_gebrauchte_products a, all_haendler b \
WHERE a.gebr_user = b.h_loginname AND a.gebr_id>=$start AND a.gebr_id<=$end
sql_attr_uint = active
Can anybody tell me what is going wrong, or how I have to filter by country?
Thanks in advance for your help.
Any column in the sql_query that you don't make an ATTRIBUTE is automatically a FIELD (except the first column, which is always the document ID).
FIELDs are 'full-text' indexed; they are what you can match in the query, i.e. in the MATCH(...) clause.
ATTRIBUTES are what can be 'filtered' on in WHERE, sorted by in ORDER BY, grouped by in GROUP BY, or retrieved in the SELECT (or even used in ranking expressions).
So you need country to be an ATTRIBUTE to be able to use it in a WHERE filter.
You don't say, but I guess it's a string. You can use sql_field_string to make a column BOTH a FIELD and an ATTRIBUTE, if you are still interested in being able to full-text query the column too.
(Also, because it's a string, you need a fairly recent version of Sphinx; Sphinx only recently gained the ability to filter by string attributes.)
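Concretely, the only change to the config excerpt above should be one extra declaration (country is already selected in sql_query, so nothing else needs to move):
sql_field_string = country
After changing the declaration, rebuild the index (e.g. with indexer --rotate) before the country filter will work.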

Excluding rows from a result set in Postgres

This is my result set
I am returning this result set on the basis of refid, using WHERE refid IN.
Here I need to apply some logic without any kind of programming (meaning a SQL query only):
if the result set contains a row with a period for a particular refid, then the other rows with the same refid must not be returned.
For example, refid 2667105 has a row with a period, so the row with myid = 612084598 must not be returned in the result set.
It seems to me this can be achieved using CASE, but I have no idea how to use it; I don't know whether the CASE should go in the SELECT list or in the WHERE clause...
EDIT:
This is how it is supposed to work:
myid = 612084598 is the default row for refid = 2667105, but if someone specifically wants that refid for period = 6, then it must return all rows except myid = 612084598.
But if I am looking for period = 12, no row for that period is present in the database, so it must return all rows except the first one, meaning the rows with the refid which is the default one.
Not a very clear definition of the problem, but try this:
with cte as (
    select
        *,
        first_value(period) over (partition by refid order by myid) as fv
    from test
)
select myid, refid, period
from cte
where period is not null or fv is null
sql fiddle demo
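To see how the filter behaves, here is a small self-contained illustration (the values are invented, since the original screenshot is not available, and it assumes a period row sorts before the default row by myid):
create table test (myid int, refid int, period int);
insert into test values
    (612084590, 2667105, 6),       -- period row for refid 2667105
    (612084598, 2667105, null),    -- default row: dropped, because fv = 6
    (612084600, 9999999, null);    -- refid with no period rows: kept, fv is null
-- the query above then returns:
--   612084590 | 2667105 | 6
--   612084600 | 9999999 | (null)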

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the courses that are most reviewed, starting with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(
    models.Review, func.count(models.Review.course_id)
).group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production, it gave me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(
    models.Review, func.count(models.Review.course_id)
).group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both allow a query that has aggregates (like count()) without applying GROUP BY to all of the other columns. In terms of standard SQL this is invalid, because if more than one row is present in an aggregated group, the database has to pick one of them for return, essentially at random.
So your query for Review basically returns the first "Review" row for each distinct course id; for course id 3, if you had seven "Review" rows, it just chooses an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you are on a stricter database like PostgreSQL, you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query for just that (assume for the moment that we don't need to display the counts; that comes in a minute):
most_rated_course_ids = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    subquery()
One simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
    Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
Then, to get at the count, we need to add it to the columns returned by our subquery, along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
    Review.course_id,
    func.count(Review.course_id).label("count")
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()

courses = session.query(
    Course, most_rated_course_id_subquery.c.count
).join(
    most_rated_course_id_subquery
).order_by(
    most_rated_course_id_subquery.c.count.desc()
).all()
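For reference, the final query corresponds to SQL roughly like the following (a sketch based on the models above; SQLAlchemy generates its own alias names rather than "counts"):
SELECT course.*, counts.count
FROM course
JOIN (
    SELECT review.course_id AS course_id,
           count(review.course_id) AS count
    FROM review
    GROUP BY review.course_id
    HAVING count(review.course_id) > 1
) AS counts ON course.id = counts.course_id
ORDER BY counts.count DESC;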
A great article I like to refer people to about GROUP BY and this kind of query is SQL GROUP BY techniques, which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.

Updating records in Postgres using nested sub-selects

I have a table where I have added a new column, and I want to write a SQL statement to update that column based on existing information. Here are the two tables and the relevant columns
'leagues'
=> id
=> league_key
=> league_id (this is the new column)
'permissions'
=> id
=> league_key
Now, what I want to do, in plain English, is this
Set leagues.league_id to be permissions.id for each value of permissions.league_key
I had tried SQL like this:
UPDATE leagues
SET league_id =
(SELECT id FROM permissions WHERE league_key =
(SELECT distinct(league_key) FROM leagues))
WHERE league_key = (SELECT distinct(league_key) FROM leagues)
but I am getting an error message that says
ERROR: more than one row returned by a subquery used as an expression
Any help with this would be greatly appreciated.
Based on your requirements of
Set leagues.league_id to be permissions.id for each value of permissions.league_key
This does that.
UPDATE leagues
SET league_id = permissions.id
FROM permissions
WHERE permissions.league_key = leagues.league_key;
When you do a subquery as an expression, it can't return a result set. Your subquery must evaluate to a single result. The error that you are seeing is because one of your subqueries returns more than one value.
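To make that single-result rule concrete: a subquery used as an expression is fine once it is correlated so that it yields at most one row per updated league. A sketch of that alternative form (it assumes league_key is unique in permissions; otherwise add LIMIT 1):
UPDATE leagues
SET league_id = (
    SELECT p.id
    FROM permissions p
    WHERE p.league_key = leagues.league_key
);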
The relevant documentation is the PostgreSQL 8.4 UPDATE reference page.