I'm using Mongoengine in my flask application and I'm trying to filter a simple query.
My document name is Event and it has the following fields:
user (ReferenceField)
category (ReferenceField)
type (StringField)
start_time (DateField)
My goal is to get the last 30 events of a given user, filter them by category and type, and then count the filtered results.
My attempt was:
grades = {}
events = Event.objects(user=user).order_by('-start_time')[:30]
for category in user.categories:
    grade = 0
    for event_type in eEvents:
        grade += (events(category=category, type=event_type.name).count(
            with_limit_and_skip=True)) * event_type.value
    grades[category.name] = grade
So the problem is that the query (2nd line in code) doesn't limit the results and I get all the events of the user. When I use the count method after filtering the query, it does limit the count to 30.
By the way, any idea of a better way for doing it using mongoengine functionality?
Thanks
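One possible way around this, sketched in plain Python so that it runs without a database: pull the 30 most recent events once, then count (category, type) pairs in memory instead of re-filtering the sliced queryset. The `Event` dataclass and the `EVENT_WEIGHTS` table are hypothetical stand-ins for the mongoengine document and for `eEvents`; with mongoengine you would presumably pass `list(events)` from the sliced queryset instead of `sample`.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    # Plain stand-in for the mongoengine document (hypothetical, for illustration).
    category: str
    type: str
    start_time: date


# Hypothetical event-type weights standing in for `eEvents`.
EVENT_WEIGHTS = {"win": 3, "draw": 1, "loss": 0}


def grade_last_events(events, limit=30):
    """Grade the `limit` most recent events per category."""
    # Equivalent of order_by('-start_time')[:limit], materialised as a list.
    recent = sorted(events, key=lambda e: e.start_time, reverse=True)[:limit]
    # One pass over the limited list: count (category, type) pairs.
    counts = Counter((e.category, e.type) for e in recent)
    grades = {}
    for (category, event_type), n in counts.items():
        grades[category] = grades.get(category, 0) + n * EVENT_WEIGHTS.get(event_type, 0)
    return grades


sample = [
    Event("math", "win", date(2021, 1, 3)),
    Event("math", "draw", date(2021, 1, 2)),
    Event("art", "win", date(2021, 1, 1)),
]
print(grade_last_events(sample, limit=2))
```

Unlike the loop in the question, this only reports categories that actually occur in the limited list; categories with no recent events would need to be filled in with 0 separately.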
I am streaming data from Kafka into Druid. It is de-normalized eCommerce order event data in which the status and a few other fields are updated with every event.
I need to run aggregate queries based on timestamp that consider only the most recent entry for each order.
For example, if the data sample is:
{"orderId":"123","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-05T01:02:33Z"}
{"orderId":"abc","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-05T01:03:33Z"}
{"orderId":"123","status":"Shipped","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-07T02:03:33Z"}
Now, if I want to query for all orders stuck in the "Initiated" status for more than 2 days, then for the above data it should show only orderId "abc".
But if I query something like
SELECT orderId, qty, paymentId FROM "order" WHERE status = 'Initiated' AND "timestamp" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
this query will return both orders "123" and "abc". Order 123, however, received another event two days later, so its earlier events should not be included in the result.
Is there any good and optimized way to handle this kind of scenario in Apache Druid?
One way I was thinking of is to use a separate lookup table that stores orderId and latest status, and to join this lookup with the above aggregation query on orderId and status.
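To pin down the intended semantics independently of Druid, here is a minimal plain-Python sketch: keep only the latest event per order, then apply the status and age filter. The function names and the fixed `now` value are illustrative assumptions, not part of any Druid API.

```python
from datetime import datetime, timedelta, timezone

events = [
    {"orderId": "123", "status": "Initiated", "timestamp": "2021-03-05T01:02:33Z"},
    {"orderId": "abc", "status": "Initiated", "timestamp": "2021-03-05T01:03:33Z"},
    {"orderId": "123", "status": "Shipped", "timestamp": "2021-03-07T02:03:33Z"},
]


def parse_ts(text):
    # Timestamps in the sample are UTC with a trailing "Z".
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_orders(events, status, older_than, now):
    # Step 1: keep only the most recent event per orderId.
    latest = {}
    for event in events:
        current = latest.get(event["orderId"])
        if current is None or parse_ts(event["timestamp"]) > parse_ts(current["timestamp"]):
            latest[event["orderId"]] = event
    # Step 2: filter the latest events by status and age.
    cutoff = now - older_than
    return sorted(
        order_id
        for order_id, event in latest.items()
        if event["status"] == status and parse_ts(event["timestamp"]) < cutoff
    )


now = datetime(2021, 3, 8, tzinfo=timezone.utc)  # fixed "now" for a reproducible run
print(stuck_orders(events, "Initiated", timedelta(days=2), now))  # ['abc']
```

The important point is the order of the two steps: the status filter is applied only after the latest event per order has been selected.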
EDIT 1:
This query works, but it joins on the whole table, which can cause a resource limit exception for big datasets:
WITH maxOrderTime (orderId, "__time") AS
(
    SELECT orderId, MAX("__time") FROM inline_data
    GROUP BY orderId
)
SELECT inline_data.orderId FROM inline_data
JOIN maxOrderTime
    ON inline_data.orderId = maxOrderTime.orderId
    AND inline_data."__time" = maxOrderTime."__time"
WHERE inline_data.status = 'Initiated' AND inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
EDIT 2:
Tried with:
SELECT
    inline_data.orderID,
    MAX(LOOKUP(status, 'status_as_number')) AS last_status
FROM inline_data
WHERE
    inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderID
HAVING last_status = 1
But gives this error:
Error: Unknown exception
Error while applying rule DruidQueryRule(AGGREGATE), args
[rel#1853:LogicalAggregate.NONE.,
rel#1863:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"inline_data"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/2021-03-14T09:57:05.000Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"lookup("status",'status_as_number')","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["orderID","v0"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"fbc167be-48fc-4863-b3a8-b8a7c45fb60f"},"descending":false,"granularity":{"type":"all"}},signature={orderID:LONG,
v0:STRING})]
java.lang.RuntimeException
I think this can be done more easily. If you replace the status with a numeric representation, it becomes easier to work with.
First, use an inline lookup to replace the status. See this page for how to define a lookup: https://druid.apache.org/docs/0.20.1/querying/lookups.html
Now, we have for example these values in a lookup named status_as_number:
Initiated = 1
Shipped = 2
Since we now have a numeric representation, you can simply do a group by query and see the max status number. A query like this would be sufficient:
SELECT
    inline_data.orderId,
    MAX(LOOKUP(status, 'status_as_number')) AS last_status
FROM inline_data
WHERE
    inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderId
HAVING last_status = 1
Note: this query is not tested. The HAVING part makes sure that you only see orders which are Initiated.
I hope this solves your problem.
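The logic of that query can be traced in plain Python. This is only a sketch of the WHERE / GROUP BY / MAX / HAVING steps, not Druid code; `STATUS_AS_NUMBER` mirrors the assumed contents of the `status_as_number` lookup, and the "now" of 2021-03-14 is fixed so the run is reproducible.

```python
from datetime import datetime, timedelta

# Assumed contents of the `status_as_number` lookup from the answer.
STATUS_AS_NUMBER = {"Initiated": 1, "Shipped": 2}

rows = [
    ("123", "Initiated", datetime(2021, 3, 5, 1, 2, 33)),
    ("abc", "Initiated", datetime(2021, 3, 5, 1, 3, 33)),
    ("123", "Shipped", datetime(2021, 3, 7, 2, 3, 33)),
]


def orders_with_max_status(rows, wanted, cutoff):
    # WHERE __time < cutoff, then GROUP BY orderId with MAX(status number).
    max_status = {}
    for order_id, status, ts in rows:
        if ts < cutoff:
            rank = STATUS_AS_NUMBER[status]
            max_status[order_id] = max(rank, max_status.get(order_id, rank))
    # HAVING last_status = wanted
    return sorted(o for o, rank in max_status.items() if rank == wanted)


cutoff = datetime(2021, 3, 14) - timedelta(days=2)  # fixed "now" of 2021-03-14
print(orders_with_max_status(rows, 1, cutoff))  # ['abc']
```

One thing to watch: because the time filter runs before the grouping, MAX never sees events newer than the cutoff. With a cutoff of 2021-03-06, for instance, the Shipped event of order 123 is filtered out and the same function returns both '123' and 'abc'.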
Full Query:
{[tier;company;ccy; startdate; enddate] select Deal_Time, Deal_Date from DEALONLINE_REMOVED where ?[company = `All; 1b; COMPANY = company], ?[tier = `All;; TIER = tier], Deal_Date within(startdate;enddate), Status = `Completed, ?[ccy = `All;1b;CCY_Pair = ccy]}
Particular Query:
where ?[company = `All; 1b; COMPANY = company], ?[tier = `All; 1b; TIER = tier],
What this query is trying to do is to read the viewstate of a dropdown.
If the dropdown selection is "All", the corresponding where clause (company or tier) is disabled, and all companies or tiers are shown.
I am unsure whether the query above is correct, as I am getting weird charts when displaying the results on a KDB dashboard.
I would recommend restructuring your function to build the where clause using functional qSQL.
In your case, you need to filter based on certain inputs: if an input is "All", don't filter on it; otherwise filter on that input. Something like this could work:
/Define sample table
DEALONLINE_REMOVED:([]Deal_time:10#.z.p;Deal_Date:10?.z.d;Company:10?`MSFT`AAPL`GOOGL;TIER:10?`1`2`3)
/New function which joins to where clause
{[company;tier]
  wc:();
  if[not company=`All;wc:wc,enlist (=;`Company;enlist company)];
  if[not tier=`All;wc:wc,enlist (=;`TIER;enlist tier)];
  ?[DEALONLINE_REMOVED;wc;0b;()]
  }[`MSFT;`2]
If you replace the input with `All you will see that everything is returned.
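The same idea, stripped of q syntax, can be sketched in Python: start with an empty list of constraints and append one per input that is not "All". The table and helper names here are illustrative assumptions, not kdb+ API.

```python
ALL = "All"  # sentinel meaning "do not filter on this column"

deals = [
    {"Company": "MSFT", "TIER": "2"},
    {"Company": "MSFT", "TIER": "1"},
    {"Company": "AAPL", "TIER": "2"},
]


def build_where(company, tier):
    # Start with no constraints and append one per non-"All" input.
    where = []
    if company != ALL:
        where.append(("Company", company))
    if tier != ALL:
        where.append(("TIER", tier))
    return where


def select(table, where):
    # A row is kept when it satisfies every (column, value) constraint;
    # an empty constraint list keeps every row.
    return [row for row in table if all(row[col] == val for col, val in where)]


print(select(deals, build_where("MSFT", "2")))
print(len(select(deals, build_where(ALL, ALL))))
```

An empty constraint list plays the role of the empty `wc`: nothing is filtered, so every row comes back.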
The full functional select for your query would be as follows:
whcl:{[tier;company;ccy;startdate;enddate]
  wc:(enlist (within;`Deal_Date;(enlist;startdate;enddate))),(enlist (=;`Status;enlist `Completed)),
    $[tier=`All;();enlist (=;`TIER;enlist tier)],
    $[company=`All;();enlist (=;`COMPANY;enlist company)],
    $[ccy=`All;();enlist (=;`CCY_Pair;enlist ccy)];
  ?[`DEALONLINE_REMOVED;wc;0b;`Deal_Time`Deal_Date!`Deal_Time`Deal_Date]
  }
The first part specifies your date range and status = `Completed in the where clause
wc:(enlist (within;`Deal_Date;(enlist;startdate;enddate))),(enlist (=;`Status;enlist `Completed)),
Next, each of these conditionals checks for `All for the TIER, COMPANY and CCY_Pair column filtering. When a specific TIER, COMPANY or CCY_Pair is given, the corresponding constraint is joined on to the where clause; otherwise an empty list is joined on:
$[tier=`All;();enlist (=;`TIER;enlist tier)],
$[company=`All;();enlist (=;`COMPANY;enlist company)],
$[ccy=`All;();enlist (=;`CCY_Pair;enlist ccy)];
Finally, the select statement is called in its functional form as follows, with wc as the where clause:
?[`DEALONLINE_REMOVED;wc;0b;`Deal_Time`Deal_Date!`Deal_Time`Deal_Date]
I have a question about how to extract data from Moodle based on a parameter that's "greater than" or "less than" a given value.
For instance, I'd like to do something like:
$record = $DB->get_record_sql('SELECT * FROM {question_attempts} WHERE questionid > ?', array(1));
How can I achieve this? Each time I try this, I get a single record in return instead of all the rows that meet the criterion.
Also, how can I get a query like this to work?
$sql = ('SELECT * FROM {question_attempts} qa join {question_attempt_steps} qas on qas.questionattemptid = qa.id');
In the end, I want to get all the quiz question marks for each user on the system, in each quiz.
Use $DB->get_records_sql() instead of $DB->get_record_sql, if you want more than one record to be returned.
Thanks Davo for the response back then (2016, wow!). I did manage to learn this over time.
Well, here is an example of a proper query for getting results from the Moodle DB using the > or < operators:
$quizid = 100; // just an example param here
$cutoffmark = 40; // anyone above 40% gets a Moodle badge!!
// get_records_sql() keys the result by the first column, so select a unique one (qg.id) first
$sql = "SELECT qg.id, q.name, qg.userid, qg.grade FROM {quiz} q JOIN {quiz_grades} qg ON qg.quiz = q.id WHERE q.id = ? AND qg.grade > ?";
$records = $DB->get_records_sql($sql, [$quizid, $cutoffmark]);
The query returns one record per student who has a grade of over 40 in that quiz, each with the quiz name, user ID and grade.
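The difference between fetching one record and fetching all of them is not specific to Moodle. As a rough analogy only, here it is with Python's built-in sqlite3 module; the table is a simplified, hypothetical stand-in for quiz_grades, and the dictionary at the end imitates how get_records_sql() keys its result by the first selected column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quiz_grades (id INTEGER PRIMARY KEY, quiz INTEGER, userid INTEGER, grade REAL)"
)
conn.executemany(
    "INSERT INTO quiz_grades (quiz, userid, grade) VALUES (?, ?, ?)",
    [(100, 1, 35.0), (100, 2, 55.0), (100, 3, 80.0), (101, 1, 90.0)],
)

# Placeholders work with > and < exactly as they do with =.
sql = "SELECT id, userid, grade FROM quiz_grades WHERE quiz = ? AND grade > ? ORDER BY id"
params = (100, 40)

first_row = conn.execute(sql, params).fetchone()  # a single row only
all_rows = conn.execute(sql, params).fetchall()   # every matching row

# Key the rows on the first selected column, which therefore has to be unique.
records = {row[0]: row for row in all_rows}
print(first_row)
print(records)
```

If the first column were not unique (the quiz name, say), later rows would overwrite earlier ones under the same key, which is why a unique column belongs first in the SELECT list.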
I am using mongo's shell and want to do what is basically equivalent to SQL's "select col INTO var", and then use the value of var to look up other rows in the same table or in others (joins). For example, in PL/SQL I would declare a variable called V_Dno. I also have a table called Emp(EID, Name, Sal, Dno). I can get the value of Dno for employee 100 with "Select Dno into V_Dno from Emp where EID = 100". In MongoDB, when I find the needed employee (using its _id), I end up with a document and not a value (a field). In a sense, I get the equivalent of the entire row in SQL and not just a column. I am doing the following to find the given emp:
var V_Dno = db.emp.find({Eid : 100}, {Dno : 1});
The reason I want to do this is to traverse from one document to another using the value of a field. I know I can do it using a DBRef, but I wanted to see if I could tie documents together using this method.
Can someone please shed some light on this?
Thanks.
find returns a cursor that lets you iterate over the matching documents. In this case you'd want to use findOne instead as it directly returns the first matching doc, and then use dot notation to access the single field.
var V_Dno = db.emp.findOne({Eid : 100}, {Dno : 1}).Dno;
Using your query as a starting point:
var vdno = db.emp.findOne({Eid: 100, Dno :1})
This returns a document from the emp collection where Eid = 100 and Dno = 1. Now that I have this document in the vdno variable, I can "join" it to another collection. Let's say you have a department collection in which each document holds a manual reference to the _id field in the emp collection. You can use the following to filter results from the department collection based on the value in your variable.
db.department.find({"employee._id":vdno._id})
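The two-step pattern (fetch one document, pull a single field out of it, use that value in a second lookup) can be sketched in plain Python with lists of dicts standing in for collections. The `find` and `find_one` helpers, the `dept` collection and its fields are hypothetical stand-ins for illustration, not the MongoDB driver API.

```python
emp = [
    {"_id": 1, "Eid": 100, "Name": "Ann", "Dno": 10},
    {"_id": 2, "Eid": 101, "Name": "Bob", "Dno": 20},
]
dept = [
    {"_id": 10, "Dname": "Research"},
    {"_id": 20, "Dname": "Sales"},
]


def find(collection, query):
    # Lazily yields every matching document, loosely like a cursor.
    return (doc for doc in collection if all(doc.get(k) == v for k, v in query.items()))


def find_one(collection, query):
    # Returns the first matching document, or None when nothing matches.
    return next(find(collection, query), None)


employee = find_one(emp, {"Eid": 100})
v_dno = employee["Dno"] if employee else None  # a plain value, not a document
department = find_one(dept, {"_id": v_dno})
print(v_dno, department)
```

The guard on `employee` matters in the shell too: when no document matches, there is nothing to read the field from.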
I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the courses that are most reviewed, starting with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production it gives me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review in the GROUP BY clause but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.
SQLite and MySQL both have the behavior that they allow a query that has aggregates (like count()) without applying GROUP BY to all other columns - which in terms of standard SQL is invalid, because if more than one row is present in that aggregated group, it has to pick the first one it sees for return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
But once you get on a proper database like PostgreSQL you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (first assume we don't actually need to display the counts; we'll get to that in a minute):
most_rated_course_ids = session.query(
        Review.course_id,
    ).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
        Review.course_id,
    ).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    subquery()
one simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
    Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
        Review.course_id,
        func.count(Review.course_id).label("count")
    ).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()

courses = session.query(
        Course, most_rated_course_id_subquery.c.count
    ).join(
        most_rated_course_id_subquery
    ).order_by(
        most_rated_course_id_subquery.c.count.desc()
    ).all()
A great article I like to point out to people about GROUP BY and this kind of query is SQL GROUP BY techniques which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.
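For reference, the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern can be tried out in raw SQL with Python's built-in sqlite3 module. This is a minimal sketch with made-up sample rows and trimmed-down tables, not the SQL that SQLAlchemy would emit for the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE review (id INTEGER PRIMARY KEY, course_id INTEGER, user_id INTEGER);
INSERT INTO course (id, course_name) VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
INSERT INTO review (course_id, user_id) VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2),
    (3, 1);
""")

# The derived table aggregates reviews per course; the outer query joins it
# to course, so every selected column is either grouped or comes from course.
sql = """
SELECT course.id, course.course_name, counts.review_count
FROM course
JOIN (
    SELECT course_id, COUNT(course_id) AS review_count
    FROM review
    GROUP BY course_id
    HAVING COUNT(course_id) > 1
) AS counts ON counts.course_id = course.id
ORDER BY counts.review_count DESC
"""
most_reviewed = conn.execute(sql).fetchall()
print(most_reviewed)
```

Since the GROUP BY only ever touches course_id inside the derived table, the statement is valid on stricter databases as well as on SQLite.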