Orient DB - How to delete with a match statement?

Is there a good way to delete using a match statement that uses traversal?
I'm trying a delete statement like the one below, which should grab the students who belong to classroom 999. However, it doesn't seem to work:
delete vertex from (select
from
(match
{class: student, as: student}
.in(){class: classroom, as: classroom, while:(true), where: (id = 999)}
return student))
This doesn't work as a subquery.

I eventually got it working with the same traversal. I noticed that the subquery wasn't returning the student as a "node" (vertex), so I had to "expand" the returned student:
delete vertex from (select expand(student)
from
(match
{class: student, as: student}
.in(){class: classroom, as: classroom, while:(true), where: (id = 999)}
return student))
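Why expand() helps can be pictured with a small plain-Python analogy. No OrientDB is involved, and the record ids, the "@class" check and the helper names are made up for illustration. As the working query suggests, the MATCH ... RETURN student subquery yields projection rows that each carry a student field, while DELETE VERTEX wants the vertex records themselves. expand(student) unwraps that field.

```python
# Hypothetical rows, shaped like what a MATCH ... RETURN student projection yields:
# each row is a wrapper whose "student" field holds the actual vertex record.
match_rows = [
    {"student": {"@rid": "#12:0", "@class": "student"}},
    {"student": {"@rid": "#12:3", "@class": "student"}},
]


def expand(rows, field):
    """Unwrap one field of each projection row, loosely mimicking expand(field)."""
    return [row[field] for row in rows]


def delete_vertices(records):
    """Stand-in for DELETE VERTEX: only accepts records that carry a vertex class."""
    for rec in records:
        if "@class" not in rec:
            raise TypeError(f"not a vertex record: {rec!r}")
    return [rec["@rid"] for rec in records]


# Passing the projection rows directly fails, because they are wrappers, not vertices.
try:
    delete_vertices(match_rows)
except TypeError as exc:
    print("without expand:", exc)

# Unwrapping the field first hands the "delete" the vertex records themselves.
print("with expand:", delete_vertices(expand(match_rows, "student")))
```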

Related

Ecto JOIN complications for append-only table query

I'm trying to query an Ecto table with append-only semantics, so I'd like the most recent version of a complete row for a given ID. The technique is described here, but in short: I want to JOIN a table on itself with a subquery that fetches the most recent time for an ID. In SQL this would look like:
SELECT r.*
FROM rules AS r
JOIN (
SELECT id, MAX(inserted_at) AS inserted_at FROM rules GROUP BY id
) AS recent_rules
ON (
recent_rules.id = r.id
AND recent_rules.inserted_at = r.inserted_at)
I'm having trouble expressing this in Ecto. I tried something like this:
maxes =
  from(m in Rule,
    select: {m.id, max(m.inserted_at)},
    group_by: m.id)

from(r in Rule,
  join: m in ^maxes, on: r.id == m.id and r.inserted_at == m.inserted_at)
But trying to run this, I hit a restriction:
queries in joins can only have where conditions in query
suggesting maxes must just be a SELECT _ FROM _ WHERE form.
If I try switching maxes and Rule in the JOIN:
maxes =
  from(m in Rule,
    select: {m.id, max(m.inserted_at)},
    group_by: m.id)

from(m in maxes,
  join: r in Rule, on: r.id == m.id and r.inserted_at == m.inserted_at)
then I'm not able to SELECT the whole row, just id and MAX(inserted_at).
Does anyone know how to do this JOIN? Or a better way to query append-only in Ecto? Thanks 🙂

Doing m in ^maxes is not running a subquery but either query composition (if in a from) or converting the query to a join (in a join). In both cases, you are changing the same query. Given your initial query, I believe you want subqueries.
Also note that a subquery requires the select to return a map, so we can refer to the fields later on. Something along these lines should work:
maxes =
  from(m in Rule,
    select: %{id: m.id, inserted_at: max(m.inserted_at)},
    group_by: m.id)

from(r in Rule,
  join: m in ^subquery(maxes), on: r.id == m.id and r.inserted_at == m.inserted_at)
PS: I have pushed a commit to Ecto that clarifies the error message in cases like yours. It now reads:
invalid query was interpolated in a join.
If you want to pass a query to a join, you must either:
1. Make sure the query only has `where` conditions (which will be converted to ON clauses)
2. Or wrap the query in a subquery by calling subquery(query)
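To see what the derived-table join does, here is the asker's original SQL run against an in-memory SQLite database using Python's standard sqlite3 module. The rules table contents and the name column are invented for illustration, and an ORDER BY was added only to make the output order predictable. The subquery(maxes) version above expresses the same kind of join against a derived table, though the exact SQL Ecto emits may differ in aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: every edit inserts a new row for the same id.
conn.execute("CREATE TABLE rules (id INTEGER, name TEXT, inserted_at TEXT)")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?)",
    [
        (1, "first draft", "2020-01-01"),
        (1, "second draft", "2020-01-05"),
        (2, "only version", "2020-01-03"),
    ],
)

# Join the table to a derived table holding the latest inserted_at per id,
# so only the most recent complete row for each id survives the join.
latest = conn.execute(
    """
    SELECT r.*
    FROM rules AS r
    JOIN (
        SELECT id, MAX(inserted_at) AS inserted_at FROM rules GROUP BY id
    ) AS recent_rules
    ON recent_rules.id = r.id AND recent_rules.inserted_at = r.inserted_at
    ORDER BY r.id
    """
).fetchall()
print(latest)
```

ISO-formatted date strings compare correctly as text, which is why MAX works on them here; a real table would more likely use a timestamp type.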

Scala slick: filter using "in" on multiple columns

Suppose I have the following table structure [1]:
create table PEOPLE (
  ID integer not null primary key,
  NAME varchar(100) not null
);
create table CHILDREN (
  ID integer not null primary key,
  PARENT_ID_1 integer not null references PEOPLE (id),
  PARENT_ID_2 integer not null references PEOPLE (id)
);
and that I want to generate a list of the names of each person who is a parent. In slick I can write something like:
for {
  parent <- people
  child <- children if {
    parent.id === child.parent_id_1 ||
    parent.id === child.parent_id_2
  }
} yield {
  parent.name
}
and this generates the expected SQL:
select p.name
from people p, children c
where p.id = c.parent_id_1 or p.id = c.parent_id_2
However, this is not optimal: the OR part of the expression can cause horrendously slow performance in some DBMSes, which end up doing full-table scans to join on p.id even though there's an index there (see for example this bug report for H2). The general problem is that the query planner can't know if it's faster to execute each side of the OR separately and join the results back together, or simply do a full table scan [2].
I'd like to generate SQL that looks something like this, which then can use the (primary key) index as expected:
select p.name
from people p, children c
where p.id in (c.parent_id_1, c.parent_id_2)
My question is: how can I do this in slick? The existing methods don't seem to offer a way:
ColumnExtensionMethods.in takes a Query as a parameter, but I don't have a Query; I have a number or Rep[Long] for each of my ID columns
ColumnExtensionMethods.inSet is for binding existing (known) literal arrays, not for joining to sets of columns
What I'd like to be able to write is something like this:
for {
parent <- people
child <- children
if parent.id in (child.parent_id_1, child.parent_id_2)
} yield {
p.name
}
but that's not possible right now.
[1] My actual design is a little more complex than this, but it boils down to the same problem.
[2] Some DBMSes do have this optimisation for simple cases, e.g. OR-expansion in Oracle.

It turns out this isn't currently possible (as of Slick 3.2.3), so I've raised an issue on GitHub and submitted a pull request to add this functionality.
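For reference, the two SQL forms in the question are logically equivalent; they differ only in how a query planner may treat them. A quick check with Python's sqlite3 module on invented rows (this says nothing about Slick itself or about H2's planner):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(100) NOT NULL);
    CREATE TABLE children (
        id INTEGER NOT NULL PRIMARY KEY,
        parent_id_1 INTEGER NOT NULL REFERENCES people (id),
        parent_id_2 INTEGER NOT NULL REFERENCES people (id)
    );
    -- Hypothetical data: Dee (4) is nobody's parent, Ann (1) has two children.
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    INSERT INTO children VALUES (10, 1, 2), (11, 3, 1);
    """
)

or_form = """
    SELECT p.name FROM people p, children c
    WHERE p.id = c.parent_id_1 OR p.id = c.parent_id_2
"""
in_form = """
    SELECT p.name FROM people p, children c
    WHERE p.id IN (c.parent_id_1, c.parent_id_2)
"""

# Sort both results so the comparison ignores row order.
or_rows = sorted(conn.execute(or_form).fetchall())
in_rows = sorted(conn.execute(in_form).fetchall())
print(or_rows == in_rows, or_rows)
```

Ann appears twice in both results because she is a parent of two children; adding DISTINCT would list each parent once.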

OrientDB: select edge where out=(select ??) does not work

I have a problem. I think this is supposed to work; otherwise someone else would have run into it already.
The following command works perfectly:
// suppose my record id is #10:0
select from MyEdgeType where out=#10:0
This works.
select from MyNodeType where name="this"
> returns obj with #rid = #10:0
The following does not work:
select from MyEdgeType where out=(select from MyNodeType where name="this")
select from MyEdgeType where out=(select #rid from (select from MyNodeType where name="this"))
select from MyEdgeType let $rec = (select from MyNodeType...) where out=$rec.rid
... etc.
Nothing works. Nothing. How do I select edges without having to know, ahead of time, the record id of the vertex they are incident to?

You're comparing a single field to a result set (it's like comparing a string to an array). Try something like this:
select from MyEdgeType where out IN (select from MyNodeType where name="this")
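The string-versus-array comparison in this answer can be made concrete with a tiny Python analogy. The record shapes below are hypothetical and are not OrientDB's actual result objects:

```python
# Hypothetical result set, shaped like the rows returned by
# (select from MyNodeType where name="this"): a list of records, not a single id.
result_set = [{"@rid": "#10:0", "name": "this"}]
out = "#10:0"  # the edge's "out" field holds a single record id

# Equality compares one id with the whole list, so it can never match.
print(out == result_set)
# Membership asks whether the id belongs to one of the records, as IN does.
print(out in [row["@rid"] for row in result_set])
```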

I got this to work.
Since my nodes are unique (this is a constraint), I used the unique property to ID them during the filtration, rather than the record id from a subquery:
select from MyEdgeType where out.unique_identifier=...
worked.

Using functions with orientdb select query with edge filter

Schema
Customer -> (Edge)Ownes -> Vehicle {vehicle_number}
I tried to query the customer record that "Ownes" a vehicle by its number, as below, and it worked (both 'in' and 'contains' worked fine):
select from Customer where "KL-01-B-8898" in out("Ownes").vehicle_number
I want to do the same query with a case-insensitive search, as below, but it returned 0 records:
select from Customer where "kl-01-b-8898" in out("Ownes").vehicle_number.toLowerCase()
I changed the query as below and it returned the rows. Is it possible to use functions like 'toLowerCase' in queries like the one above, without a subselect?
select from Customer where #rid in (select in("Ownes").#rid from Vehicle where vehicle_number.toLowerCase() ="kl-01-b-8898")

You can use this:
select from Customer
let $a= ( select number.toUpperCase() from (select out("Ownes").vehicle_number as number from $parent.$current unwind number))
where "KL-01-B-8898" in first($a).number
This doesn't work:
select from Customer where "kl-01-b-8898" in out("Ownes").vehicle_number.toLowerCase()
because
out("Ownes").vehicle_number
returns a list of Strings
This works:
select from Customer where #rid in (select in("Ownes").#rid from Vehicle where vehicle_number.toLowerCase() ="kl-01-b-8898")
because vehicle_number is a String
See the documentation: http://orientdb.com/docs/last/SQL-Methods.html#bundled-methods
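The same list-versus-string distinction, as a plain-Python analogy with made-up vehicle numbers: the string method has to be applied to each element before the membership test, which is roughly what the unwind plus toUpperCase() subquery above does row by row.

```python
# Hypothetical vehicle numbers for one customer, i.e. what
# out("Ownes").vehicle_number yields: a list of strings, not a single string.
vehicle_numbers = ["KL-01-B-8898", "KL-07-C-1234"]
needle = "kl-01-b-8898"

# upper() works on one string, so normalize every element first,
# then test membership against the normalized list.
normalized = [number.upper() for number in vehicle_numbers]
print(needle.upper() in normalized)
```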

Select most reviewed courses starting from courses having at least 2 reviews

I'm using Flask-SQLAlchemy with PostgreSQL. I have the following two models:
class Course(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    course_name = db.Column(db.String(120))
    course_description = db.Column(db.Text)
    course_reviews = db.relationship('Review', backref='course', lazy='dynamic')

class Review(db.Model):
    __table_args__ = (db.UniqueConstraint('course_id', 'user_id'), {})
    id = db.Column(db.Integer, primary_key=True)
    review_date = db.Column(db.DateTime)  # default=db.func.now()
    review_comment = db.Column(db.Text)
    rating = db.Column(db.SmallInteger)
    course_id = db.Column(db.Integer, db.ForeignKey('course.id'))
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
I want to select the most reviewed courses, starting from those with at least two reviews. The following SQLAlchemy query worked fine with SQLite:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
But when I switched to PostgreSQL in production, it gave me the following error:
ProgrammingError: (ProgrammingError) column "review.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT review.id AS review_id, review.review_date AS review_...
^
'SELECT review.id AS review_id, review.review_date AS review_review_date, review.review_comment AS review_review_comment, review.rating AS review_rating, review.course_id AS review_course_id, review.user_id AS review_user_id, count(review.course_id) AS count_1 \nFROM review GROUP BY review.course_id \nHAVING count(review.course_id) > %(count_2)s ORDER BY count(review.course_id) DESC' {'count_2': 1}
I tried to fix the query by adding models.Review to the GROUP BY clause, but it did not work:
most_rated_courses = db.session.query(models.Review, func.count(models.Review.course_id)).\
    group_by(models.Review.course_id).\
    having(func.count(models.Review.course_id) > 1).\
    order_by(func.count(models.Review.course_id).desc()).all()
Can anyone please help me with this issue? Thanks a lot.

SQLite and MySQL both allow a query that has aggregates (like count()) without applying GROUP BY to all the other columns. In standard SQL this is invalid: if more than one row is present in an aggregated group, the database has to pick the first one it sees to return, which is essentially random.
So your query for Review basically returns to you the first "Review" row for each distinct course id - like for course id 3, if you had seven "Review" rows, it's just choosing an essentially random "Review" row within the group of "course_id=3". I gather the answer you really want, "Course", is available here because you can take that semi-randomly selected Review object and just call ".course" on it, giving you the correct Course, but this is a backwards way to go.
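This permissive behavior is easy to reproduce with Python's built-in sqlite3 module. The review rows below are invented, and which row's id and rating come back for a group is not something to rely on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE review (id INTEGER PRIMARY KEY, rating INTEGER, course_id INTEGER);
    -- Hypothetical data: course 3 has three reviews, course 5 has one.
    INSERT INTO review VALUES (1, 4, 3), (2, 5, 3), (3, 2, 3), (4, 1, 5);
    """
)

# review.id and review.rating are neither grouped nor aggregated. SQLite accepts
# this and fills them from one row of each group; PostgreSQL rejects the same
# statement with the "must appear in the GROUP BY clause" error from the question.
rows = conn.execute(
    """
    SELECT review.id, review.rating, review.course_id, count(review.course_id)
    FROM review
    GROUP BY review.course_id
    HAVING count(review.course_id) > 1
    """
).fetchall()
print(rows)
```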
But once you get on a proper database like PostgreSQL, you need to use correct SQL. The data you need from the "review" table is just the course_id and the count, nothing else, so query just for that (let's first assume we don't actually need to display the counts; we'll get to that in a minute):
most_rated_course_ids = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    all()
but that's not your Course object - you want to take that list of ids and apply it to the course table. We first need to keep our list of course ids as a SQL construct, instead of loading the data - that is, turn it into a derived table by converting the query into a subquery (change the word .all() to .subquery()):
most_rated_course_id_subquery = session.query(
    Review.course_id,
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    order_by(func.count(Review.course_id).desc()).\
    subquery()
One simple way to link that to Course is to use an IN:
courses = session.query(Course).filter(
    Course.id.in_(most_rated_course_id_subquery)).all()
but that's essentially going to throw away the "ORDER BY" you're looking for and also doesn't give us any nice way of actually reporting on those counts along with the course results. We need to have that count along with our Course so that we can report it and also order by it. For this we use a JOIN from the "course" table to our derived table. SQLAlchemy is smart enough to know to join on the "course_id" foreign key if we just call join():
courses = session.query(Course).join(most_rated_course_id_subquery).all()
then to get at the count, we need to add that to the columns returned by our subquery along with a label so we can refer to it:
most_rated_course_id_subquery = session.query(
    Review.course_id,
    func.count(Review.course_id).label("count")
).\
    group_by(Review.course_id).\
    having(func.count(Review.course_id) > 1).\
    subquery()

courses = session.query(
    Course, most_rated_course_id_subquery.c.count
).join(
    most_rated_course_id_subquery
).order_by(
    most_rated_course_id_subquery.c.count.desc()
).all()
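Roughly the SQL shape of that final query, the "join to (subquery with aggregate/GROUP BY)" pattern, run with Python's sqlite3 on invented course and review rows. The SQL SQLAlchemy actually emits will differ in aliases, and the derived-table alias "anon" here is arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        course_id INTEGER REFERENCES course (id),
        user_id INTEGER,
        UNIQUE (course_id, user_id)
    );
    INSERT INTO course VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
    -- Hypothetical reviews: Biology has 3, Algebra 2, Chemistry 1.
    INSERT INTO review (course_id, user_id) VALUES
        (1, 100), (1, 101),
        (2, 100), (2, 101), (2, 102),
        (3, 100);
    """
)

# Aggregate only what the review table can answer (course_id and its count),
# then join that derived table back to course and order by the count.
courses = conn.execute(
    """
    SELECT course.id, course.course_name, anon.count
    FROM course
    JOIN (
        SELECT review.course_id AS course_id, count(review.course_id) AS count
        FROM review
        GROUP BY review.course_id
        HAVING count(review.course_id) > 1
    ) AS anon ON course.id = anon.course_id
    ORDER BY anon.count DESC
    """
).fetchall()
print(courses)
```

Chemistry drops out because the HAVING clause keeps only courses with more than one review.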
A great article I like to point people to about GROUP BY and this kind of query is "SQL GROUP BY techniques", which points out the common need for the "select from A join to (subquery of B with aggregate/GROUP BY)" pattern.