2-steps query in OrientDB - orientdb

I'm evaluating OrientDB and Neo4j in this simple toy example composed of:
Employees, identified by eid
Meetings, identified by mid and having start and end attributes encoding their start and end DateTime.
Both entities are represented by different classes of vertices, namely Employee and CalendarEvent, which are connected by Involves edges specifying that CalendarEvent-[Involves]->Employee.
My task is to write a query that returns, for each pair of employees, the date/time of their first meeting and the number of meetings they co-attended.
In Cypher I would write something like:
MATCH (e0: Employee)<-[:INVOLVES]-(c:CalendarEvent)-[:INVOLVES]->(e1: Employee)
WHERE e0.eid > e1.eid
RETURN e0.eid, e1.eid, min(c.start) as first_met, count(*) as frequency
I wrote the following query for OrientDB:
SELECT eid, other, count(*) AS frequency, min(start) as first_met
FROM (
SELECT eid, event.start as start, event.out('Involves').eid as other
FROM (
SELECT
eid,
in('Involves') as event
FROM Employee UNWIND event
) UNWIND other )
GROUP BY eid, other
but it seems over-complicated to me.
Does anybody know if there is an easier way to express the same query?

Yes, your query is correct, and this is what you have to do in the current version (2.1.x).
From 2.2, with the MATCH statement (https://github.com/orientechnologies/orientdb-docs/blob/master/source/SQL-Match.md), you will be able to write a query very similar to the Cypher version:
select eid0, eid1, min(start) as firstMet, count(*) from (
MATCH {class:Employee, as:e0}.in("Involves"){as: meeting}.out("Involves"){as:e1}
return e0.eid as eid0, e1.eid as eid1, meeting.start as start
) group by eid0, eid1
This feature is still in beta; the final version will probably have more operators in the MATCH statement itself, and the query will be even shorter.
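To check either query against a small dataset, it can help to have a plain reference implementation of the intended result. The following is only a sketch in Python, not OrientDB code: the `co_attendance` function and the `(start, attendees)` input shape are assumptions made for illustration. It keeps each pair once, with the larger eid first, mirroring the `e0.eid > e1.eid` condition in the Cypher version.

```python
from itertools import combinations


def co_attendance(meetings):
    """Hypothetical helper: meetings is an iterable of (start, [eid, ...]).

    Returns {(eid0, eid1): (first_met, frequency)} with eid0 > eid1.
    """
    stats = {}
    for start, attendees in meetings:
        # Sort descending so every pair comes out as (larger eid, smaller eid).
        ordered = sorted(set(attendees), reverse=True)
        for pair in combinations(ordered, 2):
            first, freq = stats.get(pair, (start, 0))
            stats[pair] = (min(first, start), freq + 1)
    return stats


# Made-up sample data: ISO-formatted strings compare chronologically.
meetings = [
    ("2015-01-05T09:00", [1, 2, 3]),
    ("2015-01-03T14:00", [1, 2]),
]
result = co_attendance(meetings)
for (eid0, eid1), (first_met, frequency) in sorted(result.items()):
    print(eid0, eid1, first_met, frequency)
```

Note that the OrientDB queries shown here do not filter on `eid0 > eid1`, so they would return each pair in both orders, plus each employee paired with themselves.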

Related

Pivot function without manually typing values in `for in`?

Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query. This would mean that the query compiler would need to work without knowing how many output columns will be produced. I don't think it can do that.
You can do this in multiple queries: use one query to create the list of partnames, then use that list to "generate" a second query with the IN list populated. So something needs to issue these queries and generate the second one. This can be some code external to Redshift (lots of options) or a stored procedure in Redshift. This code, no matter where it lives, should account for Redshift's limit on the maximum number of columns: 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
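As a rough illustration of the "external code" option, here is a minimal Python sketch that generates the second query from a list of partnames. The `build_pivot_query` function is hypothetical, and fetching the list (for example with `SELECT DISTINCT partname FROM part`) and running the generated statement are left to whatever database driver you use.

```python
MAX_COLUMNS = 1600  # Redshift column limit mentioned above


def build_pivot_query(partnames):
    """Hypothetical helper: build the PIVOT statement with an explicit IN list."""
    if not partnames:
        raise ValueError("need at least one partname")
    if len(partnames) > MAX_COLUMNS:
        raise ValueError("too many distinct partnames for one result set")
    # Double embedded single quotes so each value is a valid SQL string literal.
    literals = ", ".join("'" + name.replace("'", "''") + "'" for name in partnames)
    return (
        "SELECT *\n"
        "FROM (SELECT partname, price FROM part) PIVOT (\n"
        f"    AVG(price) FOR partname IN ({literals})\n"
        ");"
    )


# In practice this list would be the result of the first query.
query = build_pivot_query(["prop", "rudder", "wing"])
print(query)
```

Because the values end up as literals inside the statement, the quoting step matters: a partname containing a single quote would otherwise break the generated SQL.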

How to extend dynamic schema with views in Hasura and Postgres?

I have been struggling for a few days to extend the schema with a custom group-by.
I have a table with few fields like id, country, ip, created_at.
Then I am trying to get them as groups. For example, group the data by date, by hour of a given date, by country, or by country with DISTINCT ip.
I am zero with SQLs honestly. But I tried to play around and get what I want. Here's an example.
SELECT Hour(created_at) AS date,
COUNT(*) AS count
FROM session where CAST(created_at AS date) = '2021-04-05'
GROUP BY Hour(created_at)
ORDER BY date;
SELECT country,
count(*) AS count from (SELECT * FROM session where CAST(created_at AS date) <= '2021-05-12' GROUP BY created_at) AS T1
GROUP BY country;
SELECT country, COUNT(*) as count
FROM (SELECT DISTINCT ip, country FROM session) AS T1
GROUP BY country;
SELECT DATE(created_at) AS date,
COUNT(*) AS count
FROM session
GROUP BY DATE(created_at)
ORDER BY date;
Now I am struggling with two things.
How do I make the date as variables? I mean, if I want to group them for a particular date range/ or today's data hourly, or per quarter gap (more of configurable), how do I add the variables in Hasura's Raw SQL?
Also for this approach I have to add schema for each one of them? Like this
CREATE
OR REPLACE VIEW "public"."unique_session_counts_date" AS
SELECT
date(session.created_at) AS date,
count(*) AS count
FROM
session
GROUP BY
(date(session.created_at))
ORDER BY
(date(session.created_at));
Is there a way to make it more generalized? What I mean is, if it was in Nodejs I could have done something like
return rawQuery(
`
select ${field} x, count(*) y
from ${table}
where website_id=$1
and created_at between $2 and $3
${domainFilter}
${urlFilter}
group by 1
order by 2 desc
`,
params,
);
In this case, based on whatever field and where clause I send, one query would do the trick for me. Can I do something similar in Hasura?
Thank you so much in advance.
How do I make the date as variables? I mean, if I want to group them for a particular date range/ or today's data hourly, or per quarter gap (more of configurable), how do I add the variables in Hasura's Raw SQL?
My first thought is this. If you're thinking about passing in variables via a GraphQL for example, the GraphQL would look something like:
query MyQuery {
unique_session_counts_date(where: {created_at: {_gte: "<start date here>", _lte: "<end date here>"}}) {
<...any fields, rollups, etc here...>
}
}
The underlying view/query would follow the group by and order by that you've detailed. Then you'd be able to submit the GraphQL query and just pass in the pertinent parameters, like the $1, $2, and $3 in the rawQuery call.
Also for this approach I have to add schema for each one of them?
The schema? The view? I don't think a view specifically would be required, if a multilevel select or similar query can handle it and perform then a view wouldn't particularly be needed.
That's my first stab at the problem. I'm going to try to work through this problem in a few hours via a Twitch stream # HasuraHQ if you can join, happy to walk through it live.

Entity framework split DATETIME to sort by TIME only

I'm running a query on a table which contains a DATETIME column, where I want to sort results by TIME only and ignore the date. I've put together the following query:
SELECT DISTINCT s.Id, s.SubmittedDate, s.CheckId, s.RestaurantId, s.StaffName, s.CustomerEmail, s.TableNumber
FROM Survey s
ORDER BY DATEPART (hh,s.submittedDate) ASC, DATEPART(mi,s.submittedDate) ASC
The problem with this query is that it generates the error ORDER BY items must appear in the select list if SELECT DISTINCT is specified. However, I cannot add the order by fields to the field list in the query as it doesn't exist on the Survey Entity that Entity Framework maps the results to.
Is there a way to get around this?
Coming from a Java background, I assumed that Entity Framework would complain if the results returned fields that did not exist on the entity. I tried it and Entity Framework does not complain and it works as expected.
My final working query is;
SELECT DISTINCT s.Id, s.SubmittedDate, s.CheckId, s.RestaurantId, s.StaffName, s.CustomerEmail, s.TableNumber, DATEPART(hh, s.submittedDate) as hrs, DATEPART(mi, s.submittedDate) as mins
FROM Survey s
ORDER BY hrs, mins ASC

Druid query to get "latest" value from third column

I have a table in Druid, something like
Timestamp || UserId || Action
And I need to get the latest Action for each UserId. In MySQL I would do something like
Select * from users u1 inner join (
select UserId, max(Timestamp) as maxt from users group by UserId
) u2
on u1.UserId = u2.UserId and u1.Timestamp = u2.maxt
But Druid can't do joins and only very basic sub-selects.
I know the "right" answer is probably to denormalize the data at ingestion time, but unfortunately that's not an option as I don't "own" the ingestion part.
The only solution I have come up with so far is to retrieve all the results for both queries in Java code and do the join manually, but I will run into memory constraints when the dataset grows I would imagine.
I tried to look at materialized views, but that looks like it's still incubating and would require a hadoop cluster, so isn't really viable.
I tried to do something like
Select * from users u1 where concat(Timestamp, UserId) in (
select concat(UserId, max(Timestamp)) from users group by UserId
)
But it didn't like that either.
Any suggestions?
LATEST(expr)
Returns the latest value of expr, which must be numeric. If expr
comes from a relation with a timestamp column (like a Druid
datasource) then "latest" is the value last encountered with the
maximum overall timestamp of all values being aggregated. If expr
does not come from a relation with a timestamp, then it is simply the
last value encountered.
https://druid.apache.org/docs/0.20.0/querying/sql.html
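Two caveats on the quoted description: it says expr must be numeric, while Action is presumably a string. As far as I recall, the same docs page also lists a string variant, LATEST(expr, maxBytesPerString), which would give something like SELECT UserId, LATEST("Action", 1024) FROM users GROUP BY UserId; treat that as an assumption and check it against your Druid version. If the aggregation has to happen client-side after all, it does not need a join or both result sets in memory. A single pass that keeps one entry per user is enough. The sketch below is in Python rather than Java, and the function name and row shape are made up for illustration.

```python
def latest_action_per_user(rows):
    """Hypothetical helper: rows is an iterable of (timestamp, user_id, action).

    Rows may arrive in any order; memory use grows with the number of
    distinct users, not with the number of rows.
    """
    latest = {}
    for timestamp, user_id, action in rows:
        current = latest.get(user_id)
        # Keep the row only if it is newer than what we have for this user.
        if current is None or timestamp > current[0]:
            latest[user_id] = (timestamp, action)
    return latest


# Made-up sample rows; in practice these would be streamed from the query result.
rows = [
    (100, "alice", "login"),
    (250, "bob", "view"),
    (300, "alice", "logout"),
    (120, "bob", "login"),
]
latest = latest_action_per_user(rows)
for user_id, (timestamp, action) in sorted(latest.items()):
    print(user_id, timestamp, action)
```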

Create a query to select two columns; (Company, No. of Films) from the database

I have created a database as part of university assignment and I have hit a snag with the question in the title.
Most likely I am being asked to find out how many films each company has made, which suggests to me a group by query. But I have no idea where to begin. It is only a two-mark question, but the syntax is not clicking in my head.
My schema is:
CREATE TABLE Movie
(movieID CHAR(3) ,
title CHAR(36),
year NUMBER,
company CHAR(50),
totalNoms NUMBER,
awardsWon NUMBER,
DVDPrice NUMBER(5,2),
discountPrice NUMBER(5,2))
There are other tables but at first glance I don't think they are relevant to this question.
I am using sqlplus10
The answer you need comes from three basic SQL concepts, I'll step through them with you. If you need more assistance to create an answer from these hints, let me know and I can try to keep guiding you.
Group By
As you mentioned, SQL offers a GROUP BY clause that can help you.
A SQL Query utilizing GROUP BY would look like the following.
SELECT list, fields, aggregate(value)
FROM tablename
--WHERE goes here, if you need to restrict your result set
GROUP BY list, fields
A GROUP BY query can only return fields listed in the GROUP BY clause, or aggregate functions acting on each group.
Aggregate Functions
Your homework question also needs an Aggregate function called Count. This is used to count the results returned. A simple query like the following returns the count of all records returned.
SELECT Count(*)
FROM tablename
The two can be combined, allowing you to get the Count of each group in the following way.
SELECT list, fields, count(*)
FROM tablename
GROUP BY list, fields
Column Aliases
Another answer also introduced SQL column aliases, using the AS keyword.
SELECT Count(*) as count
...
In Oracle the AS keyword is optional for column aliases, so the alias can also follow the expression directly; double quotes preserve its exact case.
SELECT Count(*) "count"
...
I'm not going to provide you the SQL, but instead a way to think about it.
What you want to do is select where the company matches and count the total rows returned. That count is the number of films made by the specified company.
Hope that points you in the right direction.
Select company, count(*) AS count
from Movie
group by company
select * group by company won't work in Oracle.
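To try the GROUP BY and COUNT combination without an Oracle instance, here is a small sketch using Python's built-in sqlite3 module. The table is a cut-down version of the Movie schema and the rows are made up; SQLite is not Oracle, so this only demonstrates the shape of the query, not SQL*Plus behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down Movie table: only the columns this question needs.
conn.execute("CREATE TABLE Movie (movieID CHAR(3), title CHAR(36), company CHAR(50))")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?)",
    [
        ("001", "Film A", "Acme"),
        ("002", "Film B", "Acme"),
        ("003", "Film C", "Globex"),
    ],
)

# One output row per company, with the number of rows in each group.
rows = conn.execute(
    'SELECT company, COUNT(*) AS "No. of Films" '
    "FROM Movie GROUP BY company ORDER BY company"
).fetchall()
for company, films in rows:
    print(company, films)
conn.close()
```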