I'm trying to write the following SQL query in Ecto syntax. How do I write the subquery after the FROM hierarchy, line? It's in the FROM clause, and I doubt that's possible in Ecto. Could I get the same result with table joins, or even lateral joins, without losing performance?
SELECT routes.id, routes.name
FROM routes
WHERE routes.id IN
  (SELECT DISTINCT hierarchy.parent
   FROM hierarchy,
     (SELECT DISTINCT unnest(segments.rels) AS rel
      FROM segments
      WHERE ST_Intersects(segments.geom, ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1866349.262143 6886808.978425)', -1), ST_GeomFromText('POINT(1883318.282423 6876413.542579)', -1)), 3857))) AS anon_1
   WHERE hierarchy.child = anon_1.rel)
I'm stuck on the following code:
hierarchy_subquery =
  Hierarchy
  |> select([h], h.parent)
  |> distinct(true)
Route
|> select([r], {r.id, r.name})
|> where([r], r.id in subquery(hierarchy_subquery))
Schemas:
defmodule MyApp.Hierarchy do
  use MyApp.Schema
  schema "hierarchy" do
    field :parent, :integer
    field :child, :integer
    field :deph, :integer
  end
end
defmodule MyApp.Route do
  use MyApp.Schema
  schema "routes" do
    field :name, :string
    field :intnames, :map
    field :symbol, :string
    field :country, :string
    field :network, :string
    field :level, :integer
    field :top, :boolean
    field :geom, Geo.Geometry, srid: 3857
  end
end
defmodule MyApp.Segment do
  use MyApp.Schema
  schema "segments" do
    field :ways, {:array, :integer}
    field :nodes, {:array, :integer}
    field :rels, {:array, :integer}
    field :geom, Geo.LineString, srid: 3857
  end
end
EDIT: I've tested the performance of various queries, and the one below is the fastest:
from r in Route,
  join: h in Hierarchy, on: r.id == h.parent,
  join: s in subquery(
    from s in Segment,
      distinct: true,
      where: fragment("ST_Intersects(?, ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857))", s.geom),
      select: %{rel: fragment("unnest(?)", s.rels)}
  ),
  where: s.rel == h.child,
  select: {r.id, r.name}
Results:
Query above: planning time ~0.605 ms, execution time ~37.232 ms
The same query, but with inner_lateral_join instead of join for the segments subquery: planning time ~1.353 ms, execution time ~38.518 ms
The subqueries from the answer: planning time ~1.017 ms, execution time ~41.288 ms
I thought that inner_lateral_join would be faster but it isn't. Does anybody know how to speed up this query?
Here is what I would try. I haven't verified that it works, but it should point you in the right direction:
segments =
  from s in Segment,
    where: fragment("ST_Intersects(?, ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1866349.262143 6886808.978425)', -1), ST_GeomFromText('POINT(1883318.282423 6876413.542579)', -1)), 3857))", s.geom),
    distinct: true,
    select: %{rel: fragment("unnest(?)", s.rels)}
hierarchy =
  from h in Hierarchy,
    join: s in subquery(segments),
    on: h.child == s.rel,
    distinct: true,
    select: %{parent: h.parent}
routes =
  from r in Route,
    join: h in subquery(hierarchy),
    on: r.id == h.parent,
    select: {r.id, r.name}
Things to keep in mind:
Start from the inner query and go to the outer one
To access the result of a subquery, you need to select a map in the subquery
Ecto only allows subqueries in from and join. The good news is that you can usually rewrite "x IN subquery" as a join
You can try to run each query individually and see if they work
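SQLite has neither unnest nor PostGIS, so this Python/sqlite3 sketch replaces the bounding-box-filtered, unnested segment relations with a plain table (matched_rels, a hypothetical stand-in) and uses toy data. It compares the original x IN (subquery) shape with two join rewrites, to show what the DISTINCT in the hierarchy subquery above is for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE routes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hierarchy (parent INTEGER, child INTEGER);
    -- Hypothetical stand-in for DISTINCT unnest(segments.rels) after the
    -- ST_Intersects filter.
    CREATE TABLE matched_rels (rel INTEGER);

    INSERT INTO routes VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Route 1 is the parent of two matching relations, route 3 of none.
    INSERT INTO hierarchy VALUES (1, 10), (1, 11), (2, 12), (3, 13);
    INSERT INTO matched_rels VALUES (10), (11), (12);
""")

# Original shape: each route appears at most once.
in_form = """
    SELECT r.id, r.name FROM routes r
    WHERE r.id IN (SELECT DISTINCT h.parent
                   FROM hierarchy h JOIN matched_rels m ON h.child = m.rel)
    ORDER BY r.id
"""

# Naive rewrite: a route appears once per matching child.
join_no_distinct = """
    SELECT r.id, r.name FROM routes r
    JOIN hierarchy h ON r.id = h.parent
    JOIN matched_rels m ON h.child = m.rel
    ORDER BY r.id
"""

# Rewrite with DISTINCT inside a derived table, like the hierarchy subquery.
join_distinct = """
    SELECT r.id, r.name FROM routes r
    JOIN (SELECT DISTINCT h.parent
          FROM hierarchy h JOIN matched_rels m ON h.child = m.rel) p
      ON r.id = p.parent
    ORDER BY r.id
"""

in_rows = conn.execute(in_form).fetchall()
naive_rows = conn.execute(join_no_distinct).fetchall()
distinct_rows = conn.execute(join_distinct).fetchall()
print(in_rows, naive_rows, distinct_rows)
```

A route that is the parent of several matching children comes back once per child in the naive join. The query in the question's EDIT joins Hierarchy directly with no DISTINCT on the outer query, so it may return duplicate routes that the IN version would not; that is worth checking before comparing timings.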
I have two databases: the main database (PostgreSQL) and a statistics database (ClickHouse). The statistics database contains a subset of the data from the main database, which is enough to perform the calculations. All ids are the same (:binary_id) across both databases. I need a way to join the results obtained from the statistics database with a query against the main database. In pure SQL it could look like this, where the VALUES are the data obtained from the statistics database:
SELECT p0."id",
p0."name",
f1."average_count"
FROM "persons" AS p0
JOIN (VALUES (0.0, '906af2c0-cde2-4996-9a98-bdbf986fe687'::uuid),
(0.2857142857142857, 'aba7c694-3453-4a55-aab9-4b542dbb4ba9'::uuid),
(0.2857142857142857, '2dab3350-6149-4752-a55e-7477a6ad0dd3'::uuid))
as f1 (average_count, user_id)
on f1.user_id = p0.id;
My project actively uses Ecto and has a lot of queries constructed on the fly. That's why I can't just run a raw SQL query like the one above; I need an Ecto-based solution. Is there a way to do such a join with Ecto?
It's not pretty, but you could take advantage of Postgres' UNNEST:
users = [
  %{id: "906af2c0-cde2-4996-9a98-bdbf986fe687", average_count: 0.0},
  %{id: "aba7c694-3453-4a55-aab9-4b542dbb4ba9", average_count: 0.2857142857142857},
  %{id: "2dab3350-6149-4752-a55e-7477a6ad0dd3", average_count: 0.2857142857142857}
]
{ids, average_counts} =
  users
  |> Stream.map(&{&1.id, &1.average_count})
  |> Enum.unzip()
dumped_ids =
  for id <- ids do
    {:ok, dumped} = Ecto.UUID.dump(id)
    dumped
  end
query =
  from p in Person,
    join: f in fragment("SELECT UNNEST(?::uuid[]) AS user_id, UNNEST(?::float[]) AS average_count", ^dumped_ids, ^average_counts),
    on: f.user_id == p.id,
    select: %{id: p.id, name: p.name, average_count: f.average_count}
Repo.all(query)
Maybe it's not the best way of doing it. I'm no DB expert. But that works for me in IEx.
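For comparison, here is the question's VALUES-join shape with bound parameters, as a small Python/sqlite3 sketch (SQLite has no UNNEST). The stats values come from the question; the persons rows are made up. SQLite names VALUES columns column1, column2, so they are aliased in a derived table instead of with the AS f1 (average_count, user_id) column list that PostgreSQL accepts.

```python
import sqlite3

# Rows fetched from the statistics database (values from the question).
stats = [
    {"id": "906af2c0-cde2-4996-9a98-bdbf986fe687", "average_count": 0.0},
    {"id": "aba7c694-3453-4a55-aab9-4b542dbb4ba9", "average_count": 0.2857142857142857},
    {"id": "2dab3350-6149-4752-a55e-7477a6ad0dd3", "average_count": 0.2857142857142857},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("906af2c0-cde2-4996-9a98-bdbf986fe687", "Ann"),
     ("aba7c694-3453-4a55-aab9-4b542dbb4ba9", "Bob"),
     ("ffffffff-0000-0000-0000-000000000000", "Cy")],  # no stats row
)

# One "(?, ?)" group per row; values are bound, never spliced into the SQL.
placeholders = ", ".join(["(?, ?)"] * len(stats))
params = [v for row in stats for v in (row["average_count"], row["id"])]

sql = f"""
    SELECT p.id, p.name, f.average_count
    FROM persons p
    JOIN (SELECT column1 AS average_count, column2 AS user_id
          FROM (VALUES {placeholders})) f
      ON f.user_id = p.id
    ORDER BY p.name
"""
rows = conn.execute(sql, params).fetchall()
print(rows)
```

The trade-off versus the UNNEST answer: a VALUES list needs two bind parameters per row, so the SQL text changes with the number of rows, and the PostgreSQL protocol caps a statement at 65535 parameters. The UNNEST version always binds exactly two array parameters.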
I'm trying to query an Ecto table with append-only semantics, so I'd like the most recent version of a complete row for a given ID. The technique is described here, but in short: I want to JOIN a table on itself with a subquery that fetches the most recent time for an ID. In SQL this would look like:
SELECT r.*
FROM rules AS r
JOIN (
SELECT id, MAX(inserted_at) AS inserted_at FROM rules GROUP BY id
) AS recent_rules
ON (
recent_rules.id = r.id
AND recent_rules.inserted_at = r.inserted_at)
I'm having trouble expressing this in Ecto. I tried something like this:
maxes =
  from(m in Rule,
    select: {m.id, max(m.inserted_at)},
    group_by: m.id)
from(r in Rule,
  join: m in ^maxes, on: r.id == m.id and r.inserted_at == m.inserted_at)
But trying to run this, I hit a restriction:
queries in joins can only have where conditions in query
suggesting maxes must just be a SELECT _ FROM _ WHERE form.
If I try switching maxes and Rule in the JOIN:
maxes =
  from(m in Rule,
    select: {m.id, max(m.inserted_at)},
    group_by: m.id)
from(m in maxes,
  join: r in Rule, on: r.id == m.id and r.inserted_at == m.inserted_at)
then I'm not able to SELECT the whole row, just id and MAX(inserted_at).
Does anyone know how to do this JOIN? Or a better way to query append-only in Ecto? Thanks 🙂
Doing m in ^maxes is not running a subquery but either query composition (if in a from) or converting the query to a join (in a join). In both cases, you are changing the same query. Given your initial query, I believe you want subqueries.
Also note that a subquery requires the select to return a map, so we can refer to the fields later on. Something along these lines should work:
maxes =
  from(m in Rule,
    select: %{id: m.id, inserted_at: max(m.inserted_at)},
    group_by: m.id)
from(r in Rule,
  join: m in subquery(maxes), on: r.id == m.id and r.inserted_at == m.inserted_at)
PS: I have pushed a commit to Ecto that clarifies the error message in cases like yours.
invalid query was interpolated in a join.
If you want to pass a query to a join, you must either:
1. Make sure the query only has `where` conditions (which will be converted to ON clauses)
2. Or wrap the query in a subquery by calling subquery(query)
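Here is a quick Python/sqlite3 check, with toy data and a hypothetical body column, that the self-join from the question returns only the newest version of each id, which is the SQL the subquery-based Ecto query above is meant to express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rules (id INTEGER, body TEXT, inserted_at TEXT);
    INSERT INTO rules VALUES
        (1, 'v1', '2017-01-01T00:00:00'),
        (1, 'v2', '2017-02-01T00:00:00'),
        (2, 'only', '2017-01-15T00:00:00');
""")

# Join each row to the MAX(inserted_at) of its id; ISO-8601 strings
# compare correctly as text.
latest = conn.execute("""
    SELECT r.*
    FROM rules AS r
    JOIN (SELECT id, MAX(inserted_at) AS inserted_at
          FROM rules GROUP BY id) AS recent_rules
      ON recent_rules.id = r.id
     AND recent_rules.inserted_at = r.inserted_at
    ORDER BY r.id
""").fetchall()
print(latest)
```

If two versions of the same id share an identical inserted_at, both rows match the join. In PostgreSQL, SELECT DISTINCT ON (id) * FROM rules ORDER BY id, inserted_at DESC is a common alternative that returns exactly one row per id.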
I'm trying to write a PostgreSQL query that includes results from two (or more) tables using LEFT JOIN LATERAL. I need one record for each record of the main table, entidad_a_, and all the matching records from entidad_b_ must be collected into a single field generated by array_agg. In that array I have to remove duplicate elements while preserving the order of the array in the main table.
I need to execute this SQL query:
SELECT entidad_a_._id_ AS "_id", CASE WHEN count(entidadB) > 0 THEN array_agg(DISTINCT entidadB._id,ordinality order by ordinality)
ELSE NULL END AS "entidadB"
FROM entidad_a_ as entidad_a_, unnest(entidad_a_.entidad_b_) WITH ORDINALITY AS u(entidadb_id, ordinality)
LEFT JOIN LATERAL (
SELECT entidad_b_3._id_ AS "_id", entidad_b_3.label_ AS "label"
FROM entidad_b_ as entidad_b_3
WHERE entidad_b_3._id_ = entidadb_id
GROUP BY entidad_b_3._id_
LIMIT 1000 OFFSET 0
) entidadB ON TRUE
GROUP BY entidad_a_._id_
LIMIT 1000 OFFSET 0
But I get errors.
How can I get these results?
Edited:
My error is:
ERROR: function array_agg (integer, bigint) does not exist
SQL state: 42883
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Character: 69
If the query is:
......array_agg (DISTINCT entidadB._id order by ordinality).....
The error is:
ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
SQL state: 42P10
Character: 110
My problem is the combination of array_agg, DISTINCT, and ORDER BY.
Solved! I created a PostgreSQL extension with a custom aggregate:
CREATE AGGREGATE array_agg_dist (anyelement)
(
  sfunc = array_agg_transfn_dist,
  stype = internal,
  finalfunc = array_agg_finalfn_dist,
  finalfunc_extra
);
I also wrote the C code for the functions this custom aggregate uses.
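The C source isn't included, so the following is only a Python model of what an order-preserving DISTINCT aggregate like array_agg_dist has to do, not the extension's code. The function names mirror the sfunc/finalfunc in the CREATE AGGREGATE statement, but the state layout, skipping NULLs, and returning None for an empty group (to mimic the CASE WHEN count(...) > 0 ... ELSE NULL in the question) are assumptions. Rows are (entidad_a_ id, ordinality, entidad_b_ id), fed in ordinality order within each group.

```python
def array_agg_dist_transfn(state, value):
    """Transition step: append value only the first time it is seen in the group."""
    if state is None:
        state = {"seen": set(), "items": []}
    # Assumption: NULLs (None) from the LEFT JOIN LATERAL are skipped.
    if value is not None and value not in state["seen"]:
        state["seen"].add(value)
        state["items"].append(value)
    return state


def array_agg_dist_finalfn(state):
    """Final step: the collected items, or None when nothing was collected."""
    if state is None or not state["items"]:
        return None
    return list(state["items"])


def aggregate(rows):
    """Emulate GROUP BY a_id with array_agg_dist over (a_id, ordinality, b_id) rows."""
    states = {}
    # Feed each group's values in ordinality order.
    for a_id, _ordinality, b_id in sorted(rows, key=lambda r: (r[0], r[1])):
        states[a_id] = array_agg_dist_transfn(states.get(a_id), b_id)
    return {a_id: array_agg_dist_finalfn(s) for a_id, s in states.items()}


# entidad_a_ 1 has entidad_b_ = [3, 1, 3, 2, 1]; entidad_a_ 2 matched nothing.
result = aggregate([(1, 1, 3), (1, 2, 1), (1, 3, 3), (1, 4, 2), (1, 5, 1), (2, 1, None)])
print(result)
```

With the built-in aggregate, DISTINCT only allows ORDER BY on the aggregated value itself (the 42P10 error above), so the array is ordered by value ({1,2,3} for the first group) rather than by first occurrence. An alternative that avoids C code is to reduce to one row per (entidad_a_ id, entidadb_id) with min(ordinality) AS first_ordinality in a subquery, then aggregate with array_agg(entidadb_id ORDER BY first_ordinality).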
I have a simple model:
schema "torrents" do
  field :name, :string
  field :magnet, :string
  field :leechers, :integer
  field :seeders, :integer
  field :source, :string
  field :filesize, :string
  timestamps()
end
And I want to search based on the name. I added the relevant extensions and indexes to my database and table.
def change do
  create table(:torrents) do
    add :name, :string
    add :magnet, :text
    add :leechers, :integer
    add :seeders, :integer
    add :source, :string
    add :filesize, :string
    timestamps()
  end
  execute "CREATE EXTENSION pg_trgm;"
  execute "CREATE INDEX torrents_name_trgm_index ON torrents USING gin (name gin_trgm_ops);"
  create index(:torrents, [:magnet], unique: true)
end
I'm trying to search using the search term, but I always get zero results.
def search(query, search_term) do
  from(u in query,
    where: fragment("? % ?", u.name, ^search_term),
    order_by: fragment("similarity(?, ?) DESC", u.name, ^search_term))
end
SELECT t0."id", t0."name", t0."magnet", t0."leechers", t0."seeders", t0."source",
t0."filesize", t0."inserted_at", t0."updated_at" FROM "torrents"
AS t0 WHERE (t0."name" % $1) ORDER BY similarity(t0."name", $2) DESC ["a", "a"]
Is something wrong with my search function?
My initial guess is that, because you're using the % operator, the similarity threshold required for a match is too high for your queries. The threshold defaults to 0.3, that is, roughly 30% of the distinct trigrams across the two strings must be shared. If the threshold isn't met, no rows are returned.
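To see why a one-letter term such as "a" (the parameter in the logged query) finds nothing, here is a rough Python approximation of how pg_trgm scores two strings, following its documented rules: non-alphanumeric characters separate words, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. It assumes ASCII input and the default case-insensitive build; it is not the extension's C implementation.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction for ASCII text."""
    grams = set()
    # Non-alphanumeric characters separate words; input is lowercased.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


for name in ["a movie", "Ubuntu 22.04", "a"]:
    print(name, similarity(name, "a"))
```

A one-letter word contributes only two trigrams, so even a title that contains "a" as a word, like "a movie", scores 2/8 = 0.25, below the default threshold of 0.3.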
If that is the issue, the threshold is configurable in a couple of ways: you can use set_limit (docs here), or set the limit on a per-query basis.
The set_limit option can be a bit of a hassle, as it has to be set again on every connection. Ecto (through db_connection) has an after_connect option for running a callback on each new connection (docs here).
To change the limit per query, you can use the similarity function in the where clause, like this:
def search(query, search_term, limit \\ 0.3) do
  from(u in query,
    where: fragment("similarity(?, ?) > ?", u.name, ^search_term, ^limit),
    order_by: fragment("similarity(?, ?) DESC", u.name, ^search_term))
end
To start, I would try that with a limit of zero to see if you get any results.
I'm trying to replicate the following query using Squeryl.
SELECT c.order_number,p.customer,p.base,(
SELECT sum(quantity) FROM "Stock" s where s.base = p.base
) as stock
FROM "Card" c, "Part" p WHERE c."partId" = p."idField";
I have the following code for selecting the Cards and Parts, but I cannot see a way to add a summation to the select clause.
from(cards, parts)((c, p) =>
  where(c.partId === p.id)
  select(c, p)
)
Any help is much appreciated!
In Squeryl, you can use any Queryable object in the from clause of your query. So, to create a subquery, something like the following should work for you:
def subQuery = from(stock)(s => groupBy(s.base) compute(sum(s.quantity)))
from(cards, parts, subQuery)((c, p, sq) =>
  where(c.partId === p.idField and sq.key === p.base)
  select(c.orderNumber, p.customer, sq.measures))
Of course, the field names may vary slightly; I'm just guessing at the class definitions. If you want the whole object for cards and parts instead of the single fields from the original query, just change the select clause to: select(c, p, sq.measures)
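A Python/sqlite3 sketch with toy data compares the question's correlated subquery with the grouped-subquery join the answer builds in Squeryl. It uses plain SQL rather than Squeryl, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Card (order_number INTEGER, partId INTEGER);
    CREATE TABLE Part (idField INTEGER, customer TEXT, base TEXT);
    CREATE TABLE Stock (base TEXT, quantity INTEGER);

    INSERT INTO Part VALUES (1, 'acme', 'steel'), (2, 'globex', 'brass');
    INSERT INTO Card VALUES (100, 1), (101, 2);
    -- Only 'steel' has stock rows; 'brass' has none.
    INSERT INTO Stock VALUES ('steel', 5), ('steel', 7);
""")

# The question's form: a scalar subquery evaluated per row.
correlated = conn.execute("""
    SELECT c.order_number, p.customer, p.base,
           (SELECT sum(quantity) FROM Stock s WHERE s.base = p.base) AS stock
    FROM Card c, Part p WHERE c.partId = p.idField
    ORDER BY c.order_number
""").fetchall()

# The answer's form: join against a GROUP BY subquery.
grouped_join = conn.execute("""
    SELECT c.order_number, p.customer, p.base, sq.total
    FROM Card c, Part p,
         (SELECT base, sum(quantity) AS total FROM Stock GROUP BY base) sq
    WHERE c.partId = p.idField AND sq.base = p.base
    ORDER BY c.order_number
""").fetchall()
print(correlated, grouped_join)
```

The two are not quite equivalent: the correlated subquery keeps cards whose base has no stock rows (stock is NULL), while the inner join against the grouped subquery drops them. A left outer join against the grouped subquery keeps those rows.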