OrientDB query - Find all the connections leading to a vertex

I have the following use case in OrientDB.
(V = Vertex, E = Edge)
FacebookAccount(V) -> OwnedBy(E) ->Human(V)
TwitterAccount(V) -> OwnedBy(E) ->Human(V)
TwitterAccount(V) OR FacebookAccount(V) -> FriendOf(E) -> TwitterAccount(V) OR FacebookAccount(V)
I want to see how all the Humans are connected to a specific Human.
1. I don't want to see a FacebookAccount or TwitterAccount that is NOT owned by a Human; this should be hidden: #15:2.
2. I don't want to see a FacebookAccount or TwitterAccount that is OwnedBy a Human but does not lead to another Human; these should be hidden: #17:0, #18:0, #17:1.
I have the following query:
traverse in("FriendOf"),out("FriendOf"), in("OwnedBy"),out("OwnedBy") from #19:0 while $depth <= 10
The problem is that this query also returns Accounts that are not owned by anyone, and I want to filter those out.
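One way to handle the first case (a sketch, untested against this data set) is to wrap the traverse in a SELECT and keep only Humans and accounts that actually have an outgoing OwnedBy edge:
SELECT FROM (
  TRAVERSE in("FriendOf"), out("FriendOf"), in("OwnedBy"), out("OwnedBy")
  FROM #19:0 WHILE $depth <= 10
) WHERE @class = 'Human' OR out("OwnedBy").size() > 0
Note this only hides unowned accounts (requirement 1); owned accounts whose traversal dead-ends before reaching another Human (requirement 2) would still need an additional filter.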
Here is the full script to create the database with all the data:
create class FacebookAccount IF NOT EXISTS extends V;
create class TwitterAccount IF NOT EXISTS extends V;
create class Human IF NOT EXISTS extends V;
create class OwnedBy IF NOT EXISTS extends E;
create class FriendOf IF NOT EXISTS extends E;
/* A */
create VERTEX FacebookAccount set user='A_FB';
create VERTEX TwitterAccount set user='A_TW';
create VERTEX Human set person='A';
CREATE EDGE OwnedBy FROM (Select from FacebookAccount where user='A_FB') TO (Select from Human where person='A');
CREATE EDGE OwnedBy FROM (Select from TwitterAccount where user='A_TW') TO (Select from Human where person='A');
/* B */
create VERTEX FacebookAccount set user='B_FB';
create VERTEX TwitterAccount set user='B_TW';
create VERTEX Human set person='B';
CREATE EDGE OwnedBy FROM (Select from FacebookAccount where user='B_FB') TO (Select from Human where person='B');
CREATE EDGE OwnedBy FROM (Select from TwitterAccount where user='B_TW') TO (Select from Human where person='B');
/* C */
create VERTEX FacebookAccount set user='C_FB';
create VERTEX TwitterAccount set user='C_TW';
create VERTEX Human set person='C';
CREATE EDGE OwnedBy FROM (Select from FacebookAccount where user='C_FB') TO (Select from Human where person='C');
CREATE EDGE OwnedBy FROM (Select from TwitterAccount where user='C_TW') TO (Select from Human where person='C');
/* X */
create VERTEX Human set person='X';
create VERTEX FacebookAccount set user='X_FB';
CREATE EDGE OwnedBy FROM (Select from FacebookAccount where user='X_FB') TO (Select from Human where person='X');
/* Y */
create VERTEX FacebookAccount set user='Y_FB';
CREATE EDGE FriendOf FROM (Select from FacebookAccount where user='Y_FB') TO (Select from FacebookAccount where user='X_FB');
CREATE EDGE FriendOf FROM (Select from FacebookAccount where user='X_FB') TO (Select from FacebookAccount where user='C_FB');
CREATE EDGE FriendOf FROM (Select from FacebookAccount where user='A_FB') TO (Select from FacebookAccount where user='B_FB');
CREATE EDGE FriendOf FROM (Select from FacebookAccount where user='B_FB') TO (Select from FacebookAccount where user='C_FB');


Randomly pick N distinct winners with weights for a raffle

I've been trying to find a solution to this problem for a day now.
So, I have a table (raffle_tickets) from which I want to pick N distinct users as the winners of a raffle, with each user's probability of being picked proportional to the total number of tickets they bought, and insert the winners into raffle_winners.
Now, I've found a solution on SO to pick 1 winner, but not N. (It also has a slight issue: if there is, say, exactly one entry, it is completely random whether it gets picked or not, which is obviously not acceptable.)
In that same answer (and in answers to other questions) I saw a cross join with generate_series being used, but from what it looks like it would pick with replacement (i.e. with duplicates, not distinct), and that's not what I want.
I'm using Postgres/PSQL 14.5.
Here's some of the table structure:
/* Table with raffle tickets. Each user might have multiple entries in it for the same raffle */
CREATE TABLE IF NOT EXISTS raffle_tickets (
id SERIAL PRIMARY KEY,
raffle_id BIGINT REFERENCES raffles(id),
user_id BIGINT NOT NULL,
num_tickets INT NOT NULL,
date TIMESTAMP NOT NULL DEFAULT NOW()
);
/* Winners of raffles. Selected based on distinct users and weights from `raffle_tickets` */
CREATE TABLE IF NOT EXISTS raffle_winners (
id SERIAL PRIMARY KEY,
raffle_id BIGINT REFERENCES raffles(id),
user_id BIGINT NOT NULL,
probability FLOAT NOT NULL,
CONSTRAINT user_winner_once_per_raffle UNIQUE(raffle_id, user_id) /* One user may not be picked more than once as a winner of a raffle */
);
/* Simplified table, in reality it has more fields */
CREATE TABLE IF NOT EXISTS raffles (
id SERIAL PRIMARY KEY,
num_max_winners INT NOT NULL
);
The code I wrote (below) is based on this answer if anyone is interested.
WITH users_and_weights AS (
SELECT
DISTINCT(user_id),
SUM(num_tickets) AS weight
FROM
raffle_tickets
WHERE
raffle_id=$1
GROUP BY
user_id
), p AS ( /* probability */
SELECT
*,
(weight::float / SUM(weight) OVER ()) AS probability
FROM
users_and_weights
), cp AS ( /* cumulative probability */
SELECT
*,
SUM(p.probability) OVER (
ORDER BY probability DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cum_probability
FROM
p
), fp AS ( /* final probability */
SELECT
*,
cum_probability - probability AS start_probability,
cum_probability AS end_probability
FROM
cp
)
INSERT INTO
raffle_winners (user_id, raffle_id, probability)
SELECT
user_id,
$1 AS raffle_id,
probability
FROM
fp
WHERE
random() BETWEEN start_probability AND end_probability
LIMIT
(SELECT num_max_winners FROM raffles WHERE id=$1)
You are making this more complicated than necessary.
This is simplified for a single raffle:
with gen_tickets as (
-- Use `generate_series()` to create a row for each ticket
select user_id
from raffle_tickets
cross join lateral generate_series(1, num_tickets)
), shuffle as (
select user_id, row_number() over (order by random()) as rn
from gen_tickets
), min_row as (
-- Limit to one win per user
select user_id, min(rn) as rn
from shuffle
group by user_id
), winner_order as (
select user_id, row_number() over (order by rn) as rn
from min_row
)
select *
from winner_order
where rn <= <num_max_winners>
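For completeness, a non-simplified sketch (assuming the raffle id is bound as $1) that filters to one raffle and pulls the limit from the raffles table:
with gen_tickets as (
-- one row per ticket
select user_id
from raffle_tickets
cross join lateral generate_series(1, num_tickets)
where raffle_id = $1
), shuffle as (
select user_id, row_number() over (order by random()) as rn
from gen_tickets
), min_row as (
-- limit to one win per user
select user_id, min(rn) as rn
from shuffle
group by user_id
)
select user_id
from min_row
order by rn
limit (select num_max_winners from raffles where id = $1)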
To those interested, here's what I ended up using.
(It's not perfect, but I already spent way too much time on it, so if anyone feels like fixing it up, please do.)
(As a side note, is there a good way to pass query parameters to a function block like this? I'm using asyncpg (Python).)
DO
$do$
DECLARE
num_uniq_participants INTEGER;
num_max_winners_to_select INTEGER;
BEGIN
num_max_winners_to_select := (
SELECT
num_max_winners
FROM
raffles
WHERE
id={raffle_id}
);
num_uniq_participants := (
SELECT
COUNT(*)
FROM (
SELECT
DISTINCT(user_id)
FROM
raffle_tickets
WHERE
raffle_id={raffle_id}
) AS q
);
IF (num_max_winners_to_select >= num_uniq_participants) THEN
/* There are fewer participants than the required number of winners, so everyone is a winner */
INSERT INTO
raffle_winners(user_id, raffle_id, probability)
SELECT
DISTINCT(user_id),
{raffle_id} AS raffle_id,
1 AS probability
FROM
raffle_tickets
WHERE
raffle_id={raffle_id};
ELSE
/**
* Pick winners.
* Each iteration the winners are excluded from the
* newly pickable participant list.
**/
/**
* TODO:
* Right now this isn't super efficient, as we always re-calculate
* the weight of each participant in each iteration.
* For now it's okay, but something to keep in mind in the future.
* (Though, unless there are hundreds of thousands of participants, it shouldn't be too bad.)
**/
FOR i IN 1..LEAST(num_max_winners_to_select, num_uniq_participants) LOOP
WITH users_and_weights AS (
SELECT
DISTINCT(user_id),
SUM(num_tickets) AS weight
FROM
raffle_tickets rt
WHERE
NOT EXISTS ( /* Don't re-pick winners */
SELECT
1
FROM
raffle_winners rw
WHERE
rw.user_id=rt.user_id AND rw.raffle_id=rt.raffle_id
) AND raffle_id={raffle_id}
GROUP BY
user_id
), p AS ( /* probability */
SELECT
*,
(weight::float / SUM(weight) OVER ()) AS probability
FROM
users_and_weights
), cp AS ( /* cumulative probability */
SELECT
*,
SUM(p.probability) OVER (
ORDER BY probability DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cum_probability
FROM
p
), fp AS ( /* final probability */
SELECT
*,
cum_probability - probability AS start_probability,
cum_probability AS end_probability
FROM
cp
), const_rnd AS (
/**
* Must put this into a CTE, otherwise it's re-evaluated for
* each row and might cause no entry to be selected at all.
**/
SELECT RANDOM() AS RND
)
INSERT INTO
raffle_winners(user_id, raffle_id, probability)
SELECT
user_id,
{raffle_id} AS raffle_id,
probability
FROM
fp
WHERE
(SELECT rnd FROM const_rnd) BETWEEN start_probability AND end_probability
LIMIT
1; /* Pick 1 winner per iteration */
END LOOP;
END IF;
END
$do$;

Script Activity in Data Factory

I have a Script Activity with the following sql script in my data pipeline as follows:
@concat('ALTER TABLE tbl',replace(pipeline().RunId,'-',''),' ADD Depth int; WITH emp AS ( SELECT *, 1 AS d FROM tbl',replace(pipeline().RunId,'-',''),' WHERE Email = ','EMAIL',' UNION ALL SELECT e.*, emp.d + 1 FROM tbl',replace(pipeline().RunId,'-',''),' e INNER JOIN emp ON e.ReportsToPersonnelNbr = emp.PersonnelNumber), ForUpd as (SELECT PersonnelNumber, d FROM emp) UPDATE tbl',replace(pipeline().RunId,'-',''),' SET Depth = B.d FROM tbl',replace(pipeline().RunId,'-',''),' A JOIN ForUpd B ON A.PersonnelNumber = B.PersonnelNumber')
I see this error on running pipeline:
Operation on target Add Depth Column failed: Invalid column name 'EMAIL'.
What am I missing?
Converting the dynamic content that you have written to a SQL script, the query would be as follows:
ALTER TABLE <table_name> ADD Depth int; WITH emp AS ( SELECT *, 1 AS d FROM <table_name> WHERE Email =EMAIL UNION ALL SELECT e.*, emp.d + 1 FROM <table_name> e INNER JOIN emp ON e.ReportsToPersonnelNbr = emp.PersonnelNumber), ForUpd as (SELECT PersonnelNumber, d FROM emp) UPDATE <table_name> SET Depth = B.d FROM <table_name> A JOIN ForUpd B ON A.PersonnelNumber = B.PersonnelNumber
According to the error message, the problem is with Email = EMAIL. This is because an EMAIL column does not exist. If EMAIL is one of the values of the Email column, then it has to be enclosed in single quotes.
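That is, the generated predicate needs to come out as a string-literal comparison, something like:
WHERE Email = 'EMAIL'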
Instead of using @concat() and complicating the process of writing the query, you can use string interpolation. Using @{...}, you can embed dynamic content within a string. Look at the following example.
I am creating a table using string interpolation in the following demonstration.
create table @{variables('table_name')} (id int, name varchar(20))
So, first use a Set Variable activity to store the table name in a my_table variable with the value:
@concat('tbl',replace(pipeline().RunId,'-',''))
Now replace your dynamic content in script activity with the following:
ALTER TABLE @{variables('my_table')} ADD Depth int; WITH emp AS ( SELECT *, 1 AS d FROM @{variables('my_table')} WHERE Email ='EMAIL' UNION ALL SELECT e.*, emp.d + 1 FROM @{variables('my_table')} e INNER JOIN emp ON e.ReportsToPersonnelNbr = emp.PersonnelNumber), ForUpd as (SELECT PersonnelNumber, d FROM emp) UPDATE @{variables('my_table')} SET Depth = B.d FROM @{variables('my_table')} A JOIN ForUpd B ON A.PersonnelNumber = B.PersonnelNumber

INSERT INTO a table with serial using SELECT *

In Postgres, I have a table with many columns (e.g. t1(a, b, c, ..., z)). I need to copy a subset of it through a select-from-where statement into a new table (e.g. t2), but this new table must have a serial attribute. So, t2 would look like t2(id, a, b, c, ..., z), where id is the serial attribute. In Postgres, this works:
INSERT INTO t2(a, b, c, d, ..., z)
SELECT *
FROM t1
WHERE <condition>
However, is it possible to achieve the same without writing all the attributes of t1?
You can define a view that is a simple SELECT of all but the serial column.
Such views are updatable in PostgreSQL, so you can use one as the target for your INSERT.
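A minimal sketch of that idea, with hypothetical columns a, b, c standing in for the real ones:
CREATE VIEW t2_no_id AS SELECT a, b, c FROM t2; -- every column of t2 except the serial id
INSERT INTO t2_no_id SELECT * FROM t1 WHERE <condition>; -- id is filled in from its sequence
You list the columns once when defining the view, but every subsequent INSERT can stay column-free.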
In addition to Laurenz's answer, it's worth noting that you can call the next value for each record in your serial sequence within your insert.
One way you could do it requires that you know the name of your sequence beforehand. By default the naming convention for a serial sequence will be tablename_id_seq where tablename in this case would be t2.
INSERT INTO t2
SELECT
nextval('t2_id_seq')
, t1.*
FROM t1
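If you'd rather not hardcode the sequence name, pg_get_serial_sequence (covered below) can resolve it at run time; the same insert would then look like:
INSERT INTO t2
SELECT
nextval(pg_get_serial_sequence('t2', 'id'))
, t1.*
FROM t1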
For more details on dealing with sequences:
Auto-generated sequences will adhere to the pattern ${table}_${column}_seq.
You can find all sequences by running the following queries:
/* Version 10+ */
SELECT
*
FROM pg_sequences -- Not to be confused with `pg_sequence`
WHERE sequencename LIKE '%t2%'
;
/* Version 9.5+ */
-- Returns the sequences associated with a table
SELECT
pg_get_serial_sequence('schema.tablename', 'columnname')
;
-- Returns sequences accessible to the user, not those owned by the user
SELECT
*
FROM information_schema.sequences
WHERE sequence_name LIKE '%t2%'
;
-- Return sequences owned by the current user
SELECT
n.nspname AS sequence_schema,
c.relname AS sequence_name,
u.usename AS owner
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_user u ON u.usesysid = c.relowner
WHERE c.relkind = 'S'
AND u.usename = current_user;
/* Version 8.1+ */
-- Returns sequences accessible to the user, not those owned by the user
SELECT
relname
FROM pg_class
WHERE relkind = 'S' -- 'S' for sequence
;

How do I use CROSS APPLY in this situation?

I have an inline TVF that accepts a primary key of a table and computes a value out of the row with that primary key (actually it returns a table with that value as part of the select, but whatever).
Now I want to do something like this:
SELECT something
FROM table1
CROSS APPLY thefunction(table1primarykey) func
ON func.computedvalue = func.computedvalue(table2primarykey)
The problem is that I haven't brought table2 in yet, and I can't, because the only way table1 and table2 are related is via the same function's return value.
How can I do something like this?
How about
SELECT *
FROM (
SELECT *
FROM table1
CROSS APPLY thefunction(table1primarykey)
) AS t1
INNER JOIN (
SELECT *
FROM table2
CROSS APPLY thefunction(table2primarykey)
) AS t2 ON t1.computedvalue = t2.computedvalue
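For context, this assumes an inline TVF shaped roughly like the following (a hypothetical definition, since the original wasn't posted):
CREATE FUNCTION dbo.thefunction (@pk int)
RETURNS TABLE
AS
RETURN
(
SELECT somecolumn AS computedvalue
FROM sometable
WHERE primarykeycolumn = @pk
);
Because the join key only exists as the function's output, applying the function to each side and joining on the results is the natural shape.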

How can I extract the values from a record as individual columns in PostgreSQL

How can I extract the values from a record as individual columns in PostgreSQL? Here is my query:
SELECT
p.*,
(SELECT ROW(id,server_id,format,product_id) FROM products_images pi WHERE pi.product_id = p.id LIMIT 1) AS image
FROM products p
WHERE p.company = 1 ORDER BY id ASC LIMIT 10
Instead of
image
(3, 4, "jpeg", 7)
I would like to have
id | server_id | format | product_id
3 | 4 | jpeg | 7
Is there any way of selecting only one image for each product and return the columns directly instead of a record?
Try this:
create type xxx as (t varchar, y varchar, z int);
with a as
(
select row(table_name, column_name, (random() * 100)::int) x
from information_schema.columns
)
-- cannot cast directly to xxx, should cast to text first
select (x::text::xxx).t, (x::text::xxx).y, (x::text::xxx).z
from a
Alternatively, you can do this:
with a as
(
select row(table_name, column_name, (random() * 100)::int) x
from information_schema.columns
),
-- cannot cast directly to xxx, should cast to text first
b as (select x::text::xxx as w from a)
select
(w).t, (w).y, (w).z
from b
To select all fields:
with a as
(
select row(table_name, column_name, (random() * 100)::int) x
from information_schema.columns
),
-- cannot cast directly to xxx, should cast to text first
b as (select x::text::xxx as w from a)
select
(w).*
from b
You can also do the following, but it makes the whole exercise of using ROW pointless, since you could just drop the ROW function and pick the columns back up outside the CTE/derived table. I surmised the OP's ROW came from a function, in which case they should use the code above, not this:
with a as
(
select row(table_name, column_name, (random() * 100)::int)::xxx x
from information_schema.columns
)
select
(x).t, (x).y, (x).z
from a
Just specify the components of your struct:
SELECT a,b,c,(image).id, (image).server_id, ...
FROM (
SELECT
p.*,
(SELECT ROW(id,server_id,format,product_id) FROM products_images pi WHERE pi.product_id = p.id LIMIT 1) AS image
FROM products p
WHERE p.company = 1 ORDER BY id ASC LIMIT 10
) as subquery
But anyway, I would rework the query to use a join instead of a subselect.
SELECT DISTINCT ON (p.*) p.*,
p.id, pi.server_id, pi.format, pi.product_id
FROM products p
LEFT JOIN products_images pi ON pi.product_id = p.id
WHERE p.company = 1
ORDER BY id ASC
LIMIT 10
But I believe you have to specify all the p-fields in the distinct separately to ensure just one image is loaded per product.
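Assuming products.id is the primary key, a tighter sketch of the same idea avoids that caveat by being distinct on the key alone:
SELECT DISTINCT ON (p.id)
p.*, pi.id AS image_id, pi.server_id, pi.format
FROM products p
LEFT JOIN products_images pi ON pi.product_id = p.id
WHERE p.company = 1
ORDER BY p.id ASC, pi.id
LIMIT 10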
Try this; it will work with your existing code with minimal modification (if creating a type counts as a minimal modification for you ;-)
create type image_type as (id int, server_id int, format varchar, product_id int);
SELECT
p.*,
( (SELECT ROW(id,server_id,format,product_id)
FROM products_images pi
WHERE pi.product_id = p.id LIMIT 1)::text::image_type ).*
FROM products p
WHERE p.company = 1 ORDER BY id ASC LIMIT 10
Proof-of-concept code:
Create type first:
create type your_type_here as (table_name varchar, column_name varchar)
Actual code:
select
a.b,
( (select row(table_name, column_name)
from information_schema.columns limit 1)::text::your_type_here ).*
from generate_series(1,10) as a(b)
But I guess you should tackle it with a GROUP BY and MAX combo, or use DISTINCT ON like what Daniel has posted.
Every table has an associated composite type of the same name:
https://www.postgresql.org/docs/current/plpgsql-declarations.html#PLPGSQL-DECLARATION-ROWTYPES
So, the following code will work:
drop table if exists "#typedef_image"
;
create temp table "#typedef_image"(
id int,
server_id int,
format text,
product_id int
)
;
select (row(3, 4, 'jpeg', 7)::"#typedef_image").*