Intersect two queries with different filters - druid

I use Druid for monitoring events in my website.
The data can be represented as follows:
event_id | country | user_id | event_type
================================================
1        | USA     | id1     | visit
2        | USA     | id2     | visit
1        | Canada  | id3     | visit
3        | USA     | id1     | click
1        | Canada  | id4     | visit
3        | Canada  | id3     | click
3        | USA     | id2     | click
I also defined an aggregation for counting events.
I query Druid to present the data for event_id=3 as follows (note that the visits are counted regardless of the event_id):
country | visits | clicks
===============================
USA     | 4      | 2
Canada  | 3      | 2
Currently I use two topN queries with two different filters:
event_type = visit -> to count visits per country, regardless of the event_id.
event_id = 3 -> to count clicks for that event per country.
Of course my data is much larger than that and contains many countries.
The topN API requires a threshold parameter, which is the maximum number of results I want back in the response.
The problem is that if my threshold is smaller than the actual number of results, the two queries might not return the same set of countries.
Currently I merge the overlapping results on my server, but I lose some countries and display fewer results than my threshold even though more exist.
How can I guarantee that both queries return the same countries up to my threshold (without passing the list of countries returned by the first query as a filter to the second query - I tried that and it was very slow)?

It sounds like a filtered aggregator will save you the extra query.
A filtered aggregator aggregates only the rows that match its dimension filter.
A query along the following lines will do the trick in your case:
After Druid groups all events under the countries (because the dimension is country), the aggregator's filter keeps only the events whose event_id is in ("1", "2") and applies the count aggregator to the filtered rows.
{
  ...
  "dimension": "country",
  ...,
  "aggregations": [
    {
      "type": "filtered",
      "filter": {
        "type": "in",
        "dimension": "event_id",
        "values": ["1", "2"]
      },
      "aggregator": {
        "type": "count",
        "name": "count_countries"
      }
    }
  ]
}
Let's take your table.
event_id | country | user_id | event_type
================================================
1        | USA     | id1     | visit
2        | USA     | id2     | visit
1        | Canada  | id3     | visit
3        | USA     | id1     | click
1        | Canada  | id4     | visit
3        | Canada  | id3     | click
3        | USA     | id2     | click
Druid will group the results by country.
country | user_id | event_type | event_id
================================================
USA     | id1     | visit      | 1
USA     | id2     | visit      | 2
USA     | id1     | click      | 3
USA     | id2     | click      | 3
Canada  | id3     | visit      | 1
Canada  | id4     | visit      | 1
Canada  | id3     | click      | 3
The aggregator's filter will remove all rows with event_id=3, because of our filter ("values": ["1", "2"]):
country | user_id | event_type | event_id
================================================
USA     | id1     | visit      | 1
USA     | id2     | visit      | 2
Canada  | id3     | visit      | 1
Canada  | id4     | visit      | 1
and return the following result (our aggregator is a simple count):
country | count
===================
USA     | 2
Canada  | 2
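To apply this to your original question, you can combine both counts in a single topN query with two filtered aggregators, so the threshold is applied once and both metrics are returned for the exact same set of countries. Here is a rough sketch (the dataSource name, intervals, threshold and the plain count aggregators are assumptions - adjust them to your actual schema and counting aggregation):
{
  "queryType": "topN",
  "dataSource": "events",
  "dimension": "country",
  "threshold": 100,
  "metric": "visits",
  "granularity": "all",
  "intervals": ["2016-01-01/2016-02-01"],
  "aggregations": [
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "event_type", "value": "visit" },
      "aggregator": { "type": "count", "name": "visits" }
    },
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "event_id", "value": "3" },
      "aggregator": { "type": "count", "name": "clicks" }
    }
  ]
}
Because both metrics come from one query ranked by a single metric ("visits" here), the threshold cuts one ordered list and the same countries always appear in both columns.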
Enjoy!

Related

Crystal Reports: group by one field, sort by another

I have an "Orders" table:
+---------+-------------+
| OrderID | InvoiceDate |
+---------+-------------+
| 1 | 15/02/2022 |
| 123 | 20/01/2022 |
+---------+-------------+
and a "Rows" table:
+---------+-------+--------+
| OrderID | RowID | Value |
+---------+-------+--------+
| 1 | 1 | 100,00 |
| 1 | 2 | 200,00 |
| 1 | 3 | 50,00 |
| 123 | 1 | 10,00 |
| 123 | 2 | 20,00 |
+---------+-------+--------+
As shown in the example, it may happen that an order with a higher OrderID value has a lower InvoiceDate value.
In my report I would like to show each order, along with the sum of each row's value, ordered by date:
+-------------+---------+--------+
| InvoiceDate | OrderID | Value |
+-------------+---------+--------+
| 20/01/2022 | 123 | 30,00 |
| 15/02/2022 | 1 | 350,00 |
+-------------+---------+--------+
My problem is that in order to create an OrderValue formula field with Sum({Rows.Value}, {Orders.OrderID}), I first need to group by Rows.OrderID.
But this way the groups are sorted by OrderID, and I don't know how to sort them by date.
Add a group summary of Maximum (or Minimum, or Average) of the invoice date per Order ID.
Then go to the menu option Report > Group Sort Expert...
and sort the groups by that summary.
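If you prefer a named formula you can point the Group Sort Expert at, a formula in the same style as the Sum in the question should also work (a sketch, assuming the report group is on {Rows.OrderID} as described above):
// hypothetical formula field, e.g. {@MaxInvoiceDate}
Maximum({Orders.InvoiceDate}, {Rows.OrderID})
Then sort the groups by that value in Report > Group Sort Expert...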

Flatten Postgres left join query result with dynamic values into one row

I have two tables, products and product_attributs. One product can have one or many attributes, and these are filled in through a dynamic web form (name and value inputs) added by the user as needed. For example, for a drill the user could decide to add two attributes: color=blue and power=100 watts. For another product it could be 3 or more different attributes, and yet another could have no special attributes at all.
products
| id | name | identifier | identifier_type | active
| ----------|--------------|-------------|------------------|---
| 1 | Drill | AD44 | barcode | true
| 2 | Polisher | AP211C | barcode | true
| 3 | Jackhammer | AJ2133 | barcode | false
| 4 | Screwdriver | AS4778 | RFID | true
product_attributs
|id | name | value | product_id
|----------|--------------|-------------|----------
|1 | color | blue | 1
|2 | power | 100 watts | 1
|3 | size | 40 cm | 2
|4 | energy | electrical | 3
|4 | price | 35€ | 3
So attributes can be anything, set dynamically by the user. My need is to generate a CSV report which contains all products with their attributes. Without much experience in SQL, I came up with the following basic query:
SELECT pr.name, pr.identifier_type, pr.identifier, pr.active, att.name, att.value
FROM products as pr
LEFT JOIN product_attributs att ON pr.id = att.product_id
As you know, the result will contain as many rows per product as it has attributes, and this is not ideal for reporting. The ideal would be this:
|name | identifier_type | identifier | active | name | value | name | value
|-----------|-----------------|------------|--------|--------|-------|------ |------
|Drill | barcode | AD44 | true | color | blue | power | 100 w
|Polisher | barcode | AP211C | true | size | 40 cm | null | null
|Jackhammer | barcode | AJ2133 | true | energy | elect | price | 35 €
|Screwdriver| barcode | AS4778 | true | null | null | null | null
Here I only showed a maximum of two attributes per product, but there could be more if needed. I did some research and came across pivoting with the crosstab function in Postgres, but the problem is that it requires static values, which does not match my need.
Thanks a lot for your help, and sorry for any duplicates.
Thanks Laurenz Albe for your help. array_agg solved my problem. Here is the query, in case someone is interested:
SELECT
    pr.name, pr.description, pr.identifier_type, pr.identifier,
    pr.internal_identifier, pr.active,
    ARRAY_TO_STRING(ARRAY_AGG(oa.name || ' = ' || oa.value), ', ') AS attributs
FROM
    products pr
    LEFT JOIN product_attributs oa ON pr.id = oa.product_id
GROUP BY
    pr.name, pr.description, pr.identifier_type, pr.identifier,
    pr.internal_identifier, pr.active
ORDER BY
    pr.name;
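For reference, a minimal version of the same idea against only the sample columns shown in the question (description and internal_identifier left out, since they are not in the sample schema):
SELECT
    pr.name, pr.identifier_type, pr.identifier, pr.active,
    ARRAY_TO_STRING(ARRAY_AGG(att.name || ' = ' || att.value), ', ') AS attributs
FROM products pr
LEFT JOIN product_attributs att ON pr.id = att.product_id
GROUP BY pr.name, pr.identifier_type, pr.identifier, pr.active
ORDER BY pr.name;
On the sample data this returns one row per product, e.g. "color = blue, power = 100 watts" for the drill, and an empty attributs string for the screwdriver, because the LEFT JOIN produces only NULLs for it and ARRAY_TO_STRING skips them.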

T-SQL Update Query to enter recurring data in groups of three

I have a table of data; it is duplicated twice in the same table to make three sets.
"ReferenceID" is its key. I want to group the 3 rows with the same ReferenceID and inject the three values "f2f", "NF2F", "Travel" into the column called "Type", in any order, but ensure that within each ReferenceID every row gets a different one of those values.
For Example:
ReferenceID | Type
------------|-------
1           | f2f
1           | nf2f
1           | Travel
2           | f2f
2           | nf2f
2           | Travel
3           | f2f
3           | nf2f
3           | Travel
etc etc...
Is it possible ?
You can do this with a row_number that you mod by the number of groups you have (in your case 3):
declare @t table(RefType varchar(10));
insert into @t values ('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel'),('f2f'),('nf2f'),('Travel');

select (row_number() over (order by RefType) % 3) + 1 as ReferenceID
      ,RefType
from @t
order by ReferenceID
        ,RefType;
Output
+-------------+---------+
| ReferenceID | RefType |
+-------------+---------+
| 1 | f2f |
| 1 | nf2f |
| 1 | Travel |
| 2 | f2f |
| 2 | nf2f |
| 2 | Travel |
| 3 | f2f |
| 3 | nf2f |
| 3 | Travel |
+-------------+---------+
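If you need to write these values back into your real table rather than just generate them, one option (a sketch; dbo.YourTable is a placeholder for the actual table name) is to number the three copies of each ReferenceID and update through a CTE:
;with numbered as
(
    select [Type],
           row_number() over (partition by ReferenceID order by (select null)) as rn
    from dbo.YourTable -- placeholder for your actual table
)
update numbered
set [Type] = case rn when 1 then 'f2f' when 2 then 'nf2f' else 'Travel' end;
The arbitrary order by (select null) simply hands out 1, 2, 3 within each ReferenceID, so every ReferenceID ends up with exactly one row of each type.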

Sort partitions and rows inside the partitions

I am following this tutorial:
http://www.postgresqltutorial.com/postgresql-window-function/
I'm looking at a case that is not described in the tutorial, and I haven't found a solution.
At one point in the tutorial, the following SELECT query is used to display the products grouped by group name, with their prices sorted in ascending order within each group.
The query is:
SELECT
    product_name,
    group_name,
    price,
    ROW_NUMBER () OVER (
        PARTITION BY group_name
        ORDER BY price
    )
FROM
    products
    INNER JOIN product_groups USING (group_id);
I would like to keep the rows sorted by price within each group, as in the example, AND to sort the partitions by group name in descending alphabetical order (Tablet, then Smartphone, then Laptop).
How can I modify the query to obtain this result?
ORDER BY can be followed by a comma-separated list of sort_expressions. Use ASC or DESC to set the sort direction for each expression. ASC (ascending order) is the default sort direction.
Thus, you could use ORDER BY group_name DESC, price:
SELECT
    product_name,
    group_name,
    price,
    ROW_NUMBER () OVER (
        PARTITION BY group_name
        ORDER BY group_name DESC, price
    )
FROM
    products
    INNER JOIN product_groups USING (group_id);
yields
| product_name | group_name | price | row_number |
|--------------------+------------+---------+------------|
| Kindle Fire | Tablet | 150.00 | 1 |
| Samsung Galaxy Tab | Tablet | 200.00 | 2 |
| iPad | Tablet | 700.00 | 3 |
| Microsoft Lumia | Smartphone | 200.00 | 1 |
| HTC One | Smartphone | 400.00 | 2 |
| Nexus | Smartphone | 500.00 | 3 |
| iPhone | Smartphone | 900.00 | 4 |
| Lenovo Thinkpad | Laptop | 700.00 | 1 |
| Sony VAIO | Laptop | 700.00 | 2 |
| Dell Vostro | Laptop | 800.00 | 3 |
| HP Elite | Laptop | 1200.00 | 4 |
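One caveat worth noting: the ORDER BY inside OVER () only controls how ROW_NUMBER is computed within each partition; it does not guarantee the order of the rows in the final result set. To reliably get the output above, add an explicit ORDER BY to the outer query as well, for example:
SELECT
    product_name,
    group_name,
    price,
    ROW_NUMBER () OVER (
        PARTITION BY group_name
        ORDER BY price
    )
FROM
    products
    INNER JOIN product_groups USING (group_id)
ORDER BY
    group_name DESC, price;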

Using HiveQL, how do I pull the row with the highest integer?

I have a table with a few million rows of data that looks like this:
+---------------+--------------+-------------------+
| page | search_term | interactions |
+---------------+--------------+-------------------+
| /mom | pizza | 15 |
| /dad | pizza | 8 |
| /uncle | pizza | 2 |
| /brother | pizza | 7 |
| /mom | pasta | 12 |
| /dad | pasta | 23 |
+---------------+--------------+-------------------+
My goal is to run a HiveQL Query that will return the largest 'interactions' number for each unique page/term combo. For example:
+---------------+--------------+-------------------+
| page | search_term | interactions |
+---------------+--------------+-------------------+
| /dad | pasta | 23 |
| /mom | pizza | 15 |
+---------------+--------------+-------------------+
How would I write this considering that each unique page has hundreds of thousands of search_terms, but I only want to pull the one search_term with the most interactions?
I have tried using max(interactions) and max(struct(interactions, search_term)).col1 but have had no luck. My output is consistently giving me all of the search_terms for each page no matter how many interactions.
Thanks!
Use the row_number() analytic function:
select page, search_term, interactions
from
(
    select page, search_term, interactions,
           row_number() over (partition by page order by interactions desc) rn
    from your_table -- placeholder: the question does not name the source table
) s
where rn = 1;
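Alternatively, the struct trick you already tried can work if you group by page only, because Hive compares structs field by field, so max() keeps the row with the highest interactions. A sketch, again with a placeholder table name:
select page,
       max(struct(interactions, search_term)).col1 as interactions,
       max(struct(interactions, search_term)).col2 as search_term
from your_table -- placeholder: replace with your table name
group by page;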