Getting the number of rows using a left join in KQL? Function 'row_number' cannot be invoked in current context. Details: the row set must be serialized

I have the following query:
let p1 = pageViews | where url has "xxx";
p1
| join kind=inner (pageViews
| where url !has "xxx")
on session_Id
| project timestamp1, session_Id1, url1, client_CountryOrRegion1, client_StateOrProvince1, client_City1, user_Id1
It gets the users who originated from a certain provider and then looks at which URLs they go to.
I am now trying to count how many users I got from that provider.
I could just take the distinct session_Id values and count them, but what I would like to do is add two columns: one that numbers each session_Id and increments when the session changes, and another that increments for every request made.
I tried:
let p1 = pageViews | where url has "project-management";
p1
| join kind=inner (pageViews
| where url !has "project-management")
on session_Id
| project timestamp1, session_Id1, url1, client_CountryOrRegion1, client_StateOrProvince1, client_City1, user_Id1
| extend Rank=row_number(1)
but it gave me this error:
Function 'row_number' cannot be invoked in current context. Details: the row set must be serialized

The records in the output aren't sorted, so there is no meaningful order for row_number() to follow.
row_number() only works on serialized records, which you get after using order by (or sort by) or the serialize operator.
So the solution to your question is to add | serialize before | extend Rank=row_number(1).
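The two columns described in the question (a session counter that increments whenever session_Id changes, plus a running request number) only make sense once the rows have a fixed order. The following is a minimal Python sketch of that logic, not KQL; `number_rows`, `requests` and the sample session ids are hypothetical. In KQL the session counter would presumably be built after `| serialize` by comparing each row with `prev(session_Id1)`, but treat that as an assumption to verify against the KQL documentation.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (session_id, url) -- hypothetical shape for illustration


def number_rows(rows: List[Row]) -> List[Tuple[str, str, int, int]]:
    """Number the rows of an already ordered ("serialized") row set.

    Each output row is (session_id, url, session_rank, request_number):
    - session_rank starts at 1 and increments whenever session_id differs
      from the previous row's session_id;
    - request_number increments on every row, like row_number(1).
    """
    numbered = []
    session_rank = 0
    previous: Optional[str] = None
    for request_number, (session_id, url) in enumerate(rows, start=1):
        if request_number == 1 or session_id != previous:
            session_rank += 1
            previous = session_id
        numbered.append((session_id, url, session_rank, request_number))
    return numbered


# Sorting first plays the role of `order by` / `serialize`: without a fixed
# order, "the previous row" is undefined and the numbering is meaningless.
requests = [("s2", "/c"), ("s1", "/a"), ("s1", "/b")]
result = number_rows(sorted(requests))
# The last session_rank equals the number of distinct sessions.
sessions_from_provider = result[-1][2] if result else 0
```

Here `result` numbers three requests across two sessions, and `sessions_from_provider` is the count the question is ultimately after.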

Related

ADF - Dataflow, using Join to send new values

There are two tables:
tbl_1, the source data:
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5
6 | A00_6
7 | A00_7
tbl_2, the destination. In this table, Submission_id is a unique key:
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
tbl_1 is the input and tbl_2 is the destination (sink). The expected result is that only A00_5, A00_6 and A00_7 are sent to tbl_2. I set this up with a Join followed by an AlterRow transformation.
Expected output:
tbl_2
ID | Submission_id
--------------------
1 | A00_1
2 | A00_2
3 | A00_3
4 | A00_4
5 | A00_5 -->(new)
6 | A00_6 -->(new)
7 | A00_7 -->(new)
But the output of AlterRow contains all Submission_id values. It should contain only the rows that satisfy the not-equal comparison stated in the alter row condition:
notEquals(DC__Submission_ID_BigInt, SrcStgDestination#{_Submission_ID}).
How can I solve this in an Azure Data Factory data flow using 'Join'?
I tried the same procedure and got the same result (all rows were inserted). The join itself worked as intended, but the alter row step did not produce the required output. You can use the approach below instead, which still relies on a join.
In general, when we want the records from table1 that are not present in table2, we execute the following query (in SQL Server):
select t1.id,t1.submission_id from t1 left outer join t2 on t1.submission_id = t2.submission_id where t2.submission_id is NULL
In the data flow, the join worked the same way as yours. Then, instead of the alter row transformation, I used a filter transformation (to implement the t2.submission_id IS NULL condition) with the following expression:
isNull(d1#submission_id) && isNull(d1#id)
Then configure the sink (tbl_2). The data preview should show only the new records (A00_5, A00_6 and A00_7).
Publish and run the dataflow activity in your pipeline to get the desired results.
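The SQL Server query in this answer is a left anti-join: keep the source rows whose join partner is missing. As a minimal sketch, assuming Python's built-in sqlite3 module as an in-memory stand-in (this is neither ADF nor SQL Server, and `new_submissions` is a hypothetical helper name), the same pattern on the tbl_1/tbl_2 data from the question looks like this:

```python
import sqlite3


def new_submissions(conn: sqlite3.Connection):
    # Left outer join source to destination and keep only source rows with no
    # matching submission_id in the destination (a left anti-join).
    query = """
        SELECT t1.id, t1.submission_id
        FROM tbl_1 AS t1
        LEFT OUTER JOIN tbl_2 AS t2
          ON t1.submission_id = t2.submission_id
        WHERE t2.submission_id IS NULL
        ORDER BY t1.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_1 (id INTEGER PRIMARY KEY, submission_id TEXT)")
conn.execute("CREATE TABLE tbl_2 (id INTEGER PRIMARY KEY, submission_id TEXT UNIQUE)")
conn.executemany("INSERT INTO tbl_1 VALUES (?, ?)",
                 [(i, f"A00_{i}") for i in range(1, 8)])
conn.executemany("INSERT INTO tbl_2 VALUES (?, ?)",
                 [(i, f"A00_{i}") for i in range(1, 5)])

missing = new_submissions(conn)
# Writing only the unmatched rows is what the filter + sink steps achieve.
conn.executemany("INSERT INTO tbl_2 VALUES (?, ?)", missing)
```

Running it a second time would find nothing new, which is the idempotent behaviour the unique key on tbl_2 calls for.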

Aggregate function to extract all fields based on maximum date

In one table I have duplicate values that I would like to group, exporting only the rows where the value in the "published_at" field is the most recent (the latest date possible). Do I understand correctly that when I use the MAX aggregate function, the other fields I extract will come from the row where the max was found, or will they come from the first row found in the table?
Let me demonstrate this with a simple example (in the real-world case I am also joining two different tables). I would like to group by id and extract all fields, but only from the row with the max published_at. My query would be:
SELECT "t1"."id", "t1"."field", MAX("t1"."published_at") as "published_at"
FROM "t1"
GROUP By "t1"."id"
| id | field | published_at |
---------------------------------
| 1 | document1 | 2022-01-10 |
| 1 | document2 | 2022-01-11 |
| 1 | document3 | 2022-01-12 |
The result I want is:
1 - document3 - 2022-01-12
One more question: why am I getting the error "ERROR: column "t1"."field" must appear in the GROUP BY clause or be used in an aggregate function"? Can I use the MAX function on a string column?
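The error occurs because MAX() returns only the maximum value itself; it does not select the other columns from the row where that value was found, so PostgreSQL requires "field" to be grouped or aggregated. (MAX does work on text columns, but it compares them as strings.)
If you want the latest row for each id, you can use PostgreSQL's DISTINCT ON. For example: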
select distinct on (id) *
from t
order by id, published_at desc
If you just want the latest row in the whole result set you can use LIMIT. For example:
select *
from t
order by published_at desc
limit 1
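DISTINCT ON is PostgreSQL-specific. To show the same "latest row per id" idea in a runnable, self-contained form, this sketch swaps in a different technique: a ROW_NUMBER() window function, run through Python's sqlite3 module (window functions need SQLite 3.25 or newer, an assumption about the local build). `latest_per_id` and the rows for id 2 are hypothetical, added only for illustration.

```python
import sqlite3


def latest_per_id(conn: sqlite3.Connection):
    # SQLite has no DISTINCT ON, so number each id's rows from newest to oldest
    # published_at and keep the first one per id. ISO 'YYYY-MM-DD' strings
    # sort chronologically, so ordering the text column works here.
    query = """
        SELECT id, field, published_at
        FROM (
            SELECT id, field, published_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY id ORDER BY published_at DESC
                   ) AS rn
            FROM t1
        )
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, field TEXT, published_at TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    (1, "document1", "2022-01-10"),
    (1, "document2", "2022-01-11"),
    (1, "document3", "2022-01-12"),
    (2, "document4", "2022-02-01"),  # illustrative second id
    (2, "document5", "2022-01-15"),
])
```

For id 1 this returns document3 with 2022-01-12, the result the question asks for; every column comes from the same row because whole rows are ranked, not individual values.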

Select by id and generate column with relationships in array

Essentially, I want to get a track by id from "Tracks", but I also want to get its relations to other tracks (stored in the "Remixes" table).
I can write a simple query that gets the track I want by id, e.g.:
SELECT * FROM "Tracks" WHERE id IN ('track-id1');
That gives me:
id | dateModified | channels | userId
-----------+---------------------+-----------------+--------
track-id1 | 2019-07-21 12:15:46 | {"some":"json"} | 1
But this is what I want to get:
id | dateModified | channels | userId | remixes
-----------+---------------------+-----------------+--------+---------
track-id1 | 2019-07-21 12:15:46 | {"some":"json"} | 1 | track-id2, track-id3
So I want to generate a column called "remixes" containing an array of ids, based on the data available in the "Remixes" table, in a single SELECT query.
Here is example data and database structure:
http://sqlfiddle.com/#!17/ec2e6/3
Don't hesitate to ask questions in case anything is unclear,
Thanks in advance
Left join the remixes and then GROUP BY the track ID and use array_agg() to get an array of the remix IDs.
SELECT t.*,
CASE
WHEN array_agg(r."remixTrackId") = '{NULL}'::varchar(255)[] THEN
'{}'::varchar(255)[]
ELSE
array_agg(r."remixTrackId")
END "remixes"
FROM "Tracks" t
LEFT JOIN "Remixes" r
ON r."originalTrackId" = t."id"
WHERE t."id" = 'track-id1'
GROUP BY t."id";
Note that if there are no remixes, array_agg() will return {NULL}. I assumed you would rather have an empty array in that case; that's what the CASE is for.
By the way, providing a fiddle was a nice move! But please also include the code in the question itself. The fiddle site might go down (even permanently), which would render the question useless because of the missing information.
That's a simple outer join with a string aggregation to get the comma-separated list:
SELECT t.*,
string_agg(r."remixTrackId", ', ') as remixes
FROM "Tracks" t
LEFT JOIN "Remixes" r ON r."originalTrackId" = t.id
WHERE t.id = 'track-id1'
GROUP BY t.id;
The above assumes that Tracks.id is the primary key of the Tracks table.
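The outer-join-plus-aggregation pattern can be tried without a PostgreSQL server. This minimal sketch uses Python's sqlite3 module, where group_concat(X, separator) plays the role of PostgreSQL's string_agg; the simplified "Tracks" schema (dateModified and channels omitted) and the helper `track_with_remixes` are assumptions for illustration.

```python
import sqlite3


def track_with_remixes(conn: sqlite3.Connection, track_id: str):
    # LEFT JOIN keeps the track even when it has no remixes; GROUP BY collapses
    # the joined rows back to one row per track, and group_concat (SQLite's
    # counterpart of string_agg) builds the comma-separated list. It skips
    # NULLs, so a track without remixes gets NULL (None in Python).
    query = """
        SELECT t.id, group_concat(r."remixTrackId", ', ') AS remixes
        FROM "Tracks" t
        LEFT JOIN "Remixes" r ON r."originalTrackId" = t.id
        WHERE t.id = ?
        GROUP BY t.id
    """
    return conn.execute(query, (track_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Tracks" (id TEXT PRIMARY KEY, "userId" INTEGER)')
conn.execute('CREATE TABLE "Remixes" ("originalTrackId" TEXT, "remixTrackId" TEXT)')
conn.executemany('INSERT INTO "Tracks" VALUES (?, ?)',
                 [("track-id1", 1), ("track-id2", 1), ("track-id3", 1)])
conn.executemany('INSERT INTO "Remixes" VALUES (?, ?)',
                 [("track-id1", "track-id2"), ("track-id1", "track-id3")])
```

The order of names inside the aggregated string is not guaranteed here; PostgreSQL's string_agg and array_agg accept an ORDER BY inside the call if a stable order matters.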

Drools - Finding a single matching condition for a table of products ranked by consumers

I have a table displaying information for the top four ratings of produce in a store. I want to be able to find specific products in this rating table. Here is the structure of the table:
sectId | product_code | product_category | consumer_ranking
----------------------------------------------------------------------------
10444 | 11222 | PRODUCE | RATING_1
10444 | 45555 | PRODUCE | RATING_1
10444 | 10005 | PRODUCE | RATING_1
20555 | 11344 | PRODUCE | RATING_2
20555 | 94003 | PRODUCE | RATING_2
... and so on.
I wrote a rule to find inserted products, but it is not working the way I want, i.e. it should find only the targeted fact that was inserted. Here is the rule I put together:
rule "find by product codes rating_1"
when
$product_table: ProductRanking( $rank1: this.getProductCodesRankFirst())
$product1 : Product( this.product_code memberOf $rank1, $product_code: product_code )
$product2 : Product( this.product_code == 10444,this.product_code != $product_code ,$product_code2: product_code)
then
System.out.println("Found Products for product_codes "+$product_code+ " "+$product_code2 ) ;
end
Unfortunately, this returns 3 rows. I inserted the product in row 2 into the session, i.e. the product with code 45555, and it does find row 2. However, it also brings in row 1 and row 3.
I can see why it's doing that: the SKUs all share the same sectId, 10444. However, I want to bring in only the row that I inserted, which is sectId 10444, product_code 45555. How can I achieve that?
I solved it by using a global to filter out the extra products. In the first pattern, which brings in the rankings, I eliminate the extra matching products this way:
global ProductHelper productHelper
$product_table: ProductRanking( $rank1: productHelper.getProductCodesRankFirst(),
productCode != productHelper.getProductCodeFruitCategory() &&
productCode != productHelper.productCodeVegetableCategory() )
The ProductHelper identifies the product codes I want to eliminate and hence the extra 2 products brought in are ignored, creating a single match. I'm sure there is a better way, but since I'm no expert, this is what I was able to come up with.
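The over-matching described in the question can be reproduced with plain data. This Python sketch is not Drools; `Ranking`, `match_by_section` and `match_inserted` are hypothetical names. It contrasts matching on the shared sectId, which returns all three RATING_1 rows, with constraining on the inserted product's own code, which returns exactly one. In a rule, the analogous idea would be a constraint tying the ranking lookup to the inserted Product's product_code; this is offered only as an alternative way to think about it, not as the answer's method.

```python
from typing import List, NamedTuple, Set


class Ranking(NamedTuple):
    sect_id: int
    product_code: int
    category: str
    rank: str


# Rows from the rating table in the question.
RANKINGS: List[Ranking] = [
    Ranking(10444, 11222, "PRODUCE", "RATING_1"),
    Ranking(10444, 45555, "PRODUCE", "RATING_1"),
    Ranking(10444, 10005, "PRODUCE", "RATING_1"),
    Ranking(20555, 11344, "PRODUCE", "RATING_2"),
    Ranking(20555, 94003, "PRODUCE", "RATING_2"),
]


def match_by_section(rankings: List[Ranking], sect_id: int) -> List[Ranking]:
    # Matching on the shared section id pulls in every product in that section.
    return [r for r in rankings if r.sect_id == sect_id]


def match_inserted(rankings: List[Ranking], inserted_codes: Set[int]) -> List[Ranking]:
    # Matching on the inserted products' own codes yields only those rows.
    return [r for r in rankings if r.product_code in inserted_codes]
```

With the product 45555 inserted, `match_inserted(RANKINGS, {45555})` yields a single row, without needing a list of codes to exclude.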

PostgreSQL query results that depend on several rows of the same table

I'm working on an application, and we're using PostgreSQL as our DB. I don't have a lot of experience with SQL, and now I've encountered a problem that I can't find an answer to.
Here's the problem:
We have privacy settings stored in a separate table, and the accessibility of each row of data depends on several rows of this privacy table.
The basic structure of the privacy table is:
entityId | entityType | privacyId | privacyType | allow | deletedAt
-------------------------------------------------------------------
5 | user | 6 | user | f | //example entry
5 | user | 1 | user_all | t |
In short, these settings mean that user 5 allows everybody except user 6 to access his data.
So I get the available data with a query like:
SELECT <some_relevant_fields> FROM <table>
JOIN <join>
WHERE
(privacy."privacyId"=6 AND privacy."privacyType"='user' AND privacy.allow=true)
OR (
(privacy."privacyType"='user_all' AND privacy."deletedAt" IS NOT NULL)
AND
(privacy."privacyType"='user' AND privacy."privacyId"=6 AND privacy.allow!=false)
);
I know this query is incorrect as written, but I want you to get an idea of what I'm trying to achieve.
So it must either find a row for this user's type/id with allow=true, OR check that user_all is not deleted (its deletedAt field is null) and that no row restricts access for this user with allow=false.
But it seems like Postgres chains all the expressions together, so privacy."privacyType"='user_all' is overridden by 'user' at the end of the expression, and the query either returns no results or returns data even if the user is "blocked", because the 'user_all' row exists.
Is there a way to write a WHERE clause that returns a result when the AND expressions are true for two different rows? For example, in the code above, (privacy."privacyType"='user_all' AND privacy."deletedAt" IS NOT NULL) would be true for one row AND (privacy."privacyType"='user' AND privacy."privacyId"=6 AND privacy.allow!=false) would be true for another. Or maybe check for the absence of a row with these values?
Is this what you want?
select <some_fields> from <table> where
privacyType='user_all' AND deletedAt IS NOT NULL
union
select <some_fields> from <table> where
privacyType='user' AND privacyId=6 AND allow<>'f';
Left join the table with itself and use the WHERE clause to find the rows that don't have a match.
SELECT p1.*
FROM privacy p1
LEFT JOIN privacy p2
ON p1."entityId" = p2."entityId"
AND p2."privacyType" = 'user'
AND p2."privacyId" = 6
AND p2.allow = false -- p2 matches only a row that blocks user 6
WHERE
p1."privacyType" = 'user_all'
AND p1."deletedAt" IS NULL
AND p2."privacyId" IS NULL; -- keep only entities with no blocking row
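The self-join anti-join described in this answer can be tried end to end with Python's sqlite3 module as an in-memory stand-in for PostgreSQL. The sketch assumes allow is stored as 0/1 (SQLite has no native boolean type). The helper `visible_user_all_rows` and the extra entity 7 are hypothetical, added only to show a non-blocked case next to the question's user 5 / user 6 rows.

```python
import sqlite3


def visible_user_all_rows(conn: sqlite3.Connection, viewer_id: int):
    # Anti-join: keep active 'user_all' rows for which no blocking row
    # (privacyType 'user', this viewer, allow = 0) exists for the same entity.
    query = """
        SELECT p1."entityId"
        FROM privacy p1
        LEFT JOIN privacy p2
          ON p1."entityId" = p2."entityId"
         AND p2."privacyType" = 'user'
         AND p2."privacyId" = ?
         AND p2.allow = 0
        WHERE p1."privacyType" = 'user_all'
          AND p1."deletedAt" IS NULL
          AND p2."privacyId" IS NULL
        ORDER BY p1."entityId"
    """
    return [row[0] for row in conn.execute(query, (viewer_id,))]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE privacy ("entityId" INTEGER, "entityType" TEXT, '
             '"privacyId" INTEGER, "privacyType" TEXT, allow INTEGER, "deletedAt" TEXT)')
conn.executemany("INSERT INTO privacy VALUES (?, ?, ?, ?, ?, ?)", [
    (5, "user", 6, "user", 0, None),      # user 5 blocks user 6
    (5, "user", 1, "user_all", 1, None),  # user 5 is visible to everybody else
    (7, "user", 1, "user_all", 1, None),  # illustrative extra user with no block
])
```

Viewer 6 sees only entity 7, while any other viewer sees both 5 and 7; each condition is checked against its own row (p1 or p2), which is what a single-row WHERE clause cannot do.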