Azure Data Flow filter distinct rows - azure-data-factory

What I want to achieve is that I have sources which sending me some data, but before saving that data in sink I want to filter that distinct with respect to columns I am not able to find Distinct function in expression functions. Can anyone tell me how to achieve this

Not sure if you still have this problem, I suggest to use the 'Aggregate' component in dataflow, I did a test like below:
in 'Aggregate Settings' we define all the 'Group by' columns and 'Aggregates' columns, the source table have 9 columns in total, and 900 rows in total containing 450 distinct rows plus 450 duplicated rows.
I use max to aggregate the 'ModifiedDate' column, and in sink table there's only 450 distinct rows.

This can be done by manually editing the script (and then linking it together on the UI).
The following snippet does a distinct filtering using all columns:
aggregate(groupBy(mycols = sha2(256,columns())),
each(match(true()), $$ = first($$))) ~> DistinctRows
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-script#distinct-row-using-all-columns

Related

Getting the total number of records in a single knexjs query when using the limit() method

I use knexjs and postgresql. Is it possible in knexjs to get the total of records from the same query in which the limit is used?
For example:
knex.select().from('project').limit(50)
Is it possible to somehow get the total number of records in the same query if there are more than 50?
The question arose due to the fact that my query is much more complex, which uses a lot of subqueries and conditions, and I would not like to make this query twice to get the data in one query and the total number of records (I use the .count() method) from another.
I do not know your obscurification manager (knexjs?) but I would think you should be able to add the window version of the count() function to your select list. In plain SQL something like: Where ... represents your current select list. (see demo)
select ..., count(*) over() total_rows
from project
limit 5;
This works because the window count function counts all rows selected, after all rows selected, but before the LIMIT clause is applied. Note: This adds a column to the result set with the same value in every row.

Filter on Group Expression using lookup to second dataset. SSRS 2012

I have two datasets (one is a stored procedure written by a vendor that cannot change). Stored procedure is the main dataset. I am using a second data set to bring back filtered results. The second dataset contains already filtered records.
Trying to use the Tablix filter:
=Lookup (Fields!UserGUID.Value,Fields!UserGUID.Value,Fields!MembershipPolicyGUID.Value, "Dataset2")
I have a group that needs to display only the records in Dataset2.
Dataset2 is written using a where statement only displaying
MembershipPolicyGUID not in'00000000-0000-0000-0000-000000000000' and '29976BA0-E2D7-494E-A1CE-20E609C76929' (these numbers are stored as text)
I need help in how to filter the records.
I have tried the <> in the Tablix filter using the above expression, but it does not work or rather it does not return any records from Dataset2 except those from Dataset1.
Figured it out. The filter should be:
expression =Lookup (Fields!UserGUID.Value,Fields!UserGUID.Value,Fields!MembershipPolicyGUID.Value, "Dataset2")
then operator >
then value 0

kdb/q: use function in a select from partitioned table

I'm trying to get max drawdown from a partitioned table across multiple dates. The query works fine when run with a date constrained to a specific day. E.g.
select {max neg x-maxs x} pnl from trades where date=last date
It's getting map-reduced over multiple dates so the above query no longer works. I can make the query run over multiple dates by adding another aggregation:
select max {max neg x-maxs x} pnl from trades
but it's not getting the max drawdown from continuous sequence of trades but a maximum of daily drawdowns.
I wonder if there's a way to make it work with a single select without chaining selects like
select {max neg x-maxs x} pnl from select pnl from trades
I've got a rather big query to pull a lot of various metrics on the trades where max drawdown is just one of them. Using chained select means that I need to break the big query into two queries, map-reduced and non-map-reduced, and then join them back which would make the query look ugly.
Thanks!
Select query runs on each date in partition db and apply function to each date values and finally aggregates them depending upon the call (user defined function behaves differently than plain 'q' functions).
So I don't think you can combine that into one query. But there are ways you can look for to make your query more generalized and reusable for different scenarios.
For ex. convert your query to functional form and use variables in that query for column name and user function. Put this in one function which will accept column name and user function. Now you can call this function with different set of (column ;function). Something like :
runF:{[col;usrfunc] funtional_query_uses_col_userfunc }
All this depends on your use cases. Also check for memory usage as you'll be taking lot of data into memory.

See length (count) of query results in workbench

I just started using MySQL Workbench (6.1). The default limit for queries is 1,000 and that's fine I want to keep that.
But the results from the action output message will therefore always say "1000 rows returned".
Is there a setting to see the number of records that would be returned in the query had their been no limit? For sanity checking query results?
I know this is late by a few years, but I think you're asking for a way to see total row count in the bottom of the results pane, like in SQL Server. In SQL Server, you would also go in the messages pane and it would say how many rows were returned. I was actually looking for exactly what you were asking for as well, and seems like there is no way to find that. If you have an ID in your table that is just numeric and is in numeric order, you could order by ID desc and look at the biggest number there. That is what I've decided to do.
The result is not always "1000 rows returned". If there are less records than that you will get the actual count. If you want to know the total number of rows in a table do a select count(*) from table. Alternatively, you can switch off the automatic limit and have all records returned by MySQL Workbench, but that can be time + memory consuming for large tables.
I think removing the row limit will help. By default, MySQL workbench will limit the result set to 1000 rows but you can always disable the limit. Check out https://superuser.com/questions/240291/how-to-remove-1000-row-limit-in-mysql-workbench-queries on how to do that.
You can run a second query to check that
select count(*) from (your original query) as t;
this will return the total rows in actual result.
You can use the SQL count function. It returns the count of the total number of rows a query returns.
A sample query:
select count(*) from tableName where field1 = value1
In workbench, in the dropdown menu at the top, set it to dont limit Then run the query to extract data from table Then under the output pane below, the total count of the query results will be displayed in the message column

Reporting on multiple tables independently in Crystal Reports 11

I am using Crystal Reports Developer Studio to create a report that reports on two different tables, let them be "ATable" and "BTable". For my simplest task, I would like to report the count of each table by using Total Running Fields. I created one for ATable (Called ATableTRF) and when I post it on my report this is what happens:
1) The SQL Query (Show SQL Query) shows:
SELECT "ATABLE"."ATABLE_KEY"
FROM "DB"."ATABLE" "ATABLE"
2) The total records read is the number of records in ATable.
3) The number I get is correct (total records in ATable).
Same goes for BTableTRF, if I remove ATableTRF I get:
1) The SQL Query (Show SQL Query) shows:
SELECT "BTABLE"."BTABLE_KEY"
FROM "DB"."BTABLE" "BTABLE"
2) The total records read is the number of records in BTable.
3) The number I get is correct (total records in BTable).
The problems starts when I just put both fields on the reports. What happens then is that I get the two queries one after another (since the tables are not linked in crystal reports):
SELECT "ATABLE"."ATABLE_KEY"
FROM "DB"."ATABLE" "ATABLE"
SELECT "BTABLE"."BTABLE_KEY"
FROM "DB"."BTABLE" "BTABLE"
And the number of record read is far larger than each of the tables - it doesn't stop. I would verify it's count(ATable)xcount(BTable) but that would exceed my computer's limitation (probably - one is around 300k rows the other around 900k rows).
I would just like to report the count of the two tables. No interaction is needed - but crystal somehow enforces an interaction.
Can anyone help with that?
Thanks!
Unless there is some join describing the two tables' relationship, then the result will be a Cartesian product. Try just using two subqueries, either via a SQL Command or as individual SQL expressions, to get the row counts. Ex:
select count(distinct ATABLE_KEY) from ATABLE
If you're not interested in anything else in these tables aside from the row counts, then there's no reason to bring all those rows into Crystal - better to do the heavy lifting on the RDBMS.
You could UNION the two queries. This would give you one record set containing rows from each query once.