talend how can we estimate taggregateSortedRow recordcount parameter value - talend

We are trying out talend and we wanted to aggregate some sorted data on few keys .
Simple enough but when we try to use taggregatesortedrow its asking for Exact number of rows to be specified.
I am not sure how any one can input this on the fly. Dosen't this value change for every run ? am i missing something. surely they cant expect us to know total recs before we run the job.

This has to do with the way in how the Talend component tAggregateSortedRow is programmed. To avoid it omitting data you need to provide the record count. There are a few users with the same question like you asked:
https://www.talendforge.org/forum/viewtopic.php?id=50094
https://www.talendforge.org/forum/viewtopic.php?id=54231
https://www.talendforge.org/forum/viewtopic.php?id=7641
which I found simply by using Google.
Anyway, if you need to do sorting and aggregating, consider using the components tAggregateRow and tSortedRow separately. It should work fine.

Related

AZURE DATA FACTORY - Can I set a variable from within a CopyData task or by using the output?

I have simple pipeline that has a Copy activity to populate a table. That task is based on a query and will only ever return 1 row.
The problem I am having is that I want to reuse the value from one of the columns (batch number) to set a variable so that at the end of the pipeline I can use a Stored Procedure to log that the batch was processed. I would rather avoid running the query a second time in a lookup task so can I make use of the data already being returned?
I have tried duplicating the column in the Copy activity and then mapping that to something like #BatchNo but that fails and have even tried to add a Set Variable task but can't figure out how to take a single column #{activity('Populate Aleprstw').output} does not error but not sure what that will actually do in this case.
Thanks and sorry if its a silly question.
Cheers
Mark
I always do it like this:
Generate a batch number (usually with a proc)
Use a lookup to grab it into a variable
Use the batch number in all activities (might be multiple copes, procs etc.)
Write the batch completion
From your description it seems you have the batch embedded in the data copy from the start which is not typical.
If you must do it this way, is there really an issue with running a lookup again?
Copy activity doesn't return data like that, so you won't be able to capture the results that way. With this design, running the query again in a Lookup is the best option.
Is the query in the Source running on the same Server as the Sink? If so, you could collapse the entire operation into a Stored Procedure that returns the data point you are trying to capture.

Talend Component with multiple inputs and unrelated outputs

using Talend Open Studio, I have a data-processing component, for which I'd appreciate your advice on how to make this possible (a) in a single component and (b) without a dirty workaround - thanks.
Relating part (a):
I have two different inputs:
One Input (with exactly one row) defines some kind of metadata for my processing.
One Input (with 1...n rows) defines the core data to process.
Currently, I solved this first requirement using two components and passing my metadata to the second component using the globalMap. But it would be nice, if I could integrate both connections into one component.
Relating part (b): After I have read all my input rows, I need to process them all at once. So far, so easy, I could use the end-section - my problem comes here: After that processing, I need to create a number of output-rows for a single output connection. Problem is, that Output-rows can only be created in the main-part and there I don't know when the last row was read...
Currently, I solved this counting the input-rows in advance and then, after that number is reached, I create that output. But this seems a really dirty workaround to me, so maybe someone has a solution for that, too?
Thank you for any useful tips!

Tableau performance

I've a problem with the dashboard in Tableau. In the dashboard there are many worksheets, and all the columns that are in the report are calculable. The problem is that dashboard is being formed for a very long time. The report contains approximately 2 million rows. And it is generated about 5 minutes.
Tell me, what are the solutions in this case?
Maybe I can somehow adjust the page display and not all the records at once?
To reduce the calculation time, try to exclude data you don't need with a data source filter in tableau. You can also hide or delete unused calculated fields. Other things you can do is reduce sheets that are not used.
Here's a link: https://www.tableau.com/about/blog/2016/1/5-tips-make-your-dashboards-more-performant-48574
Steps to follow to reduce calculation time:
Extract the data and use Extract data and also keep option as extract instead of live.Also replace the data source using extract data.
Use "User Filter" to reduce calculation time so that tableau will display of particular user data only.
I hope this will work to solve your problems.
I have one more idea to resolve this issue.
1)when you loan first time your dashboard put into Dashboard Action Filter
First Time load dashboard data exclude in your sheet.
Dashboard Menu->Action->add action->select sheet and exclude option.
2) Live to Extract data source and select radio button extract.
3)use user filter.
I am following the other answers (use extract, dashboard action filter...) and I want to add one point:
Drag every field used by any tablesheet on the dashboard on "Detail" of every tablesheet you are using on the Dashboard. Now Tableau loads all needed data while loading the first tablesheet and can use this data for the other sheets.
i.e. A dashboard contains three tablesheets (A, B, C) now you drag every field used by A on "Deatil" of B and C, every field used by B on "Deatil" of A and C, every field used by C on "Deatil" of B and A.
We are also having a similar issue with 150 million rows but I want to check if you are doing following steps. This may help you. This goes back to fundamentals of Tableau reporting.
1/ Try to make sure your data set is in star schema format. This will help a lot in report.
2/ Try to have tables and views in DB in such a way that same columns are used in Tableau. Any extra columns in tables adds to the performance issue.
3/Make sure indexing is done properly for all the fields that are joined.
4/ In my experience Dashboard adds extra performance lag. So make sure you try to get as much performance tuning on sheets as possible before even going to dashboard.
5/ If required try to use materialized views.
hope this helps.
Try to capture performance metrics using performance recorder option in Tableau.
Check for the underlying DB tables and joins present on the data source layer.
Try using optimized sets and parameters as required and get rid of less relevant filters.
Try using data extracts with scheduled refresh with data source filter for limited business years data.

Dynamic parameters in Tableau

Looking for a solution to dynamically update the parameter in tableau, so that it populates new/changed values, automatically, once the data is refreshed.
Just fyi, Currently, my parameter is getting values from a field which has got date values in string format(for eg: 2016Q1).
Request help.
This is not currently possible without serious XML hacking and scripting. There have been multiple requests for this feature from Tableau users, and it might be added in the future.

iReport: Setting Parameter Values from Query MongoDB

I am fairly new to JasperReports and iReport and am struggling with something, which seems should be basic.
If you use MongoDB then you know it does not support the concept of a 'JOIN'. Therefore, from the iReport main dataset query I want to set a parameter/variable from the results. Then I want to use the collection values I just set in a different dataset as a query parameter/variable (NOT table, or LIST - just a plain old simple dataset I create, which will also query MongoDB as the source).
It seems this would be a straight forward use case, but I don't see anything intuitive in iReport that seems would do this. Can this be done? If so any clues you can give me would be wonderful and greatly appreciated.
Do you want to pass the values as collection from one report onto the other ?
This can be done by writing the following in your filter expression $P{parameter_name}.contains($F{field_name}). Additionally, you need to create parameter with the same parameter_name with class type java.util.collection .
Now this report is ready to receive any parameters as collections. This works for MongoDB as I have tried this out. Now as you have already said that you have been able to send the collection from the main report, the above method will work for receiving the parameter in the second report.