Reading formula from a different collection - mongodb

Here is my scenario:
I have 2 XLS files:
- One with 5 columns: date, service, value1, value2, Value3
- One with the formulas to apply: Service, Type_of_aggregation, columns_to_aggregate
Depending on the service, a formula is applied (the sum or avg of the values of a column).
Question:
- How can I tell MongoDB to retrieve and apply the specified formula?
Example:
We insert both XLS files into separate collections.
We somehow trigger the calculation, for example: for Service1, do a sum of all the values of column value1.
We store the result in a new collection.
I hope you understand what I am trying to do, and I am hoping to get some help on the best way to do this.
Thank you,
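One possible way to approach this is to read each formula document and translate it into an aggregation pipeline. The sketch below only builds the pipeline as plain Python data; the field names follow the columns listed in the question, while the `results` collection name and the `build_pipeline` helper are assumptions made for illustration. `$match`, `$group`, `$sum`, `$avg` and `$merge` are standard aggregation operators (`$merge` needs MongoDB 4.2 or later; `$out` is the older alternative).

```python
# Hypothetical mapping from the Type_of_aggregation column to MongoDB operators.
AGGREGATION_OPERATORS = {"sum": "$sum", "avg": "$avg"}


def build_pipeline(formula, target_collection="results"):
    """Translate one formula document into an aggregation pipeline (a list of stages)."""
    operator = AGGREGATION_OPERATORS[formula["Type_of_aggregation"].lower()]
    column = formula["columns_to_aggregate"]
    return [
        # Keep only the rows of the service this formula applies to.
        {"$match": {"service": formula["Service"]}},
        # Apply the requested sum or average to the requested column.
        {"$group": {"_id": "$service", "result": {operator: "$" + column}}},
        # Write the result into another collection (assumed name).
        {"$merge": {"into": target_collection}},
    ]


formula = {
    "Service": "Service1",
    "Type_of_aggregation": "sum",
    "columns_to_aggregate": "value1",
}
pipeline = build_pipeline(formula)
```

With a driver such as pymongo, each pipeline could then be passed to the `aggregate` method of the data collection, once per document found in the formulas collection. This has not been run against a real database, so treat it as a starting point.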

How to filter one source by clicking and filtering a bar chart from another source in Tableau?

I used an Apriori algorithm to view the frequent relationships in the dataset and I want to do a dashboard to better visualize this data but I don't know how to do this filter.
This is the bar chart that I created to show the support (number of times something happened) and the confidence (probability of B happening given A) of these associations:
Apriori Chart
Next to it on the dashboard, I'll have a table with the full dataset used in this Apriori analysis where I have more information such as ID, Income, Hours Worked, etc:
Table from different data source
How can I create this relationship? The two data sources don't have a column in common that I can use for that.
I would need some way to:
Split the values in the antecedents columns by comma and filter only those columns with value equal to 1 in the other dataset
**Dataset A**
'Age Range <=30, Joblevel 1, Maritalstatus Single'
->
'Age Range <=30'
'Joblevel 1'
'Maritalstatus Single'
**Dataset B**
'Age Range <=30' == 1
'Joblevel 1' == 1
'Maritalstatus Single' == 1
Clicking this would filter the table next to it
Is there any way I can do this in Tableau?
You can download the twbx I used in this example here https://community.tableau.com/servlet/JiveServlet/download/1083124-384949/Apriori.twbx
Thanks in advance for the help!
I am not able to check your twbx on the machine I'm using, but I think you should be able to do this. The fields in the 2 data sources need to match, so manipulate the data sources to make this happen.
For data source 1 there's a function SPLIT which will mean you are able to split the comma separated string to 3 fields.
Putting those 3 fields to the Detail shelf of your bar chart (or even Rows and hiding the header) will mean you can use them in an action filter.
Your second data source is a cross tab - post pivot. You should be able to pivot this data source. Highlight the measures and pivot them. This will give you the field Pivot Field Names and Pivot Field Values.
You only want to keep those with a value of 1 so create a calculated field
[Lookup1]: IF [Pivot Field Values] = 1 THEN [Pivot Field Names] END
Duplicate this field twice so you have Lookup1, Lookup2 and Lookup3.
Then you should be able to action filter the table.
In the action filter set it up so SplitField1 = Lookup1, SplitField2 = Lookup2, etc.
Fingers crossed this works, I haven't been able to test so I am pulling it out of my head.
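To make the intended matching explicit, here is a small Python sketch of the logic the split and the action filter are meant to reproduce. It is not Tableau code; the helper names and the sample rows are made up for illustration, with column names taken from the question's example.

```python
def split_antecedents(antecedents):
    """Split a comma-separated antecedents string into trimmed item names."""
    return [part.strip() for part in antecedents.split(",") if part.strip()]


def matching_rows(rows, antecedents):
    """Keep the rows where every antecedent column has the value 1."""
    wanted = split_antecedents(antecedents)
    return [row for row in rows if all(row.get(name) == 1 for name in wanted)]


# Illustrative stand-in for Dataset B (one flag column per item).
rows = [
    {"ID": 1, "Age Range <=30": 1, "Joblevel 1": 1, "Maritalstatus Single": 1},
    {"ID": 2, "Age Range <=30": 1, "Joblevel 1": 0, "Maritalstatus Single": 1},
]
selected = matching_rows(rows, "Age Range <=30, Joblevel 1, Maritalstatus Single")
```

Only the row with ID 1 is kept, because it is the only one where all three flags equal 1.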

In BIRT is it possible to create a chart from summarized data columns?

Is it possible to create a line chart using summarized data from dataset?
My scenario is the following:
detail section: each one of the cells are one output field in the dataset
        JAN  FEB  MAR
item1   R    X    R
item2   X    A    R
item3   R    R    R
footer section: here we count the occurrence of each value per month by using count aggregation elements and filter by value
TOTALS:
R       2    1    3
X       1    1    0
A       0    1    0
And what I need to do is to add a chart that shows something like this:
Needed Chart
And what I have is this:
report output
report design
In EXCEL this is really simple, but I cannot figure out how it can be done in BIRT.
I thought of creating a new data set with 3 output fields for R, X, and A, where each row is one month, so I would have a transposed table that would be much easier to graph. But I cannot do it using the aggregation fields, and I cannot find out how to do it with the output fields from the main dataset.
Any ideas? If you need the source rptdesign file I can provide it to you, but the logic in it could be hard to understand.
Any help is appreciated, and thanks in advance.
Have a great 2020.
First of all, a BIRT rule of thumb: if you need aggregations outside of a table, create them outside of a table. Do not try to access values in a table from the outside. It is possible, it may sometimes be the only solution, but it usually messes up your whole report, it is hard to debug, and even harder to maintain.
Aggregate
As your dataset looks quite simple and you already know how and what to aggregate, your first call should be computed columns in the dataset:
Here you aggregate in the language according to your datasource. If that is SQL, I guess a COUNT and GROUP BY statement will do the job.
Create all the columns you need for your graph here.
BTW: Computed columns are usually the silver bullet in BIRT. I use them for almost any pre-computation or custom field creation.
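As an illustration of the shape the chart needs, the following Python sketch counts each value per month and returns one row per month, which is the transposed layout described in the question. It is only a model of the aggregation, not BIRT code; the `counts_per_month` helper is hypothetical and the sample rows are the three items from the question.

```python
from collections import Counter

# Detail rows copied from the question's example table.
detail = [
    {"item": "item1", "JAN": "R", "FEB": "X", "MAR": "R"},
    {"item": "item2", "JAN": "X", "FEB": "A", "MAR": "R"},
    {"item": "item3", "JAN": "R", "FEB": "R", "MAR": "R"},
]
MONTHS = ["JAN", "FEB", "MAR"]
VALUES = ["R", "X", "A"]


def counts_per_month(rows):
    """Return one row per month with a count column for each possible value."""
    result = []
    for month in MONTHS:
        counts = Counter(row[month] for row in rows)
        result.append({"month": month, **{value: counts.get(value, 0) for value in VALUES}})
    return result


chart_rows = counts_per_month(detail)
```

Each resulting row can feed one point on the x-axis, with R, X and A as the series; the counts match the TOTALS shown in the question.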
Visualize
You did not mention the library you are using for graphs, so I will assume you want to use the basic BIRT graphs. The basic charts with the months on the x-axis will do your job. I just want to add here that you have two options for multiple series:
You can either prepare your dataset so that you can feed the graph with a series per type (one line in your chart example) OR maybe easier: use optional y series grouping on your computed columns (as mentioned):
This way your graph will create the separate series for you. I hope this helps!
If you get stuck with the basic BIRT graphs in general, you might want to think about finding a JavaScript-based graph library that does exactly what you need and implement that. Remember: you can put almost anything that is based on JavaScript into BIRT.
Final remark: For the sake of your report end users, please use a multi-bar chart. Line charts are not readable for overlapping values.
Thanks Kekzpanda for your help and the time you spent on my question.
After struggling for a while I finally reached a solution by transposing the "table" of aggregations I had in the table footer, using JavaScript arrays and an extra dataset. Here are the steps I followed, in case someone else has the same problem:
For example, say you need to transpose a table of 10 columns and 3 records.
In the report's initialize method, create an array with [10,3] dimensions:
// bi dimensional array indexes
var i=0;
var j=0;
// array definition and initialize it using 'for' iteration
var matriz = new Array(10);
for (i = 0; i < 10; i++) {
    matriz[i] = new Array(3);
}
// restart the array indexes in case you need to go through it in the future
i=0;
j=0;
Then you need to save the aggregation field value in one of the positions in the array. For that click on the aggregation field and go to the onCreate method and add the following code:
matriz[0][1]=this.getValue();
Once this is done for all the aggregation fields, you will have an array holding the transposed table.
To move the data in the array to the new dataset, select the dataset's fetch method and add something like this:
if (i >= matriz.length) return false; // stop once every row of the array has been emitted
row["A"]=matriz[i][0];
row["B"]=matriz[i][1];
row["C"]=matriz[i][2];
i++; // increment the first index by 1 to move to next row in the array
return (true);
Now you have your new data set with the transposed data.
Work with this data set to graph the data, creating a separate series in the graph design for each column in the dataset.
Hope this helps.
Bye.

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which always has a DATE field. Previously, the dates that came in as input were (after some processing) written to a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year after it. If there was originally no input for some day between these two dates, all fields must be 0, or an equivalent.
Example: I have two inputs, one with '18/03/2017' and the other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an aggregator, create 365 fields, each holding day0+1, day0+2, and so on, to create an artificial year.
After that we do several transformations, like sorting the dates and a union between them, to get the data ready for a joiner. The idea of the joiner is to do a Full Outer Join between the original data and the data that is going to have all fields set to 0, which we got from the previous aggregator.
Then a router picks with one of its groups the data that had actual dates (and fields without nulls) and other group where all fields are null, and then said fields are given a 0 to finally be written to a table.
I am wondering how this can be achieved while, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years instead of one, the task gets ridiculous really quickly.
I was wondering about an XOR type of operation, or some other function that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates, a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!
I'm not sure what the aggregator is supposed to do?
The same goes for the 'full outer' join. A normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the 'joiner'? In that case a lookup configured to return 'all rows' and a less-than-or-equal predicate can help make the mapping much more readable.
In any case you will need a helper table (or file) with a sequence of numbers between 1 and the number of potential duplicates (or more).
I use our time dimension in the warehouse, which has one row per day from 1753-01-01 and the 200000 following days, and a primary integer column with values from 1 and up ...
You've identified you know how to do this programmatically and to be fair this problem is more suited to that sort of solution... but that doesn't exclude powercenter by any means, just feed the 2 dates into a java transformation, apply some code to produce all dates between them and for a record to be output for each. Java transformation is ideal for record generation
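The record-generation logic itself is short. Below is a minimal Python sketch of it (the same idea would be written in Java inside the transformation); the `fill_missing_days` helper, the single numeric value per date and the sample data are assumptions made for illustration.

```python
from datetime import date, timedelta


def fill_missing_days(records, days=365):
    """Return (date, value) pairs for every day from the minimum input date
    up to and including `days` days later, using 0 where no input exists."""
    start = min(records)
    filled = []
    for offset in range(days + 1):
        day = start + timedelta(days=offset)
        filled.append((day, records.get(day, 0)))
    return filled


# Two inputs, as in the question: 18/03/2017 and 18/03/2018.
records = {date(2017, 3, 18): 5, date(2018, 3, 18): 7}
filled = fill_missing_days(records)
```

Covering 10 years instead of one only changes the `days` argument, which avoids creating one field per day.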
OK... so you could override your source qualifier to achieve this in the selection query itself (I am giving an Oracle-based example as it's what I'm used to, and I'm assuming your input data comes from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT (mindate.d + levquery.n - 1) AS Port1 FROM (SELECT MIN(DATEFIELD) AS d FROM tablea) mindate, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery
(Check if the query works for you - haven't access to pc to test it at the minute)

Insert multiple records into fact table based on fields in single record

I'm working in Pentaho 4.4.1-GA (Kettle / PDI). The database is Postgres.
I need to be able to insert multiple records into a fact table based on the fields that come from a single record. The single record contains fields:
productcode1, price1
productcode2, price2
productcode3, price3
...
productcode10,price10
So if there was a value for each of the 10 productcode / prices then I'd need to insert a total of 10 records into the fact table. If there were values for 4 of the combinations, then I'd need to insert 4 records into the fact table, etcetera. All field values for the fact records would be identical except for the PK (generated by sequence), product codes, and prices.
I figure that I need some type of looping construct which would let me check whether or not a value was present for each productx field, and if so, do an insert/update step on the fact table with the desired field values. I'm just not sure how to do this in Pentaho.
Any ideas? All suggestions are welcome :)
Thank You,
Rakesh
Could you give a sample input and output for your scenario?
From your example data I infer that if there are 10 different product codes and only 4 product prices, you want 4 records inserted into your table. Is that so?
Well, for a start you can add a constant value of 1 to those records by filtering for NOT NULL and then use a Group By step to count the number of 1s. This would give you the count. BTW, it would be helpful if you could provide more details on which columns you would be loading, as there are ways to make a PDI transformation execute multiple times.
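For reference, the transformation the question describes (one wide record turned into one row per filled productcode/price pair) can be sketched in Python as follows. The `normalise` helper, the `order_id` field and the sample record are hypothetical. In PDI, a Row Normaliser step followed by a Filter rows step may be worth checking for the same purpose, though that has not been tested here.

```python
def normalise(record, slots=10):
    """Turn one wide record into one row per productcodeN/priceN pair that has values."""
    # Fields shared by every generated fact row (everything except the pairs).
    shared = {
        key: value
        for key, value in record.items()
        if not key.startswith(("productcode", "price"))
    }
    rows = []
    for n in range(1, slots + 1):
        code = record.get(f"productcode{n}")
        price = record.get(f"price{n}")
        if code in (None, "") or price in (None, ""):
            continue  # skip empty slots, so 4 filled pairs give 4 rows
        rows.append({**shared, "productcode": code, "price": price})
    return rows


record = {
    "order_id": 7,
    "productcode1": "A", "price1": 1.5,
    "productcode2": "B", "price2": 2.0,
    "productcode3": None, "price3": None,
}
fact_rows = normalise(record)
```

The primary key is left out on purpose, since the question states it is generated by a database sequence on insert.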

Merge Fields in Crystal for Chart

Anyone know if there is a way to merge several fields (they have the same possible values) in Crystal so they can be used together in a chart?
For example: Field1, Field2 and Field3 all have values of only a 1 or 2. I want to merge all the values from these 3 fields together to show a total of how many 1s there are vs 2s combined.
Any ideas?
Use a formula and chart on that. You can do whatever operations you want in the formula.
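To show what the combined total should look like, here is a short Python sketch of the counting such a formula would have to reproduce. It is not Crystal syntax; the `combined_counts` helper and the sample records are made up for illustration.

```python
from collections import Counter


def combined_counts(records, fields=("Field1", "Field2", "Field3")):
    """Count how often each value occurs across all the given fields together."""
    return Counter(
        record[field]
        for record in records
        for field in fields
        if record.get(field) is not None
    )


records = [
    {"Field1": 1, "Field2": 2, "Field3": 1},
    {"Field1": 2, "Field2": 1, "Field3": 1},
]
totals = combined_counts(records)
```

In this sample the three fields together contain four 1s and two 2s, which is the pair of totals the chart would plot.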