Extending BIRT using Programming by Example concept - eclipse-rcp

Can the following be achieved using Eclipse BIRT? Consider a dataset with about 1000 records. Shuffle them and take 20 records.
The user selects a few rows and columns of interest among the 20 (say, checkboxes are provided for selection).
Based on that selection, can "SELECT" queries be predicted?
Basically, can the "programming by example" concept be used to suggest a query in a brute-force manner?
Employee table
id first_name last_name email gender mobile salary

No, this feature is not supported by BIRT.

Since BIRT can be launched through APIs and URLs, the functionality you describe above can be implemented on a JSP page (or an RCP screen, if that's what you are doing), and the resulting query can then be passed into a BIRT rptdesign. With a little bit of report scripting, a table and columns can be added. There are lots of examples floating around of dynamically adding tables and columns with BIRT scripting.
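To make the "build the query outside BIRT" idea concrete, here is a minimal sketch of the selection step, assuming an `employee` table with the columns listed in the question. The function names (`sample_rows`, `suggest_query`) are hypothetical, SQLite stands in for the real database, and the suggested query is only the most literal candidate (the checked columns for the checked ids), not real programming-by-example inference.

```python
import sqlite3

# Columns of the employee table from the question.
COLUMNS = ["id", "first_name", "last_name", "email", "gender", "mobile", "salary"]


def sample_rows(conn, n=20):
    """Return n randomly chosen rows to show to the user (SQLite syntax)."""
    cur = conn.execute("SELECT * FROM employee ORDER BY RANDOM() LIMIT ?", (n,))
    return cur.fetchall()


def suggest_query(selected_columns, selected_ids):
    """Build the most literal SELECT matching the user's checkbox selection.

    Returns the SQL text plus its parameters; the column names are checked
    against a whitelist because they are interpolated into the SQL string.
    """
    unknown = [c for c in selected_columns if c not in COLUMNS]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    cols = ", ".join(selected_columns)
    placeholders = ", ".join("?" for _ in selected_ids)
    sql = f"SELECT {cols} FROM employee WHERE id IN ({placeholders})"
    return sql, list(selected_ids)
```

A smarter version would try candidate WHERE clauses (for example on gender or a salary range) and keep those that return exactly the selected rows; the generated SQL text is what you would then hand to the report.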

The Good way to create pentaho cde dashboard

Pentaho version: BI Server CE 6.1
I'm new to the Pentaho universe and I'm stuck finding documentation on how to create a CDE dashboard. To be clear, I have no idea what a good way to create a CDE dashboard is, but I have tried many things based on tutorials found pretty much everywhere.
What I have done so far
From this data model
I already created a dynamic chart with a "sql over sqljdbc" datasource.
Here is my query (the result is shown in the picture after it):
SELECT (select survey_type from survey where id = pr.form_type) as "form type",
pr.date as "Date",
count(pr.id) as "Form number"
FROM result pr
inner join district pd on pr.district_id=pd.id
inner join departement pdep on pd.departement_id=pdep.id
inner join region pre on pdep.region_id=pre.id
WHERE pre.region_text = ${region}
GROUP by date,form_type
ORDER by date;
Dashboard generated by the query - Form number by date, type and region (set dynamically)
What I want to achieve
I want to build this kind of chart: community.pentaho.com/ctools/ccc/#type=bar&anchor=small-multiple-bars or community.pentaho.com/ctools/ccc/#type=bar&anchor=stacked-bar (sorry, I don't have enough reputation to post more than 2 links) with a "sql over jdbc" datasource.
Can anyone give me an example of an SQL query to achieve that? (Preferably a modification of the query given earlier in this post. I tried this, but it does not work as expected:
SELECT (select survey_type from survey where id = pr.form_type) as "form type",
pr.date as "Date",
pre.region_text as region,
count(pr.id) as "Form number"
FROM result pr
inner join district pd on pr.district_id=pd.id
inner join departement pdep on pd.departement_id=pdep.id
inner join region pre on pdep.region_id=pre.id
GROUP by date,form_type,pre.id
ORDER by date;
)
And where can I put the code behind this example to preview it in my own Pentaho instance? I need to know how to reproduce it.
What I want to know
A good way to build CDE charts in Pentaho:
How does the query need to be formatted? (how fields are organised on the dashboard, maximum number of fields...)
What is the difference between MDX queries and SQL queries, and what is each one for?
Which of the two (MDX or SQL) is the better choice for building charts?
How can I transform my relational database into a Mondrian cube if I want to use MDX queries? (Or should I redesign the database as a data warehouse using Kettle?)
Thank you for your answers.
First of all, you should realize that you're asking a lot here. Having said that, you've pretty much done what I did when I first started with Pentaho, which was experiment. A lot.
Regarding your questions I have some links which should help you (if you haven't checked them already)
http://pentaho-bi-suite.blogspot.be/2014/01/inter-panel-communication-in-pentaho.html
http://holowczak.com/getting-started-with-pentaho-community-edition-dashboard-editor-cde/
The first link is a very good blog on which I have found several answers regarding dashboards.
The second link is more of an overall tutorial.
There is no general "best way" (apart from applying general best practices, of course) for creating dashboards. I suggest you keep trying (getting to know all of the properties and settings along the way) and find out which method works best for you.
Regarding your questions about MDX and Mondrian, I haven't had much experience in these areas, but as I understand it MDX queries run against Mondrian cubes, which you prepare in Pentaho's Mondrian Schema Workbench.
http://mondrian.pentaho.com/documentation/olap.php
I believe this should answer (at least some of) your questions. Trying many different things and experimenting will get you quite far, as you'll pick up plenty of small things one at a time.
I will elaborate a little bit on this.
As dooms stated, you ask a lot of things here but I am glad you are trying to create some great dashboards.
In order to format charts and tune them, I remember I had to learn some JavaScript/jQuery.
The difference between SQL and MDX: they are completely different, even when the syntax sometimes looks similar. You use SQL to query relational databases, whereas MDX is used to query cubes. If you don’t have cubes in place you need to use SQL, of course. If you do, you should ask the cube developer to introduce you to this world. Basically, cubes are good at aggregating data and make it easy to interact with the data and perform ad-hoc analysis; they are intended to let business analysts better explore the data. I am an MDX fan, but I would recommend that you also explore newer alternatives to multidimensional cubes, like tabular models or other in-memory technologies.
The best way to do a chart has nothing to do with MDX or SQL. It depends on where your data is stored. The most important thing is to have a good data model behind it.
Again, depending on your architecture, you should have a multidimensional model in your data mart, without snowflaking if possible. That allows you to build both simple SQL queries and a straightforward cube design. Designing cubes requires some extra skills. I would try to have a clean data model and then evaluate whether a cube is required.
I hope this sheds some light; it is not easy to answer the broad questions you asked. The important thing is to define the scope of your project.
Kind Regards,

Backoffice java client framework - load on demand

We are building our new next-generation server for a medium-sized back office application.
We have already decided that we would like to use a Java framework for the client side (GWT / Vaadin / ZK).
What we would like now is to create a proof-of-concept example with each technology.
Our back office UI is pretty standard: we have tables / grids with filters that should show entries straight from the DB.
The problem is that we have a huge number of rows in each table (1M minimum),
which means that we must use load-on-demand tables for them.
My question is: how do I implement a load-on-demand table for my big tables? I looked around and saw the following concept again and again:
you create a container, you populate it with data, and the data is displayed on the client side.
The problem is that I tried this naive way of populating the containers with 1M entries and it was awful. Are there any built-in on-demand containers?
Any code examples / references will be a huge help!
You would want to use the GWT CellTable together with an AsyncDataProvider, which lets you handle the user's paging and sorting events by fetching data from your server.
GWT also provides an alternative, ListDataProvider, which lets you fetch your data as a list of objects and then set that list on your table. If you use ListDataProvider, you have to define how to sort your objects with Comparators, and the table will handle sorting and paging against that list.
Google "gwt celltable asyncdataprovider example" for more examples and tutorials.
Vaadin has a nice concept of lazy loading data in most of its components.
For example, the table, list, dropdowns, etc. all support it.
The only thing you really need to know at the start is the total number of rows.
Everything else can then be handled on demand.
For example, the Table component initially loads only about 30 rows (this can be customized)
and then fetches rows as needed. (Or rather, they are usually fetched just before the user scrolls to the next rows.)
An example is this demo:
http://demo.vaadin.com/dashboard/#!/transactions
How you retrieve the data from your backend depends on the technology used.
But Vaadin has working concepts where you don't need to load all 1 million rows into memory;
it will handle the fetch on demand as the rows need to be displayed.
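Whatever framework you pick, the server side of a load-on-demand table boils down to two calls: one for the total row count and one for a window of rows. Here is a framework-neutral sketch of that contract, written in Python with SQLite purely for illustration. The class name `LazyRowSource` and the `entries(id, label)` table are assumptions, not part of GWT, Vaadin or ZK.

```python
import sqlite3


class LazyRowSource:
    """Serves a big table in small windows instead of loading it whole."""

    def __init__(self, conn, page_size=30):
        self.conn = conn
        self.page_size = page_size  # roughly what a table shows at first

    def size(self):
        # The one number the UI needs up front, e.g. to size the scrollbar.
        return self.conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

    def fetch(self, start_index, count=None):
        # Called whenever the user pages or scrolls to a new window of rows.
        count = self.page_size if count is None else count
        cur = self.conn.execute(
            "SELECT id, label FROM entries ORDER BY id LIMIT ? OFFSET ?",
            (count, start_index),
        )
        return cur.fetchall()
```

The stable ORDER BY matters, since without it consecutive windows may overlap or skip rows. For very deep offsets, a keyset query (`WHERE id > last_seen_id ... LIMIT n`) is usually cheaper than OFFSET.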

Can't use LIKE but need to find related records in SQL Server

I've got a table used for issue tracking (kind of like stackoverflow :) to log PC related issues) and for simplicity I'll narrow it down to a few fields, something like the following:
Site Category Issue
MI Office Software My MS word does not run macros.
CL Office Hardware PC memory needs to be upgraded
MX Office Printer Printer is out of memory.
MI Office Software Office product prompts for allowing macro to run
I want to find related issues when I am looking at one particular issue. I can't really use the LIKE operator, because if I do, for instance:
SELECT...FROM...WHERE Issue LIKE '%My MS word does not run macros.%'
it would only return the first record. Do I have to figure out how to pull out keywords like "macros"? How would I find related records so that my query could, for instance, return records 1 and 4, or return 2 and 3 together?
Well here are 3 ways to go about it..
1. Best case:
We have the users add 'tags' to each issue. This way users can search issues using tags and find related issues too. (Just like http://stackoverflow.com ;) )
This could be implemented creating two new tables:
tag_metadata (tag#, name, description, ...)
tag_issue_relationship(tag#, issue#)
We could go a step further and add weight to each issue entry that will determine its position in the similar issue search/look-up ranking.
2. Average case:
We add more levels of sub-categories to help further classify the problem. Now, thinking of change control, will your system easily support adding/removing/re-arranging the category hierarchy over time?
3. Worst case:
Let's say the users are very lazy and don't want to spend a few seconds tagging their issues :) Then you would have to implement an indexing algorithm that picks up keywords (nouns) from the issue description and builds indexes to facilitate finding 'similar' issues. Note that the description will often contain keywords that are not significant, which would result in false positives.
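For the tagging option (the best case above), a rough sketch of the two tables and a "related issues" lookup could look like this. It uses SQLite through Python only to stay self-contained. The `issue` table and the column names `tag_id` / `issue_id` (standing in for tag# / issue#) are assumptions, and the ranking is simply the number of shared tags.

```python
import sqlite3

SCHEMA = """
CREATE TABLE issue (issue_id INTEGER PRIMARY KEY, site TEXT, category TEXT, issue TEXT);
CREATE TABLE tag_metadata (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE, description TEXT);
CREATE TABLE tag_issue_relationship (
    tag_id   INTEGER NOT NULL REFERENCES tag_metadata(tag_id),
    issue_id INTEGER NOT NULL REFERENCES issue(issue_id),
    PRIMARY KEY (tag_id, issue_id)
);
"""

# Self-join on the relationship table: every other issue that carries at
# least one of my tags, ranked by how many tags we have in common.
RELATED_SQL = """
SELECT other.issue_id, COUNT(*) AS shared_tags
FROM tag_issue_relationship AS mine
JOIN tag_issue_relationship AS other
  ON other.tag_id = mine.tag_id AND other.issue_id <> mine.issue_id
WHERE mine.issue_id = ?
GROUP BY other.issue_id
ORDER BY shared_tags DESC, other.issue_id
"""


def related_issues(conn, issue_id):
    """Return (issue_id, shared_tags) pairs for issues related to issue_id."""
    return conn.execute(RELATED_SQL, (issue_id,)).fetchall()
```

With issues 1 and 4 tagged "macros" and issues 2 and 3 tagged "memory", looking up issue 1 returns issue 4, and looking up issue 2 returns issue 3. The same join works in SQL Server.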
[Update]
Basically, the solution you are looking for could be broken up into these modules:
Parser: Will extract significant keywords from the issue description. A custom dictionary list of keywords would be used as the lookup table.
Indexer: Would index these keywords to make it searchable. This involves maintaining forward and reverse indices!
Search: Would use the indexes to locate 'similar' issues.
There may be an existing commercial/open-source product that does this..
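The three modules can be sketched in a few lines. This is only a toy version, assuming a tiny hand-made keyword dictionary (`DICTIONARY`) and a naive plural-stripping rule; a real parser would need proper stemming and a much larger dictionary. All names here are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical custom dictionary of significant keywords (the lookup table).
DICTIONARY = {"macro", "memory", "printer", "word"}


def extract_keywords(text):
    """Parser: keep only dictionary keywords, after crude plural stripping."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}
    return stems & DICTIONARY


def build_indexes(issues):
    """Indexer: forward index (issue -> keywords), reverse (keyword -> issues)."""
    forward, reverse = {}, defaultdict(set)
    for issue_id, text in issues.items():
        forward[issue_id] = extract_keywords(text)
        for keyword in forward[issue_id]:
            reverse[keyword].add(issue_id)
    return forward, reverse


def find_similar(issue_id, forward, reverse):
    """Search: other issues sharing keywords, most shared keywords first."""
    scores = defaultdict(int)
    for keyword in forward[issue_id]:
        for other in reverse[keyword]:
            if other != issue_id:
                scores[other] += 1
    return sorted(scores, key=lambda other: (-scores[other], other))
```

Run against the four sample issues from the question, issue 1 matches issue 4 through "macro", and issues 2 and 3 match each other through "memory".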

How to define checkboxlist-like database field

I've tried looking around the internet for an answer, but because I don't understand databases very well I'm having trouble phrasing my question to work as a Google search.
I apologize if this has been answered somewhere, but I can't seem to find it. A link to something that could answer this would be greatly appreciated.
I am trying to set up tables in my database. I have a Software table that contains many fields, including a field called "Compatible with."
"Compatible with" is used to store the operating systems this software is compatible with (i.e. Windows XP, Windows 7, etc.). Simple enough.
With other fields that have a set number of responses (like dropdown lists), I have normalized the database with separate tables and foreign keys. My guess is that it would be "good practice" to do this as well for the "Compatible with" field, but I'm not really sure how I should set up the normalized table.
I found something here http://forums.asp.net/t/1675666.aspx/1 that may be the right direction to go, so I thought of making my table like this:
Column Name Data Type
CompatibleWithId int
Windows XP bit
Windows 7 bit
...want to be able to add more later...
The problem I ran into is that in the future I will need to add to the list of options in the "Compatible with" field. Is there a way in C# code (I am using MVC 2 and Entity Framework 4.3.1 I believe) to add a column to a table like the one above? Or is there a completely different way I should be setting this up?
According to my understanding of your software requirement (please correct me if I am wrong), here is how I would set up the database schema:
table1: software (id, name, blah, blah)
table2: operating_sys (id, os_name, version)
table3: compatibility (software_id, os_id)
Relations:
Software to compatibility --> one to many
So the compatibility table will have multiple rows for each piece of software. If in the future you have more operating systems or versions, you can add them to the operating_sys table and add a corresponding entry to the compatibility table.
Is that what you were looking for?
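Here is a small sketch of that schema, using SQLite through Python only so it can be run as-is; the column types and the helper name `compatible_systems` are assumptions. The point it shows is that a new operating system becomes a new row in operating_sys, so no column ever has to be added to a table from code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE software (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE operating_sys (id INTEGER PRIMARY KEY, os_name TEXT NOT NULL, version TEXT);
CREATE TABLE compatibility (
    software_id INTEGER NOT NULL REFERENCES software(id),
    os_id       INTEGER NOT NULL REFERENCES operating_sys(id),
    PRIMARY KEY (software_id, os_id)  -- one row per (software, OS) pair
);
"""


def compatible_systems(conn, software_id):
    """Return the display names of every OS a piece of software supports."""
    rows = conn.execute(
        "SELECT o.os_name, o.version "
        "FROM compatibility AS c JOIN operating_sys AS o ON o.id = c.os_id "
        "WHERE c.software_id = ? ORDER BY o.id",
        (software_id,),
    ).fetchall()
    return [f"{name} {version or ''}".strip() for name, version in rows]
```

In the UI, the checkbox list is then rendered from the rows of operating_sys, and each ticked box becomes one row in compatibility.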

Options for handling a frequently changing data form

What are some possible designs to deal with frequently changing data forms?
I have a basic CRUD web application where the main data entry form changes yearly. So each record should be tied to a specific version of the form. This requirement is kind of new, so the existing application was not built with this in mind.
I'm looking for different ways of handling this, hoping to avoid future technical debt. Here are some options I've come up with:
Create a new object, UI and set of tables for each version. This is obviously the most naive approach.
Keep adding all the fields to the same object and DB tables, but show/hide them based on the form version. This will become a mess after a few changes.
Build form definitions, then dynamically build the UI and store the data in some dictionary-like format (e.g. JSON/XML, or maybe a document-oriented database). I think this is going to be too complex for the scope of this app, especially for the UI.
What other possibilities are there? Does anyone have experience doing this? I'm looking for some design patterns to help deal with the complexity.
First, I will speak to your solutions above and then I will give my answer.
Creating a new table for each version is going to require new programming every year, since you will not be able to dynamically join to the new table and include the new columns easily. That seems pretty obvious and really makes this a bad choice.
The issues you mentioned with adding the columns to the same form are correct. Also, whatever database you are using has a maximum on how many columns a table can have and how many bytes a row can hold. That could become another concern.
The third option is, I think, the closest to what you want. I would not store the new column data as JSON/XML unless it is a duplicate kept to increase speed. I think this is your best option.
The only option you didn't mention was storing all of the data in one database field and using XML to parse it. This option would make it tough to query and write reports against.
If I had to do this:
The first table would have the columns ID (seeded), Name, InputType, CreateDate, ExpirationDate, and CssClass. I would call it tbInputs.
The second table would have five columns: ID, Input_ID (with an FK to tbInputs.ID), Entry_ID (with an FK to the main/original table), Value, and CreateDate. The FK to the main/original table would allow you to find which items were attached to which form entry. I would call this table tbInputValues.
If you don't plan on having that base table, then I would use a simple table that tracks the creation date, creator ID, and the form_id.
Once you have those you will just need to create a dynamic form that pulls back all of the inputs that are currently active and display them. I would put all of the dynamic controls inside of some kind of container like a <div> since it will allow you to loop through them without knowing the name of every element. Then insert into tbInputValues the ID of the input and its value.
Create a form to add or remove an input. This would mean you would not have much, if any, maintenance work to do each year.
I think this solution may not seem like the most elegant, but if executed correctly I do think it is your most flexible solution and the one that incurs the least technical debt.
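A minimal sketch of the two tables and the "pull back the currently active inputs" step might look like this. SQLite via Python is used only to keep it runnable. The `tbEntries` table stands in for the main/original table, and the helper names and ISO-formatted date strings are assumptions on my part.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tbInputs (
    ID INTEGER PRIMARY KEY, Name TEXT, InputType TEXT,
    CreateDate TEXT, ExpirationDate TEXT, CssClass TEXT
);
CREATE TABLE tbEntries (ID INTEGER PRIMARY KEY, CreateDate TEXT);
CREATE TABLE tbInputValues (
    ID INTEGER PRIMARY KEY,
    Input_ID INTEGER NOT NULL REFERENCES tbInputs(ID),
    Entry_ID INTEGER NOT NULL REFERENCES tbEntries(ID),
    Value TEXT, CreateDate TEXT
);
"""


def active_inputs(conn, on_date):
    """Inputs to render on the form for a given date (ISO 'YYYY-MM-DD')."""
    return conn.execute(
        "SELECT ID, Name, InputType, CssClass FROM tbInputs "
        "WHERE CreateDate <= ? AND (ExpirationDate IS NULL OR ExpirationDate > ?) "
        "ORDER BY ID",
        (on_date, on_date),
    ).fetchall()


def save_entry(conn, on_date, values):
    """Store one submitted form; values maps tbInputs.ID -> entered value."""
    entry_id = conn.execute(
        "INSERT INTO tbEntries (CreateDate) VALUES (?)", (on_date,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO tbInputValues (Input_ID, Entry_ID, Value, CreateDate) "
        "VALUES (?, ?, ?, ?)",
        [(input_id, entry_id, value, on_date) for input_id, value in values.items()],
    )
    return entry_id
```

Retiring a field for next year's form is then just a matter of setting its ExpirationDate, and old entries keep their values because tbInputValues still points at the original input row.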
I think the third approach (XML) is the most flexible. A simple XML structure is generated very fast and can be easily versioned and validated against an XSD.
You'd have a table holding the XML in one column and the year/version this xml applies to.
Generating UI code based on the schema is basically a bad idea. If you do not require extensive validation, you can opt for a simple editable table.
If you need a custom form every year, I'd look at it as kind of a job guarantee :-) It's important to make the versioning mechanism and extension transparent and explicit though.
For this particular app, we decided to deal with the problem as if there was one form that continuously grows. Due to the nature of the form this seemed more natural than more explicit separation. We will have a mapping of year->field for parts of the application that do need to know which data is for which year.
For the UI, we will be creating a new page for each year's form. Dynamic form creation is far too complex in this situation.
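For anyone curious, the year->field mapping can be as small as the sketch below. The field names and years are made up for illustration, and it assumes fields are only ever added, which matches the "one form that continuously grows" view.

```python
# Hypothetical mapping: the year a group of fields first appeared on the form.
FIELDS_ADDED = {
    2010: ["name", "email"],
    2011: ["phone"],
    2012: ["emergency_contact"],
}


def fields_for(year):
    """All fields that exist on the form as of the given year."""
    fields = []
    for added_year in sorted(FIELDS_ADDED):
        if added_year <= year:
            fields.extend(FIELDS_ADDED[added_year])
    return fields


def view_for_year(record, year):
    """Project a full record down to the fields that year's form knew about."""
    return {name: record.get(name) for name in fields_for(year)}
```

Parts of the application that care about the year call `fields_for`; everything else just treats the record as one wide form.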