I am in need of your suggestions for the scenario below:
One of our clients has 8 PostgreSQL database servers used for OLTP and now wants to generate MIS reports/dashboards integrating the data from all of these servers.
- There are around 100 reports to be generated
- There would be around 50k rows added to each of these databases
- The reports are to be generated once every month
- They run their entire setup on bare-metal servers
- They don't want to use Hadoop/Spark, since they think the maintenance burden will be too high
- They want to use open-source technology to accomplish this task
With all that said, one approach would be to write scripts to bring aggregated data into one server
and then hand-code the reports with front-end JavaScript.
Is there a better approach using ETL tools like Talend, Pentaho, etc.?
Which ETL tool would be best suited for this?
Would the community edition of any of these ETL tools satisfy the above requirements?
I know for a fact that the commercial offering of any of these ETL tools will not be within budget.
Could you please let me know your views on this?
Thanks in Advance
Deepak
Yes, certainly. I have done similar things successfully a dozen times.
My suggestion is to use Pentaho Data Integration (or Talend) to collect the data in one place and then filter, aggregate, and format it. The data volume is not an issue as long as you have a decent server.
For the reports, I suggest producing them with Pentaho Report Designer so that they can be sent by mail (with Pentaho DI) or distributed with a Pentaho BI Server.
You can also build a JavaScript front end with Pentaho CDE.
All of these tools are mature, robust, easy to use, have community editions, and are well supported by the community.
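If you do end up going the script route instead, note that plain PostgreSQL can handle the "bring aggregated data into one server" step by itself via the postgres_fdw extension, so the scripts reduce to scheduled SQL. A minimal sketch, assuming hypothetical host names, credentials, an orders table on each OLTP server, and a local monthly_order_totals summary table:

-- On the central reporting server (PostgreSQL 9.3+).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Declare one of the 8 OLTP servers (host and credentials are placeholders).
CREATE SERVER oltp_1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'oltp1.example.com', dbname 'sales', port '5432');
CREATE USER MAPPING FOR reporting_user SERVER oltp_1
    OPTIONS (user 'readonly', password 'secret');

-- Expose the remote table locally (a hypothetical orders table).
CREATE FOREIGN TABLE oltp_1_orders (
    order_id    bigint,
    amount      numeric,
    created_at  timestamptz
) SERVER oltp_1 OPTIONS (schema_name 'public', table_name 'orders');

-- Monthly batch: aggregate last month's remote rows into the local summary table.
INSERT INTO monthly_order_totals (source, month, total)
SELECT 'oltp_1', date_trunc('month', created_at), sum(amount)
FROM oltp_1_orders
WHERE created_at >= date_trunc('month', now() - interval '1 month')
  AND created_at <  date_trunc('month', now())
GROUP BY 2;

Repeat the server and foreign-table declarations for the other seven servers; at around 50k rows per database this runs comfortably on one box. That said, the Pentaho route above gives you scheduling, error handling, and logging for free.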
In other words, can Zeppelin be used as a Tableau replacement at small scale?
I have a new UI/UX design for a reporting dashboard. The data for the dashboard comes from a relational database (SQL Server). This dashboard is to be viewed by ~300 colleagues in my company; perhaps up to ten of them will be viewing it at the same time.
Currently the dashboard is implemented in Kibana, with data being imported into Elasticsearch from SQL Server on a regular basis. However, the new design requires certain widgets and data aggregations that go beyond the dashboarding capabilities of Kibana. Additionally, my organization wants to migrate this dashboard to a technology considered more familiar to the data scientists who work with us (Kibana isn't considered such).
This report and dashboard could be migrated to Tableau. Tableau is powerful enough to perform the desired data aggregations and present all the desired widgets. However, we can't afford the license costs, though we can invest as much developer time as needed.
I have evaluated a couple of open-source dashboarding tools (Metabase and Superset), and they lack the aggregations and widgets that we need. I won't go into details, because the question is not about specifics; it is clear that Metabase and Superset are not powerful enough for our needs.
I have the impression that Apache Zeppelin is powerful enough, with its support for arbitrary Python code (I would use Pandas for data aggregations), graphs, and widgets. However, I am not sure whether a single Zeppelin instance can handle a number of concurrent viewers well.
We'd like to build a set of notebooks and make them available to all colleagues in the organization (access control is not an issue; we trust each other). The notebooks will be interactive, with data filters and date-range pickers.
It looks like Zeppelin has switchable interpreter isolation modes, which we can use to isolate different users' sessions from each other. My question is whether a single t2.large AWS instance hosting Zeppelin can sustain up to ten users viewing a report aggregated over a 300k-row dataset. Also, are there any usability concerns that make the idea of multi-user viewing of a reporting dashboard impractical for Zeppelin?
I see a couple questions you're asking:
Can Zeppelin replace Tableau on a small scale? This depends on what features you are using in Tableau. Every platform has its own set of features that the others may or may not have, and Tableau has a lot of customization options that you won't find elsewhere. Aim to get as much of your dashboard converted 1:1 as possible, then warm everyone up to the idea that it will look and operate a little differently since it's on a different platform.
Can a t2.large hosting Zeppelin sustain up to 10 concurrent users viewing a report aggregated on 300k rows? A t2.large should be more than big enough to run Zeppelin, Tableau, Superset, etc. with 10 concurrent users pulling a report with 300k rows. 300k isn't really that much.
A good way to speed things up and squeeze more concurrent users onto your existing infrastructure is to speed up your data sources. That is where a lot of the aggregation calculations happen. Taking a look at your ETLs and aggregating ahead of time can help, as can making sure your data scientists aren't running massive queries that slow down your database server.
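To illustrate "aggregate ahead of time": rather than letting every dashboard session group the raw rows on each view, a scheduled job can maintain a small summary table that the notebooks query directly. A minimal sketch in SQL Server syntax, with hypothetical table and column names:

-- One-time setup: a summary table keyed by day and event type (hypothetical schema).
CREATE TABLE dbo.daily_event_summary (
    event_date  date        NOT NULL,
    event_type  varchar(50) NOT NULL,
    event_count int         NOT NULL,
    PRIMARY KEY (event_date, event_type)
);

-- Nightly refresh: rebuild the summary from the raw table.
TRUNCATE TABLE dbo.daily_event_summary;

INSERT INTO dbo.daily_event_summary (event_date, event_type, event_count)
SELECT CAST(created_at AS date), event_type, COUNT(*)
FROM dbo.events                      -- the raw ~300k-row table (assumed name)
GROUP BY CAST(created_at AS date), event_type;

The dashboard then filters and charts a few thousand pre-aggregated rows instead of scanning 300k on every interaction.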
Is there a way to automatically store all timestamps and details of modifications of fields in a table in FileMaker Pro 13? I.e., is there an easy option somewhere that FileMaker provides, or must I do this programmatically/manually?
Ray Cologon, PhD and all-round FileMaker superstar, wrote a custom function that works well for us. It is free to use, but you must have a copy of FileMaker Pro Advanced to install the custom function.
http://www.nightwing.com.au/FileMaker/demosX/demoX01.html
FileMaker does not provide a ready-to-go method of audit logging. However, there are a few decent options. Linear Blue provides fmDataGuard and SyncDek just for this purpose, and does a very nice job. (SyncDek is great for [and requires] FileMaker Server; fmDataguard is great for standalone databases and small server deployments.)
Nightwing's solution is clever and hooks up very similarly to fmDataGuard, but I think fmDataGuard is more robust.
All of these audit logging solutions have a critical limitation. You cannot log deletions as a [Full Access] user. If this is critical for your application, SyncDek is the only solution that offers a work-around in the latest versions: record change polling.
There is a final possibility that might be worth considering for some applications. Databases like MySQL have audit log plugins without the permissions limitations of FileMaker. You can connect FileMaker to one or more MySQL databases and use their tables more-or-less like native FileMaker tables. With the MySQL audit log plugin, you can get your audit logging and use FileMaker for your UI.
You can do this programmatically, but with the help of the MBS Plugin you can achieve it easily.
Here is the documentation link
Example:
MBS( "Audit.Changed"; timestamp; TableName { ; FieldsToIgnore } )
I have an application that is very interactive and is now at the point of requiring a real analytics solution. We generate roughly 2.5-3 million events per month (and growing), and we would like to build reports to analyze user cohorts, funnels, etc. The reports are standard enough that it would seem feasible to use an existing service.
However, given the volume of data, I am worried that the cost of using a hosted analytics solution like Mixpanel will become very expensive very quickly. I've also looked into building a traditional star-schema data warehouse with offline background processes (I know very little about data warehousing).
This is a Ruby application with a PostgreSQL backend.
What are my options, both build and buy, to answer such questions?
Why not build your own?
Check out this open-source project as an example:
http://www.warefeed.com
It is very basic, and you will have to build the data-mart features you need for your case.
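If you do roll your own, a star schema for event analytics can start very small. A minimal PostgreSQL sketch, with hypothetical table and column names, that already supports cohort- and funnel-style queries:

-- Dimension: one row per user, carrying the cohort attribute.
CREATE TABLE dim_user (
    user_id      bigint PRIMARY KEY,
    signup_month date NOT NULL            -- cohort key
);

-- Fact: one row per event, loaded in batches from the production PostgreSQL DB.
CREATE TABLE fact_event (
    event_id    bigserial PRIMARY KEY,
    user_id     bigint NOT NULL REFERENCES dim_user (user_id),
    event_name  text NOT NULL,
    occurred_at timestamptz NOT NULL
);

-- Example cohort query: monthly active users per signup cohort.
SELECT u.signup_month                     AS cohort,
       date_trunc('month', e.occurred_at) AS activity_month,
       count(DISTINCT e.user_id)          AS active_users
FROM fact_event e
JOIN dim_user u USING (user_id)
GROUP BY 1, 2
ORDER BY 1, 2;

At 2.5-3 million events per month this stays well within what a single PostgreSQL instance can handle, especially if old events are rolled up into summary tables by a background job.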
I have a discussion DB, and I need a large amount of test data, in different sample sizes. Please see the ready SELECT, JOIN, and CREATE queries (scroll down in the link).
How can I automatically generate test data to the db?
How to generate test data in different sized samples?
Is there a ready-made tool?
Here are a couple of suggestions for free tools that generate test data:
Databene Benerator: supports many JDBC-capable database brands, uses XML format compatible with DbUnit, GPL license.
Super Smack: originally a load-test tool for MySQL, it also supports PostgreSQL and it includes a generator of mock data.
A current version of Super Smack appears to be available here
I asked a similar question here on StackOverflow in February, and the two choices above seemed like the best options.
I know this question is super dated, but I was looking for the answer to this exact question today and I came across this:
http://wiki.postgresql.org/wiki/Sample_Databases
Out of the options listed (including built-in tools like pgbench), pgFoundry has several compelling options that work perfectly for the test cases I am working on.
I thought it might help someone like me, so there it is.
I'm not sure how to automatically generate data and insert it into the database (I'm sure you could pull it off with a Python script or something), but if you're just looking for endless blabbering to stick into a DB, this should be helpful.
I'm not a Postgres person, but in many other DBs I've used, a simple mechanism for generating large quantities of test data is a cross join: joining even modest tables to each other multiplies the row count very quickly, which makes it easy to produce samples of different sizes (see the sketch below).
Here's a nice blog post on it (SQL Server specific though).
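In Postgres specifically, the same multiplication trick works, and generate_series() supplies the seed rows for free. A small sketch, assuming a hypothetical posts table:

-- 1,000 x 1,000 = 1,000,000 rows from a cross join of two series;
-- change the series bounds to get different sample sizes.
INSERT INTO posts (author_id, body, created_at)
SELECT (a.n % 50) + 1,                          -- 50 fake authors
       'test post ' || a.n || '-' || b.n,       -- unique-ish body text
       now() - (a.n || ' minutes')::interval    -- spread the timestamps out
FROM generate_series(1, 1000) AS a(n)
CROSS JOIN generate_series(1, 1000) AS b(n);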
I've heard of "Crystal Reports" for years, but I'm really confused as to why a small ActiveX-type component that just displays and prints data from databases (does it?) should be considered a whole product within the VS suite of products.
Is it something more, like something for Windows Server that lets you generate reports server-side as PDFs or similar, which is why it's considered so important?
Enlighten me.
Crystal Reports is a very robust (and in many developers' opinions, complicated and painful) tool to build complex reports. It's much more than simply printing what's in the database - taking relational data and transforming it into massive corporate reports with hundreds or thousands of conditions is very time-consuming and difficult. For example, what if the report needs to have product summary sections which can be formatted completely differently based on the qualities or attributes of the product? CR has a scripting model that permits pretty much any transformation imaginable.
To replace Crystal Reports with something you seem to be imagining, would require a data transformation engine; an end-user-friendly UI to write transformation rules and design reports; and a presentation engine to format the reports in a print-friendly way. That definitely sounds like a full-fledged product to me.
The worst thing about CR is that there isn't anything better at what it does.
If what you want to do is what it "likes" to do--dump data from the DB into a formatted page--it's dead simple. If you're willing to tolerate pain & frustration, you can make it do all sorts of fancy things.
It's definitely more than just "an ActiveX control".
It's a whole product because it is supplied as such by the developer, and is installed only optionally. It enables support for Crystal Report files.
And no, it's not a small ActiveX-type component. It comes with a full-fledged report designer and runtime component and is a complete reporting solution, much like SSRS (SQL Server Reporting Services; is that what you meant by the thing for Windows Server?). Have a look at their web page for more information.
The Crystal Reports that comes with Visual Studio is a 'lite' version of the suite of products; see this page for a comparison of features between the full and lite versions of 2008.
You should try Stimulsoft Reports.Net; it's better than CR.NET. In this solution there is no ActiveX involved, and no merge modules or runtimes...
One of the cool things they added was support for binding to .NET and other data providers. The company has been bought by so many companies over the years that it has really hurt the product, IMO.
Crystal Reports is a third-party report-creation tool.
It comes bundled with the Visual Studio IDE, and using this tool you can create reports in your application.
It's a reporting tool that has a standalone application for generating reports, or the reports can be integrated into a .NET application.