I have a process to automate daily P&L monitoring. The source is one year's worth of Excel data, roughly 5 million rows. Can Qlik Sense Desktop manage this without any server involvement?
I have a few more concerns about Qlik Sense and would appreciate the expertise of someone experienced with Qlik Sense Desktop.
From a data processing point of view there is no difference between QS Desktop and QS Enterprise; both products use the same engine in the background. The limitation will be the machine's resources. Since the Qlik engine is an in-memory tool, the main resource is RAM, so the more RAM, the better.
Broadly speaking, 5 million rows should be fine on any laptop/desktop, but again it depends on the machine specs.
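As a rough sanity check before loading, you can do a back-of-envelope RAM estimate. The column count, bytes per value and overhead multiplier below are assumptions for illustration only, not official Qlik sizing figures.

```python
# Back-of-envelope RAM estimate for the 5 million row load.
# All of the numbers below are assumptions, not Qlik sizing figures.
rows = 5_000_000
columns = 20              # assumed number of columns in the Excel extract
bytes_per_value = 8       # assumed average storage per value
overhead_factor = 3       # assumed multiplier for engine/index overhead

estimated_gb = rows * columns * bytes_per_value * overhead_factor / 1024**3
print(f"Rough in-memory estimate: {estimated_gb:.1f} GB")  # ~2.2 GB with these numbers
```

With numbers in that ballpark, a machine with 8-16 GB of RAM leaves comfortable headroom, but the only reliable answer is to load a representative sample and watch the engine's actual memory use.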
Let me know in the comments (or open another question; SO sometimes clears old comments) if you have any other questions.
In other words, can Zeppelin be used as a Tableau replacement at small scale?
I have a new UI/UX design for a reporting dashboard. The data for the dashboard comes from a relational database (SQL Server). The dashboard is to be viewed by ~300 colleagues in my company; perhaps up to ten of them will view it at the same time.
Currently the dashboard is implemented in Kibana, with data being imported into Elasticsearch from SQL Server on a regular basis. However, the new design requires certain widgets and data aggregations that go beyond Kibana's dashboarding capabilities. Additionally, my organization wants to migrate this dashboard to a technology that the data scientists who work with us consider more familiar (Kibana isn't considered such).
The report and dashboard could be migrated to Tableau, which is powerful enough to perform the desired data aggregations and present all the desired widgets. However, we can't afford the license cost, although we can invest as much developer time as needed.
I have evaluated a couple of open-source dashboarding tools (Metabase and Superset) and they lack the aggregations and widgets that we need. I won't go into details, because the question is not about specifics; it is clear that Metabase and Superset are not powerful enough for our needs.
My impression is that Apache Zeppelin is powerful enough, with its support for arbitrary Python code (I would use Pandas for the data aggregations), graphs and widgets. However, I am not sure whether a single Zeppelin instance can handle a number of concurrent viewers well.
We'd like to build a set of notebooks and make them available to all colleagues in the organization (access control is not an issue; we trust each other). The notebooks will be interactive, with data filters and date-range pickers.
It looks like Zeppelin has switchable interpreter isolation modes which we can use to isolate different users' sessions from each other. My question is whether a single t2.large AWS instance hosting Zeppelin can sustain up to ten users viewing a report aggregated over a 300k-row dataset. Also, are there any usability concerns that make the idea of multi-user viewing of a reporting dashboard impractical for Zeppelin?
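For concreteness, here is roughly the kind of notebook paragraph I have in mind. This is only a sketch: the DSN, table and column names are invented, and it assumes Zeppelin's ZeppelinContext (the `z` object) for dynamic forms and for rendering a Pandas DataFrame.

```python
%python
# Sketch of one dashboard paragraph (all names below are placeholders).
import pandas as pd
import sqlalchemy

# Dynamic forms give viewers a date-range picker and a region filter.
start = z.input("Start date", "2019-01-01")
end = z.input("End date", "2019-12-31")
region = z.select("Region", [("ALL", "All"), ("EU", "EU"), ("US", "US")])

engine = sqlalchemy.create_engine("mssql+pyodbc://dashboard_dsn")  # assumed DSN
query = sqlalchemy.text(
    "SELECT region, metric_date, amount "
    "FROM reporting.daily_metrics "
    "WHERE metric_date BETWEEN :start AND :end"
)
df = pd.read_sql(query, engine, params={"start": start, "end": end},
                 parse_dates=["metric_date"])

if region != "ALL":
    df = df[df["region"] == region]

summary = df.groupby("region", as_index=False)["amount"].sum()
z.show(summary)  # renders as a Zeppelin table/chart for the viewer
```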
I see a couple of questions you're asking:
Can Zeppelin replace Tableau at a small scale? That depends on which features you are using in Tableau. Every platform has its own feature set that the others may or may not have, and Tableau has a lot of customization options that you won't find elsewhere. Aim to convert as much of your dashboard 1:1 as you can, then warm everyone up to the idea that it will look and operate a little differently since it's on a different platform.
Can a t2.large hosting Zeppelin sustain up to 10 concurrent users viewing a report aggregated on 300k rows? A t2.large should be more than big enough to run Zeppelin, Tableau, Superset, etc. with 10 concurrent users pulling a report with 300k rows. 300k isn't really that much.
A good way to speed things up and squeeze more concurrent users out of your existing infrastructure is to speed up your data sources; that is where a lot of the aggregation work happens. Taking a look at your ETLs and trying to aggregate ahead of time can help, as can making sure your data scientists aren't running massive queries that slow down your database server.
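As a loose illustration of the "aggregate ahead of time" point (the connection string and table names here are placeholders), a nightly job like this pushes the heavy grouping into the ETL step, so the dashboard only ever reads a small summary table:

```python
# Hypothetical nightly pre-aggregation job: roll the detail rows up into a
# small daily summary table that the dashboard queries instead.
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc://reporting_dsn")  # assumed DSN

detail = pd.read_sql(
    "SELECT region, metric_date, amount FROM dbo.fact_sales",
    engine,
    parse_dates=["metric_date"],
)

summary = (
    detail.groupby(["region", pd.Grouper(key="metric_date", freq="D")])["amount"]
    .sum()
    .reset_index()
)

# Good enough for a nightly batch: rebuild the summary table in one shot.
summary.to_sql("fact_sales_daily", engine, schema="dbo",
               if_exists="replace", index=False)
```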
I am in need of your suggestions for the scenario below:
One of our clients has 8 Postgres DB servers used for OLTP and now wants to generate MIS reports/dashboards integrating all the data from those servers.
- There are around 100 reports to be generated
- There would be around 50k rows added to each of these databases
- the reports are to be generated once every month
- they run their entire setup on bare metal
- they don't want to use Hadoop/Spark, since they think the maintenance overhead will be too high
- they want to use open-source tech to accomplish this task
With all that said, one approach would be to write scripts to bring the aggregated data into one server and then hand-code the reports with frontend JavaScript.
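For what it's worth, the scripted approach can stay quite small. Below is a rough sketch of what I have in mind, assuming Python with pandas and SQLAlchemy; the hostnames, table and column names are invented:

```python
# Pull a monthly aggregate from each of the 8 OLTP servers and land it in a
# single reporting database. All connection strings and names are placeholders.
import pandas as pd
import sqlalchemy

SOURCES = [f"postgresql://readonly@db{i}.internal/oltp" for i in range(1, 9)]
reporting = sqlalchemy.create_engine("postgresql://etl@reporting.internal/mis")

MONTHLY_QUERY = """
    SELECT date_trunc('month', created_at) AS month,
           branch_id,
           sum(amount) AS total_amount,
           count(*)    AS txn_count
    FROM transactions
    WHERE created_at >= date_trunc('month', now()) - interval '1 month'
      AND created_at <  date_trunc('month', now())
    GROUP BY 1, 2
"""

frames = [pd.read_sql(MONTHLY_QUERY, sqlalchemy.create_engine(url)) for url in SOURCES]
monthly = pd.concat(frames, ignore_index=True)

# Append last month's figures; the report layer reads from this one table.
monthly.to_sql("monthly_branch_summary", reporting, if_exists="append", index=False)
```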
Is there a better approach using ETL tools like Talend, Pentaho, etc.?
Which ETL tool would be best suited for this?
Would the community edition of any of these ETL tools suffice for the above requirements?
I know for a fact that the commercial offering of any of these ETL tools will not be in the budget.
Could you please let me know your views on this?
Thanks in Advance
Deepak
Sure, yes. I have done similar things successfully a dozen times.
My suggestion is to use Pentaho-Data-Integrator (or Talend) to collect the data in one place and then filter, aggregate and format the data. The data volume is not an issue as long as you have a decent server.
For the reports, I suggest producing them with Pentaho-Report-Designer so that they can be sent by mail (with Pentaho-DI) or distributed with a Pentaho-BI-server.
You can also build a JavaScript front end with Pentaho-CDE.
All of these tools are mature, robust, easy to use, have a community edition and are well supported by the community.
A medical practice has approached us about using FileMaker as a fully fledged EMR system, with a heavy emphasis on using iPads to enter patient records, photos, digital signatures, etc., which can obviously be accessed on desktops as well. Ultimately they would like such a system to replace their current EMR and take over all billing operations, patient scheduling and so forth. They only use Macs in their practice.
We have very little experience with FileMaker, but we found an earlier question discussing its pros and cons; however, FileMaker seems to have come a long way since 2009, when that question was asked...
So overall I'm just trying to work out whether FileMaker is suitable for such an application, and what the pros and cons of using a combination of FMP12 and FM Go would be.
(Sorry if I've done anything wrong - first question...)
Thanks!
As a FileMaker developer myself, I would say go for it. I agree with Mikhail: you WILL see results faster than on any other platform. You can make changes yourself, easily and live, or you can get a FileMaker developer - just as you would need to get a developer for any other application.
Off-the-shelf applications tend to be quite inflexible, although I am sure there are systems out there that allow some customisation.
FileMaker is a very capable product. We have written many applications for vertical markets, such as law firms, and even for a Harley Street plastic surgeon who gathers patient data on an iPad and sketches the suggested surgery on a picture of the patient.
For those who think FileMaker is a toy, have a look at http://www.businessmancrm.com - this is a full ERP system used all over the world. This is not an advert, but a demonstration of what is possible with FileMaker.
Dollar for dollar, FileMaker will win hands down... and when it comes to time frames, there is no contest. We are open-minded: we constantly look for other products to develop applications with, for ourselves and our customers, and we have not found anything more viable just yet.
Pros:
Extremely quick environment
Cross platform
Integrate other SQL data sources into application
ODBC Support
Remote Access
Can be run from a USB stick if needed!
Thousands of developers around the world
Large community
FileMaker Inc. has made a profit every single quarter since its inception, so it is stable, and it has the backing of Apple!
Reasonable Cost
Make changes yourself
Easy to back up; supports incremental backups
Easy to secure and encrypt data on a network
Supports terminal server
FM Go is free
Cons
High-level language (no low-level control over layout objects) - however, it does support plugins
Requires the FileMaker client (unless a web application/interface is built in PHP or with IWP - Instant Web Publishing)
Proprietary database (however, it can easily link to MySQL, MSSQL and Oracle)
Honestly, not worth it.
It's a very clunky front-end for a database.
If you do decide to pick it up, you're basically stuck paying a FileMaker developer for the rest of the system's existence.
One of my current clients has had it for the last ~6+ years; after only 8 months with them, I'm trying very hard to push them away from it and onto a newer system.
I can suggest looking at Mastercare EMR, Profile and MMEX.
FileMaker is perfectly capable of it, of course, and I expect you'll get first results much faster than with any other approach, especially on the iPad. There are quite a few EMRs out there written in FileMaker. There are downsides, of course; it was always targeted at end users, so it ended up fairly unconventional from a typical programmer's point of view, and many programmers dislike this. Being aimed at end users, it suffers from many simplifications (well, not exactly suffers; this actually makes development faster, as there are fewer choices), but people always want something special, so there is a huge number of workarounds to overcome these simplifications. These workarounds vary from relatively harmless to very hairy.
For example, to sign documents on the iPad you need to add a webviewer control pointed at a generated HTML page via the "data:" protocol. The page has JavaScript that captures the user's touches, paints them on a canvas, and serializes them into a string. Later a script captures the string, stores it in a FileMaker field, and changes the generated HTML to use this string so the JavaScript can redraw the signature. This one is relatively simple, and since the functionality cannot be obtained in any other way, it's in wide use; there's even a commercial module for around $300. A complex app may consist of dozens of such workarounds; anyone who is not a FileMaker developer won't be able to understand why you need a webviewer to capture a signature, or why you use a strange contraption of invisible tabs to display what looks like a simple pop-up list. That is, it's not like you read a book and work from there; be ready to read quite a few blogs and frequent forums and mailing lists.
That said, it's a good product nonetheless with unique capabilities (that iPhone/iPad client, for one); paired with a good developer it can be very powerful.
Having developed an EMR system at a recent position for 3 years, I can tell you from experience that the requirements for a true EMR system may quickly outgrow the scope of what is easy to do in FileMaker. A few really big, important EMR features come to mind immediately:
Insurance Eligibility verification: is there going to be a way to hit all of the major payers' web services or a third party aggregator to verify insurance eligibility from the iPad?
Insurance Card OCR: sure you can snap a photo of an insurance card, but now you have back office staff typing that information in from an image. We implemented OCR of insurance cards in our EMR and it was a huge cost and time saver.
Security / Privacy concerns: HIPAA compliance is a big deal, and is FileMaker suitably transparent to be compliant? Is there any way to audit who looks at a record? How is the data transferred across the wire?
E-prescribing: All modern EMRs support electronic prescriptions, which carry a complex set of rules and implementation details with them. I would want to be certain FileMaker could be integrated with an e-prescribing gateway before proceeding.
My main concern with using any off the shelf, cross platform tool to approach a problem as big and complex as an EMR would be getting painted into a corner down the road, having invested a bunch of time and money into a solution that may leave you unable to implement a feature or requirement, whereas paying the up front price of developing a native iOS app (and web apps and whatever else you need to integrate with) would eliminate that possibility, but obviously cost more.
We use the MongoDB database add-on on Heroku for our SaaS product. Now that Amazon has launched DynamoDB, a cloud database service, I was wondering how that changes the NoSQL offerings landscape.
Specifically for cloud-based services or SaaS vendors, how will using DynamoDB be better or worse compared to, say, MongoDB? Are there any cost, performance, scalability, reliability, driver, community, etc. benefits to using one versus the other?
For starters, it will be fully managed by Amazon's expert team, so you can bet that it will scale very well with virtually no input from the end user (developer).
Also, since it's built and managed by Amazon, you can assume that they have designed it to work very well with their infrastructure, so performance should be top notch. In addition to being specifically built for their infrastructure, they have chosen to use SSDs as storage, so right from the start disk throughput will be significantly higher than on other data stores in AWS that are HDD-backed.
I haven't seen any drivers yet, and I think it's too early to tell how the community will react, but I suspect that Amazon will provide drivers for all of the most popular languages, and the community will likely receive this well - and in turn create additional drivers and tools.
Using MongoDB through an add-on for Heroku effectively turns MongoDB into a SaaS product as well.
In reality, you would be comparing whatever service your chosen provider offers to what Amazon can offer, rather than comparing one persistence solution to another.
This is very hard to do. Each provider will have varying levels of service at different price points, and some may consider the option of running it on their own hardware locally for development purposes a welcome one.
I think the key difference to consider is that MongoDB is software you can install anywhere (including on AWS, on another cloud service, or in-house), whereas DynamoDB is available exclusively as a hosted service from Amazon (AWS). If you want to retain the option of hosting your application in-house, DynamoDB is not an option. If hosting outside of AWS is not a consideration, then DynamoDB should be your default choice unless very specific features are a higher priority.
There's a table in the following link that summarizes the attributes of DynamoDB and Cassandra:
http://www.datastax.com/dev/blog/amazon-dynamodb
Something that needs improvement in DynamoDB for it to become more usable is the ability to index columns other than the primary key.
UPDATE 1 (06/04/2013)
On 04/18/2013, Amazon announced support for Local Secondary Indexes, which makes DynamoDB a great deal more capable:
http://aws.amazon.com/about-aws/whats-new/2013/04/18/amazon-dynamodb-announces-local-secondary-indexes/
I have to be honest: I was very excited when I heard about the new DynamoDB, and I attended the webinar yesterday. However, it's difficult to make a decision right now, as everything they said was still very vague; I have no idea what functions will be allowed or exposed through their service.
The one thing I do know is that scaling is handled automatically, which is pretty awesome, yet there are still so many unknowns that it's tough to do a proper analysis until all the facts are in and we can start using it.
Thus far I still see Mongo working much better for me (personally) in the project I've been working on.
Like most DB decisions, it's really going to come down to a project by project decision of what's best for your need.
I anxiously await more information on the product. For now, though, it is in beta, and I wouldn't jump ship to adopt the latest and greatest only to end up as a tester :)
I think one of the key differences between DynamoDB and other NoSQL offerings is the provisioned throughput: you pay for a specific throughput level on a table, and provided you keep your data well partitioned, you can always expect that throughput to be met. So as your application load grows, you can scale up and keep your performance more-or-less constant.
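To make that concrete, here is a hedged sketch using today's boto3 SDK (the table and key names are made up): the read/write capacity is something you declare per table and can raise later as load grows.

```python
# Illustrative only: provisioned throughput is declared per table and can be
# increased later without downtime. Table and key names are invented.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="user_sessions",
    AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
)
dynamodb.get_waiter("table_exists").wait(TableName="user_sessions")

# Later, as application load grows, raise the provisioned capacity; as long as
# the key space stays well distributed, latency remains roughly constant.
dynamodb.update_table(
    TableName="user_sessions",
    ProvisionedThroughput={"ReadCapacityUnits": 500, "WriteCapacityUnits": 100},
)
```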
Amazon DynamoDB seems like a pretty decent NoSQL solution. It is fast, and it is pretty easy to use. Other than having an AWS account, there really isn't any setup or maintenance required. The feature set and API are fairly small right now compared to MongoDB/CouchDB/Cassandra, but I would expect them to grow over time as feedback from the developer community is received. Right now, all of the official AWS SDKs include a DynamoDB client.
Pros
Lightning Fast (uses SSDs internally)
Really (really) reliable (the chances of write failures are low)
Seamless scaling (no need to do manual sharding)
Works as a web service (no server to manage, no configuration, no installation)
Easily integrated with other AWS features (you can dump a whole table into S3 or use EMR, etc.)
Replication is managed internally, so the chance of accidental data loss is negligible.
Cons
Very (very) limited querying (see the sketch at the end of this answer).
Scanning is painful (I remember one scan, run through Java, taking 6 hours).
Pre-defined throughput, which means a sudden increase beyond the set throughput will be throttled.
Throughput is partitioned as the table is sharded internally (which means that if you provisioned a throughput of 1000 and the table is partitioned in two, and you are reading only the latest data (from one partition), then your effective read throughput is only 500).
No joins, and only limited indexing allowed (basically 2).
No views, triggers, scripts or stored procedures.
It's really good as an alternative to session storage in a scalable application. Another good use would be logging/auditing in an extensive system. It is NOT preferable for a feature-rich application with frequent enhancements or changes.
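To illustrate the querying limitation mentioned in the cons above, here is a small boto3 sketch (table, key and attribute names are invented): a key-based Query touches only one hash key's items, while anything else falls back to a Scan that reads, and bills for, the whole table.

```python
# Query vs. Scan: why the limited querying hurts. All names are invented.
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb", region_name="us-east-1").Table("audit_log")

# Cheap: a Query only reads the items under one hash key (user_id), optionally
# narrowed by the range key (ts).
recent = table.query(
    KeyConditionExpression=Key("user_id").eq("u-123") & Key("ts").gt(1700000000)
)

# Expensive: a Scan reads every item in the table and filters afterwards,
# which is what makes multi-hour scans like the one above possible.
errors = table.scan(FilterExpression=Attr("level").eq("ERROR"))
```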
Related question: What is the most efficient way to break up a centralised database?
I'm going to try and make this question fairly general so it will benefit others.
About 3 years ago, I implemented an integrated CRM and website. Because I wanted to impress the customer, I implemented the cheapest architecture I could think of, which was to host the central database and website on the web server. I created a desktop application which communicates with the web server via a web service (this application runs from their main office).
In hindsight this was rather foolish, as now that the company has grown, their internet connection becomes slower and slower each month. Because of the speed issues, the desktop software now times out on a regular basis, and the customer is left with 3 options:
Purchase a faster internet connection.
Move the database (and website) to an in-house server.
Re-design the architecture so that the CRM and web databases are separate.
The first option is the "easiest", but certainly not the cheapest long term. With the second option, if we move the website to in-house hosting, the client has to deal with issues like an overloaded/poor/offline internet connection, loss of power, etc. And with the final option, the client is loath to pay a whole whack of cash for me to re-design and re-code the architecture, and I can't afford to do this for free (I need to eat).
Is there any way to recover when you've screwed up the design of a distributed system so badly that none of the options work? Or is it a case of cutting your losses and learning from the mistake? I feel terrible that there's no quick fix for this problem.
You didn't screw up. The customer wanted the cheapest option, you gave it to them, this is the cost that they put off. I hope you haven't assumed blame with your customer. If they're blaming you, it's a classic case of them paying for a Chevy while wanting a Mercedes.
Pursuant to that:
Your customer needs to make a business decision about what to do. Your job is to explain to them the consequences of each of the choices in as honest and professional a way as possible and leave the choice up to them.
Just remember, you didn't screw up! You provided for them a solution that served their needs for years, and they were happy with it until they exceeded the system's design basis. If they don't want to have to maintain the system's scalability again three years from now, they're going to have to be willing to pay for it now. Software isn't magic.
I wouldn't call it a screw up unless:
It was known how much traffic or performance requirements would grow. And
You deliberately designed the system to under-perform. And
You deliberately designed the system to be rigid and not adaptable to change.
A screw up would have been to over-engineer a highly complex system costing more than what the scale at the time demanded.
In fact, it is good practice to invest only as much as the business can currently leverage, using growth to fund further investment in scalability should it be required. It is simple risk management.
Surely as the business has grown over time, presumably with the help of your software, they have also set aside something for the next level up. They should be thanking you for helping grow their business beyond expectations, and throwing money at you so you can help them carry through to the next level of growth.
All three of those options could be good. Which one is best depends on cost-benefit analysis, ROI, etc. It is partially a technical decision but mostly a business one.
Congratulations on helping build a growing business up until now, and on into the future.
Are you sure that the cause of the timeouts is the internet connection, and not some performance issues in the web service / CRM system? By timeout I'm going to assume you mean something like ~30 seconds, in which case:
Either the internet connection is to blame, in which case you would see these sorts of timeouts to other websites (e.g. Google) as well; that is clearly unacceptable, and so sorting out the internet connection is your only real option.
Or the timeout is caused by the desktop application, the web service, or excessively large amounts of information being passed back and forth, in which case you should either address the performance issue as you would any other bug, or look into ways of optimising the desktop application so that less information is passed back and forth.
In short: the architecture you currently have seems (fundamentally) fine to me, on the basis that (performance problems aside) access for the company to the CRM system should be comparable to access for the public to the website - as long as your customers have reasonable response times, so should the company.
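One quick way to tell those two cases apart is to time the CRM endpoint against a known-fast external site from the office connection; the URLs below are placeholders and this assumes the Python `requests` library.

```python
# Rough latency check from the office network: if the baseline site is also
# slow, blame the connection; if only the CRM endpoint is slow, profile the
# service. Both URLs are placeholders.
import time
import requests

def average_seconds(url, tries=5):
    samples = []
    for _ in range(tries):
        start = time.perf_counter()
        requests.get(url, timeout=30)
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)

print("baseline:", average_seconds("https://example.com"))
print("crm api :", average_seconds("https://crm.example.com/api/customers?page=1"))
```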
Install a copy of the database on the local network. Then let the client software communicate with the local copy and let the database software handle synchronization between the local database server and the database on the web server. It depends on which database you use, but some of them have tools to make that work; in MSSQL it is called replication.
First things first: how much of the code do you really have to throw away? What language did you use for the desktop client? If it's something .NET, you may be able to salvage a good chunk of the system's logic and only need to redo the UI and some of the connections.
My thoughts are that 1 and 2 are out of the question. While 1 might be a good idea, it doesn't solve the real problem, and we as engineers should try to build solutions that don't depend on the client whenever possible. And 2 gets them into something they aren't experts at; it is better to keep the hosting elsewhere.
Also, since you mention a web service, is the UI all you are really losing? You can always reuse the web services for the web server interface.
Lastly, you could look at using a framework to provide a simple web-based CRUD interface to start with, and then expand from there.
Are you sure the connection is saturated? You could be hitting all sorts of network, I/O and database problems... Unless you've already done so, use Wireshark to analyze the traffic; measure the throughput and share the results with us.