Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
I am using PostgreSQL to store and process data for a research project. I can program in SQL, R, and Python but am not a software developer or system administrator. I find myself constantly aggregating data and then wanting to see the individual records contributing to a single cell in the aggregation. The records contain text fields, and I use CASE and LIKE statements to determine how these will be counted. I'm looking for a GUI that will allow me to quickly move between different levels and kinds of aggregation so I don't lose access to details when looking at the big picture. I believe the answer to my question involves OLAP and/or faceted search but would like recommendations for specific products, open source and turnkey if possible.
thank you,
-david
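As a concrete (hypothetical) illustration of the aggregate-then-drill-down workflow described in the question, here is a minimal Python/psycopg2 sketch; the table and column names (survey_responses, region, comment) are invented and would need to be replaced with your own schema.

    import psycopg2

    # Aggregate view: bucket free-text records with CASE/LIKE and count them.
    AGGREGATE_SQL = """
        SELECT region,
               CASE
                   WHEN comment ILIKE '%refund%'   THEN 'refund request'
                   WHEN comment ILIKE '%delivery%' THEN 'delivery issue'
                   ELSE 'other'
               END AS category,
               COUNT(*) AS n
        FROM survey_responses
        GROUP BY region, category
        ORDER BY region, category;
    """

    # Drill-down: the individual records behind one cell of the aggregate.
    # (Literal % is doubled because psycopg2 uses %s placeholders here.)
    DETAIL_SQL = """
        SELECT id, region, comment
        FROM survey_responses
        WHERE region = %s
          AND CASE
                  WHEN comment ILIKE '%%refund%%'   THEN 'refund request'
                  WHEN comment ILIKE '%%delivery%%' THEN 'delivery issue'
                  ELSE 'other'
              END = %s;
    """

    conn = psycopg2.connect("dbname=research")   # adjust connection string
    with conn, conn.cursor() as cur:
        cur.execute(AGGREGATE_SQL)
        for region, category, n in cur.fetchall():            # the big picture
            print(region, category, n)

        cur.execute(DETAIL_SQL, ("north", "refund request"))  # one cell's detail rows
        for row in cur.fetchall():
            print(row)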
icCube is not open source, but it allows going from the big picture to the details (either via drill-down or drill-through). Depending on your PostgreSQL model, the work to set up the cube model might be minimal. Note that once the model has been set up, you have the full power of MDX analysis for more challenging requests.
Maybe Power Pivot from Microsoft is a tool that would be right for you. For Excel 2010, it is a plugin you can download free of charge from Microsoft; for Excel 2013 and Excel as part of Office 365 (the cloud-based MS Office), it is already included. Older versions of Excel are not supported. The tool is an OLAP solution aimed at business users who don't need support from IT staff. Data is saved in the Excel workbook in an internal, compressed format optimized for fast analysis (millions of rows are not a problem), and you use a formula language very much like the one used within standard Excel to define calculations, while you analyze the data script-free with point-and-click pivot tables.
Basically, you don't want to lose any of your detailed data, to allow for the drill-down OLAP operation.
In a data warehouse, the grain of, say, customer orders would be the order line item, i.e. the most detailed level.
What you should do is figure out which aggregates to pre-calculate, and use a tool to automate that for you. The aggregated data would go in its own tables.
A smart OLAP cube will recognize when an aggregate should be used and rewrite your query to read the aggregated data instead.
Check out Pentaho Aggregation Designer, as well as Mondrian OLAP server/Saiku pivot tables. All FOSS.
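To make the pre-calculated-aggregates idea above a bit more concrete, here is a rough sketch using a PostgreSQL materialized view refreshed from Python; the view and table names are again invented, and a cron job (or any scheduler) would call the refresh periodically.

    import psycopg2

    SUMMARY_DDL = """
        CREATE MATERIALIZED VIEW IF NOT EXISTS response_summary AS
        SELECT region,
               date_trunc('month', created_at) AS month,
               COUNT(*) AS n
        FROM survey_responses
        GROUP BY region, month;
    """

    def refresh_summary(conn):
        # Dashboards and pivot tables read response_summary; drill-down queries
        # still go to the detail table survey_responses, so nothing is lost.
        with conn, conn.cursor() as cur:
            cur.execute(SUMMARY_DDL)
            cur.execute("REFRESH MATERIALIZED VIEW response_summary;")

    if __name__ == "__main__":
        refresh_summary(psycopg2.connect("dbname=research"))  # adjust as needed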
Closed. This question needs to be more focused. It is not currently accepting answers.
Looking for some advice... I just finished an ETL pipeline where all data ends up in Amazon Athena. The data is produced via the clickstream of high-volume mobile apps (so essentially it's lots and lots of raw events). I want to build a number of dashboards for the business that show different metrics/KPIs depending on the requirements. However, since we're talking about huge volumes of data, I'm not sure of the best way to do this. Here's an example:
I want a dashboard that shows all the MAUs (monthly active users), along with certain pages that perform particularly well and the most popular navigation routes through the app. My thinking is that I'd want a custom query per graph, i.e. one query that counts the distinct IDs each day (and then refreshes every 24 hours)... another query for a graph that produces a breakdown of counts per page and truncates... etc.
The main reason for thinking this is that otherwise I'd be pulling in huge amounts of raw data just to calculate a simple metric like MAUs (I'm not even sure an extract would work, and it certainly wouldn't be efficient).
Is this completely the wrong approach? Any suggestions/feedback?
Thanks in advance!
It sounds like you have multiple unrelated SQL queries that you want to run once per day, and update in Tableau once per day.
There's always a push and pull between processing at the source and processing in the visualization engine.
Set up a Tableau server extract for each Athena SQL query. Build your dashboards, and schedule your extracts to refresh daily. Like an OLAP cube, this will process all the aggregates your dashboards need with the refresh, for better dashboard performance.
Alternatively, if you feel you don't need all the detail in Tableau, then build your aggregates in SQL, so that your Tableau data sources are smaller.
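To sketch what the "one scheduled query per metric" approach might look like in practice, the snippet below submits a daily MAU rollup to Athena with boto3 so that the dashboard only ever reads the small result set, not the raw events. The database, table, and column names and the S3 output location are all placeholders.

    import boto3

    MAU_SQL = """
        SELECT date_trunc('month', event_date) AS month,
               COUNT(DISTINCT user_id)         AS mau
        FROM clickstream.events
        GROUP BY 1
        ORDER BY 1;
    """

    def run_daily_rollup():
        athena = boto3.client("athena", region_name="eu-west-1")
        athena.start_query_execution(
            QueryString=MAU_SQL,
            QueryExecutionContext={"Database": "clickstream"},
            ResultConfiguration={"OutputLocation": "s3://my-athena-results/mau/"},
        )
        # Schedule this every 24h; point the Tableau extract (or a CTAS into a
        # summary table) at the result instead of the raw event table.

    if __name__ == "__main__":
        run_daily_rollup()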
Closed. This question is opinion-based. It is not currently accepting answers.
I'm working on a project that has to handle a huge amount of data.
For this we are looking for a data store to save and fetch large volumes of data. The data model is easy: there is one object for vouchers and a one-to-many relation to transactions. One voucher has roughly 10-100 transactions.
Sometimes the system has to generate several thousand vouchers in a short time, and it may also write or delete several thousand transactions.
It is also very important that the application can tell quickly whether a voucher is valid or not (a simple search request).
I have looked at several blogs to find the best database for this, and my shortlist is:
MongoDB
Elastic Search
Cassandra
My favourite is Elastic Search, but I found several blogs which say ES is not reliable enough to use as a primary data store.
I also read some blogs saying that MongoDB has problems running in a cluster.
Do you have experience with Cassandra for a job like this? Or do you prefer any other database?
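Whichever store ends up being chosen, the hot path described in the question ("is this voucher valid?") boils down to a single indexed lookup. The sketch below uses Python's built-in sqlite3 purely as a neutral stand-in, with invented table names, to show the shape of the schema and the query.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE voucher (
            id    INTEGER PRIMARY KEY,
            code  TEXT NOT NULL,
            valid INTEGER NOT NULL DEFAULT 1
        );
        CREATE TABLE voucher_transaction (
            id         INTEGER PRIMARY KEY,
            voucher_id INTEGER NOT NULL REFERENCES voucher(id),
            amount     REAL NOT NULL
        );
        -- The index that makes the validity check a fast point lookup.
        CREATE UNIQUE INDEX idx_voucher_code ON voucher(code);
        CREATE INDEX idx_tx_voucher ON voucher_transaction(voucher_id);
    """)

    def is_valid(code):
        row = conn.execute(
            "SELECT valid FROM voucher WHERE code = ?", (code,)
        ).fetchone()
        return bool(row and row[0])

    conn.execute("INSERT INTO voucher (code, valid) VALUES ('ABC-123', 1)")
    print(is_valid("ABC-123"))   # True
    print(is_valid("NOPE"))      # False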
I have some experience with MongoDB, but I'll stay agnostic on this.
There are MANY factors in play when you say you want a fast database. You have to think about indexing, vertical or horizontal scaling, relational or NoSQL, write performance vs. read performance, and whichever you choose you should think about read preferences, balancing, networking... The topics range from the DB down to the hardware.
I'd suggest going for a database you know and that you can scale, administer, and tune well.
In my personal experience, I've had no problems running MongoDB in a cluster (sharding); problems may come from bad administration or planning, and that's why I suggest going for a database you know well.
The selection of the database is the least concern in designing a huge database that needs high performance. Most NoSQL and relational databases can be made to run this type of application effectively. The hardware is critical, the actual design of your database and your indexing is critical, and the queries you run need to be performant.
If I were to take on a project that required a very large database with high performance, the first and most critical thing to do would be to hire a database expert who has worked with those types of systems for several years. This is not something an application developer should EVER do. This is not a job for a beginner, or even someone like me who has worked with only medium-sized databases, albeit for over 20 years. You get what you pay for. In this case, you need to pay for real expertise at the design stage, because database design mistakes are difficult to fix once they have data. Hire a contractor if you don't want a permanent employee, but hire expertise.
Closed. This question is opinion-based. It is not currently accepting answers.
I am currently in the design phase of an MMO browser game. The game will include tile maps for some real-time locations (so tile data for each cell) and a general world map. The game engine I prefer uses MongoDB for its persistent data world.
I will also implement a shipping simulation (which I explain more below) that is basically a Dijkstra module. I decided to use a graph database, hoping it would make things easier, and found Neo4j, as it is quite popular.
I was happy with the MongoDB + Neo4j setup, but then I noticed OrientDB, which apparently acts like both MongoDB and Neo4j (best of both worlds?); they even have "vs." pages for MongoDB and Neo4j.
The point is, I have heard some horror stories about MongoDB losing data (though I'm not sure it still does), and I can't afford that. As for Neo4j, I am not a big fan of the 12K€-per-year "startup friendly" cost, although I will probably not have a DB with millions of vertices. OrientDB seems a viable option, as there may also be an opportunity to use a single database solution.
In that case, a logical move might be jumping to OrientDB, but it has a small community and, to be honest, I didn't find many reviews about it. MongoDB and Neo4j are popular, widely used tools, so I have concerns about whether OrientDB would be an adventure.
My first question would be if you have any experience/opinion regarding these databases.
And my second question would be which graph database is better for a shipping simulation. The database is expected to calculate the cheapest route from any vertex to any vertex and traverse it (classic Dijkstra), but it also has to change weights depending on situations like "country B has an embargo on country A, so any item originating from country A can't pass through B" or "there is a flood in region XYZ, so no land transport is possible". The database is also expected to cache results. I expect no more than 1000 vertices but many edges.
Thanks in advance, and apologies if the questions are a bit ambiguous.
PS: I added ArangoDB to the title but, to be honest, haven't had much chance to take a look at it.
Late edit as of 18-Apr-2016: After evaluating the responses to my questions and my development strategy, I decided to use ArangoDB, as their roadmap is more promising for me and they are apparently not trying to add tons of half-baked hype features.
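Independent of the database decision, the routing requirement itself can be sketched in a few lines of plain Python: Dijkstra over an adjacency list where each edge carries attributes, and the effective weight is computed against the current world state (embargoes, floods). All names and rules below are illustrative only.

    import heapq

    # Adjacency list: node -> list of (neighbour, base_weight, edge_attributes).
    graph = {
        "A": [("B", 4, {"mode": "land", "country": "country_b"}),
              ("C", 2, {"mode": "sea",  "country": "country_c"})],
        "B": [("D", 5, {"mode": "land", "country": "country_d"})],
        "C": [("D", 8, {"mode": "sea",  "country": "country_d"})],
        "D": [],
    }

    world_state = {
        "embargoes": {("country_b", "country_a")},  # (transit country, cargo origin)
        "flooded": {"country_d"},                   # no land transport here
    }

    def effective_weight(base, attrs, cargo_origin, state):
        """Base weight adjusted by current rules; None means the edge is blocked."""
        if (attrs["country"], cargo_origin) in state["embargoes"]:
            return None
        if attrs["mode"] == "land" and attrs["country"] in state["flooded"]:
            return None
        return base

    def cheapest_route(start, goal, cargo_origin, state):
        dist = {start: 0}
        prev = {}
        queue = [(0, start)]
        while queue:
            d, node = heapq.heappop(queue)
            if node == goal:
                path = [node]
                while node in prev:
                    node = prev[node]
                    path.append(node)
                return d, path[::-1]
            if d > dist.get(node, float("inf")):
                continue
            for neighbour, base, attrs in graph[node]:
                w = effective_weight(base, attrs, cargo_origin, state)
                if w is None:
                    continue
                nd = d + w
                if nd < dist.get(neighbour, float("inf")):
                    dist[neighbour] = nd
                    prev[neighbour] = node
                    heapq.heappush(queue, (nd, neighbour))
        return None  # unreachable under the current rules

    print(cheapest_route("A", "D", "country_a", world_state))  # -> (10, ['A', 'C', 'D'])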
Disclaimer: I am the author and owner of OrientDB.
As a developer, in general, I don't like companies that hide costs and let you play with their technology for a while, and then, as soon as you're tied to it, start asking for money. Once you have invested months developing an application that uses a non-standard language or API, you're stuck: pay, or migrate the application at huge cost.
OrientDB, you know, is FREE for any usage, even commercial. Furthermore, OrientDB supports standards like SQL (with extensions), and the main Java API is TinkerPop Blueprints, the "JDBC" standard for graph databases. OrientDB also supports Gremlin.
The OrientDB project is growing every day, with new contributors and users. The Community Group (a free channel to ask for support) is the most active community in the graph database market.
If you have doubts about which graph database to use, my suggestion is to pick the one that is closest to your needs, but then use standards as much as you can. That way, an eventual switch would have a low impact.
It sounds as if your use case is exactly what ArangoDB is designed for: you seem to need different data models (documents and graphs) in the same application and might even want to mix them in a single query. This is where a multi-model database like ArangoDB shines.
If MongoDB has served you well so far, then you will immediately feel comfortable with ArangoDB, since it is very similar in look and feel. Additionally, you can model graphs by storing your vertices in one (or multiple) collections, and your edges in one or more so-called "edge-collections". This means that individual edges are simply documents in their own right and can hold arbitrary JSON data. The database then offers traversals, customizable with JavaScript to match any needs you might have.
For your variations of the queries, you could, for example, add attributes about these embargoes to your vertices and program the queries/traversals to take them into account.
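As a tiny, driver-free illustration of that modelling idea (edge documents carrying arbitrary JSON attributes such as embargo information), using plain Python dicts: only the _from/_to convention is ArangoDB's, and the other field names are invented.

    # An "edge collection": each edge is itself a document and can hold
    # arbitrary attributes, here a list of embargoed cargo origins.
    ports = [
        {"_key": "port_a", "country": "country_a"},
        {"_key": "port_b", "country": "country_b"},
    ]

    routes = [
        {"_from": "ports/port_a", "_to": "ports/port_b",
         "mode": "sea", "cost": 4, "embargoed_origins": ["country_a"]},
    ]

    def passable(edge, cargo_origin):
        """A traversal/query filter would express the same condition server-side."""
        return cargo_origin not in edge.get("embargoed_origins", [])

    print([e for e in routes if passable(e, "country_c")])  # edge is usable
    print([e for e in routes if passable(e, "country_a")])  # edge is filtered out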
The ArangoDB database is licensed under the Apache 2 license, and community as well as professional support is readily available.
If you have any more specific questions, do not hesitate to ask in the Google group https://groups.google.com/forum/#!forum/arangodb or contact hackers (at) arangodb.org directly.
Neo4j's pricing is actually quite flexible, so don't be put off by the prices on the website.
You can also get started with the community edition or personal edition for a long time.
The Neo4j community is very active and helpful and quickly provides support and help for your questions. I think that's the biggest plus, besides the performance and convenience of using a graph model in general.
Regarding your use-case:
Neo4j is used for exactly this route-calculation scenario by one of the largest logistics companies in the world, where it routes up to 4000 packages per second across the country.
And it is used in other game engines, for example at GameSys for game economy simulation, and in another engine for routing (not in earth coordinates but in game-world coordinates, using Neo4j Spatial).
I'm curious why you have only that few nodes. Are those something like transport portals? I also wonder where you store the details and dynamics of the routes (like the criteria you mentioned): do they come from outside, i.e. the in-memory state of the game engine?
You should probably share some more details about your model and the concrete use-case.
And it might help to know that both Emil, one of the founders of Neo4j and I are old time players of multi user dungeons (MUDs), so it is definitely a use-case close to our heart :)
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
My boss asked me to develop an order system for our company's salesmen. Our company has almost 100,000 items for sale. In order to improve performance, we will ask the salesmen to download all the data from SQL Server to the iPhone's local SQLite database once per week, and build an index.
I'm a Windows Mobile developer, and it's very easy to use RDA to download data from SQL Server to a local SQL CE database. The size on the Windows Mobile device is about 20 MB. Now I need to do the same thing on the iPhone.
I'm a newbie at iPhone development. Please give me some ideas about this project. Any input will be appreciated.
Here is some information on using SQLite in iOS: iOS offline data storage tutorial
You'll probably need to export the DB as SQL and download it from the server, then import the SQL into SQLite.
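A rough sketch of that export/import step, shown here with Python's standard-library sqlite3 (on the device you would do the equivalent with the SQLite C API or Core Data); the URL, file names, and products schema are placeholders.

    import sqlite3
    import urllib.request

    DUMP_URL = "https://example.com/exports/products.sql"   # placeholder

    def rebuild_local_db(db_path="products.db"):
        sql_dump = urllib.request.urlopen(DUMP_URL).read().decode("utf-8")
        conn = sqlite3.connect(db_path)
        conn.executescript("DROP TABLE IF EXISTS products;")
        conn.executescript(sql_dump)   # CREATE TABLE + INSERTs exported by the server
        # Build the index the salesmen will actually search on.
        conn.execute("CREATE INDEX IF NOT EXISTS idx_products_name ON products(name)")
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        rebuild_local_db()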
As another answerer suggested, you could expose a REST interface on the server--assuming your server is set up to export the contents of the entire product database. Then there are any number of third-party tools for importing JSON data (e.g. via REST) into Core Data. Or, if your REST data isn't too complicated, it's not hard to parse it and add it to Core Data directly.
I personally recommend Core Data rather than using SQLite directly--iOS makes it very easy to do so. But it's also a matter of personal choice, and I know lots of people prefer to use SQLite directly (especially if they want to build some cross-platform code, e.g. to make an Android version which shares the same DB schema and logic).
There are probably many ways to do this, but I would go with building a REST API for the server data. Then, on the iPhone side of things, make a network call to access the data.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
If a business needs to connect to various databases and generate (mostly PDF) reports, what are some good tools, commercial or open source? Non-technical users should also be able to generate various reports with good-looking charts and tabular data through a report designer tool. We should also be able to deploy these charts on the web and generate HTML or PDF.
We looked at various tools like Adobe LiveCycle and haven't looked at Crystal Reports.
I am more the technical person and not really the business guy, and I wouldn't mind something more techie like Eclipse's BIRT (business reporting tool). Everything looks good with BIRT and it does exactly what we might need, but the charts don't look that impressive.
And with Crystal Reports, once you bring in those vendors, they sell a bunch of stuff that you normally don't need and it is impossible to get stuff done. But I could be wrong.
Commercial and for large applications:
BI Publisher
Telerik Reporting looks great. The main advantage is that you can create your reports and store them in DLL assemblies that can be used on the web and in the desktop viewer, with all the export formats you need.
The disadvantage is that the report designer still lives inside Visual Studio. .NET only.
I used the abcPDF product by webSuperGoo. It was fine and simple for quick development.
I used this about 5 years ago so it should be a bit more up to date by now.
LiveCycle Forms is a good choice. The whole thing is about constructing PDF documents, starting with a template which you design and injecting data into the template in the form of XML. The final result is a flat or interactive PDF. LiveCycle Forms was designed to produce interactive PDFs, but it is pretty simple to flatten the documents when rendering is completed.
You may have to write a decent amount of custom code to construct the XML documents, but the PDF construction capabilities are pretty impressive. If you have a complicated workflow, the Livecycle package also has a workflow designer that you can use.
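For the custom code that builds those XML documents, a minimal standard-library sketch might look like the following; the element and field names are invented and would have to match the fields defined in your form template.

    import xml.etree.ElementTree as ET

    def build_report_xml(rows):
        # Turn database rows into the XML document the rendering service expects.
        root = ET.Element("report")
        for row in rows:
            item = ET.SubElement(root, "item")
            ET.SubElement(item, "name").text = row["name"]
            ET.SubElement(item, "quantity").text = str(row["quantity"])
            ET.SubElement(item, "total").text = f'{row["total"]:.2f}'
        return ET.tostring(root, encoding="unicode")

    rows = [{"name": "Widget", "quantity": 3, "total": 29.85}]
    print(build_report_xml(rows))
    # The resulting XML is then handed to the rendering step, which merges it
    # with the designed template and returns a flat or interactive PDF.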
See this question for Java solutions...
Here's my honest (yet biased) answer copied from there:
i-net Crystal-Clear
Simple and easy-to-use API of both the report engine and the Java report viewer.
Can export into any major format like PDF, HTML, SVG, XLS, etc., as well as into a Java applet viewer. (See samples)
Comes with a free and powerful graphical report template designer. (See video guide)
Installs as a WAR file on your application server or can be used as a library within your own application.
Great technical support (you usually get an answer in minutes or hours rather than days or weeks)
Charts based on JFreeChart (so includes Stock charts).
Can read Crystal Reports templates. (for a lot of customers, this is the killer feature since you don't have to re-create all your old Crystal Reports templates)
Great and competitive pricing - effectively costing "less than open source" if you count in support costs - which you definitely should.
Free 90-day trial.
[full disclosure: Yes, I work for the company that produces this. But it's still my honest (though subjective and biased) answer to the question. ;)]