I'm working on a project that has to handle a huge amount of data.
For this we are looking for a data store to save and fetch large volumes of data. The data model is simple: there is one object for vouchers and a one-to-many relation to transactions. One voucher has roughly 10 to 100 transactions.
Sometimes the system has to generate several thousand vouchers in a short time, and it may also write or delete several thousand transactions.
It is also very important that the application can tell quickly whether a voucher is valid or not (a simple lookup).
I have looked at several blogs to find the best database for this, and my shortlist is:
MongoDB
Elasticsearch
Cassandra
My favourite is Elasticsearch, but I found several blogs saying that ES is not reliable enough to be used as a primary data store.
I also read some blogs saying that MongoDB has problems running in a cluster.
Do you have experience with Cassandra for a job like this? Or would you prefer another database?
I have some experience with MongoDB, but I'll stay agnostic on this.
There are MANY factors in play when you ask for a fast database. You have to think about indexing, vertical vs. horizontal scaling, relational vs. NoSQL, write performance vs. read performance, and whichever you choose, you should also think about read preferences, balancing, networking... The topics range from the DB all the way down to the hardware.
I'd suggest going with a database that you know, and that you can scale, administer, and tune well.
In my personal experience, I've had no problems running MongoDB in a cluster (sharding); problems usually come from bad administration or planning, and that's exactly why I suggest going with a database you know well.
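For instance, if you went with MongoDB, the voucher validity check from the question can stay a single indexed point query. A minimal sketch with pymongo (database, collection, and field names are made up):

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database name

# A unique index on the voucher code keeps the validity check a fast point lookup.
db.vouchers.create_index([("code", ASCENDING)], unique=True)

def is_valid_voucher(code: str) -> bool:
    # Single indexed query; returns quickly even with millions of vouchers.
    voucher = db.vouchers.find_one({"code": code, "valid": True}, {"_id": 1})
    return voucher is not None
```

With a unique index in place, the lookup stays fast no matter how many vouchers the system generates.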
The selection of the database is the least of your concerns in designing a huge database that needs high performance. Most NoSQL and relational databases can be made to run this type of application effectively. The hardware is critical, the actual design of your database and your indexing are critical, and the types of queries you run need to be performant.
If I were to take on a project that required a very large database with high performance, the first and most critical thing to do would be to hire a database expert who has worked with those types of systems for several years. This is not something an application developer should EVER do. This is not a job for a beginner, or even for someone like me who has worked with only medium-sized databases, albeit for over 20 years. You get what you pay for. In this case, you need to pay for real expertise at the design stage, because database design mistakes are difficult to fix once they hold data. Hire a contractor if you don't want a permanent employee, but hire expertise.
I'm sorry if this is basic, since I'm new to this. I'm currently working on my startup, which is basically a food delivery system, and I used Flutter for my app. I learned Flutter with Firebase as a backend. However, many sources recommend MongoDB as the backend database, which I have zero knowledge about. As far as I can tell, MongoDB only covers the database CRUD operations, so I'm thinking my app should use MongoDB for CRUD. However, MongoDB seems quite complicated when it comes to authentication. So, which is the better approach for me: Firebase for authentication and MongoDB as the database, or a single platform for the whole backend, whether Firebase or MongoDB? If I mix the two, will it affect pricing? Is there anything that can make the choice clearer for me?
MongoDB is an open source NoSQL database management program, which is quite useful for working with large sets of distributed data. It is mostly useful for big data applications and other processing jobs involving data that doesn't fit well in a rigid relational model.
It is an absolutely valid approach to use Firebase Auth just for login/signup and keep everything else in MongoDB. There are two ways you can integrate Firebase Auth:
1. Using the SDK provided by Firebase
2. Using the Admin Auth API
With either approach you save the Firebase UID in your custom backend, which might be MongoDB.
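As a rough sketch of how the two can be glued together: the backend verifies the Firebase ID token with the Admin SDK and then keys the user's data in MongoDB by the Firebase UID (the connection details and collection layout here are assumptions):

```python
import firebase_admin
from firebase_admin import auth, credentials
from pymongo import MongoClient

# Initialize the Firebase Admin SDK with a service-account key file.
firebase_admin.initialize_app(credentials.Certificate("serviceAccount.json"))
users = MongoClient("mongodb://localhost:27017")["app"]["users"]  # hypothetical

def login(id_token: str) -> str:
    # Verify the ID token sent by the Flutter client; raises if invalid.
    decoded = auth.verify_id_token(id_token)
    uid = decoded["uid"]
    # Use the Firebase UID as the key for the user's profile in MongoDB.
    users.update_one({"_id": uid}, {"$setOnInsert": {"created": True}}, upsert=True)
    return uid
```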
Conversely, Firebase can also be used as the entire backend. It provides the back-end server, a solid database and analytics solution, and useful integrations with other Google products. It is free to start, has affordable subscription options, and its well-designed backend services offer scalability and data security, which makes it a strong backend choice.
For the vast majority of apps and use cases, Firebase is an excellent choice. You can start with its free tier and don't need to worry about maintenance or scalability. It's great for small to medium developers, as it lets them keep initial costs low while focusing on providing the best user experience.
When working on a heavy real-time app like chat, or some other highly collaborative experience, Firebase is still an option, though it might be a bit pricey.
That said, always consider your budget, the required feature set, and how much maintenance you're willing to do on your own before making a decision.
You might also refer to this documentation, which walks through the pros and cons of choosing Firebase as a backend.
Flutter: can I mix Firebase Auth with Mongodb Databases?
Check this similar post. If you still have doubts, feel free to ask.
I'm tasked with investigating for our firm a full-stack solution where we'll be using a NoSQL database backend. It'll most likely be fed from a data warehouse and/or operational data store of some type in near-realtime (hopefully :). It will be used mainly by our mobile and web applications via REST.
A few requirements/assumptions:
It will be read-only (in the near term) and consumed by clients in REST format
It has to be scalable
Fast response time
Enterprise support - or, if lacking actual support, something industry-proven if open-source (basically management wants to hold someone accountable if something in the stack fails)
Minimal client data transformations - i.e: data should be stored in as close to ready-to-use format as possible
Service API Management of some sort will most likely be needed (eg: 3scale)
Services will be used internally, but solution shouldn't prevent us from exposing them externally as a longterm goal
Micro-services are preferable (provided sufficient API management is in place)
We have in-house expertise in Java and Grails for our mobile/portal solutions
Some of the options I was tossing around were:
CouchDB: inherently returns REST - no need for a translation layer - as long as clients speak REST, we're all good
MongoDB: need a REST layer in between client and DB - haven't found a widely used one based on my investigation (the ones on Mongo's site all seem in their infancy - i.e: RestHeart)
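To illustrate the CouchDB point: clients talk plain HTTP/JSON to it, so a read is just a GET. A sketch using the Python requests library (host, database, and document names are placeholders):

```python
import requests

# CouchDB exposes every document at a predictable URL: /<database>/<doc_id>.
resp = requests.get("http://localhost:5984/products/widget-42")
resp.raise_for_status()
doc = resp.json()  # the stored JSON document, including _id and _rev
print(doc)
```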
Some questions I have:
Do I need an appserver, or any layer in between the client and DB for performance/caching reasons? I was thinking a reverse proxy like nginx would be a good idea for this?
Why not use CouchDB in this solution if it supports REST out of the box?
I'm struggling to decide which NoSQL DB to use and whether or not I need a REST translation layer, appserver, etc. I've read the pros and cons of each, and mostly they say go with Mongo - but for what I'm trying to do, the lack of a mature REST layer is concerning.
I'm just looking for some ideas, tips, lessons learned that anyone out there would be willing to share.
Thanks!
The problem with exposing the database directly to the client is that most databases do not support permission control as fine-grained as you want it to be. You often cannot allow a client to view and edit its own data while also forbidding it from viewing and editing other users' data, or even worse, the server's own data. At least not while keeping a sane database schema.
You will also often find yourself with a document in which only some fields are supposed to be under the control of the user. I can, for example, edit the content of this answer, but I cannot edit the time it was posted, the name it was posted under, or its voting score. So far I have never seen a database system which can handle permissions for individual fields (if anyone has: feel free to post in the comments).
You might think about handling this on the client and simply not offering any user interface for editing those fields. But that only works in a trusted environment. When you have untrusted users, they could create a clone of your client-side application which does expose this functionality. There is no way for you to tell the difference between the genuine client and a clone, especially when you don't have a smart application server (and even then it is practically impossible).
For that reason it is almost always required to have an application server between clients and database which handles authentication and permission management of the clients and only forwards to the persistence layer those requests which are permitted.
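To make that concrete, here is a minimal sketch of such an application server in Python/Flask; `fetch_doc` and the `owner` field are hypothetical stand-ins for your persistence layer and schema:

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

def fetch_doc(doc_id):
    # Hypothetical persistence-layer call (CouchDB, MongoDB, ...).
    raise NotImplementedError

def current_user_id():
    # Hypothetical: derive the caller's identity from a session or token.
    return request.headers.get("X-User-Id")

@app.route("/docs/<doc_id>")
def get_doc(doc_id):
    doc = fetch_doc(doc_id)
    # The permission rule lives here, on the server, not in the client:
    # only forward documents the caller actually owns.
    if doc is None or doc.get("owner") != current_user_id():
        abort(403)
    return jsonify(doc)
```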
I totally agree with the answer from @Philipp. In the case of CouchDB you will, at minimum, want to put a proxy server in front to enable SSL.
Almost all of your requirements can be fulfilled by CouchDB; especially the upcoming v2 will cover your "datacenter needs".
But it's simply very complex to say what the right tool for your purpose is. If you get additional business requirements on top, say throttling, then you will definitely need application-server middleware like http://mcavage.me/node-restify/
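node-restify is one option; the throttling idea itself is small enough to sketch in a few lines of Python (a naive in-memory token bucket, purely illustrative; a real deployment needs per-client state and shared storage):

```python
import time

class TokenBucket:
    """Naive rate limiter: allow `rate` requests per second, with burst `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should answer 429 Too Many Requests
```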
Maybe it's a good idea to spend some money on professionals like http://www.neighbourhood.ie/couchdb-support/ ? (I'm not involved.)
I am currently in the design phase of an MMO browser game. The game will include tile maps for some real-time locations (so tile data for each cell) and a general world map. The game engine I prefer uses MongoDB for the persistent data world.
I will also implement a shipping simulation (which I explain more below), which is basically a Dijkstra module. I had decided to use a graph database, hoping it would make things easier, and found Neo4j, as it is quite popular.
I was happy with the MongoDB + Neo4j setup, but then noticed OrientDB, which apparently acts like both MongoDB and Neo4j (best of both worlds?); they even have comparison pages for MongoDB and Neo4j.
The point is, I have heard some horror stories about MongoDB losing data (though I'm not sure it still does), and I don't have that luxury. As for Neo4j, I am not a big fan of the 12K€-per-year "startup friendly" cost, although I'll probably not have a DB with millions of vertices. OrientDB seems a viable option, as there may also be an opportunity to use a single database solution.
In that case, a logical move might be jumping to OrientDB, but it has a small community and, to be honest, I didn't find many reviews about it; MongoDB and Neo4j are popular, widely used tools, so I have concerns about whether OrientDB would be an adventure.
My first question is whether you have any experience/opinions regarding these databases.
My second question is which graph database is better for a shipping simulation. The database is expected to calculate the cheapest route from any vertex to any vertex and traverse it (classic Dijkstra), but it also has to change weights depending on situations like "country B has an embargo on country A, so any item originating from country A can't pass through B" or "there is a flood in region XYZ, so no land transport is possible". The database is also expected to cache results. I expect no more than 1,000 vertices, but many edges.
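To make the weighting concrete, this is roughly what I have in mind, in plain Python (the `edge_cost` callback standing in for embargo/flood rules is made up for illustration):

```python
import heapq

def cheapest_route(graph, start, goal, edge_cost):
    """Classic Dijkstra; `graph` maps node -> iterable of (neighbor, edge) pairs.

    `edge_cost(edge)` returns the current cost of an edge, or None if the edge
    is unusable right now (embargo, flood, ...), so weights can change per query.
    """
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, edge in graph.get(node, ()):
            cost = edge_cost(edge)
            if cost is None:
                continue  # edge currently blocked
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor], prev[neighbor] = nd, node
                heapq.heappush(heap, (nd, neighbor))
    if goal not in dist:
        return None, None  # unreachable under the current rules
    # Walk back through `prev` to reconstruct the cheapest path.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]
```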
Thanks in advance, and apologies if the questions are a bit ambiguous.
PS: I added ArangoDB to the title but, to be honest, I haven't had much of a chance to look at it yet.
Late edit as of 18-Apr-2016: After evaluating the responses to my questions and our development strategy, I decided to use ArangoDB, as their roadmap is more promising to me; they apparently aren't trying to pile on tons of half-baked hype features.
Disclaimer: I am the author and owner of OrientDB.
As a developer, in general, I don't like companies that hide costs and let you play with their technology for a while, then start asking for money as soon as you're tied to it. Once you have invested months developing your application against a non-standard language or API, you're stuck: pay, or migrate the application at huge cost.
OrientDB, by contrast, is FREE for any usage, even commercial. Furthermore, OrientDB supports standards like SQL (with extensions), and its main Java API is TinkerPop Blueprints, the "JDBC" of graph databases. OrientDB also supports Gremlin.
The OrientDB project is growing every day, with new contributors and users. The Community Group (a free support channel) is among the most active communities in the graph-database market.
If you are in doubt about which graph database to use, my suggestion is to pick whatever is closest to your needs, but then stick to standards as much as you can. That way, an eventual switch would have low impact.
It sounds as if your use case is exactly what ArangoDB is designed for: you seem to need different data models (documents and graphs) in the same application and might even want to mix them in a single query. This is where a multi-model database such as ArangoDB shines.
If MongoDB has served you well so far, you will immediately feel comfortable with ArangoDB, since it is very similar in look and feel. Additionally, you can model graphs by storing your vertices in one (or multiple) collections and your edges in one or more so-called "edge collections". This means that individual edges are simply documents in their own right and can hold arbitrary JSON data. The database then offers traversals, customizable with JavaScript, to match any needs you might have.
For your query variations, you could, for example, add attributes about these embargoes to your vertices and edges and program the queries/traversals to take them into account.
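As a rough sketch of what a weighted shortest-path query could look like in AQL through the python-arango driver (the graph name, the `cost` attribute, and the connection details are all assumptions, and the syntax shown is that of more recent ArangoDB versions):

```python
from arango import ArangoClient

# Hypothetical connection; adjust database name and credentials.
db = ArangoClient().db("game", username="root", password="pw")

aql = """
FOR v, e IN OUTBOUND SHORTEST_PATH @from TO @to
    GRAPH 'shipping'
    OPTIONS { weightAttribute: 'cost', defaultWeight: 1 }
    RETURN { stop: v._key, leg_cost: e != null ? e.cost : 0 }
"""
cursor = db.aql.execute(aql, bind_vars={"from": "ports/a", "to": "ports/b"})
for step in cursor:
    print(step)  # one entry per vertex along the cheapest path
```

An embargo could then be modeled by raising (or removing) the `cost` attribute on the affected edges before the query runs, as suggested above.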
The ArangoDB database is licensed under the Apache 2 license, and community as well as professional support is readily available.
If you have any more specific questions, do not hesitate to ask in the Google group https://groups.google.com/forum/#!forum/arangodb or contact hackers (at) arangodb.org directly.
Neo4j's pricing is actually quite flexible, so don't be put off by the prices on the website.
You can also get started with the community edition or personal edition for a long time.
The Neo4j community is very active and helpful, and quickly provides support and help for your questions. I think that's the biggest plus, besides the performance and convenience of using a graph model in general.
Regarding your use-case:
Neo4j is used for exactly this route-calculation scenario by one of the largest logistics companies in the world, where it routes up to 4,000 packages per second across the country.
And it is used in other game engines, for example at GameSys for game-economy simulation, and in another for routing (not in earth coordinates but in game-world coordinates, using Neo4j-Spatial).
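As a sketch of what the routing could look like from Python with Cypher's built-in `shortestPath` (hop-count based; a weighted Dijkstra needs an additional library, and the labels and properties here are made up):

```python
from neo4j import GraphDatabase

# Hypothetical connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pw"))

query = """
MATCH (a:Port {code: $src}), (b:Port {code: $dst}),
      p = shortestPath((a)-[:ROUTE*]-(b))
RETURN [n IN nodes(p) | n.code] AS stops
"""

with driver.session() as session:
    record = session.run(query, src="AMS", dst="SIN").single()
    print(record["stops"])  # e.g. the list of port codes along the path
```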
I'm curious why you have so few nodes. Are they something like transport portals? I also wonder where you store the details and dynamics of the routes (like the criteria you mentioned): do they come from outside, from the in-memory state of the game engine?
You should probably share some more details about your model and the concrete use case.
And it might help to know that both Emil, one of the founders of Neo4j, and I are old-time players of multi-user dungeons (MUDs), so this is definitely a use case close to our hearts :)
We have an internal software development team at my company that is building a set of software frameworks and some infrastructure (database hosting) to be used by other teams. Since they don't deliver software directly to any external client, they are trying to figure out how to charge back to recoup their costs/budget, i.e. to determine the correct payment model for the other teams using their services. Essentially, we would be paying for the following resources:
Using the software frameworks
Using the infrastructure (servers - disk, memory, etc)
The support people in this group to maintain the infrastructure
The developers that are building the frameworks.
This seems like a generic problem that must exist in other software shops, so I wanted to see if there is a standard practice or any suggestions here.
The fact that you are considering this kind of relationship within one shop is probably very symptomatic of our times. If you think the framework team is useful, then measure its usefulness in a somewhat fine-grained way, and just give the team the budget to keep being useful!
For example, have the other teams assess their use of specific features in the framework, along with their wish lists, and check that the framework team only works on parts of the code relevant to those.
Paying for services can have problematic drawbacks, as with any such arbitrary metric. For example, it gives other teams an incentive not to use the framework team's services: it encourages homegrown infrastructure and custom code (so you get a more potent NIH syndrome).
What about estimating the cost of each task?
Every company I have worked at used a classical man-day system.
When a task has to be done, its human-resource cost is assessed in man-days, that is: how many days would it take if only one worker were on this task?
With that in mind, given an estimate of the time a task will take (often made by the more experienced people), and since you (should) know how much a worker costs, it's easy to work out how much the task costs (estimated time x cost of worker). That way, very basic and recurring tasks, like setting up a new database or a new server, or creating new accounts, can easily be given a standard man-day cost.
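As a trivial example of that arithmetic (all numbers made up):

```python
DAILY_RATE = 500  # hypothetical fully loaded cost of one worker per day, in euros

# Estimated effort per recurring task, in man-days.
tasks = {"set up new database": 0.5, "provision server": 1.0, "create accounts": 0.25}

for task, days in tasks.items():
    # estimated time x cost of worker
    print(f"{task}: {days} man-days -> {days * DAILY_RATE:.0f} EUR")
```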
And to optimize the budget, the company can look at the tasks that are the most recurrent and expensive, and automate them to reduce the team's overall costs. Of course, automating a task is a project in itself and needs to be estimated, and depending on the productivity gain of the project, some prioritization has to be done.
I think the main prerequisite for this kind of cost assessment is having a project/task and time-tracking system; that way it's easy to produce statistics on what takes time.
I need to store a huge number of binary files (10-20 TB, each file ranging from 512 KB to 100 MB).
I need to know whether Redis would be efficient for such a system.
I need following properties in my system:
High Availability
Failover
Sharding
I intend to use a cluster of commodity hardware to keep costs as low as possible. Please suggest the pros and cons of building such a system using Redis. I am also concerned about Redis's high RAM requirements.
I would not use Redis for such a task. Other products will be a better fit IMO.
Redis is an in-memory data store. If you want to store 10-20 TB of data, you will need 10-20 TB of RAM, which is expensive. Furthermore, the memory allocator is optimized for small objects, not big ones; you would probably have to cut your files into many small pieces, which would not be very convenient.
Redis does not provide a ready-made solution for HA and failover. Master/slave replication is provided (and works quite well), but there is no support for automating the failover: clients have to be smart enough to switch to the correct server, and something on the server side (unspecified) has to switch the roles between master and slave nodes in a reliable way. In other words, Redis only offers a do-it-yourself HA/failover solution.
Sharding has to be implemented on the client side (as with memcached). Some clients support it, but not all of them; the fastest client (hiredis) does not. In any case, things like rebalancing have to be implemented on top of Redis. Redis Cluster, which is supposed to provide such sharding capabilities, is not ready yet.
I would suggest using some other solution. MongoDB with GridFS is one possibility. Hadoop with HDFS is another. If you like cutting-edge projects, you may want to give the Elliptics Network a try.
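For the GridFS option, storing and fetching a large binary is only a few lines with pymongo (a minimal sketch; names are placeholders). GridFS splits each file into small chunks for you, and replication/sharding then come from MongoDB itself:

```python
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["files"]  # hypothetical database
fs = gridfs.GridFS(db)

# Store a large binary; GridFS chunks it into small documents internally.
with open("video.bin", "rb") as f:
    file_id = fs.put(f, filename="video.bin")

# Stream it back; GridOut is file-like, so it can be read incrementally.
stored = fs.get(file_id)
data = stored.read()
```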