What are advantages of using Sphinx as a search engine for data in a PostgreSQL database? - postgresql

I've liked the idea of using PostgreSQL's built-in text search features to keep my database queries and search queries all in one place.
But are there advantages to using a dedicated search engine and indexer -- and Sphinx in particular -- that I might be missing if I rely solely on Postgres's native search facilities? What are they?

For the moment being Sphinx is faster, as it is a specialized system.
On the other hand, PostgreSQL is also evolving. Take a look at this message from Oleg Bartunov, announcing coming performance improvements to the FTS. And this is the presentation that was given by him on the latest PostgreSQL conference in Prague.
Just look through and decide whether your projects fits into the planned delivery terms, according to the info I can find these changes will be available in 9.3, which is planned “sometime” around comming summer.

Related

NoSQL Database for ECommerce

I will be constructing an ecommerce site, and would like to use a no-sql database, which will fit well with the plans for the app. But when it comes to which database would fit the job, im not sure. After comparing various DB's, the ones that seem best might be either mongo, couch, or even orientdb. I have seen arguments for all of them to be used or not used compared to something like MySQL. But between themselves (nosql databases), which one would fit well with an ecommerce solution?
Note, for the use case, i wont be having thousands of transactions a second. Or similarly high write rates. they will be moderate sure, but at a level that any established database could handle.
CouchDB: Has master to master replication, which I could really use. If not, I will still have to implement the same functionality in code anyways. I need to be able to have a users database, sync with the mothership. (users will have their own, potentially localhost database, that could sync with the main domains server). Couch is also fast, once your queries have been stored in the db.As i will probably have a higher need for read performance. Though not by a lot.
MongoDB: queries are very easy and user friendly. Also, with the fact that end users may need to query for certain things at a given time that I may not be able to account for ahead of time, this seems like it may be a better fit. I dont have to pre-store my queries in the db. Does support atomic transactions, though only when writing to a single document at a time.
OrientDB: A graph database. much different that most people are used to, but with the needs, it could fit very well too. Orient has the benefits of being schemaless, as well as having support for ACID transactions. There is a lot of customer, and product relationships that a graph database could be great with. Orient also support master to master replication, similar to couchdb.
Dont get me wrong, I can see how to build this traditionally with something like MySQL, but the ease and simplicity of a nosql solution, is very attractive. Although, in my case, needing a schemaless solution, would be much easier in nosql rather than mysql. a given product may have more or less items, than another. and avoiding recreating a table whenever a new field is added, is preferrable.
So between these 3 (or even others you think may be better), what features in each could potentially work for, or against me in regards to an ecommerce based site, when dealing with customer transactions?
Edit: The reason I am not using an existing solution, is because with the integrated features I need, there are no solutions available out there. We are also aiming to use this as a full product for our company. There will be a handful of other integrations than just sales. It is also going to be working with a store's POS system.
Since e-commerce can encompass everything from shopping carts through to membership and recurring subscriptions, it is hard to guess exactly what requirements and complexity you are envisioning.
When constructing an e-commerce site, one of the early considerations should be investigating whether there is already an established e-commerce product or toolkit that could meet your requirements. There are many subtleties to processes like ordering, invoicing, payments, products, and customer relationships even when your use case appears to be straightforward. It may also be possible to separate your application into the catalog management aspects (possibly more custom) versus the billing (potentially third party, perhaps even via a hosted billing/payment API).
Another consideration should be who you are developing the e-commerce site for: is this to scratch your own itch, or for a client? Time, budget, and features for a custom build can be difficult to estimate and schedule .. and a niche choice of technology may make it difficult to find/hire additional development expertise.
A third consideration is what your language(s) of choice are for developing your application. Some languages will have more complete/mature/documented drivers and/or framework abstractions for the different databases.
That said, writing an e-commerce system appears to be a rite of passage for many developers ;-).
Edit: a lot has changed since this answer was originally posted in 2012 and you should definitely refer to current product information. For example, MongoDB has had support for Decimal128 values since MongoDB 3.4 (2016) and multi-document transactions since MongoDB 3.6 (2017).
Check the comparison of different available NoSql databases here. Suit your requirement as per that.
MongoDB 4 now multi-document ACID transactions! That makes it suitable for e-Commerce!
Check out: https://www.mongodb.com/transactions

What are the pros and cons of DynamoDB with respect to other NoSQL databases?

We use MongoDB database add-on on Heroku for our SaaS product. Now that Amazon launched DynamoDB, a cloud database service, I was wondering how that changes the NoSQL offerings landscape?
Specifically for cloud based services or SaaS vendors, how will using DynamoDB be better or worse as compared to say MongoDB? Are there any cost, performance, scalability, reliability, drivers, community etc. benefits of using one versus the other?
For starters, it will be fully managed by Amazon's expert team, so you can bet that it will scale very well with virtually no input from the end user (developer).
Also, since its built and managed by Amazon, you can assume that they have designed it to work very well with their infrastructure so you can can assume that performance will be top notch. In addition to being specifically built for their infrastructure, they have chosen to use SSD's as storage so right from the start, disk throughput will be significantly higher than other data stores on AWS that are HDD backed.
I havent seen any drivers yet and I think its too early to tell how the community will react to this, but I suspect that Amazon will have drivers for all of the most popular languages and the community will likely receive this well - and in turn create additional drivers and tools.
Using MongoDB through an add-on for Heroku effectively turns MongoDB into a SaaS product as well.
In reality one would be comparing whatever service a chosen provider has compared to what Amazon can offer instead of comparing one persistance solution to another.
This is very hard to do. Each provider will have varying levels of service at different price points and one could consider the option of running it on their own hardware locally for development purposes a welcome option.
I think the key difference to consider is MongoDB is a software that you can install anywhere (including at AWS or at other cloud service or in-house) where as DynamoDB is a SaaS available exclusively as hosted service from Amazon (AWS). If you want to retain the option of hosting your application in-house, DynamoDB is not an option. If hosting outside of AWS is not a consideration, then, DynamoDB should be your default choice unless very specific features are of higher consideration.
There's a table in the following link that summarizes the attributes of DynamoDB and Cassandra:
http://www.datastax.com/dev/blog/amazon-dynamodb
Something that needs improvement on DynamoDB in order to become more usable is the possibility to index columns other than the primary key.
UPDATE 1 (06/04/2013)
On 04/18/2013, Amazon announced support for Local Secondary Indexes, which made DynamoDB f***ing great:
http://aws.amazon.com/about-aws/whats-new/2013/04/18/amazon-dynamodb-announces-local-secondary-indexes/
I have to be honest; I was very excited when I heard about the new DynamoDB and did attend the webinar yesterday. However it's so difficult to make a decision right now as everything they said was still very vague; I have no idea the functions that are going to be allowed / used through their service.
The one thing I do know is that scaling is automatically handled; which is pretty awesome, yet there are still so many unknowns that it's tough to really make a great analysis until all the facts are in and we can start using it.
Thus far I still see mongo as working much better for me (personally) in the project undertaking that I've been working on.
Like most DB decisions, it's really going to come down to a project by project decision of what's best for your need.
I anxiously await more information on the product, as for now though it is in beta and I wouldn't jump ship to adopt the latest and greatest only to be a tester :)
I think one of the key differences between DynamoDB and other NoSQL offerings is the provisioned throughput - you pay for a specific throughput level on a table and provided you keep your data well-partitioned you can always expect that throughput to be met. So as your application load grows you can scale up and keep you performance more-or-less constant.
Amazon DynamoDB seems like a pretty decent NoSQL solution. It is fast, and it is pretty easy to use. Other than having an AWS account, there really isn't any setup or maintenance required. The feature set and API is fairly small right now compared to MongoDB/CouchDB/Cassandra, but I would probably expect that to grow over time as feedback from the developer community is received. Right now, all of the official AWS SDKs include a DynamoDB client.
Pros
Lightning Fast (uses SSDs internally)
Really (really) reliable. (chances of write failures are lower)
Seamless scaling (no need to do manual sharding)
Works as webservices (no server, no configuration, no installation)
Easily integrated with other AWS features (can store the whole table into S3 or use EMR etc)
Replication is managed internally, so chances of accidental loss of data is negligible.
Cons
Very (very) limited querying.
Scanning is painful (I remember once a scanning through Java ran for 6 hours)
pre-defined throughput, which means sudden increase beyond the set throughput will be throttled.
throughput is partitioned as table is sharded internally. (which means if you had a throughput for 1000 and its partitioned in two and if you are reading only the latest data(from one part) then your throughput of reading is 500 only)
No joins, Limited indexing allowed (basically 2).
No views, triggers, scripts or stored procedure.
It's really good as an alternative to session storage in scalable application. Another good use would be logging/auditing in extensive system. NOT preferable for feature rich application with frequent enhancement or changes.

Is nosql a right tool for multy level forum like comment system?

I want to build a web app similar to Reddit.com, where you have multy level of comments, lots of reads and writes. I was wondering if nosql and mongoDB in particular is the right tool for this?
Comments -- it's really thing for nosql database, no doubt. You avoiding multiple joins to itself. And it's means that your system can scale out!
With mongodb you can store all hierarchy within one document. Some peoples can say that here will be problems with atomic updates, but i guess that it's not a problem because of you can load and save back entire comments tree. In any way you can easy redesign your system later to support atomic updates and avoid issues with concurrency.
Reddit itself uses Cassandra. If you want something "similar to reddit.com," maybe you should look at their source -- https://github.com/reddit/reddit/wiki.
Here's what David King (ketralnis) said earlier this year about the Cassandra 0.7 release: "Running any large website is a constant race between scaling your user base and scaling your infrastructure to support it. Our traffic more than tripled this year, and the transparent scalability afforded to us by Apache Cassandra is in large part what allowed us to do it on our limited resources. Cassandra v0.7 represents the real-life operations lessons learned from installations like ours and provides further features like column expiration that allow us to scale even more of our infrastructure."
However, Rick Branson notes that Reddit doesn't take full advantage of Cassandra's features, so if you were to start from scratch, you'd want to do some things differently.

Can mongodb be used as an embedded database?

I am working on a RSS reader application. And I need to find a backend database. I want the database be embedded because I don't want the users to install a database server.
I know SQLite is a good choice, but I am wondering if there are any other nosql choices?
(I don't yet have 50 rep points to comment on, and build upon, the accepted answer; otherwise I would, sorry!)
You can embed MongoDB in your OEM solution but there are two things to consider:
It is written in C++, so if you are coding in a different language you might need to write a wrapper that launchers the database process separately.
MongoDB is licensed under Gnu AGPL-3.0 which is a copy left server license. The accepted answer, and the Google group quote, both correctly state that this would normally force you to also be AGPL licensed. However, they MongoDb states that the intention of the license is to allow refinements to their code to be submitted back, and that your product will remain separate. This makes me think that the normal copy left rules don't apply.
The goal of the server license is to require that enhancements to MongoDB be released to the community. Traditional GPL often does not achieve this anymore as a huge amount of software runs in the cloud. For example, Google has no obligation to release their improvements to the MySQL kernel – if they do they are being nice.
To make the above practical, we promise that your client application which uses the database is a separate work. To facilitate this, the mongodb.org supported drivers (the part you link with your application) are released under Apache license, which is copyleft free. Note: if you would like a signed letter asserting the above promise please request via email.
Source: http://www.mongodb.org/display/DOCS/Licensing
According to the Google Group, yes it can, but it doesn't cover how exactly.
Yes, but it isn't pretty and will
force your app to be AGPL licensed. If
you are interested take a look at how
the tools handle the --dbpath option.
Source: http://groups.google.com/group/mongodb-user/browse_thread/thread/463956a93d3fb734?pli=1
If you're using .NET, one option might be RavenDB, which is a document database, and can be embedded.
Please checkout https://github.com/Softmotions/ejdb
This project being developed to resolve this issue.
How about Couchbase Lite? It's an open source, embeddable document database. While it can function as a standalone document database, its real value is in its ability to synchronize with remote document databases. It may be aimed at iOS / Android, but it can run on anything with a JVM.
https://github.com/couchbase/couchbase-lite-java
There is no straight forwarding way to use MongoDB as an embedded library in terms of a well-reusable library. Eliot - head of 10gen - spoke of "it would be nice to have one" - but there is nothing available that could be reused in a sane way.
Looks like a lot of OEMs are trying to get Mongo on to their hardware and devices for real-time processing. A link from MongoDBs website
I usually use Buildroot to create a cross-compiled Embedded Linux root file-system along with all the user space packages.
I noticed that MongoDB is one of the packages that's already integrated as one of the Buildroot builtin packages.
You may check out MongoDB make file for some hints regarding how to built it for Embedded Linux.

How do I properly test my database performance with high load demand?

I have found a lot of topics about stress-testing web application.
My goals are different, it's to test only database (sybase sql anywhere 9).
What I need:
Some tool to give a diagnostic of all sqls and find a bottleneck. I wish I could macro-view the entire system easily.
Best practices to design/build a good sql queries.
The system issues are:
20GB database size.
2-5 request per second
Thousands sql spread in the code (this messy can be solved only rewriting the system).
The quickest way would actually be to upgrade your SQL Anywhere to v10 or (better) v11, as the latest releases include a complete performance diagnostic toolset. See the documentation here for more details.
several open source tools are listed here:
http://www.opensourcetesting.org/performance.php