Running Sphinx on multiple machines or a single instance

I have an app using Ruby and Thinking Sphinx. Right now each of my web servers runs a local Sphinx daemon.
I don't know if it's better to run a single instance of Sphinx and have all the machines point to it, or to keep the current scenario.
The problem with the current one is that each machine does index builds and rebuilds, which is redundant in a way and causes extra load on the database (I'm running full and delta indexing every 30 minutes). On the other hand, I don't know what the downside of running Sphinx over the network would be.
What's the best approach?
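For what it's worth, centralizing usually means pointing every app server's Thinking Sphinx config at one shared daemon instead of localhost, so only that box runs the full and delta indexing against the database. A minimal sketch of config/sphinx.yml, assuming a Thinking Sphinx v1/v2-style config and a hypothetical internal hostname:

    production:
      address: sphinx.internal.example.com  # hypothetical shared Sphinx host
      port: 9312                            # searchd's default listen port

The trade-off is a network hop on every query and a single point of failure, versus one indexing pass instead of one per web server.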

Related

MongoDB on machine or MongoDB Atlas

Suppose I want to host my e-commerce website on GCP/AWS/DigitalOcean. I am using MongoDB as the database.
Then which will be better?
Installing MongoDB on the remote machine (GCP/AWS/DigitalOcean)
or
Using MongoDB Atlas to deploy and manage the database
Can you mention the pros and cons of both, and in which situations I should use each method?
This is a really tricky question to answer, as every other person will have a different opinion. I'll try to answer based on my own experience (AWS EC2 instances).
Atlas vs. EC2, cons:
I would say the number one "con" of Atlas is the cost; it is far more expensive than running your own EC2 instance (at least at my scale, which is around 1 TB of data). This varies according to the amount of data, instance size, backup routine, and more.
You have less "control"; clearly you won't be able to access the actual server the instance is running on.
Pros:
You have less "control": matching point 2 in the cons, this can be seen as a pro, since you don't have to maintain the server and everything that comes with it.
Do your own math on whether this is a pro or not.
I'm going to stop here, as I could keep listing endless minor differences that are, at the end of the day, mostly matters of opinion.
I would say, though (again based on my own experience), that Atlas is great; I would strongly consider using it, especially if the following conditions are met:
You don't have a person experienced with AWS EC2 instances on your team (without one, you will spend a lot of time just getting started).
You're at small scale: Atlas's cost is not that high at small scale, and it gives you a lot of power and saves you some headaches, allowing you to push your product forward.
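From the application's point of view the two options differ mostly in the connection string; it's the operational side that changes. A minimal pymongo sketch, where both URIs and credentials are made-up placeholders:

    # Hedged sketch: hostnames and credentials below are hypothetical.
    from pymongo import MongoClient

    # Self-hosted on your own EC2/GCP/DigitalOcean machine:
    self_hosted = MongoClient("mongodb://db.example.com:27017")

    # Managed on Atlas (SRV-style URI of the kind Atlas generates per cluster):
    atlas = MongoClient("mongodb+srv://user:password@cluster0.abc12.mongodb.net/")

    print(self_hosted.admin.command("ping"))
    print(atlas.admin.command("ping"))

Everything else (provisioning, backups, upgrades, monitoring) is what you're either doing yourself or paying Atlas to do.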

SqlBase and Gupta windows to the sky

Does anybody have advice or experience on the possibility of having an SqlBase database in a cloud environment and running a Gupta application that is stored on local PCs?
Thanks.
We have some experience running SQL databases (Oracle, SQL Server, SqlBase) on a remote server connected over a WAN. Most often data access is very slow, and you have to write your application carefully.
The reason for the slowness is usually not the bandwidth but the number of hops an IP packet takes. Each hop adds some milliseconds of delay, which often sums up to a painful experience. So it's OK to get one big blob from a database. It's also OK to fetch large result sets. But when there are a lot of smaller queries it will get very slow.
There are two solutions to this problem:
1) Use a dedicated line from client to server if possible.
2) Write your application in a way that minimizes the number of queries.
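To make point 2 concrete: what matters over a WAN is round trips, not bytes. A minimal sketch in Python (illustrated with psycopg2, though the idea is the same for any client library); the connection string and "customers" table are hypothetical:

    # Hedged sketch: connection string and "customers" table are placeholders.
    import psycopg2

    conn = psycopg2.connect("dbname=app host=db.example.com")
    cur = conn.cursor()
    ids = [1, 2, 3, 4, 5]

    # Slow over a WAN: one network round trip (with its per-hop latency) per row.
    names = []
    for customer_id in ids:
        cur.execute("SELECT name FROM customers WHERE id = %s", (customer_id,))
        names.append(cur.fetchone()[0])

    # Far better: one query, one round trip, the whole result set at once.
    cur.execute("SELECT name FROM customers WHERE id = ANY(%s)", (ids,))
    names = [row[0] for row in cur.fetchall()]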

Syncing two MongoDBs

I'm trying to figure out how to solve the following use case using MongoDB.
There is a main server where users can upload/edit/... content. Sometimes the users want to work independently of the main server, so they take a defined set of entries from the main server's MongoDB and deploy their own local environment. But after some time, they want to get updates for entries from the main server and/or submit their own updates or new insertions.
Is there any out-of-the-box solution for that? I thought the standard replica sets could be used, but the documentation is so sparse that I can't really tell whether it would work if users are, e.g., one month offline.
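One constraint worth checking before trying replica sets for this: a member that stays offline longer than the primary's oplog window cannot catch up incrementally and needs a full resync, and a month offline will typically exceed the window. A minimal pymongo sketch for measuring it, with the hostname a placeholder:

    # Hedged sketch: run against a replica-set member; hostname is hypothetical.
    from pymongo import MongoClient

    client = MongoClient("mongodb://main.example.com:27017")
    oplog = client.local["oplog.rs"]

    first = oplog.find().sort("$natural", 1).limit(1).next()
    last = oplog.find().sort("$natural", -1).limit(1).next()

    # "ts" is a BSON Timestamp; .time is seconds since the epoch.
    window_hours = (last["ts"].time - first["ts"].time) / 3600.0
    print("oplog window: %.1f hours" % window_hours)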

MongoDB Fail Over with one address

I would like to know if it is at all possible to have MongoDB failover using only a single address. I know replica sets are typically used for this, relying on the driver to make the switchover, but I was hoping there might be a solution out there that would allow one address or hostname to automatically switch over when the MongoDB instance is recognized as being down.
Any such luck? I know there are solutions for MySQL, but I haven't had much luck finding something for MongoDB.
Thanks!
Yes, it is possible: the driver holds a cached map of your replica set, which it queries for a new primary when the set goes through an election. This map is refreshed every so often; however, if your application restarts (the process quits, or on each request in PHP's fork mode), the driver has no choice but to rebuild its map, and at that point you will suffer connectivity problems.
Of course, the best thing to do is to add a seed list.
Using a single IP defeats the redundancy that is built into MongoDB.
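A minimal sketch of a seed list with pymongo, assuming a three-member replica set named "rs0" on hypothetical hosts; the driver can then discover the current primary through any member that is still reachable:

    # Hedged sketch: hostnames and the set name "rs0" are placeholders.
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://db1.example.com:27017,db2.example.com:27017,db3.example.com:27017"
        "/?replicaSet=rs0"
    )
    print(client.admin.command("ping"))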

Postgres full text search

I am doing a web project based on the ASP.NET MVC framework. As the DB I am using PostgreSQL. The question is how to organize searching in my application. One option would be using .NET libraries such as Lucene.Net. Another option is to use the Postgres full text search. So which is the best option?
I haven't run Postgres in production, but I have played with it on a test DB with pretty significant (I think) amounts of data. Indexing about 600,000 rows of text strings averaging 3 words creates a full text index that's 120 MB. Queries against this index are very fast after the first one for each search term; it looks like the index entries for each term have to be pulled from disk into memory. I haven't yet found a way to pull the whole index into memory at startup. Some of the initial slowness may be disk I/O related, since I am running on a single laptop HD. I am also not sure whether a 120 MB index needs 120 MB of DB memory or more.
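For reference, the Postgres side of this is a tsvector index plus a tsquery match. A minimal sketch via psycopg2, where the "documents" table, its "body" column, and the connection string are hypothetical:

    # Hedged sketch: table, column, and connection string are placeholders.
    import psycopg2

    conn = psycopg2.connect("dbname=app host=localhost")
    cur = conn.cursor()

    # GIN index over the tsvector expression, so searches can avoid a full scan.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS documents_body_fts "
        "ON documents USING gin (to_tsvector('english', body))"
    )
    conn.commit()

    # The query must use the same expression for the index to be usable.
    cur.execute(
        "SELECT id FROM documents "
        "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)",
        ("search terms",),
    )
    print(cur.fetchall())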
For a production app, we are using Lucene for Java and it is performing very well: sub-second responses with several GB of index data. The advantages of Lucene that I see are 1) it's DB-independent and 2) it's distributable. For #1, this may not be an issue for you, but it means you can use the same indexing code no matter what your underlying DB is. For #2, it depends on how big the application will be. Lucene (and Hadoop especially) are designed to be multithreaded, so you can store the index on a shared drive and have multiple machines running searches at once (note that indexing is still single-threaded). Whether you want this or not depends on your architecture. Would you rather have one big DB, or one small/medium DB plus a few smaller indexing servers supporting it?