Lucene.net Server Farm / Multiple Servers

I hadn't thought about this issue before, as I was hosting the application on just one Windows Server 2008 machine and Lucene.net stores the index on its local hard drive.
(Basically, every time a user posts or replies to something, I update the index so the search can return the latest results. Not sure if that's the best way to do it.)
Now that we are going to need another web server with a load balancer in front, I obviously can't have each server maintain its own index depending on where the load balancer points, as the indexes will be out of sync.
One option is to hook up the two servers to a shared server that stores the indexes, but is that a recommended solution?
How do you manage the parsing and indexing of Lucene.net in a server farm environment?
Thanks a lot

You could separate the Lucene index engine from your web application by creating a service that exposes the search functionality, something like a WCF or REST service.
You could also use an existing search server:
http://lucene.apache.org/solr/
http://www.elasticsearch.org/
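For illustration, here is a minimal sketch of what such a standalone search service could look like, using Python and Flask purely as a stand-in for a WCF/REST endpoint; the `search_index` function, its parameters, and the port are all assumptions, not a real API:

```python
# Minimal sketch of a standalone search service (hypothetical API).
# All web servers in the farm query this one endpoint instead of a local index.
from flask import Flask, jsonify, request

app = Flask(__name__)

def search_index(query, max_hits):
    # Placeholder for the actual query against the shared index.
    return [{"id": 1, "title": "example hit", "score": 0.87}]

@app.route("/search")
def search():
    query = request.args.get("q", "")
    max_hits = int(request.args.get("max", "10"))
    return jsonify(results=search_index(query, max_hits))

if __name__ == "__main__":
    app.run(port=8983)
```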

The way we keep our load-balanced servers in sync, each with its own copy of Lucene, is to have a task on a separate server that runs every 5 minutes and commands each load-balanced server to update its index up to a certain timestamp.
For instance, the task sends a timestamp of '12/1/2013 12:35:02.423' to all the load-balanced servers (the task submits the timestamp via query string to a page on each load-balanced site); each server then uses that timestamp to query the database for all updates that occurred since its last update, up through that timestamp, and updates its local Lucene index.
Each server also stores the timestamp in the DB, so it knows when each server was last updated. So if a server goes offline, then the next time it receives a timestamp command after coming back online, it will grab all the updates it missed while it was offline.
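A rough sketch of that scheme, in Python for brevity; the table and column names are made up, and the real implementation would update a Lucene.net index rather than call a stub:

```python
import sqlite3

def index_document(row):
    # Placeholder: the real code would add/update this doc in the local Lucene index.
    print("indexing", row)

def update_index_to(conn, cutoff):
    """Apply all DB changes between our last-processed timestamp and `cutoff`."""
    last = conn.execute(
        "SELECT value FROM sync_state WHERE key = 'last_indexed'"
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT id, title, body FROM posts WHERE updated_at > ? AND updated_at <= ?",
        (last, cutoff),
    ).fetchall()
    for row in rows:
        index_document(row)
    # Remember how far we got; a server that was offline catches up next run.
    conn.execute("UPDATE sync_state SET value = ? WHERE key = 'last_indexed'", (cutoff,))
    conn.commit()
```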

Related

How Do I Optimize Zend Framework

I have an application built on Zend Framework that I am trying to optimize.
I did some Xdebug profiling, and although I can't say I understand every nitty-gritty detail of the results, some things were quite obvious.
For instance, the file Bootstrap.php seems to be the one gulping most of the time, taking 4,553 ms, which accounts for 92.49% of the total time.
And if I dig further, I can see that Zend_Application_Bootstrap_Bootstrap->run takes the bulk of the time. Checking this out again, I found that Zend_Controller_Front->dispatch might actually be the function inside Bootstrap.php that takes the time to execute.
The question is: given these indications, how best can I go about optimizing the application? If the answer is caching, how do I go about applying caching to this situation?
Thanks
From the look of the callgrinds, on the login page the app is spending most of its time in curl_exec, which is to be expected if you're doing a remote login. But it is doing 10 separate curl_execs, which seems excessive. I'm not familiar with the LinkedIn login auth, but is it possible your app is running the remote login code multiple times?
On the standard page request the app is spending most of its time connecting to MySQL, and it seems to be doing this twice. Are you using a remote DB server, and do you need two separate DB connections?
Assuming you are using a remote DB server and it is on the same network as your web server, there seems to be some networking issue there. I'd check the latency to that server if you can, and try connecting to the IP address instead of the hostname to see if that makes any difference (if that is much faster, it would suggest an issue with the DNS setup on your web server).
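One quick way to check whether DNS resolution is the culprit is to time a TCP connect to the DB host by name versus by IP; a small Python sketch (the hostname, IP, and port are placeholders):

```python
import socket, time

def time_connect(host, port=3306, tries=5):
    """Average time to open a TCP connection; by-name includes the DNS lookup."""
    total = 0.0
    for _ in range(tries):
        start = time.perf_counter()
        sock = socket.create_connection((host, port), timeout=5)
        total += time.perf_counter() - start
        sock.close()
    return total / tries

print("by name:", time_connect("db.example.com"))
print("by IP:  ", time_connect("10.0.0.5"))
```

If the by-IP number is dramatically lower, the web server's resolver configuration is the first place to look.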

Core Data synchronization procedure with a web service

I'm developing an application that needs to be synchronized with a remote database. The database is connected to a web-based application in which users are able to modify records (add/remove/modify). Users are also able to modify the same records in the mobile application. So each side (server and client) must keep the same, latest records when a user presses the sync button in the mobile app. Communication between server and client is provided by web services (SOAP), and I am not able to change that because it is a strict requirement. (I know this is the worst way it can be done.) Another requirement is that clients are not able to delete server records.
I am already familiar with calling the web service (NSURLConnection), receiving data (NSData), and parsing it. But I could not figure out what the synchronization procedure should look like. I have already read an answer about how I can extend the server and client sides with some extra attributes (last_updated_date and is_sync).
I could imagine solving the issue like this:
As a first step, the client tries to modify the server records by sending the unsynchronized ones. New records are added directly into the DB, but modified records should be compared based on last_updated_date. At the end of this step, the server has the latest data.
But the problem is how to manage modifying the records in the mobile app. I thought of two ways:
The first (and dumbest) is to create a new MOC, download all records into it, and replace the existing one.
The second is to get all the modified records that are not on the client side, import them into a new MOC, and combine the two. But at this point I have some concerns:
There could be two items that are duplicates (old version and updated version).
Deleted items could still be located in the main MOC.
I would have to reconnect multiple relationships between the MOCs (a new record could have more than 4 relationships to old records).
So I am hoping you can help me with other ideas on which approach is best.
Syncing data is a non-trivial task.
There are several levels of synchronization. Based on your question, I am guessing you just need to push changes back to a server. In that case I would suggest catching it during the -save: of the NSManagedObjectContext. Just before the -save:, you can query the NSManagedObjectContext and ask it which objects have been created, updated, and deleted. From there you can build a query to post back to your web service.
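The real calls here are Objective-C (Core Data's insertedObjects / updatedObjects / deletedObjects), but the shape of the idea, intercept the save, collect the pending change sets, and post them, looks roughly like this language-neutral Python sketch; the `context`, `web_service`, and field names are all hypothetical:

```python
# Sketch of "collect pending changes just before save" (names are assumptions).
def save_with_sync(context, web_service):
    payload = {
        "created": [obj.as_dict() for obj in context.inserted_objects],
        "updated": [obj.as_dict() for obj in context.updated_objects],
        "deleted": [obj.unique_id for obj in context.deleted_objects],
    }
    web_service.post("/sync", payload)  # push the change set to the server
    context.save()                      # then commit locally
```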
Dealing with merges, however, is far more complicated and I suggest you deal with them on the server.
As for your relationship question; I suggest you open a second question for that so that there is no confusion.
Update
Once the server has finished the merge it pushes the new "truth" to the client. The client should take these updated records and merge them into its own changes. This merge is fairly simple:
Look for an existing record using a uniqueID.
If the record exists then update it.
If the record does not exist then create it.
Ignoring performance for the moment, this is fairly straightforward:
Set up a loop over the new data coming in.
Set up an NSPredicate to identify the record to be updated/created.
Run your fetch request.
If the record exists update it.
If it doesn't then create it.
Once you get this working with a full round trip then you can start looking at performance, etc. Step one is to get it to work :)
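As a sketch of that loop, with Python standing in for the Core Data fetch-request code; `find_by_unique_id`, `create`, and the record fields are assumptions:

```python
def merge_server_records(store, incoming):
    """Find-or-create merge: the server's merged 'truth' wins for records it sends."""
    for record in incoming:
        existing = store.find_by_unique_id(record["uniqueID"])  # the NSPredicate step
        if existing is not None:
            existing.update(record)   # record exists: update it
        else:
            store.create(record)      # record does not exist: create it
    store.save()
```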

Syncing iOS Core Data with remote server which sends XML

My app parses XML from a remote server and stores the objects in Core Data (SQLite store), so that the user can browse the material offline by reading from local storage.
The user may make changes to objects while browsing offline, and these are stored locally in the Core Data SQLite store. Another user makes some changes to an object on the remote server, and those are stored there. Now, when I detect an internet connection, my app should sync my local storage with the remote server. That means the remote server gets updated with the changes I made to my Core Data (SQLite) store while I was offline, and my local store gets updated with whatever changes the other user made on the remote server.
For example, there is a forum stored in my local storage so that I can read and reply while traveling. When the internet is accessible later, my app should automatically push all my replies stored in Core Data to the remote server and also bring the other posts from the remote server into my local storage.
The remote server sends XML, which I'm parsing and storing in Core Data. My problem is how to sync it:
How does communication happen in both directions when there is a change?
How do I sync only the data that has changed, rather than importing the whole remote DB (and vice versa)?
I know one way to do it:
Add one more field to both your local and server databases: a timestamp.
When the user changes data in the local database, set the timestamp to the current time.
Do the same on the server: when someone edits data on the server, set the timestamp to the current time.
When the user connects to the internet, compare the local timestamp to the server timestamp:
Case 1: both are the same - nothing to do.
Case 2: local time > server time - use SQL to get all the data with a timestamp greater than the server timestamp, and upload it to the server.
Case 3: local < server - get all the records with a timestamp greater than the local timestamp and add them to the local database.
I am not sure if there is a better way, but this surely works.
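The three cases condense into a short sketch; `local_db`, `server`, and their methods are hypothetical stand-ins for the app's storage and SOAP layers:

```python
def sync(local_db, server):
    local_ts = local_db.last_modified()    # hypothetical accessors
    server_ts = server.last_modified()
    if local_ts == server_ts:
        return                                       # case 1: nothing to do
    if local_ts > server_ts:
        changed = local_db.rows_since(server_ts)     # case 2: client is ahead
        server.upload(changed)
    else:
        changed = server.rows_since(local_ts)        # case 3: server is ahead
        local_db.insert_or_update(changed)
```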
One solution could be:
The iPhone syncs its changes to the server.
The server merges the new and old data.
The iPhone gets the new changes (from the merge) back from the server.
So let the server be the master, which knows how to merge data, and have the clients only download the data incrementally after changes.
Transactions. You want to follow the ACID rules for transactions. Essentially, you have to make sure that data you have not refreshed locally has not been updated on the server before you write your local update.
So the easiest way is to have a user get the most recent update from the server, then overwrite it, using timestamps to make sure no other update happens during that process. Better yet, use locking to ensure that nothing else happens in between.
If you Google transactions or ACID there will be a lot of info out there. It's a big area in RDBMS environments where many users can corrupt the data and locks must be held between writes and updates.
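One common way to enforce this without holding locks is an optimistic check: include the timestamp you last read in the UPDATE's WHERE clause, and treat zero affected rows as a conflict. A sketch with sqlite3 (the table and columns are made up):

```python
import sqlite3

def optimistic_update(conn, post_id, new_body, seen_updated_at, now):
    cur = conn.execute(
        "UPDATE posts SET body = ?, updated_at = ? "
        "WHERE id = ? AND updated_at = ?",   # only succeeds if nobody changed it since we read
        (new_body, now, post_id, seen_updated_at),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise RuntimeError("Conflict: row changed on the server; re-fetch and retry")
```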

Client-Server Applications for iPhone

I have a question regarding this topic, specifically for client-server applications.
1) Is it necessary to load the database directly into the application?
Suppose I have a DB in the back end and my application has to connect to that DB and display the results in the view. For this, do I need to add the DB into the application directly?
2) Can we access any DB or file on the remote server and show the required results (without adding that particular DB or file into the application directly)? How can we do this?
I saw a similar question on Stack Overflow where one answer was to use a plist; I am new to this. I am browsing the net but not able to get clear results. I lost many of my interviews because of this question.
Thanks,
1) Is it necessary to load the database directly into the application? Suppose I have a DB in the back end and my application has to connect to that DB and display the results in the view. For this, do I need to add the DB into the application directly?
I'm not sure I understand this question. No, you don't need to load a database directly into a client in a client-server architecture. Normally, when I think of a design where a server has a database, I imagine there's some kind of way for the client to query the server for information. Perhaps it's making HTTP requests, which the server parses into a query, runs the query, and then returns the results (perhaps in XML form?).
2) Can we access any DB or file on the remote server and show the required results (without adding that particular DB or file into the application directly)? How can we do this?
Are you asking if it's possible, in general, to access a server database from a client? Yes, of course. (See above, re: HTTP Requests).
Any arbitrary file? That depends on how the server is set up. Again, HTTP is one protocol that works that way; if you send an HTTP query like "GET someimage.png HTTP/1.0", the server could just be grabbing the whole file someimage.png and sending it back in the response. (Technically, it's not necessarily snarfing a whole file -- it could be creating that PNG dynamically, since there's nothing in the HTTP protocol that says it must send an existing file -- but that's outside the scope of your question.)
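For completeness, the client side of that exchange can be as small as an HTTP GET plus a parse of the response; a Python sketch against a hypothetical endpoint that runs the query server-side and returns XML:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the server runs the DB query and returns XML results.
with urllib.request.urlopen("http://example.com/products?category=books") as resp:
    root = ET.fromstring(resp.read())

for item in root.findall("item"):
    print(item.findtext("name"), item.findtext("price"))
```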
I lost many of my interviews because of this question.
Not to sound too snarky, but interviews are often won and lost not because you don't know the answer, but because you can't communicate effectively. You haven't phrased your question(s) here particularly well.

How to link memcached servers together?

I'm looking into using memcached for a web application I am developing, and after researching memcached over the past few days, I have come across a question I could not find the answer to.
How do you link memcached servers together, or how do you replicate data between memcached servers?
Additionally, is this functionality controlled by the servers or by the clients, and how?
When you configure several servers, the client library uses a hash of each key to pick the server where that key/data pair is stored. That means there is no replication, and also that every client has to use the same set of servers.
pros:
Almost zero overhead; storage and bandwidth grow linearly.
Server code is kept simple and reliable.
cons:
Any change in the set of servers (one goes down, or you add a new one) suddenly invalidates (almost) the whole cache.
You have to be sure to use the same hashing algorithm on every client.
If you have control over the client code, you can simply store each key/data pair twice, on two servers. Just be sure to look in the same places when reading from a different client.
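A toy illustration of both points, key-hash server selection plus the manual "store it twice" replication; the server list and hash choice are arbitrary, and a real client library would do the first part for you:

```python
import hashlib

SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_servers(key, copies=2):
    """Every client must use this exact hash, or lookups will miss."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    first = h % len(SERVERS)
    # The second copy goes to the next server in the ring.
    return [SERVERS[(first + i) % len(SERVERS)] for i in range(copies)]

print(pick_servers("user:42"))   # the same two servers on every client
```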
I've used BeITMemcached, and with that you create an instance of MemcacheClient and set the servers you want to use, simply as strings.
At that point the client itself determines which of its available servers to put each item on. You never know which server an item will be on.
Check here to see how the servers handle failover.
The easiest thing is to have a repopulate mechanism. In my case, I store several hundred objects in memcache that come out of a database. I can just call repopulate and put them all back in there. Whenever I add, update, or delete them in the database, I make the same calls to memcache.
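That pattern, write-through on every DB change plus a bulk repopulate, might look like this; the `db` and `cache` objects and their methods are hypothetical:

```python
def save_object(db, cache, obj):
    db.save(obj)                       # the database stays the source of truth
    cache.set(f"obj:{obj.id}", obj)    # mirror every add/update into memcache

def repopulate(db, cache):
    # After a cache restart (or cold start), reload everything from the database.
    for obj in db.load_all():
        cache.set(f"obj:{obj.id}", obj)
```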
http://repcached.lab.klab.org/
Also, the PHP PECL memcache client can replicate data to multiple servers, see memcache.redundancy.
It sounds like you want caches that can cope with machines rebooting, etc. If so:
In a lot of cases (assuming you are not writing Facebook) an RDBMS is fast enough for caching. Just create a table that has a key and a blob column. If the RDBMS server has enough RAM, all the data will be held in RAM and only written to disk to allow recovery.
Remember, this could be one or more servers separate from your main database server.
If you wish to get fancier and are using a high-end RDBMS, you may be able to set up change notifications on the queries that are used to build the cached data, deleting out-of-date rows from the cache.
Alternatively, you can set up triggers to clear invalid rows from the cache; however, this can get very complex very quickly.
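A minimal version of that key/blob cache table, sketched with sqlite3 purely for illustration (a real deployment would use the networked RDBMS described above, and the schema is an assumption):

```python
import sqlite3

conn = sqlite3.connect("cache.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS cache ("
    "  key   TEXT PRIMARY KEY,"        # cache key
    "  value BLOB"                     # serialized object bytes
    ")"
)
conn.execute(
    "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
    ("user:42", b"serialized object bytes"),
)
conn.commit()
print(conn.execute("SELECT value FROM cache WHERE key = ?", ("user:42",)).fetchone())
```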
Memcached does not provide replication. To achieve it, you need to add the server to the memcached client's server list and then hit the DB for the data to be stored on that particular server.
You should seriously consider Couchbase. It uses the memcached protocol, provides nearly the same speed, and delivers the automatic replication you're looking for. It also persists to disk, so your cache will never be cold.