Sphinx version for file indexing - sphinx

I am working on full-text search over MS Office files, using PostgreSQL and Sphinx for indexing, but our production server has Sphinx version 0.99 installed. I cannot decide whether I need to update it to the latest version. How much do these versions differ in performance?

Absolute performance? Not much. It hasn't got much better, and it hasn't got much worse. Sphinx was pretty fast already, and there have been no groundbreaking changes that suddenly make it so much faster.
It's more about features and fixes. But the new features let you do things more easily and quickly, which in turn may lead to better performance.

Related

unhappy with MySQL workbench 5.2

Is there a better option for windows out there?
I come from an MS SQL Server environment, so something similar to its editor would probably be preferable.
What capabilities are you looking for?
MySQL Workbench evolves really quickly, and 5.3 should be out fairly soon.
I don't think there's anything comparatively good unless you simply use phpmyadmin...
5.3 should be out soon with its load of bug fixes.
What don't you like about it? Maybe I can help if I understand.
I don't think there's a better option for you if you have to work with MySQL.
I agree with you: MySQL Workbench generally could behave much better; some operations simply produce no output leaving you wondering what happened. If you're used to Microsoft SQL Server, you'll likely miss the capability to run several queries in one code editor window, producing several outputs at the same time. This is the biggest usability drawback, in my opinion.
There are some things MySQL does better though, so it's a mixed bag.

Is Lucene.net abandoned?

I'm currently testing Lucene.Net, and it's perfect for my needs but I've seen this recent post in the dev mailing list (with no answers)...
Do you think it's unsafe to start developing with this library?
I thought it was widely used?
As far as I know Lucene.NET is used for RavenDB, so it should be in pretty good shape.
Also, it depends on what you mean by "unsafe". It is hard to guarantee that any OSS project will never stop, so all of them are inherently "unsafe". The same is actually true for commercial projects.
Lucene.NET seems to be a reliable project at this point (I used it in a small project, so I cannot guarantee that, but RavenDB seems to do just fine), so even if new development stops, it should still be possible to rely on it.
I think it all depends on the longevity of your project, on your readiness to fix any issues in Lucene (if they arise), and on the requirements of the project owners.

Derby vs PostgreSql Performance Comparison

We are doing research right now on whether to switch our postgresql db to an embedded Derby db. Both would be using glassfish 3 for our data layer. Anybody have any opinions or knowledge that could help us decide?
Thanks!
edit: we are writing some performance tests ourselves right now. Looking for answers more based on experience / first hand knowledge
I know I'm late to post an answer here, but I want to make sure nobody makes the mistake of using Derby over any production-quality database in the future. I apologize in advance for how negative this answer is - I'm trying to capture an entire engineering team's ill feelings in a brief Q&A answer.
Our experience using Derby in many small-ish customer deployments has led us to seriously doubt how useful it is for anything but test environments. Some problems we've had:
Deadlocks caused by lock escalations - this is the biggest one and happens to one customer about once every week or two
Interrupted I/Os cause Derby to fail outright on Solaris (may not be an issue on other platforms) - we had to build a shim to protect it from these failures
Can't handle complicated queries which MySQL/PostgreSQL would handle with ease
Buggy transaction log implementation caused a table corruption which required us to export the database and then re-import it (couldn't just drop the corrupted table), and we still lost the table in the process - thank goodness we had a backup
No LIMIT syntax
Low performance for complicated queries
Low performance for large datasets
Because it's embedded, Derby is more of a competitor to SQLite than to PostgreSQL, an extremely mature production-quality database that some of the largest websites in the world use to store multi-petabyte datasets. If you want to be ready for growth and don't want to get caught debugging someone else's database code, I would recommend not using Derby. I don't have any experience with SQLite, but I can't imagine it being much less reliable than Derby has been for us and still being as popular as it is.
In fact, we're in the process of porting to PostgreSQL now.
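On the "No LIMIT syntax" point: as far as I know, Derby releases from 10.5 onward accept the SQL-standard OFFSET / FETCH clause instead, and PostgreSQL accepts both forms. A minimal sketch of a helper that builds either variant follows; the function name paginate and the dialect labels are made up for illustration and are not part of any library.

```python
def paginate(base_query: str, limit: int, offset: int = 0, dialect: str = "standard") -> str:
    """Append a row-limiting clause to base_query.

    dialect="standard" emits the SQL-standard OFFSET / FETCH clause.
    dialect="limit" emits the LIMIT / OFFSET form used by PostgreSQL and MySQL.
    base_query is assumed to be trusted SQL that already has an ORDER BY.
    """
    # Only integers are interpolated, so reject anything else up front.
    if not isinstance(limit, int) or not isinstance(offset, int):
        raise TypeError("limit and offset must be integers")
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")

    if dialect == "standard":
        return f"{base_query} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
    if dialect == "limit":
        return f"{base_query} LIMIT {limit} OFFSET {offset}"
    raise ValueError(f"unknown dialect: {dialect}")


query = "SELECT id FROM items ORDER BY id"
print(paginate(query, limit=10, offset=20))
print(paginate(query, limit=10, offset=20, dialect="limit"))
```

Whether the standard form is available depends on the Derby version you are stuck with, so check that before relying on it.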
Derby is still relatively slow, but ... wherever your Java application goes, your database server goes too, completely platform independent. You don't even need to think about installing a DB server wherever your Java app is copied to.
I was using MySQL with Java, but having an embedded database server sitting right inside my Java app gives stunning, unprecedented productivity, freedom and flexibility.
Always having a DB server included, whenever and wherever, on any platform is just heaven for me !!!
I have not compared PostgreSQL to Derby directly. However, having used both in different circumstances, I have found Derby to be highly reliable. You will need to pay attention to Derby configuration, though, to ensure it suits your application's needs.
When looking at the H2 database stats site, it's worth reading the follow-up discussion, which comes out in favour of Derby compared to the H2 conclusions. http://groups.google.com/group/h2-database/browse_thread/thread/55a7558563248148?pli=1
Some stats from the H2 database site here:
http://www.h2database.com/html/performance.html
There are a number of performance test suites that are included as part of the Derby source code distribution itself; they are used by Derby developers to conduct their own performance testing of Derby. So if you need examples of performance tests, or want additional ones, you could consider using those. Look in the subdirectory named java/testing/org/apache/derbyTesting/perf in the Derby source distribution.
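For a quick home-grown test of the kind the question mentions, a timing loop like the one below may be enough to start with. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in so that it runs anywhere, and the table, the query, and the time_query helper are invented for illustration. To compare Derby and PostgreSQL you would swap in connections to those databases and use your own queries.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), runs=5):
    """Run sql `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Fetch every row so the database actually does the full work.
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in database (assumption): in-memory SQLite with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items (payload) VALUES (?)",
    ((f"row-{i}",) for i in range(10_000)),
)
conn.commit()

timings = time_query(
    conn,
    "SELECT COUNT(*) FROM items WHERE payload LIKE ?",
    ("row-1%",),
)
print(f"median: {statistics.median(timings):.6f}s over {len(timings)} runs")
```

Reporting the median over several runs, rather than a single timing, keeps one slow cold-cache run from dominating the comparison.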
I'm not sure what you mean by saying that Derby is embedded. Certainly that is an option, but you can also install it as a separate server.

PostgreSQL OS suggestion

Hi guys, we are about to start developing a big web platform. For the DB server we have chosen PostgreSQL. Which OS would you suggest for the PostgreSQL server (we are looking for maximum performance)?
Thanks
P.S. Sorry for the bad English
I would suggest a platform that you feel comfortable with. As Jeff suggested, it is usually easier to throw faster hardware at a problem than human time.
This reasoning is based on these main ideas:
Usually the database is only marginally faster on one OS than on another.
The high-order optimisations are usually in tuning the database or the queries, not really in switching OS.
If you know an OS well, you can usually squeeze more out of it. Whereas if you pick an OS that you are not really familiar with, but that is supposed to be faster, it might bite you in unexpected ways.
That said, as answered before, a *NIX-based OS would be better right now, since PostgreSQL still has deep roots in the *NIX world. But this is becoming less and less of an issue with the 8.x line.
I would suggest a *nix-based OS. Linux would be great if possible, because you can get the package more easily with the built-in package manager (e.g. apt for Debian, yum for Fedora, etc.), and because Postgres was originally made for *nix-based OSes. The port to Windows is only recent, and as you can see in several threads here on Stack Overflow, Postgres does not perform as well on Windows as it does on *nix-based OSes.

When to upgrade libraries

I work with a lot of open source libraries in my daily tasks (java FYI). By the time a project comes close to maturing, many of the libraries have released newer versions. Do people generally upgrade just to upgrade, or wait until you see a specific bug? Often the release notes say things like "improved performance and memory management", so I'm not sure if it's worth it to potentially break something. But on the other hand most libraries strive and claim to not break anything when releasing new versions.
What is your stance on this? I'll admit, I am kind of addicted to upgrading the libraries. Often times it really does help with performance and making things easier.
The rule for us is to stay up to date before integration testing but to permit no changes to any library once we're past this point. Of course, if integration testing reveals flaws that are due to an issue with a library that has been fixed, then we'd go back and update. Fortunately, I cannot remember a situation where that has happened.
Update: I understand Philuminati's point about not upgrading until you have a reason. However, I think of it this way: I continuously pursue improvements in my own code and I use libraries built by people that I believe think the same way. Would I fail to use my own improved code? Well, no. So why would I fail to use the code that others have improved?
I keep what works until there is a reason to upgrade.
If information pertaining to the old version appears on Secunia or SecurityFocus...
Otherwise - if new functionality is needed (better performance is also a 'functionality').
I'm with the lazy crowd - I can't remember ever formulating a different strategy than "upgrade when there is a reason to" - but now that I consider the question, there is something to be said about proactive upgrades.
Upgrading does make it easier for you to report a bug in the lib, should you find one. If you find a bug and have not upgraded, it's the first thing you're going to have to do before you get any help or support. You might as well do that proactively.
Especially if you have a good test suite, upgrading proactively will flush out problems early, and that is always a smart move.
It depends a lot on your deployments. If you are supporting multiple platforms then the very latest libraries may not be available on all at any given moment. I've been frustrated by trying to install something that requires the very latest version of some lib, and it's not available as a package yet.
If you deploy to customers you want to develop against libraries that are stable and widely available.