Rotating vs Restarting Sphinx

I have a config file whose index takes a long time to rotate with the indexer, e.g.
indexer --config /home/indexer/MyConfig.conf.php --rotate idx_Big
I can live with that; however, sometimes when I want to test an updated config it might either break or not deliver the desired results. Usually what I do then is:
Revert the Config back to the original working settings
Index/Rotate again
However, I am wondering if I can just copy the 'good' index out of the directory Sphinx accesses (e.g. to root), index/rotate the new one, and then, if I don't like the results, copy the saved index from root back over the newly indexed idx instead of doing another hour-long index/rotate back to the old index, and finally stop/restart, e.g.
/usr/bin/searchd -c /home/indexer/MyConf.php --stop
/usr/bin/searchd -c /home/indexer/MyConf.conf.php
Yes, I do get that the conf will still be the one with the changes I don't like, but the larger question is still relevant to my situation: can I replace an index I rotated with a saved version and restart to get back to where I was, or do I need to index/rotate the old settings to do so?

Yes you can do this.
When copying the index out, it's best not to copy the .spl (lock) file; copying that back might confuse Sphinx.
Also, ideally copy the index back to the Sphinx folder while searchd is shut down. Overwriting the index files of a running searchd might confuse and/or crash it. Not a big deal, since you are about to stop it anyway and clearly don't care about the corrupted index, but it leaves less chance of confusing side effects.
If you wanted to be fancy...
You could just copy the old index back with '.new' appended to the index name (e.g. index.spd becomes index.new.spd), rather than overwriting the active files, and then send searchd a SIGHUP signal. searchd will then load this new version (which is actually just the old version again) seamlessly. This uses the same mechanism rotate uses to load a new version into searchd, just done 'manually'.
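If you want a concrete starting point, here is a rough Python sketch of that 'fancy' route. All the paths are assumptions (check your sphinx.conf for the real data directory and pid_file); it copies the stashed index files back as idx_Big.new.* and then signals searchd, the same as a rotate would:

import os
import shutil
import signal

BACKUP_DIR = "/root/idx_backup"            # assumed: where the 'good' index copy was stashed
DATA_DIR = "/var/lib/sphinx/data"          # assumed: directory searchd serves the index from
PID_FILE = "/var/run/sphinx/searchd.pid"   # assumed: pid_file setting from sphinx.conf
INDEX_NAME = "idx_Big"

# Copy every saved file back as idx_Big.new.<ext>, skipping the .spl lock file.
for fname in os.listdir(BACKUP_DIR):
    base, ext = os.path.splitext(fname)
    if base != INDEX_NAME or ext == ".spl":
        continue
    shutil.copy2(os.path.join(BACKUP_DIR, fname),
                 os.path.join(DATA_DIR, "{0}.new{1}".format(INDEX_NAME, ext)))

# Ask searchd to pick up the .new files, exactly as a rotate would.
with open(PID_FILE) as f:
    os.kill(int(f.read().strip()), signal.SIGHUP)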

do I need to index/rotate the old settings to do so?
You can test whether it works. I do not know Sphinx, but it seems simple:
Stop the Sphinx program/service.
Move the index folder wherever you want.
Back up your default configuration file, the one your index folder is tied to.
Create the new configuration file you want to test.
Start Sphinx and test as you wish.
Later, when you want to go back to your backed-up settings, the process is similar:
Stop the Sphinx program/service.
Delete the index folder.
Restore the index folder from wherever you moved it last time.
Restore the backup of your default configuration file to its original location.
Start Sphinx again.
Now you should not need to wait an hour for Sphinx to index again, as you put everything back in place (a rough sketch of the restore half is shown below). The application should not be aware of your fast ninja moves.
If it does not work, Sphinx probably changes other files than the ones you backed up.
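A minimal sketch of the restore half of that procedure, reusing the searchd commands from the question and assuming made-up locations for the index folder and the stashed copy:

import shutil
import subprocess

CONF = "/home/indexer/MyConf.conf.php"
DATA_DIR = "/var/lib/sphinx/data"         # assumed: index folder from sphinx.conf
STASH_DIR = "/root/sphinx_data_backup"    # assumed: wherever the good copy was moved

# Stop searchd before touching any index files.
subprocess.check_call(["/usr/bin/searchd", "-c", CONF, "--stop"])

# Throw away the experimental index folder and restore the saved one.
shutil.rmtree(DATA_DIR)
shutil.copytree(STASH_DIR, DATA_DIR)

# Start searchd again against the restored index (and restored config).
subprocess.check_call(["/usr/bin/searchd", "-c", CONF])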

Related

optimize (disk/memory) of dockerized postgres database containing test data

Context
For local development and testing within a CI pipeline, I want a postgres docker image that contains some data sampled from production (a few tens of MBs). I will periodically rebuild this image to ensure the sampled data stays fresh.
I don't care at all about data integrity, but I care quite a bit about image size and container disk/memory usage when run. Startup time should be at most a couple of mins.
What I've built
I have a Dockerfile that builds on top of one of the official postgres (postgis) docker images, but it actually initializes the database at build time and uses pg_restore to insert my sample data.
Attempted optimising
I use a multistage build, just copying the postgres directory into the final image (this helps as I used node during the build).
I notice that the pg_xlog directory is quite large, and logically seems redundant here since I would happily checkpoint and ditch any WAL before sealing the image. I can't figure out how to get rid of it. I tried starting postgres with the following flags:
--min_wal_size=2 --max_wal_size=3 --archive_mode=off --wal_keep_segments
and running Checkpoint and waiting for a few seconds, but it doesn't seem to change anything. I also tried deleting the contents of the directory, but that seemed to break the database on its next startup.
Rather than put the actual database within the image, I could just leave a pg_dump file in the image and have the image entrypoint build the database from that. I think this would improve the image size (though I'm not clear why the database should take up much more space than the dump, unless indexes are especially big - I actually thought the dump format was less compact than the database itself, so this might offset the index size). This would obviously impact on startup time (but not prohibitively so).
Summary/Questions
Am I going about this the right way? If so, what kind of disk/memory optimizations can I use? In particular can I remove/shrink pg_xlog?
I'm using Postgres 9.5 and Postgis 2.X.
Was the server ever run with a larger max_wal_size than 3? If so, it could have "recycled" ahead a lot of wal files by renaming old ones for future use. Once those are renamed, they will never be removed until after they are used, even if max_wal_size is later reduced.
I also tried deleting the contents of the directory, but that seemed to break the database on its next startup.
You can fix that by using pg_resetxlog. Just don't get in the habit of running that blindly; it is very dangerous to run outside of a test environment.
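For completeness, a minimal sketch of that fix on a throwaway test cluster, assuming a hypothetical data directory and that pg_ctl and pg_resetxlog are on the PATH (pg_resetxlog was renamed to pg_resetwal in PostgreSQL 10):

import subprocess

DATADIR = "/var/lib/postgresql/9.5/data"   # assumed: data directory of the test cluster

# Stop the server, throw away the WAL, and start it back up.
subprocess.check_call(["pg_ctl", "-D", DATADIR, "stop", "-m", "fast"])
subprocess.check_call(["pg_resetxlog", DATADIR])
subprocess.check_call(["pg_ctl", "-D", DATADIR, "start", "-w"])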

Sphinx: How to recover realtime index from backups?

I already know how to implement a backup of a realtime index by using FLUSH RTINDEX and compressing all the involved files (.ram, .kill, .meta files), like below:
tar zcvf /backups/myrtbackup.tar.gz /sphinxdata/myrtindex.{*.sp*,ram,kill,meta} /sphinxdata/binlog.*
But here's the question: if the system crashes, or we somehow delete all the data by mistake, how can we recover from that backup?
Following a crash, or if the server dies and you need to bring a new one online, etc., just restore those files to the data folder (while searchd is not running) and start searchd.
If searchd won't start, you might have some luck not restoring the binlog and restoring just the index files.
If you delete all the data, it's not really going to help you. You can't 'roll back' to a specific point in time.
In general, a Sphinx 'index' is designed to be created as an index over a real database somewhere else, not as an authoritative database in itself.
Sphinx's "backups" are just not robust enough for you to rely on them. Sphinx indexes should be disposable: if they get corrupted, just recreate them from the source data.
(The backups that you can do are just 'hacks' that may help you get back online quicker in case of disaster.)
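A small sketch of that restore step, assuming the tarball from the question, GNU tar's behaviour of stripping the leading '/' from member names, and that searchd has already been stopped:

import os
import tarfile

BACKUP = "/backups/myrtbackup.tar.gz"
RESTORE_BINLOG = True   # set to False if searchd refuses to start with the old binlog

with tarfile.open(BACKUP, "r:gz") as tar:
    members = tar.getmembers()
    if not RESTORE_BINLOG:
        members = [m for m in members
                   if not os.path.basename(m.name).startswith("binlog.")]
    # Member names were stored without the leading '/', so extracting at '/'
    # puts the files back under /sphinxdata/.
    tar.extractall(path="/", members=members)

# Then start searchd again, e.g.: /usr/bin/searchd -c /path/to/sphinx.conf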

postgresql: Accidentally deleted pg_filenode.map

Is there any way to recover or re-create the pg_filenode.map file that was accidentally deleted? Or is there any solution for how to fix this issue without affecting the database? Any suggestions to fix this issue are highly appreciated! The Postgres version we have is 9.0, running on Red Hat Linux 5. Thanks!
STOP TRYING TO FIX ANYTHING RIGHT NOW. Everything you do risks making it worse.
Treat this as critical database corruption. Read and act on this wiki article.
Only once you have followed its advice should you even consider attempting repair or recovery.
Since you may have some hope of recovering the deleted file if it hasn't been overwritten yet, you should also STOP THE ENTIRE SERVER MACHINE or unmount the file system PostgreSQL is on and disk image it.
If this data is important to you I advise you to contact professional support. This will cost you, but is probably your best chance of getting your data back after a severe administrator mistake like this. See PostgreSQL professional support. (Disclaimer: I work for one of the listed companies as shown in my SO profile).
It's possible you could reconstruct pg_filenode.map by hand using information about the table structure and contents extracted from the on-disk tables. Probably a big job, though.
First, if this is urgent and the data is valuable, I strongly recommend contacting professional support. However, if you can work on a disk image, if it is not time critical, etc., here are important points to note and how to proceed (we recently had to recover a bad pg_filenode.map). Moreover, you are better off working on a disk image of a disk image.
What follows is what I learned from having to recover a damaged file due to an incomplete write on the containing directory. It is current as of PostgreSQL 10, but that could change at any time.
Before you begin
Data recovery is risky business. Always note what recovery means to your organization, what data loss is tolerable, what downtime is tolerable, etc. before you begin. Work on a copy of a copy if you can. If anything doesn't seem right, circle back, evaluate what went wrong and make sure you understand why before proceeding.
What this file is and what it does
The standard file node map for PostgreSQL is stored in the pg_class relation, which is referenced by object id inside the Pg catalogs. Unfortunately you need a way to bootstrap the mappings of the system tables so you can look up this sort of information.
In most deployments this file will never be written. It can be copied from a new initdb on the same version of Postgres with the same options passed to initdb, aside from the data directory. However, this is not guaranteed.
Several things can change this mapping. If you do a VACUUM FULL or similar on the system catalogs, this can change the mapping from the default, and then copying in a fresh file from an initdb will not help.
Some Things to Try
The first thing to try (on a copy of a copy!) is to replace the file with one from a fresh initdb onto another filesystem from the same server (this could be a thumb drive or whatever). This may work. It may not work.
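A rough sketch of that first attempt, run against a copy of the damaged cluster; the paths are assumptions, and initdb must be the same PostgreSQL version (here 9.0) with the same relevant options as the original cluster:

import os
import shutil
import subprocess
import tempfile

DAMAGED_PGDATA = "/recovery/pgdata_copy"   # assumed: working copy of the broken data directory

# Build a throwaway cluster just to harvest a pristine pg_filenode.map.
scratch = tempfile.mkdtemp(prefix="fresh_initdb_")
subprocess.check_call(["initdb", "-D", scratch])

# pg_filenode.map lives in global/ and also in each database directory under
# base/; this only replaces the shared-catalog copy in global/.
shutil.copy2(os.path.join(scratch, "global", "pg_filenode.map"),
             os.path.join(DAMAGED_PGDATA, "global", "pg_filenode.map"))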
If that fails, then it would be possible perhaps to use pg_filedump and custom scripting/C programming to create a new file based on efforts to look through the data of each relation file in the data directory. This would be significant work as Craig notes above.
If you get it to work
Take a fresh pg_dump of your database and restore it into a fresh initdb. This way you know everything is consistent and complete.
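A minimal sketch of that final step, assuming a hypothetical database name and a freshly initdb'd cluster already listening on port 5433:

import subprocess

# Dump the recovered database in custom format...
subprocess.check_call(["pg_dump", "-Fc", "-f", "/recovery/mydb.dump", "mydb"])

# ...and restore it into the fresh cluster (-C recreates the database there).
subprocess.check_call(["pg_restore", "-p", "5433", "-C", "-d", "postgres",
                       "/recovery/mydb.dump"])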

Good pattern for building Lucene indexes on an application server and copying to multiple web servers

We are considering an architecture where an application rebuilds (often) a large number of Lucene indexes that we are using. The last task in the rebuild would be to use a file share or FTP to copy that rebuilt index OVER the last index.
I'm a bit concerned about what happens if an end user is searching against that index during the time that we are copying a new index in.
Anyone have any thoughts, experiences, better patterns to achieve this? I'm familiar with SOLR and that would be one way to go, not as familiar with Zoie from LinkedIn. I would prefer to avoid both at this stage and go with our homegrown, fairly simple, 'just rebuild it and copy it on top' approach.
One option is to store the indexes directly in something like AppFabric cache rather than on the file system. Another is to create your own implementation of Directory that wraps FSDirectory, monitors a separate staging directory, and, if it sees new indexes that are ready, blocks subsequent calls until it copies them over.
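This is not Lucene code, but a small Python sketch of the control flow behind that second option (a real implementation would be a custom Lucene Directory in your application's language); the paths and the 'ready' marker convention are assumptions:

import os
import shutil
import threading

LIVE_DIR = "/indexes/live"                          # assumed: directory searches read from
STAGING_DIR = "/indexes/staging"                    # assumed: rebuilder drops the new index here
READY_MARKER = os.path.join(STAGING_DIR, "READY")   # assumed: written last by the rebuilder

_swap_lock = threading.Lock()

def _maybe_swap():
    """If a complete new index has been staged, swap it into the live location."""
    if not os.path.exists(READY_MARKER):
        return
    with _swap_lock:
        if not os.path.exists(READY_MARKER):   # another thread already swapped it in
            return
        os.remove(READY_MARKER)
        old = LIVE_DIR + ".old"
        shutil.rmtree(old, ignore_errors=True)
        os.rename(LIVE_DIR, old)               # keep the previous index around
        os.rename(STAGING_DIR, LIVE_DIR)       # promote the staged index
        os.makedirs(STAGING_DIR, exist_ok=True)

def search(query, run_search):
    """Block the caller behind any pending swap, then run the real search."""
    _maybe_swap()
    return run_search(LIVE_DIR, query)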

Sharing a file on an overloaded machine

I have a computer that is running Windows XP that I am using to process a great deal of data, update monitors, and bank data. Generally it is pretty loaded with work.
One particular file that has real time data is useful to a number of users. We have two programs that need this file, one that displays the numerical data and one that plots the numerical data. Any user can run an instance of either program on their machine. These programs search for the real time data file which is updated every second. They are both written in Perl and I was asked not to change either program.
Because of the large load on the computer, I am currently running the program that does calculations and creates the real time data file on a separate computer. This program simply writes the real time file onto the overloaded computer. Because Windows doesn't have an atomic write, I created a method that writes to a different extension, deletes the old real time file, and then moves the new one to the correct name. Unfortunately, as the user load on the computer increases, the writes take longer (which isn't ideal but is live-able) but more annoyingly, the time between deleting the old real time file and moving the new file to the correct name increases a great deal, causing errors with the Perl programs. Both programs check to see if the file modify time has changed (neither check for file locks). If the file goes missing they get angry and output error messages.
I imagine a first course of action would be to move this whole process away from the overloaded computer. My other thought was to create a number of copies of the files on different machines and have different users read the file from different places (this would be a real hack though).
I am new to the world of networking and file sharing, but I know there is a better way to do this. Frankly this whole method is a little hacked, but that's how it was when I came here.
Lastly, it's worth mentioning that this same process runs on a UNIX machine and has none of these problems. For this reason I feel the blame falls on the lack of an atomic write. I have been searching the internet for any workaround to this problem and have tried a number of different methods (e.g. my current extension-switching method).
Can anyone point me in the right direction so I can solve this problem?
My code is written in Python.
os.rename() says:
os.rename(src, dst)
Rename the file or directory src to dst. If dst is a directory, OSError will be raised. On Unix, if dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). On Windows, if dst already exists, OSError will be raised even if it is a file; there may be no way to implement an atomic rename when dst names an existing file.
Given that on Windows you are forced to delete the old file before renaming the new one to it, and you are prohibited from modifying the reading scripts to tolerate the missing file for a configurable timeout (the correct solution) or from doing proper resource locking with the producer (another correct solution), your only workaround may be to play with the process scheduler to make the {delete, rename} operation appear atomic. Write a C program that does nothing but look for the new file, delete the old, and rename the new. Run that "pseudo-atomic rename" process at high priority and pray that it doesn't get task-switched between the delete and the rename.
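For reference, a minimal Python sketch of the swap step being discussed (the part the suggested high-priority helper would perform); the file names are made up:

import os

NEW = r"C:\data\realtime.tmp"    # assumed: the freshly written copy
LIVE = r"C:\data\realtime.dat"   # assumed: the name the Perl readers look for

# On Windows os.rename() refuses to overwrite an existing file, so the old
# one has to be deleted first; this is exactly the non-atomic window the
# question runs into.
if os.path.exists(LIVE):
    os.remove(LIVE)
os.rename(NEW, LIVE)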