I am working on a project that has a single Sphinx config file. Sphinx has four main parts (source, index, indexer, searchd). In my config file there is only one indexer and one searchd section, but there are multiple sources and indexes. I start the searchd service with this config file. When a new source and index are added to the config file, I notify searchd by running the indexer with the --rotate option so it picks up the new index. But now suppose I delete a source and index from the config file. At the moment I have to restart searchd to tell it that that particular index is no longer in use. Is there a direct way to do this without restarting searchd?
You may want to use
kill -HUP `cat /path/to/sphinx.pid`
to tell Sphinx to reload the current config.
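For reference, a minimal sketch of the searchd section this relies on (the paths and port here are placeholders; pid_file must match the path you cat):
searchd
{
    listen          = 9312
    log             = /var/log/sphinxsearch/searchd.log
    pid_file        = /path/to/sphinx.pid
    seamless_rotate = 1
}
With pid_file set, searchd records its process id there so the HUP signal (or indexer --rotate) can reach the running daemon, and seamless_rotate = 1 lets it swap rotated indexes in without blocking queries.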
I am using PostgreSQL on CentOS, and I changed the data directory to store PostgreSQL data on a different disk.
nano /usr/lib/systemd/system/postgresql.service
#Environment=PGDATA=/var/lib/pgsql/data
Environment=PGDATA=/data/pgsql/data
However, after installing a package update, the contents of the service file were changed back to the defaults.
Do I need to check this file after every package update, or is there a way to preserve my change?
There are two ways to deal with this:
the old way:
You create a file /etc/systemd/system/postgresql.service that contains
.include /usr/lib/systemd/system/postgresql.service
[Service]
Environment=PGDATA=/data/pgsql/data
the new way:
You create a directory /etc/systemd/system/postgresql.service.d that contains a file named (for example) pgdata.conf with the contents
[Service]
Environment=PGDATA=/data/pgsql/data
Then notify systemd with
systemctl daemon-reload
This configuration change will override the corresponding value from /usr/lib/systemd/system/postgresql.service, so the change will survive an upgrade.
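On reasonably recent systemd versions you can also let systemctl create the drop-in for you, which is the "new way" done automatically:
systemctl edit postgresql.service
This opens an editor on an override file under /etc/systemd/system/postgresql.service.d/; paste the [Service] and Environment=PGDATA=... lines there, save, and the override will survive package upgrades just the same.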
I'm using Scrapy to fetch data from websites, MongoDB for persistence, and Elasticsearch for searching.
My problem is that when Scrapy inserts data into MongoDB, Elasticsearch is not aware of it, even with the listener set for inserts, updates, and deletes.
Should I add a plugin for Scrapy to communicate directly with Elasticsearch? If so, why doesn't the listener pick up what happens in the database? Thanks!
Rivers in Elasticsearch are deprecated.
Instead, you can use Transporter to sync data between MongoDB and Elasticsearch.
How To Sync Transformed Data from MongoDB to Elasticsearch with Transporter
Installing Go
In order to install Compose's Transporter, we need to install the Go language.
sudo apt-get install golang
Create a folder for Go from your $HOME directory:
mkdir ~/go; echo "export GOPATH=$HOME/go" >> ~/.bashrc
Update your path:
echo "export PATH=$PATH:$HOME/go/bin:/usr/local/go/bin" >> ~/.bashrc
Now go to the $GOPATH directory and create the subdirectories src, pkg and bin. These directories constitute a workspace for Go.
cd $GOPATH
mkdir src pkg bin
Installing Transporter
Now create and move into a new directory for Transporter. Since the utility was developed by Compose, we'll call the directory compose.
mkdir -p $GOPATH/src/github.com/compose
cd $GOPATH/src/github.com/compose
This is where compose/transporter will be installed.
Clone the Transporter GitHub repository:
git clone https://github.com/compose/transporter.git
Move into the new directory:
cd transporter
Take ownership of the /usr/lib/go directory:
sudo chown -R $USER /usr/lib/go
Make sure build-essential is installed for GCC:
sudo apt-get install build-essential
Run the go get command to get all the dependencies:
go get -a ./cmd/...
This step might take a while, so be patient. Once it's done you can build Transporter.
go build -a ./cmd/...
If all goes well, it will complete without any errors or warnings. Check that Transporter is installed correctly by running this command:
transporter
So the installation is complete.
Create some sample data in MongoDB.
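For example (purely illustrative documents; the database foo and collection bar match the foo.bar namespace used in the config and application files below), from the mongo shell:
use foo
db.bar.insert({ firstName: "John", lastName: "Doe" })
db.bar.insert({ firstName: "Jane", lastName: "Roe" })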
Then we have to configure Transporter.
Transporter requires a config file (config.yaml), a transform file (myTransformation.js), and an application file (application.js) to migrate our data from MongoDB to Elasticsearch.
Move to the transporter directory:
cd ~/go/src/github.com/compose/transporter
Config File
You can take a look at the example config.yaml file if you like. We're going to back up the original and then replace it with our own contents.
mv test/config.yaml test/config.yaml.00
The new file is similar but updates some of the URIs and a few other settings to match what's on our server. Copy the contents from here and paste them into the new config.yaml file, using the nano editor again.
nano test/config.yaml
Copy the contents below into the file. Once done, save the file as described earlier.
# api:
#   interval: 60s
#   uri: "http://requestb.in/13gerls1"
#   key: "48593282-b38d-4bf5-af58-f7327271e73d"
#   pid: "something-static"
nodes:
  localmongo:
    type: mongo
    uri: mongodb://localhost/foo
    tail: true
  es:
    type: elasticsearch
    uri: http://localhost:9200/
  timeseries:
    type: influx
    uri: influxdb://root:root@localhost:8086/compose
  debug:
    type: file
    uri: stdout://
  foofile:
    type: file
    uri: file:///tmp/foo
Application File
Now, open the application.js file in the test directory.
nano test/application.js
Replace the sample contents of the file with the contents shown below:
Source({name:"localmongo", namespace:"foo.bar"})
  .transform({filename: "transformers/addFullName.js", namespace: "foo.bar"})
  .save({name:"es", namespace:"foo.bar"});
Transformation File
Let's say we want the documents being stored in Elasticsearch to have another field called fullName. For that, we need to create a new transform file, test/transformers/addFullName.js.
nano test/transformers/addFullName.js
Paste the contents below into the file. Save and exit as described earlier.
module.exports = function(doc) {
  console.log(JSON.stringify(doc)); // if you are curious, you can listen in on what is changed and being copied
  doc._id = doc.data._id['$oid'];
  doc["fullName"] = doc["firstName"] + " " + doc["lastName"];
  return doc;
}
The doc._id line is necessary to deal with the way Transporter handles MongoDB's ObjectId() field. The fullName line tells Transporter to concatenate MongoDB's firstName and lastName to form the fullName field in Elasticsearch.
This is a simple transformation for the example, but with a little JavaScript you can do more complex data manipulation as you prepare your data for searching.
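To make that concrete with made-up values (and ignoring the _id handling), a MongoDB document like
{ "firstName": "John", "lastName": "Doe" }
ends up in Elasticsearch as
{ "firstName": "John", "lastName": "Doe", "fullName": "John Doe" }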
Executing Transporter:
If you have a simple standalone instance of MongoDB, it won't be replicated, there will be no oplog, and Transporter won't be able to detect changes. To convert a standalone MongoDB into a single-node replica set, start the server with --replSet rs0 (rs0 is just a name for the set) and, once it is running, log in with the mongo shell and run rs.initiate() so the server configures itself.
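A rough sketch of that conversion (the dbpath is whatever your installation already uses, and rs0 is just an example set name):
mongod --replSet rs0 --dbpath /var/lib/mongodb
mongo --eval "rs.initiate()"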
Make sure you are in the transporter directory:
cd ~/go/src/github.com/compose/transporter
Execute the following command to sync the data:
transporter run --config ./test/config.yaml ./test/application.js
The synchronizing mechanism that you are looking for is called rivers in Elasticsearch.
In this specific case, you should sync the specific Mongodb collection that you are using to save your Scrapy data with the Elasticsearch index.
For details about how to proceed you should check out the following links:
setting up the mongodb river
mongodb river plugin
Also, I recommend checking out the answers under the Elasticsearch tag here on Stack Overflow; I have found detailed answers for most of the common implementation problems.
I have MongoDB installed on a Windows server. I take regular backups of the data/db folder using Rackspace backups.
I created a MongoDB replica set deployment with 3 Ubuntu servers using Rackspace deployments. Now I want to move the data from Windows to the empty replica set. How can I do it?
I tried copying the contents of data/db on Windows to /var/lib/mongodb on the primary replica set member. It didn't work.
For some reason, /var/lib/mongodb on the Ubuntu machines does not contain a data/db directory. When I create a new db, the db files are created directly in /var/lib/mongodb.
The difference in data directories is fine: on Windows the default dbpath is c:\data\db, while the Ubuntu package sets dbpath to /var/lib/mongodb instead.
Since you are starting with an empty replica set (and using a backup from a standalone server), the most straightforward approach would be to:
1. Stop all the mongod servers for the replica set (you definitely don't want to copy data files directly into a running instance!).
2. Remove any files that are already in the /var/lib/mongodb data directory.
3. Copy the data files from your standalone MongoDB backup into /var/lib/mongodb on one of your replica set servers. This server will become your primary for setting up the rest of the replica set.
4. Start up this primary, making sure to include a replSet name in your configuration file. You may already have this set from the "empty" replica set that you already created.
5. Run rs.initiate() in the mongo shell to create the initial configuration on the primary.
6. Start up your additional servers as members of this replica set: they need the same replSet name configured.
7. Use rs.add(..) to add your additional servers from the mongo shell on your primary, as sketched below. Assuming the add is successful (i.e. the mongods can connect to each other), this will begin the initial sync (copying data from the primary), and the new hosts will become secondaries after they have finished the initial sync.
These are essentially the same steps as the deploy-a-replica-set tutorial, except that you copy over your data first.
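A sketch of the rs.initiate()/rs.add() steps from the mongo shell on the primary (hostnames and port are placeholders):
mongo --host primary.example.com
rs.initiate()
rs.add("secondary1.example.com:27017")
rs.add("secondary2.example.com:27017")
Afterwards, rs.status() shows the members and their sync state.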
The problem could be related to MongoDB's configuration file.
Locate the file mongodb.conf and edit the dbpath parameter. Check whether the path really exists, and if it doesn't, create the missing directories. Also check the permissions on that path.
Anyway, I don't know whether just copying the data files to a new location is the right way; I guess you should use mongo export/import instead, as sketched below.
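A minimal sketch of the export/import route (database, collection, host, and file names are placeholders):
mongoexport --db mydb --collection mycollection --out mycollection.json
mongoimport --host primary.example.com --db mydb --collection mycollection --file mycollection.json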
What is the best method to transfer a Sphinx real-time (RT) index from one machine to another? If it were a disk index, I could just move the database and re-index it, but the index is RT. Thanks in advance!
1. Stop searchd gracefully on the source machine (i.e. searchd --stopwait, rather than forcibly killing it, crashing it, etc.).
2. Copy /var/folder/indexname* to the destination machine (using the path prefix as noted in the index definition).
3. Copy the index definition to the destination.
4. Start up searchd on the destination.
This is most likely to work successfully if both machines have the same version of Sphinx installed.
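For reference, a sketch of an RT index definition showing the path prefix referred to in step 2 (the index name, path, and fields here are placeholders):
index rt_example
{
    type         = rt
    path         = /var/folder/indexname
    rt_field     = title
    rt_field     = content
    rt_attr_uint = author_id
}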
If I update a source in my sphinx.conf file, I can reindex with --rotate and everything works fine. If I update an index in my sphinx.conf, or add a new index, --rotate has no effect and I have to restart searchd.
Am I doing this correctly? I feel like --rotate should correctly pick up new or modified index configurations.
It depends on your Sphinx version. In the latest versions, just about any change to the config file (except perhaps the searchd section) will be picked up.
If you just change the settings of an individual index, a --rotate indexing of that particular index is enough. If you change the settings of a particular index and don't actually reindex it, searchd probably won't pick up the changes (because it reads them from the index header, not directly from the conf file).
I just tested adding an index and removing an index; both happened with a seamless rotate.
Sphinx 2.0.1-beta (r2792)
Prior to 0.9.9-rc1, a restart was required for most config file changes.
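For reference, a rotate-style reindex of a single disk index looks like this (the config path and index name are placeholders):
indexer --config /etc/sphinxsearch/sphinx.conf myindex --rotate
or, to rebuild everything defined in the config:
indexer --config /etc/sphinxsearch/sphinx.conf --all --rotate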
You have to restart searchd when modifying the sphinx.conf file.
Rotate will not affect new index additions to your sphinx.conf file; it reindexes a copy of an existing index and then swaps it in, kind of like having a file and a file-copy(1) and swapping them over.
If you modify the .conf file, it's sort of like declaring a brand new index.
Thus --rotate does not work if that exact index did not previously exist.
See: http://sphinxsearch.com/docs/2.0.1/ref-indexer.html