Sphinx search complains about missing .sph file for RT index

I added a real-time (RT) index to my sphinx.conf file. I can see it when I connect to Sphinx with mysql -h 127.0.0.1 -P 3312, but when I try to do a command-line search, Sphinx appends this to the end of the response:
index 'drupal_rt': search error: failed to open /var/lib/sphinx/rt_drupal_nodes.sph: No such file or directory.

The ./search tool doesn't work with real-time indexes.
Use the mysql client instead.
If you need command-line search, use the test.php script, which you can find in the api directory of the Sphinx source repository.
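As a rough sketch of the mysql-client route, the Python wrapper below builds and runs one SphinxQL search through the mysql command-line client. The host and port mirror the question's mysql -h 127.0.0.1 -P 3312, and the index name drupal_rt comes from the error message. The helper names are hypothetical. The sketch only escapes quotes and backslashes inside the string literal and does not escape full-text query operators.

```python
import shlex
import subprocess

# Assumption: host and port mirror the question's `mysql -h 127.0.0.1 -P 3312`.
SPHINX_HOST = "127.0.0.1"
SPHINX_PORT = 3312


def escape_sphinxql_string(text: str) -> str:
    """Escape backslashes and single quotes for a quoted SphinxQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def build_search_command(index: str, query: str) -> list:
    """Build a mysql client argv that runs one SphinxQL full-text search."""
    sql = f"SELECT * FROM {index} WHERE MATCH('{escape_sphinxql_string(query)}')"
    return ["mysql", "-h", SPHINX_HOST, "-P", str(SPHINX_PORT), "-e", sql]


def search(index: str, query: str) -> str:
    """Run the search through the mysql client and return its text output."""
    result = subprocess.run(build_search_command(index, query),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Preview the shell command without running it.
    print(shlex.join(build_search_command("drupal_rt", "hello world")))
```

This approach uses the mysql client as the transport, so it works with RT indexes the same way the interactive session in the question does.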

Related

Can't find PostgreSQL in Library (macOS 10.15.1) yet SHOW hba_file; points to library

I'm attempting to link Databox to PostgreSQL and following their directions here: https://databox.com/connect-postgresql-data-easy-steps
However at this step:
Add created user access from Databox IP to our newly created database. Add the following to pg_hba.conf and reload the database
I'm having trouble. I looked up the file's location with SHOW hba_file; so that I can edit it, and it gives me the following path:
/Library/PostgreSQL/12/data/pg_hba.conf
Now here's the part I'm having the most trouble with. I open Terminal, run cd Library and then ls -a, and there is no PostgreSQL listed. Trying to tab-complete with Po returns no results, and if I try cd PostgreSQL I get the following:
cd: no such file or directory: PostgreSQL
It just isn't there. I have hidden files showing and it's just...not there. But SHOW hba_file; continues to point to the same path. What's going on?

Can't restore OpenLDAP to new server

I am trying to back up the config and database files from an existing LDAP server and move them to a different, freshly installed server (Ubuntu 18.04). I followed the steps given here: https://tylersguides.com/articles/backup-restore-openldap/ to use slapcat to create both config and data LDIF files.
When I run slapadd on the new server,
slapadd -n 0 -F /etc/openldap/slapd.d -l /backups/config.ldif works fine, but
slapadd -n 1 -F /etc/openldap/slapd.d -l /backups/data.ldif gives the following error:
Database number selected via -n is out of range
Must be in the range 0 to 0 (the number of configured databases)
All the sites I have been able to find regarding this migration process follow steps similar to the ones above, but none of them mention anything about preconfiguring the number of databases or anything like that. I'm not sure how to proceed from here.
I'm almost sure you made a mistake during the dump and that the second file has the same content as the config file.
Try removing -F /etc/openldap/slapd.d from the second slapadd.
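To test the theory that the dump went wrong, here is a hypothetical Python check (not part of slapcat or slapadd) that compares the two exports. It flags a data.ldif that contains cn=config entries or is identical to config.ldif. It also reports when config.ldif defines no olcDatabase={1} entry for slapadd -n 1 to load into. It assumes plain, unfolded DN lines (no base64-encoded or wrapped DNs).

```python
import re

# Matches DN lines such as "dn: olcDatabase={1}mdb,cn=config".
DB_DN = re.compile(r"^dn:\s*olcDatabase=\{(-?\d+)\}", re.IGNORECASE)


def dn_lines(ldif_text: str) -> list:
    """Return the DN lines of an LDIF export (assumes unfolded lines)."""
    return [line for line in ldif_text.splitlines() if line.lower().startswith("dn:")]


def configured_database_numbers(config_ldif: str) -> list:
    """List the olcDatabase={N} numbers defined in a `slapcat -n 0` config export."""
    numbers = []
    for line in dn_lines(config_ldif):
        match = DB_DN.match(line)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def looks_like_config_dump(ldif_text: str) -> bool:
    """True if the export contains cn=config entries, i.e. it is a config dump."""
    return any(line.lower().rstrip().endswith("cn=config") for line in dn_lines(ldif_text))


def diagnose(config_ldif: str, data_ldif: str) -> list:
    """Return human-readable problems found in the pair of exports."""
    problems = []
    if config_ldif == data_ldif or looks_like_config_dump(data_ldif):
        problems.append("data.ldif looks like a copy of the config dump")
    if 1 not in configured_database_numbers(config_ldif):
        problems.append("config.ldif defines no olcDatabase={1}, so slapadd -n 1 has nothing to load into")
    return problems


if __name__ == "__main__":
    with open("/backups/config.ldif") as c, open("/backups/data.ldif") as d:
        for problem in diagnose(c.read(), d.read()):
            print(problem)
```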

"mount" a PostgreSQL database from files not Backup

I've been given a project to extract data from a PostgreSQL database. I've no previous experience with PostgreSQL, but the project I have is to bug fix existing code, so all the logic to connect to the engine and get data is already in place.
The problem I have is that the database has been given to me in the form of the folders and files straight from the source HDD, not a backup (and a backup isn't going to happen, so "get the customer to give you a backup instead" isn't an option here).
The folders also contained the actual PostgreSQL binaries, so I looked at the version (9.4.14), downloaded the nearest one (9.4.18) from the PostgreSQL site, and installed it. Now all I have to do is somehow get it to use the data files I was given.
I tried the obvious approach of copying the contents of the given data folder into the installed data folder, but after that the PostgreSQL service won't start.
I did find an option in the conf file:
#data_directory = 'ConfigDir'
I changed this to:
data_directory = 'C:\customer\data'
But again the service won't start after this.
The data directory used by the service is defined through the service command line, which overrides any setting in postgresql.conf.
You need to re-create the service in order to change the data directory, e.g.:
Remove the service:
pg_ctl unregister -N postgresql-9.1
postgresql-9.1 is the "real" name of the service, not the "Display Name". You can see that in the properties of the service inside the "services" app.
Then re-create the service with the correct data directory:
pg_ctl register -N postgresql-9.1 -D c:\customer\data
Another way to debug startup errors on Windows is to start Postgres from the command line (not through the service), because some startup errors are not written to the Postgres log file but are displayed on the command line. You can do that with, e.g.:
pg_ctl start -D c:\customer\data
If the bin directory is not in your PATH you need to specify the full path to it on the command line, e.g.: c:\Postgres9.1\bin\pg_ctl
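A minimal Python sketch of the console-start route, with a sanity check first. Every PostgreSQL data directory holds a PG_VERSION file with its major version. The copied 9.4.14 data should run on 9.4.18 binaries, because minor releases keep the same on-disk format. The install path and helper names are hypothetical, so adjust them to the actual installation.

```python
import subprocess
from pathlib import Path, PureWindowsPath

# Assumption: the installed binaries are 9.4.18, so their major version is "9.4".
INSTALLED_MAJOR = "9.4"


def read_data_dir_version(data_dir: str) -> str:
    """Every PostgreSQL data directory has a PG_VERSION file holding its major version."""
    return (Path(data_dir) / "PG_VERSION").read_text().strip()


def is_compatible(data_major: str, installed_major: str = INSTALLED_MAJOR) -> bool:
    """Minor releases share an on-disk format, so only the major versions must match."""
    return data_major.strip() == installed_major


def pg_ctl_start_command(bin_dir: str, data_dir: str) -> list:
    """argv for `pg_ctl start -D <data_dir>`, run from a console instead of the service."""
    return [str(PureWindowsPath(bin_dir) / "pg_ctl"), "start", "-D", data_dir]


def start_from_console(bin_dir: str, data_dir: str) -> int:
    """Check PG_VERSION, then start the server so startup errors print to this console."""
    found = read_data_dir_version(data_dir)
    if not is_compatible(found):
        raise RuntimeError(f"data directory is {found}, binaries are {INSTALLED_MAJOR}")
    return subprocess.run(pg_ctl_start_command(bin_dir, data_dir)).returncode


if __name__ == "__main__":
    # Hypothetical install path; adjust to wherever 9.4.18 was installed.
    print(pg_ctl_start_command(r"c:\Program Files\PostgreSQL\9.4\bin", r"c:\customer\data"))
```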

Use the PostgreSQL COPY command on OpenShift $OPENSHIFT_DATA_DIR from within a Node.js program

We are developing an app on OpenShift.
We recently upgraded it and made it scalable, which moved PostgreSQL to a separate gear from the Node.js one.
In the app, a user can choose a CSV file and upload it to the server ($OPENSHIFT_DATA_DIR).
We then execute from within Node.js:
copy uploaded_data FROM '/var/lib/openshift/our_app_id/app-root/data/uploads/table.csv' WITH CSV HEADER
Since the upgrade, the above COPY command is broken; we get this error:
[error: could not open file "/var/lib/openshift/our_app_id/app-root/data/uploads/table.csv" for reading: No such file or directory]
I suppose this is because PostgreSQL is now on a separate gear and cannot access $OPENSHIFT_DATA_DIR.
Can I make this folder visible to PostgreSQL (even though it is on a separate gear)?
Is there any other folder that can be visible to both the DB and the app (each on its own gear)?
Can you suggest alternative ways to achieve similar functionality?
There is currently no shared disk space between gears within the same scaled application on OpenShift Online. If you want to store a file and access it on multiple gears, the best way would probably be to store it on Amazon S3 or some other shared file storage service that is accessible by all of your gears, or, as you have stated, store the data in the database and access it wherever you need it.
You can do this by using \COPY with psql, e.g.:
First put your SQL commands in a file (file.sql), then run:
psql -h yourremotehost -d yourdatabase -p thedbport -U username -w -f file.sql
The -w option suppresses the password prompt. If you need a password, you can't supply it on the command line; instead, set the environment variable PGPASSWORD to your password. (Using PGPASSWORD is discouraged for security reasons, but it still works.)
You can set it with rhc:
rhc set-env PGPASSWORD=yourpassword -a yourapp
Here is a sample file.sql:
CREATE TABLE junk(id integer, values float, name varchar(100));
\COPY junk FROM 'table.csv' WITH CSV HEADER
Notice there is NO semicolon at the end of the second line.
If you're running this command from a script in your application, the file that contains your data and file.sql must be in your application's data directory, i.e. app-root/data.
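Here is a rough Python sketch of the same flow run from the app gear (the Node.js app could shell out the same way). It writes file.sql with a client-side \COPY, then runs psql with PGPASSWORD set only in the child environment. The table name uploaded_data and the path uploads/table.csv come from the question. The helper names, the fallback directory and the connection parameters are assumptions.

```python
import os
import subprocess
from pathlib import Path

# Assumption: uploads live under $OPENSHIFT_DATA_DIR; the fallback is hypothetical.
DATA_DIR = Path(os.environ.get("OPENSHIFT_DATA_DIR", "app-root/data"))


def copy_script_text(table: str, csv_name: str) -> str:
    """Client-side \\COPY: psql reads the CSV on this gear and streams it to the DB gear."""
    # No trailing semicolon: psql backslash commands end at the newline.
    return f"\\COPY {table} FROM '{csv_name}' WITH CSV HEADER\n"


def psql_command(host: str, db: str, port: int, user: str, sql_path: Path) -> list:
    """Same flags as the answer: -w never prompts, -f runs the script file."""
    return ["psql", "-h", host, "-d", db, "-p", str(port), "-U", user,
            "-w", "-f", str(sql_path)]


def run_copy(host: str, db: str, port: int, user: str, password: str,
             table: str, csv_name: str) -> None:
    sql_path = (DATA_DIR / "file.sql").resolve()
    sql_path.write_text(copy_script_text(table, csv_name))
    # PGPASSWORD goes only into the child's environment, never onto the command line.
    env = dict(os.environ, PGPASSWORD=password)
    # cwd=DATA_DIR so a relative CSV path in \COPY resolves inside the data directory.
    subprocess.run(psql_command(host, db, port, user, sql_path),
                   env=env, cwd=DATA_DIR, check=True)
```

A call such as run_copy("yourremotehost", "yourdatabase", 5432, "username", "yourpassword", "uploaded_data", "uploads/table.csv") would load the uploaded file. Because \COPY reads the file on the client side, the database gear never needs to see $OPENSHIFT_DATA_DIR.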

Scrapy MongoDB and Elasticsearch Synchronization

I'm using Scrapy to fetch data from websites, MongoDB for persistence, and Elasticsearch for searching.
My problem is that when Scrapy inserts data into MongoDB, Elasticsearch is not aware of it, even with the listener set to inserts, updates, and deletes.
Should I add a new plugin for Scrapy to communicate directly with Elasticsearch? If so, why doesn't the listener pick up what happens in the database? Thanks!
Rivers in Elasticsearch are deprecated.
Instead, you can use Transporter to sync data between MongoDB and Elasticsearch.
How To Sync Transformed Data from MongoDB to Elasticsearch with Transporter
Installing Go
To install Compose's Transporter, we first need to install the Go language:
sudo apt-get install golang
Create a folder for Go from your $HOME directory:
mkdir ~/go; echo "export GOPATH=$HOME/go" >> ~/.bashrc
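Update your path (the single quotes keep $PATH from being expanded until .bashrc runs), then reload .bashrc so both variables apply to the current shell:
echo 'export PATH=$PATH:$HOME/go/bin:/usr/local/go/bin' >> ~/.bashrc
source ~/.bashrc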
Now go to the $GOPATH directory and create the subdirectories src, pkg and bin. These directories constitute a workspace for Go.
cd $GOPATH
mkdir src pkg bin
Installing Transporter
Now create and move into a new directory for Transporter. Since the utility was developed by Compose, we'll call the directory compose.
mkdir -p $GOPATH/src/github.com/compose
cd $GOPATH/src/github.com/compose
This is where compose/transporter will be installed.
Clone the Transporter GitHub repository:
git clone https://github.com/compose/transporter.git
Move into the new directory:
cd transporter
Take ownership of the /usr/lib/go directory:
sudo chown -R $USER /usr/lib/go
Make sure build-essential is installed for GCC:
sudo apt-get install build-essential
Run the go get command to get all the dependencies:
go get -a ./cmd/...
This step might take a while, so be patient. Once it's done you can build Transporter.
go build -a ./cmd/...
If all goes well, it will complete without any errors or warnings. Check that Transporter is installed correctly by running this command:
transporter
The installation is now complete.
Create some sample data in MongoDB.
Then we have to configure Transporter.
Transporter requires a config file (config.yaml), a transform file (myTransformation.js), and an application file (application.js) to migrate our data from MongoDB to Elasticsearch.
Move to the transporter directory:
cd ~/go/src/github.com/compose/transporter
Config File
You can take a look at the example config.yaml file if you like. We're going to back up the original and then replace it with our own contents.
mv test/config.yaml test/config.yaml.00
The new file is similar but updates some of the URIs and a few of the other settings to match what's on our server. Open a new config.yaml file in the nano editor:
nano test/config.yaml
Copy the contents below into the file, then save it.
# api:
#   interval: 60s
#   uri: "http://requestb.in/13gerls1"
#   key: "48593282-b38d-4bf5-af58-f7327271e73d"
#   pid: "something-static"
nodes:
  localmongo:
    type: mongo
    uri: mongodb://localhost/foo
    tail: true
  es:
    type: elasticsearch
    uri: http://localhost:9200/
  timeseries:
    type: influx
    uri: influxdb://root:root@localhost:8086/compose
  debug:
    type: file
    uri: stdout://
  foofile:
    type: file
    uri: file:///tmp/foo
Application File
Now, open the application.js file in the test directory.
nano test/application.js
Replace the sample contents of the file with the contents shown below:
Source({name:"localmongo", namespace:"foo.bar"})
.transform({filename: "transformers/addFullName.js", namespace: "foo.bar"})
.save({name:"es", namespace:"foo.bar"});
Transformation File
Let's say we want the documents being stored in Elasticsearch to have another field called fullName. For that, we need to create a new transform file, test/transformers/addFullName.js.
nano test/transformers/addFullName.js
Paste the contents below into the file, then save and exit.
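module.exports = function(doc) {
  // If you are curious, you can log what's changed and being copied.
  console.log(JSON.stringify(doc));
  // Flatten MongoDB's ObjectId() wrapper to its string value.
  doc._id = doc.data._id['$oid'];
  // The document fields live under doc.data (as the _id line implies), so read and write there.
  doc.data.fullName = doc.data.firstName + " " + doc.data.lastName;
  return doc;
}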
The _id line is necessary to handle the way Transporter represents MongoDB's ObjectId() field. The fullName line tells Transporter to concatenate the MongoDB firstName and lastName fields into a new Elasticsearch field, fullName.
This is a simple transformation for the example, but with a little JavaScript you can do more complex data manipulation as you prepare your data for searching.
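To check the transform's output offline, here is a Python rendering of the same logic. It assumes, as the _id line implies, that Transporter passes the MongoDB document under doc.data, with ObjectId() serialized as {"$oid": ...}. The function name is hypothetical, and this is not Transporter's API.

```python
import json

# Assumption (following the _id line of addFullName.js): the MongoDB document
# arrives under doc["data"], with ObjectId() serialized as {"$oid": "..."}.


def add_full_name(doc: dict) -> dict:
    """Python mirror of addFullName.js, for checking the transform offline."""
    data = doc["data"]
    # Flatten the ObjectId wrapper to its hex string.
    doc["_id"] = data["_id"]["$oid"]
    # Concatenate firstName and lastName into the new fullName field.
    data["fullName"] = f'{data["firstName"]} {data["lastName"]}'
    return doc


if __name__ == "__main__":
    sample = {"data": {"_id": {"$oid": "5f1d"}, "firstName": "Ada", "lastName": "Lovelace"}}
    print(json.dumps(add_full_name(sample)))
```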
Executing Transporter:
If you have a simple standalone MongoDB instance, it isn't replicated, so there is no oplog and Transporter won't be able to detect changes. To convert a standalone MongoDB to a single-node replica set, start the server with --replSet rs0 (rs0 is just a name for the set). Once it is running, log in with the mongo shell and run rs.initiate() to have the server configure itself.
Make sure you are in the transporter directory:
cd ~/go/src/github.com/compose/transporter
Execute the following command to sync the data:
transporter run --config ./test/config.yaml ./test/application.js
The synchronization mechanism you are looking for is called a river in Elasticsearch.
In this specific case, you should sync the MongoDB collection you use to store your Scrapy data with the Elasticsearch index.
For details about how to proceed you should check out the following links:
setting up the mongodb river
mongodb river plugin
I also recommend checking out the answers under the Elasticsearch tag here on Stack Overflow; I have found detailed answers there for most common implementation problems.