Issues with running two instances of searchd - sphinx

I have just updated our Sphinx server from 1.10-beta to 2.0.6-release, and now I have run into some issues with searchd. Previously we were able to run two instances of searchd side by side by specifying two different config files, i.e.:
searchd --config /etc/sphinx/sphinx.conf
searchd --config /etc/sphinx/sphinx.staging.conf
sphinx.conf listens on 9306:mysql41 and 9312, while sphinx.staging.conf listens on 9307:mysql41 and 9313.
After we updated to 2.0.6, however, the second instance never starts. Or rather, the output makes it look like it starts, and a pid file is created, but only the first searchd instance keeps running; the second seems to shut down right away. Running searchd --config /etc/sphinx/sphinx.conf a second time (if that was the first one started) complains that the pid file is in use, whereas running searchd --config /etc/sphinx/sphinx.staging.conf (if that was the second one started) "starts" the daemon again and again, yet no new process is created.
Note that if I swap the order of these commands, then sphinx.conf is the instance that does not really start.
I have checked, and rechecked, that these ports are only used by searchd.
Does anyone have any idea what I can do or try next? I installed it from source on Ubuntu 10.04 LTS with:
./configure --prefix /etc/sphinx --with-mysql --enable-id64 --with-libstemmer
make -j4 install

Note to self: Check the logs!
RT-indices use binary logs to enable crash recovery. Since my old config files did not specify a path for where these should be stored, both instances of searchd tried to write to the same binary logs. The instance started last was of course not permitted to manipulate these files, and thus exited with a fatal error:
[Fri Nov 2 17:13:32.262 2012] [ 5346] FATAL: failed to lock '/etc/sphinx/var/data/binlog.lock': 11 'Resource temporarily unavailable'
[Fri Nov 2 17:13:32.264 2012] [ 5345] Child process 5346 has been finished, exit code 1. Watchdog finishes also. Good bye!
The solution was simple: specify a separate binlog_path inside the searchd section of each configuration file:
searchd
{
[...]
binlog_path = /path/to/writable/directory
[...]
}
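To catch this before starting the second instance, something like the following Python sketch could compare the two config files. It is only an illustration: the helper names binlog_path and binlog_paths_clash are made up here, the parser looks for the key anywhere in the file rather than only inside the searchd section, and it assumes that an empty binlog_path value disables binary logging.

```python
def binlog_path(config_text):
    """Return the binlog_path value from a Sphinx config, or None if absent.

    Simplification: the key is matched anywhere in the file, not only
    inside the searchd { ... } section.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "binlog_path":
            return value.strip()
    return None


def binlog_paths_clash(config_a, config_b):
    """True when two searchd instances would share one binlog directory.

    Two configs that both omit binlog_path fall back to the same
    built-in default, so that case counts as a clash as well.
    """
    path_a, path_b = binlog_path(config_a), binlog_path(config_b)
    if path_a == "" or path_b == "":
        return False  # assumption: an empty value turns binary logging off
    return path_a == path_b
```

Called with the contents of sphinx.conf and sphinx.staging.conf, a True result would point at the situation described above. The comparison is a plain string comparison, so two spellings of the same directory are not detected.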

Backup Daemon error - ops manager 3.4

I have installed Ops Manager and set up the configuration for backup. When it tries to sync the MongoDB deployment, it fails with an error because it cannot find mongod in /opt/mongodb/mms/mongodb-releases.
Here is the error thrown by the backup daemon (backup.jobs.590664394c9f732dd6c88b7c.tax):
Failed to start mongod
com.xgen.svc.brs.util.GenericMongoManager$MongoManagerConfigException: Could not find mongod. Found /opt/mongodb/mms/mongodb-releases/mongodb-linux-x86_64-rhel70-3.2.8/bin, but did not find /opt/mongodb/mms/mongodb-releases/mongodb-linux-x86_64-rhel70-3.2.8/bin/mongod.
com.xgen.svc.brs.util.GenericMongoManager$Purpose.<init>(GenericMongoManager.java:132)
com.xgen.svc.brs.util.MongoManager$MongoDPurpose.<init>(MongoManager.java:331)
com.xgen.svc.brs.util.MongoManager$HeadPurpose.<init>(MongoManager.java:477)
com.xgen.svc.brs.job.ReplicaSetJob.startMongo(ReplicaSetJob.java:103)
com.xgen.svc.brs.job.ReplicaSetJob.startMongo(ReplicaSetJob.java:80)
com.xgen.svc.brs.job.IncrementalSyncJob.doWork(IncrementalSyncJob.java:82)
Can you please show how it can be resolved?
Restart the backup daemon (only the backup daemon).
It should normally download the MongoDB releases right at startup, and log a line like this one:
2017-05-05T00:00:47.558+0000 [MongoDbReleaseAutoDownload Thread] INFO com.xgen.svc.brs.autoDownloader.MongoDbReleaseAutoDownloadThread [MongoDbReleaseAutoDownloadThread.java.runInternal:105] - MongoDbReleaseAutoDownload run completed.
If it does not work, look at the daemon log (by default /opt/mongodb/mms/logs/daemon.log).
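To see which releases were only partially downloaded, a small Python sketch like the one below could scan the releases directory from the error message. The function name missing_mongod is hypothetical, and the layout (one directory per release, each with bin/mongod) is inferred from the paths in the stack trace, not from Ops Manager documentation.

```python
import os


def missing_mongod(releases_dir):
    """Return the release directories that lack an executable bin/mongod."""
    missing = []
    for name in sorted(os.listdir(releases_dir)):
        release = os.path.join(releases_dir, name)
        if not os.path.isdir(release):
            continue  # skip archives and other stray files
        mongod = os.path.join(release, "bin", "mongod")
        if not (os.path.isfile(mongod) and os.access(mongod, os.X_OK)):
            missing.append(name)
    return missing
```

Running missing_mongod("/opt/mongodb/mms/mongodb-releases") as the user the backup daemon runs as would list the releases that still need to be downloaded or extracted again.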

Hawq init failed -- "postgres" is needed by initdb

After building incubator-hawq on CentOS 7.1, I tried to initialize it, but the error below occurs:
20160516:18:10:43:002036 hawqinit.sh:host-172-16-0-105:hawqadmin-[INFO]:-Loading hawq_toolkit...
ALTER ROLE
20160516:18:10:44:001766 hawq_init:host-172-16-0-105:hawqadmin-[INFO]:-20160516:18:10:43:002036 hawqinit.sh:host-172-16-0-105:hawqadmin-[INFO]:-Loading hawq_toolkit...
20160516:18:10:44:001766 hawq_init:host-172-16-0-105:hawqadmin-[INFO]:-Master init successfully
20160516:18:10:44:001766 hawq_init:host-172-16-0-105:hawqadmin-[INFO]:-Init segments in list: ['hawq-master']
20160516:18:10:44:001766 hawq_init:host-172-16-0-105:hawqadmin-[DEBUG]:-Start to init segment on node 'hawq-master'
20160516:18:10:44:001766 hawq_init:host-172-16-0-105:hawqadmin-[INFO]:-Total segment number is: 1
fgets failure: Success
The program "postgres" is needed by initdb but was either not found in the same directory as "/usr/hawq/bin/initdb" or failed unexpectedly.
Check your installation; "postgres -V" may have more information.
20160516:18:10:45:002318 hawqinit.sh:host-172-16-0-105:hawqadmin-[ERROR]:-Postgres initdb failed
20160516:18:10:45:002318 hawqinit.sh:host-172-16-0-105:hawqadmin-[ERROR]:-Segment init failed on host-172-16-0-105
20160516:18:10:45:001766 hawq_init:host-172-16-0-105:hawqadmin-[INFO]:-20160516:18:10:45:002318 hawqinit.sh:host-172-16-0-105:hawqadmin-[ERROR]:-Postgres initdb failed
20160516:18:10:45:002318 hawqinit.sh:host-172-16-0-105:hawqadmin-[ERROR]:-Segment init failed on host-172-16-0-105
20160516:18:10:45:001766 hawq_init:host-172-16-0-105:hawqadmin-[ERROR]:-HAWQ init failed on hawq-master
20160516:18:10:46:001766 hawq_init:host-172-16-0-105:hawqadmin-[INFO]:-0 of 1 segments init successfully
20160516:18:10:46:001766 hawq_init:host-172-16-0-105:hawqadmin-[ERROR]:-Segments init failed, exit
When I run the command it suggests, I get:
[hawqadmin@host-172-16-0-105 hawqAdminLogs]$ postgres -V
postgres (HAWQ) 8.2.15
Any advice? Thanks!
If "postgres -V" works, the postgres binary itself is fine.
Before you run "hawq init cluster", please make sure that:
1) $GPHOME in greenplum_path.sh is set to the HAWQ installation directory, i.e., /usr/hawq in your case
2) you have run source $GPHOME/greenplum_path.sh
3) the initdb and postgres binaries are both in $GPHOME/bin
From the error you pasted above, there are two possible causes:
(1) The postgres binary being called is not /usr/hawq/bin/postgres. You can use which postgres to check the path.
(2) The shared libraries that postgres depends on may be wrong. You can use ldd on Linux or otool on Mac to print all dependent library paths and check them.
Moreover, if any error occurs while initializing HAWQ, check the logs in ~/hawqAdminLogs/, where you may find the specific error message.
I hope this helps you find the root cause.
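The first few checks can be scripted. Here is a rough Python 3 sketch, with a made-up function name (check_hawq_binaries), that verifies initdb and postgres sit in $GPHOME/bin and that PATH resolves postgres to that same directory. It does not cover the shared-library check; ldd or otool is still needed for that.

```python
import os
import shutil


def check_hawq_binaries(gphome, path=None):
    """Return a list of problems; an empty list means the basic checks pass.

    gphome is the HAWQ install directory; path is an optional PATH string
    (defaults to the PATH of the current environment).
    """
    problems = []
    bin_dir = os.path.join(gphome, "bin")
    for tool in ("initdb", "postgres"):
        if not os.path.isfile(os.path.join(bin_dir, tool)):
            problems.append("%s is missing from %s" % (tool, bin_dir))
    resolved = shutil.which("postgres", path=path)
    if resolved is None:
        problems.append("postgres is not on PATH")
    elif os.path.realpath(os.path.dirname(resolved)) != os.path.realpath(bin_dir):
        problems.append("PATH resolves postgres to %s" % resolved)
    return problems
```

For the installation in the question this would be called as check_hawq_binaries("/usr/hawq") after sourcing greenplum_path.sh.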
I recently faced the same error while initializing a cluster.
postgres -V showed the correct version, which postgres showed /usr/local/hawq/bin/postgres, and the path was already set, yet I still got the error above.
I finally resolved it by setting LD_LIBRARY_PATH to /usr/local/hawq/lib/ in my .bashrc file and sourcing it.
It looks like you might have installed the HAWQ binaries in a different directory. Please check the following:
1. Make sure your PATH is set correctly.
2. Check that the HAWQ initdb binary is in the /usr/hawq/bin/ directory.
3. Make sure you have successfully compiled and installed HAWQ.
4. Check that postgres is in the same directory as initdb.
5. If there is more than one postgres on your machine, make sure the directory of the postgres that sits next to initdb is in your PATH.

NFS mount points are going off/NFS compound failed for server mashost

We have an application on Solaris. During a specific test case we generate a heap dump, which is written to the server at a specific path. When this happens, we get the following error in the trace file:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /ossrc/upgrade/JREheapdumps/java_pid16092.hprof ...
Dump file is incomplete: I/O error
and in /var/adm/messages we see:
Oct 28 13:00:10 ossuas2 nfs: [ID 733954 kern.info] NOTICE: [NFS4][Server: mashost][Mntpt: /ossrc/upgrade]NFS server mashost not responding; still trying
Oct 28 13:02:53 ossuas2 nfs: [ID 733954 kern.info] NOTICE: [NFS4][Server: mashost][Mntpt: /usr/local]NFS server mashost not responding; still trying
Oct 28 13:04:53 ossuas2 nfs: [ID 733954 kern.info] NOTICE: [NFS4][Server: mashost][Mntpt: /etc/opt/ericsson]NFS server mashost not responding; still trying
Can anyone please help us understand why we are getting this problem? And can an application have this kind of impact on mashost?
First things first, check the state of the NFS services with svcs. When it crashes, run:
# svcs -x nfs/client
on the client, and
# svcs -x nfs/server
on the server. I would expect one or both to be in a "maintenance" state (you may find that it fails to start properly at all). If it is in maintenance, you should see a row marked "Reason:" that says why.
You might see "offline" instead. In that case, startd will attempt to restart the service multiple times and, if it fails after five attempts or hangs indefinitely, will place it into the "maintenance" state and stop restarting it.
Check the logs in
/var/svc/log/<service-name FMRI>.log
There will be one on your client machine under "network-nfs-client:default" (probably, may have a name other than 'default' if it's been changed manually), and one on the server under "network-nfs-server:default"
See what you can glean from those.
SMF automatically takes snapshots of service configurations as backups, so you can try reverting to one of those with svccfg:
# svccfg -s nfs/server:default
svc:/network/nfs/server:default> listsnap
svc:/network/nfs/server:default> revert start [name_of_snapshot]
svc:/network/nfs/server:default> quit
# svcadm refresh nfs/server:default
# svcadm restart nfs/server:default
Make sure to include the ":default" tag, or whichever tag you saw in the output of "svcs nfs/server". That name identifies an instance of the service, and every running service is an instance.
If the service is failing to start, you might have to look at the XML manifest under /lib/svc/manifest/network/nfs/. Inside, you'll see its dependencies (and the services that depend on this one), followed by the "exec_method" entries, which define how the service starts, stops and restarts.
Instead of reverting to a snapshot, you can also restore the service to its defaults: use svccfg -s <FMRI> delete to clear it, then svcadm refresh <FMRI> and svcadm enable <FMRI>.
If the service is in maintenance state, once you've isolated and fixed the problem, you can manually clear that state by running svcadm clear <FMRI>.

python-memcache memcached -- I installed on centos virtualbox but it get/set never seem to work

I'm using Python. I did a yum install memcached followed by an easy_install python-memcached.
I used the simple test program from help(memcache). When I wasn't getting the proper answers, I threw in some print statements:
[~/test]$ cat m2.py
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
x = mc.set("some_key", "Some value")
print 'Just set a key and value into the cache (suposedly)'
value = mc.get("some_key")
print 'Just retrieved that value from the cache using the key'
print 'X %s' % x
print 'Value %s' % value
[~/test]$ python m2.py
Just set a key and value into the cache (suposedly)
Just retrieved that value from the cache using the key
X 0
Value None
[~/test]$
The question now is, what have I failed to do in my installation? It appears to be working from an API perspective but it fails to put anything into the memcache share area.
I'm using a VirtualBox VM running CentOS:
[~]# cat /proc/version
Linux version 2.6.32-358.6.2.el6.i686 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Thu May 16 18:12:13 UTC 2013
Is there a daemon that is supposed to be running? I don't see an obvious named one when I do a ps.
I tried to get pylibmc installed on my vm but was unable to find a working installation so for now will see if I can get the above stuff working first.
I discovered that if I run it straight from the Python console GUI, I get a bit more output when I set debug=1:
>>> mc = memcache.Client(['127.0.0.1:11211'], debug=1)
>>> mc.stats
{}
>>> mc.set('test','value')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
0
>>> mc.get('test')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
When I try to connect to the port with telnet, as in the example, I get a connection refused:
[root@~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root@~]#
I tried the instructions I found on the net for configuring telnet so localhost wouldn't be disabled:
vi /etc/xinetd.d/telnet
service telnet
{
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
disable = no
}
And then ran the commands to restart the service(s):
service iptables stop
service xinetd stop
service iptables start
service xinetd start
service iptables stop
I tried it both ways (iptables started and stopped), but it had no effect, so I am out of ideas. What do I need to do so that the port will be allowed, if that is the problem?
Or is there a memcached service that needs to be running to open up the port?
Well, this is what it took to get it working (a series of manual steps):
1) su -
cd /var/run
mkdir memcached # this was missing
In the memcached file I added "-l 127.0.0.1" to the OPTIONS statement; it is apparently the listen-address option. Do this in both step 2 and step 3, since I'm not certain which file is actually used at runtime.
2) cd /etc/sysconfig
cp memcached memcached.old
vi memcached
3) cd /etc/init.d
cp memcached memcached.old
vi memcached
4) Try some commands to see if the server starts now
/etc/init.d/memcached start
/etc/init.d/memcached status
/etc/init.d/memcached stop
/etc/init.d/memcached restart
5) Open http://127.0.0.1:11211 in a browser. I tried this, but it never seemed to display anything, so I don't really know how valid this approach is. I'm not running Apache or anything like that, so perhaps it's not relevant to my case. Perhaps I would have to supply a ?key=blah or something.
6) Now it should be ready to go. The test shown below should now work; at least it did for me. Running help(memcache) displays a simple program. Just paste that in and it should work just fine.
[~]$ python
>>> import memcache
>>> help(memcache)
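Since the browser test in step 5 is inconclusive, a plain TCP connection test is a more direct way to confirm that the daemon is listening before blaming the client library. Below is a minimal Python 3 sketch (the original session used Python 2); the helper name port_is_open is just for illustration.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False
```

If port_is_open("127.0.0.1", 11211) returns False, the "Connection refused. Marking dead." message from python-memcached is expected, and the thing to fix is the memcached service rather than the Python code.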

Unable to run Mongo shell (Mac)

I'm new to web development and I wanted to get started with some RoR (using Locomotive CMS).
One of the things Locomotive requires is MongoDB. I installed it using Homebrew by following this link: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
It installs fine, but then I'm not able to run it!
When I type 'mongo' in the terminal I get the following output:
"MongoDB shell version: 2.4.3
connecting to: test
Mon May 6 11:12:28.927
JavaScript execution failed:
Error: couldn't connect to server
127.0.0.1:27017 at src/mongo/shell/mongo.js:L112
exception: connect failed"
BACKGROUND TO HELP DEBUGGING ( on Terminal) :
1. When I type in mongod I get the following:
"all output going to: /usr/local/var/log/mongodb/mongo.log"
Ownership of mongo.log :
-rw-r--r-- 1 username admin 22133 May 6 11:13 mongo.log
2. When I input mongod --fork I get the following:
about to fork child process, waiting until server is ready for connections.
forked process: 77566
all output going to: /usr/local/var/log/mongodb/mongo.log
ERROR: child process failed, exited with error number 100
3. Typing mongod --help gives the following warning:
* WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
4. I have a folder called data in root (path: /data), which acts as a MongoDB database. Is this where it should be? Ownership of the data folder:
"drwxr-xr-x 3 username wheel 102 Apr 23 21:38 data"
5. Checking whether the port is free with lsof -i :27017 gives no output. I've also looked for a running mongo process in Activity Monitor and found zilch!
6. I've also tried mongo --repair. It didn't help!
I've been stuck on this for a while. I've looked at most of the responses on Stack Overflow and searched around for a solution, but nothing has helped so far!
UPDATE:
When I tried to start the mongo shell, I was getting the following log message from mongo.log:
5/6/13 1:33:27.616 PM com.apple.launchd:
(org.mongodb.mongod[79133])
open("/private/var/log/mongodb/output.log", ...): Permission denied
So I ran chmod 777 on that folder, and now the shell launches!
I still get a warning when it launches, though:
Server has startup warnings:
Mon May 6 13:33:27.693 [initandlisten]
Mon May 6 13:33:27.693 [initandlisten]
** WARNING: soft rlimits too low.
Number of files is 256, should be at least 1000
Any idea how I can silence these warnings?
To determine the cause of the failure, you need to look at (and post for us) the output in /usr/local/var/log/mongodb/mongo.log from when it is trying to start.
However, the most common reason for the failure is the lack of the default database path - at /data/db. Either create that folder (and don't forget to make sure your user has permission to read/write to it) or specify a different path with the --dbpath option.
UPDATE: as you have since found, bad permissions on the log file can cause the issue, in a similar way to bad permissions on the data path.
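Both kinds of failure (data path and log directory) come down to the same question: does the directory exist, and can the user running mongod read and write it? A small Python sketch of that check, with a hypothetical helper name, might look like this:

```python
import os


def directory_problem(directory):
    """Return why the current user cannot use directory, or None if it looks fine."""
    if not os.path.isdir(directory):
        return "%s does not exist" % directory
    if not os.access(directory, os.R_OK | os.W_OK | os.X_OK):
        return "%s is not readable and writable by the current user" % directory
    return None
```

Running it as the same user that starts mongod, against /data/db and the log directory, should show which one is the culprit. Note that it only reflects the permissions of whoever runs the script, so a launchd job running as another user needs to be checked separately.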
In terms of the warning, the information you need is here:
https://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1
It is just that, though: a warning. You can run MongoDB without issue under those limits as long as it is not under heavy load. So if this is a development environment, you should be fine unless you plan on load testing.
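If you want to see the limit the warning refers to, Python's standard resource module can read it. This is a minimal sketch: the function name and the default threshold of 1000 are taken from the warning text above, not from MongoDB itself.

```python
import resource


def open_file_limit_ok(minimum=1000):
    """True when the soft limit on open files is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", which always satisfies the minimum.
    return soft == resource.RLIM_INFINITY or soft >= minimum
```

It reports the limit of the Python process itself, so run it from the same shell (or the same launchd context) that starts mongod to get a comparable number.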