Haskell: Testing connection availability N times with a delay (scotty to mongodb) - mongodb

I have a stupid problem with scotty web app and mongodb service starting in the right order.
I use systemd to start mongodb first and then the scotty web app. It does not work for some reason. The app errors out with connect: does not exist (Connection refused) from the mongodb driver meaning that the connection is not ready.
So my question. How can I test the connection availability say three times with 0.5s interval and only then error out?
This is the application main function
main :: IO ()
main = do
pool <- createPool (runIOE $ connect $ host "127.0.0.1") close 1 300 5
clearSessions pool
let r = \x -> runReaderT x pool
scottyT 3000 r r basal
basal :: ScottyD ()
basal = do
middleware $ staticPolicy (noDots >-> addBase "static")
notFound $ runSession
routes
Although the app service is ordered after mongodb service the connection to mongodb is still unavailable during the app start up. So I get the above mentioned error.
This is the systemd service file to avoid questions regarding the correct service ordering.
[Unit]
Description=Basal Web Application
Requires=mongodb.service
After=mongodb.service iptables.service network-online.target
[Service]
User=http
Group=http
WorkingDirectory=/srv/http/basal/
ExecStart=/srv/http/basal/bin/basal
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
I don't know why connection to mongodb is not available given the correct service order.
So I want to probe connection availability withing haskell code three times with 0.5s delay and then error out. How can I do it?
Thanks.

I guess from the functions you're using that you're using something like mongoDB 1.5.0.
Here, connect returns something in the IOE monad, which is an alias for ErrorTIOErrorIO.
So the best approach is to use the retrying mechanisms ErrorT offers. As it's an instance of MonadPlus, we can just use mplus if we don't care about checking for the specific error:
retryConnect :: Int -> Int -> Host -> IOE Pipe
retryConnect retries delayInMicroseconds host
| retries > 0 =
connect host `mplus`
(liftIO (threadDelay delayInMicroseconds) >>
retryConnect (retries - 1) delayInMicroseconds host)
| otherwise = connect host
(threadDelay comes from Control.Concurrent).
Then replace connect with retryConnect 2 500000 and it'll retry twice after the first failure with a 500,000 microsecond gap (i.e. 0.5s).
If you do want to check for a specific error, then use catchError instead and inspect the error to decide whether to swallow it or rethrow it.

Related

IBM BLUEMIX BLOCKCHAIN SDK-DEMO failing

I have been working with HFC SDK for Node.js and it used to work, but since last night I am having some problems.
When running helloblockchain.js only few times works, most time I get this error when it tries to enroll a new user:
E0113 11:56:05.983919636 5288 handshake.c:128] Security handshake failed: {"created":"#1484304965.983872199","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484304965.983866102","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
Error: Failed to register and enroll JohnDoe: Error
Other times, the enroll works and the failure appears deploying the chaincode:
Enrolled and registered JohnDoe successfully
Deploying chaincode ...
E0113 12:14:27.341527043 5455 handshake.c:128] Security handshake failed: {"created":"#1484306067.341430168","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306067.341421859","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
Failed to deploy chaincode: request={"fcn":"init","args":["a","100","b","200"],"chaincodePath":"chaincode","certificatePath":"/certs/peer/cert.pem"}, error={"error":{"code":14,"metadata":{"_internal_repr":{}}},"msg":"Error"}
Or:
Enrolled and registered JohnDoe successfully
Deploying chaincode ...
E0113 12:15:27.448867739 5483 handshake.c:128] Security handshake failed: {"created":"#1484306127.448692244","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306127.448668047","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
events.js:160
throw er; // Unhandled 'error' event
^
Error
at ClientDuplexStream._emitStatusIfDone (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:189:19)
at ClientDuplexStream._readsDone (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:158:8)
at readCallback (/usr/lib/node_modules/hfc/node_modules/grpc/src/node/src/client.js:217:12)
E0113 12:15:27.563487641 5483 handshake.c:128] Security handshake failed: {"created":"#1484306127.563437122","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"#1484306127.563429661","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_epoll_linux.c","file_line":948}]}
This code worked yesterday, so I don't know what could be happening.
Does anybody know how can I fix it?
Thanks,
Javier.
ibm-bluemix
blockchain
These types of intermittent issues are usually related to GRPC. An initial suggestion is to ensure that you are using at least GRPC version 1.0.0.
If you are using a Mac, then the maximum number of open file descriptors should be checked (using ulimit -n). Sometimes this is initially set to a low value such as 256, so increasing the value could help.
There are a couple of GRPC issues with similar symptoms.
https://github.com/grpc/grpc/issues/8732
https://github.com/grpc/grpc/issues/8839
https://github.com/grpc/grpc/issues/8382
There is a grpc.initial_reconnect_backoff_ms property that is mentioned in some of these issues. Increasing the value past the 1000 ms level might help reduce the frequency of issues. Below are instructions for how the helloblockchain.js file can be modified to set this property to a higher value.
Open the helloblockchain.js file in the Hyperledger Fabric Client example and find the enrollAndRegisterUsers function.
Add “grpc.initial_reconnect_backoff_ms": 5000 to the setMemberServicesUrl call.
chain.setMemberServicesUrl(ca_url, {
pem: cert, "grpc.initial_reconnect_backoff_ms": 5000
});
Add “grpc.initial_reconnect_backoff_ms": 5000 to the addPeer call.
chain.addPeer("grpcs://" + peers[i].discovery_host + ":" + peers[i].discovery_port,
{pem: cert, "grpc.initial_reconnect_backoff_ms": 5000
});
Note that setting the grpc.initial_reconnect_backoff_ms property may reduce the frequency of issues, but it will not necessarily eliminate all issues.
The connection to the eventhub that is made in the helloblockchain.js file can also be a factor. There is an earlier version of the Hyperledger Fabric Client that does not utilize the eventhub. This earlier version could be tried to determine if this makes a difference. After running git clone https://github.com/IBM-Blockchain/SDK-Demo.git, run git checkout b7d5195 to use this prior level. Before running node helloblockchain.js from a Node.js command window, the git status command can be used to check the code level that is being used.

wsadmin script timing out when executing against DMGR via SOAP

I'm attempting to start and stop an application on a single JVM via the wsadmin console since the Web UI for IBM BPM PS Adv. doesn't allow for that kind of operation. So, I have the following script:
https://gist.github.com/predatorian3/b8661c949617727630152cbe04f78d7e
and when I run it against the DMGR from the Cell Host, I receive the following errors.
[wasadmin#server01 ~]$ cat /usr/local/bin/Run_wsadmin.sh
#!/bin/bash
#
#
#
/opt/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -user serviceAccount -password password $*
[wasadmin#cessoapscrt00 ~]$ time Run_wsadmin.sh -f /opt/IBM/wsadmin/wsadmin_Restart_Application.py WPS00 CRT00WPS01 redirectResource_war
WASX7209I: Connected to process "dmgr" on node CRTDMGR using SOAP connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[WPS00, CRT00WPS01, redirectResource_war]"
WASX7017E: Exception received while running file "/opt/IBM/wsadmin/wsadmin_Restart_Application.py"; exception information: com.ibm.websphere.management.exception.ConnectorException
org.apache.soap.SOAPException: [SOAPException: faultCode=SOAP-ENV:Client; msg=Read timed out; targetException=java.net.SocketTimeoutException: Read timed out]
real 3m21.275s
user 0m17.411s
sys 0m0.796s
So, I'm not specifying the connection types, and using the default, which is SOAP. However, upon reading about the other Connection Types, none of them seem any better, but I attribute that to IBM Documentation vagueness. Is there an option to increase the timeout wait periods, or turn it off, or is there a better connection type?
Also running this directly on the wsadmin console, it seems that it is hanging up on gathering the application manager string.
[wasadmin#server01 ~]$ Run_wsadmin.sh
WASX7209I: Connected to process "dmgr" on node CRTDMGR using SOAP connector; The type of process is: DeploymentManager WASX7031I: For help, enter: "print Help.help()"
wsadmin>appManager = AdminControl.queryNames('cell=CRTCELL,node=WPS00,type=ApplicatoinManager,process=CRT00WPS01,*')
WASX7015E: Exception running command: "appManager = AdminControl.queryNames('cell=CRTCELL,node=WPS00,type=ApplicationManager,process=CRT00WPS01,*')"; exception information:
com.ibm.websphere.management.exception.ConnectorException
org.apache.soap.SOAPException: [SOAPException: faultCode=SOAP-ENV:Client; msg=Read timed out; targetException=java.net.SocketTimeoutException: Read timed out]
wsadmin>
You can increase timeout value in {profile}/properties/soap.client.props
com.ibm.SOAP.requestTimeout=180
If you want to turn off timeout, modify com.ibm.SOAP.requestTimeout=0
Or if you want longer timeout you can modify the value 180 to something else.
Also about your query command, I noticed that you have a typo on the MBean type, you had type=ApplicatoinManager, it should be type=ApplicationManager
HERE YOU GO -- I had the same issue. I want to override the timeout prop temporarily. This worked like a champ. Make sure you follow below steps exactly.I did some mistakes and the prop did not passed, I figured out and it works.
Copy the soap.client.props file from /properties and give it a new name such as mysoap.client.props.
Edit mysoap.client.props and update the value of com.ibm.SOAP.requestTimeout as required
Create a new Java properties file soap_override.props and enter the following line:
com.ibm.SOAP.ConfigURL=file:/mysoap.client.props
Pass soap_override.props into wsadmin using the -p option: wsadmin -p soap_override.props...
REFERENCE:
https://www.ibm.com/developerworks/community/blogs/timdp/entry/avoiding_wsadmin_request_timeouts_the_neat_way32?lang=en

uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac

I'm trying to setup an application webserver using uWSGI + Nginx, which runs a Flask application using SQLAlchemy to communicate to a Postgres database.
When I make requests to the webserver, every other response will be a 500 error.
The error is:
Traceback (most recent call last):
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
context)
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
The above exception was the direct cause of the following exception:
sqlalchemy.exc.OperationalError: (OperationalError) SSL error: decryption failed or bad record mac
The error is triggered by a simple Flask-SQLAlchemy method:
result = models.Event.query.get(id)
uwsgi is being managed by supervisor, which has a config:
[program:my_app]
command=/usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini --catch-exceptions
directory=/path/to/my/app
stopsignal=QUIT
autostart=true
autorestart=true
and uwsgi's config looks like:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
The furthest that I can get is that it has something to do with uwsgi's forking. But beyond that I'm not clear on what needs to be done.
The issue ended up being uwsgi's forking.
When working with multiple processes with a master process, uwsgi initializes the application in the master process and then copies the application over to each worker process. The problem is if you open a database connection when initializing your application, you then have multiple processes sharing the same connection, which causes the error above.
The solution is to set the lazy configuration option for uwsgi, which forces a complete loading of the application in each process:
lazy
Set lazy mode (load apps in workers instead of master).
This option may have memory usage implications as Copy-on-Write semantics can not be used. When lazy is enabled, only workers will be reloaded by uWSGI’s reload signals; the master will remain alive. As such, uWSGI configuration changes are not picked up on reload by the master.
There's also a lazy-apps option:
lazy-apps
Load apps in each worker instead of the master.
This option may have memory usage implications as Copy-on-Write semantics can not be used. Unlike lazy, this only affects the way applications are loaded, not master’s behavior on reload.
This uwsgi configuration ended up working for me:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
# the fix
lazy = true
lazy-apps = true
As an alternative you might dispose the engine. This is how I solved the problem.
Such issues may happen if there is a query during the creation of the app, that is, in the module that creates the app itself. If that states, the engine allocates a pool of connections and then uwsgi forks.
By invoking 'engine.dispose()', the connection pool itself is closed and new connections will come up as soon as someone starts making queries again. So if you do that at the end of the module where you create your app, new connections will be created after the UWSGI fork.
I am running a flask app using gunicorn on Heroku. My application started exhibiting this problem when I added the --preload option to my Procfile. When I removed that option, my application resumed functioning as normal.
Not sure whether to add this as an answer to this question or ask a separate question and put this as an answer there. I was getting this exact same error for reasons that are slightly different from the people who have posted and answered. In my setup, I using gunicorn as a wsgi for a Flask application. In this application, I was offloading some intense database operations off to a celery worker. The error would come from the celery worker.
From reading a lot of the answers here and looking at the psycopg2 as well as sqlalchemy session documentation, it became apparent to me that it is a bad idea to share an SQLAlchemy session between separate processes (the gunicorn worker and the sqlalchemy worker in my case).
What ended up solving this for me was creating a new session in the celery worker function so it used a new session each time it was called and also destroying the session after every web request so flask used a session per request. The overall solution looked like this:
Flask_app.py
#app.teardown_appcontext
def shutdown_session(exception=None):
session.close()
celery_func.py
#celery_app.task(bind=True, throws=(IntegrityError))
def access_db(self,entity_dict, tablename):
with Session() as session:
try:
session.add(ORM_obj)
session.commit()
except IntegrityError as e:
session.rollback()
print('primary key violated')
raise e

python-memcache memcached -- I installed on centos virtualbox but it get/set never seem to work

I'm using python. I did a yum install memcached followed by a easy_install python-memcached
I used the simple test program from the Help(memcache). When I wasn't getting the proper answers I threw in some print statements:
[~/test]$ cat m2.py
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
x = mc.set("some_key", "Some value")
print 'Just set a key and value into the cache (suposedly)'
value = mc.get("some_key")
print 'Just retrieved that value from the cache using the key'
print 'X %s' % x
print 'Value %s' % value
[~/test]$ python m2.py
Just set a key and value into the cache (suposedly)
Just retrieved that value from the cache using the key
X 0
Value None
[~/test]$
The question now is, what have I failed to do in my installation? It appears to be working from an API perspective but it fails to put anything into the memcache share area.
I'm using a virtualbox vm running centos
[~]# cat /proc/version
Linux version 2.6.32-358.6.2.el6.i686 (mockbuild#c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Thu May 16 18:12:13 UTC 2013
Is there a daemon that is supposed to be running? I don't see an obvious named one when I do a ps.
I tried to get pylibmc installed on my vm but was unable to find a working installation so for now will see if I can get the above stuff working first.
I discovered if i ran straight from the python console GUI i get a bit more output if I set debug=1
>>> mc = memcache.Client(['127.0.0.1:11211'], debug=1)
>>> mc.stats
{}
>>> mc.set('test','value')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
0
>>> mc.get('test')
MemCached: MemCache: inet:127.0.0.1:11211: connect: Connection refused. Marking dead.
When I try to use per the example telnet to connect to the port i get a connection refused:
[root#~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
[root#~]#
I tried the instructions I found on the net for configuring telnet so localhost wouldn't be disabled:
vi /etc/xinetd.d/telnet
service telnet
{
flags = REUSE
socket_type = stream
wait = no
user = root
server = /usr/sbin/in.telnetd
log_on_failure += USERID
disable = no
}
And then ran the commands to restart the service(s):
service iptables stop
service xinetd stop
service iptables start
service xinetd start
service iptables stop
I ran with both cases (iptables started and stopped) but it has no effect. So I am out of ideas. What do I need to do to make it so the PORT will be allowed? if that is the problem?
Or is there a memcached service that needs to be running that needs to open up the port ?
well this is what it took to get it working: ( a series of manual steps )
1) su -
cd /var/run
mkdir memcached # this was missing
In the memcached file I added "-l 127.0.0.1" to the OPTIONS statement. It's apparently a listen option. Do this for steps 2 & 3. I'm not certain which file is actually used at runtime.
2) cd /etc/sysconfig
cp memcached memcached.old
vi memcached
3) cd /etc/init.d
cp memcached memcached.old
vi memcached
4) Try some commands to see if the server starts now
/etc/init.d/memcached start
/etc/init.d/memcached status
/etc/init.d/memcached stop
/etc/init.d/memcached restart
I tried opening a browser, but it never seemed to actually display anything so I don't really know how valid this approach is. I'm not running apache or anything like this so perhaps its not relevant to my cause. Perhaps I would have to supply a ?key=blah or something.
5) http://127.0.0.1:11211
6) Now it should be ready to go. If one runs the test shown with the following it should work. At least it did for me. doing the help(memcache) will display a simple program. just paste that in and it should work just fine.
[~]$ python
>>> import memcache
>>> help(memcache)

memcached apparently resetting connections

UPDATE:
It's not memcached, it's a lot of sockets in TIME_WAIT state:
% ss -s
Total: 2494 (kernel 2784)
TCP: 43323 (estab 2314, closed 40983, orphaned 0, synrecv 0, timewait 40982/0), ports 16756
BTW, I have modified previous version (below) to use Brad Fitz's memcache client and to reuse the same memcache connection:
http://dpaste.com/1387307/
OLD VERSION:
I have thrown together the most basic webserver in Go that has handler function doing only one thing:
retrieving a key from memcached
sending it as http response to client
Here's the code: http://dpaste.com/1386559/
The problem is I'm getting a lot of connection resets on memcached:
2013/09/18 20:20:11 http: panic serving [::1]:19990: dial tcp 127.0.0.1:11211: connection reset by peer
goroutine 20995 [running]:
net/http.func·007()
/usr/local/go/src/pkg/net/http/server.go:1022 +0xac
main.maybe_panic(0xc200d2e570, 0xc2014ebd80)
/root/go/src/http_server.go:19 +0x4d
main.get_memc_val(0x615200, 0x7, 0x60b5c0, 0x6, 0x42ee58, ...)
/root/go/src/http_server.go:25 +0x64
main.func·001(0xc200149b40, 0xc2017b3380, 0xc201888b60)
/root/go/src/http_server.go:41 +0x35
net/http.HandlerFunc.ServeHTTP(0x65e950, 0xc200149b40, 0xc2017b3380, 0xc201888b60)
/usr/local/go/src/pkg/net/http/server.go:1149 +0x3e
net/http.serverHandler.ServeHTTP(0xc200095410, 0xc200149b40, 0xc2017b3380, 0xc201888b60)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x16c
net/http.(*conn).serve(0xc201b9b2d0)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x765
created by net/http.(*Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x266
I have taken care to set Linux kernel networking in such way as not to get in the way (turning off SYN flooding protection etc).
...
...
And yet on testing with "ab" (below) I'm getting those errors.
ab -c 1000 -n 50000 "http://localhost:8000/"
There is no sign whatsoever anywhere I looked that it's the kernel (dmesg, /var/log).
I would guess that is because you are running out of sockets - you never close the memc here. Check with netstat while your program is running.
func get_memc_val(k string) []byte {
memc, err := gomemcache.Connect(mc_ip, mc_port)
maybe_panic(err)
val, _, _ := memc.Get(k)
return val
}
I'd use this go memcache interface if I were you - it was written by the author of memcached who now works for Google on Go related things.
Try memcache client from YBC library. Unlike gomemcache it opens and re-uses only a few connections to memcache server irregardless of the number of concurrent requests issued via the client. It achieves high performance by pipelining concurrent requests over a small number of open connections to the memcache server.
The number of connections to the memcache server can configured via ClientConfig.ConnectionsCount.