Erlang: store a pool of MongoDB connections - mongodb

How can I store a pool of MongoDB connections in Erlang?
In one function I create the pool of DB connections:
Replset = {<<"rs1">>, [{localhost, 27017}]},
Pool = resource_pool:new (mongo:rs_connect_factory (Replset), Count),
In a second function I need to get a connection from the pool:
{ok, Conn} = resource_pool:get (Pool).
But I can't do this, because the pool was created in another function.
I tried to use records, but without success.
What do I need to do to make the pool available globally, across modules?

I think the best solution is to use a gen_server and store the data in its state.
Another way is to use an ETS table.

Some points to guide you in the correct direction:
Erlang has no concept of a global variable. Bindings can only exist inside a process, and such a binding is local to that process. Furthermore, inside a process there are no process-local bindings, only bindings which are local to the current scope.
Note that this is highly consistent with most functional programming styles.
To solve your problem, you need a process to keep track of your resource pool for you. Clients then call this process and ask for a resource. The resource manager can then handle, via monitors, what should happen should the client die while it has a checked-out resource.
The easiest way to get started is to grab devinus/poolboy from GitHub and look into that piece of code.
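As a concrete illustration of the gen_server suggestion above, here is a minimal sketch of a server process that owns the pool and hands out connections on request. It reuses the rs_connect_factory/resource_pool calls from your question; the module name mongo_pool, the registered name and the pool size of 10 are made up for the example:
-module(mongo_pool).
-behaviour(gen_server).

-export([start_link/0, get_connection/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Any process, in any module, can ask the registered server for a connection.
get_connection() ->
    gen_server:call(?MODULE, get_connection).

init([]) ->
    Replset = {<<"rs1">>, [{localhost, 27017}]},
    %% Pool size of 10 is arbitrary.
    Pool = resource_pool:new(mongo:rs_connect_factory(Replset), 10),
    {ok, Pool}.

handle_call(get_connection, _From, Pool) ->
    %% resource_pool:get/1 returns {ok, Conn} or {error, Reason}.
    {reply, resource_pool:get(Pool), Pool}.

handle_cast(_Msg, Pool) -> {noreply, Pool}.
handle_info(_Info, Pool) -> {noreply, Pool}.
terminate(_Reason, _Pool) -> ok.
code_change(_OldVsn, Pool, _Extra) -> {ok, Pool}.
Start it from your supervision tree (or with mongo_pool:start_link()), and any module can then call mongo_pool:get_connection() to obtain {ok, Conn}.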

Related

Creating and accessing global state with plpgsql procedures

I'd like to call a custom EVAL function from within Postgres against a Redis instance (so I'm not sure redis_fdw will work in this case).
The other option is to use PL/Python with a Redis library such as
https://pypi.org/project/redis/:
import redis
r = redis.Redis(host='localhost', port=6379, db=0)
r.evalsha(<SHA>, <ARGS>)
I'd like to be able to create a redis pool so that each call to redis doesn't have to reestablish a connection. Is there any way to set some sort of global state within postgres itself so that a pool can be initiated on the first call and then subsequent calls just use the pool?
redis_fdw only mentions support for a handful of built-in data structures, so it doesn't look like much help for an EVAL.
A global connection pool will probably not be easy. Postgres runs each connection in its own server process, and you can't pass anything between them without allocating it in shared memory (which I don't see a PL/Python API for, so I think it'd need to be done in C).
But if you can afford to create one Redis connection per Postgres connection, you should be able to reuse it across PL/Python calls by putting it in a shared dictionary:
The global dictionary SD is available to store private data between repeated calls to the same function. The global dictionary GD is public data, that is available to all Python functions within a session; use with care.
So, something like
import redis  # each PL/Python function body needs its own import

if 'redis' not in SD:
    SD['redis'] = redis.Redis(host='localhost', port=6379, db=0)
SD['redis'].evalsha(<SHA>, <ARGS>)

Why are most posix named objects designed with unlink?

Most POSIX named objects (or all of them?) have unlink functions, e.g.:
shm_unlink
mq_unlink
They all have in common that they remove the name of the object from the system, causing subsequent opens to fail or to create a new object.
Why is it designed like this? I know this is connected to the "everything is a file" policy, but why not delete the file on close? Would you design it the same way if you were creating a new interface?
I think this has a big drawback. Say we have a server process and several client processes. If any process unlinks the object (by mistake), new clients will no longer find the server. (This can be prevented by the permissions on the corresponding file, but still...)
Would it not be better if the object were reference counted and the name were removed automatically when the last handle to it is closed? Why would you want to keep it around?
Because they are low-level tools that may be used where performance matters. Deleting the object when it is not in use, only to create it again on the next use, has a (slight) performance penalty compared with keeping it alive.
I once used a named semaphore to synchronize access to a spool with various producers and consumers. I used an init module, called as part of the boot process, to create the named semaphore, and all other processes knew that the well-known semaphore should exist.
If you want a more programmer-friendly way that creates the object on demand and destroys it when it is no longer used, you can build a higher-level library and encapsulate the creation/unlink operations in it. But if the system call did this itself, it would not be possible to build a user-level library that avoids it.
Would it not be better if the object were reference counted and the name were removed automatically when the last handle to it is closed?
No.
Because unlink() can fail, and because automatically removing a resource that can be shared between processes as soon as every process has merely closed it simply doesn't fit the paradigm of a shared resource.
You don't demolish a roller coaster just because there's no one waiting in line to ride it again at that moment in time.

Updating a global variable / resource in Reducers via a socket to the Hadoop job tracker

I need a global variable that can be read / set simultaneously in reducers (I am aware of the bottleneck and performance issues of such design). I tried to use Hadoop Configuration get()/set(), but I found that configuration properties need to be set before submitting the mapreduce job, and that using Configuration.set() within a reducer doesn't actually update the global property's value.
The closest thing I could find is to use a global parameter that can be read/set via a socket to the Hadoop job tracker, but I failed to find any resources illustrating how this can be done.
My questions are:
How can I read/set a global variable via a socket to the Hadoop job tracker?
Is there another way to keep a global variable (regardless of performance degradation)?
Notes:
Hadoop counters cannot work for me since they don't support a set() function.
DistributedCache won't work since it is used for distributing read-only data, while in my case I need reducers to update the value of a global variable that can be read simultaneously by other running reduce tasks.
In general the place to store reliably-consistent global variables in a Hadoop cluster is Apache ZooKeeper.
That said, it is rare to require mutable global variables in a MapReduce job. If you share your use case there is a good chance there is a simpler solution.

How do you use a Connection Pool with ActiveJDBC instead of just Base.open & close everytime?

Right now I'm just writing methods that do Base.open(), perform some operation, and then Base.close(). However, this is extremely inefficient, especially when lots of these method calls are made, so I'd like to use some kind of connection pool with ActiveJDBC. Is there a way to use something like a connection pool with ActiveJDBC, or some other way to approach this problem instead of doing Base.open() and Base.close() each time I access the DB?
Thanks in advance! :)
Here is an example of using ActiveJDBC with a pool: https://github.com/javalite/activejdbc/blob/master/activejdbc/src/test/java/org/javalite/activejdbc/C3P0PoolTest.java
However, you still need to open and close a connection, except that you are getting the connection from the pool and returning it to the pool. If you provide more information about the type of application you are developing, I can potentially give better advice.
--
igor

What's the right way to use erlang mongodb driver to make a db action?

I am trying the official MongoDB Erlang driver. I read the docs and there are still some things I can't understand. I hope someone can tell me the right way to use it:
Every time I perform a DB action, I just write something like the following:
{ok, Conn} = mongo:connect ({localhost, 27017}).
mongo:do (safe, master, Conn, test, fun() ->
mongo:save (foo, {'_id', 1, bbb,22, y,2}),
mongo:save (foo, {'_id', 4, bbb,22, y,2}) end).
mongo:disconnect().
Is this the right way? Every time I finish a DB action, Conn seems to die. Or should I keep Conn, rather than disconnecting it, so it can be reused next time? There's no global variable to hold Conn, so the only way I can think of is to use something like a gen_server and keep Conn in its state for reuse. Is that the right way?
I also see there's a connect_factory function, but I can't quite figure out the proper way to use it. Is connect_factory a better way than connect to deal with a large number of DB actions? And how do I get a workable Conn using connect_factory?
This question is not quite related to MongoDB. I want to give every user a unique number when they visit, so I save a counter in the DB; when a user visits, the counter is incremented by 1 and returned to the user as their unique number. But I'm worried about two users getting the same number by reading the DB at the same time. How can I make a unique counter that is incremented by 1 atomically using MongoDB?
Thanks very much!
Conn may die if your action had an error; otherwise it should be reusable. Note that mongo:do returns {failure, X} on error. Don't disconnect Conn if you want to reuse it. BTW, disconnect takes Conn as an argument. If you want to keep Conn in a global variable, consult the Erlang documentation for ways to do this (e.g. an ETS table, a gen_server, etc.). Also, check out the mvar module in this driver for a simple gen_server that holds a value.
connect_factory is used for maintaining a reusable pool of connections. Read the tutorial at http://github.com/TonyGen/mongodb-erlang. At the bottom it shows how to use connect_factory with resource_pool.
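For example, a rough sketch along the lines of that tutorial (the pool size of 10 is arbitrary, and I am assuming connect_factory takes the same host argument as connect; treat this as an illustration rather than a verbatim driver API reference):
Factory = mongo:connect_factory ({localhost, 27017}),
Pool = resource_pool:new (Factory, 10),
{ok, Conn} = resource_pool:get (Pool),
mongo:do (safe, master, Conn, test, fun () ->
    mongo:save (foo, {'_id', 5, bbb, 22, y, 2}) end).
The same Pool value can be kept somewhere long-lived (for example in a gen_server's state, as discussed for your first point) and asked for a connection whenever one is needed.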
You can atomically increment using MongoDB. See http://www.mongodb.org/display/DOCS/Atomic+Operations, in particular findAndModify.
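For the counter, here is a sketch of how this could look from the Erlang driver, reusing the mongo:do pattern above. It assumes the driver lets you run database commands via mongo:command/1 inside mongo:do; the counters collection, the <<"visitor">> _id and the seq field are made-up names for the example:
{ok, Reply} = mongo:do (safe, master, Conn, test, fun () ->
    mongo:command ({findandmodify, <<"counters">>,
                    query, {'_id', <<"visitor">>},
                    update, {'$inc', {seq, 1}},
                    new, true,
                    upsert, true}) end),
Reply is the command result; its value field holds the updated counter document, and the seq inside it is the number to hand to the user. Because findAndModify increments and returns the counter in a single server-side operation, two users visiting at the same time cannot get the same number.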