Creating and accessing global state with plpgsql procedures - postgresql

I'd like to run a custom EVAL script against a Redis instance from within Postgres (so I'm not sure redis_fdw will work in this case).
The other option is to use plpython with a redis library
https://pypi.org/project/redis/.
import redis
r = redis.Redis(host='localhost', port=6379, db=0)
r.evalsha(<SHA>, <ARGS>)
I'd like to be able to create a redis pool so that each call to redis doesn't have to reestablish a connection. Is there any way to set some sort of global state within postgres itself so that a pool can be initiated on the first call and then subsequent calls just use the pool?

redis_fdw only mentions support for a handful of built-in data structures, so it doesn't look like much help for an EVAL.
A global connection pool will probably not be easy. Postgres runs each connection in its own server process, and you can't pass anything between them without allocating it in shared memory (which I don't see a PL/Python API for, so I think it'd need to be done in C).
But if you can afford to create one Redis connection per Postgres connection, you should be able to reuse it across PL/Python calls by putting it in a shared dictionary:
The global dictionary SD is available to store private data between repeated calls to the same function. The global dictionary GD is public data, that is available to all Python functions within a session; use with care.
So, something like
import redis
# Create the client once per Postgres session and cache it in SD
if 'redis' not in SD:
    SD['redis'] = redis.Redis(host='localhost', port=6379, db=0)
SD['redis'].evalsha(<SHA>, <ARGS>)
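The same create-on-first-use pattern can be tried outside Postgres. This is only a sketch: the plain dict named SD stands in for the dictionary PL/Python provides, and get_cached and make_client are hypothetical helpers, not part of PL/Python or the redis library.

```python
def get_cached(store, key, factory):
    """Return store[key], calling factory() only the first time."""
    if key not in store:
        store[key] = factory()
    return store[key]


# Stand-in for the SD dictionary that PL/Python injects into a function.
SD = {}
created = []


def make_client():
    # Hypothetical factory; inside PL/Python this would be something like
    # redis.Redis(host='localhost', port=6379, db=0).
    client = object()
    created.append(client)
    return client


first = get_cached(SD, 'redis', make_client)
second = get_cached(SD, 'redis', make_client)
```

Both calls return the same object, so the factory runs once per dictionary, which in PL/Python means once per function per session.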

Related

Postgres SET configuration variables with TypeORM, how to persist variable during the life of the connection between calls

I have an Express server with TypeORM and I use Row Security Policies https://www.postgresql.org/docs/current/ddl-rowsecurity.html. My problem is with Postgres configuration settings: connections are pooled under the hood, so when I set configuration values on the default connection I can't be sure they persist for the life of all calls. I have considered a few approaches to implementing row security policies in TypeORM:
Avallone library https://github.com/Avallone-io/rls
Own implementation
EventSubscribers https://orkhan.gitbook.io/typeorm/docs/listeners-and-subscribers
Avallone has too many new dependencies.
EventSubscribers seem to be one of the approaches with the least code, except that I would really depend on something like @BeforeLoad (analogous to the existing https://orkhan.gitbook.io/typeorm/docs/listeners-and-subscribers#afterload). My question is: how can we achieve this?
I've seen a similar discussion here: Postgres SET runtime variables with TypeORM, how to persist variable during the life of the connection between calls

what are other approaches instead of using collect() in spark scala

My piece of scala code looks like,
val orgIncInactive = orgIncLatest.filter("(LD_TMST != '' and LD_TMST is not null)").select("ORG_ID").rdd
orgIncInactive.collect.foreach(p => DenormalizedTablesMethodsUtil.hbaseTablePurge(p(0).toString, tableName, connection))
Is there any way that I can avoid using collect() here?
I tried various possibilities but I am ending up with Serializable errors.
Thanks.
Depends what you are trying to do, and what is ultimately causing the serialization error. It looks like you are trying to pass some kind of database connection into the anonymous function. That's generally going to fail for a couple of reasons. Even if you made the connection object itself serializable -- say by sub-classing the object and implementing Serializable -- database connections are not something you can share between the driver and the executors.
Instead, what you need to do is to create the connection object on each of the executors, and then use the local connection object instead of one defined in the driver. There are a couple of ways to accomplish this.
One is to use mapPartitions, which lets you instantiate objects locally on the executor, once per partition, before the per-record logic runs.
Another possibility is to create a singleton object that on initialization sets a connection object to null or None. Then, you would define a method in the object like "getConnection" that checks whether the connection has been initialized. If not, it initializes the connection. Then either way it returns the valid connection.
I use the second approach more than the first, because it limits initialization to only once per executor instead of forcing it to happen once per partition.
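The second approach can be sketched as follows. It is written in Python rather than Scala, and everything in it is a hypothetical stand-in: ConnectionHolder, _open and purge_partition are illustrative names, and the dict returned by _open takes the place of a real database connection.

```python
import threading


class ConnectionHolder:
    """Lazily creates one connection per process and reuses it afterwards."""

    _lock = threading.Lock()
    _connection = None
    init_count = 0  # only here to show how often initialization runs

    @classmethod
    def get_connection(cls):
        with cls._lock:
            if cls._connection is None:
                cls._connection = cls._open()
                cls.init_count += 1
            return cls._connection

    @staticmethod
    def _open():
        # Placeholder for opening a real connection (assumption).
        return {"open": True}


def purge_partition(rows):
    # Looked up inside the function, so nothing from the driver is serialized.
    conn = ConnectionHolder.get_connection()
    return [(row, id(conn)) for row in rows]


partitions = [["org1", "org2"], ["org3"]]
results = [purge_partition(p) for p in partitions]
```

Because the connection is fetched inside the per-partition function, each worker process builds its own on first use and every later partition in that process reuses it.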

Can I notify and listen inside PostgreSQL procedures (functions)?

I have checked the documentation (for my version 9.3):
http://www.postgresql.org/docs/9.3/static/sql-notify.html
http://www.postgresql.org/docs/9.3/static/sql-listen.html
I have read multiple discussions and blogs about notify-listen in postgres.
They all use a listening process / interface that is not implemented inside a "classic" procedure (which is a function in Postgres anyway). They implement it in a different language and/or environment, external to the Postgres server (e.g. Perl, C#).
My question: is it possible to implement listening inside a Postgres function (language plpgsql)? If not (which is what I assume from not being able to find such a topic or example), can someone explain a bit why it can't be done, or why it does not make sense to do it that way?
If you depend on a single table, this is a classic use case for a trigger function: https://www.postgresql.org/docs/current/plpgsql-trigger.html

Erlang store pool of mongodb connections

How can I store a pool of MongoDB connections in Erlang?
In one function I create a pool of DB connections:
Replset = {<<"rs1">>, [{localhost, 27017}]},
Pool = resource_pool:new (mongo:rs_connect_factory (Replset), Count),
In a second function I need to get a connection from the pool:
{ok, Conn} = resource_pool:get (Pool).
But I cannot do this, because I created the pool in another function.
I tried to use records, but without success.
What do I need to do to make the pool available globally, across modules?
I think the best solution is to use gen_server and store data in its state.
Another way is to use ets table.
Some points to guide you in the correct direction:
Erlang has no concept of a global variable. Bindings can only exist inside a process, and a binding is local to that process.
Furthermore, inside a process there are no process-local bindings, only bindings that are local to the current scope.
Note that this is highly consistent with most functional programming styles.
To solve your problem, you need a process to keep track of your resource pool for you. Clients then call this process and ask for a resource. The resource manager can then handle, via monitors, what should happen if a client dies while it has a resource checked out.
The easiest way to get started is to grab devinus/poolboy from Github and look into that piece of code.

Namespaces in Redis?

Is it possible to create namespaces in Redis?
From what I found, all the global commands (count, delete all) work on all the objects. Is there a way to create sub-spaces such that these commands will be limited in context?
I don't want to set up different Redis servers for this purpose.
I assume the answer is "No", and wonder why wasn't this implemented, as it seems to be a useful feature without too much overhead.
A Redis server can handle multiple databases, which are numbered. It provides 16 of them by default; you can select one with the -n option of the redis-cli command-line tool, with a similar connection argument in client libraries, or by calling the "select()" method on a connection object. (Here .select() is the method name in the Python Redis module; I presume it's named similarly in other libraries and interfaces.)
There's also an option in the Redis server configuration file to control how many separate databases you want. I don't know what the upper limit would be, and there doesn't seem to be a way to change it dynamically (in other words, it seems you'd have to shut down and restart the server to add additional DBs). Also, there doesn't seem to be a way to associate these DB numbers with any sort of name, nor to impose separate ACLs or even different passwords on them. Redis, of course, is schema-less as well.
If you are using Node, ioredis has transparent key prefixing, which works by having the client prepend a given string to each key in a command. It works in the same way that Ruby's redis-namespace does. This client-side approach still puts all your keys into the same database, but at least you add some structure, and you don't have to use multiple databases or servers.
var fooRedis = new Redis({ keyPrefix: 'foo:' });
fooRedis.set('bar', 'baz'); // Actually sends SET foo:bar baz
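The idea behind client-side key prefixing can be shown with a small Python sketch. It assumes a plain dict as a stand-in for the Redis keyspace, and PrefixedStore is a hypothetical class for illustration, not the API of ioredis or redis-namespace.

```python
class PrefixedStore:
    """Wraps a shared key-value backend and prepends a prefix to every key."""

    def __init__(self, backend, prefix):
        self._backend = backend
        self._prefix = prefix

    def _key(self, key):
        return self._prefix + key

    def set(self, key, value):
        self._backend[self._key(key)] = value

    def get(self, key):
        return self._backend.get(self._key(key))

    def keys(self):
        # Only keys in this namespace, with the prefix stripped again.
        n = len(self._prefix)
        return [k[n:] for k in self._backend if k.startswith(self._prefix)]


shared = {}  # stand-in for the single shared Redis database
foo = PrefixedStore(shared, 'foo:')
bar = PrefixedStore(shared, 'bar:')
foo.set('bar', 'baz')  # stored under 'foo:bar'
bar.set('bar', 'qux')  # stored under 'bar:bar'
```

Both wrappers write into the same backend, so the separation is purely a naming convention: server-wide commands still see every key.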
If you use Ruby you can look at these gems:
https://github.com/resque/redis-namespace
https://github.com/jodosha/redis-store