Recognize function calls as a proxy between frontend and backend - PostgreSQL

Based on this question, I found out that I can't expect to identify function calls as a proxy by looking for the 'F' (fast-path function call) message identifier.
I researched more, and now I think PostgreSQL uses the extended query protocol to send parameterized statements such as function calls (correct me if I'm wrong).
I know that executing prepared statements uses the extended query protocol too (and I suspect other kinds of statements use it as well).
So this doesn't seem like a reliable way to recognize a function call as a proxy. Is there another way? Is it possible at all? Or am I completely lost and have misunderstood everything?
By the way, by "recognizing a function call" I mean that, as a third party in the frontend-backend connection (between client and server), I need to detect a function call and inspect the function name and the parameters passed to it.

PostgreSQL uses the extended query protocol for statements with parameters, but these parameters are not necessarily the same as function parameters.
To illustrate using the C API: if you send a function call like this:
res = PQexec(conn, "SELECT myfun(42)");
it will be sent in a packet with the 'Q' (Query) identifier.
If you send it like this:
const Oid types[1] = { INT4OID };
const char * const vals[1] = { "42" };
res = PQexecParams(conn, "SELECT myfun($1)", 1, types, vals, NULL, NULL, 0);
the query will be sent in a packet with a 'P' (Parse) identifier, and the parameter will be sent in the following 'B' (Bind) packet.
But that has nothing to do with function calls; the same will happen for a query like this:
SELECT val FROM mytab WHERE id = $1;
You say your goal is to listen to the frontend-backend protocol and filter out all function calls and the parameters passed to them.
That is a very difficult task; essentially it means that you have to parse the SQL statements sent to the server, which means you have to duplicate at least part of PostgreSQL's parser. You'll also have to remember the statements from Parse packets and inject the parameters from the corresponding Bind packets.
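For illustration, here is a minimal sketch (in Python, assuming you have already split the TCP stream into frontend messages of type byte, length, and payload) of the bookkeeping that involves:

import struct

statements = {}  # statement name -> SQL text seen in Parse messages

def read_cstring(buf, pos):
    end = buf.index(b'\x00', pos)
    return buf[pos:end].decode(), end + 1

def handle_frontend_message(msg_type, payload):
    # 'P' (Parse): statement name, query text, parameter type OIDs
    if msg_type == b'P':
        name, pos = read_cstring(payload, 0)
        query, pos = read_cstring(payload, pos)
        statements[name] = query  # remember for later Bind messages
    # 'B' (Bind): portal, statement name, format codes, parameter values
    elif msg_type == b'B':
        portal, pos = read_cstring(payload, 0)
        name, pos = read_cstring(payload, pos)
        nfmt, = struct.unpack_from('!H', payload, pos); pos += 2 + 2 * nfmt
        nparams, = struct.unpack_from('!H', payload, pos); pos += 2
        params = []
        for _ in range(nparams):
            plen, = struct.unpack_from('!i', payload, pos); pos += 4
            params.append(None if plen == -1 else payload[pos:pos + plen])
            pos += max(plen, 0)
        # params may be text or binary depending on the format codes
        print(statements.get(name), params)

Even with that bookkeeping done, you still have to parse the remembered SQL text to decide which of those parameters actually feed a function call, and that is the hard part.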
In addition to that, two questions come to my mind:
Does it matter that this way you won't be able to catch function calls issued inside functions?
How do you determine the passed parameter in cases like this:
SELECT myfun((SELECT val FROM tab WHERE id = 42));
or this:
SELECT myfun(CAST(otherfun(42) || '0' AS integer));
Maybe there's a better way to achieve what you want, like hacking the PostgreSQL server and extracting your information at the place where the function is actually called.

Related

Call a kdb function passing another function as argument using sendSync method of qpython(kdb)

On the KDB server, we have two functions defined as
q)t:{0N!x[`min]; 0N!x[`max];}
q).up.map:{[keyList; valueList] keyList!valueList}
The KDB server does not allow passing a dict (()!()) as an argument directly to a function; rather, one has to use .up.map.
Calling the t function from kdb would look like
q)t[.up.map[`min`max;10 20]]
I want to call the t function from qpython's sendSync() method, passing another function, .up.map[`min`max;10 20], as an argument to t.
Unfortunately, I cannot find a solution in the qpython doc - https://qpython.readthedocs.io/en/latest/qpython.html#qpython.qconnection.QConnection.sendSync
When I tried the sendSync() method, the error below was raised:
qpython.qtype.QException: b'['
The KDB server does not allow passing a dict (()!()) as an argument directly to a function; rather, one has to use .up.map.
May I ask why this is so? It's not a bad idea to challenge the original design before looking for workarounds. If a dictionary were allowed as the parameter, the call could be as simple as:
import numpy
from qpython import qconnection
from qpython.qcollection import QDictionary, qlist
from qpython.qtype import QSYMBOL_LIST, QLONG_LIST

params = QDictionary(qlist(numpy.array(["min", "max"], dtype=numpy.string_), qtype=QSYMBOL_LIST),
                     qlist(numpy.array([10, 20], dtype=numpy.int64), qtype=QLONG_LIST))
with qconnection.QConnection(host='localhost', port=5000) as q:
    q.sendSync("t", params)
If you want to do via qpython what you can do in the q console, it's actually also simple: you pass the same string over. Effectively it's the same mechanism as a q client passing a string via IPC to the server, where the string is parsed and evaluated. Here you need to build that string in your Python code, so it's not as clean as the above (although the above looks more verbose).
with qconnection.QConnection(host='localhost', port=5000) as q:
    q.sendSync("t[.up.map[`min`max;10 20]]")
Maybe you can use a lambda for this. That way it's just the arguments that need to be serialized:
q.sendSync("{t[.up.map[x;y]]}", qlist(["min", "max"], qtype=QSYMBOL_LIST), [10, 20])
If sending a lambda isn't permitted, you could create the same wrapper as a named function on the kdb side, which might be allowed.
Alternatively, you could format your call, arguments included, as a string. A bit hacky, but workable for simple input:
q.sendSync(f"t[.up.map[`{'`'.join(['min', 'max'])};{' '.join(['10', '20'])}]]")

The best way to make Loopback GET query parameters safe?

I'm using Loopback 3.x with loopback-connector-mongodb 3.x
Apparently, many built-in endpoints can take a filter parameter, which can be defined as JSON and may contain complex filter conditions like order, where, skip, etc. For example:
GET /api/activities/findOne?filter={"where":{"id":1234}}
However, although Loopback uses an ORM, it seems the request parameters are passed to MongoDB without any kind of pre-processing or escaping.
I was unable to find any Loopback API method which could help me make the value safe.
If, for example, the user puts JavaScript into the where filter, or adds unsupported characters (such as the null char), the app throws an exception and exits.
I'm sure I'm missing something here. What's the best way to make the value passed in filter={...} safe?
Is there a built-in API method for this?
If there isn't, are there any node modules I could use?
Thanks for the help guys!
I turned off Javascript in MongoDB and wrote a little middleware to handle escaping. This is registered in middleware.json and thus it runs before every request and escapes the values.
module.exports = function createEscaper(options) {
  return function queryEscape(req, res, next) {
    if (req.query.filter) {
      // escape various things and update the value..
    }
    next();
  };
};
But I find it really strange that neither the MongoDB connector nor Loopback itself provides any solution for this. I mean, these parameters are defined and handled in framework code. It's kinda crazy there is no built-in escaping whatsoever.
You can create a mixin which validates the JSON you receive.
For example:
module.exports = function(Model, options) {
  Model.beforeRemote('find', (ctx, instance, next) => {
    // Validate the filter object (e.g. ctx.args.filter) here
    next();
  });
};

Dynamic arg types for a python function when embedding

I am adding to Exim an embedded python interpreter. I have copied the embedded perl interface and expect python to work the same as the long-since-coded embedded perl interpreter. The goal is to allow the sysadmin to do complex functions in a powerful scripting language (i.e. python) instead of trying to use exim's standard ACL commands because it can get quite complex to do relatively simple things using the exim ACL language.
My current code as of the time of this writing is located at http://git.exim.org/users/tlyons/exim.git/blob/9b2c5e1427d3861a2154bba04ac9b1f2420908f7:/src/src/python.c . It is working properly in that it can import the sysadmin's custom python code, call functions in it, and handle the returned values (simple return types only: int, float, or string). However, it does not yet handle values that are passed to a python function, which is where my question begins.
Python seems to require that any args I pass to the embedded Python function be explicitly cast to one of int, long, double, float, or string using the C API. The problem is the sysadmin can put anything in that embedded Python code, and on the C side of things in Exim, I won't know what those variable types are. I know that Python is dynamically typed, so I was hoping to maintain that behaviour when passing values to the embedded code. But it's not working that way in my testing.
Using the following basic super-simple python code:
def dumb_add(a,b):
    return a+b
...and the calling code from my exim ACL language is:
${python {dumb_add}{800}{100}}
In my C code below, reference counting is omitted for brevity; count is the number of args I'm passing:
pArgs = PyTuple_New(count);
for (i = 0; i < count; ++i)
{
    pValue = PyString_FromString((const char *)arg[i]);
    PyTuple_SetItem(pArgs, i, pValue);
}
pReturn = PyObject_CallObject(pFunc, pArgs);
Yes, **arg is a pointer to an array of strings (two strings in this simple case). The problem is that the two values are treated as strings in the Python code, so the result of that C code executing the embedded Python is:
${python {dumb_add}{800}{100}}
800100
If I change the python to be:
def dumb_add(a,b):
    return int(a)+int(b)
Then the result of that C code executing the Python code is as expected:
${python {dumb_add}{800}{100}}
900
My goal is that I don't want to force a Python user to manually cast all of the numeric parameters they pass to an embedded Python function. Instead of PyString_FromString(), if there were a PyDynamicType_FromString(), I would be ecstatic. Exim's embedded Perl parses the args and does the casting automatically, and I was hoping for the same from the embedded Python. Can anybody suggest whether Python can do this arg parsing to provide the dynamic typing I was expecting?
Or, if I want to maintain that dynamic typing, is my only option going to be to parse each arg and guess at the type to cast it to? I was really really REALLY hoping to avoid that approach. If it comes to that, I may just document "All parameters passed are strings, so if you are actually trying to pass numbers, you must cast all parameters with int(), float(), double(), or long()". However, and there is always a comma after however, I feel that approach will sour strong Python coders on my implementation. I want to avoid that too.
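For illustration, such guessing might look like this on the Python side (a sketch only; the C equivalent would attempt the numeric conversions before falling back to PyString_FromString):

def coerce_arg(s):
    # Try int first, then float; keep the string if neither fits.
    for cast in (int, float):
        try:
            return cast(s)
        except ValueError:
            pass
    return s

coerce_arg("800") + coerce_arg("100")  # 900 rather than '800100'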
Any and all suggestions are appreciated, aside from "make your app into a python module".
The way I ended up solving this was by finding out how many args the function expected, and exiting with an error if the number of args passed to the function didn't match. Rather than trying to synthesize missing args or simply omitting extra ones, for my use case I felt it was best to enforce matching arg counts.
The args are passed to this function as an unsigned char ** arg:
int count = 0;

/* Identify and call appropriate function */
pFunc = PyObject_GetAttrString(pModule, (const char *) name);
if (pFunc && PyCallable_Check(pFunc))
{
    PyCodeObject *pFuncCode = (PyCodeObject *)PyFunction_GET_CODE(pFunc);
    /* Should not fail if pFunc succeeded, but check to be thorough */
    if (!pFuncCode)
    {
        *errstrp = string_sprintf("Can't check function arg count for %s",
                                  name);
        return NULL;
    }
    while (arg[count])
        count++;
    /* Sanity checking: calling a Python object requires stating the number
       of args being passed; bail if it doesn't match the function declaration. */
    if (count != pFuncCode->co_argcount)
    {
        *errstrp = string_sprintf("Expected %d args to %s, was passed %d",
                                  pFuncCode->co_argcount, name, count);
        return NULL;
    }
string_sprintf is a function within the Exim source code which also handles memory allocation, making life easy for me.
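For reference, co_argcount is the same value you can inspect from Python itself; a quick sketch:

def dumb_add(a, b):
    return a + b

print(dumb_add.__code__.co_argcount)  # 2, the count the C code compares against

Note that co_argcount only counts positional parameters: a function with default values can legally be called with fewer args than co_argcount, and *args is not counted at all, so the strict equality check above disallows both.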

Function returning 2 types based on input in Perl. Is this a good approach?

I have designed a function which can return 2 different types based on the input parameters.
ex: &Foo(12,"count") -> returns record count from DB for value 12
&Foo(12,"details") -> returns resultset from DB for value 12 in hash format
My question is: is this a good approach? In C# I could do it with function overloading.
Please think about what part of your code gets easier by saying
Foo(12, "count")
instead of
Foo_count(12)
The only case I can think of is when the function name ("count") itself is input data. And even then you probably want to perform some validation on that, maybe by means of a function table lookup.
Unless this is for an intermediate layer that just takes a command name and passes it on, I'd go with two separate functions.
Also, the implementation of the Foo function would look at the command name and then just split into a private function for every command anyway, right?
Additionally, you might consider making Foo return the details as well when it is called in list context:
return wantarray ? ($num, @details) : $num;

Getting UTF-8 Request Parameter Strings in mod_perl2

I'm using mod_perl2 for a website and use CGI::Apache2::Wrapper to get the request parameters for the page (e.g. post data). I've noticed that the string the $req->param("parameter") function returns is not UTF-8. If I use the string as-is, I can end up with garbled results, so I need to decode it using Encode::decode_utf8(). Is there any way to either get the parameters already decoded into UTF-8 strings or loop through the parameters and safely decode them?
To get the parameters already decoded, we would need to override the behaviour of the underlying class Apache2::Request from libapreq2, thus losing its XS speed advantage. But that is not even straightforwardly possible, as unfortunately we are sabotaged by the CGI::Apache2::Wrapper constructor:
unless (defined $r and ref($r) and ref($r) eq 'Apache2::RequestRec') {
This is wrong OO programming; it should say
… $r->isa('Apache2::RequestRec')
or perhaps forego class names altogether and just test for behaviour (… $r->can('param')).
I say, with those obstacles, it's not worth it. I recommend to keep your existing solution that decodes parameters explicitly. It's clear enough.
To loop over the request parameters, simply do not pass an argument to the param method, and you will get a list of the parameter names. This is documented (1, 2); please read more carefully.