The example below demonstrates using dask delayed functions (ref) from within postgres plpython while using "plpy.execute" (ref) to query the database.
It returns an error:
ERROR: spiexceptions.StatementTooComplex: stack depth limit exceeded
Any idea what I'm doing wrong? I'm guessing it has something to do with the delayed function's async nature and plpy.execute not liking that.
Versions:
postgresql 15
postgres's embedded python version 3.8
Example:
DO
LANGUAGE plpython3u
$$
# https://docs.dask.org/en/stable/dataframe-sql.html#delayed-functions
from dask import delayed

@delayed
def do_it():
    rv = plpy.execute("select 2 as a") # << max stack depth limit
    return 0

plpy.info(do_it().compute())
$$;
Traceback:
ERROR: spiexceptions.StatementTooComplex: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth" (currently 7168kB), after ensuring the platform's stack depth limit is adequate.
CONTEXT: Traceback (most recent call last):
PL/Python anonymous code block, line 10, in <module>
plpy.info(do_it().compute())
PL/Python anonymous code block, line 313, in compute
PL/Python anonymous code block, line 598, in compute
PL/Python anonymous code block, line 88, in get
PL/Python anonymous code block, line 510, in get_async
PL/Python anonymous code block, line 318, in reraise
PL/Python anonymous code block, line 223, in execute_task
PL/Python anonymous code block, line 118, in _execute_task
PL/Python anonymous code block, line 7, in do_it
rv = plpy.execute("select 2 as a") # << max stack depth limit
PL/Python anonymous code block
Updates:
added traceback
made more minimal
I want to manage a subprocess with the subprocess module, and I need to pipe a (really) large number of lines to the child's stdin. I'm creating the input with a generator and passing it on to the subprocess like this:
def my_gen (end): # simplified example
    for i in range(0, end):
        yield f"line {i}"

with subprocess.Popen(["command", "-o", "option_value"], # simplified example
        stdin = subprocess.PIPE, stdout = sys.stdout, stderr = sys.stderr) as process:
    for line in my_gen(1e7):
        process.stdin.write(line.encode()) # This is apparently not safe
    out, err = process.communicate() # out and err will be None,
    # but this closes the process gracefully, which "with" does too
This results in a BrokenPipeError, although it doesn't happen every time on every machine I've tried:
Traceback (most recent call last):
File "my_script", line 170, in <module>
process.stdin.write(line.encode())
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "path/tolib/python3.8/subprocess.py", line 171, in <module>
File "path/tolib/python3.8/subprocess.py", line 914, in __exit__
self.stdin.close()
BrokenPipeError: [Errno 32] Broken pipe
So, what's the safe way to pass input line by line from a generator to a subprocess?
Edit: I've been getting suggestions about using communicate, which is of course in the docs. That answers how to communicate safely, but it doesn't accept a generator as input.
Edit2: as Booboo pointed out, the example will throw a runtime error (not the one I was finding in my code), the call to range should be range(0, int(end)) so my_gen can accept numbers in 1e7 notation.
First of all, if you want stdout and stderr to not be piped, then either do not specify these arguments to the Popen call at all or specify their values as None, the default value if not specified (but do not specify these as sys.stdout and sys.stderr).
Why not? Looking at the source for the Popen.communicate method, I can see that there is special optimized code for the case where there is only one non-None argument. When that argument is the stdin argument, Popen.communicate is implemented by simply writing the passed input string to the pipe, ignoring any BrokenPipeError that might occur. But by passing the stdout and stderr arguments as you are, I suspect that communicate is confused and is now starting threads to handle the processing, and that this is ultimately what leads to your intermittent exception.
Now I believe that you can execute your writes without using communicate and also ignore the BrokenPipeError. When I tried the following code (substituting my own command being executed by Popen that writes what is being piped in to a file and using text mode), I, in fact, did not encounter any BrokenPipeError exceptions (nor do I expect to with the proper setting of stdout and stderr). So I can't swear to whether the output will still be correct if such an exception should occur.
As an aside, the range built-in function does not take a float object (at least not for me), so I don't know how you are able to specify 1e7.
I have also modified the code to add terminating newline characters at the end of each line and to process in text mode, but you should not feel constrained to do so.
import subprocess
import sys

def my_gen (end): # simplified example
    for i in range(0, end):
        yield f"line {i}\n"

with subprocess.Popen(["command", "-o", "option_value"], stdin=subprocess.PIPE, text=True) as process: # simplified example
    for line in my_gen(10_000_000):
        try:
            process.stdin.write(line)
        except BrokenPipeError as e:
            pass
    out, err = process.communicate()
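A variation on the same idea, as a minimal sketch rather than a definitive implementation: only stdin is piped, the loop stops at the first BrokenPipeError instead of trying every remaining line, and closing stdin is guarded too because it flushes buffered text. The helper name `feed_lines` is hypothetical, and `my_gen` converts its argument with `int()` so that values like 1e7 are accepted.

```python
import subprocess
import sys


def my_gen(end):
    """Yield newline-terminated lines, like the generator in the question."""
    for i in range(int(end)):
        yield f"line {i}\n"


def feed_lines(cmd, lines):
    """Stream lines to cmd's stdin and return the child's exit code.

    Hypothetical helper: only stdin is piped, so stdout and stderr are
    inherited from the parent and there is no second pipe to fill up.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True)
    try:
        for line in lines:
            proc.stdin.write(line)
    except BrokenPipeError:
        pass  # the child stopped reading; stop producing input
    finally:
        try:
            proc.stdin.close()  # flushes buffered text, so it can also raise
        except BrokenPipeError:
            pass
    return proc.wait()
```

Calling `feed_lines(["command", "-o", "option_value"], my_gen(1e7))` would then stream the lines without building them all in memory. Whether the child's output is still meaningful after a broken pipe depends on the command itself.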
Docs say to use .communicate:
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
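If the whole input fits in memory, one way to follow that advice with a generator is to join it into a single string first. This is only a sketch under that assumption: the helper name `run_with_input` is made up for illustration, and joining ten million lines costs memory that line-by-line writes would avoid.

```python
import subprocess
import sys


def run_with_input(cmd, lines):
    """Join the generated lines and let communicate() handle all pipe traffic.

    Hypothetical helper: both stdin and stdout are piped, which is the
    situation the documentation warning is about.
    """
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    out, _ = proc.communicate(input="".join(lines))
    return proc.returncode, out
```

For example, `run_with_input(["command", "-o", "option_value"], my_gen(1000))` returns the exit code together with everything the command printed.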
If I write this:
clc
clear
close all
format long
fprintf( 1, 'Starting...\n' )
function results = do_thing()
    results = 1;
end
results = do_thing()
And run it with Octave, it works correctly:
Starting...
results = 1
But if I try to run it with Matlab 2017b, it throws this error:
Error: File: testfile.m Line: 13 Column: 1
Function definitions in a script must appear at the end of the file.
Move all statements after the "do_thing" function definition to before the first local function
definition.
Then, if I fix the error as follows:
clc
clear
close all
format long
fprintf( 1, 'Starting...\n' )
results = do_thing()
function results = do_thing()
    results = 1;
end
It works correctly on Matlab:
Starting...
results =
1
But now, it stopped working with Octave:
Starting...
error: 'do_thing' undefined near line 8 column 11
error: called from
testfile at line 8 column 9
This problem was explained in this question: Run octave script file containing a function definition
How can I fix it without having to create a separate file just for the function do_thing()?
Is this issue fixed in a newer version of Matlab, such as 2019a?
The answer is in the comments, but for the sake of clarity:
% in file `do_thing.m`
function results = do_thing()
    results = 1;
end
% in your script file
clc; clear; close all; format long;
fprintf( 1, 'Starting...\n' );
results = do_thing();
Accompanying explanatory rant:
The canonical and safest way to define functions is to define them in their own file, and make this file accessible in octave / matlab's path.
Octave has supported 'dynamic' function definitions (i.e. in the context of a script or the command-line) since practically forever. However, for the purposes of compatibility, since matlab did not support this, most people did not use it, and quite sensibly relied on the canonical way instead.
Matlab has recently finally introduced dynamic function definitions too, but has opted to implement them explicitly in a way that breaks compatibility with octave, as you describe above. (rant: this may be a coincidence and an earnest design decision, but I do note that it also happens to go against prior matlab conventions regarding nested functions, which were allowed to be defined anywhere within their enclosing scope).
In a sense, nothing has changed. Matlab was incompatible with advanced octave functionality, and now that it has introduced its own implementation of this functionality, it is still incompatible. This is a blessing in disguise. Why? Because, if you want intercompatible code, you should rely on the canonical form and good programming practices, which is what you should be doing anyway, instead of littering your scripts with dynamic functions.
Octave's implementation of local functions in scripts is different from Matlab's. Octave requires that local functions in scripts be defined before their use. But Matlab requires that local functions in scripts all be defined at the end of the file.
So you can use local functions in scripts on both applications, but you can't write a script that will work on both. So just use functions if you want code that will work on both Matlab and Octave.
Examples:
Functions at end
disp('Hello world')
foo(42);
function foo(x)
    disp(x);
end
In Matlab R2019a:
>> myscript
Hello world
42
In Octave 5.1.0:
octave:1> myscript
Hello world
error: 'foo' undefined near line 2 column 1
error: called from
myscript at line 2 column 1
Functions before use
disp('Hello world')
function foo(x)
    disp(x);
end
foo(42);
In Matlab R2019a:
>> myscript
Error: File: myscript.m Line: 7 Column: 1
Function definitions in a script must appear at the end of the file.
Move all statements after the "foo" function definition to before the first local function definition.
In Octave 5.1.0:
octave:2> myscript
Hello world
42
How it works
Note that technically the functions here in Octave are not "local functions", but "command-line functions". Instead of defining a function that is local to the script, they define global functions that come into existence when the function statement is evaluated.
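For readers coming from Python, this resembles how a Python `def` statement behaves: the name only exists once the statement has been executed. The sketch below is an analogy only, not a description of Octave's internals, and the helper `run_script` is made up for illustration.

```python
def run_script(source):
    """Execute source as a fresh script; return "ok" or the error's class name."""
    try:
        exec(source, {})  # fresh namespace, like starting a new session
        return "ok"
    except NameError as exc:
        return type(exc).__name__


# Calling before the definition has been executed fails, much like Octave
# reporting that 'foo' is undefined when the function sits at the end.
call_first = "foo(42)\ndef foo(x):\n    return x\n"

# Defining first works, mirroring the "functions before use" layout.
define_first = "def foo(x):\n    return x\nfoo(42)\n"
```

Here `run_script(call_first)` reports a `NameError`, while `run_script(define_first)` runs cleanly.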
The following code works on both Matlab and Octave:
if exist('do_nothing') == 0
    disp('function not yet defined, run script again')
else
    do_nothing
end
%====
function results = do_nothing()
    results = 1;
end
When run on Octave, the first attempt exits with the message, but subsequent attempts succeed. On Matlab, it works the first time. While this works on both platforms, it is less than ideal, since it requires that much of the script code be placed inside an "if" statement block.
This is a simple Matlab code that I'm trying to execute.
function result = scale(img, value)
    result = value .* img;
end
dolphin = imread('dolphin.png')
imshow(scale(dolphin, 1.5));
The error says:
Error: File: scale.m Line: 5 Column: 1
This statement is not inside any function.
(It follows the END that terminates the definition of the function "scale".)
What am I doing wrong here?
scale.m is a function M-file because it begins with the keyword function. The part up to end is the definition of the function. When you call scale at the MATLAB command line, it executes the code in the function. The stuff that comes after end is not part of the function, and hence cannot be executed.
If you intended to write a script with a private function scale that you want to use only within this script, then put the lines of code that read and display dolphin at the top of the file. The private functions should come after the script part. This syntax has been supported since MATLAB R2016b.
Otherwise, move the dolphin code to a different M-file, which would be a simple script M-file without any function definitions. This script can then use scale, which would call the function in the file scale.m.
A third alternative, keeping all code in the same file, is to not use a script at all, and put the script code inside a function:
function f % just a random name
    dolphin = imread('dolphin.png')
    imshow(scale(dolphin, 1.5));
end

function result = scale(img, value)
    result = value .* img;
end
(The function name doesn't need to match the file name, although the MATLAB editor will warn you if these names don't match.)
I'm trying to get started with Matlab / Octave and having a difficult time figuring out how to organize a program into functions. Currently I'm trying to write a simple program that adds two numbers together and displays the result, with the adding being done by a function. I would have expected this to work:
% test.m
close all;
clear all;
num1 = 2;
num2 = 2;
result = myAdd(num1, num2);
disp(result); % this should display 4 ??
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function retval = myAdd(var1, var2)
    retval = var1 + var2;
end
Running the above with Octave 4.0.0, I get the following errors:
error: 'myAdd' undefined near line 7 column 10
error: called from
test at line 7 column 8
I have also tried putting the function first and the test part second, and putting the function in a separate file with a main.m file in the same directory calling the myAdd function. All of these result in errors.
So here are my questions:
-Does Matlab / Octave have a main equivalent ??
-How does the interpreter know where to start? Does it automatically go to the first line in the program, or is there a certain function name you can use to make it start with that function as function main() is in C/C++ ??
-In a Python program of significant size, my usual practice is to organize things as follows:
# some_python_program.py

import abc
import xyz

###################################################################################################
def main():
    # stuff to get program started here
    pass
# end main

###################################################################################################
def function1():
    # specific function here
    pass
# end function

###################################################################################################
def function2():
    # specific function here
    pass
# end function

###################################################################################################
if __name__ == "__main__":
    main()
Is there a way to do the equivalent in Matlab/Octave ??
If somebody could provide some direction as to a main equivalent and/or how to organize functions in Matlab/Octave please advise, thanks.
Matlab/Octave can be a bit confusing in this way if you're coming from a language like python. In order to define a function (without using anonymous functions), you need to create a separate file with the name of that function, which can then be called using the command line.
For example, you would like to create a function called myadd. You should create a file named myadd.m whose contents will be:
function out = myadd(a,b)
    out = a+b;
end
Then, as long as your file is on your path (save it to your MATLAB folder or put it in your current working directory), you can call it from the Command Window as follows:
>> myadd(5,6)
ans =
11
Only one function will be made publicly available per file (the one whose name matches the file name). However, you can still define additional functions in the same file if they are only meant to be used from within that file. For example, if you have a file named foo.m, you can do the following:
function out = foo(a,b)
    out = fun(a,b);
end

function out = fun(a,b)
    out = a * b;
end
This will allow you to call foo(5,6) from the Command Window, but fun(5,6) will result in an error: Undefined function or variable 'fun'.
Read more about local functions and nested functions.
Hope this is helpful!
I want to write results that are produced in a recursive subroutine to a file. I also want to read the data back from that file into an array in my main program, in Fortran 90.
program permutations
  implicit none
  call generate (position_min)
  open(unit=20, file="a.dat", status="old")
  do i=1,720
    read(20,*)(G(i,j),j=1,6)
  end do
contains
  recursive subroutine generate (position)
    implicit none
    integer, intent (in) :: position
    integer :: value
    if (position > position_max) then
      open(unit=20, file="a.dat", status="unknown")
      write (20, *) permutation
    else
      call generate(position+1)
    end if
  end subroutine generate
end program permutations
This program gives me the following runtime error.
At line 19 of file p2.f90 (unit = 20, file = 'a.dat')
Fortran runtime error: End of file
How do I fix this?
I think the answer is primarily my comment to the question. If you look at your code (neglecting the undeclared variable issue), in particular the if-statement of the recursive subroutine, you should note that you have
if (position > position_max) then
  open(unit=20, file="a.dat", status="unknown")
  write (20, *) permutation
else
  call generate(position+1)
end if
that is, you are only writing to file if position > position_max. Satisfying this condition writes one line to a.dat and then completes all of the previous if statements. What you probably meant to have was it writing to file each time through the recursive loop; to do that, you would want something like
open(20,file="a.dat",status="unknown")
write(20,*) permutation
close(20)
if(position > position_max) then
  return
else
  call generate(position+1)
endif
In running this, I found I was getting 2 extra lines (due to writing at position=position_min and at position=position_max). You probably could tweak that to get exactly 720, but I think that this part is irrelevant because you can change your read loop to the following
i=1
do
  read(20,*,iostat=ierr) G(i,:)
  if(ierr/=0) exit
  i = i+1
enddo
A normal read returns an iostat of 0 and an end-of-file returns a negative value (-1 with gfortran), so the loop continues as long as reads succeed and exits when the EOF is found (or when any other read error occurs).
After fixing up the undeclared variables, adding the close(20) statement, and adjusting as I commented above, I had no problems writing and reading in the recursive subroutine.
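For comparison, the same write-then-read flow can be sketched in Python. This is not a translation of the Fortran program: the original snippet does not show how `permutation` is filled, so the loop over unused values below is an assumption, and `write_then_read` is a made-up helper. It opens the file once, writes one line per completed permutation, closes the file, and only then reads until end-of-file.

```python
def generate(position, position_max, permutation, used, out):
    """Fill permutation from `position` onward; write each completed one."""
    if position > position_max:
        out.write(" ".join(map(str, permutation)) + "\n")
        return
    # Assumed filling rule: try every value not used at an earlier position.
    for value in range(1, position_max + 1):
        if not used[value]:
            used[value] = True
            permutation[position - 1] = value
            generate(position + 1, position_max, permutation, used, out)
            used[value] = False


def write_then_read(path, n=6):
    """Write all permutations of 1..n to path, then read them back as rows."""
    with open(path, "w") as out:  # opened once, closed before any reading
        generate(1, n, [0] * n, [False] * (n + 1), out)
    with open(path) as src:  # iteration stops at end-of-file
        return [[int(tok) for tok in line.split()] for line in src]
```

With `n=6` this produces the 720 rows of 6 values that the read loop in the question expects.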