Is it possible to have a transaction-wide global variable in PostgreSQL?

The situation is this: I have a function F1 that writes into a buffer, and the buffer is written to external files when F1's fcinfo->flinfo->fn_mcxt is released. I also have a function F2 that depends on those external files, so when F2 runs, I want to make sure that every existing F1 buffer (in this transaction) has already been written out to the external files. The two functions are independent, except when they are used together.
As a result, I want that buffer to be a global variable within the transaction, so F2 can check whether it is empty; if it is not empty, F2 can write it out manually.

Since PostgreSQL uses multiprocessing, one backend cannot see the global variables in another backend. You could write a _PG_init function that creates a shared memory segment for that purpose (see pg_stat_statements). That requires that your library is added to shared_preload_libraries.
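For illustration, a minimal (untested) sketch of that pattern, loosely following pg_stat_statements — the names BufferSharedState, buffer_state and buffer_shmem_startup are made up for this example, and on PostgreSQL 15+ the RequestAddinShmemSpace call belongs in a shmem_request_hook rather than _PG_init:
#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

typedef struct BufferSharedState
{
    bool buffer_dirty;   /* has F1 written data that is not yet flushed? */
} BufferSharedState;

static BufferSharedState *buffer_state = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

static void
buffer_shmem_startup(void)
{
    bool found;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    /* attach to (or create) the shared struct; real code would also set up an LWLock */
    buffer_state = ShmemInitStruct("buffer_shared_state",
                                   sizeof(BufferSharedState), &found);
    if (!found)
        buffer_state->buffer_dirty = false;
}

void
_PG_init(void)
{
    /* only works when the library is loaded via shared_preload_libraries */
    if (!process_shared_preload_libraries_in_progress)
        return;

    RequestAddinShmemSpace(sizeof(BufferSharedState));
    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = buffer_shmem_startup;
}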
A simpler alternative might be to use the LISTEN / NOTIFY facility of PostgreSQL to synchronize different backends.
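For the LISTEN / NOTIFY route, a rough sketch (the channel name f1_buffer_flushed is made up): the backend running F1 issues a NOTIFY once its buffer has been written to the external files, and the backend that wants to run F2 LISTENs on that channel first. Note that notifications are only delivered when the notifying transaction commits.
-- in the session that will run F2:
LISTEN f1_buffer_flushed;

-- in the session that runs F1, after the buffer has been written out:
NOTIFY f1_buffer_flushed, 'files written';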


how to dump signals inside a task or function

I tried to use debug_access=all on the VCS command line, but it seems I still can't dump the signals declared inside the task(). Are there any args I need to use?
AFAIK, tools do not yet allow you to dump variables with automatic lifetimes. This is because they come in and out of existence. Also, because of re-entrant behavior from threads or recursion, there might be multiple instances of the same named variable.
If these signals are inside a class method, you might be able to move them outside and make them class members. Otherwise you should be able to declare them as static variables as long as there is no re-entrant behavior.
dave_59 is correct; there's no way to do this. For tasks, at least, you can drive signals that are declared elsewhere from inside the task, and you'll be able to monitor those signals. For functions, I don't believe it is possible to modify external signals from inside a function; all inputs/outputs must be declared for a function.
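A rough SystemVerilog sketch of that workaround (tb, dbg_sig and drive_dbg are made-up names): declare the signal at module scope so the dumper can see it, and have the task drive it.
module tb;
  logic [7:0] dbg_sig;                        // declared outside the task, so it can be dumped

  task automatic drive_dbg(input logic [7:0] value);
    dbg_sig = value;                          // the task drives the externally declared signal
  endtask

  initial begin
    $dumpfile("dump.vcd");
    $dumpvars(0, tb);
    drive_dbg(8'hA5);
    #10 $finish;
  end
endmodule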

Understanding higher-level calls to system calls

I am going through the book by Galvin on OS. There is a section at the end of chapter 2 where the author writes about "adding a system call" to the kernel.
He describes how, using asmlinkage, we can create a file containing a function and make it qualify as a system call. But in the next part, about how to call the system call, he writes the following:
"Unfortunately, these are low-level operations that cannot be performed using C language statements and instead require assembly instructions. Fortunately, Linux provides macros for instantiating wrapper functions that contain the appropriate assembly instructions. For instance, the following C program uses the _syscall0() macro to invoke the newly defined system call:
Basically, I want to understand how the syscall() function generally works. Now, what I understand by macros is that they are a system for text substitution.
(Please correct me if I am wrong.)
How does a macro call an assembly language instruction?
Is it that syscall0(), when compiled, is translated into the address (opcode) of the instruction that executes a trap? (But this somehow doesn't fit with the concept or definition of macros that I have.)
What exactly are the wrapper functions that are contained inside, and are they also written in assembly language?
Suppose I want to create a function of my own which performs the system call; what are the things that I need to do? Do I need to compile it to generate the machine code for performing trap instructions?
Man, you have to pay $156 to buy the thing, then you actually have to read it. You could probably get a VMS Internals and Data Structures book for under $30.
That said, let me try to translate that gibberish into English.
System calls do not use the same kind of linkage (i.e. method of passing parameters and calling functions) that other functions use.
Rather than executing a call instruction of some kind, to execute a system service, you trigger an exception (which in Intel is bizarrely called an interrupt).
The CPU expects the operating system to create a DISPATCH TABLE and store its location and size in a special hardware register (or registers). The dispatch table is an array of pointers to handlers for exceptions and interrupts.
Exceptions and interrupts have numbers, so when exception or interrupt number 1 occurs, the CPU invokes the second entry in the dispatch table (index 1, not index 0) in kernel mode.
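Conceptually (illustrative C only, not how any particular kernel actually declares it), the dispatch table is just an array of function pointers indexed by exception/interrupt number:
/* illustrative only: an array of handler pointers indexed by exception number */
#define NUM_VECTORS 256

void (*dispatch_table[NUM_VECTORS])(void);

/* the CPU is told where this table lives via a special register
   (on x86 that register is the IDTR, loaded with the lidt instruction) */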
What exactly are the wrapper functions that are contained inside, and are they also written in assembly language?
The operating system usually devotes one (but sometimes more) exception to system services. You need to do something like this in assembly language to invoke a system service:
int 0x80    ; explicitly trigger exception 0x80
Because you have to execute a specific instruction, this has to be done in assembly language. Maybe your C compiler can do inline assembly to call a system service like that. But even if it could, it would be a royal PITA to have to do it each time you wanted to call a system service.
Plus I have not filled in all the details here (only the actual call to the system service). Normally, when you call functions in C (or whatever), the arguments are pushed on the program stack. Because the stack usually changes when you enter kernel mode, arguments to system calls need to be stored in registers.
PLUS you need to identify what system service you want to execute. Usually, system services have numbers. The number of the system service is loaded into the first register (e.g., R0 or AX).
The full process when you need to invoke a system service is:
Save the registers you are going to overwrite on the stack.
Load the arguments you want to pass to the system service into hardware registers.
Load the number of the system service into the lowest register.
Trigger the exception to enter kernel mode.
Unload the arguments returned by the system service from registers.
Possibly do some error checking.
Restore the registers you saved before.
Instead of doing this each time you call a system service, operating systems provide wrapper functions for high level languages to use. You call the wrapper as you would normally call a function. The wrapper (in assembly language) does the steps above for you.
Because these wrappers are pretty much the same (usually the only difference is the result of different numbers of arguments), wrappers can be created using macros. Some assemblers have powerful macro facilities that allow a single macro to define all wrappers, even with different numbers of arguments.
Linux provides multiple _syscall C macros that create wrappers. There is one for each number of arguments. Note that these macros are just for operating system developers. Once the wrapper is there, everyone can use it.
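For a rough idea of what such a generated wrapper boils down to, here is a hedged hand-written equivalent for the 32-bit x86 Linux write call (call number 4), using GCC inline assembly. This is a sketch, not the code the real _syscall macros or glibc produce, and it assumes a 32-bit build (e.g. compiling with -m32):
#include <stddef.h>

static long my_write(int fd, const void *buf, size_t count)
{
    long result;

    /* eax = system call number (4 is write on 32-bit x86 Linux),
       ebx/ecx/edx = arguments; int $0x80 traps into the kernel,
       and the return value (or -errno) comes back in eax */
    __asm__ volatile ("int $0x80"
                      : "=a" (result)
                      : "0" (4), "b" (fd), "c" (buf), "d" (count)
                      : "memory");
    return result;
}

int main(void)
{
    my_write(1, "hello\n", 6);
    return 0;
}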
How does a macro call an assembly language instruction?
These _syscall macros have to generate inline assembly code.
Finally, note that these wrappers do not define the actual system service. That has to be set up in the dispatch table and the system service exception handler.

Adding Multiple Values to a Variable in MATLAB

I have to work with a lot of data and run the same MATLAB program more than once, and every time the program is run it will store the data in the same preset variables. The problem is, every time the program is run the values are overwritten and replaced, most likely because all the variables are type double and are not a matrix. I know how to make a variable that can store multiple values in a program, but only when the program is run once.
This is the code I am able to provide:
volED = reconstructVolume(maskAlignedED1,maskAlignedED2,maskAlignedED3,res)
volMean = (volED1+volED2+volES3)/3
strokeVol = volED-volES
EF = strokeVol/volED*100
The program I am running depends on a ton more MATLAB files that I cannot provide at this moment, however I believe the double variables strokeVol and EF are created at this instant. How do I create a variable that will store multiple values and keep adding the values every time the program is run?
The reason your variables are "overwritten" with each run is that every function (or standalone program) has its own workspace where the local variables are located, and these local variables cease to exist when the function (or standalone program) returns/terminates. In order to preserve the value of a variable, you have to return it from your function. Since MATLAB passes its variables by value (rather than reference), you have to explicitly provide a vector (or more generally, an array) as input and output from your function if you want to have a cumulative set of data in your calling workspace. But it all depends on whether you have a function or a deployed program.
Assuming your program is a function
If your function is now declared as something like
function strokefraction(inputvars)
you can change its definition to
function [EFvec]=strokefraction(inputvars,EFvec)
%... code here ...
%volES initialized somewhere
volED = reconstructVolume(maskAlignedED1,maskAlignedED2,maskAlignedED3,res);
volMean = (volED1+volED2+volES3)/3;
strokeVol = volED-volES;
EF = strokeVol/volED*100;
EFvec = [EFvec; EF]; %add EF to output (column) vector
Note that it's legal to have the same name for an input and an output variable. Now, when you call your function (from MATLAB or from another function) each time, you add the vector to its call, like this:
EFvec=[]; %initialize with empty vector
for k=1:ndata %simulate several calls
inputvar=inputvarvector(k); %meaning that the input changes
EFvec=strokefraction(inputvar,EFvec);
end
and you will see that the size of EFvec grows from call to call, saving the output from each run. If you want to save several variables or arrays, do the same (for arrays, you can always introduce an input/output array with one more dimension for this purpose, but you probably have to use explicit indexing instead of just shoving the next EF value to the bottom of your vector).
Note that if your input/output array eventually grows large, then it will cost you a lot of time to keep allocating the necessary memory by small chunks. You could then choose to allocate the EFvec (or equivalent) array instead of initializing it to [], and introduce a counter variable telling you where to overwrite the next data points.
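A rough sketch of that preallocation idea, reusing the names from the loop above (nmax is an assumed upper bound on the number of runs):
nmax  = 10000;            % assumed upper bound on the number of calls
EFvec = nan(nmax, 1);     % allocate once
count = 0;                % counter telling you where to write next
for k = 1:ndata
    inputvar = inputvarvector(k);
    count = count + 1;
    EFvec(count) = strokefraction(inputvar, []);  % single-run result goes into the next slot
end
EFvec = EFvec(1:count);   % trim the unused tail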
Disclaimer: what I said about the workspace of functions is only true for local variables. You could also define a global EFvec in your function and on your workspace, and then you don't have to pass it in and out of the function. As I haven't yet seen a problem which actually needed the use of global variables, I would avoid this option. Then you also have persistent variables, which are basically globals with their scope limited to their own workspace (run help global and help persistent in MATLAB if you'd like to know more, these help pages are surprisingly informative compared to usual help entries).
Assuming your program is a standalone (deployed) program
While I don't have any experience with standalone MATLAB programs, it seems to me that it would be hard to do what you want for that. A MathWorks Support answer suggests that you can pass variables to standalone programs, but only as you would pass to a shell script. By this I mean that you have to pass filenames or explicit numbers (but this makes sense, as there is no MATLAB workspace in the first place). This implies that in order to keep a cumulative set of output from your program you would probably have to store those in a file. This might not be so painful: opening a file to append the next set of data is straightforward (I don't know about issues such as efficiency, and anyway this all depends on how much data and how many runs of your function we're talking about).
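For the file-based approach, appending one line per run could look roughly like this (EF_results.txt is a made-up filename):
fid = fopen('EF_results.txt', 'a');   % 'a' = open for appending, create if missing
fprintf(fid, '%g\n', EF);             % write this run's value on its own line
fclose(fid);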

Loading Variables into Functions from Structs in Matlab

Say I have a project which is comprised of:
A main script that handles all of the running of my simulation
Several smaller functions
A couple of structs containing the data
Within the script I will be accessing the functions many times within for loops (some over a thousand times within the minute-long simulation). Each function also looks for data contained within struct files as part of its calculations; these are usually parameters that are fixed over the course of the simulation but need to be varied manually between runs to observe the effects.
As these functions typically form the bulk of the runtime, I'm trying to save time: my simulation can't quite run in real time as it stands (the ultimate goal), and I lose a lot of time passing variables/parameters around functions. So I've had three ideas to try to do this:
Load the structs in the main simulation, then pass each variable in turn to the function as part of a large argument list (the current solution).
Load the structs every time the function is called.
Define the structs as global variables.
In terms of both the efficiency of the system (most relevant as the project develops) and, as I'm no expert programmer, from a "good practice" perspective, what is the best solution for this? Is there another option that I have not considered?
As mentioned above in the comments, the 1st option is the best one.
Have you used the profiler to find out where your code takes most of its time?
profile on
% run your code
profile viewer
Note: if you are modifying your input struct in your child functions, this will take more time; but if you are just referencing it, that should not be a problem.
Matlab does what's known as a "lazy copy" when passing arguments between functions. This means that it passes a pointer to the data to the function, rather than creating a new instance of that data, which is very efficient memory- and speed-wise. However, if you make any alteration to that data inside the subroutine, then it has to make a new instance of that argument so as to not overwrite the argument's value in the main function. Your response to matlabgui indicates you're doing just that. So, the subroutine may be making an entire new struct every time it's called, even though it's only modifying a small part of that struct's values.
If your subroutine is altering a small part of the array, then your best bet is to just pass that small part to it, then assign your outputs. For instance,
modified_array = somesubroutine(mystruct.original_array);
mystruct.original_array = modified_array;
You can also do this in just one line. Conceptually, the less data you pass to the subroutine, the smaller the memory footprint is. I'd also recommend reading up on in-place operations, as it relates to this.
Also, as a general rule, don't use global variables in Matlab. I have not personally experienced, nor read of an instance in which they were genuinely faster.

hosting simple python scripts in a container to handle concurrency, configuration, caching, etc

My first real-world Python project is to write a simple framework (or re-use/adapt an existing one) which can wrap small python scripts (which are used to gather custom data for a monitoring tool) with a "container" to handle boilerplate tasks like:
fetching a script's configuration from a file (and keeping that info up to date if the file changes and handle decryption of sensitive config data)
running multiple instances of the same script in different threads instead of spinning up a new process for each one
exposing an API for caching expensive data and storing persistent state from one script invocation to the next
Today, script authors must handle the issues above, which usually means that most script authors don't handle them correctly, causing bugs and performance problems. In addition to avoiding bugs, we want a solution which lowers the bar to create and maintain scripts, especially given that many script authors may not be trained programmers.
Below are examples of the API I've been thinking of, and which I'm looking to get your feedback about.
A scripter would need to build a single method which takes (as input) the configuration that the script needs to do its job, and either returns a python object or calls a method to stream back data in chunks. Optionally, a scripter could supply methods to handle startup and/or shutdown tasks.
HTTP-fetching script example (in pseudocode, omitting the actual data-fetching details to focus on the container's API):
def run(config, context, cache):
    results = http_library_call(config.url, config.http_method, config.username, config.password, ...)
    return {"html": results.html, "status_code": results.status, "headers": results.response_headers}

def init(config, context, cache):
    config.max_threads = 20                  # up to 20 URLs at one time (per process)
    config.max_processes = 3                 # launch up to 3 concurrent processes
    config.keepalive = 1200                  # keep process alive for 20 mins without another call
    config.process_recycle.requests = 1000   # restart the process every 1000 requests (to avoid leaks)
    config.kill_timeout = 600                # kill the process if any call lasts longer than 10 minutes
A database-data fetching script example might look like this (in pseudocode):
def run(config, context, cache):
    expensive = context.cache["something_expensive"]
    for record in db_library_call(expensive, context.checkpoint, config.connection_string):
        context.log(record, "logDate")   # log all properties, optionally specify name of timestamp property
        last_date = record["logDate"]
    context.checkpoint = last_date       # persistent checkpoint, used next time through

def init(config, context, cache):
    cache["something_expensive"] = get_expensive_thing()

def shutdown(config, context, cache):
    expensive = cache["something_expensive"]
    expensive.release_me()
Is this API appropriately "pythonic", or are there things I should do to make this more natural to the Python scripter? (I'm more familiar with building C++/C#/Java APIs so I suspect I'm missing useful Python idioms.)
Specific questions:
is it natural to pass a "config" object into a method and ask the callee to set various configuration options? Or is there another preferred way to do this?
when a callee needs to stream data back to its caller, is a method like context.log() (see above) appropriate, or should I be using yield instead? (yield seems natural, but I worry it'd be over the head of most scripters)
My approach requires scripts to define functions with predefined names (e.g. "run", "init", "shutdown"). Is this a good way to do it? If not, what other mechanism would be more natural?
I'm passing the same config, context, cache parameters into every method. Would it be better to use a single "context" parameter instead? Would it be better to use global variables instead?
Finally, are there existing libraries you'd recommend to make this kind of simple "script-running container" easier to write?
Have a look at SQLAlchemy for dealing with database stuff in Python. Also, to make script writing easier when dealing with concurrency, look into Stackless Python.
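Regarding the predefined-names question: a minimal sketch of how the container side could look, using importlib to load a scripter's module and calling init/run/shutdown if they exist (the module and function names here are assumptions for illustration, not an existing library):
import importlib

def load_script(module_name):
    """Import a scripter's module and verify it defines a callable run()."""
    module = importlib.import_module(module_name)
    if not callable(getattr(module, "run", None)):
        raise TypeError("%s must define a callable run(config, context, cache)" % module_name)
    return module

def invoke(module, config, context, cache):
    """Call init/run/shutdown by their predefined names, if present."""
    init = getattr(module, "init", None)
    if callable(init):
        init(config, context, cache)
    try:
        return module.run(config, context, cache)
    finally:
        shutdown = getattr(module, "shutdown", None)
        if callable(shutdown):
            shutdown(config, context, cache)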