Rasa: training is too slow (chatbot)

I have an RTX 3090 GPU and a 12th-gen i9 processor. My training data is not very large, and yet training takes too long. When training starts, it says 24 cores are available but that it is limiting itself to a safe limit of only 8 cores because NUMEXPR_MAX_THREADS is not set.

Set the NUMEXPR_MAX_THREADS environment variable in your terminal:
export NUMEXPR_MAX_THREADS="24"
if you want to use all 24 cores. This setting only lasts until you close the terminal; to make it permanent, add the line to your shell profile (~/.bash_profile, ~/.zshrc, ...).
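Alternatively, you can set the variable from Python before numexpr is loaded (a minimal sketch; the value 24 matches the core count reported above):

import os
os.environ["NUMEXPR_MAX_THREADS"] = "24"  # must be set before numexpr is imported
import numexpr  # numexpr reads NUMEXPR_MAX_THREADS at import time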
Regarding slow training in general, that depends on your Rasa config choices and on the number of stories/rules.
Finally, pass the parameter use_gpu = True for TEDPolicy in your config to make TED train faster, as in the sketch below.
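A minimal config.yml sketch, assuming the use_gpu parameter mentioned above exists in your Rasa version (other policies omitted):

policies:
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    use_gpu: True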

Related

Stop execution when RAM is filled (i.e. avoid writing to Disk)

I have this problem:
I run some large calculations before going to sleep (or work).
When I return, sometimes RAM is already full and the program has started writing to disk, which is a problem because the computer then becomes almost unresponsive; also, the "Interrupt the current operation" button doesn't stop mserver.exe from executing a task.
This is what I saw 10 minutes after I pressed "Interrupt the current operation" (screenshot not shown).
Not to mention that calculations are probably 100 or even 1000 times slower once it starts using the disk instead of RAM (so it's pointless anyway).
Another problem is that I was unable to save some variables to a file: in Maple I couldn't type anything while mserver.exe was executing a task, and after I killed mserver.exe I still couldn't save those variables, because Maple commands don't work once the connection to the kernel is lost.
So, my question: can I make mserver.exe avoid using the disk at all (from Maple alone, I mean, not by disabling the page file in Windows) and simply stop execution automatically when RAM is full (just like Classic Maple does when it hits its 2 GB limit)?
It would also be nice to be able to limit Maple's processor usage, for example to 75% or so, so that I could keep working on the computer without problems.
You might experiment with a few of the options available for specifying limits on the Maple (kernel, mserver) engine.
In particular,
--init-reserve-mem=memorysize
(or, possibly, the -T option). See here for more detail:
https://www.maplesoft.com/support/help/MapleSim/view.aspx?path=maple
On Linux/OSX you could pass that in a call to the maple script that launches Maple. On MS-Windows you could add that to the command string/Property in the launcher (icon).
You might try setting it to a fraction of your total RAM, e.g. 50-75%, and see how it goes. Presumably you'll have some other processes running.
As far as restricting CPU use goes, that's more of an OS issue. On Linux/OSX you could use the system's nice facility. I don't know what's available on MS-Windows (built-in or 3rd party); you might be able to set the priority of the running mserver process from the Task Manager, or you might look at something like the START facility:
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/start
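For example (a sketch; the priority levels are just illustrations, and the maple / maplew.exe launchers are assumed to be on your PATH):

Linux/OSX:   nice -n 10 maple
MS-Windows:  start /low maplew.exe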

Pytesseract is too slow: high disk I/O

I'm creating a bot for a video game. Everything is working well (thanks to some Stack Overflow members), but pytesseract's response time is too high.
I have to read a picture of this kind every second (after editing it to turn it into black on white, a very quick process that takes no time).
What I'm doing is dividing the picture into 9 sub-images, one for each line of text, and then calling pytesseract.image_to_string(img) on each.
This process takes about 3 seconds, and I think it can be faster, given that the text is short.
I noticed high disk I/O in Process Hacker (screenshot not shown).
One last thing: I have the feeling that it's a bit better when executing the Python script as administrator, but I'm not sure, and it's not enough.
Do you have a solution that I can implement to make it faster?
You need to use the tesseract API instead of pytesseract: pytesseract initializes tesseract (e.g. reads the traineddata) each time you run OCR, and it also writes the image to disk and reads the OCR result back from disk. For an example, have a look at https://github.com/zdenop/SimpleTesseractPythonWrapper/blob/master/SimpleTesseractPythonWrapper.ipynb
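For instance, with the tesserocr package (a different wrapper than the one linked above; a minimal sketch assuming the nine cropped line images are PIL images in a list called line_images):

from tesserocr import PyTessBaseAPI, PSM

# Initialize tesseract once, outside the once-per-second loop,
# instead of paying the startup cost on every call.
api = PyTessBaseAPI(psm=PSM.SINGLE_LINE)  # each sub-image holds a single line of text
try:
    for line_img in line_images:   # the nine cropped PIL images
        api.SetImage(line_img)     # passed in memory: no temp files on disk
        print(api.GetUTF8Text().strip())
finally:
    api.End()  # release the tesseract handle when the bot shuts down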

MATLAB: how to get the specs of the host machine

I have a MATLAB program that I intend to run on different machines. Is there a way to get, from within MATLAB itself, the following info:
Name of machine
Specs of machine, especially processor and memory configuration
Number of cores deployed for MATLAB
I know the command computer but I require more than what it outputs. I'd like to write all the info above to a text file.
You are looking for the following:
1) To check the type of computer on which MATLAB is executing, use: computer.
2) The following displays information about your Windows version:
winqueryreg('HKEY_LOCAL_MACHINE',...
'Software\Microsoft\Windows NT\CurrentVersion','ProductName')
or in general, to get information about the OS, use: feature('GetOS').
3) To check number of processors, use: getenv('NUMBER_OF_PROCESSORS').
4) To check CPU information, use: feature('GetCPU').
5) To get information about cores, use: feature('numCores').
6) To check memory used by MATLAB, total physical memory and some other information, use: memory.
Note that some of the above features are undocumented and are taken from Yair Altman's blog.
Finally, to write the data to a text file, you can use fprintf, as in the sketch below.
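Putting it all together, a minimal sketch that writes the information to a text file (the filename specs.txt is just an example; the feature calls are the undocumented ones listed above):

fid = fopen('specs.txt', 'w');
[~, hostname] = system('hostname');                 % name of the machine
fprintf(fid, 'Machine name: %s', hostname);         % hostname already ends with a newline
fprintf(fid, 'Architecture: %s\n', computer);
fprintf(fid, 'OS: %s\n', feature('GetOS'));
fprintf(fid, 'CPU: %s\n', feature('GetCPU'));
fprintf(fid, 'Logical processors: %s\n', getenv('NUMBER_OF_PROCESSORS'));  % Windows env var
fprintf(fid, 'Cores used by MATLAB: %d\n', feature('numCores'));
m = memory;                                         % memory is Windows-only
fprintf(fid, 'Memory used by MATLAB: %g bytes\n', m.MemUsedMATLAB);
fclose(fid);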

Ever-increasing memory usage in NetLogo headless BehaviorSpace

I'm trying to run a NetLogo model in BehaviorSpace, in headless mode, on a Linux server.
My NetLogo version is 5.3.1 (the 64-bit version).
The server has 32 cores and 64 GB of RAM.
I'm setting Xmx to 3072m.
After a few runs (~300) the memory usage is so high that I get a Java heap space error.
Surprisingly, the memory usage grows steadily, as if there were no flush-like function called between runs, and it gets to a point it shouldn't reach if I understand things correctly (for example, with 15 parallel threads it reaches 64000 MB and beyond, when it should stay around 15 * 3072 = 46080 MB).
I'm using ca at setup, so I thought everything was supposed to be flushed out between runs. I'm not opening any files from the code (I use the standard BehaviorSpace output, in table format, not spreadsheet), and I'm not using any extensions.
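For reference, I launch the runs with the standard headless invocation, something like this (paths, model and experiment names are illustrative):

java -Xmx3072m -Dfile.encoding=UTF-8 -cp NetLogo.jar \
  org.nlogo.headless.Main \
  --model myModel.nlogo \
  --experiment myExperiment \
  --threads 15 \
  --table results.csv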
I'm kind of puzzled here. Is there something in the BehaviorSpace-specific parameterization that keeps track of variables, turtles, etc. between runs? I couldn't find such a thing.
Could someone help me ?
Thanks a lot !
Thomas

To get debug outputs on client - spmd

I am running parallelized code, courtesy of the MATLAB Parallel Computing Toolbox, using the spmd command. Specifically, the code looks like this:
spmd
    out = myfunction(data, labindex);  % myfunction stands in for the actual training function
end
Now the function involves a library (libsvm) which gives me a trained classifier for each iteration. During training, several debug messages are printed to standard output by the library, and somehow these do not appear in my terminal. I think this is because the workers are actually on a cluster, and hence the debug messages are not visible to me.
Is there any way to reroute the debug messages (possibly other than writing to a file on a shared disk)?
One option may be to try the Parallel Command Window. This opens a new, special Command Window with one pane per lab. You'll need to run commands from the "P>>" pmode prompt in this window; see the pmode documentation for more details, and the sketch below.
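A minimal sketch of such a session, assuming a local cluster profile with 4 labs (pmode has been removed from recent MATLAB releases, so this applies to versions that still ship it):

pmode start local 4                      % opens the Parallel Command Window
P>> out = myfunction(data, labindex);    % library output appears in each lab's pane
P>> pmode exit                           % close the session when done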