Celery remote worker accessing file - celery

I have a function which should take an executable file as an argument, execute it and return the result. This function should run asynchronously, so I'm using Celery. I want to use multiple computers as workers, so each worker needs to be able to access the executable file. However, since the executable files are uploaded by the moderators, it's not an option to put a copy of each file on each worker by hand. So what would be the best way to handle this?
The only option I could think of was storing the files in the database: the function would retrieve the file from the DB, store it temporarily, execute it, remove the file and return the result.
Is this a good approach? Are there any better ways to handle this?
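A minimal sketch of the approach described above (hypothetical names throughout; fetch_executable_bytes stands in for whatever database or object-store lookup is used, and the broker URL is only an example):

# Hedged sketch of the DB-backed approach: fetch the executable's bytes from shared
# storage, write them to a temp file on the worker, run it, clean up, return the output.
import os
import stat
import subprocess
import tempfile

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # example broker URL

def fetch_executable_bytes(file_id):
    """Placeholder: load the uploaded executable from the database or object store."""
    raise NotImplementedError

@app.task
def run_uploaded_executable(file_id, args=()):
    data = fetch_executable_bytes(file_id)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)  # make the temp file executable
        result = subprocess.run([path, *list(args)], capture_output=True, text=True, timeout=60)
        return {"returncode": result.returncode, "stdout": result.stdout, "stderr": result.stderr}
    finally:
        os.remove(path)  # always clean up the temp file, even on failure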

Updating online Mongo Database from offline copy

I have a large Mongo database (5M documents). I edit the database from an offline application, so I store the database on my local computer. However, I want to be able to maintain an online copy of the database, so that my website can access it.
How can I update the online copy regularly, without having to upload multiple GBs of data every time?
Is there some way to "track changes" and upload only the diff, like in Git?
Following up on my comment:
Can't you store the commands you used on your offline db and then apply them on the online db, through a script running over SSH for instance? Or, even better, upload a file with all the commands you ran on your offline database to your server and then execute them with a cron job or a bash script? (The only requirement would be for your databases to have the same starting point and the same state when you execute the script.)
I would recommend storing all the queries you execute on your offline database. To do this you have several options; the one I can think of is setting the profiling level so that all your queries are logged.
(Here is a more detailed thread on the matter: MongoDB logging all queries)
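As a hedged illustration (using pymongo; the database name "mydb" is just a placeholder), turning the profiler on and reading back the logged write operations could look roughly like this:

# Sketch: enable MongoDB profiling on the offline database and read back logged operations.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]

db.command("profile", 2)  # level 2 = log every operation to the system.profile collection

# ... run your offline edits here ...

# system.profile is a capped collection holding the logged operations
for op in db["system.profile"].find({"op": {"$in": ["insert", "update", "remove"]}}):
    print(op["op"], op["ns"], op.get("command") or op.get("query"))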
Then you would have to extract them somehow (grep?), or store them directly in another file on the fly as they are executed.
As for uploading the script, it depends on what you would like to use, but I suppose you would need to do it during low-usage hours, and you could automate the task with a cron job and an SSH tunnel.
I guess it all depends on your constraints (security, downtime, etc.).

Move file after transaction commit

I just started using Spring Batch and I don't know how I can implement my business need.
The behavior is quite simple: I have a directory where files are saved. My batch should detect those files, import them into my database and move each file to a backup directory (or to an error directory if the data can't be saved).
So I create chunks of one file each. The reader retrieves them and the processor imports the data.
I read that Spring Batch creates a global transaction for the whole chunk, and that only the ChunkListener is called outside the transaction. That seems to be OK, but its input parameter is a ChunkContext. How can I retrieve the file managed in the chunk? I don't see where it is stored in the ChunkContext.
I need to be sure the DB accepts the insertions before choosing where the file must be moved. That's why I need to do that after the commit.
Here is how you can proceed:
Create a service (based on a file system watching API or something like Spring Integration directory polling) that launches a batch job for the new file
The batch job can use a chunk-oriented step to read data and write it to the database. In that job, you can use a job/step execution listener or a separate step to move files to the backup/error directory according to the success or failure of the previous step.

How to update files whenever script is scheduled to run in Heroku app

I have a simple Python script that is hosted on Heroku, and I'm using the Heroku Scheduler to run the script every hour/day. The script may update a simple .txt file (it could also be a config var if possible) when it runs. When it does run and the conditions are met, I need that value stored and used when the next scheduled run happens. The value that changes is simply a date.
However, since the app is containerized from the most recent code I have on GitHub, it doesn't store those changes anywhere to be reused. Is there any way I can update the file and use it every time the script runs? Any simple add-ons or other solutions I can use?
Heroku dynos have an ephemeral local file system that does not survive an application restart or redeployment, so it cannot be used to persist data.
Typically you have two options:
use a database: on Heroku you can use Heroku Postgres (there is also a free tier); see the sketch below
save the file on external storage (S3, Dropbox, even GitHub); see Files on Heroku for details and examples
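A minimal sketch of the first option, assuming the Heroku Postgres add-on is attached (which sets the DATABASE_URL config var); the table and key names are illustrative only:

# Persist the "last run" date in Heroku Postgres instead of a .txt file on the dyno.
import os
from datetime import date

import psycopg2

def get_conn():
    return psycopg2.connect(os.environ["DATABASE_URL"], sslmode="require")

def load_last_run(conn):
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS app_state (key TEXT PRIMARY KEY, value TEXT)")
        cur.execute("SELECT value FROM app_state WHERE key = 'last_run'")
        row = cur.fetchone()
        return row[0] if row else None

def save_last_run(conn, value):
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO app_state (key, value) VALUES ('last_run', %s) "
            "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
            (value,),
        )

if __name__ == "__main__":
    conn = get_conn()
    last_run = load_last_run(conn)  # value stored by the previous scheduled run, or None
    # ... run the existing script logic here, using last_run as needed ...
    save_last_run(conn, date.today().isoformat())
    conn.close()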

kdb - persisting functions on kdb server & Context management

I see a lot of info regarding serializing tables on kdb, but is there a suggested best practice for getting functions to persist on a kdb server? At present, I am loading a number of .q files in my startup q.q on my local machine and have duplicated those .q files on the server for when it reboots.
As I edit, add and change functions, I do so on my local dev machine in a number of .q files, all referencing the same context. I then push them one by one to the server using code similar to the below, which works great for now, but I am pushing the functions to the server and then manually copying each .q file and manually editing the q.q file on the server.
\p YYYY;
h:hopen `:XXX.XXX.XX.XX:YYYY;
funcs: raze read0[hsym `$"./funcs/funcsAAA.q"];
funcs,: raze read0[hsym `$"./funcs/funcsBBB.q"];  / append, so all three files get sent
funcs,: raze read0[hsym `$"./funcs/funcsCCC.q"];
h funcs;
I'd like to serialize them on the server (and, conversely, get them back when the system reboots). I've dabbled with this on my local machine and it seems to work when I put these in my startup q.q:
`.AAA set get `:/q/AAAfuncs
`.BBB set get `:/q/BBBfuncs
`.CCC set get `:/q/CCCfuncs
My questions are:
Is there a more elegant solution to serialize and call the functions on the server?
Is there a clever way to edit the q.q on the server to add the `.AAA set get `:/q/AAAfuncs lines?
Am I thinking about this correctly? I recognize this could be dangerous in a prod environment.
References: KDB Workspace Organization
In my opinion (and experience) all q functions should be in scripts that the (production) kdb instance can load directly using either \l /path/to/script.q or system"l /path/to/script.q", either from local disk or from some shared mount. All scripts/functions should ideally be loaded on startup of that instance. Functions should never have to be defined on the fly, or defined over IPC, or written serialised and loaded back in, in a production instance.
Who runs this kdb instance you're interacting with? Who is the admin? You should reach out to the admins of the instance to have them set up a mechanism for having your scripts loaded into the instance on startup.
An alternative, if you really can't have your functions defined server side, is to define your functions in your local instance on startup and then send the function calls over IPC, e.g.
system"l /path/to/myscript.q"; /make this load every time on startup
/to have your function executed on the server without it being defined on the server
h:hopen `:XXX.XXX.XX.XX:YYYY;
res:h(myfunc1;`abc);
This loads the functions in your local instance but sends the function to the remote server for evaluation, along with the input parameter `abc
Edit: Some common methods for "loading every time on startup" include:
Loading a script from the startup command line, e.g.
q myscript.q -p 1234 -w 10000
You could have a master script which loads subscripts.
Loading a database or a directory containing scripts from the startup command line, e.g.
q /path/to/db -p 1234 -w 10000
Jeff Borror mentions this here: https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#14623-scripts and here: https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#14636-scripts
Like you say, you can have a q.q script in your QHOME

Talend: Using tfilelist to access files from a shared network path

I have a Talend job that searches a directory and then uploads the files it finds to our database.
It's something like this: dbconnection>twaitforfile>tfilelist>fileschema>tmap>db
I have a subjob, connected with OnSubjobOk, that then commits the data into the table, iterates through the directory and moves the files to another folder.
Recently I was instructed to change the directory to a shared network path using the same components as before (I originally thought of changing components to tftpfilelist, etc.)
My question is how to point it at the shared network path. I was able to get it to connect using double backslashes (\\), but it won't read any of the new files arriving.
Thanks!
I suppose that if you use tWaitForFile on the local filesystem, Talend/Java hooks somehow into the folder and gets notified when a new file is put into it.
Now that you are on a network drive, first of all this is out of reach of the component; second, the OS behind the network drive could be different.
I understand your job is running all the time, listening. You could change the behaviour by putting a tLoop first, which would check the file system for new files and then proceed. There must be some delta check in how the new files get recognized.
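For illustration only (plain Python rather than Talend, with a placeholder UNC path), the poll-plus-delta-check idea amounts to remembering which files have already been seen and only handing off the new ones on each pass:

# Not Talend code: a sketch of polling a shared folder and processing only new files.
import os
import time

SHARE = r"\\server\share\incoming"  # placeholder network path
seen = set()

while True:
    try:
        current = set(os.listdir(SHARE))
    except OSError as exc:
        print("share not reachable:", exc)
        time.sleep(30)
        continue
    for name in sorted(current - seen):  # the "delta": files not seen on a previous pass
        print("new file:", os.path.join(SHARE, name))
        # ... process the file / upload to the database here ...
    seen = current  # files that have since been moved away are forgotten
    time.sleep(30)  # poll interval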