How do I handle multi-file projects with org-babel? - emacs

I'm trying to handle multiple files which have source blocks which depend on each other.
For example, I have a file decorators.org with some common Python decorators that I use frequently, and that I'd like to use on functions in other files in the same project.
I can think of a couple of ways to approach this, but I'm not sure which will actually work and which is the standard way of doing things:
Execute (org-babel-lob-ingest "./decorators.org"), either in an emacs-lisp block or on loading the file. But then I'm not sure how to access the ingested blocks afterwards.
Use org-babel-load-file, but it looks like that only works with emacs-lisp source blocks.
Force the files I depend on to be pre-tangled by calling org-babel-tangle-file, and then import them using normal Python import statements (sketched below).
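For what it's worth, a rough sketch of the third option (assuming the blocks in decorators.org carry a ":tangle decorators.py" header; my_decorator is just a placeholder name):
;; run once, in an emacs-lisp block or on loading the project, to write decorators.py
(org-babel-tangle-file "./decorators.org")
and then, inside the Python blocks of the other files:
from decorators import my_decorator  # ordinary import of the tangled file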
Is one of these a good approach, or is there some better way to do this that I've missed?

Related

How can I have 2 versions of Gensim for summarization in one Jupyter notebook?

I want to have 2 versions of Gensim so that I can use the summarization and keyword functions from the old Gensim.
How can I set up this scenario?
In general, a single Jupyter notebook is backed by a single Python interpreter/environment, and popular packages at their 'official' installation paths can only be installed once.
There are a few hackish workarounds suggested in answers like:
Installing multiple versions of a package with pip
However, each workaround presents operational problems.
One approach is to install the older package to a non-standard path (directory) that's still found by Python's import logic (controlled by PYTHONPATH). For example, put/move the older copy of Gensim into a gensim_old package directory. But: this is only likely to work well with very simple (single-.py-file) packages.
With any significant library (like Gensim) which cross-imports a lot of things from its own utility modules, using the standard paths, lots of things are likely to break unless you dig into all involved individual files to change their import paths. That's kind of kludgey & hard-to-maintain. (Though, to the extent you're just using one old version, say gensim-3.8.3 for the removed summarization feature, perhaps it'd be worth fighting through this process once, then keeping the changes around.)
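To make that concrete, the workaround looks roughly like this (directory names are made up), and the caveat shows up immediately:
# 1. install the old release outside site-packages:  pip install gensim==3.8.3 --target=./old_libs
# 2. rename old_libs/gensim to old_libs/gensim_old so it can coexist with the current gensim
import sys
sys.path.insert(0, "./old_libs")
from gensim_old.summarization import summarize  # breaks until every internal
                                                # "from gensim import ..." in the
                                                # copied files is edited to gensim_old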
Another approach is to create a totally-separate Python environment with the alternate version, and only use that other environment from the notebook by a system-call – via either something in Python-code like subprocess.call(), or the notebook-cell ! or !! magic-escapes to run a shell command. That is, you give up the ability to run individual interactive lines of Python in that alt environment - but could still send it batches of data, and either capture the console output or observe its output files to continue processing in your notebook.
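For example, a minimal sketch (the environment path and script name are made up):
import subprocess
# run the old-Gensim step in a separate environment that has gensim==3.8.3 installed
result = subprocess.run(
    ["/path/to/gensim383-env/bin/python", "summarize_old.py", "input.txt"],
    capture_output=True, text=True, check=True,
)
summary = result.stdout  # pick the output back up in the notebook's own environment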
I'd expect this to be a better option – cleaner & more-maintainable – provided that either the old-version-functionality (summarization) or new-version-functionality (whatever else) can be condensed into one (or a few) single-step scripts.
Another option would be to try to completely copy the gensim.summarization source code files to some new location inside your own project – performing whatever (few, minor) edits are necessary to ensure it works from the alternate location.
One of the reasons that functionality was removed was that its approach to things like tokenization was not consistent/integrated with other Gensim practices – which actually means it's likely to be a little easier to keep it working (given its use of its own idiosyncratic approaches) separately.
Personally, I'd rank these three options' desirability as:
(best) Section off the summarization tasks to be run via subprocess executions in a separate Python environment, which has only the older package installed.
(maybe ok) Copy the 10 .py files that implement gensim.summarization to your own local module. Edit lightly as necessary to ensure they still work. (That should mainly be updating import lines, but might require a few other adaptations to other Python 3.x/Gensim 4.x changes.)
(probably too messy) Install the whole old package to a non-standard directory, edit lots of files to ensure anything you're using still works.
Finally, note that the main reason the feature was removed is that it did not offer very impressive or adaptable results. While I've seen some people say it's worked OK for their applications, I've never seen even so much as a demo where its practices/algorithm – which can only extract some subset of important sentences, never paraphrase – gave impressive results.
So unless you already know that its approach works well for your needs, don't get your hopes up! Good luck.

In Powershell, is it a good practice to write multiple functions in a single script file?

I was told to write a PowerShell script that performs multiple actions, such as app pool creation, SQL updates, file editing, etc.
This is the first time I am going to write such a large script.
So I would like to know the best practice before writing it.
Is it good practice to write all the functions in a single file?
I am thinking I may need to write at least 10 functions, each of perhaps 10 lines of code.
Consider modules: the simplest format is a manifest (.psd1) and a single script file (.psm1) containing all the functions, aliases, ... the module exports (plus any internal helpers).
In this case you are clearly putting multiple connected functions in one file. Even if much of the code is only dot-sourced into the script module, the functions are still logically part of one entity.
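A minimal sketch of that layout (module and function names are only examples):
# MyDeployTools\MyDeployTools.psm1 -- all the functions live in one script module
function New-AppPool      { param([string]$Name)   <# ... #> }
function Update-Database  { param([string]$Server) <# ... #> }
function Edit-ConfigFile  { param([string]$Path)   <# ... #> }
Export-ModuleMember -Function New-AppPool, Update-Database, Edit-ConfigFile
# create the manifest once, next to the .psm1
New-ModuleManifest -Path .\MyDeployTools.psd1 -RootModule 'MyDeployTools.psm1'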
On the other hand, if you want scripts on your path that can be executed without loading anything beforehand, then (as per Adriano's comment on the question) one function per file, written at script scope rather than as a function statement, makes sense.
Therefore: there is no one "good practice": it all depends on the details of the circumstance.
Be pragmatic; truth comes from action, not from words ;O)
So begin at the beginning:
1) Does the thing you want to do already exist somewhere on the internet, e.g. PoshCode? (If so, you can adapt it.)
2) Think (not too much) about your functions' purpose: reuse code (write your algorithm in pseudo-code).
3) Use the internet to look for functions that already exist.
4) Write all functions in the same file as the main code to test them. During this phase you'll discover new functions and parameters to add to, or remove from, the existing ones.
5) Once you have tested your code, put the reusable functions (and the ones they depend on) into one or more modules.
My solution would be to create a custom module to which functions can be added later.
You can save your single file with all the functions as mymodule.psm1 in a mymodule folder under one of the paths in $env:PSModulePath.
Then Import-Module mymodule (or better, call it in your $profile so it's ready when the console is up).
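For example, a rough sketch (it simply uses the first folder listed in $env:PSModulePath):
$dest = Join-Path (($env:PSModulePath -split ';')[0]) 'mymodule'
New-Item -ItemType Directory -Path $dest -Force | Out-Null
Copy-Item .\mymodule.psm1 $dest
Import-Module mymodule
Get-Command -Module mymodule   # list the functions the module exposes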

Is it better to put the defpackage in a separate file when creating packages

The example below is given in Paul Graham's ANSI Common Lisp as an example of doing encapsulation:
(defpackage "CTR"
(:use "COMMON-LISP")
(:export "COUNTER" "INCREMENT" "CLEAR"))
(in-package ctr)
;function definitions here
However in Peter Seibel's Practical Common Lisp, link here, he says:
Because packages are used by the reader, a package must be defined
before you can LOAD or COMPILE-FILE a file that contains an IN-PACKAGE
expression switching to that package. Packages also must be defined
before other DEFPACKAGE forms can refer to them...
The best first step toward making sure packages exist when they need
to is to put all your DEFPACKAGEs in files separate from the code that
needs to be read in those packages
So he recommends creating two files for every package, one for the defpackage and one for the code. The files containing defpackages should start with (in-package "COMMON-LISP-USER").
To me it seems like putting the defpackage in the same file, before the in-package and code, is a good way to ensure that the package is defined before used. So the first method, collecting everything into one file seems easier. Are there any problems with using this method for package creation?
I think that using a separate file for defpackage is a good habit
because:
You don't « pollute » your files with defpackage.
It makes it easier to find the exported/shadowed/... symbols, you
know you just have to look at package.lisp.
You don't have to worry about the order when you use ASDF.
(defsystem :your-system
  :components ((:file "package")
               ... the rest ...))
Peter Seibel says so ;)
EDIT:
I forgot to mention quickproject which facilitates the creation of
new CL projects.
REPL> (quickproject:make-project "~/src/lisp/my-wonderful-project/"
                                 :depends-on '(drakma cl-ppcre local-time))
This command will create a directory "~/src/lisp/my-wonderful-project/"
and the following files:
package.lisp
my-wonderful-project.asd (filled)
my-wonderful-project.lisp
README.txt
And thus, I think it's good to use the same convention.
I tend to use multiple source code files, a single "packages.lisp" file and a single "project.asd" system definition file for most of my projects. If the project requires multiple packages, they're all defined in "packages.lisp", with the relevant exports in place.
There is this reason for putting DEFPACKAGE in its own file: if you have a large package, then you might have several groups of related functions, and you might want to have separate source files per function group. Then all the source files would have their own IN-PACKAGE at the top, but they would all "share" the external DEFPACKAGE. Then as long as you get the DEFPACKAGE loaded first, it doesn't matter the order you load the other source files.
An example I'm currently working on has multiple classes in the package, and the source files are broken up to be per class, each having a class definition and the related generic function and method definitions.
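A sketch of that layout (package, file, and class names are invented):
;;;; packages.lisp -- loaded first; defines every package in the project
(defpackage #:my-app
  (:use #:common-lisp)
  (:export #:counter #:increment))

;;;; counter.lisp -- one file per class; each source file starts with IN-PACKAGE
(in-package #:my-app)

(defclass counter ()
  ((value :initform 0 :accessor counter-value)))

(defgeneric increment (counter))
(defmethod increment ((c counter))
  (incf (counter-value c)))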

How to split long Perl code into several files without too much manual editing?

How do I split a long Perl script into two or more different files that can all access the same variables - without having to rename all shared variables from e.g. $count to $::count (or $main::count which is the same)?
In other words, what's the best and simplest way to split the Perl script into several files without having to import a lot of variables/functions and/or do a lot of manual editing.
I assume it has something to do with making the code part of the same package/scope/namespace, but my experiments so far have failed.
I am not sure it makes a difference, but the script is used for web/CGI purposes and will be running under mod_perl.
EDIT - Background:
I kind of knew I would get that response. The reason I want to split up the file is the following:
Currently I have a single very old and very long Perl file. I know it is not following Perl best practices but it works.
The problem is, I need to distribute the data files it uses between different web servers, first of all for performance reasons. There will be one "master" server and one or several "slaves".
About 20% of the mentioned Perl file contains shared functions, 40% has the code that needs to run on the master server and 40% the code for the slave servers. Therefore, I would like to split the code into three files: 1. shared, 2. master-only, 3. slave-only. On the master server, 1 and 2 will be loaded; on the slaves, 1 and 3 will be loaded.
I assume this approach would use less process RAM and, more importantly, I would minimize the risk of not splitting the code correctly (e.g. a slave process calling a master data file). I don't see a great need for modularization, as the system works and the code does not need a lot of changes or exchanges with other projects.
EDIT 2 - Solution:
Found the solution I was looking for here:
http://www.perlmonks.org/?node_id=95813
In cases where the main package is in ownership of the variable, the
actual word 'main' can be omitted to yield something like: $::var
It is possible to get around having to fully qualify variable names
when strict is in use. Applying a simple use vars to your script, with
the variable names as its arguments, will get around explicit package
names.
Actually, I ended up repeating the our ($count, etc...) statement for the needed variables instead of use vars ();
Do let me know if I am missing something vital - apart from not going with modules! :)
#Axeman, Thanks, I will accept your answer, both for your effort and for sending me in the right direction.
Unless you put different package statements in their files, they will all be treated as if they had package main; at the top. So assuming that the scripts use package variables, you shouldn't have to do anything. If you have declared them with my (that is, if they are lexically scoped variables) then you would have to make sure that all references to the variables are in the same file.
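In other words (a made-up example; the variable and file names are only illustrative):
# shared.pl
use strict;
our $count = 0;               # a package variable: really $main::count
sub bump_count { $count++ }
1;

# master.pl
use strict;
our $count;                   # the same $main::count; re-declare it in each file
do './shared.pl' or die $@ || $!;
bump_count();
print "count is now $count\n";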
But splitting scripts up for length is a rotten substitute for modularization. Yes, modularization helps keep code length down, but modularization is the proper way to keep code length down--for all the reasons that you would want to keep code-length down, modularization does it best.
If chopping the files by length could really work for you, then you could create a script like this:
do '/path/to/bin/part1.pl';
do '/path/to/bin/part2.pl';
do '/path/to/bin/part3.pl';
...
But I kind of suspect that if the organization of this code is as bad as you're--sort of--indicating, it might suffer from some of the same re-inventing the wheel that I've seen in Perl-ignorant scripts. Just offhand (I might be wrong) but I'm thinking you would be surprised how much could be chopped from the length by simply substituting better-tested Perl library idioms than for-looping and while-ing everything.

Config file handling in Perl

There are plenty of modules in the Config:: namespace on CPAN, but they are all limited in one way or another.
I'm currently using Config::Std, which is fine most of the time, however it makes certain things difficult:
more than two levels of nested directives
handling of multiple values per key
conf.d directories, i.e. multiple config files which are merged into one big config hash
Config::Std generates a blessed hashref after parsing the config, so all my applications are coded to use a hashref for configuration. I'd prefer not having to change this.
What I am looking for is a universal, lightweight Config Module that produces a hashref.
My Question is: Which Config Modules should I consider for replacing Config::Std?
Config::Any (for loading several files and flattening to a hash) and its Config::General backend (for arbitrarily nested configuration items and multiple values per key à la Apache httpd)
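For instance, something along these lines (an untested sketch; the file layout is made up) yields one merged hashref from a conf.d directory:
use strict;
use warnings;
use Config::Any;

my @files  = glob('conf.d/*.conf');
my $loaded = Config::Any->load_files({ files => \@files, use_ext => 1 });

my %config;                                  # flatten the per-file results
for my $entry (@$loaded) {
    my ($file, $cfg) = %$entry;
    %config = (%config, %$cfg);              # shallow merge; later files win
}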
You didn't state where your data is coming from. Are you reading in a configuration file and running into the limit of the configuration file itself?
Config::Std is a great module. However, it was meant to read and write Windows Config/INI files, and Windows Config/INI files are very flat and simple formats. Thus, I wouldn't expect Config::Std to do much more.
If you're using Windows Config/INI files right now, but may need to read more complex data structures in the future, Config::Any is a good way to go. It'll handle Windows Config/INI files and using the same programming interface, read and write XML, YAML, and JSON file structures too.
If you're merely trying to keep a complex data structure in your program and don't care about reading and writing configuration files, I would recommend looking at XML::Simple for the very simple reason that it is ...well... simple and can handle all sorts of data structures. Plus, XML::Simple is a very commonly used module, so there's lots of help on the Internet if you have any questions about the module, and it is actively supported.
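For example, a minimal sketch (the file and element names are invented):
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

my $config = XMLin('app_config.xml');                  # whole document as a hashref
print $config->{database}{host}, "\n";                 # assumes such an element exists
XMLout($config, OutputFile => 'app_config.new.xml');   # write it back out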
You could use Config::Any, but I find it more complex to use, and harder to configure. In fact, you have to install XML::Simple (or a similar module) in order to use it. The advantage of Config::Any is that it is a single interface for all sorts of configuration file formats. That way, you don't have to hack through your program if you decide to switch from Windows Config/INI to XML or YAML.
So, if you're working with Windows Config/INI files now, and need a more complex data structure: Look at Config::Any.
If you're merely wanting a simple way to track complex data structures, look at XML::Simple.
YAML will handle that and more.
And here's the website for the protocol.