ESS workflow for R project/package development - emacs

Can anyone share his experience on workflow for R peject development under ESS? I tried several times to learn emacs but I have not get it yet. I can understand ESS as an editor, but is there a project view in ESS? what's the efficient ways to set up/view R project directory, coding, and testing, and how's ESS has an edge to facilitate the whole process?
Do you use ESS as a good R editor only or tend to emulate a R IDE environment within ESS?
Thanks for any advices.

It sounds like you're asking two separate questions.
One question concerns workflow and the other concerns using ESS.
As I use StatET and Eclipse, I'll just share my experience regarding the workflow aspect of your question.
As with Vincent I also follow something like the workflow set out by Josh Reich here (also see Hadley's useful comments):
Workflow for statistical analysis and report writing
Although it can vary between projects, I tend to have a couple of main R files
import.R: this imports data files and does any necessary cleaning and manipulation
analyse.R: This generates the output that I need for any final report
main.R: This calls import.R and analyse.R
The aim is for import.R and analyse.R to represent the complete and final workflow for producing the final results of any analyses.
In terms of a directory structure for an analysis project, I'll often also have the following folders
data: for storing any raw data files
meta: for storing meta data, such as variable labels, scoring systems for tests, recoding information, etc.
output: for storing any graphics, tables, or text generated by my analyses that I might want to incorporate into an external program
temp: When exploring the data and brainstorming analyses, I like to type code into files instead of using the console. I tend to label these temp1.R, temp2.R, temp3.R. I store these in a temp folder. That way I have a permanent record that's easily accessible. If the analyses become final they get incorporated into one of the main R files (i.e., import.R or analysis.R)
functions: If I think that a function will be needed across a couple of projects, I often place it one function per file or a set of related functions in a file in a folder called functions. This makes it relatively easy to reuse functions across projects, when the formal requirements of package development are more than needed.
library: If I want to create some general functions that I think will be project specific, I'll place them in this folder
save: A folder to store any saved R objects
StatET and Eclipse make it easy to interact with such a file system.
Of course, given all the R gurus that use ESS and Emacs, I'm sure it also handles interactions with the file system well.

I'm not exactly sure what you expect as an answer on this one. I, for one, have stolen (and adapted) a system that was suggested here a little while ago (by Josh Reich):
Create a folder for every project, and split up your work in a bunch of different .R files:
Load.R for getting your raw data into R;
Prep.R for cleaning the data, recoding variables, etc.;
Func.R for coding any custom functions you will need for evaluation; and
Eval.R for running your final stuff.
If that doesn't fit your style, just change it.
Then, you can either have a master file to call each of the parts one after each other (good for reproducibility), or save at different stages and have the individual scripts load the appropriate data (good if some of the prep work is very computationally/time intensive).
**
On a different note, the trick that is posted at the link really helped me get into ESS. It turns Shift-Enter into a one-stop-ESS-shop: http://www.kieranhealy.org/blog/archives/2009/10/12/make-shift-enter-do-a-lot-in-ess/

Others have given you some good ideas about how to setup your directory/file structure for a project.
You also asked about "project views," in which case you might want to look into the Emacs Code Browser (ECB).
You can find some screen shots of it in action on its site, here:
http://ecb.sourceforge.net/screenshots/index.html

Related

VSCode how to clear workspaceState data globally?

I have implemented a backup per workspace functionality in my extension using workspaceState. Since the data can be sensitive - I'd like to clear all workspaceStates on extension deactivation/uninstall.
The ExtensionContext provides no ability to clear all extension related data across different workspaces with their workspaceStates.
So I've considered saving data on the ExtensionContext globalState, tagging each entry with a workspace id. Problem is that the workspace namespace doesn't provide a way to uniquely identify the current workspace. I thought about hashing workspace name and path but both of these things are changeable and any change will destroy the pointer to the data. This is exactly why I cant just write files to internal folders. The only other solution I have is to write the backup data directly to the workspace and I'd like to avoid that.
How does VSCode maintain the knowledge of which workspaceState belongs to which workspace? How can I tie data to the workspace but have access from anywhere else in VSCode?
Side note: You should avoid saving sensitive data in general. And if necessary, try to encrypt it.
Anyway:
I don't have the full answers but i was researching something similar (An extension I use crashes due to now invalid settings in the WorkspaceState).
I found the Storage for the Workspace state in this folder (windows):
%appdata%\Code\User\workspaceStorage\
In there, you find a lot of folders with hex-based names. Inside those folders, I always found 2 Files named state.vscdb and state.vscdb.backup.
There usually is a 3rd file called workspace.json which helps you figure out if you are in the correct workspace. (but you'd have to iterate through all the folders - maybe there is a way to figure out the folder name coming from the extension API?)
If you open the state.vscdb-file you find something that looks quite like a serialized object set in my eyes. It does have some Seperator chars of unknown function. But you also find full paths or names in there that clearly origin from different modules of VSC - Including the extensions.
I don't need to worry about the other cached stuff i'm just gonna delete the whole folder to fix my current issue. But I'm pretty sure, one can figure out the way the file is built and edit out your sensitive data if one has to.
The state.vscdb.backup-file looks pretty much like what the name is telling you: they probably just make a copy of the other file every few minutes so you have a fallback position.
To add to the conversation, there are two SQLite state databases:
<user-data-dir>\User\globalStorage\state.vscdb
<user-data-dir>\User\workspaceStorage\<workspace.id>\state.vscdb
Depending on how VS Code was launched you could have a Single Folder Workspace or a Multi-Folder Workspace that is global or local. Globally, the data lives here:
Linux: $HOME/.config/Code/
OS X: $HOME/Library/Application Support/Code/
Windows: %APPDATA%\Code\
Locally, the data will be in the .vscode folder of the current workspace.
In my situation:
I open a new workspace.
Set it up as I want it to start every time.
Makes copies of the two SQLite databases.
Copy over the databases before launching VS Code.
This leads to a clean VS Code state.
To see how workspace.id is generated check this link.

Removing annotations from a Modelica model

I'm developing a Modelica library and need to produce a document with source code listings. I'd like to be able to include the source of the Modelica models without annotations.
I could manually edit them out, but I'm looking for a more automated strategy. I'm guessing the most convenient and straightforward approach is to use some tool to save .mo files with no annotations and include those in my document (I'm using \lstinputlisting in LaTeX).
Is it possible to do this? I have access to Dymola, OpenModelica and JModelica. Dymola is obviously capable of producing such a listing, as it's able to include it in the automatically generated documentation (File > Export > HTML...). I've been looking into scripting with Dymola and OpenModelica, but haven't found a way to do this either.
JModelica seems like it could be a good option, but I don't have experience working with Python. If this is possible and someone gives me some pointers, I'm willing to look into it myself. I found a mention to a prettyprint function that might do the job, but I'm not sure where to start. I can't even find reference to that function in the latest documentation.
It would also be more convenient for me to find a way of doing it with Dymola/OpenModelica (whether through the UI or by using a script). Have I missed something?
I think you could use saveTotalModel("total.mo", MyModelName) in OpenModelica. This will strip most annotations (not ones used for code generation if I remember correctly) and pretty-print the source code including all dependencies. Then you just copy-paste the models/packages that you want to include in the listing. Or if you prefer, you can do something like the following to only include code for a particular model:
loadModel(Modelica);
loadFile("MyModel.mo");
saveTotalModel("total.mo", MyModel.A.B);
clear();
loadFile(MyModel);
str := list(MyModel.A.B);
writeFile("MyModel.A.B.listing", str);

How to handle environment-specific application configuration organization-wide?

Problem
Your organization has many separate applications, some of which interact with each other (to form "systems"). You need to deploy these applications to separate environments to facilitate staged testing (for example, DEV, QA, UAT, PROD). A given application needs to be configured slightly differently in each environment (each environment has a separate database, for example). You want this re-configuration to be handled by some sort of automated mechanism so that your release managers don't have to manually configure each application every time it is deployed to a different environment.
Desired Features
I would like to design an organization-wide configuration solution with the following properties (ideally):
Supports "one click" deployments (only the environment needs to be specified, and no manual re-configuration during/after deployment should be necessary).
There should be a single "system of record" where a shared environment-dependent property is specified (such as a database connection string that is shared by many applications).
Supports re-configuration of deployed applications (in the event that an environment-specific property needs to change), ideally without requiring a re-deployment of the application.
Allows an application to be run on the same machine, but in different environments (run a PROD instance and a DEV instance simultaneously).
Possible Solutions
I see two basic directions in which a solution could go:
Make all applications "environment aware". You would pass the environment name (DEV, QA, etc) at the command line to the app, and then the app is "smart" enough to figure out the environment-specific configuration values at run-time. The app could fetch the values from flat files deployed along with the app, or from a central configuration service.
Applications are not "smart" as they are in #1, and simply fetch configuration by property name from config files deployed with the app. The values of these properties are injected into the config files at deploy-time by the install program/script. That install script takes the environment name and fetches all relevant configuration values from a central configuration service.
Question
How would/have you achieved a configuration solution that solves these problems and supports these desired features? Am I on target with the two possible solutions? Do you have a preference between those solutions? Also, please feel free to tell me that I'm thinking about the problem all wrong. Any feedback would be greatly appreciated.
We've all run into these kinds of things, particularly in large organizations. I think it's most important to manage your own expectations first, and also ask whether it's really necessary to tell every system and subsystem on a given box to "change to DEV mode" or "change to PROD mode". My personal recommendation is as follows:
Make individual boxes responsible for a different stage - i.e. "this is a DEV box", and "this is a PROD box".
Collect as much of the configuration that differs from box to box in one location, even if it requires soft links or scripts that collect the information to then print out.
A. This way, you can easily "dump this box's configuration" in two places and see what differs, for example after a new deployment.
B. You can also make configuration changes separate from software changes, at least to some degree, which is a good way to root out bugs that happen at release time.
Then have everything base its configuration on something/somewhere that is not baked-in or hard-coded - just make sure to collect and document it in that one location. It almost doesn't matter what the mechanism is, which is a good thing, because some systems just don't want to be forced to use some mechanisms or others.
Sorry if this is too general an answer - the question was very general. I've worked in several large software-based organizations before, and this seemed to be the best approach. Using a standalone server as "one unit of deployment" is the most realistic scenario (though sometimes its expensive), since applications affect each other, and no matter how careful you are, you destabilize a whole system when you move any given gear or cog.
The alternative gets very complex very quickly. You need to start rewriting the applications that you have control over in order to have them accept a "DEV" switch, and you end up adding layers of kludge to the ones you don't have control over. Usually, the ones you don't have control over at least base their properties on something defined on a system-wide level, unless they are "calling the mothership for instructions".
It's easier to redirect people to a remote location and have them "use DEV" vs "use PROD" than it is to "make this machine run like DEV" vs "make this machine run like PROD". And if you're mixing things up, like having a DEV task run together on the same box as a PROD task, then that's not a realistic scenario anyways: I guarantee that eventually you will be granting illegal DEV-only access to somebody on PROD, and you'll have a DEV task wipe out a PROD database.
Hope this helps. Let me know if you'd like to discuss more specifics involved.
I personally prefer solution 2 (the app should know itself, by its configuration, what environment it is running in). With solution 1 (pass the environment name as a startup parameter) the danger of using the wrong environment specifier is much too high. Accessing the TEST database from PROD code and vice versa may cause mayhem, if the two installed code bases are not of the same version, as is often the case.
My current project uses solution 1, but I don't like that. A previous project I worked on used a variation of solution 2: The build process generated one setup file for every environment, making sure that they contained the same code base but appropriate configuration paramters. That worked like a charm, but I know it contradicts the paradigm that the "exact same build files must be deployed everywhere".
I think I have asked a related, self-answered, question, before I read this one : How to organize code so that we can move and update it without having to edit the location of the configuration file? . So, on that basis, I provide an answer here. I don't like the idea of "smart" application (solution 1 here) for such a simple task as finding environment settings. It seems a complicated framework for something that should be simple. The idea of an install script (solution 2 here) is powerful, but it is useful to allow the user to change the content of the config file, but would it allow to change the location of this config file? What is this "central configuration service", where is it located? My answer is that I would go with option 2, if the goal is to set the content of the configuration file, but I feel that the issue of the location of this configuration file remains unanswered here.
If you're using JSON to store/transmit configuration (or can use JSON in your pre-deploy process to output to some other format) you can annotate key/property names for environment/context-specific values with arbitrary or environment-specific suffixes, and then dynamically prefer/discriminate them at build/deploy/run/render -time, while leaving un-annotated properties alone.
We have used this to avoid duplicating entire configuration files (with the associated problems well known) AND to reduce repetition. The technique is also perfect for internationalization (i18n) -- even within the same file, if desired.
Example, snippet of pre-processed JSON config:
var config = {
'ver': '1.0',
'help': {
'BLURB': 'This pre-production environment is not supported. Contact Development Team with questions.',
'PHONE': '808-867-5309',
'EMAIL': 'coder.jen#lostnumber.com'
},
'help#www.productionwebsite.com': {
'BLURB': 'Please contact Customer Service Center',
'BLURB#fr': 'S\'il vous plaît communiquer avec notre Centre de service à la clientèle',
'BLURB#de': 'Bitte kontaktieren Sie unseren Kundendienst!!1!',
'PHONE': '1-800-CUS-TOMR',
'EMAIL': 'customer.service#productionwebsite.com'
},
}
... and post-processed (in this case, at render time) given dynamic, browser-environment-known location.hostname='www.productionwebsite.com' and navigator.language of 'de'):
prefer(config,['www.productionwebsite.com','de']); // prefer(obj,string|Array<string>)
JSON.stringify(config); // {
'ver': '1.0',
'help': {
'BLURB': 'Bitte kontaktieren Sie unseren Kundendienst!!1!',
'PHONE': '1-800-CUS-TOMR',
'EMAIL': 'customer.service#productionwebsite.com'
}
}
If a non-annotated ('base') property has no competing annotated property, it is left alone (presumably global across environments) otherwise its value is replaced by an annotated value, if the suffix matches one of the inputs to the preference/discrimination function. Annotated properties that do not match are dropped entirely.
You can mix and match this behaviour to annotate configuration to achieve distinctions of global, default, specific that are (assuming you're sensible) readable with zero/minimal duplication.
The single, recursive prefer() function (as we're calling it, lacking the need or desire to make an entire project/framework out of it) we've developed so far (see jsFiddle, with inline docs) goes a bit further than this simple example, and (explained in greater detail here) handles deeply-nested configuration objects, as well as preferential ordering and (if you need to stay flat) combination of suffixes.
The function relies on JS ability to reference object properties as strings, dynamically, and tolerate # and & delimiters in property names which are not valid in dot-notation syntax but consequently (help) prevent developers from breaking this technique by accidentally referring to pre-processed/annotated attributes in code (unless they, non-conventionally don't prefer to use dot-notation.)
We have yet to have this break anything for us, nor have we been schooled on any fundamental flaws of this technique, beyond irresponsible/unintended usage or investment/fondness for existing frameworks/techniques that pre-exist. We have also not profiled it for performance (we only tend to run this once per build/session, etc.) so in your own usage, YMMV.
Most configurations transmitted client-side of course would not want to contain sensitive pre-production values, so one could (should!) use the same function to generate a production-only version (with no annotations) in pre-deploy, while still enjoying a SINGLE configuration file upstream in your process.
Further, if you're doing this for i18n, you may not want the entire wad going over the wire, so could process it server-side (cached or live, etc.) or pre-process it in build/deploy by splitting into separate files, but STILL enjoying a single source of truth as early in your workflow as possible.
We have not explored implementing the same function in Java (or C#, PERL, etc.) assuming it's even possible (with some exotic reflection maybe?) but a build environment that includes NodeJS could farm that step out easily.
Well if it suits your needs and you have no problem of storing the connection strings in the source control repository, you could create files like:
appsettings.dev.json
appsettings.qa.json
appsettings.staging.json
And choose the right one in the deployment script and rename it to the actual appsettings.json, which is then read by your app.

Using table-of-contents in code?

Do you use table-of-contents for listing all the functions (and maybe variables) of a class in the beginning of big source code file? I know that alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declaration would be self-explanatory enough.. but some complex tasks require a lot of code. I'm not sure is it really worth it spending your time subdividing implementation into multiple of files? Or is it ok to create an index-listing additionally to the class/interface declaration?
EDIT:
To better illustrate how I use table-of-contents this is an example from my hobby project. It's actually not listing functions, but code blocks inside a function.. but you can probably get the idea anyway..
/*
CONTENTS
Order_mouse_from_to_points
Lines_intersecting_with_upper_point
Lines_intersecting_with_both_points
Lines_not_intersecting
Lines_intersecting_bottom_points
Update_intersection_range_indices
Rough_method
Normal_method
First_selected_item
Last_selected_item
Other_selected_item
*/
void SelectionManager::FindSelection()
{
// Order_mouse_from_to_points
...
// Lines_intersecting_with_upper_point
...
// Lines_intersecting_with_both_points
...
// Lines_not_intersecting
...
// Lines_intersecting_bottom_points
...
// Update_intersection_range_indices
for(...)
{
// Rough_method
....
// Normal_method
if(...)
{
// First_selected_item
...
// Last_selected_item
...
// Other_selected_item
...
}
}
}
Notice that index-items don't have spaces. Because of this I can click on one them and press F4 to jump to the item-usage, and F2 to jump back (simple visual studio find-next/prevous-shortcuts).
EDIT:
Another alternative solution to this indexing is using collapsed c# regions. You can configure visual studio to show only region names and hide all the code. Of course keyboard support for that source code navigation is pretty cumbersome...
I know that alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declaration would be self-explanatory enough.
Correct.
but some complex tasks require a lot of code
Incorrect. While a "lot" of code be required, long runs of code (over 25 lines) are a really bad idea.
actually not listing functions, but code blocks inside a function
Worse. A function that needs a table of contents must be decomposed into smaller functions.
I'm not sure is it really worth it spending your time subdividing implementation into multiple of files?
It is absolutely mandatory that you split things into smaller files. The folks that maintain, adapt and reuse your code need all the help they can get.
is it ok to create an index-listing additionally to the class/interface declaration?
No.
If you have to resort to this kind of trick, it's too big.
Also, many languages have tools to generate API docs from the code. Java, Python, C, C++ have documentation tools. Even with Javadoc, epydoc or Doxygen you still have to design things so that they are broken into intellectually manageable pieces.
Make things simpler.
Use a tool to create an index.
If you create a big index you'll have to maintain it as you change your code. Most modern IDEs create list of class members anyway. it seems like a waste of time to create such index.
I would never ever do this sort of busy-work in my code. The most I would do manually is insert a few lines at the top of the file/class explaining what this module did and how it is intended to be used.
If a list of methods and their interfaces would be useful, I generate them automatically, through a tool such as Doxygen.
I've done things like this. Not whole tables of contents, but a similar principle -- just ad-hoc links between comments and the exact piece of code in question. Also to link pieces of code that make the same simplifying assumptions that I suspect may need fixing up later.
You can use Visual Studio's task list to get a listing of certain types of comment. The format of the comments can be configured in Tools|Options, Environment\Task List. This isn't something I ended up using myself but it looks like it might help with navigating the code if you use this system a lot.
If you can split your method like that, you should probably write more methods. After this is done, you can use an IDE to give you the static call stack from the initial method.
EDIT: You can use Eclipse's 'Show Call Hierarchy' feature while programming.

Open source MIDI libraries

I would like to know about open source libraries that could be used to perform some simple tasks on MIDI files:
reading a file one note - or chord - at a time;
extracting a given instrument to re-encode it separately in a new file;
allow to produce a "customizable" score -- by that I mean that I should be able to alter the way the sheet music is produced from the midi using the libraries ... I assume this will require an interaction with Lilypond or Musixtex.
I don't really have a preferred language, as long as it is not too painful to make the app cross-platform. Other advice is welcome -- better to learn it now rather than when I've already written a lot of code. So far, I've been trying to dig in MuseScore's (C++) source code, but it seems that GUI code permeates most files and although spotting relevant files was easy, it is difficult for me to extract just what I need (I'm only aiming for a command line application right now, I'll see about interfaces later).
Any ideas?
Thanks!
Well, I'm just getting started, but this (in Python) seems promising.
If you're still working on the project and language isn't a problem, you might try Python's cross-platform music21 which can parse midi files into Note, Chord, Instrument, etc., objects, lets you manipulate the scores, and then R/T back to MIDI or output to Lilypond, etc. (full disclosure, I'm the author of the toolkit; but I don't know of many others in any language that will take MIDI in and put Lilypond out while giving you a chance to treat the MIDI elements as objects to manipulate in the meantime.).
Sample code to screw up all the instrument sounds in a MIDI file and then play it and make a lilypond.pdf from it:
import music21
mf = music21.converter.parse('pathToMidiFile.mid')
for x in mf.recurse():
if 'Instrument' in x.classes:
x.midiProgram = (x.midiProgram * 2) % 128
mf.show('midi')
mf.show('lily.pdf')
Hope that helps.