How to maintain a small repository of bash/python scripts [closed] - version-control

For the past several years, I've been making small (single file, 1-500 line) scripts (mostly bash & python) to automate random tasks (usually scientific data analysis). Most of these end up being one-offs, but sometimes I want to go back and revisit/change something, or end up with a rather unwieldy script that could benefit from some sort of version control. I should note that all of these scripts are done solely on my own, and don't necessarily need to be share-able.
Which version control system (SVN, CVS, Git, Mercurial, ...) has the simplest command structure/syntax for my use case? More importantly, the machines I connect to are behind rather finicky Kerberos walls, so I'm not looking for any sophisticated server-based setup.
I found this thread from 2010 asking a similar question, though it didn't really talk about specific options, just whether or not I should be using a single repository.
In short, which version control system allows for a simple same-directory approach with minimal bells and whistles (only checkouts and commits needed)?

Should I set up some sort of subversion/CVS/git repository and just throw everything in?
Yes.
For your use case, SVN may be the best choice: URL-based access to every object in the repository lets you reach any single file at any revision quickly and easily, and with a purely linear history, SVN's "not the best" merging isn't a problem. A local file:///-based repository requires minimal maintenance. You can use a single repository with a flat tree (all files in /trunk).
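For example (a minimal sketch; the repository path, working-copy path, and script name below are just placeholders), a purely local, file:///-based Subversion setup only needs a handful of commands:

    # Create a local, file-based repository; no server or daemon involved
    svnadmin create ~/svnrepo

    # Create a flat trunk and check out a working copy for your scripts
    svn mkdir -m "Create trunk" file://$HOME/svnrepo/trunk
    svn checkout file://$HOME/svnrepo/trunk ~/scripts

    # Day-to-day use is just add/commit (plus update if you work from several machines)
    cd ~/scripts
    svn add analyze_run.sh
    svn commit -m "First version of the analysis script"

After that, svn log and svn cat -r <rev> give you the history and any old revision of a file without any server setup.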

Related

Why use version control software rather than a word processor? [closed]

A word processor has most if not all the features of version control software without the gobbledegook and the complexity. You can set a word processor to always keep history and save a new version every time you save. You could have an online word processor (if one doesn't exist, it sounds like a great opportunity) with general access so that multiple users can work on it. Git and others are acknowledged to have multiple issues, but I can't see a word processor having big issues, so why the preference for version control software?
Word processors, as far as I know, do not track versions of directory structures (trees) of files as a whole; they only track single files. A version control system treats a "snapshot" of a whole tree of files as a single unit.
Online word processors do not support multiple authors working on the same file independently; instead, they assume that multiple authors are collaborating in real time on exactly the same thing, which is not the usual workflow for software development.
Word processors do not support the concept of branches, which are a powerful tool for many software development use cases.
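To make the snapshot and branch points concrete, here is a minimal Git sketch (the project, file, and branch names are just placeholders):

    # Every commit records a snapshot of the whole tree of files
    git init paper-notes && cd paper-notes
    echo "draft outline" > notes.txt
    git add notes.txt
    git commit -m "Initial snapshot of the whole project"

    # Branches let independent lines of work coexist and be merged later
    git checkout -b experiment        # create and switch to a new branch
    echo "alternative wording" >> notes.txt
    git commit -am "Try an alternative approach"
    git checkout -                    # back to the original branch, untouched
    git merge experiment              # bring the finished work back in

Nothing comparable exists in a word processor's built-in version history, which only tracks one document at a time.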

What are the benefits of using a tool like Chef vs. using a makefile/shell script for deployment? [closed]

I have heard good things about Chef and was curious about the benefits before I devote time to learning a new tool. I'm not looking to turn this into an opinion thread; I'm looking for a list of additional features it has over a makefile/shell script.
Chef, and Ansible/Puppet/Salt too (collectively called CAPS), are all based on the same principle: describe the desired state of the system and the tool will make it happen.
A script or Makefile is generally a procedural system: run this, then run that, and so on. That means you need to keep a mental model of the system from each step to the next, and if that model ever deviates from the real system (e.g., a directory whose owner you are trying to set doesn't exist), your script usually breaks.
Some steps make this easy, like yum install or apt-get install, because they are internally idempotent: you can run them every time, and if the package is already installed, they simply do nothing.
CAPS systems take that principle (idempotence) and apply it to all management tasks. This has, for the most part, resulted in less brittle configuration management, because you only need to tell the tool what the end result should look like and it takes care of figuring out the delta from the current state.
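As a rough bash illustration of that difference (the user, group, path, and package names below are made up for the example):

    # Procedural: assumes the directory already exists; breaks on a fresh machine
    chown deploy:deploy /srv/myapp/logs

    # Declarative/idempotent: describe the end state; safe to run any number of times
    install -d -o deploy -g deploy /srv/myapp/logs

    # Package managers behave the same way: a second run is simply a no-op
    apt-get install -y nginx

Tools like Chef wrap every resource (files, users, services, packages) in that second, converge-to-the-desired-state style.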

Is there any value in putting subroutines in a module if they are specific to your program? [closed]

I started coding in Perl a few years ago. Back then, I thought that to simplify my code and make it more manageable, I would group related subroutines together in .pm files. For example, subs related to generating a report would go into "Report.pm".
Now I'm looking back on my code and since the vast majority of my subs are program-specific, I'm wondering if there's any valid reasons to do it the way I did? The programs I write are generally ~8K lines of code and my code runs (always) on very powerful servers. Today, the concept of having a main .pl file plus 5 *.pm files just seems like more files to manage and now I'm wondering if I should have just put it all into a single .pl file?
I'm not familiar with Perl, but to some extent this is similar in every programming language.
I see three reasons to split a program into multiple files: productivity, reuse (which is badly disguised productivity, or at least it should be), and clarity.
You say that you have only program-specific code, so you don't gain any reuse (at least external reuse). My experience is that the generic things are almost always already in some library and most of the code is program-specific. As a program grows, it becomes more important to reuse code "internally", but only you can know whether you repeat yourself.
Productivity (in a more manual sense) depends on tooling. If you can click on a function call and jump to its definition, even in a different file, rename it everywhere, and, most importantly, prepare a distribution without manually going through all the files, you won't think of having multiple files as an extra chore. If you don't have those things, each extra file brings extra work.
Clarity: if you have everything in one file, it's much easier to create one huge monolith that depends on a lot of things and, after a while, is hard to change. If you split it into reasonable modules where you can test "leaf" modules independently, you will have a much easier time refactoring and adapting when requirements change.

Single script for a scientific paper on GitHub? [closed]

I am posting this question despite it possibly being off-topic, since I can't find a better place to ask:
I am publishing a scientific paper and use some analysis code which I want to be publicly available. I wrote a general-purpose analysis library (Matlab) and put it on GitHub. Then there is a little script that uses that library for the specific purpose of this very paper. What is the best way to publish that script?
I see the following options for where to publish this script:
publish it in a new repository containing only one file, which is referenced in the paper (isn't that overkill?)
append analysis script as supplementary information to the paper (not very accessible and usable for other people)
add to the same repository as the library (does not make sense since the library is general purpose while the script is for a single specific purpose)
Happy about any feedback, re-directions or discussions.
I don't know if it's the best method, but here's what I did with one of my own Matlab libraries, SHCTools, to make it publicly available for a journal article:
Created a new branch of the repository (as opposed to an entirely new repository). This way the two are co-located, but the paper-specific branch can remain stable, allowing readers to replicate results even after the main repository changes significantly.
Added a notice to the main branch's README.md file linking to the new stable branch.
Added a folder to the new stable branch containing M-files that re-create the figures in my paper (you could do the same with examples).
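In Git terms, the steps above boil down to something like the following sketch (the branch, tag, folder, and file names are placeholders, not the actual SHCTools layout):

    # Inside the existing library repository, create a stable, paper-specific branch
    git checkout -b paper-stable

    # Add a folder of M-files that re-create the paper's figures, then commit and push
    mkdir paper_figures
    cp ~/work/make_figures.m paper_figures/    # hypothetical figure-generation script
    git add paper_figures
    git commit -m "Add scripts that reproduce the figures in the paper"
    git push origin paper-stable

    # Optionally, a tag pins the exact state the paper used, so the paper can cite it
    git tag -a paper-v1 -m "Code state used for the published results"
    git push origin paper-v1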
Adding a script as a supplemental resource (perhaps inside an examples or contrib directory) would seem like an acceptable and reasonably standard arrangement.
For a free-standing script, perhaps consider publishing it as a gist; gists are a secondary service of GitHub for simple standalone snippets.

Compare Harvest to other source control systems? [closed]

From the top, "source control" seems like a bad way to describe CA Harvest; it's a deployment control system, and it's actually pretty good at just deploying code. I've found it to be lacking when doing source control tasks, though.
If you've used Harvest:
what did it do right?
what couldn't it do?
what did it do with a workaround so hackish it took 3x longer than you'd expect?
(Someone correct me if I'm wrong.) Harvest seems awesome for deployment control, enforcing steps along a deployment lifecycle, and getting a chain of approval for deployments to production. That said, it falls short on the developer-friendly side.
It seems like I need to use the Workareas; they let me put all the code on my local machine, so I can do development.
With Workareas, I can only synchronize from the repository, but not get a report of what just sync'ed in; I don't know what changed, or who changed it, or why.
To add comments to checkins using Workareas, you have to manually enable the functionality in the preferences, which is a huge red flag to me.
I can't seem to figure out how to find out what changed since a specific time; what changed since Friday at 5 PM, for example?
There aren't any atomic commits; I can't commit files as a group, then roll the group back later if something goes wrong. I can do it as a package, but that's heavyweight; a package should be able to contain hundreds of atomic commits/groups.
And worst of all, it's entirely unsupported by Stack Overflow and/or any other question-and-answer site I can find. If I can't figure it out... I'm shooting blind.
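For comparison (and only as a rough sketch with placeholder file names and commit hash), the two complaints above about change reports and grouped commits are one-liners in Git:

    # What changed since last Friday, who changed it, and which files were touched?
    git log --since="last Friday" --stat

    # A commit is an atomic group of files; reverting it undoes the whole group
    git add report.sh parse_data.py
    git commit -m "Rework report generation"
    git revert <commit-hash>        # roll the whole group back later if needed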
We're currently migrating away from Harvest.
What did it do right? Configuration management and code deployment. We have a pretty good process flow going.
What couldn't it do? Branching and merging. Horrible SCM tool, really.
?