Must I cite my own code when writing academic papers? - citations

I am writing a work-term report in scientific paper format and I've been citing all software and source code that I have used in my project, but when it comes to scripts I have written, must I create citations for those as well?
It may be worth noting that all the code itself is not central to the paper, only that I've used it during my project (bioinformatics).

Related

How can i beautify my code in Matlab (tabs, deleting unnecessary spaces etc.)? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
Improve this question
i'm looking to find a way to beautify my code in MATLAB. I'm talking about tabs, deleting unnecessary spaces etc., the way Eclipse does it with Ctrl+Shift+F
The smart indentation (ctrl + I) is probably all you need (as #Matteo V and #Cris Luengo already mentioned).
However, there are a few other neat tricks that you might want to have a look at if you are really into code development:
Well, first have a look at the Improve Code Readability site of MATLAB. You could use the Apply smart indenting while typing option in the Preferences > MATLAB > Editor/Debugger > Language > Indenting section (it should be turned on per default but I like the Indent all functions setting). There are a bunch of other settings that you may want to explore
If you dig yourself deeper in the MATLAB IDE, you will notice that you can adjust almost everything to your preferences, but the way is not always documented in the web.... however, the local documentation (call doc) contains the info you may be looking for, see this blog-post
I am not aware of an automatic detection of double spaces or similar but you might end up writing your own little callback-function. Most languages ignore this anyway (perhaps except for Python). Code readability is usually a topic that the programmer(s) should care about... and not the machines ;)
Further tips:
respect the Right-hand text limit, which is the vertical gray line in your editor and shall indicate how many characters a single line of code should have as maximum. If it is a comment, wrap it. If it is an expression, try to outsource some commands to a dedicated variable
use equally long variable names. (There is no style guide as in Python, which says that you should use normal words and underscores etc) E.g. if you have two variables describing a commanded and a measured velocity, you could call them v_cmd and v_act and your code perfectly aligns if you apply the same manipulations to both variables ;)
use section. With %% (the space is important) at the beginning of a line in the editor, you create a section (you'll note the slight yellow background color and the bold writing that follows this command). It is convenient to structure your code. You can even run entire sections Editor-Tab > Run > Run section
Although there are programmers claiming that a good code speaks for itself (and therefore doesn't need any comments), to my experience writing comments has never been a bad idea. It improves the readability of your code
The answer might have been a bit elaborate for such an innocent question ... oO

Cite a paper using github markdown syntax

I am coding a readme for a repo in github, and I want to add a reference to a paper. What is the most adequate way to code in the citation? e.g. As a blockquote, as code, as simple text, etc?
Suggestions?
I agree with Horizon_Net that it depends on personal preference.
I like to have something which looks similar to LaTeX.
An example is provided below.
Note that it demonstrates a numeric citation style.
Alphabetic or reading style are possible too.
For numeric citation style, higher numbers should appear later in the text, and this can make satisfying numeric citation style cumbersome.
To avoid this problem, I typically use an alphabetic citation style.
"...the **go to** statement should be abolished..." [[1]](#1).
## References
<a id="1">[1]</a>
Dijkstra, E. W. (1968).
Go to statement considered harmful.
Communications of the ACM, 11(3), 147-148.
"...the go to statement should be abolished..." [1].
References
[1]
Dijkstra, E. W. (1968).
Go to statement considered harmful.
Communications of the ACM, 11(3), 147-148.
On GitHub flavored Markdown and most other Markdown flavors, you can actually click on [1] to jump to the reference.
Apologies for taking Dijkstra his sentence out of context.
The full sentence would make this example more difficult to read.
EDIT:
If the references all have a stable link, it is also possible to use those:
The field of natural language processing (NLP) has become mostly dominated by deep learning approaches
(Young et al., [2018](https://doi.org/10.1109/MCI.2018.2840738)).
Some are based on transformer neural networks
(e.g., Devlin et al, [2018](https://arxiv.org/abs/1810.04805)).
The field of natural language processing (NLP) has become mostly dominated by deep learning approaches
(Young et al., 2018).
Some are based on transformer neural networks
(e.g., Devlin et al, 2018).
From what I know, there is no built-in mechanism for this. This leads to more subjective opinions, depending on personal preference. I personally like to have a separate section called references. An example would look like the following
References
some reference
another reference
Update
Another way would be to use simple HTML embedded in your Markdown.
The alternative (to pure markdown) is to use the new GitHub integration from Aug. 2021:
Enhanced support for citations on GitHub
GitHub now has built-in support for CITATION.cff files.
This new feature enables academics and researchers to let people know how to correctly cite their work, especially in academic publications/materials.
Originally proposed by the research software engineering community, CITATION.cff files are plain text files with human- and machine-readable citation information.
When we detect a CITATION.cff file in a repository, we use this information to create convenient APA or BibTeX style citation links that can be referenced by others.
How this works
Under the hood, we’re using the ruby-cff RubyGem to parse the contents of the CITATION.cff file and build a citation string that is then shown in GitHub when someone browses a repository with one of those files1.
Now, that is helping others making your Github paper easily citable.
But, as documented, to "add a reference to a paper":
Citing something other than software
If you would prefer the GitHub citation information to link to another resource such as a research article, then you can use the preferred-citation override in CFF with the following types.
Resource
Type
Research article
article
Conference paper
conference-paper
Book
book
Extract
preferred-citation:
type: article
authors:
- family-names: "Lisa"
given-names: "Mona"
orcid: "https://orcid.org/0000-0000-0000-0000"

How do you include authorship and version info into your MATLAB functions [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
This question is related to my previous one: MATLAB m-file help formatting.
What do you usually write to describe authorship of your own function? Do you put it at the end of the function body or right after the help text before any code?
How do you include version information? Is it possible to automatically update version after function modification?
This is what I usually include:
% My Name <my#email>
% My company
% Created: September 2010
% Modified: October 2010
Please share your thoughts, ideas?
I have a function in the MATLAB Central File Exchange that helps you document your function in a standard way, and works with version control software (CVS and Subversion; not git) to automatically update the author field and time of modification.
You just type new at the command prompt, then the name of the function, and it sorts out the rest.
The basic template for documentation that I use is
function [outputArgs] = TestFunction(inputArgs)
%TESTFUNCTION Summary of this function goes here
%
% [OUTPUTARGS] = TESTFUNCTION(INPUTARGS) Explain usage here
%
% Examples:
%
% Provide sample usage code here
%
% See also: List related files here
% $Author: rcotton $ $Date: 2010/10/01 18:23:52 $ $Revision: 0.1 $
% Copyright: Health and Safety Laboratory 2010
(You'll obviously want a different company in your copyright statement.)
The first line of the help documentation is known as the H1 line, and is used by the function lookfor, among others. It is important that this comes straight after the function definition line.
If you have different use cases (maybe with and without optional arguments), then you should describe each one.
The Examples: and See also: lines are formatted in a way the works with the help report generator. (I've just spotted a bug - the year should be before the company name in the copyright line. Fix on its way.)
$Author:, etc., are formatted for use with CSV/SVN. Since git uses hashes of files, you can't change the content of the file, without git thinking that it's been updated.
We keep our code in a Subversion repository and use the keywords functionality for writing this sort of information into the header comments of the m-file when it is committed to the repo. We put a block of comments right after the initial function line in (most of) our m-files.
If you are not using a source code control system then:
You really should start using one right now.
You could write a script (in Matlab, why not) for maintaining the comment information you require, and implement some process to ensure that you run the script when necessary.
We don't generally put modification dates or histories in our source files, the repository tracks changes for us. Which is another reason you should be using one.
And, while you are thinking about all this, if you haven't already done so: check out Matlab's publish functionality.
EDIT: #yuk: I guess from your mention of TortoiseSVN that you are working on Windows. It's a couple of years since I installed Subversion on my Windows PC. I don't recall any problems at all with the installation, and I'm not qualified to help you debug yours -- but there are plenty on SO who are.
As for keeping version (etc) info up to date, there's no scripting required. You simply include the special strings, such as $Rev$, which Subversion recognises as keywords in the locations in your files (probably in the comments) where you want the version information (etc). This is well explained in the Subversion Book.
As for the Matlab documentation, I think the publish (and related) features are well explained in the Desktop Tools and Development Environment handbook which is available on-line at the Mathworks web-site.
As High Performance Mark points out, some form of source code control is ideal for handling this situation. However, if you are adding information by hand here are a few pointers:
I would definitely include a line stating the MATLAB version your code was written in, or perhaps which versions you know it works for.
When adding the information, you have to leave space between it and the help comment block if you don't want to display it when the user views the help contents, like so:
function myFunction
%# Help text for function
%# Your information
Unless you do want it displayed with the help. Then just make one big comment block.

What do you think is the best language for Bioinformatics? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I have done a couple research jobs in Bio-informatics and I have used Matlab for them. Matlab had a lot of powerful tools and was easy to use. I did thinks with genome sequencing and predicting metabolic pathways. I am wondering what other people think is best? or there might not be one specific language but a few that lend themselves best to Bio-informatics work that is math heavy and deals with a large amount of data.
You'll likely be interested in this thread over at BioStar:
Which are the best programming languages to study for a bioinformatician?
For most of us bioinformaticians, this includes Python, R, Perl, and bash command line utilities (like sed, awk, cut, sort, etc). There are also people who code in Java, Ruby, C++, and Matlab.
So the bottom line? Whichever language lets you get the work done most easily is the right one for you. Answering this question should include a careful survey of the libraries and other code that you can pull from, as well as information on your own preferences and experience. If you're doing microarray analysis, it's hard to beat the R/bioconductor libraries, but that's absolutely the wrong language for someone wrangling most types of large sequencing data sets.
There's no one right language for bioinformatics.
The important BLAST sequencing tool is written in C++
The MATT tool for aligning protein structures is written in C
Some of my colleagues in computational biology use Ruby.
In general, I see a lot of C and C++ for performance-critical code and a lot of scripting languages otherwise.
Python + scipy are decent (and FREE).
http://www.vetta.org/2008/05/scipy-the-embarrassing-way-to-code/
http://www.google.com/search?hl=en&source=hp&q=python+bioinformatics&aq=0&aqi=g9g-m1&aql=&oq=python+bio&gs_rfai=CeE1nPpMNTN2IJZ-yMZX6pcIKAAAAqgQFT9DLSgo
You do not even need to learn new syntax really when when dropping Matlab for SciPy.
Best or not, SAS is the de facto programming enviroment in biopharmas. If you were to work for the Pfizers, Mercks and Bayers of the world in bioinformatics, you had better have SAS skills. SAS programmers are in great demand.
What's the "best" language is both subjective and potentially different from task to task, but for bioinformatic work, I personally use R, Perl, Delphi and C (quite frequently a combination of several of these).
I work mainly with HMMs and protein sequences. I started out writing in C, but have since switched to Python, which I'm happy with. I find it's easier to prototype something quickly and results in easier to maintain code.
Here's a freely available academic paper written on the subject that evaluates the different languages, and in different situations: http://www.biomedcentral.com/1471-2105/9/82
They grouped 6 commonly used languages into 3 different levels.
2 compiled languages: C, C++
2 semi-compiled languages: C#, Java
2 interpreted languages: Perl, Python
Some general conclusions:
Compiled languages outperformed interpreted languages in global alignments and Neighbour-Joining programs
Interpreted languages generally used more memory
All languages performed roughly the same for BLAST computations, except for Python
Compiled languages require more written lines of code to perform the same tasks
Compiled languages tend to be better for algorithm implementation
Interpreted languages tend to be better for file parsing/manipulation
Here's another good free academic article discussing ways to build bioinformatics skills: http://dx.plos.org/10.1371/journal.pcbi.1000589

Why is Perl used so extensively in biology research? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I work as support staff in a biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk.
Why is Perl used so much in biology?
Lincoln Stein highlighted some of the saving graces of Perl for bioinformatics in his article:
How Perl Saved the Human Genome Project.
From his analysis:
I think several factors are responsible:
Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language.
Perl is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can be also be a curse. I talk more about the problems with Perl below.
Perl is component-oriented. Perl encourages people to write their software in small modules, either using Perl library modules or with the classic Unix tool-oriented approach. External programs can easily be incorporated into a Perl script using a pipe, system call or socket. The dynamic loader introduced with Perl5 allows people to extend the Perl language with C routines or to make entire compiled libraries available for the Perl interpreter. An effort is currently under way to gather all the world's collected wisdom about biological data into a set of modules called "bioPerl" (discussed at length in an article to be published later in the Perl Journal).
Perl is easy to write and fast to develop in. The interpreter doesn't require you to declare all your function prototypes and data types in advance, new variables spring into existence as needed, calls to undefined functions only cause an error when the function is needed. The debugger works well with Emacs and allows a comfortable interactive style of development.
Perl is a good prototyping language. Because Perl is quick and dirty, it often makes sense to prototype new algorithms in Perl before moving them to a fast compiled language.
Sometimes it turns out that Perl is fast enough so that of the algorithm doesn't have to be ported; more frequently one can write a small core of the algorithm in C, compile it as a dynamically loaded module or external executable, and leave the rest of the application in Perl (for an example of a complex genome mapping application implemented in this way, see http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/).
Perl is a good language for Web CGI scripting, and is growing in importance as more labs turn to the Web for publishing their data.
The real answer probably has less to do with Perl than you think. Many of the things that happen are accidents of history. At the time, way back when, Perl was pretty popular, Java was getting more popular, not too many people were paying attention to Python, and Ruby was just getting started.
The people who needed to get work done used Perl and made some libraries in Perl, and other people started using those libraries. Once people start using something that is moderately useful to them, they tend not to switch (economists call those "switching costs"). From there, even more people start using it because a lot of other people are using it.
The same evolution might not happen today. I'd say that Perl, Python, and Ruby are all completely adequate and up to the task. All the things that mobrule quotes from Lincoln Stein could apply to any of the three today. If everyone had to start from scratch today, any one of those languages could be the one that everyone uses.
I've noticed, from my own client base though (a very small and unrepresentative sample of biotech), that the people pushing the programming for a lot of the biological stuff seemed to be at least part-time sysadmins who were supporting scientists. The scientists worried about the science and did some light programming, but the IT support people were doing a lot of the heavy lifting for the non-science parts. Perl is very well positioned as a sysadmin tool since it's the duct-tape of the internet.
Probably because Perl is good at manipulating strings, and much research in genetics involves the manipulation of veeery long "ACTGCATG..." strings. Just guessing...
I use lots of Perl for dealing with qualitative and quantitative data in social science research. In terms of getting things done (largely with text) quickly, finding libraries on CPAN (nice central location), and generally just getting things done quickly, it can't be surpassed.
Perl is also excellent glue, so if you have some instrumental records, and you need to glue them to data analysis routines, then Perl is your language.
Perl seems to be the language of choice for bioinformatics - there's even an O'Reilly title on just this subject: Beginning Perl for Bioinformatics.
Perl is very powerful when it comes to deal with text and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but also most of the bionformatics algorithms will output some kind of text results.
Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has lots of parsers for every kind of popular bioinformatics algorithms' results, and for manipulating the different sequence formats used in major sequence databases.
Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data types and other languages are more often used in those areas.
The R statistics programming language for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see bioconductor project).
Now when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company though it is no longer used...
I hope this helps.
Perl basically forces very short development cycles. That's the kind of development that gets stuff done.
It's enough to outweigh Perl's disadvantages.
Bioinformatics deals primarily in text parsing and Perl is the best programming language for the job as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says that "With [Perl]s highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis."
This seems to be a pretty comprehensive response. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) don't have much programming experience at all. The learning curve for Perl is much lower than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take one hour or more finish, so waiting a few extra minutes for that data processing to finish isn't going to kill them!
Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.
People missed out DBI, the Perl abstract database interface that makes it really easy to work with bioinformatic databases.
There is also the one-liner angle. You can write something to reformat data in a single line in Perl and just use the -pe flag to embed that at the command line. Many people using AWK and sed moved to Perl. Even in full programs, file I/O is incredibly easy and quick to write, and text transformation is expressive at a high level compared to any engineering language around. People who use Java or even Python for one-off text transformation are just too lazy to learn another language. Java especially has a high dependence on the JVM implementation and its I/O performance.
At least you know how fast or slow Perl will be everywhere, slightly slower than C I/O. Don't learn grep, cut, sed, or AWK; just learn Perl as your command line tool, even if you don't produce large programs with it. Regarding CGI, Perl has plenty of better web frameworks such as Catalyst and Mojolicious, but the mindshare definitely came from CGI and bioinformatics being one of the earliest heavy users of the Internet.
Perl is very easy to learn as compared to other languages. It can fully exploit the biological data which is becoming the big data. It can manipulate big data and perform good for manipulation data curation and all type of DNA programming, automation of biology has become easy due languages like Perl, Python and Ruby. It is very easy for those who are knowing biology, but not knowing how to program that in other programming languages.
Personally, and I know this will date me, but it's because I learned Perl first. I was being asked to take FASTA files and mix with other FASTA files. Perl was the recommended tool when I asked around.
At the time I'd been through a few computer science classes, but I didn't really know programming all that well.
Perl proved fairly easy to learn. Once I'd gotten regular expressions into my head I was parsing and making new FASTA files within a day.
As has been suggested, I was not a programmer. I was a biochemistry graduate working in a lab, and I'd made the mistake of setting up a Linux server where everyone could see me. This was back in the day when that was an all-day project.
Anyway, Perl became my goto for anything I needed to do around the lab. It was awesome, easy to use, super flexible, other Perl guys in other labs we're a lot like me.
So, to cut it short, Perl is easy to learn, flexible and forgiving, and it did what I needed.
Once I really got into bioinformatics I picked up R, Python, and even Java. Perl is not that great at helping to create maintainable code, mostly because it is so flexible. Now I just use the language for the job, but Perl is still one of my favorite languages, like a first kiss or something.
To reiterate, most bioinformatics folks learned coding by just kluging stuff together, and most of the time you're just trying to get an answer for the principal investigator (PI), so you can't spend days on code design. Perl is superb at just getting an answer, it probably won't work a second time, and you will not understand anything in your own code if you see it six months later; BUT if you need something now, then it is a good choice even though I mostly use Python now.
I hope that gives you an answer from someone who lived it.