Does anyone know of a mouse GTF that includes CDS, TSS, and promoters? - mouse

I am looking for a mouse GTF to run with Cufflinks that includes data on promoters, CDS, and TSS. So far, I have only been able to locate a GTF with data on genes and isoforms.
Thanks.

There is no standard region for a promoter. For the TSS you can just take the start position of the gene (for a gene on the minus strand, that is its end coordinate), and for the promoter you can take ±1000 nt around the TSS, but it is up to you. A common convention is to take 2000 nt around the TSS.
That said, if you want to run Cufflinks with that file, I don't see the point in including the TSS; maybe just the promoters. Or define a small region around the TSS, but it would lie inside the promoter region.
I think you can easily do this with awk.
Hope this helps.
If you explain what you want to do, other people may know of other options for your analysis.
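The approach described above can be sketched in Python instead of awk. This is a minimal sketch, not a definitive implementation: it assumes the GTF has rows whose feature column is "gene" (as Ensembl GTFs do) and a gene_id attribute, and the function name promoter_regions and the default flank of 1000 nt are made up for illustration. It takes the gene start as the TSS on the plus strand and the gene end on the minus strand, and emits BED-style (0-based, half-open) regions.

```python
import re

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def promoter_regions(gtf_lines, flank=1000):
    """Yield BED-style tuples (chrom, start, end, name, score, strand),
    one per 'gene' row, covering the TSS plus/minus `flank` nt."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # assumption: only 'gene' rows are used
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        match = GENE_ID.search(fields[8])
        name = match.group(1) if match else "."
        # TSS is the gene start on '+', the gene end on '-'.
        tss = end if strand == "-" else start
        # Convert to 0-based half-open and clip at the chromosome start.
        yield (chrom, max(0, tss - 1 - flank), tss + flank, name, ".", strand)
```

To write a BED file, iterate over promoter_regions(open_file) and print each tuple tab-separated. The end coordinate is not clipped to the chromosome length, which would need a table of chromosome sizes.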

Yes, the Ensembl website has such a GTF. You can also go to iGenomes: http://support.illumina.com/sequencing/sequencing_software/igenome.html
and select the version you want. Let me know if this doesn't work. I can always send you the GTF file that I am using for mm10.

Related

Is it possible to write a quine in Ook!?

According to this comment on the general question "Is it possible to create a quine in every Turing-complete language?", it should be possible.
However, I didn't find any Ook! quine on the internet.
Do you think that it's really possible?
And if so, will we be able to find one?
It wouldn't even be very difficult. You would want to code it in brainfuck and then translate, and the internal representation for each command should be a pair of numbers (probably from 0-2) to represent the punctuation of each half-command. You could borrow much of the structure from Erik Bosman's brainfuck quine.
Updated: here. https://gist.github.com/danielcristofani/1fe53487df1f7afcb5b91c06d95184b2
This is ~40 commands taken directly from Erik Bosman's quine, another ~120 freshly written commands of rather clunky output code to handle Ook!'s verbosity, and then the data segment to represent all that.
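The "code it in brainfuck and then translate" step, and the pair-of-numbers representation mentioned above, could look roughly like this in Python. This is only a sketch: the command table is the usual brainfuck-to-Ook! correspondence as I understand it, and the choice of 0, 1, 2 for ".", "?", "!" is an arbitrary assumption, not necessarily the encoding used in the linked quine.

```python
# Brainfuck command -> Ook! command (two "Ook" half-commands each).
BF_TO_OOK = {
    ">": "Ook. Ook?",
    "<": "Ook? Ook.",
    "+": "Ook. Ook.",
    "-": "Ook! Ook!",
    ".": "Ook! Ook.",
    ",": "Ook. Ook!",
    "[": "Ook! Ook?",
    "]": "Ook? Ook!",
}

# Hypothetical numbering of the punctuation marks: '.' -> 0, '?' -> 1, '!' -> 2.
PUNCTUATION = ".?!"

def bf_to_ook(source):
    """Translate brainfuck source to Ook!, skipping non-command characters."""
    return " ".join(BF_TO_OOK[c] for c in source if c in BF_TO_OOK)

def command_pairs(source):
    """Represent each command as a pair of numbers (0-2), one per half-command."""
    pairs = []
    for c in source:
        if c in BF_TO_OOK:
            first, second = BF_TO_OOK[c].split()
            pairs.append((PUNCTUATION.index(first[-1]),
                          PUNCTUATION.index(second[-1])))
    return pairs
```

With this numbering, the brainfuck loop [-] becomes the pairs (2, 1), (2, 2), (1, 2), which is the kind of data a quine's data segment would have to store and print back out.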

Eclipse Auto-Correct?

Is there a way to have Eclipse auto-correct certain misspellings? For example, I tend to type "System" as "Sysetm", and Eclipse catches it. However, it only tells me it's an invalid package, and I have to manually correct it. I'm hoping there's a way like in Microsoft Word, where you can add words to be auto-corrected.
Trust me, you don't want something like that. It would make it almost impossible to write code, with it changing your variable names to whatever it thinks you meant. Also, its use would be very limited.
I have a hard enough time trying to convince Word I mean colour and not color.
Try pressing Alt+/ after typing 'sys'.

Need to rewrite CNC file to shift absolute position

This question is really in two parts. To briefly introduce the issue, we have a requirement to take a CNC file (used with a Roland milling machine) that has been produced using a tool called ArtCam, and modify it to shift the absolute position of the pattern being cut.
The software produces, and the machine accepts, input files in the following form:
;;^IN;
!MC1;
!RC5000;
V50.0;
^PR;Z0,0,10500;
^PA;
V49.8;
Z0,0,1000;
V39.8;
Z0,0,100;
Z10,0,99;
Z1000,0,-13;
Z10,0,-124;
Z0,0,-125;
...thousands more Zx,y,z; instructions...
The first part to my question is, can anyone actually tell me what this file format is called? It's clearly not G-Code, and I haven't been able to find any reference or documentation for it anywhere.
The second part is, does anyone know how we might easily modify the absolute position of the pattern that these files cut. Obviously the Z lines are X,Y,Z position commands but I don't know if they're absolute or relative, and I don't know in what coordinate space/system they are. For all I know there might be a simple command we can add at the top that shifts the starting point, or we might need to rewrite all the Z lines, but without some information on the file format I'm at a dead end.
Thanks!
I realise this is an old question and you maybe already have an answer (or have no need for one now) but it looks like it's RML-1, assuming my searches were correct.
I first found this which showed very similar code to your example. It mentions ArtCAM and output for the MDX-540, a Roland machine.
Searching Roland's milling machines for information was a bit useless, but the page for the MDX-540 under their 3D products mentions that the control command set is "RML-1 and NC codes".
Then searching for RML-1 gives a result for a PDF manual.
Reading that PDF, it looks like the single-letter commands are "Mode 1", the ^ is used to select Mode 2, and the two-letter commands are Mode 2 commands. !xx commands are common to both Mode 1 and Mode 2.
^PR sets the movement to relative mode.
^PA sets the movement to absolute mode.
Z moves.
Looking at your code sample it appears as if most positions are absolute and you'd need to re-write them all.
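Rewriting the absolute Z lines could be sketched in Python as follows. This is a sketch under assumptions taken only from the sample in the question: commands are separated by ";", coordinates are integers, ^PA and ^PR switch between absolute and relative movement, and nothing is shifted until a ^PA has been seen. The function name shift_rml and the offsets dx, dy, dz (in whatever units the machine uses) are made up for illustration; check the RML-1 manual before cutting anything with the output.

```python
import re

# A Z move with three integer coordinates, as in the sample file.
Z_CMD = re.compile(r"^Z(-?\d+),(-?\d+),(-?\d+)$")

def shift_rml(text, dx=0, dy=0, dz=0):
    """Return `text` with every Z move issued in absolute mode shifted by
    (dx, dy, dz). Relative moves and all other commands are left untouched."""
    absolute = False  # assumption: do not touch anything before the first ^PA
    out = []
    for line in text.splitlines(keepends=True):
        parts = line.split(";")  # several commands may share one line
        for i, cmd in enumerate(parts):
            if cmd == "^PA":
                absolute = True
            elif cmd == "^PR":
                absolute = False
            elif absolute:
                m = Z_CMD.match(cmd)
                if m:
                    x, y, z = (int(g) for g in m.groups())
                    parts[i] = f"Z{x + dx},{y + dy},{z + dz}"
        out.append(";".join(parts))
    return "".join(out)
```

For example, shift_rml(data, dx=100, dy=200) would turn Z10,0,99; into Z110,200,99; while leaving the initial relative move ^PR;Z0,0,10500; as it is.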

Tool to compare/diff HTML in bulk

I have a lot of HTML files (10,000's and GBs worth) scraped from a server and I want to check to make sure the server produces the same results after some modifications but ignore kinds of differences that don't matter, e.g. whitespace, missing newlines, timestamps, small changes in some kinds of number, etc.
Does anyone know of a tool for doing this? I'd really rather not do more filtering than I have to.
(Oh and it needs to run under linux)
You might consider using a clone detector such as our CloneDR. This tool parses large sets of computer program files (HTML is a special case), builds abstract syntax trees representing the essential structure of each file, and compares programs for similarity.
Because it is comparing essential program structure, it ignores inessential differences such as comments and whitespace, and determines that two code segments are either identical or one can be obtained from the other by substituting other blocks of code. The latter allows the recognition of code that has been modified in various ways. You can see samples of clone detection runs on a variety of computer languages at the web site.
In your case, what you would be looking for are files in system A which are essentially clones (exact or near misses) of files in system B. As a general rule, if a file a is a variant of file b (e.g., with a few changes), CloneDR will report it as a clone and show the exact differences.
At the scale of 20,000 files, I can see why you want a tool, and I can see why you want near-miss matches rather than exact matches.
It doesn't run under Linux, but I assume your problem is hard enough to solve that this isn't what you are optimizing for.
I use WinMerge a lot on Windows, and from what I can see some people like Meld on Linux, so perhaps that could do the trick for you:
http://meld.sourceforge.net/
Other examples I saw from a quick googling were Kompare, xxdiff.sourceforge.net, and kdiff3.sourceforge.net
(I could only post one link, so I wrote the addresses of xxdiff and kdiff3 as text.)
Beyond Compare is purchased software that is actually worth the money (I never thought I'd hear myself typing that!). It is GUI based but handles thousands of files very well. It will allow you to specify unimportant changes with regular expressions as well as whitespace (beginning, middle and end of line). The feature set is very extensive, check out a trial download.
I do not work for this company, I just use Beyond Compare every day at work and enjoy it every time!
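If some pre-filtering turns out to be unavoidable, the "ignore differences that don't matter" idea can be sketched in Python: normalize each file, then compare the normalized text. This is only an illustration. The timestamp pattern (YYYY-MM-DD followed by HH:MM:SS), the "<TIMESTAMP>" placeholder, and the function names are assumptions, and real pages would need rules matched to their own noise.

```python
import re
from pathlib import Path

# Assumed timestamp shape: 2020-01-02 03:04:05 or 2020-01-02T03:04:05.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
BETWEEN_TAGS = re.compile(r">\s+<")
WHITESPACE = re.compile(r"\s+")

def normalize(html):
    """Mask timestamps, drop whitespace between tags, collapse other runs."""
    html = TIMESTAMP.sub("<TIMESTAMP>", html)
    html = BETWEEN_TAGS.sub("><", html)
    return WHITESPACE.sub(" ", html).strip()

def changed_files(old_dir, new_dir):
    """List relative paths of .html files under old_dir that are missing
    from new_dir or still differ after normalization."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    changed = []
    for old_path in sorted(old_dir.rglob("*.html")):
        rel = old_path.relative_to(old_dir)
        new_path = new_dir / rel
        if not new_path.is_file():
            changed.append(str(rel))
        elif (normalize(old_path.read_text(errors="replace"))
              != normalize(new_path.read_text(errors="replace"))):
            changed.append(str(rel))
    return changed
```

The files that changed_files reports can then be inspected with one of the diff tools above, which keeps the expensive visual comparison to the small set that really differs.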

What is a good method for inventing a command name?

We're struggling to come up with a command name for our all purpose "developer helper" tool, which we are using on our project. It's like a wrapper for our existing tools like cmake and hg. The purpose of the command is really just to make our lives easier by combining multiple commands into one (for example, publishing packages). For example, we have commands like:
do conf
do build
do install
do publish
We've considered a few ambiguous names like do (as above) and run, but obviously, do is a Linux bash command and run is pretty ambiguous.
We'd like our command to be 2 chars short, preferably - but who thinks we're asking the impossible? Is there a practical way to check the availability of command names (other than just typing them into your terminal), or is it just a case of choose one and hope nobody else will use it? Are we worrying about nothing?
Since it's a "developer helper" tool why not use hm [run|build|port|deploy|test], Help Me ...
Give it a verbose name, then let everyone alias it to whatever they want. Make sure you use the verbose name in other scripts so that it removes ambiguity.
This way, each user gets to use whatever makes sense to him/her, and the scripts are more readable and more easily searchable (for example, grepping for "our_cool_tool" will usually yield better results than grepping for "run").
How many 2-character words are useful in this context? I think you need four. With that in mind, here are some suggestions.
omni
torq
fluf
mega
spif
crnk
splt
argh
quat
drul
scud
prun
sqat
zoom
sizl
I have more if you need them.
Pick one: http://en.wikipedia.org/wiki/List_of_all_two-letter_combinations
To check the availability of command names, I suggest looking for all two-letter filenames in the directories on your PATH. You can use a bash script like this:
# split PATH on ':' into an array, so directories containing spaces survive
IFS=: read -ra dirs <<< "$PATH"
for dir in "${dirs[@]}"; do
    ls -1d "$dir"/?? 2>/dev/null
done
It won't show your shell's keywords and builtins (like "do", as you mentioned), but it's a good start.
Change ?? to ??? for three-letter files, etc.
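The same check can be sketched in Python, which also makes it easy to test one candidate name directly. The function names taken_names and is_free are made up for illustration; shutil.which is the standard-library lookup for an executable on the PATH. Like the shell loop, this only sees files on disk, so shell keywords, builtins, aliases, and functions will not show up.

```python
import os
import shutil

def taken_names(length=2, path=None):
    """Return the names of executables with `length` characters found in
    the directories of `path` (defaults to the PATH environment variable)."""
    if path is None:
        path = os.environ.get("PATH", "")
    names = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # missing or unreadable directory
        for name in entries:
            if len(name) == length and os.access(os.path.join(directory, name), os.X_OK):
                names.add(name)
    return names

def is_free(name, path=None):
    """True if no executable called `name` is found on the search path."""
    return shutil.which(name, path=path) is None
```

For example, sorted(taken_names(2)) lists the two-character commands already installed, and is_free("qp") checks a single candidate.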
I'm going to vote for qp (quick package?) since it's easy to pronounce, easy to type, and easy to remember where the keys are on the keyboard.
I use "asd". It's short and most developers type it without thinking.
(oh, and you can always claim later that it stands for some "Advanced Script for Developers" if you need to justify yourself a few years from now)
How about fu? As in Kung Fu. It's a special purpose tool. And it's really easy to type.
I think that run is a good name, at least anybody that will download your project will know what to do. Calling it without parameters should reveal your options.
Even 'do' will do, I think you can use backquotes to run it from bash scripts.
Also remember that running the tools without parameters will tell you what options you have.
Use makefiles to do everything for you.
How about calling it something descriptive, like 'build_runner', and then just aliasing it to 'br' (or preferred acronym) in your .bashrc?
There is a really crappy tool called cleartool (part of clearcase), and people will alias it on their machine to "ct". Perhaps you can have a longer command and suggest users alias it.
It would probably be best to do something like ire_and_curses suggested, name it descriptively then alias it to a 2 letter command. If I was choosing, I would name it dev_help and alias it to dh.
I think you're worrying about nothing. Install the program as 'the-command-to-do-evertyhing-and-if-you-dont-make-your-own-alias-for-it-you-should'. I don't think that will be too long for any modern filesystems, but you might need to shorten it to 'tctdeaiydmyoafiys'. See what common aliases are used, and then change the program's name to that. In other words: don't decide, let natural selection decide for you. If you are working with a team of < 10, this should not even remotely cause any problems.
Call it devtool and alias it to dt.
I like to start the names of custom tools like that with the prefix 'jj-'. I can type (with big index-finger power) 'jj ' and see all my personal commands. Also, they group together in alphabetical lists. 'J' is not a very common character for built-in commands, but you can pick your own.
Since you want two characters, you can use just 'zz', or something starting with 'z'.
Are you sure you want to put all your functionality in one command? That might be simultaneously over-constraining and over-loading the interface a little.
do conf
do build
do install
do publish