How do I run this Bismark Bisulfite Sequencing program? - perl

I am very new to coding so I'm not really sure how to approach this. I wanted to look at some data that we got and sequence them using Bismark. I already used Trim Galore to pare the reads, now I wanted to get the data into Bismark. However, I'm not exactly sure how to approach this. In the documentation it said that it required Perl to run so I downloaded Perl along with the Bismark zip file from github. I also downloaded the bowtie2 zip file and extracted both the zip files into the same directory. I then opened up the Perl command prompt and set the directory to one with my extracted folders.
I put this line in:
> \bismark\bismark_genome_preparation --path_to_bowtie ^
C:\Users\sevro\Documents\Lab_Code\bowtie2-master --verbose ^
C:\Users\sevro\Documents\Lab_Code\genome
The system cannot find the path specified.
I also tried this after changing the directory to the Bismark folder:
> perl bismark
Failed to execute Bowtie 2 porperly (return code of 'bowtie2 --version' was 256).
Please install Bowtie 2 or HISAT2 first and make sure it is in the PATH,
or specify the path to the Bowtie 2 with --path_to_bowtie2 /path/to/bowtie2,
or --path_to_hisat2 /path/to/hisat2
I tried a few other things but all in all I am a bit confused on how exactly to approach this. Things I have downloaded right now:
Bismark zip file- https://github.com/FelixKrueger/Bismark
Bowtie2 zip file- https://github.com/BenLangmead/bowtie2
A genome assembly in .fa format
The data that I want to analyze in fasta format
Any insight would be helpful.

I think Bismark and bowtie2 only supports Linux and macOS natively. If you want to use bismark on Windows you can try install it via a *nix emulation systems like Cygwin, MSYS2, or simply use WSL. I tested this on Windows 11 with WSL with Ubuntu 20.04:
Downloaded bowtie2-2.4.4-linux-x86_64.zip and extracted to ~/bowtie2/bowtie2-2.4.4-linux-x86_64 folder.
Downloaded Bismark-0.23.1.zip and extracted to ~/bismark/Bismark-0.23.1/
Tested installation:
$ perl --version
This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi (with 50 registered patches, see perl -V for more detail)
$ perl bismark --path_to_bowtie2 ../../bowtie2/bowtie2-2.4.4-linux-x86_64/Bowtie 2 seems to be working fine (tested command '../../bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 --version' [2.4.4])
Output format is BAM (default)
Did not find Samtools on the system. Alignments will be compressed with GZIP instead (.sam.gz)
Genome folder was not specified!
DESCRIPTION
The following is a brief description of command line options and arguments to control the Bismark
bisulfite mapper and methylation caller. Bismark takes in FastA or FastQ files and aligns the
reads to a specified bisulfite genome. Sequence reads are transformed into a bisulfite converted forward strand
version (C->T conversion) or into a bisulfite treated reverse strand (G->A conversion of the forward strand).
Each of these reads are then aligned to bisulfite treated forward strand index of a reference genome
(C->T converted) and a bisulfite treated reverse strand index of the genome (G->A conversion of the
forward strand, by doing this alignments will produce the same positions). These 4 instances of Bowtie 2 or HISAT2
are run in parallel. The sequence file(s) are then read in again sequence by sequence to pull out the original
sequence from the genome and determine if there were any protected C's present or not.
The final output of Bismark is in BAM/SAM format by default, described in more detail below.
USAGE: bismark [options] <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
[...]

Related

Core dump: how to determine version of crashed application

I need to strictly bind every core file generated by system to certain bin version of crashed application. I can specify core-name pattern in sysctl.conf:kernel.core_pattern, but there is no way to put bin version here.
How can I put the version of crashed program into core file (revision number) or any other way to determine version of crashed bin?
I'm using qmake VERSION variable in .pro file, which contains revision number from SVN. Its available by QCoreApplication::applicationVersion(), in my every bin by flag --version.
Assuming your app can get far enough to print out its version number without a core dump, you can write a small program (python would probably be easiest) that is invoked by a core dump. The program would read stdin, dump it to a file, then rename the file based on the version number.
From man 5 core:
Piping core dumps to a program
Since kernel 2.6.19, Linux supports an alternate syntax for the
/proc/sys/kernel/core_pattern file. If the first character of this
file is a pipe symbol (|), then the remainder of the line is inter‐
preted as a program to be executed. Instead of being written to a disk
file, the core dump is given as standard input to the program. Note
the following points:
* The program must be specified using an absolute pathname (or a path‐
name relative to the root directory, /), and must immediately follow
the '|' character.
* The process created to run the program runs as user and group root.
* Command-line arguments can be supplied to the program (since Linux
2.6.24), delimited by white space (up to a total line length of 128
bytes).
* The command-line arguments can include any of the % specifiers
listed above. For example, to pass the PID of the process that is
being dumped, specify %p in an argument.
If you call your script /usr/local/bin/dumper, then
echo "| /usr/local/bin/dumper %E" > /proc/sys/kernel/core_pattern
The dumper should copy stdin to a file, then try to run the program named on its command line to extract a version number and use that to rename the file.
Something like this might work (I haven't tried it, so use at extreme risk:)
#!/usr/bin/python
import sys,os,subprocess
from subprocess import check_output
CORE_FNAME="/tmp/core"
with open(CORE_FNAME,"f") as f:
while buf=sys.stdin.read(10000):
f.write(buf)
pname=sys.argv[1].replace('!','/')
out=subprocess.check_output([pname, "--version"])
version=out.split('\n')[0].split()[-1]
os.rename(CORE_FNAME, CORE_FNAME+version)
The really big risk of doing this is recursive core dumps that may crash your system. Be sure to use ulimit to only allow core dumps from processes that can print out their own versions without core dumping.
It would be a good idea to change the script to re-run the program to get the version info only if it is the program you are expecting.

colorgcc perl script with output to non-tty enabled writing to C dependency files

Ok, so here's my issue. I have written a build script in bash that pipes output to tee and sorts different output to different log files (so I can summarize errors/warnings at the end and get some statistics on files built). I wanted to use the colorgcc perl script (colorgcc.1.3.2) to colorize the output from gcc and had found in other places that this won't work piping to tee, since the script checks if it is writing to something that is not a tty. Having disabled this check everything was working until I did a full build and discovered some of the code we receive from another group builds C dependency files (we don't control this code, changing it or the build process for these isn't really an option).
The problem is that these .d files have the form as follows:
filename.o filename.d : filename.c \
dependant_file1.h \
dependant_file2.h (and so on for however many dependencies there are)
This output from GCC gets written into the .d file, but, since it is close enough to a warning/error message colorgcc outputs color codes (believe it's the check for filename:lineno:message but not 100% sure, could be filename:message check in the GCCOUT while loop). I've tried editing the regex to attempt to not match this but my perl-fu is admittedly pretty weak. So what I end up with is a color code on each line for these dependency files, which obviously causes the build to fail.
I ended up just replacing the check for ! -t STDOUT with a check for a NO_COLOR envar I set and unset in the build script for these directories (emulates the previous behavior of no color for non-tty). This works great if I run the full script, but doesn't if I cd into the directory and just run make (obviously setting and unsetting manually would work but this is a pain to do every time). Anyone have any ideas how to prevent this script from writing color codes into dependency files?
Here's how I worked around this. I added the following to colorgcc to search the gcc input for the flag to generate the .d files and just directly called the compiler in that case. This was inserted in place of the original TTY check.
for each $argnum (0 .. $#ARGV)
{
if ($ARGV[$argnum] =~ m/-M{1,2}/)
{
exec $compiler, #ARGV
or die("Couldn't exec");
}
}
I don't know if this is the proper 'perl' way of doing this sort of operation but it seems to work. Compiling inside directories that build .d files no longer inserts color codes and the source file builds do (both to terminal and my log files like I wanted). I guess sometimes the answer is more hacks instead of "hey, did you try giving up?".

rename command doesn't rename

This should work on my CentOS 6.6 but somehow the file name is not changed. What am I missing here?
rename -f 's/silly//' sillytest.zi
This should rename sillytest.zi to test.zi but the name is not changed. Of course I can use mv command but I want to apply to many files and patterns.
There are two different rename utilities commonly used on GNU/Linux systems.
util-linux version
On Red Hat-based systems (such as CentOS), rename is a compiled executable provided by the util-linux package. It’s a simple program with very simple usage (from the relevant man page):
rename from to file...
rename will rename the specified files by replacing the first occurrence of from in their name by to.
Newer versions also support a useful -v, --verbose option.
NB: If a file already exists whose name coincides with the new name of the file being renamed, then this rename command will silently (without warning) over-write the pre-existing file.
Example
Fix the extension of HTML files so that all .htm files have a four-letter .html suffix:
rename .htm .html *.htm
Example from question
To rename sillytest.zi to test.zi, replace silly with an empty string:
rename silly '' sillytest.zi
Perl version
On Debian-based systems ,rename is a Perl script which is much more capable
as you get the benefit of Perl’s rich set of regular expressions.
Its usage is (from its man page):
rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]
rename renames the filenames supplied according to the rule specified as the first argument.
This rename command also includes a -v, --verbose option. Equally useful is its -n, --no-act which can be used as a dry-run to see which files would be renamed. Also, it won’t over-write pre-existing files unless the -f, --force option is used.
Example
Fix the extension of HTML files:
rename s/\.htm$/.html/ *.htm

how to make cygwin tar output proper unicode letters instead of shashed values?

I have a *.tar.gz file that have inside occasionally some names with non ascii letters.
for example when tar encounter a file containing word: naïve it outputs: na\303\257ve
Is there any swich, or tool to convert these slashed values to a proper letter ?
http://www.gnu.org/software/tar/manual/tar.html
By default GNU tar attempts to unquote each file or member name, replacing escape sequences according to the following table: ...
This default behavior is controlled by the following command line
option:
--unquote
Enable unquoting input file or member names (default).
--no-unquote
Disable unquoting input file or member names.
In other words, see if "--no-unquote" is an option for your version of Cygwin.
PS:
Which version of Cygwin tar are you using?

grep command to print follow-up lines after a match

how to use "grep" command to find a match and to print followup of 10 lines from the match. this i need to get some error statements from log files. (else need to download use match for log time and then copy the content). Instead of downloading bulk size files i need to run a command to get those number of lines.
A default install of Solaris 10 or 11 will have the /usr/sfw/bin file tree. Gnu grep - /usr/sfw/bin/ggrep is there. ggrep supports /usr/sfw/bin/ggrep -A 10 [pattern] [file] which does what you want.
Solaris 9 and older may not have it. Or your system may not have been a default install. Check.
Suppose, you have a file /etc/passwd and want to filter user "chetan"
Please try below command:
cat /etc/passwd | /usr/sfw/bin/ggrep -A 2 'chetan'
It will print the line with letter "chetan" and the next two lines as well.
-- Tested in Solaris 10 --