Trying to edit the name of files to be used as input in bash scripting - renaming

I am trying multiple files that look like this:
k27.contigs.fa; where the value (e.g. 27) differs
I am trying to write a for loop for a function for each of my file, and the functions looks like:
megahit_toolkit contig2fastg 99 k99.contigs.fa > k99.fastg
for x in *.contigs.fa
do
megahit_toolkit contig2fastg ${x%.contings.fa*} $x > ${x%.contings.fa*}.fastg
done
How do I modify the code so that I can replace the number and ignore the k before that?

Related

how to use each to pass sets of variables to a function in q

I have a function that deletes and works using a directory and a table and column as variables:
Delete1[dir,t,c]
Another that retruns a set of directories that works:
Paths[dir]
Now I am trying to combine these two using something like "each" to all the directories that Paths[dir] to Delete1 function and I am trying something like this:
Delete1 each (Paths[dir];t;c)
The syntax does not quite work.
You want to use projection. Supplying only the second and third arguments to the Delete1 function creates a new function with just one argument. You can use each between the projection and Paths
Delete1[;t;c] each Paths[dir]
You could use dot apply to this end, you can read more about dot applies here https://code.kx.com/q/ref/unclassified/#apply. It would look like the following:
Delete1 .' (Paths[dir];t;c)
Note if you're using this delete function to delete a column from a table in every partition you only need to delete it from .d file in the last partition. (like in a previous question of yours soft deleting a column from a table in q )

Why is no output files written in prinseqlite perl loop?

I am completely new to this type of coding/command lines, so I am sorry if I am asking this question in a wrong way.
I want to loop over all files in a directory (I am quality trimming DNA sequencing files (.fastq format))
I have written this loop:
for i in *.fastq; do
perl /apps/prinseqlite/0.20.4/prinseq-lite.pl -fastq $i -min_len 220 -max_len 240 -min_qual_mean 30 -ns_max_n 5 -trim_tail_right 15 -trim_tail_left 15 -out_good /proj/forhot/qfiltered/looptest/$i_filtered.fastq -out_bad null; done
The code itself seems to work, I can see in my terminal that it is taking the right files and it is doing the trimming (it is writing a summary log in the terminal as it goes), but no output files are generated - i.e these ones:
-out_good /proj/forhot/qfiltered/looptest/$i_filtered.fastq
If I run the code in a non-loop way, just on one file it works (= the output is generated). link this example:
prinseq-lite.pl -fastq 60782_merged_rRNA.fastq -min_len 220 -max_len 240 -min_qual_mean 30 -ns_max_n 5 -trim_tail_right 15 -trim_tail_left 15 -out_good 60782_merged_rRNA_filt_codeTEST.fastq -out_bad null
Is there a simple reason/answer to this?
This problem has nothing to do with Perl at all.
/proj/forhot/qfiltered/looptest/$i_filtered.fastq is read by the shell as interpolating the contents of i_filtered. There is no such shell variable, so this argument turns into /proj/forhot/qfiltered/looptest/.fastq ($i_filtered turns into nothing).
Therefore all of your prinseq-lite.pl executions place their output in the same file, which (because its name starts with a .) is "hidden": You need to use ls -a to see it, not just ls.
Fix
... -out_good /proj/forhot/qfiltered/looptest/${i}_filtered.fastq
Note that this would give you e.g. 60782_merged_rRNA.fastq_filtered.fastq for an input file of 60782_merged_rRNA.fastq. If you want to get rid of the duplicate .fastq part, you need something like:
... -out_good /proj/forhot/qfiltered/looptest/"${i%.fastq}"_filtered.fastq

Matlab / Octave - trouble getting started with functions, is there a function main equivalent?

I'm trying to get started with Matlab / Octave and having a difficult time figuring how to organize a program into functions. Currently I'm trying to write a simple program that adds two numbers together and displays the result, with the adding being done by a function. I would have figured this would have worked:
% test.m
close all;
clear all;
num1 = 2;
num2 = 2;
result = myAdd(num1, num2);
disp(result); % this should display 4 ??
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function retval = myAdd(var1, var2)
retval = var1 + var2;
end
Running the above with Octave 4.0.0, I get the following errors:
error: 'myAdd' undefined near line 7 column 10
error: called from
test at line 7 column 8
I have tried also putting the function first and the test part second, and also putting the function in a separate file and having a main.m file in the same directory call the myAdd function, all result in errors.
So here are my questions:
-Does Matlab / Octave have a main equivalent ??
-How does the interpreter know where to start? Does it automatically go to the first line in the program, or is there a certain function name you can use to make it start with that function as function main() is in C/C++ ??
-In a Python program of significant size, my usual practice is to organize things as follows:
# some_python_program.py
import abc
import xyz
###################################################################################################
def main():
# stuff to get program started here
# end main
###################################################################################################
def function1():
# specific function here
# end function
###################################################################################################
def function2():
# specific function here
# end function
###################################################################################################
if __name__ == "__main__":
main()
Is there a way to do the equivalent in Matlab/Octave ??
If somebody could provide some direction as to a main equivalent and/or how to organize functions in Matlab/Octave please advise, thanks.
Matlab/Octave can be a bit confusing in this way if you're coming from a language like python. In order to define a function (without using anonymous functions), you need to create a separate file with the name of that function, which can then be called using the command line.
For example, you would like to create a function called myadd. You should create a file named myadd.m whose contents will be:
function out = myadd(a,b)
out = a+b;
end
Then, as long as your file is on your path (save it to your MATLAB folder or put it in your current working directory), you can call it from the Command Window as follows:
>> myadd(5,6)
ans =
11
Only one function will be made publicly available per file (the one whose name matches the file name). However, you can still define multiple functions per file if you plan to use only that function. For example, if you have a file named foo.m, you can do the following:
function out = foo(a,b)
out = fun(a,b);
end
function out = fun(a,b)
out = a * b;
end
This will allow you to call foo(5,6) from the Command Window, but fun(5,6) will result in an error: Undefined function or variable 'fun'.
Read more about local functions and nested functions.
Hope this is helpful!

how to compare string from .txt file with workspace in Matlab

I'm having a .txt file like this:
structure
a = title
c1 = A.B.C
endstructure
I want to read this to matlab and then check if in my workspace already exists a structure named A.B.C. if so, then I want to save data from this structure to variable c1.
I have problem with proper parsing the line c1 = A.B.C and then comparing it with workspace.
any help appreciated.
explanation - - - - - -
in my workspace there is a structure A.B.C. = [0 1 2 3 5 8]
in my txt file I write c1 = A.B.C, and that is the name. I want in my program to check if this name matches some already existing data with this name in the workspace. if so, assign this data to a variable c1 and leave c1 in the workspace.
To determine if your variable already exists in the Matlab workspace, you can use matlab exist function. It would be a little awkward in your situation though as you can only check to see if variable 'A' exists. You could then nest it further and see if the variable has the specified field. It could look like this:
if( exist('A','var') && isfield(A,'B') && isfield(A.B,'C') )
%do something
end
You can use evalin with the syntax evalin('base',expression,catch_expr) with expression the right side of your '=' sign.
It's not really efficient but it avoid you to have to parse the structure name. In catch_expr you can set a flag telling not to do the assignment.
Just use assignin after that to assign the value in 'c1'. 'c1' is a here string so it's easy to put it in a loop with different names.

Using a .fasta file to compute relative content of sequences

So me being the 'noob' that I am, being introduced to programming via Perl just recently, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure if I'm able to open it, or if I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script to open and read the file, which I have gotten the hang of now, but I have to read each sequence, compute relative amounts of 'G' and 'C' within each sequence, and then I'm to write it to a TAB-delimited file the names of the genes, and their respective 'G' and 'C' content.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I get that it's confusing, but you really should try to limit your question to one concrete problem, see https://stackoverflow.com/faq#questions
I have no idea what a ".fasta" file or 'G' and 'C' is.. but it probably doesn't matter.
Generally:
Open input file
Read and parse data. If it's in some strange format that you can't parse, go hunting on http://metacpan.org for a module to read it. If you're lucky someone has already done the hard part for you.
Compute whatever you're trying to compute
Print to screen (standard out) or another file.
A "TAB-delimite" file is a file with columns (think Excel) where each column is separated by the tab ("\t") character. As quick google or stackoverflow search would tell you..
Here is an approach using 'awk' utility which can be used from the command line. The following program is executed by specifying its path and using awk -f <path> <sequence file>
#NR>1 means only look at lines above 1 because you said the sequence starts on line 2
NR>1{
#this for-loop goes through all bases in the line and then performs operations below:
for (i=1;i<=length;i++)
#for each position encountered, the variable "total" is increased by 1 for total bases
total++
}
{
for (i=1;i<=length;i++)
#if the "substring" i.e. position in a line == c or g upper or lower (some bases are
#lowercase in some fasta files), it will carry out the following instructions:
if(substr($0,i,1)=="c" || substr($0,i,1)=="C")
#this increments the c count by one for every c or C encountered, the next if statement does
#the same thing for g and G:
c++; else
if(substr($0,i,1)=="g" || substr($0,i,1)=="G")
g++
}
END{
#this "END-block" prints the gene name and C, G content in percentage, separated by tabs
print "Gene name\tG content:\t"(100*g/total)"%\tC content:\t"(100*c/total)"%"
}