extract metadata from genomic gbff files

extract metadata from genomic gbff files - metadata

I have >1000 .gbff.gz genomic files and I want to extract metadata from each and have metadata entries in separate columns.

In Matlab you can make use of genbankread of the Bioinformatics Toolbox. Here is an example of how to achieve what you want:
results = [];
% unzip data
gunzip('*.gbff.gz');
% process each file
files = dir('*.gbff');
for file = {files.name}
data = genbankread(char(file));
% process each file entry
for i = 1:size(data, 2)
LocusName = '';
Definition = '';
Organism = '';
GenesTotal = NaN;
GenesCoding = NaN;
RRNAs = '';
TRNAs = NaN;
IsolationSource = '';
Country = '';
% copy fields
if isfield(data(i), 'LocusName')
LocusName = data(i).LocusName;
end
if isfield(data(i), 'Definition')
Definition = data(i).Definition;
end
if isfield(data(i), 'Source')
Organism = data(i).Source;
end
% parse comments
if isfield(data(i), 'Comment')
for j = 1:size(data(i).Comment, 1)
tokens = regexp(data(i).Comment(j, :), ...
'^\s*([^\s].*[^\s])\s*::\s*([^\s].*[^\s])\s*$', 'tokens');
if ~isempty(tokens)
switch tokens{1}{1}
case 'Genes (total)'
GenesTotal = str2double(tokens{1}{2});
case 'Genes (coding)'
GenesCoding = str2double(tokens{1}{2});
case 'rRNAs'
RRNAs = tokens{1}{2};
case 'tRNAs'
TRNAs = str2double(tokens{1}{2});
end
end
end
end
% parse features
if isfield(data(i), 'Features')
Feature = '';
for j = 1:size(data(i).Features, 1)
tokens = regexp(data(i).Features(j, :), '^(\w+)', 'tokens');
if isempty(tokens)
tokens = regexp(data(i).Features(j, :), ...
'^\s+/(\w+)="([^"]+)"', 'tokens');
if ~isempty(tokens)
switch Feature
case 'source'
switch tokens{1}{1}
case 'isolation_source'
IsolationSource = tokens{1}{2};
case 'country'
Country = tokens{1}{2};
end
end
end
else
Feature = tokens{1}{1};
end
end
end
% append entries to results
results = [results; struct( ...
'File', char(file), 'LocusName', LocusName, 'Definition', Definition, ...
'Organism', Organism, 'GenesTotal', GenesTotal, ...
'GenesCoding', GenesCoding, 'RRNAs', RRNAs, 'TRNAs', TRNAs, ...
'IsolationSource', IsolationSource, 'Country', Country)];
end
end
% data is in variable results

Related

how to dlmwrite a file from array

How to write the cell as below in text file(my_data.out)?
http_only = cell2mat(http_only)
dlmwrite('my_data.out',http_only)
I get the error as below:(I have tried to solve but still return me the error)
Here is my full code:
I want to generate the text file for each of the data which only store 'http_only'
then check for that is it meet the word in split_URL.
%data = importdata('DATA/URL/training_URL')
data = importdata('DATA/URL/testing_URL')
domain_URL = regexp(data,'\w*://[^/]*','match','once')
no_http_URL = regexp(domain_URL,'https?://(?:www\.)?(.*)','tokens','once');
no_http_URL = vertcat(no_http_URL{:});
split_URL = regexp(no_http_URL,'[:/.]*','split')
[sizeData b] = size(split_URL);
for i = 1:100
A7_data = split_URL{i};
data2=fopen(strcat('DATA\WEBPAGE_SOURCE\TESTING_DATA\',int2str(i),'.htm'),'r')
CharData = fread(data2, '*char')'; %read text file and store data in CharData
fclose(data2);
img_only = regexp(CharData, '<img src.*?>', 'match'); %checking
http_only = regexp(img_only, '"http.*?"', 'match');
http_only1 = horzcat(http_only{:});
fid = fopen('my_data.out',int2str(i),'w');
for col = 1:numel(http_only1)
fprintf(fid,'%s\n',http_only1{:,col});
end
fclose(fid);
feature7_data=(~cellfun('isempty', regexpi(CharData , A7_data, 'once')))
B7(i)=sum(feature7_data)
end
feature7(B7>=5)=-1;
feature7(B7<5&B7>2)=0;
feature7(B7<=2)=1;
feature7'

Write cell-by-cell using fprintf -
fid = fopen('my_data.out','w');
for col = 1:numel(http_only)
fprintf(fid,'%s\n',http_only{:,col});
end
fclose(fid);
Edit 1: If your input is a cell array of cell arrays, use this code instead.
Code
http_only1 = horzcat(http_only{:});
fid = fopen('my_data.out','w');
for col = 1:numel(http_only1)
fprintf(fid,'%s\n',http_only1{:,col});
end
fclose(fid);
Edit 2: For a number of inputs to be stored into separate files, use this demo -
data1 = {{'[]'} {'"http://google.com"'} {'"http://yahoo.com'}};
data2 = {{'[]'} {'"http://overflow.com"'} {'"http://meta.exchange.com'}};
data = cat(1,data1,data2);
for k = 1:size(data,1)
data_mat = horzcat(data{k,:});
out_filename = strcat(out_basename,num2str(k),'.out');
fid = fopen(out_filename,'w');
for col = 1:numel(data_mat)
fprintf(fid,'%s\n',data_mat{:,col});
end
fclose(fid);
end

Creat a loop to generate a a mean value of data which I got form TXT files?

I trying to do script to evalute my lab data into matlab, I have lots of txt files for many samples, each sample has 30 txt files. I did a function to get the data from these files and store them into a structure that contains data and labels.
I would like to know if it is possible to load all the 30 files using loop instead of line by line .
function s = try1(fn)
% adapt for other file format according to your needs
fid = fopen(fn);
s.data = [];
% skip first lines
for k=1:6
l = fgetl(fid);
% read header
a = regexp(l,',','split');
s.labels = a;
l = fgetl(fid);
k=0;
end
while( (l~=-1)&(k<130) )
l = l(1:end-1);
a = regexp(l,', ','split');
a = regexpre
p(a, ',','.');
s.data = [s.data; str2double(a(2:4))];
l = fgetl(fid);
k = k+1;
end
fclose(fid);
end

To do what you want with a data directory and processing each file in a loop you can use a combination of fullfile(...) and a loop. This will likely work:
dataDirectory = 'c:/myDirectory';
allFilesDir = dir(fullfile(dataDirectory , '*.txt'));
allFN = {allFilesDir.name}
% Or do this:
% allFN = {'file1.txt', 'file2.txt'};
result = [];
for ind = 1:length(allFN)
myFN = fullfile(dataDirectory, allFN{ind});
result(ind) = try1(myFN);
end
% Now get the mean:
myMean = mean([result.data])

Binary file - Matrix Manipulation

I have several binary files with a known structure (8,12000,real*4). 8 is the number of variables, and 12000 represent the time steps. From these binaries I need to get a final 16x9 matrix defined as followed:
the first column contains a filename identification.
on the diagonal are located the extreme values (maxima and minima) of the corresponding variable.
the simultaneous values of the other variables shall be given in the rows.
At the moment I'm using this code
for i = 1:num_files
for j = 1:num_ext
fid = fopen([fullfile(BEGINPATH{1},FILELIST{i}) '.$' VAREXTENSION{j}],'r');
A = fread(fid,[DLC_dimens{i}{3}(1) DLC_dimens{i}{3}(2)],'real*4');
for k = 1:size(A,1)
[max_val(k,i) max_idx(k,i)] = max(A(k,:));
[min_val(k,i) min_idx(k,i)] = min(A(k,:));
end
fclose(fid);
end
end
% Pre-allocate ULS matrices
uloads = cell(2*size(A,1),size(A,1)+1);
uloads_temp = cell(2*size(A,1),size(A,1)+1);
% Generate ULS matrix for the first file
for i = 1:size(uloads,1)
uloads{i,end} = DLC_dimens{1}(1);
end
fid = fopen([fullfile(BEGINPATH{1},FILELIST{1}) '.$' VAREXTENSION{1}],'r');
A = fread(fid,[DLC_dimens{i}{3}(1) DLC_dimens{i}{3}(2)],'real*4');
for j = 1:size(uloads,2)-1
uloads{1,j} = A(j,max_idx(1,1))*DLC_dimens{1}{4};
uloads{2,j} = A(j,min_idx(1,1))*DLC_dimens{1}{4};
uloads{3,j} = A(j,max_idx(2,1))*DLC_dimens{1}{4};
uloads{4,j} = A(j,min_idx(2,1))*DLC_dimens{1}{4};
uloads{5,j} = A(j,max_idx(3,1))*DLC_dimens{1}{4};
uloads{6,j} = A(j,min_idx(3,1))*DLC_dimens{1}{4};
uloads{7,j} = A(j,max_idx(4,1))*DLC_dimens{1}{4};
uloads{8,j} = A(j,min_idx(4,1))*DLC_dimens{1}{4};
uloads{9,j} = A(j,max_idx(5,1))*DLC_dimens{1}{4};
uloads{10,j} = A(j,min_idx(5,1))*DLC_dimens{1}{4};
uloads{11,j} = A(j,max_idx(6,1))*DLC_dimens{1}{4};
uloads{12,j} = A(j,min_idx(6,1))*DLC_dimens{1}{4};
uloads{13,j} = A(j,max_idx(7,1))*DLC_dimens{1}{4};
uloads{14,j} = A(j,min_idx(7,1))*DLC_dimens{1}{4};
uloads{15,j} = A(j,max_idx(8,1))*DLC_dimens{1}{4};
uloads{16,j} = A(j,min_idx(8,1))*DLC_dimens{1}{4};
end
fclose(fid);
% ULS temporary matrix generation
uls = uloads;
for i = 2:num_files
fid = fopen([fullfile(BEGINPATH{1},FILELIST{i}) '.$' VAREXTENSION{1}],'r');
A = fread(fid,[8 12000],'float32');
for j = 1:size(uloads,1)
uloads_temp{j,9} = DLC_dimens{i}(1);
end
for k = 1:size(uloads,2)-1
uloads_temp{1,k} = A(k,max_idx(1,i))*DLC_dimens{i}{4};
uloads_temp{2,k} = A(k,min_idx(1,i))*DLC_dimens{i}{4};
uloads_temp{3,k} = A(k,max_idx(2,i))*DLC_dimens{i}{4};
uloads_temp{4,k} = A(k,min_idx(2,i))*DLC_dimens{i}{4};
uloads_temp{5,k} = A(k,max_idx(3,i))*DLC_dimens{i}{4};
uloads_temp{6,k} = A(k,min_idx(3,i))*DLC_dimens{i}{4};
uloads_temp{7,k} = A(k,max_idx(4,i))*DLC_dimens{i}{4};
uloads_temp{8,k} = A(k,min_idx(4,i))*DLC_dimens{i}{4};
uloads_temp{9,k} = A(k,max_idx(5,i))*DLC_dimens{i}{4};
uloads_temp{10,k} = A(k,min_idx(5,i))*DLC_dimens{i}{4};
uloads_temp{11,k} = A(k,max_idx(6,i))*DLC_dimens{i}{4};
uloads_temp{12,k} = A(k,min_idx(6,i))*DLC_dimens{i}{4};
uloads_temp{13,k} = A(k,max_idx(7,i))*DLC_dimens{i}{4};
uloads_temp{14,k} = A(k,min_idx(7,i))*DLC_dimens{i}{4};
uloads_temp{15,k} = A(k,max_idx(8,i))*DLC_dimens{i}{4};
uloads_temp{16,k} = A(k,min_idx(8,i))*DLC_dimens{i}{4};
end
if uloads_temp{1,1}(:) > uls{1,1}(:)
uls(1,:) = uloads_temp(1,:);
end
if uloads_temp{2,1}(:) < uls{2,1}(:)
uls(2,:) = uloads_temp(2,:);
end
if uloads_temp{3,2}(:) > uls{3,2}(:)
uls(3,:) = uloads_temp(3,:);
end
if uloads_temp{4,2}(:) < uls{4,2}(:)
uls(4,:) = uloads_temp(4,:);
end
if uloads_temp{5,3}(:) > uls{5,3}(:)
uls(5,:) = uloads_temp(5,:);
end
if uloads_temp{6,3}(:) < uls{6,3}(:)
uls(6,:) = uloads_temp(6,:);
end
if uloads_temp{7,4}(:) > uls{7,4}(:)
uls(7,:) = uloads_temp(7,:);
end
if uloads_temp{8,4}(:) < uls{8,4}(:)
uls(8,:) = uloads_temp(8,:);
end
if uloads_temp{9,5}(:) > uls{9,5}(:)
uls(9,:) = uloads_temp(9,:);
end
if uloads_temp{10,5}(:) < uls{10,5}(:)
uls(10,:) = uloads_temp(10,:);
end
if uloads_temp{11,6}(:) > uls{11,6}(:)
uls(11,:) = uloads_temp(11,:);
end
if uloads_temp{12,6}(:) < uls{12,6}(:)
uls(12,:) = uloads_temp(12,:);
end
if uloads_temp{13,7}(:) > uls{13,7}(:)
uls(13,:) = uloads_temp(3,:);
end
if uloads_temp{14,7}(:) < uls{14,7}(:)
uls(14,:) = uloads_temp(14,:);
end
if uloads_temp{15,8}(:) > uls{15,8}(:)
uls(15,:) = uloads_temp(15,:);
end
if uloads_temp{16,8}(:) < uls{16,8}(:)
uls(16,:) = uloads_temp(16,:);
end
fclose(fid);
end
Now comes the question: I was thinking of a procedure where
I generate a temporary uloads_temp matrix just with the first file;
Calculate the uloads matrix for the i-th file (i = 2:num_files)
Compare the terms on the diagonal between i-th uloads matrix and temporary uloads_temp:
a) if the elements of the i-th ulaods are major (minor) than the respective uloads_temp values
b) update uloads_temp rows the condition a) occurs.
I hope I explained everything properly. Could you please give me a hint on how to perform the described loops?
I thank you all in advance.
WKR,
Francesco
P.S. : everything could be reproduced by means fo matrices filled of random numbers; I just copied and pasted my code with reference to a file list.

incapsulation of a code inmatlab

my code is
pathname=uigetdir;
filename=uigetfile('*.txt','choose a file name.');
data=importdata(filename);
element= (data.data(:,10));
in_array=element; pattern= [1 3];
locations = cell(1, numel(pattern));
for p = 1:(numel(pattern))
locations{p} = find(in_array == pattern(p));
end
idx2 = [];
for p = 1:numel(locations{1})
start_value = locations{1}(p);
for q = 2:numel(locations)
found = true;
if (~any((start_value + q - 1) == locations{q}))
found = false;
break;
end
end
if (found)
idx2(end + 1) = locations{1}(p);
end
end
[m2,n2]=size(idx2)
res_name= {'one' 'two'};
res=[n n2];
In this code I finding a pattern in one of the column of my data file and counting how many times it's repeated.
I have like 200 files that I want to do the same with them but unfotunatlly I'm stuck.
this is what I have added so far
pathname=uigetdir;
files=dir('*.txt');
for k=1:length(files)
filename=files(k).name;
data(k)=importdata(files(k).name);
element{k}=data(1,k).data(:,20);
in_array=element;pattern= [1 3];
locations = cell(1, numel(pattern));
for p = 1:(numel(pattern))
locations{p} = find(in_array{k}== pattern(p));
end
idx2{k} = [];
how can I continue this code..??

OK, first define this function:
function [inds, indsy] = findPattern(M, pat, dim)
indices = [];
if nargin == 2
dim = 1;
if size(M,1) == 1
dim = 2; end
end
if dim == 1
if numel(pat) > size(M,1)
return; end
for ii = 1:size(M,2)
inds = findPatternCol(M(:,ii), pat);
indices = [indices; repmat(ii,numel(inds),1) inds]%#ok
end
elseif dim == 2
if numel(pat) > size(M,2)
return; end
for ii = 1:size(M,1)
inds = findPatternCol(M(ii,:).', pat);
indices = [indices; inds repmat(ii,numel(inds),1)]%#ok
end
else
end
inds = indices;
if nargout > 1
inds = indices(:,1);
indsy = indices(:,2);
end
end
function indices = findPatternCol(col, pat)
inds = find(col == pat(1));
ii = 1;
prevInds = [];
while ~isempty(inds) && ii<numel(pat) && numel(prevInds)~=numel(inds)
prevInds = inds;
inds = inds(inds+ii<=numel(col) & col(inds+ii)==pat(ii+1));
ii = ii + 1;
end
indices = inds(:);
end
which is decent but probably not the most efficient. If performance becomes a problem, start here with optimizations.
Now loop through each file like so:
pathname = uigetdir;
files = dir('*.txt');
indices = cell(length(files), 1);
for k = 1:length(files)
filename = files(k).name;
data(k) = importdata(files(k).name);
array = data(1,k).data(:,20);
pattern = [1 3];
indices{k} = findPattern(array, pattern);
end
The number of occurrences of the pattern can be found like so:
counts = cellfun(#(x)size(x,1), indices);

Converting Matlab to Octave is there a containers.Map equivalent?

I am trying to convert some matlab code from the Maia package into something that will work with Octave. I am currently getting stuck because one of the files has several calls to containers.Map which is apparently something that has not yet been implemented in octave. Does anyone have any ideas for easily achieving similar functionality without doing a whole lot of extra work in octave? Thanks all for your time.
function [adj_direct contig_direct overlap names longest_path_direct...
weigth_direct deltafiles deltafiles_ref ReferenceAlignment ...
contig_ref overlap_ref name_hash_ref] = ...
assembly_driver(assemblies,ref_genome,target_chromosome, ...
deltafiles_ref,contig_ref, overlap_ref, ...
name_hash_ref, varargin)
% ASSEMBLY_DRIVER Combines contig sets into one assembled chromosome
%
% INPUT
% assemblies
% ref_chromosome
% Startnode_name
% Endnode_name
% OPTIONAL DEFAULT
% 'z_weigths' [.25 .25 .25 .25]
% 'clipping_thrs' 10
% 'ref_distance' -10
% 'ref_quality' 1E-5
% 'max_chromosome_dist' 100
% 'quit_treshold' 15
% 'tabu_time' 3
% 'minimum_improvement' -inf
% 'ref_node_assemblies' all assemblies (slow)
% 'endextend' true
%
%
% SET DEFAULTS
% General parameters
z_weights = [.25 .25 .25 .25];
clipping_thrs = 10;
mapfilter = '-rq';
alignlen = 75;
ident = 85;
% Reference nod parameters
ref_distance = -10;
ref_quality = 1E-5;
max_chromosome_dist = 100;
% TABU parameters
quit_treshold = 15;
tabu_time = 3;
minimum_improvement = -inf;
ref_node_assemblies = assemblies;
% Extending the assembly outwards from the start and en node
endextend = true;
AllowReverse = true;
% If no start and end node are given, they will be determined from tiling
Startnode_name = '';
Endnode_name = '';
containment_edge = true;
ref_first = true;
% If contigs have already been aligned to the reference, give the
% deltafile
ReferenceAlignment = 'NotYetDoneByMaia';
% Get VARARGIN user input
if length(varargin) > 0
while 1
switch varargin{1}
case 'Startnode_name'
Startnode_name = varargin{2};
case 'Endnode_name'
Endnode_name = varargin{2};
case 'z_weigths'
z_weights = varargin{2};
case 'clipping_thrs'
clipping_thrs = varargin{2};
case 'ref_distance'
ref_distance = varargin{2};
case 'ref_quality'
ref_quality = varargin{2};
case 'max_chromosome_dist'
max_chromosome_dist = varargin{2};
case 'quit_treshold'
quit_treshold = varargin{2};
case 'tabu_time'
tabu_time = varargin{2};
case 'minimum_improvement'
minimum_improvement = varargin{2};
case 'ref_node_assemblies'
ref_node_assemblies = assemblies(varargin{2},:);
case 'extend_ends'
endextend = assemblies(varargin{2},:);
case 'AllowReverse'
AllowReverse = varargin{2};
case 'ReferenceAlignment'
ReferenceAlignment = varargin{2};
case 'containment_edge'
containment_edge = varargin{2};
case 'ref_first'
ref_first = varargin{2};
case 'mapfilter'
mapfilter = varargin{2};
case 'alignlen'
alignlen = varargin{2};
case 'ident'
ident = varargin{2};
otherwise
error(['Input ' varargin{2} ' is not known']);
end
if length(varargin) > 2
varargin = varargin(3:end);
else
break;
end
end
end
% Read input assemblies
assembly_names = assemblies(:,1);
assembly_locs = assemblies(:,2);
assembly_quality = containers.Map(assemblies(:,1),assemblies(:,3));
assembly_quality('reference') = ref_quality;
% Read input assemblies for creation of reference nodes
ref_node_assembly_names = ref_node_assemblies(:,1);
ref_node_assembly_locs = ref_node_assemblies(:,2);
ref_node_assembly_quality = containers.Map(ref_node_assemblies(:,1),ref_node_assemblies(:,3));
ref_node_assembly_quality('reference') = ref_quality;
% If there is only one assembly there is nothing to align
if size(assemblies,1) >= 2
% Align assemblies against each other
assembly_pairs = {};
coordsfiles = [];
deltafiles = [];
for i = 1:length(assembly_locs)-1
for j = i+1:length(assembly_locs)
[coordsfile,deltafile] = align_assemblies({assembly_locs{i},assembly_locs{j}},{assembly_names{i}, assembly_names{j}}, ...
mapfilter, alignlen, ident);
coordsfiles = [coordsfiles; coordsfile];
%deltafiles = [deltafiles deltafile];
deltafiles = [deltafiles; {deltafile}];
assembly_pairs = [assembly_pairs;[assembly_names(i) assembly_names(j)]];
end
end
% fprintf('Loading alignment files.\n');
% load alignments_done;
% Put the nucmer alignments in an adjency matrix
%[adj, names, name_hash, contig, overlap] = get_adj_matrix(coordsfiles, assembly_pairs, assembly_quality, z_weights, 'clipping_thrs', clipping_thrs, 'dove_tail', 'double','edge_weight','z-scores', 'containment_edge', true);
[adj, names, name_hash, contig, overlap] = get_adj_matrix(deltafiles, assembly_pairs, assembly_quality, z_weights, 'clipping_thrs', clipping_thrs, 'dove_tail', 'double','edge_weight','z-scores', 'containment_edge', containment_edge);
% Merge deltafiles
deltafilesnew = deltafiles{1};
if size(deltafiles,1) > 1
for di = 2:size(deltafiles,1)
deltafilesnew = [deltafilesnew deltafiles{di}];
end
end
deltafiles = deltafilesnew;
else
assembly_pairs = {};
coordsfiles = [];
deltafiles = [];
adj = [];
names = {};
name_hash = containers.Map;
contig = struct('name',{},'size',[],'chromosome',[],'number',[], 'assembly', [], 'assembly_quality', []);
overlap = struct('Q',{},'R',[],'S1',[],'E1', [], 'S2', [], 'E2', [], 'LEN1', [], 'LEN2', [], 'IDY', [], 'COVR', [], 'COVQ', [],'LENR',[], 'LENQ',[]);
end
% Ad the pseudo nodes to the graph. If the contigs have already been
% aligned to the reference genome, just select the alignments that
% correspond to the target chromosome
if isequal(ReferenceAlignment,'NotYetDoneByMaia')
% Align all contigs in 'contig_sets_fasta' to the reference chromosome
[contig_ref, overlap_ref, name_hash_ref, deltafiles_ref] = align_contigs_sets(...
ref_genome, ref_node_assembly_locs, ref_node_assembly_names, ...
ref_node_assembly_quality, clipping_thrs, z_weights, ...
ref_distance,max_chromosome_dist);
ReferenceAlignment = 'out2.delta';
end
% Select only the entries in the deltafile for the current target chromosome
[contig_target_ref, overlap_target_ref, name_hash_target_ref, delta_target_ref] = ...
GetVariablesForTargetChromosome(...
contig_ref, overlap_ref, deltafiles_ref);
% Ref clipping should be high in case of tiling
%if isequal(max_chromosome_dist,'tiling')
% clipping_thrs = 10000
%end
% Add reference nodes to the adjency matrix
[adj, names, name_hash, contig, overlap, delta_target_ref, Startnode_name, Endnode_name] = get_reference_nodes( ...
adj, names, name_hash, contig, overlap, target_chromosome, ...
contig_target_ref, overlap_target_ref, name_hash_target_ref, delta_target_ref, ...
max_chromosome_dist, ref_distance, clipping_thrs, ref_first,...
Startnode_name, Endnode_name, AllowReverse);
% Give reference edges some small extra value to distict between
% assemblies to which a reference node leads
% adj = rank_reference_edges(adj,contig,assembly_quality);
% Specify a start and an end node for the assembly
Startnode = name_hash(Startnode_name);
Endnode = name_hash(Endnode_name);
% Find the best scoring path
fprintf('Directing the final graph\n');
% Calculate path on undirected graph to get an idea on how to direct the graph
[longest_path weigth] = longest_path_tabu(adj, Startnode, Endnode, quit_treshold, tabu_time, minimum_improvement);
% Make the graph directed (greedy)
[adj_direct contig_direct] = direct_graph(adj,overlap, contig, names, name_hash,clipping_thrs, Startnode, longest_path, true, ref_first);
% Calcultate final layout-path
fprintf('Find highest scoring path\n');
[longest_path_direct weigth_direct] = longest_path_tabu(adj_direct, Startnode, Endnode, quit_treshold, tabu_time, minimum_improvement);
function [contig_target_ref, overlap_target_ref, name_hash_target_ref, delta_target_ref] = ...
GetVariablesForTargetChromosome(...
contig_ref, overlap_ref, deltafiles_ref)
% Select only the entries in the deltafile for the current target chromosome
delta_target_ref = deltafiles_ref;
for di = size(delta_target_ref,2):-1:1
if ~isequal(delta_target_ref(di).R,target_chromosome)
delta_target_ref(di) = [];
end
end
overlap_target_ref = overlap_ref;
for oi = size(overlap_target_ref,2):-1:1
if ~isequal(overlap_target_ref(oi).R,target_chromosome)
overlap_target_ref(oi) = [];
end
end
contig_target_ref = contig_ref;
for ci = size(contig_target_ref,1):-1:1
if isequal(contig_target_ref(ci).assembly, 'reference') && ~isequal(contig_target_ref(ci).name,target_chromosome)
contig_target_ref(ci) = [];
end
end
name_hash_target_ref = make_hash({contig_target_ref.name}');
end
end

There is no exact equivalent of containers.Map in Octave that I know of...
One option is to use the java package to create java.util.Hashtable. Using this example:
pkg load java
d = javaObject("java.util.Hashtable");
d.put('a',1)
d.put('b',2)
d.put('c',3)
d.get('b')
If you are willing to do a bit of rewriting, you can use the builtin struct as a rudimentary hash table with strings (valid variable names) as keys, and pretty much anything stored in values.
For example, given the following:
keys = {'Mon','Tue','Wed'}
values = {10, 20, 30}
you could replace this:
map = containers.Map(keys,values);
map('Mon')
by:
s = struct();
for i=1:numel(keys)
s.(keys{i}) = values{i};
end
s.('Mon')
You might need to use genvarname to produce valid keys, or maybe a proper hashing function that produces valid key strings.
Also look into struct-related functions: getfield, setfield, isfield, fieldnames, rmfield, etc..

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

extract metadata from genomic gbff files - metadata

I have >1000 .gbff.gz genomic files and I want to extract metadata from each and have metadata entries in separate columns.

Related

how to dlmwrite a file from array

Creat a loop to generate a a mean value of data which I got form TXT files?

Binary file - Matrix Manipulation

incapsulation of a code inmatlab

Converting Matlab to Octave is there a containers.Map equivalent?

Categories

Resources