Convert module-qualified OID to ObjectIdentity - pysnmp

How do you programmatically convert module-qualified OIDs to ObjectIdentity? I want to convert something like "IP-MIB::ipAdEntAddr.127.0.0.1.123" to an ObjectIdentity. Splitting it into ObjectIdentity('IP-MIB', 'ipAdEntAddr', '127.0.0.1.123') or into ObjectIdentity('IP-MIB', 'ipAdEntAddr', 127, 0, 0, 1, 123) doesn't work, as resolveWithMib fails with "Bad IP address syntax".

Considering ipAddrTable is indexed by just one column (ipAdEntAddr), I think this should resolve just fine:
ObjectIdentity('IP-MIB', 'ipAdEntAddr', '127.0.0.1')
API-wise, ObjectIdentity takes the MIB and object names as its first two parameters; the rest should be individual sub-indices. So if a 123 index component made sense here, it would go as a separate argument:
ObjectIdentity('IP-MIB', 'ipAdEntAddr', '127.0.0.1', 123)
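For reference, here is a minimal sketch of resolving such an ObjectIdentity against a MIB view. The MibBuilder/MibViewController setup is standard pysnmp boilerplate, it assumes the IP-MIB module can be found by the builder, and the printed OID is only indicative:
from pysnmp.smi import builder, view
from pysnmp.smi.rfc1902 import ObjectIdentity

# build a MIB view controller; assumes IP-MIB is available to the builder
# (e.g. as a compiled MIB or via an installed MIB source)
mibBuilder = builder.MibBuilder()
mibBuilder.loadModules('IP-MIB')
mibView = view.MibViewController(mibBuilder)

oid = ObjectIdentity('IP-MIB', 'ipAdEntAddr', '127.0.0.1')
oid.resolveWithMib(mibView)
print(oid.getOid())  # e.g. 1.3.6.1.2.1.4.20.1.1.127.0.0.1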

Related

Should MongoDB report an error when a negative integer is used in dot notation?

MongoDB allows using dot notation to query JSON sub-keys or array elements (see ref1 or ref2). For instance, if a is an array in the documents, the following query:
db.c.find({"a.1": "foo"})
returns all documents in which the second element of the a array is the string "foo".
So far, so good.
What is a bit surprising is that MongoDB accepts negative values for the index, e.g.:
db.c.find({"a.-1": "foo"})
That doesn't return anything (which makes sense if it is unsupported syntax), but I wonder why MongoDB doesn't return an error for this operation, or whether it does have some meaning after all. The documentation (as far as I've checked) doesn't provide any clue.
Any information on this is welcome!
That is not an error. The BSON spec defines a key name as
Zero or more modified UTF-8 encoded characters followed by '\x00'. The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.
Since "-1" is a valid string by that definition, it is a valid key name.
Quick demo:
> db.test.find({"a.-1":{$exists:true}})
{ "_id" : 0, "a" : { "-1" : 3 } }
Also note how that spec defines array:
Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order.
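To illustrate both points, here is a hypothetical pymongo sketch (the local connection, the database and collection names, and the sample documents are assumptions for illustration only):
from pymongo import MongoClient

coll = MongoClient().test.c  # assumes a local MongoDB instance

coll.insert_many([
    {'_id': 0, 'a': ['bar', 'foo']},   # real array: encoded with keys '0' and '1'
    {'_id': 1, 'a': {'-1': 'foo'}},    # subdocument with the literal key '-1'
])

print(list(coll.find({'a.1': 'foo'})))   # matches _id 0: second array element
print(list(coll.find({'a.-1': 'foo'})))  # matches _id 1 only: the literal '-1' key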

PySpark list() in withColumn() only works once, then AssertionError: col should be Column

I have a DataFrame with 6 string columns named 'Spclty1'...'Spclty6' and another 6 named 'StartDt1'...'StartDt6'. I want to zip them and collapse them into a column that looks like this:
[[Spclty1, StartDt1]...[Spclty6, StartDt6]]
I first tried collapsing just the 'Spclty' columns into a list like this:
DF = DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6')))
This worked the first time I executed it, giving me a new column called 'Spclty' containing rows such as ['014', '124', '547', '000', '000', '000'], as expected.
Then, I added a line to my script to do the same thing on a different set of 6 string columns, named 'StartDt1'...'StartDt6':
DF = DF.withColumn('StartDt', list(DF.select('StartDt1', 'StartDt2', 'StartDt3', 'StartDt4', 'StartDt5', 'StartDt6')))
This caused AssertionError: col should be Column.
After I ran out of things to try, I tried the original operation again (as a sanity check):
DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6'))).collect()
and got the assertion error as above.
So, it would be good to understand why it only worked the first time, but the main question is: what is the correct way to zip columns into a collection of dict-like elements in Spark?
.withColumn() expects a Column object as its second parameter, and you are supplying a list.
Thanks. After reading a number of SO posts I figured out the syntax for passing a set of columns to the col parameter, using struct to create an output column that holds a list of values:
from pyspark.sql.functions import array, col, struct

# one struct per (Spclty_i, StartDt_i) pair, collected into a single array column
DF_tmp = DF_tmp.withColumn('specialties', array([
    struct(
        *(col("Spclty{}".format(i)).alias("spclty_code"),
          col("StartDt{}".format(i)).alias("start_date"))
    )
    for i in range(1, 7)
]))
So, the col() and *col() constructs are what I was looking for, while the array([struct(...)]) approach lets me combine the 'Spclty' and 'StartDt' entries into a list of dict-like elements.
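For anyone trying this in isolation, here is a self-contained sketch of the same pattern on a toy DataFrame with only two column pairs; the sample values and the SparkSession setup are assumptions for illustration:
from pyspark.sql import SparkSession
from pyspark.sql.functions import array, col, struct

spark = SparkSession.builder.getOrCreate()

# toy frame with two (Spclty, StartDt) pairs instead of six
df = spark.createDataFrame(
    [('014', '2001-01-01', '124', '2003-06-15')],
    ['Spclty1', 'StartDt1', 'Spclty2', 'StartDt2'])

df = df.withColumn('specialties', array([
    struct(col('Spclty{}'.format(i)).alias('spclty_code'),
           col('StartDt{}'.format(i)).alias('start_date'))
    for i in range(1, 3)]))

df.select('specialties').show(truncate=False)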

Extracting and joining exons from multiple sequence alignments

Using my (fairly) basic coding skills, I have put together a script that will parse an aligned multi-fasta file (a multiple sequence alignment) and extract all the data between two specified columns.
use Bio::SimpleAlign;
use Bio::AlignIO;
$str = Bio::AlignIO->new(-file => $inputfilename, -format => 'fasta');
$aln = $str->next_aln();
$mini = $aln->slice($array[0], $array[1]);
$out = Bio::AlignIO->new(-file => $array[3], -format => 'fasta');
$out->write_aln($mini);
The problem I have is that I want to be able to slice multiple regions from the same alignment and then join these regions prior to writing to an outfile. The complication is that I want to supply a file with a list of co-ordinates where each line contains two or more co-ordinates between which data should be extracted and joined.
Here is an example co-ordinate file
ORF1, 10, 50, exon1
# The above line should produce a slice between columns 10-50 and write it to an outfile
ORF2, 70, 140, exon1
ORF2, 190, 270, exon2
ORF2, 500, 800, exon3
# Data should be extracted between the ranges specified in the above three ORF2 lines and then joined (side by side) to produce the outfile
ORF3, 1200, 1210, exon1
etc etc
And here is an (small) example of an aligned fasta file
\>Sample1
ATGGCGACCGTGCACTACTCCCGCCGACCTGGGACCCCGCCGGTCACCCTCACGTCGTCC
CCCAGCATGGATGACGTTGCGACCCCCATCCCCTACCTACCCACATACGCCGAGGCCGTG
GCAGACGCGCCCCCCCCTTACAGAAGCCGCGAGAGTCTGGTGTTCTCCCCGCCTCTTTTT
CCTCACGTGGAGAATGGCACCACCCAACAGTCTTACGATTGCCTAGACTGCGCTTATGAT
GGAATCCACAGACTTCAGCTGGCTTTTCTAAGAATTCGCAAATGCTGTGTACCGGCTTTT
TTAATTCTTTTTGGTATTCTCACCCTTACTGCTGTCGTGGTCGCCATTGTTGCCGTTTTT
CCCGAGGAACCTCCCAACTCAACTACATGA
\>Sample2
ATGGCGACCGTGCACTACTCCCGCCGACCTGGGACCCCGCCGGTCACCCTCACGTCGTCC
CCCAGCATGGATGACGTTGCGACCCCCATCCCCTACCTACCCACATACGCCGAGGCCGTG
GCAGACGCGCCCCCCCCTTACAGAAGCCGCGAGAGTCTGGTGTTCTCCCCGCCTCTTTTT
CCTCACGTGGAGAATGGCACCACCCAACAGTCTTACGATTGCCTAGACTGCGCTTATGAT
GGAATCCACAGACTTCAGCTGGCTTTTCTAAGAATTCGCAAATGCTGTGTACCGGCTTTT
TTAATTCTTTTTGGTATTCTCACCCTTACTGCTGTCGTGGTCGCCATTGTTGCCGTTTTT
CCCGAGGAACCTCCCAACTCAACTACATGA
I think there should be a fairly simple way to solve this problem, potentially using the information in the first column, paired with the exon number, but I can't for the life of me figure out how this can be done.
Can anyone help me out?
The aligned fasta file you posted -- at least as it appears on the stackoverflow web page -- is not valid. According to https://en.wikipedia.org/wiki/FASTA_format, the description lines should begin with a >, not with \>.
Be sure to run all Perl programs with use strict; use warnings;. This will facilitate debugging.
You have not populated @array. Consequently, you can expect to get errors like these:
Use of uninitialized value $start in pattern match (m//) at perl-5.24.0/lib/site_perl/5.24.0/Bio/SimpleAlign.pm line 1086, <GEN0> line 16.
Use of uninitialized value $start in concatenation (.) or string at perl-5.24.0/lib/site_perl/5.24.0/Bio/SimpleAlign.pm line 1086, <GEN0> line 16.
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Slice start has to be a positive integer, not []
STACK: Error::throw
STACK: Bio::Root::Root::throw perl-5.24.0/lib/site_perl/5.24.0/Bio/Root/Root.pm:444
STACK: Bio::SimpleAlign::slice perl-5.24.0/lib/site_perl/5.24.0/Bio/SimpleAlign.pm:1086
STACK: fasta.pl:26
Once you assign plausible values, e.g.,
@array = (1, 17);
... you will get more plausible results:
$ perl fasta.pl
>Sample1/1-17
ATGGCGACCGTGCACTA
>Sample2/1-17
ATGGCGACCGTGCACTA
HTH!

Using Python, need to print in CSV format

{u'Test1': u'Result1', u'_id': ResultId('987600234565ade'), u'bugseverity': u'major'}
{u'Test2': u'Result2', u'_id': ResultId('987600234465ade'), u'bugseverity': u'minor'}
{u'Test3': u'Result3', u'_id': ResultId('9876002399999de'), u'bugseverity': u'minor'}
The output received after running a query against MongoDB is given above. Using this output, I need to print the values in CSV format using Python.
I slightly changed the input data into a form that makes more sense technically.
In addition, I removed the ResultId() wrapper, since this seems to be a special datatype that needs to be converted into a string separately before doing any further data handling after receiving the responses from the database.
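If you keep the raw documents instead, a hypothetical pre-processing step could stringify that field first (raw_results and the '_id' handling below are assumptions, not part of the original code):
# assumed: raw_results is the list of documents returned by the query
rows = [dict(doc, _id=str(doc['_id'])) for doc in raw_results]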
However, I would suggest doing something like this using csv.DictWriter():
import csv

# changed the sample data key `Test` so that this key is the same in all responses,
# which makes more sense technically
data = [{u'Test': u'Result1', u'_id': '987600234565ade', u'bugseverity': u'major'},
        {u'Test': u'Result2', u'_id': '987600234465ade', u'bugseverity': u'minor'},
        {u'Test': u'Result3', u'_id': '9876002399999de', u'bugseverity': u'minor'}]

# define the column names
fieldnames = ['Test', '_id', 'bugseverity']

with open('dict.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    # write one CSV row per result dict
    for d in data:
        writer.writerow(d)
Giving dict.csv as output:
Test,_id,bugseverity
Result1,987600234565ade,major
Result2,987600234465ade,minor
Result3,9876002399999de,minor

Sort matches by proximity to the beginning of the string

There is a table that contains site URLs.
I want to sort Sphinx results so that the closer the keyword is to the beginning of the string, the more relevant the match.
"foobar.com, barfoo.com, barbarfoo.com" would be the correct result order for the keyword "foo".
I have tried:
$s = new SphinxClient;
$s->setServer("localhost", 9312);
$s->SetMatchMode(SPH_MATCH_ALL);
$s->SetSortMode(SPH_SORT_RELEVANCE);
$s->SetFieldWeights(array(
    'id'  => 0,
    'url' => 1000,
));
$result = $s->query("foo");
Unfortunately, I get a result set that is sorted by id.
Hmm, I don't think Sphinx can do that directly. There are various ranking factors, but they are all word-based.
You can match partial words using 'min_prefix_len', but you can't get at 'where in the word' the match happens in order to rank by it.
The only way you might be able to get this to work with Sphinx would be to use wordbreaker
http://sphinxsearch.com/blog/2013/01/29/a-new-tool-in-the-trunk-wordbreaker/
to index your domain names as separate words, hoping that your domains get split correctly into "foo bar com", "bar foo com", "bar bar foo com" - which could then be ranked by word position, e.g. min_hit_pos
http://sphinxsearch.com/docs/current.html#field-factors
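If ranking inside Sphinx turns out to be impossible, a crude client-side fallback (not part of the Sphinx answer above, purely an illustration) would be to re-sort the returned URLs by where the keyword first appears:
# hypothetical post-processing of the matched URLs returned by Sphinx
urls = ["barfoo.com", "foobar.com", "barbarfoo.com"]
keyword = "foo"
urls.sort(key=lambda u: u.find(keyword))
print(urls)  # ['foobar.com', 'barfoo.com', 'barbarfoo.com']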