How can we identify all the modules that belong to a given distribution?
e.g. the XML::LibXML distribution provides the following set of modules:
https://metacpan.org/release/XML-LibXML
How can we get this list either through cpan/ppm or through any standard Perl packages?
Actually we are writing a unit test framework for our code written in Perl. To verify the module, we need a way to find the distribution name for a given module name.
The MetaCPAN API provides a solution to this problem with a JSON web service (http://api.metacpan.org).
It's easy to try different queries using curl on the command line or via the web form at http://explorer.metacpan.org/
If you know the name of the release you're searching for,
you can do a query like this to get a list of module names:
/module/_search
{
"query" : { "match_all" : {} },
"size" : 1000,
"fields" : [ "module.name" ],
"filter" : {
"and": [
{ "term" : { "module.authorized" : true } },
{ "term" : { "module.indexed" : true } },
{ "term" : { "release" : "XML-LibXML-1.95" } },
{ "term" : { "status" : "latest" } }
]
}
}
You could also substitute "release": "XML-LibXML-1.95" with "distribution": "XML-LibXML".
If you are starting with a module name and need to determine the name of the release first, try this:
/module/_search
{
"query" : { "match_all" : {} },
"size" : 1000,
"fields" : [ "release", "distribution" ],
"filter" : {
"and": [
{ "term" : { "module.name" : "XML::LibXML" } },
{ "term" : { "status" : "latest" } }
]
}
}
That query syntax is the ElasticSearch DSL since the api uses ElasticSearch to index the data.
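As a rough sketch of how the two request bodies above share one shape, the snippet below builds the same structure in Python. The helper name search_body is hypothetical and not part of any MetaCPAN client; it only assembles the JSON and sends nothing.

```python
import json

def search_body(fields, terms, size=1000):
    """Build a match_all query narrowed by an 'and' filter of term clauses."""
    return {
        "query": {"match_all": {}},
        "size": size,
        "fields": list(fields),
        # One {"term": {...}} clause per key/value pair, in the order given.
        "filter": {"and": [{"term": {k: v}} for k, v in terms.items()]},
    }

# Same lookup as the second query above: module name -> release/distribution.
body = search_body(
    ["release", "distribution"],
    {"module.name": "XML::LibXML", "status": "latest"},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the web form or posted with any HTTP client.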
To run the query from Perl there is the MetaCPAN::API
module, though I have not used it myself.
Since it's just a web request you can use LWP or any other HTTP module.
You might also want to check out the
ElasticSearch and
ElasticSearch::SearchBuilder
modules, which provide a fuller Perl interface for querying an ElasticSearch database.
Here's a full example in perl using LWP:
use JSON qw( encode_json decode_json );
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $res = $ua->post("http://api.metacpan.org/module/_search",
Content => encode_json({
query => { match_all => {} },
size => 1000,
# limit response text to just the module names since that's all we want
fields => ['module.name'],
filter => {
and => [
{ term => { "module.authorized" => 1 } },
{ term => { "module.indexed" => 1 } },
{ term => { "distribution" => "XML-LibXML" } },
{ term => { "status" => "latest" } }
]
}
})
);
my @modules =
# this can be an array (ref) of module names for multiple packages in one file
map { ref $_ ? @$_ : $_ }
# the pieces we want
map { $_->{fields}{'module.name'} }
# search results
@{ decode_json($res->decoded_content)->{hits}{hits} };
print join "\n", sort @modules;
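For readers who do not use Perl, here is a minimal Python sketch of the flattening step only: each hit's module.name may be a single string or a list. The sample response is hypothetical and hand-written to mimic the shape the Perl code decodes; it is not real query output.

```python
import json

# Hypothetical sample shaped like the decoded search response;
# the module names are illustrative only.
SAMPLE_RESPONSE = json.loads("""
{"hits": {"hits": [
  {"fields": {"module.name": "XML::LibXML"}},
  {"fields": {"module.name": ["XML::LibXML::Node", "XML::LibXML::Element"]}}
]}}
""")

def module_names(response):
    """Collect module names, accepting a string or a list per hit."""
    names = []
    for hit in response["hits"]["hits"]:
        value = hit["fields"]["module.name"]
        if isinstance(value, list):
            names.extend(value)   # several packages in one file
        else:
            names.append(value)   # a single package
    return sorted(names)

print("\n".join(module_names(SAMPLE_RESPONSE)))
```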
For more help visit #metacpan on irc.perl.org,
or check out the wiki at https://github.com/CPAN-API/cpan-api/wiki.
If you explain a little more about what you are doing and/or trying to achieve, you might find other ways to do it.
Related
I have created a MongoDB database and I am filling it with my clients' email addresses and their related accounts. But I have found that some values stored as email are not email addresses at all. See the example below.
{
"_id" : ObjectId("591d9cf30ef9acde11d7af6b"),
"email" : "w@Yahoo.com",
"src" : [
{
"acc" : "yahoo",
"name" : "matter"
}
]
}
{
"_id" : ObjectId("591daa540ef9acde11d7af6c"),
"email" : "122",
"src" : [
{
"acc" : "ldd"
}
]
}
I want to check whether the email key holds a valid email address. If not, I would like to remove the document to keep my collection clean.
How can I achieve that?
Use the remove command with a regex and the $not operator:
db.getCollection('somecollection').remove( { email: { $not: /@/ } } )
I'm not 100% sure the regex will work correctly with the @ like this, but I would recommend always testing with find instead of remove first.
db.getCollection('somecollection').find( { email: { $not: /@/ } } )
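Checking only for the presence of an @ is quite permissive. As a hedged illustration, the Python sketch below applies a slightly stricter (but still deliberately loose) pattern to in-memory copies of the two sample documents, assuming the first address reads w@Yahoo.com. EMAIL_RE and invalid_email_docs are hypothetical names, and nothing here talks to MongoDB.

```python
import re

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def invalid_email_docs(docs):
    """Return the documents whose 'email' value does not look like an address."""
    return [d for d in docs if not EMAIL_RE.match(str(d.get("email", "")))]

docs = [
    {"email": "w@Yahoo.com", "src": [{"acc": "yahoo", "name": "matter"}]},
    {"email": "122", "src": [{"acc": "ldd"}]},
]
print(invalid_email_docs(docs))
```

A pattern along these lines could presumably be used in place of the bare @ in the $not filter, but test it with find before running remove.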
Well the answer is probably no but I am curious to ask.
I have a Document which has two level of arrays in it:
{ '_id' : '...', 'events' : [ { 'urls' : [], 'social_id' : [] } ], 'other_data' : '...' }
The code below works. What it does is update the urls array of a specific event, adding the event['event_url'] value to that set (Python).
db.col.update(
{ 'f_id' : venue['id'],
"events.title" : find_dict["events.title"] },
{ '$addToSet': { 'events.$.urls': event['event_url']} }
)
However in the same event I want to add a social id if not exists.
db.col.update(
{ 'f_id' : venue['id'],
"events.title" : find_dict["events.title"] },
{ '$addToSet': { 'events.$.social_id': event['social_id']} }
)
I was wondering if it's possible to merge the above commands into one and not run the update twice. I have not found anything in the documentation but I guess it's worth asking.
You can combine the two updates into a single operation by including both fields in the $addToSet object:
db.col.update(
{ 'f_id': venue['id'], "events.title": find_dict["events.title"] },
{ '$addToSet': {
'events.$.urls': event['event_url'],
'events.$.social_id': event['social_id']
}}
)
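If the list of array fields grows, the update document can be assembled programmatically. The following is only a sketch: build_add_to_set is a hypothetical helper, the event values are made up, and the result would still need to be passed to the driver's update call as the second argument.

```python
def build_add_to_set(event, fields):
    """Build a single $addToSet update touching several positional array fields.

    `fields` maps a key in `event` to the array name under events.$.
    Keys missing from `event` are skipped.
    """
    targets = {
        "events.$." + array_name: event[key]
        for key, array_name in fields.items()
        if key in event
    }
    # An empty dict means there is nothing to update; the caller should skip the call.
    return {"$addToSet": targets} if targets else {}

event = {"event_url": "http://example.com/e/1", "social_id": "abc123"}
update = build_add_to_set(event, {"event_url": "urls", "social_id": "social_id"})
print(update)
```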
This is the first of 7 test/example documents, in collection "SoManySins."
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"Treats" : "Sin1 = Gluttony",
"Sin1" : "Gluttony",
"Favourited" : "YES",
"RecentActivity" : "YES",
"GoAgain?" : "YeaSure."
}
I would like to be able to query to retrieve any info in any position,
just by referring to the position. The following document,
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"Sin1" : "Gluttony",
"?????????" : "??????",
"RecentActivity" : "YES",
"GoAgain?" : "YeaSure."
}
One could retrieve whatever might be in the 3rd key~value
pair. Why should one have to know ahead of time what the
data is, in the key? If one has the same structure for the
collection, who needs to know? This way, you can get
double the efficiency? Like having a whole lot of mailboxes,
and your app's users supply the key and the value; your app
just queries the dbs' documents' arrays' positions.
Clara? finally? I hope?
The sample document you've provided is not saved as an array in BSON:
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"Sin1" : "Gluttony",
"?????????" : "??????",
"RecentActivity" : "YES",
"GoAgain?" : "YeaSure."
}
Depending on the MongoDB driver you are using, the fields here are typically represented in your application code as an associative array or hash. These data structures are not order-preserving so you cannot assume that the 3rd field in a given document will correspond to the same field in another document (or even that the same field ordering will be consistent on multiple fetches). You need to reference the field by name.
If you instead use an array for your fields, you can refer by position or select a subset of the array using the $slice projection.
Example document with an array of fields:
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"fields": [
{ "Sin1" : "Gluttony" },
{ "?????????" : "??????" },
{ "RecentActivity" : "YES" },
{ "GoAgain?" : "YeaSure." }
]
}
.. and query to find the second element of the fields array (a $slice with skip 1, limit 1):
db.SoManySins.find({}, { fields: { $slice: [1,1]} })
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"fields" : [
{
"?????????" : "??????"
}
]
}
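To make the skip/limit semantics concrete, here is a small Python sketch that emulates the [skip, limit] form of $slice on a plain list. It assumes a non-negative skip and does not cover the other forms $slice accepts; slice_projection is a hypothetical name.

```python
def slice_projection(array, skip, limit):
    """Emulate the [skip, limit] form of $slice for a non-negative skip."""
    return array[skip:skip + limit]

fields = [
    {"Sin1": "Gluttony"},
    {"?????????": "??????"},
    {"RecentActivity": "YES"},
    {"GoAgain?": "YeaSure."},
]

# Skip 1 element, then take 1: the second entry of the array.
print(slice_projection(fields, 1, 1))
```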
This is one way to Query and get back data when you may not
know what the data is, but you know the structure of the data:
examples in Mongo Shell, and in PHP
// the basics, setup:
$dbhost = 'localhost'; $dbname = 'test';
$m = new Mongo("mongodb://$dbhost");
$db = $m->$dbname;
$CursorFerWrites = $db->NEWthang;
// defining a set of data, creating a document with PHP:
$TheFieldGenerator = array( 'FieldxExp' => array(
array('Doc1 K1'=>'Val A1','Doc1 K2'=>'ValA2','Doc1 K3'=>'Val A3'),
array('Doc2 K1'=>'V1','Doc2 K2'=>'V2','Doc2 K3'=>'V3' ) ) ) ;
// then write it to MongoDB:
$CursorFerWrites->save($TheFieldGenerator);
NOTE : In the Shell : This produces the same Document:
> db.NEWthang.insert({"FieldxExp" : [
{"Doc1 K1":"Val A1","Doc1 K2":"Val A2","Doc1 K3":"Val A3"},
{"Doc2 K1":"V1", "Doc2 K2":"V2","Doc2 K3":"V3"}
]
})
Now, some mongodb Shell syntax:
> db.NEWthang.find().pretty()
{
"_id" : ObjectId("516c4053baa133464d36e836"),
"FieldxExp" : [
{
"Doc1 K1" : "Val A1",
"Doc1 K2" : "Val A2",
"Doc1 K3" : "Val A3"
},
{
"Doc2 K1" : "V1",
"Doc2 K2" : "V2",
"Doc2 K3" : "V3"
}
]
}
> db.NEWthang.find({}, { "FieldxExp" : { $slice: [1,1]} } ).pretty()
{
"_id" : ObjectId("516c4053baa133464d36e836"),
"FieldxExp" : [
{
"Doc2 K1" : "V1",
"Doc2 K2" : "V2",
"Doc2 K3" : "V3"
}
]
}
> db.NEWthang.find({}, { "FieldxExp" : { $slice: [0,1]} } ).pretty()
{
"_id" : ObjectId("516c4053baa133464d36e836"),
"FieldxExp" : [
{
"Doc1 K1" : "Val A1",
"Doc1 K2" : "Val A2",
"Doc1 K3" : "Val A3"
}
]
}
Finally, here is how to write the query in PHP:
// these will be for building the MongoCursor:
$myEmptyArray = array();
$TheProjectionCriteria = array('FieldxExp'=> array('$slice' => array(1,1)));
// which gets set up here:
$CursorNEWthang1 = new MongoCollection($db, 'NEWthang');
// and now ready to make the Query/read:
$ReadomgomgPls=$CursorNEWthang1->find($myEmptyArray,$TheProjectionCriteria);
and the second document will be printed out:
foreach ($ReadomgomgPls as $somekey=>$AxMongoDBxDocFromCollection) {
var_dump($AxMongoDBxDocFromCollection);echo '<br />';
}
Hope this is helpful for a few folks.
Right now, we are using mongodb 1.2.2 to create a database and store values. Our data types look like this:
"file" : "1" , "tools": { "foo": { "status": "pending"} }
"file" : "2" , "tools": { "bar": { "status": "pending" } }
"file" : "3" , "tools": { "foo": { "status": "running" } }
"file" : "4" , "tools": { "bar": { "status": "done" } }
"file" : "5" , "tools": { "foo": { "status": "done" } }
We want to query for every single one that has { "status" : "pending" }. We do not want to use {"tools.foo.status" : "pending"} because we will have many different variations other than foo and bar. To make it more clear we want to do something like this {"tools.*.status" : "pending"}
No, you can't do that. I'm afraid you'll have to maintain your own index for this. That is, for every insert/update to the files collection, do an upsert to the file_status_index collection to update current status.
Querying is also a two-step process: first query the index collection to get the ids, and then issue $in query to the files collection to get actual data.
This may sound scary, but that's a price you have to pay with this schema.
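A minimal in-memory sketch of that bookkeeping, in Python, may help. The StatusIndex class is a hypothetical stand-in for the file_status_index collection; a real version would issue upserts against MongoDB instead of updating dictionaries.

```python
from collections import defaultdict

class StatusIndex:
    """In-memory stand-in for a file_status_index collection (hypothetical)."""

    def __init__(self):
        self._by_status = defaultdict(set)   # status -> set of file ids
        self._current = {}                   # (file_id, tool) -> status

    def upsert(self, file_id, tool, status):
        """Record the current status of one tool on one file."""
        old = self._current.get((file_id, tool))
        self._current[(file_id, tool)] = status
        if old is not None and old != status:
            # Drop the file from the old bucket unless another tool keeps it there.
            still_there = any(
                s == old for (f, _), s in self._current.items() if f == file_id
            )
            if not still_there:
                self._by_status[old].discard(file_id)
        self._by_status[status].add(file_id)

    def file_ids(self, status):
        """Step one of the query: ids of files with any tool in this status."""
        return sorted(self._by_status[status])

index = StatusIndex()
index.upsert("1", "foo", "pending")
index.upsert("2", "bar", "pending")
index.upsert("3", "foo", "running")
index.upsert("1", "foo", "running")   # file 1 moves from pending to running
print(index.file_ids("pending"))
```

The ids returned by file_ids would then feed the $in query against the files collection.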
Firstly, you should upgrade your MongoDB. 1.2.2 is a really old version.
Secondly, you cannot run the query you describe. You can do this with Map/Reduce.
I think it's time to ask why you're storing things the way you are.
There is no efficient way to search this kind of structure; since there is no known keys-only path to get to the value you're filtering on, every single record needs to be expanded every single time, and that's very expensive, especially once your collection no longer fits in RAM.
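To illustrate the cost, here is a Python sketch of what a "tools.*.status" match amounts to when done by hand: every document is opened and every tool entry inspected. The function name is hypothetical and the data is copied from the question.

```python
def files_with_status(docs, status):
    """Scan every document and every tool entry; cost grows with collection size."""
    return [
        doc["file"]
        for doc in docs
        if any(tool.get("status") == status for tool in doc["tools"].values())
    ]

docs = [
    {"file": "1", "tools": {"foo": {"status": "pending"}}},
    {"file": "2", "tools": {"bar": {"status": "pending"}}},
    {"file": "3", "tools": {"foo": {"status": "running"}}},
    {"file": "4", "tools": {"bar": {"status": "done"}}},
    {"file": "5", "tools": {"foo": {"status": "done"}}},
]
print(files_with_status(docs, "pending"))
```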
IMO, you'd be better off with a secondary collection to hold these statuses. Yes, it makes your datastore more relational, but that's because your data is relational.
file_tools:
{ 'file_id' : 1, 'name' : 'foo', 'status' : 'pending' }
{ 'file_id' : 2, 'name' : 'bar', 'status' : 'pending' }
{ 'file_id' : 3, 'name' : 'foo', 'status' : 'running' }
{ 'file_id' : 4, 'name' : 'bar', 'status' : 'done' }
{ 'file_id' : 5, 'name' : 'foo', 'status' : 'done' }
files:
{ 'id': 1 }
{ 'id': 2 }
{ 'id': 3 }
{ 'id': 4 }
{ 'id': 5 }
> // find out which files have pending tools
> files_with_pending_tools = file_tools.find( { 'status' : 'pending' }, { 'file_id' : 1 } )
> //=> [ { 'file_id' : 1 }, { 'file_id' : 2 } ]
>
> // just get the ids
> file_ids_with_pending_tools = files_with_pending_tools.map( function( file_tool ){
> return file_tool['file_id'];
> })
> //=> [1,2]
>
> // query the files
> files.find({'id': { $in : file_ids_with_pending_tools }})
> //=> [ { 'id' : 1 }, { 'id' : 2 } ]
I have a weird problem with MongoDB (2.0.2) map reduce.
So, the story goes like this:
I have an Ad model (see the model source extract below) and I need to group up to n ads per category in order to have a nicely ordered listing I can later use to do more interesting things.
# encoding: utf-8
class Ad
include Mongoid::Document
cache
include Mongoid::Timestamps
field :title
field :slug, :unique => true
def self.aggregate_latest_active_per_category
map = "function () {
emit( this.category, { id: this._id });
}"
reduce = "function ( key, value ) {
return { ads: value };
}"
self.collection.map_reduce(map, reduce, { :out => "categories"} )
end
All fun and games up until now.
What I expect is to get a result in a form which resembles (mongo shell for db.categories.findOne() ):
{
"_id" : "category_name",
"value" : {
"ads" : [
{
"id" : ObjectId("4f2970e9e815f825a30014ab")
},
{
"id" : ObjectId("4f2970e9e815f825a30014b0")
},
{
"id" : ObjectId("4f2970e9e815f825a30014b6")
},
{
"id" : ObjectId("4f2970e9e815f825a30014b8")
},
{
"id" : ObjectId("4f2970e9e815f825a30014bd")
},
{
"id" : ObjectId("4f2970e9e815f825a30014c1")
},
{
"id" : ObjectId("4f2970e9e815f825a30014ca")
},
// ... and it goes on and on
]
}
}
Actually, it would be even better if value could contain only the array, but MongoDB complains that this is not supported yet. With a finalize function applied later that is not a big problem, and it is not what I want to ask about.
Now, back to the problem. What actually happens when I run the map reduce is that it spits out something like:
{
"_id" : "category_name",
"value" : {
"ads" : [
{
"ads" : [
{
"ads" : [
{
"ads" : [
{
"ads" : [
{
"id" : ObjectId("4f2970d8e815f825a3000011")
},
{
"id" : ObjectId("4f2970d8e815f825a3000017")
},
{
"id" : ObjectId("4f2970d8e815f825a3000019")
},
{
"id" : ObjectId("4f2970d8e815f825a3000022")
},
// ... on and on and on
... and while I could probably work out a way to use this it just doesn't look like something I should get.
So, my questions (finally) are:
Am I doing something wrong and what is it?
Is there something wrong with MongoDB map reduce (I mean besides all the usual things when compared to hadoop)?
Yes, you're doing it wrong. The inputs and outputs of map and reduce should be uniform, because they are meant to be executed in parallel and reduce might be run over partially reduced results. Try these functions:
var map = function() {
emit(this.category, {ads: [this._id]});
};
var reduce = function(key, values) {
var result = {ads: []};
values.forEach(function(v) {
v.ads.forEach(function(a) {
result.ads.push(a)
});
});
return result;
}
This should produce documents like:
{_id: category, value: {ads: [ObjectId("4f2970d8e815f825a3000011"),
ObjectId("4f2970d8e815f825a3000019"),
...]}}
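The effect of re-reducing can be reproduced outside MongoDB. The Python sketch below is a simplified model, not MongoDB's actual batching: run_in_batches is a hypothetical driver that feeds each partial result back into the next reduce call, and the two reducers mirror the question's and the answer's functions.

```python
def naive_reduce(key, values):
    # Mirrors the question's reducer: wraps whatever it is given.
    return {"ads": values}

def uniform_reduce(key, values):
    # Mirrors the answer's reducer: output has the same shape as each input.
    result = {"ads": []}
    for value in values:
        result["ads"].extend(value["ads"])
    return result

def run_in_batches(reduce, key, emitted, batch_size):
    """Reduce in batches, feeding each partial result back in, as a re-reduce would."""
    partial = None
    for start in range(0, len(emitted), batch_size):
        batch = emitted[start:start + batch_size]
        if partial is not None:
            batch = [partial] + batch
        partial = reduce(key, batch)
    return partial

naive = run_in_batches(naive_reduce, "cat", [{"id": n} for n in range(4)], 2)
uniform = run_in_batches(uniform_reduce, "cat", [{"ads": [n]} for n in range(4)], 2)
print(naive)     # nested "ads" inside "ads"
print(uniform)   # one flat list
```

With the naive reducer each re-reduce adds another level of nesting, which matches the output shown in the question; the uniform reducer stays flat no matter how the values are batched.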