I have a document with many objects, and several aggregate processors that run on them.
Let's say the collection of objects is named Objects.
For one processor, I created another collection called ProcessedObjects, in which each document contains a single field "processedObjectPtr" that links to an object.
I would like to run the following basic loop:
for all objects that haven't been processed yet:
1 - process object
2 - add object to processed object list
The part I don't know how to do in MongoDB is getting the list of objects that haven't been processed yet. In theory I could mark the object itself as processed by adding another field to it, but with many processors that will get ugly, which is why I'd prefer to keep the 'processed object list' in a separate collection.
Is there an elegant way to do this, or will I have to add the processed metadata to the actual objects?
I am using mongoengine, but any answer will do.
Thanks!
I ended up adding a 'processedFlags' list to my objects, so each processor can mark (via a flag) that it has already processed an object, and can also query for objects that lack its flag.
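For anyone who lands here later, the flag approach above can be sketched in plain Python; the processor name and field names are only illustrative, and the comments show the equivalent MongoDB query/update you would issue via PyMongo against a real collection.

```python
# Sketch of the processedFlags pattern, with plain dicts standing in for
# documents in the Objects collection (no server required).

def unprocessed(objects, processor_name):
    # Equivalent MongoDB query: {"processedFlags": {"$ne": processor_name}}
    return [o for o in objects if processor_name not in o.get("processedFlags", [])]

def mark_processed(obj, processor_name):
    # Equivalent MongoDB update: {"$addToSet": {"processedFlags": processor_name}}
    flags = obj.setdefault("processedFlags", [])
    if processor_name not in flags:
        flags.append(processor_name)

objects = [{"_id": 1}, {"_id": 2, "processedFlags": ["wordCounter"]}]

for obj in unprocessed(objects, "wordCounter"):
    # ... process the object here ...
    mark_processed(obj, "wordCounter")

print([o["_id"] for o in unprocessed(objects, "wordCounter")])  # []
```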
I have an application that uses the DocumentFormat.OpenXml API to build a Word document from one or more originating documents, inserting and deleting chunks of data as it goes. In other words, the resulting document will be significantly different from the constituent documents.
I have already successfully created things like Custom Document Properties, Document Variables and Core File Properties.
But: is it possible to get the other metadata items (number of pages, words, paragraphs, etc.) refreshed, without actually having to calculate these?
Thank you to @Cindy Meister for the answer.
I was hoping that there might be some method or other in the DocumentFormat.OpenXML SDK that I could call, but it seems that is not possible.
I am developing a program to sync users between two different LDAP servers. I have two types of user groups: Master-Groups and Target-Groups (these are predefined in a config file; there can be multiple Masters and Targets per group definition).
Users in Master-Groups that are missing from the Target-Groups shall be added to the Targets; users in Target-Groups that are missing from the Master-Groups shall be removed from the Targets.
The users in those groups are objects themselves. My problem is as follows:
I loop through my available master groups and have to perform a quick lookup of whether a user is already part of a target group. I am struggling to pick the right data structure for this. I tried using a hash, but quickly realized that hash keys are stringified, so I cannot perform
if ( exists( $master_members->{$target_user_object} ) )
When storing the objects in an array, every time I have to check whether a user object exists I have to loop through the whole array, which essentially kills performance.
How do I perform a lookup to see whether a specific object exists in a list of objects?
Kind Regards,
Yulivee
You're right that hash keys are stringified. You cannot use objects as keys. But a hash is the right data structure.
Instead of just letting Perl stringify your references, build your own serializer. That could be as simple as using the cn, or a concatenation of all the fields of the object. Put that in a sub and call it inside your exists check.
... if exists $master_members->{ my_serializer($target_user_object) };
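The same idea expressed in Python terms (purely illustrative; the cn and mail attributes are assumptions borrowed from typical LDAP entries): serialize each object to a stable string key yourself, then membership tests against the dict are O(1) instead of a linear scan.

```python
# Build a lookup keyed by a serialized form of each user object.

class User:
    def __init__(self, cn, mail):
        self.cn = cn
        self.mail = mail

def serialize(user):
    # As simple as the cn, or a concatenation of several fields
    # if the cn alone is not unique.
    return user.cn

master_members = {serialize(u): u for u in [User("alice", "a@x"), User("bob", "b@x")]}

target_user = User("alice", "a@x")
print(serialize(target_user) in master_members)  # True
```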
I need to recursively traverse a very large and complicated object model to search for a particular value of an ID.
The value I'm looking for is in a property called "ID", but objects with a particular ID might have many children, some of which are arrays, each having a different ID, and each of those children in turn can have a different ID and so on and so forth.
So if I give you an object, say $web, and you know that deep down in its object model there is a value you are looking for, how do you look for it using recursion and reflection?
Note: This is a generic powershell/recursion/programming question even though the topic is SharePoint.
How about using Format-Custom? For example, to dump deeply nested member data for a DirectoryInfo object:
(gci)[0] | fc > test.txt
This produces some 8,800 lines of expanded member data.
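Since this is flagged as a generic recursion/reflection question, here is a language-neutral sketch in Python of the search itself (the shape of the data is hypothetical): recurse through containers and object attributes, returning the first node whose "ID" matches.

```python
# Generic recursive search for a node whose "ID" matches a target value;
# walks dicts, lists, and plain objects via reflection. Sketch only -- a
# real object model may also need care with lazy-loaded properties.

def find_by_id(node, target, seen=None):
    seen = seen if seen is not None else set()
    if id(node) in seen:            # guard against cycles in the graph
        return None
    seen.add(id(node))
    if isinstance(node, dict):
        if node.get("ID") == target:
            return node
        children = node.values()
    elif isinstance(node, (list, tuple)):
        children = node
    elif hasattr(node, "__dict__"):  # reflect over a plain object's attributes
        if getattr(node, "ID", None) == target:
            return node
        children = vars(node).values()
    else:
        return None
    for child in children:
        found = find_by_id(child, target, seen)
        if found is not None:
            return found
    return None

web = {"ID": 1, "Lists": [{"ID": 2, "Items": [{"ID": 42, "Title": "hit"}]}]}
print(find_by_id(web, 42)["Title"])  # hit
```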
I took a shortcut earlier and created the primary key of my Mongo collection by concatenating various fields into a "unique id".
I would now like to change it to actually use the ObjectId. What's the best approach to do it? I have a little over 3M documents and would like this to be as least disruptive as possible.
A simple approach would be to bring down the site for a bit and then copy every document from one to the other one which is using ObjectIds but I'd like to keep the application running if I can. I imagine one way would be to write to both for a period of time while the migration happens but that would require me having two similar code bases so I wonder if there's a way to avoid all that.
To provide some additional information:
It's just one collection that's not referenced by any others. I have another MySQL database that contains some values that are used to create the queries that read from this MongoDB collection.
I'm using PyMongo/Mongoengine libraries to interact with MongoDB from Python and I don't know if it's possible to just change the primary key for a collection.
You shouldn't bring your site down for any reason if it does not go down itself. :)
No matter how many millions of records you have, the solution depends on how you use your ids.
If you cross-reference documents in different collections using these ids, then for every updated object you will also have to update all other objects that reference it.
As a first step, your system should be updated to stop creating new objects the old way. If your system lets you do this easily, then you can update your database very easily. If this change is not easy to make, then your system has some architectural problems and you should fix those first. If that is the situation, please update your question so I can update my answer.
Since I don't know anything about your applications and data, what I say will be too general. Let's call the collection you want to update coll_bad_id. Every item in this collection is referenced in other collections like coll_poor_guy and coll_wisdom_searcher. How I would do this is to run over coll_bad_id one item at a time like this:
1. read one item
2. update _id with new style of _id
3. insert item back to collection
-- now we have two copies of the same item: one with the old-style id, one with the new
4. update each item referencing this to use new style id
5. remove the duplicate item with old-style id from collection
One thing to keep in mind is that BSON ObjectIds embed date/time data, which can be very useful. Since you are rebuilding all these objects on one day, their ObjectIds will not reflect the items' true creation times; newly added items' will. You can note the first newly added item as the milestone after which ids carry correct creation times.
UPDATE: Code sample to run on Mongo shell.
This is not the most efficient way to do it, but it is safe to run since we do not remove anything before adding it back with a new _id. A better approach is to work in small batches by adding query filters to the find() call.
var cursor = db.testcoll.find();
cursor.forEach(function(item) {
    var oldId = item._id;               // save the old _id so we can remove that document below
    delete item._id;                    // inserting without an _id makes Mongo generate a new one
    db.testcoll.insert(item);           // add the item back with a new _id
    db.testcoll.remove({ _id: oldId }); // remove the item with the bad _id
});
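Since the asker is on PyMongo, here is a rough Python equivalent of the same loop, run against an in-memory list standing in for the collection so the sketch needs no server; uuid4 is a stand-in for ObjectId, and with PyMongo the append/remove calls would be insert_one/delete_one on the real collection.

```python
# Copy each document under a fresh id, then delete the old one -- same
# order as the shell loop: insert first, remove second, so nothing is lost.

import uuid

collection = [
    {"_id": "name|2010|NY", "name": "Ann"},
    {"_id": "name|2011|CA", "name": "Bob"},
]

for item in list(collection):                    # iterate over a snapshot
    new_item = dict(item, _id=uuid.uuid4().hex)  # copy with a fresh id
    collection.append(new_item)                  # insert the copy first...
    collection.remove(item)                      # ...then remove the old document

print(sorted(d["name"] for d in collection))  # ['Ann', 'Bob']
```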
I have a Student collection and a Person Collection.
Person contains the fields: name, address, etc
Student contains: rollno, and a person field that stores the person._id for this student
Now I want to show the name of student in the student template, but note that there's no name field in Student, I'll need to get that from that student's Person document.
Is there a way to get a mongodb cursor on the client that has the student information as well as selective field from that student's person document?
Also, is there a better or more standard way of achieving what I'm trying to achieve?
Note: I don't want to use redundancy and store the name field on the Student document, so that's not a solution
is there a better or more standard way of achieving what I'm trying to achieve?
It sounds like you are trying to read all information about a student in one read - the only way to do that is to have all that information in a single document.
The flexible schema of document databases allows documents in a single collection to have different schemas, i.e. different sets of fields.
So I would recommend that you consider why you actually need separate collections for Person and Student: adding a student causes writes to two collections (and while a single write is atomic, two writes are not), and it also causes the issue you have now, where you need two separate reads to get all information about a student.
This SO question is somewhat related to your situation.
See the accepted answer in this thread:
Possible bug when observing a cursor, when deleting from collection
It involves using a modified version of the built-in _publishCursor titled publishModifiedCursor, which allows you to specify a callback to add properties to each document in the cursor you are publishing.
I would change your code to have a role / job attribute in the Person object.
It's semantic, at least for me; think about the difficulty of handling someone changing jobs in your original model versus simply changing the role.
Then you could just search
Persons.find({role: 'student'})
And that would be completely analogous to having a Student object.
As Asya said, the students can simply have extra fields that the other documents don't have.
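The single-collection idea can be sketched in plain Python (the field names are only illustrative): every person lives in one collection, students simply carry extra fields, and the role field makes the student query a trivial filter, equivalent to the Persons.find call above.

```python
# One "persons" collection; students are just documents with a role of
# 'student' plus extra student-only fields such as rollno.

persons = [
    {"name": "Ann", "role": "student", "rollno": 7},
    {"name": "Bob", "role": "teacher"},
]

# Equivalent MongoDB query: Persons.find({role: 'student'})
students = [p for p in persons if p.get("role") == "student"]
print([(s["name"], s["rollno"]) for s in students])  # [('Ann', 7)]
```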