Perl: Quickly find object in list of objects - looking for a fitting datastructure - perl

I am developing a program to sync users between to different LDAP Servers. I have two types of user groups: Master-Groups and Target-Groups (those are predefined in a config-file. There can be multiple Master and Targets per Group definition).
Users in Master-Groups missing in the Target-Groups shall be added to the Targets, Users in Target-Groups missing in Master-Groups shall be removed from the Targets.
The Users in those Groups are Objects themselves. My problem is as follows:
I loop through my availiable master groups and have to perform a quick lookup wheter a user is already part of a target-group. I am struggeling to pick the right datastructure to solve this problem. I tried using a hash, but quickly realized that hash-keys are stringyfied, so I cannot perform
if ( exists( $master_members->{$target_user_object} ) )
When using an array for storing the objects, everytime I have to check if a user object exists, I have to loop through the whole array which essentially kills performance.
How do I perfom a lookup if a specific object exists in a list of objects?
Kind Regards,
Yulivee

You're right that hash keys are stringified. You cannot use objects as keys. But a hash is the right data structure.
Instead of just letting Perl stringify your references, build your own serializer. That could be as simple as using the cn. Or a concatenation of all the fields of the object. Make a sub, put that in there, call that sub within your exist.
... if exists $master_members->{ my_serializer($target_user_object) };

Related

MongoDB list documents not in another list

I have a document with many objects, and several aggregate processors that run on them.
Lets say the name of the objects document is Objects.
For one processor, I created another document called ProcessedObjects in which each instance is an object that contains one field "processedObjectPtr" which is a link to an object.
I would like to run the following basic loop :
for all objects that haven't been processed yet:
1 - process object
2 - add object to processed object list
The part which I don't know how to do in MongoDB is to get the list of objects that haven't been processed yet. Theoretically I could mark the object itself as processed by adding another field to it, but when I will have many processors that will get ugly, which is why I prefer to keep the 'processed object list' in a separate document.
Is there an elegant way to do this, or will I have to add the processed metadata to the actual objects?
I am using mongoengine, but any answer will do.
Thanks!
I ended up adding a 'processedFlags' list to my objects, so each processor could mark (via a flag) that it already processed this object, and also query which objects do not have the flag.

Traversing an object model recursively to search for a value

I need to recursively traverse a very large and complicated object model to search for a particular value of an ID.
The value I'm looking for is in a property called "ID", but objects with a particular ID might have many children, some of which are arrays, each having a different ID, and each of those children in turn can have a different ID and so on and so forth.
So if I give you an object, say $web, and you know that deep down in it's object model there is a value of an object that you are looking for. How do you look for it using recursion and reflection?
Note: This is a generic powershell/recursion/programming question even though the topic is SharePoint.
How about using Format-Custom? For example, getting a lot of nested member data from a directory info is done like so,
(gci)[0] | fc > test.txt
Will give some 8800 lines of data for expanded members.

Comparing the attribute contents of two managed objects?

I am setting a managedObject up from data I am getting off the web, before I add this new object to the managedObjectContext I want to check if its all ready in the database. Is there a way to compare two managed objects in one hit, or do I have to compare each attribute individually to work out if they are identical or one contains a difference?
Simple Example:
Entity:Pet (Created but not inserted into database)
Attribute, Name: Brian
Attribute, Type: Cat
Attribute, Age: 12
Entity:Pet (Currently in database)
Attribute, Name: Brian
Attribute, Type: Cat
Attribute, Age: 7
In this example can I compare [Brian, Cat, 12] with [Brian, Cat, 7] or do I need to go through each attribute one by one to ascertain a full match?
Unique identifiers are often used to search for objects by only having to match the one field. As you note, matching on multiple fields could be annoying and inefficient, but it's perhaps not as bad as you think: you can construct an NSPredicate to quite easily match all the required fields on objects in Core Data.
Use of NSPredicate aside: suppose you just want to match one field. If you don't have a suitable unique identifier in the data as provided, you could derive one. The obvious way is to construct a hash code for everything you store, based on each field you want to match on. Then when you wish to check if an 'incoming' object is already in core data, compute the hash code for the new object, then just look for an object in core data with that same hash code. (Note: if you find an object that already exists with the same hash code, you might want to then compare all the fields to check that it really does represent the same object -- there's a tiny chance it might be a 'different' object, A.K.A. a hash collision).
A very naive hash code implementation for an object X would be something like:
hashcode(X) = hashcode(X.name) + hashcode(X.type) + hashcode(X.age)
To see a more realistic example of writing a hashcode function, see the accepted answer here.
By the way, I'm assuming that you don't want to load all your objects from core data into memory at once. If however that is acceptable (suppose you have quite a limited amount of items), an alternative is to implement isEqual and hash on your class, and then use regular foundation class methods like NSArray indexOfObject: (or, even better, NSDictionary objectForKey:) to locate objects of interest.

Storing the order of embedded documents in a separated array

I have a set of objects that the user can sort arbitrarily. I would like to make my client remember the sorting of the set of objects so that when the user visits the page again the ordering he/she chose will be preserved. However, the client-side framework should also be able to quickly lookup the objects from whatever array/hashmap they are stored in based upon the ordering. What is the most efficient way of doing this?
The best way I have found for doing this is using an array that stores the IDs of the array in the particular order I wanted. From there, I can access the array of objects I wanted by converting the array to a hashmap using Underscore.js.

Meaning of Open hashing and Closed hashing

Open Hashing (Separate Chaining):
In open hashing, keys are stored in linked lists attached to cells of a hash table.
Closed Hashing (Open Addressing):
In closed hashing, all keys are stored in the hash table itself without the use of linked lists.
I am unable to understand why they are called open, closed and Separate. Can some one explain it?
The use of "closed" vs. "open" reflects whether or not we are locked in to using a certain position or data structure (this is an extremely vague description, but hopefully the rest helps).
For instance, the "open" in "open addressing" tells us the index (aka. address) at which an object will be stored in the hash table is not completely determined by its hash code. Instead, the index may vary depending on what's already in the hash table.
The "closed" in "closed hashing" refers to the fact that we never leave the hash table; every object is stored directly at an index in the hash table's internal array. Note that this is only possible by using some sort of open addressing strategy. This explains why "closed hashing" and "open addressing" are synonyms.
Contrast this with open hashing - in this strategy, none of the objects are actually stored in the hash table's array; instead once an object is hashed, it is stored in a list which is separate from the hash table's internal array. "open" refers to the freedom we get by leaving the hash table, and using a separate list. By the way, "separate list" hints at why open hashing is also known as "separate chaining".
In short, "closed" always refers to some sort of strict guarantee, like when we guarantee that objects are always stored directly within the hash table (closed hashing). Then, the opposite of "closed" is "open", so if you don't have such guarantees, the strategy is considered "open".
You have an array that is the "hash table".
In Open Hashing each cell in the array points to a list containg the collisions. The hashing has produced the same index for all items in the linked list.
In Closed Hashing you use only one array for everything. You store the collisions in the same array. The trick is to use some smart way to jump from collision to collision until you find what you want. And do this in a reproducible / deterministic way.
The name open addressing refers to the fact that the location ("address") of the element is not determined by its hash value. (This method is also called closed hashing).
In separate chaining, each bucket is independent, and has some sort of ADT (list, binary search trees, etc) of entries with the same index.
In a good hash table, each bucket has zero or one entries, because we need operations of order O(1) for insert, search, etc.
This is a example of separate chaining using C++ with a simple hash function using mod operator (clearly, a bad hash function)