I would like to know whether there is a way to compare two Documentum docbases to ensure that they are the same; for example, to compare the child-parent relationships for an object in the two docbases.
Any pointers would be much appreciated.
Thanks in advance.
You could write a query that gets all of the documents and folders (using a join), and order/compare the folder paths. That would at least give you a way to determine whether all the documents and folders exist.
However, unless you actually did a backup and load from one database/content store to the other, the objects will all have different r_object_id values, because the ID is based on the unique repository ID.
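As a rough sketch, you could dump sorted listings from each docbase with DQL along these lines and diff them (the cabinet path here is a placeholder):

SELECT r_folder_path FROM dm_folder

SELECT object_name FROM dm_document WHERE FOLDER('/MyCabinet', DESCEND)

The first lists every folder path in the docbase; the second lists the documents under a given cabinet, descending through its subfolders.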
You can run the Documentum job "dm_StateOfDocbase" in each docbase and compare the results.
If you need more specific information about a relation, you can query the dm_relation_type object.
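The relation instances themselves live in dm_relation, so as a sketch (the relation name is a placeholder):

SELECT relation_name, parent_id, child_id FROM dm_relation WHERE relation_name = 'my_relation'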
I'm currently learning a lot about the MEAN stack and obviously MongoDB. I want to set my database up so that nothing is ever 'removed', things are only marked as deleted or moved somewhere else, like an archived collection/database. What's the industry standard way of doing this?
The way I see it, I have two options, both raising more questions:
Marking documents as deleted with a deleted key.
Would I store this as a timestamp with an accompanying array of timestamps? The array is needed because I also want to build 'restore' functionality, which in turn means a document can be deleted more than once, and I want to track each deletion. This approach also means I would have to update a lot of my queries to ignore that key.
Move the documents to another collection or database.
This would require the most work, as I'd need to handle any other functionality that references the document. For example, when deleting a user from a cinema database, would I have to archive their previous bookings as well, or just update queries to also search the archive?
I couldn't find any useful resources on this, but if you know of any, please point me in that direction :) Thanks.
Thanks hector! His answer:
"Actually, there is no "standard" way to do this; each company does it its own way. For your first option, you don't need to store a timestamp array, just a flag indicating that the document is "deleted". Then, in another collection, you can store the events. For instance: {event: "deleted", date: "03/08/2017 08:00:00", documentId: "7726"}. An event store is the way to go."
I have a filesystem like structure, where there are a bunch of folders that each contain the Object Id of their parent folder. Given a specific folder, I want to return the path to this folder; the only way I can think of to do this is to traverse up the tree of linked Object Ids until I get to the root.
Doing this using just db.find, I would have to send out a number of queries equal to the depth of the folder. This doesn't scale, however, so I am wondering if there is a way that Mongo can chain the connected IDs for me within a single request. Is this possible? Or is there at least a better way?
There is not really a way to do this in the single MongoDB query you first asked for.
You could denormalize the path into each object so it doesn't have to walk all the parents. But any changes would require an update to each and every child.
Basically each folder has...
{
...,
name: "my_folder",
path: "\path\to\folder",
children: ...,
}
And when you update a folder, you need to walk the folder's children and update them by taking parent.path, appending parent.name, and writing that into each child's path property, continuing the update down the chain. Updates are more expensive, but reads are a whole lot cheaper.
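A rough sketch of that walk in the mongo shell (assuming, purely for illustration, a folders collection where each document stores its parent's _id in a parent field):

// Recursively rewrite each child's path after a rename or move.
function updateChildPaths(folder) {
  db.folders.find({ parent: folder._id }).forEach(function (child) {
    child.path = folder.path + "/" + folder.name;  // parent.path + parent.name
    db.folders.update({ _id: child._id }, { $set: { path: child.path } });
    updateChildPaths(child);  // update away down the chain
  });
}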
MongoDB documentation has some examples of patterns to model Tree Structures.
Check if one of them fits your requirements. With the information you have provided I would say that an Array of Ancestors could be a good option.
I believe what you'd like to do is pretty similar to using a tree model.
I understand your data currently doesn't have the ancestors field, but adding it would be one of the best approaches to the problem:
folder
{
_id: 100,
folder_name: "foo",
ancestors: [80, 23, 1]
}
Please check this video from MongoDB University (by 10gen) for further explanation:
https://www.youtube.com/watch?feature=player_embedded&v=lIjXyQklGWY
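With the ancestors array in place, the original "path in a single request" problem becomes just one extra query, roughly like this sketch (the folders collection name is an assumption):

// Fetch the folder, then all of its ancestors in a single query.
var folder = db.folders.findOne({ _id: 100 });
var byId = {};
db.folders.find({ _id: { $in: folder.ancestors } }).forEach(function (p) {
  byId[p._id] = p;
});

// Rebuild the path in ancestor order (reverse first if the array is
// stored nearest-parent-first, as in the example above).
var path = folder.ancestors.slice().reverse().map(function (id) {
  return byId[id].folder_name;
}).join("/") + "/" + folder.folder_name;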
I've got a question for you Couchbase pros: is it possible to synchronize a subset of documents (e.g. the documents within a view) with another bucket?
So that the other bucket's documents are always a direct subset of the "master" bucket?
If so, isn't that too expensive in terms of performance? Or does Couchbase have any functionality to only create deep links to the documents instead of copying them?
Alternatively: is it possible to write views on views?
Thank you in advance!
--- EDIT ----
Let's say I want to have two sets (buckets) of documents, S1 and S2, where S2 is a subset of S1. Each set contains the same views V1, V2 and V3, since I want to be able to query any of them with the same logic/interface. In my case a set S2 is built per user/company/store/whatever; in production there would be around 1000 such subsets S2. To stay abstract, let's call them S2a, S2b and S2c.
The selection of documents to be contained in any subset is done by a filtering instance (for example a view). Let's call these filtering instances F1 for filtering S1 to S2, hence F1a, F1b and F1c.
So with my current knowledge of Couchbase, this results in the following design/view architecture: I've got the three "base" views V1, V2 and V3, and to realize S2a, S2b and S2c I must create the views S2aV1, S2aV2, S2aV3, S2bV1, S2bV2, etc. (9 views).
One could say "Well, choose your keys wisely and you can avoid the sub-views", but in my opinion this isn't that easy, for the following reason: in the worst case the filter parameters change every minute and contain many WHERE IN constraints, which (from my current point of view) cannot be handled efficiently by querying k/v lists.
This leads to the following thoughts and the question I initially asked. If I use the same views in every subset (defined by a filter), shouldn't it be possible to build an entity which helps me handle complex filtering? For example, a function which is called at runtime while generating the view output? This could look like /design/view?filter=F1 or something like that.
Or do you have any other ideas for solving this problem? Or should I use SQL, since it's more capable of handling frequently changing filters?
Generally speaking, for most models you don't really need bucket "subsets". Is there a particular reason you are trying to do this, and why you would want that data broken out? You can also query your views, or, instead of a view on a view, you can just make a separate view that maps/filters further based on your needs (i.e. does the same job as a view on a view).
We are working on Elasticsearch integration. That may be better for your use case.
I think what you want to do is write a view on your original bucket, and then copy the key/values from that view to be documents in a new bucket.
It shouldn't be hard to write an automated framework for managing this so that you can keep the derived data up to date in near real time.
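A rough sketch of that idea with the Couchbase Node.js SDK (2.x-style API; the bucket, design document, and view names are all placeholders):

var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://localhost');
var master = cluster.openBucket('master');
var subset = cluster.openBucket('subset');

// Query the filtering view on the master bucket...
var query = couchbase.ViewQuery.from('filters', 'f1');
master.query(query, function (err, rows) {
  if (err) throw err;
  // ...and copy each emitted row into the subset bucket.
  rows.forEach(function (row) {
    subset.upsert(row.id, row.value, function (err) {
      if (err) console.error('copy failed for', row.id, err);
    });
  });
});

Run on a schedule, this keeps the derived bucket close to real time, at the cost of the extra writes.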
I'm trying to clean up a database by first finding unreferenced objects. I have extracted all the database objects into a list and all the DDL code into files; I also have all the Java source code for the project.
Basically, what I want to do (preferably in Perl, as it's the scripting language I'm most familiar with) is to somehow index the contents of all the extracted DDL and Java files (to speed up the search), step through the database object list, search through all the files (using the index) to see if those objects are referenced anywhere, and produce a report.
If you could point me in the right direction to find something that indexes all those files in a way that I can search them (preferably in Perl) I would greatly appreciate it.
The key here is to be able to do this programmatically, not manually (using something like Google Desktop Search).
Break the task down into its steps and start at the beginning. First, what does a record look like, and what information in it connects it to another record? Parse that record, store its unique identifier and a list of the things it references.
Once you have that list, invert it. For each reference, create a list of the objects referenced. Count them by their identifier. You should be able to get the ones whose count is zero.
That's a very general answer, but you asked a very general question. If you are having trouble, break it down into just one of those steps and ask a more specific question, supplying sample data and the code you've tried so far.
Good luck,
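To make the steps concrete, here is an illustrative sketch of the invert-and-count step (shown in JavaScript/Node.js rather than Perl, but the logic translates directly; the file names and the whole-word matching are assumptions):

var fs = require('fs');

// Hypothetical inputs: one database object name per line, plus the
// extracted DDL and Java sources to scan.
var objects = fs.readFileSync('objects.txt', 'utf8').trim().split('\n');
var sources = ['schema.sql', 'Main.java'];

// Count how often each object is referenced across all source files.
var counts = {};
objects.forEach(function (name) { counts[name] = 0; });
sources.forEach(function (file) {
  var text = fs.readFileSync(file, 'utf8');
  objects.forEach(function (name) {
    // Crude whole-word match; a real index (e.g. KinoSearch, as
    // suggested below) scales much better over many files.
    var matches = text.match(new RegExp('\\b' + name + '\\b', 'g'));
    counts[name] += matches ? matches.length : 0;
  });
});

// Objects never referenced anywhere are candidates for cleanup.
objects.forEach(function (name) {
  if (counts[name] === 0) console.log(name);
});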
An interesting module you might use to do what you want is KinoSearch; it provides the kind of indexing you said you were looking for. Then you can go through the object identifiers and check whether there are references to them.
What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be multikey indexes, but I will frequently need to get all the unique keys. I don't have access to MongoDB's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) you can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
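Equivalently, in any reasonably recent shell or driver you can issue the same command through runCommand:

result = db.runCommand({"distinct" : "collection_name", "key" : "tags"})

The distinct values come back in result.values.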
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!