How can I index a bunch of files in Perl? - perl

I'm trying to clean up a database by first finding unreferenced objects. I have extracted all the database objects into a list, and all the ddl code into files, I also have all the Java source code for the project.
Basically what I want to do (preferably in Perl as it's the scripting language that I'm most familiar with) is to somehow index the contents of all the extracted database ddl and Java files (to speed up the search), step through the database object list and then search through all the files (using the index) to see if those objects are referenced anywhere and create a report.
If you could point me in the right direction to find something that indexes all those files in a way that I can search them (preferably in Perl) I would greatly appreciate it.
The key here is to be able to do this programatically, not manually (using something like Google desktop search).

Break the task down into its steps and start at the beginning. First, what does a record look like, and what information in it connects it to another record? Parse that record, store its unique identifier and a list of the things it references.
Once you have that list, invert it. For each reference, create a list of the objects referenced. Count them by their identifier. You should be able to get the ones whose count is zero.
That's a very general answer, but you asked a very general question. If you are having trouble, break it down into just one of those steps and ask a more specific question, supplying sample data and the code you've tried so far.
Good luck,

An interesting module you might use to do what you want is KinoSearch, it provides you the kind of indexing you said to be looking for. Then you can go through the object identifiers and check if there are references to it.

Related

how do I duplicate a project in Anylogic

I have a very simple question. Normally in other programs, such as word, you can just simply save the document under a different resulting in two separate documents. However, this doesn't work for AnyLogic. Does anyone know how to duplicate a project?
If you do file save as it will create a new alp file for you
But for it to be a truly different model you need to change the Java Package to something unique... See how it is kept as model24 in my screenshot
But be careful it can have some unwanted consequences in a very complex model and you will need to fix these manually, but all doable

How to create and search in a language dictionary

I'm working on an Xcode project.
I need to add a dictionary of words, like English dictionary for example. Then, when required, I need to find some words in it.
I have been thinking about creating an array from a file to store all the words, then sort it in some way I don't know and at last do a binary search or something like this to find the word I'm looking for.
Do you think is a good idea?
Do you have any clue to sort the array?
Is binary search a good idea?
The best way would be to use a database (e.g. SQL-based), or CoreData to store it. That is unless you require the words to be entirely in the memory, which would be the case if you often listed all of them. But even then it can be solved by lazy-loading.

In what scenarios would I need to use the CREATEREF, DEREF and REF keywords?

This question is about why I would use the above keywords. I've found plenty of MSDN pages that explain how. I'm looking for the why.
What query would I be trying to write that means I need them? I ask because the examples I have found appear to be achievable in other ways...
To try and figure it out myself, I created a very simple entity model using the Employee and EmployeePayHistory tables from the AdventureWorks database.
One example I saw online demonstrated something similar to the following Entity SQL:
SELECT VALUE
DEREF(CREATEREF(AdventureWorksEntities3.Employee, row(h.EmployeeID))).HireDate
FROM
AdventureWorksEntities3.EmployeePayHistory as h
This seems to pull back the HireDate without having to specify a join?
Why is this better than the SQL below (that appears to do exactly the same thing)?
SELECT VALUE
h.Employee.HireDate
FROM
AdventureWorksEntities3.EmployeePayHistory as h
Looking at the above two statements, I can't work out what extra the CREATEREF, DEREF bit is adding since I appear to be able to get at what I want without them.
I'm assuming I have just not found the scenarios that demostrate the purpose. I'm assuming there are scenarios where using these keywords is either simpler or is the only way to accomplish the required result.
What I can't find is the scenarios....
Can anyone fill in the gap? I don't need entire sets of SQL. I just need a starting point to play with i.e. a brief description of a scenario or two... I can expand on that myself.
Look at this post
One of the benefits of references is that it can be thought as a ‘lightweight’ entity in which we don’t need to spend resources in creating and maintaining the full entity state/values until it is really necessary. Once you have a ref to an entity, you can dereference it by using DEREF expression or by just invoking a property of the entity
TL;DR - REF/DEREF are similar to C++ pointers. It they are references to persisted entities (not entities which have not be saved to a data source).
Why would you use such a thing?: A reference to an entity uses less memory than having the DEFEF'ed (or expanded; or filled; or instantiated) entity. This may come in handy if you have a bunch of records that have image information and image data (4GB Files stored in the database). If you didn't use a REF, and you pulled back 10 of these entities just to get the image meta-data, then you'd quickly fill up your memory.
I know, I know. It'd be easier just to pull back the metadata in your query, but then you lose the point of what REF is good for :-D

Creating a dictionary from another dictionary keeping the structure intact

I have a big dictionary with deep hierarchy in it... I want to read it and create another dictionary with same structure but with some modifications while I am reading the source dictionary.
Modifications are like if the keyName is "server" then remove that key, if the keyName is "notification" then alter its value.
What is the best way to do this keeping the structure of source dictionary intact.
Read the Deep Copies section of Collections Programming Topics. In fact, you should really read the entire document. You'll end up reading it all at some point anyway (or worse, having us repeatedly point you there), and it's only a few dozen pages.
I know this probably isn't the answer you were looking for, but the alternative is for someone here to code up a method that deep copies dictionaries for you. I'm not going to do that. If you get stuck on something specific, by all means, ask here.

MongoDB: What's a good way to get a list of all unique tags?

What's the best way to keep track of unique tags for a collection of documents millions of items large? The normal way of doing tagging seems to be indexing multikeys. I will frequently need to get all the unique keys, though. I don't have access to mongodb's new "distinct" command, either, since my driver, erlmongo, doesn't seem to implement it, yet.
Even if your driver doesn't implement distinct, you can implement it yourself. In JavaScript (sorry, I don't know Erlang, but it should translate pretty directly) can say:
result = db.$cmd.findOne({"distinct" : "collection_name", "key" : "tags"})
So, that is: you do a findOne on the "$cmd" collection of whatever database you're using. Pass it the collection name and the key you want to run distinct on.
If you ever need a command your driver doesn't provide a helper for, you can look at http://www.mongodb.org/display/DOCS/List+of+Database+Commands for a somewhat complete list of database commands.
I know this is an old question, but I had the same issue and could not find a real solution in PHP for it.
So I came up with this:
http://snipplr.com/view/59334/list-of-keys-used-in-mongodb-collection/
John, you may find it useful to use Variety, an open source tool for analyzing a collection's schema: https://github.com/jamescropcho/variety
Perhaps you could run Variety every N hours in the background, and query the newly-created varietyResults database to retrieve a listing of unique keys which begin with a given string (i.e. are descendants of a specific parent).
Let me know if you have any questions, or need additional advice.
Good luck!