Performance in MongoDB and GridFS

I am developing a plugin that uses MongoDB. The plugin has to store some .dcm files (DICOM files) in the database as binary files. After that, the plugin has to store the metadata of each file and be able to run queries on that metadata only.
Naturally, I chose GridFS for this problem, because I can use the same file to store the binary data in the chunks collection and the metadata in the metadata field of the files collection (and bypass MongoDB's document size limit).
But another problem arises. This solution would be great if I could store the binary data and the metadata at the same time, but I cannot. Let me explain: first I store the binary file, and after that I retrieve the file, read the metadata from it, and store the metadata back into the same file. This ordering is imposed on me for external reasons. So I lose a lot of time retrieving the file and storing it again. To update the metadata of a file that is already stored, I am using this code:
GridFSDBFile file = saveFs.findOne(uri.getFileName());
if (file == null) {
    return false;
} else {
    file.setMetaData(new BasicDBObject());
    file.save();
    return true;
}
The main problem is that I have to find the file before modifying it, and then store it AGAIN!
So my first question is: is there a better way to retrieve a file from the database than findOne(String fileName)? Is findOne(ObjectId id) faster? (I don't think so, because I believe fileName is already indexed by default, isn't it?)
I have tried another way to do it. To bypass this problem, I decided to store 2 different files: one for the binary data and one for the metadata. In this case, I don't lose time retrieving the file from the database, but I end up with twice as many files... I am almost sure there is a better way to do it!
So my second question: do you think I should use 2 different collections, one using GridFS to store the binary data and another using classic MongoDB storage (or GridFS) to store only the metadata?
Thank you a lot for reading me and for your answers :).

For your first question: both the _id and filename fields are indexed by default. While _id is unique, filename is not, so if you have several files with the same filename, fetching a file by filename will be relatively slower than fetching it by _id.
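For illustration, here is a fetch by the unique _id index with the legacy Java driver, reusing the saveFs handle from the question; the ObjectId literal is only a placeholder, in practice you would keep the id returned by the initial insert:

import com.mongodb.gridfs.GridFSDBFile;
import org.bson.types.ObjectId;

// Fetch by the unique _id index instead of the non-unique filename index.
// The ObjectId literal below is a placeholder value.
GridFSDBFile file = saveFs.findOne(new ObjectId("507f1f77bcf86cd799439011"));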
For your second question: you can always attach metadata to any GridFS file you insert, which means you don't need anything beyond GridFS. Use GridFS to insert the data, but just before inserting, assign your metadata to the file. That way you can query files using the metadata. If the metadata fields are fixed across all documents, you can index those fields too, and they are queryable, of course.
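A minimal sketch of that approach with the legacy Java driver (the method shape and parameter names here are mine; adapt them to your setup): attach the metadata to the GridFSInputFile before calling save(), so the file and its metadata land in a single insert:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSInputFile;
import java.io.InputStream;

// Sketch: one round trip instead of insert + findOne + save.
public class GridFsInsertWithMetadata {
    public static void store(DB db, String filename, InputStream dicomStream,
                             BasicDBObject dicomMetadata) {
        GridFS gridFs = new GridFS(db);                     // default "fs" bucket
        GridFSInputFile file = gridFs.createFile(dicomStream, filename);
        file.setMetaData(dicomMetadata);                    // set before save()
        file.save();                                        // single insert
    }
}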

Related

Creating a query to get certain parts of a file in GridFS

In my Spring Boot application I use GridFS to store large files in my database. To find certain files, I use normal queries on the files collection:
GridFSFile file = gridFsTemplate.findOne(Query.query(Criteria.where(ID).is(id)));
but with this approach I'm getting the entire file.
My question is: how can I write queries that don't load the whole file into memory?
My stored files are books (in PDF format), and suppose I want to get the content of a certain page without loading the entire book into memory.
I'm guessing I'll have to use the chunks collection and perform some operations on the chunks, but I cannot find how to do that.
GridFS is described here. Drivers do not provide a standardized API for retrieving parts of a file, but you can read that spec and construct your own queries to retrieve portions of the stored chunks.
Your particular driver may provide partial file retrieval functionality; consult its docs for that.
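As a hedged sketch of that do-it-yourself approach (Java, assuming the default "fs" bucket; the field names files_id, n, data and chunkSize come from the GridFS spec), here is one way to read an arbitrary byte range by fetching only the chunks that overlap it. Note that a PDF page is not a fixed byte range, so this gives you raw bytes, not page-level structure; mapping pages to byte offsets would need a PDF-aware index you build yourself:

import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.types.Binary;
import org.bson.types.ObjectId;
import static com.mongodb.client.model.Filters.*;

// Sketch: read a byte range straight from fs.chunks, without
// streaming the whole file through the GridFS download API.
public class ChunkRangeReader {
    public static byte[] readRange(MongoDatabase db, ObjectId fileId,
                                   long start, int length) {
        // chunkSize is recorded per file in fs.files
        Document fileDoc = db.getCollection("fs.files")
                .find(eq("_id", fileId)).first();
        int chunkSize = fileDoc.getInteger("chunkSize");

        int firstChunk = (int) (start / chunkSize);
        int lastChunk = (int) ((start + length - 1) / chunkSize);

        byte[] out = new byte[length];
        int written = 0;
        for (Document chunk : db.getCollection("fs.chunks")
                .find(and(eq("files_id", fileId),
                          gte("n", firstChunk), lte("n", lastChunk)))
                .sort(new Document("n", 1))) {
            byte[] data = chunk.get("data", Binary.class).getData();
            long chunkStart = (long) chunk.getInteger("n") * chunkSize;
            int from = (int) Math.max(0, start - chunkStart);
            int copy = Math.min(data.length - from, length - written);
            System.arraycopy(data, from, out, written, copy);
            written += copy;
        }
        return out; // caller should check 'written' if the file may be shorter
    }
}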

IBM iSeries: full details of journal entries (type R)

I need to analyze the journal entries of type R for DB2 on iSeries in order to audit all SQL requests (INSERT, UPDATE, DELETE) that change data. In particular, I would like to analyze the ENTRY_DATA field as returned by QSYS2.Display_Journal in order to dissect the before/after images of changed rows.
I can't find the appropriate IBM documentation or web URL providing full details on those entries. Can somebody point me to such details?
Starting point for journal info is here: Journal entry information
Note that while Display_Journal() is nice, it may not be all that useful for your purposes, as it returns the before and after images of the record as a BLOB. Each ENTRY_DATA format is unique to the file being journaled. Plus, there isn't, for instance, a built-in way to convert a substring of the BLOB back into a readable packed-decimal value.
The Journal APIs would probably be a better choice.
But a generic audit tool that uses the journals is a non-trivial task.
Best choice would be to simply buy a third party tool designed to do what you're trying to do.
Extract the before/after image from the journal:
Simply copy the JOESD field to a flat file, then copy the flat file to the database file with FMTOPT(*NOCHK).
This code gets the after image.
DSPJRN JRN(mylib/myJRN)
       OUTPUT(*OUTFILE)
       OUTFILFMT(*TYPE3)
       OUTFILE(QTEMP/Z1)
       ENTDTALEN(*CALC)

INSERT INTO myflatfil
    SELECT joesd FROM qtemp/z1 WHERE joentt = 'UP'

CPYF FROMFILE(MYFLATFIL) TOFILE(MYDATABASE) FMTOPT(*NOCHK)
Export Journal Entries V4.9
The EXPJRNE command exports journal entries of files, data areas and data queues to an output file. The output file has the same layout as the journaled file plus the journaling information. EXPJRNE makes it really easy to analyze journal entries with SQL.
EXPJRNE

How to get LibreOffice's document binary?

I'm just starting to develop extensions for the LibreOffice suite, and I'd like to get the binary of the currently active document. In fact, I'd like to do something similar to an AJAX request where I'd send this document. Any idea?
As ngulam stated, the document proper is XML.
The raw file on disk is stored in a ZIP container. You can find the URL of this file from the document and then access the ZIP container directly. I do not believe it is possible, however, to see the document as a binary blob (or even the XML as stored in the ZIP container) using the API and accessing what has been loaded into memory.
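A minimal sketch of that URL-based route in Java (requires the LibreOffice SDK classes on the classpath; the class and method names are mine): get the document's URL from its XModel and read the saved ZIP container from disk. This assumes the document has already been saved; an unsaved document reports an empty URL:

import com.sun.star.frame.XModel;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: read the on-disk ODF ZIP container of a loaded document as bytes.
public class DocumentBytes {
    public static byte[] read(XModel model) throws Exception {
        String url = model.getURL();          // e.g. "file:///home/user/doc.odt"
        if (url == null || url.isEmpty()) {
            throw new IllegalStateException("document has not been saved yet");
        }
        Path path = Paths.get(new URI(url));  // convert the file URL to a path
        return Files.readAllBytes(path);      // the ZIP container as a blob
    }
}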
Can you clarify your question? For example, are you attempting to access binary portions, such as a graphic inserted into the document?

How to query for files in GridFS and return only the last uploaded version

I am storing files using GridFS and the C# official driver.
I am mimicking a folder structure and am storing the full directory path in the metadata (e.g. /folder1/subfolder1). I am also storing multiple versions of a file using the built-in versioning feature of MongoDB.
This allows me to query for the files in a specific folder using:
var filesQuery = Query.EQ("metadata.ParentPath", myParentPath);
var filesMongo = MongoDatabase.GridFS.Find(filesQuery);
My problem is that this query returns all the files, including the old ones.
How can I bring the version parameter into the query and return only the most recently uploaded files (as the FindOne method of the C# driver does)?
I don't know how to include it in the query ("version" doesn't work, as versioning is handled internally through the upload date, as far as I know).
Thanks!
I can't think of any way to write the query so that it returns only the newest version of each file in your ParentPath. If you were returning just a single file, you could sort by uploadDate descending and take the first one (just like the driver does), but that trick doesn't work when you are returning all the files in a directory.
You could write a map/reduce job to do this, but that's probably overkill.
You could also add another boolean (e.g. metadata.isCurrentVersion) to your metadata to flag the current version of each file. It would be up to you to clear the flag on all older versions each time you upload a newer version, but it would make it trivially easy to query for just the current versions.
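A minimal sketch of that flag idea, shown with the legacy MongoDB Java driver for illustration (the same two steps apply with the C# driver; metadata.isCurrentVersion is a made-up field name, not a GridFS built-in):

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSInputFile;
import java.io.InputStream;

// Sketch: keep exactly one "current" version per filename.
public class VersionFlag {
    public static void uploadNewVersion(DB db, String filename, InputStream in) {
        // 1. Clear the flag on every older version of this file.
        db.getCollection("fs.files").update(
                new BasicDBObject("filename", filename),
                new BasicDBObject("$set",
                        new BasicDBObject("metadata.isCurrentVersion", false)),
                false, true);   // upsert=false, multi=true

        // 2. Insert the new version with the flag set.
        GridFS gridFs = new GridFS(db);     // default "fs" bucket
        GridFSInputFile file = gridFs.createFile(in, filename);
        file.setMetaData(new BasicDBObject("isCurrentVersion", true));
        file.save();
    }
}

Querying the current versions in a folder then becomes a plain metadata query combining metadata.ParentPath and metadata.isCurrentVersion in one filter.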
As long as you don't have too many versions of each file, I think your best solution is to do that part of the filtering client-side.
You probably also want to make sure you have an index on metadata.ParentPath if there are going to be many files stored.

Fetching data from CSV file locally, iphone application

I am developing an iPhone app which will fetch data from a CSV file according to the keyword entered into a UITextField; e.g., if the user enters "london", then all entries containing that keyword should be listed. I have tried CHCSVParser but I am still not able to get any result. Can anyone tell me if this is even feasible? And if yes, please help me through the initial steps.
Thanks.
If you can, using a plist instead of CSV will be much easier and more flexible.
Maybe you can import your data into a .sqlite resource file that contains all the elements from your CSV file.
Then, for listing 15,000 elements (or a subset of them) in a table view, NSFetchedResultsController will help you. Initializing it with a fetch request lets you filter your elements based on one or more attribute names.
http://developer.apple.com/library/ios/#documentation/CoreData/Reference/NSFetchedResultsController_Class/Reference/Reference.html
Yeah, if you're going to repeatedly reference "random" elements of your CSV file, you should convert it to an SQLite DB (though that would be overkill if you're only referencing things once). For your app, as far as I can understand the description, a DB of some sort is definitely warranted.
As for reading the CSV file, there are no doubt tools available, but it's probably just as easy to use NSString's componentsSeparatedByString: to parse each line.
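To make the split-and-filter idea concrete, here is a small sketch (in Java for illustration; on iOS the same steps map to NSString's componentsSeparatedByString:). It assumes a simple CSV with no quoted fields containing commas:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: return every CSV row containing the keyword, case-insensitively.
public class CsvKeywordFilter {
    public static List<String[]> matchingRows(Path csvFile, String keyword)
            throws Exception {
        List<String[]> matches = new ArrayList<>();
        for (String line : Files.readAllLines(csvFile)) {
            if (line.toLowerCase().contains(keyword.toLowerCase())) {
                matches.add(line.split(","));   // naive field split
            }
        }
        return matches;
    }
}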