How to get LibreOffice's document binary?

I'm just starting to develop extensions for the LibreOffice suite and I'd like to get the binary of the currently active document. In fact, I'd like to do something similar to an AJAX request where I'd send this document. Any idea?

As ngulam stated, the document proper is XML.
The raw file on disk is stored in a ZIP container. You can obtain the file's URL from the document and then access that ZIP container directly. I do not believe, however, that it is possible to see the document as a binary blob (or even the XML as stored in the ZIP container) through the API by accessing what has been loaded into memory.
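For example, a minimal Java UNO sketch along these lines should work, assuming xComponent already references the loaded document and that the document has been saved to disk at least once (otherwise its URL is empty):

import com.sun.star.frame.XModel;
import com.sun.star.uno.UnoRuntime;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// xComponent: the loaded document component (assumed to be in scope)
byte[] readDocumentBytes(Object xComponent) throws Exception {
    XModel xModel = UnoRuntime.queryInterface(XModel.class, xComponent);
    String url = xModel.getURL(); // e.g. "file:///home/user/report.odt"
    if (url == null || url.isEmpty()) {
        return null; // never saved: store it first (e.g. via com.sun.star.frame.XStorable)
    }
    Path path = Paths.get(new URI(url)); // convert the file:// URL to a filesystem path
    return Files.readAllBytes(path);     // the whole ODF ZIP container as bytes, ready to POST
}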
Can you clarify your question? For example, are you attempting to access binary portions, such as a graphic inserted into a document?

Related

Creating a query to get certain parts of a file in GridFS

In my Spring Boot application I used GridFS to store large files in my database. To find certain files, I use normal queries on the files collection:
GridFSFile file = gridFsTemplate.findOne(Query.query(Criteria.where(ID).is(id)));
but with this approach I'm getting the entire file.
My question is: how can I create queries that don't load the whole file into memory?
My stored files are books (in PDF format), and suppose I want to get the content of a certain page without loading the entire book into memory.
I'm guessing I'll have to use the chunks collection and perform some operations on the chunks, but I cannot find how to do that.
GridFS is described here. Drivers do not provide a standardized API for retrieving parts of a file, but you can read that spec and construct your own queries that retrieve portions of the written chunks.
Your particular driver may provide partial file retrieval functionality; consult its docs for that.
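As a rough illustration with the MongoDB Java driver, assuming the default "fs" bucket: each chunk document carries files_id, n (the 0-based chunk index), and data, so a single chunk can be fetched on its own. Note that this gives you byte ranges, not PDF pages; page boundaries do not line up with chunk boundaries, so extracting a page still needs PDF-aware logic on top:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.types.Binary;
import org.bson.types.ObjectId;
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;

byte[] readChunk(MongoDatabase db, ObjectId fileId, int chunkIndex) {
    MongoCollection<Document> chunks = db.getCollection("fs.chunks");
    // each chunk document: files_id (owning file), n (0-based index), data (BinData)
    Document chunk = chunks.find(and(eq("files_id", fileId), eq("n", chunkIndex))).first();
    if (chunk == null) {
        return null; // chunk index out of range, or no such file
    }
    return ((Binary) chunk.get("data")).getData();
}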

Document Repository REST application in Java

I have a requirement to develop a document repository that will maintain all documents related to different listed companies. Each document will be related to a company. It has to be a REST API. Documents can be in PDF, HTML, Word, or Excel format. Along with storing documents, I need to store metadata as well, like CompanyID, document format, timestamp, document language, etc.
As the number of documents will grow in the years to come, it's important that the application is scalable.
I also need to translate non-English documents and store the translated English version in some parent-child relation that is easy to retrieve.
Any insights on the approach, libraries/JARs to use, best practices, and references are welcome.
The base64-encoded content of the file could be included as part of your payload along with the file metadata.
Posting a File and Associated Data to a RESTful WebService preferably as JSON
Once the file reaches your end, you can either save it locally to disk or store the same base64-encoded content in your data store (using a BLOB/CLOB).
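A minimal sketch of such a payload class in Java; the field names (companyId, format, and so on) are illustrative, not a prescribed schema, and a JSON mapper such as Jackson would serialize it for the POST body:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Base64;

public class DocumentPayload {
    public String companyId;
    public String format;      // e.g. "pdf", "docx"
    public String language;    // e.g. "en"
    public String timestamp;
    public String content;     // base64-encoded file bytes

    public static DocumentPayload fromFile(String companyId, String path) throws Exception {
        DocumentPayload p = new DocumentPayload();
        p.companyId = companyId;
        p.format = path.substring(path.lastIndexOf('.') + 1); // naive extension-based format
        p.timestamp = Instant.now().toString();
        p.content = Base64.getEncoder().encodeToString(Files.readAllBytes(Paths.get(path)));
        return p;
    }
}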

Saving image in database

Is it good to save an image in the database as a BLOB?
Or save only the path and copy the image to a specific directory?
Which way is best (in terms of performance for both the database and the application), and why?
What are your requirements?
In the vast majority of cases saving the path will be better, simply because of the sheer size of the files compared to the rest of the data (including images can bulge the DB by gigabytes). Consider adding an indirection, e.g. save the path as a file name plus a reference to a storage resource (e.g. a storage_id referencing a row in a storages table), with the base path attached to the 'storage', as sketched below. This way you can easily move files (copy all the files, then update the storage path, rather than updating a million individual paths).
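A minimal Java sketch of that indirection; the table and column names (images, storages, base_path, file_name) are illustrative only:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

String resolveImagePath(Connection conn, long imageId) throws SQLException {
    String sql =
        "SELECT s.base_path, i.file_name " +
        "FROM images i JOIN storages s ON i.storage_id = s.id " +
        "WHERE i.id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setLong(1, imageId);
        try (ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) return null;
            // full path = storage root + file name; relocating the files
            // only requires updating the single storages.base_path value
            return rs.getString("base_path") + "/" + rs.getString("file_name");
        }
    }
}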
However, if your requirements include consistent backup/restore and/or disaster recoverability, it is often better to store images in the DB. It is not easier, nor more convenient, but it is simply going to be required. Each DB has its own way of dealing with this problem; e.g. in SQL Server you would use a FILESTREAM type, which allows remote access via the file access API. See FILESTREAM MVC: Download and Upload images from SQL Server for an example.
Also, a somewhat dated but nonetheless interesting paper on the topic: To BLOB or Not to BLOB.

Performance in MongoDB and GridFS

I am developing a plugin that uses MongoDB. The plugin has to store some .dcm files (DICOM files) in the database as binary files. After that, the plugin has to store the metadata of each file and be able to run queries on that metadata alone.
Naturally, I chose GridFS for this, because I can use the same file to store the binary data in the chunks collection and the metadata in the metadata field of the files collection (and bypass MongoDB's document size limit).
But another problem arises. This solution would be great, except that I cannot store the binary data and the metadata at the same time. Let me explain: first I store the binary file, and only after that can I retrieve the file, read the metadata from it, and store the metadata in the same GridFS file (this ordering is an obligation for me, for external reasons). So I lose a lot of time retrieving the file and storing it again. To update the metadata of a file that is already stored, I am using this code:
GridFSDBFile file = saveFs.findOne(uri.getFileName()); // first round trip: look the file up by name
if (file == null) {
    return false;
} else {
    file.setMetaData(new BasicDBObject()); // placeholder: the metadata read from the file goes here
    file.save();                           // second round trip: persist the metadata
    return true;
}
The main problem is that I have to find the file before modifying it, and then store it AGAIN!
So my first question is: is there a better way to retrieve a file from the database than findOne(String fileName)? Is findOne(ObjectId id) faster? (I don't think so, because I believe filename is already indexed by default, isn't it?)
I have tried another way around this problem: storing 2 different files, one for the binary data and one for the metadata. In this case, I don't lose time retrieving the file from the database, but I end up with twice as many files... I am almost sure a better way exists!
So my second question: do you think I should use 2 different collections? One using GridFS to store the binary data, and another using classic Mongo storage (or GridFS) to store only the metadata?
Thank you a lot for reading me and for your answer :).
For your first question: both the _id and filename fields are indexed by default. While _id is unique, filename is not, so if you have files with the same filename, getting a file by filename will be relatively slower than getting it by _id.
For your second question: you can attach metadata to any GridFS file you insert, which means you don't need anything beyond GridFS. Use GridFS to insert the data, but just before inserting, assign your metadata to the file. That way you can query files using the metadata. If the metadata fields are fixed across all documents, you can index those fields too, and they are queryable, of course.
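For example, a minimal sketch with the legacy Java driver (the same GridFS/GridFSDBFile API used in the question); the filename and metadata fields are illustrative only:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBObject;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;
import java.io.InputStream;

void insertWithMetadata(DB db, InputStream data) {
    GridFS gridFs = new GridFS(db);                  // default "fs" bucket
    GridFSInputFile file = gridFs.createFile(data);
    file.setFilename("scan-001.dcm");                // illustrative name
    DBObject meta = new BasicDBObject("patientId", "p-123")
            .append("modality", "CT");               // illustrative fields
    file.setMetaData(meta);                          // attach metadata BEFORE saving
    file.save();                                     // one write: binary + metadata together
    // later, query by metadata without touching the binary chunks:
    GridFSDBFile found = gridFs.findOne(new BasicDBObject("metadata.patientId", "p-123"));
}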

How to use MongoDB or another document database to keep video files, with options for appending to existing binary files and parallel read/write

I'm working on a video server, and I want to use a database to keep video files.
Since I only need to store simple video files with metadata, I tried to use MongoDB in Java, via its GridFS mechanism, to store the video files and their metadata.
However, there are two major features I need that I couldn't manage with MongoDB:
I want to be able to append to a previously saved video, since saving a video might be performed in chunks. I don't want to delete the binary I have so far, just append bytes at the end of an item.
I want to be able to read from a video item while it is being written. "Thread A" will update the video item, adding more and more bytes, while "Thread B" will read from the item, receiving all the bytes written by "Thread A" as soon as they are written/flushed.
I tried writing the straightforward code to do that, but it failed. It seems MongoDB doesn't allow multi-threaded access to the binary (even if only one thread is doing the writing), nor could I find a way to append to a binary file: the Java GridFS API only gives an InputStream from an already existing GridFSDBFile; I cannot get an OutputStream to write to it.
Is this possible via MongoDB, and if so how?
If not, do you know of any other DB that might allow this (preferably nothing too complex such as a full relational DB)?
Would I be better off using MongoDB to keep only the metadata of the video files, and manually handle reading and writing the binary data from the filesystem, so I can implement the above requirements on my own?
Thanks,
Al
I've used Mongo GridFS for storing media files for a messaging system we built using Mongo, so I can share what we ran into.
Before I get into this: for your use-case scenario I would recommend not using GridFS, and instead using something like Amazon S3 (which has excellent REST APIs for multipart uploads) while storing the metadata in Mongo. This is the approach we settled on in our project after first implementing it with GridFS. It's not that GridFS isn't great; it's just not well suited to chunking/appending and rewriting small portions of files. For more info, here's a quick rundown of what GridFS is and isn't good for:
http://www.mongodb.org/display/DOCS/When+to+use+GridFS
Now, if you are bent on using GridFS, you need to understand how the driver and read/write concurrency work.
In Mongo (2.2) you have one writer thread per schema/db. This means that while you are writing, other threads are essentially locked out of performing an operation. In real-life usage this is still very fast, because the lock yields after each chunk is written (256 KB), so your reader thread can get some data back. Please look at this concurrency video/presentation for more details:
http://www.10gen.com/presentations/concurrency-internals-mongodb-2-2
So if you look at my two links, question 2 is essentially answered. You should also understand a bit about how Mongo writes large data sets and how page faults give reader threads a chance to get information.
Now let's tackle your first question. The Mongo driver does not provide a way to append data to a GridFS file; writing is meant to be a fire-and-forget, atomic-type operation. However, if you understand how the data is stored in chunks and how the checksum is calculated, you can do it manually against the fs.files and fs.chunks collections, as this poster describes here:
Append data to existing gridfs file
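To make the idea concrete, here is a rough sketch with the legacy Java driver. It assumes the file's current length is an exact multiple of chunkSize (so the appended bytes form a brand-new chunk), that newData is at most chunkSize bytes, and it leaves the stored md5 stale, which a real implementation would have to recompute or drop:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import org.bson.types.ObjectId;

void appendChunk(DB db, String filename, byte[] newData) {
    DBCollection files = db.getCollection("fs.files");
    DBCollection chunks = db.getCollection("fs.chunks");

    DBObject fileDoc = files.findOne(new BasicDBObject("filename", filename));
    ObjectId fileId = (ObjectId) fileDoc.get("_id");
    long length = ((Number) fileDoc.get("length")).longValue();
    int chunkSize = ((Number) fileDoc.get("chunkSize")).intValue();
    int nextChunk = (int) (length / chunkSize); // assumes length % chunkSize == 0

    // write the new chunk document...
    chunks.insert(new BasicDBObject("files_id", fileId)
            .append("n", nextChunk)
            .append("data", newData));
    // ...and grow the recorded length so readers see the appended bytes
    files.update(new BasicDBObject("_id", fileId),
            new BasicDBObject("$inc", new BasicDBObject("length", (long) newData.length)));
}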
Going through those, you can see that it is possible to do what you want, but my general recommendation is to use a service (such as Amazon S3) that is designed for this type of interaction, instead of doing extra work to make Mongo fit your needs. Of course you can also go to the filesystem directly, which would be the poor man's choice, but then you lose the redundancy, sharding, replication, etc. that you get with GridFS or S3.
Hope that helps.
-Prasith