MarkLogic "XDMP-FRAGTOOLARGE" error while storing 200MB+ File using REST - rest

When I try to store a 200MB+ XML file in MarkLogic using REST, it gives the following error: "XDMP-FRAGTOOLARGE: Fragment of /testdata/upload/submit.xml too large for in-memory storage".
I have tried the Fragment Roots and Fragment Parents options but still get the same error.
But when I store the file without the '.xml' extension in the URI, it saves the file, but no XQuery operations can be performed on it.

MarkLogic won't be able to derive the MIME type from a URI without an extension. It will then fall back to storing the file as binary.
I think that if you used xdmp:document-load from Query Console, you might be able to load it correctly, as that does not try to hold the entire document in memory first. It won't help you much though; you will likely hit the same error elsewhere. The REST API has to pass the document through memory, so that won't work like this.
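For reference, loading from the server's file system would look roughly like this (a minimal sketch; the file system path is a placeholder, and the uri and format options shown are options of xdmp:document-load):

xdmp:document-load("/space/testdata/submit.xml",
  <options xmlns="xdmp:document-load">
    <uri>/testdata/upload/submit.xml</uri>
    <format>xml</format>
  </options>)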
You could raise memory settings in the Admin UI, but you are generally better off splitting your input. MarkLogic Content Pump (MLCP) will allow you to do so using the aggregate_record options. That will split the file into smaller pieces based on a particular element, and store these as separate documents inside MarkLogic, as in the sketch below.
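A minimal sketch of such an MLCP import, assuming the repeating element in your file is called record (host, port, credentials, and paths are placeholders):

mlcp.sh import -host localhost -port 8000 \
    -username admin -password admin \
    -input_file_path /space/testdata/submit.xml \
    -input_file_type aggregates \
    -aggregate_record_element record \
    -output_uri_prefix /testdata/upload/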
HTH!

Related

How to use VTK to efficiently write time-varying field data on a fixed mesh?

I am working on physics simulation research. In one of my projects I have a large fixed grid that does not vary with time. The fields on the grid, on the other hand, vary with time in the simulation. I need to use VTK to record the field data at each step for visualization (ParaView).
The method I am using is to write a separate *.vtu file to disk at each time step. This basically serves the purpose, but actually writes a lot of duplicate data (re-recording the geometry of the mesh at each step), which not only consumes more disk space, but also wastes time on encoding and parsing.
I would like to have a way to write the mesh information only once, and the rest of the time only new field data is written, while being able to guarantee the same visualization. Please let me know if VTK and Paraview provide such an interface and how to implement it.
Using .pvtu files and referring to the same .vtu as a Piece for each step should do the trick.
See this similar post on the ParaView Discourse, and the pvtu documentation.
EDIT
This seems to be a side effect of the format; it is not supported by the writer.
The correct solution is to use another file format ...
Let me provide my own research findings for reference.
As Nico said, with the combination of pvtu/vtu files, we could in theory store the geometry in a separate vtu file, referenced by a pvtu file. Setting the NumberOfPieces attribute of the pvtu file to 1 would restrict the construction to a single separate vtu file.
However, the VTK library does not expose a dedicated interface to control the writing process of vtu files. No matter how it is set, as long as the writer's input contains geometry, the writer will write that geometry to disk, and this step cannot be skipped through the exposed interface.
However, it is indeed possible to make multiple pvtu files point to the same vtu file by manually editing the Piece node in each pvtu file, and ParaView can recognize and visualize such a file group properly; a sketch follows below.
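A minimal sketch of such a hand-edited .pvtu, assuming a point field named temperature and a shared mesh file named mesh_step_0000.vtu (both names are placeholders); each step's .pvtu would carry the same Piece Source:

<?xml version="1.0"?>
<VTKFile type="PUnstructuredGrid" version="0.1" byte_order="LittleEndian">
  <PUnstructuredGrid GhostLevel="0">
    <PPoints>
      <PDataArray type="Float32" NumberOfComponents="3"/>
    </PPoints>
    <PPointData>
      <PDataArray type="Float64" Name="temperature"/>
    </PPointData>
    <Piece Source="mesh_step_0000.vtu"/>
  </PUnstructuredGrid>
</VTKFile>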
I did not proceed to try adding arrays to the unstructured grid and using pvtu output.
So, I think the conclusion is:
If you don't want to dive into VTK's library code and its XML implementation, this approach doesn't make sense.
If you are willing to write the full series of files, delete all but one of the vtu files, and then point every pvtu's Piece node at the sole surviving vtu file by editing the pvtu files, you can save a lot of disk space, but you will not shorten the write, read, and parse times.
If you implement an XML writer yourself, you can in theory achieve all the requirements, but it requires a lot of coding work.

Save file MD5 hash to file metadata

I have a large number of XML-based data files with complex contents. Currently I am validating the contents at every use, and that is slow. I started thinking I could have a utility validate the XML, then get an MD5 hash of the file and save it to the file metadata. Then, at use, I can compare the saved hash with the current hash and only validate those files that differ.
At least, I can do a performance comparison and see if that will actually be any faster.
That said, I am not finding any way to add a custom Hash property to the file metadata. And I wonder if there is a better way to do this?
For some other XML files I am using code signing, but those are program resource XML files that I provide. These other XML files are modified by the customer for use, so I can't use code signing.
I also could include a text file that lists the XML files and their associated hashes, but storing the hash in the file itself seems a more elegant solution. It just seems like Windows is less than forthcoming with custom metadata options, at least for local files. Of course there are all sorts of metadata options when files are on SharePoint, AWS S3, etc. And indeed, I need to be able to hash files, save that hash as metadata on the file, and have it survive a round trip through a cloud repository too, since that is the solution I am looking at for solving the work-from-home problem: a company would create and validate their XML files, then upload them to an S3 bucket, and code on the user machine would download and use them.
Am I on the right track, or is this a dead end? And if it is a dead end, might a self-signed certificate solve the issue? Create your certificate, share the public key with users, then sign your XML with it. That feels... not ideal.
I determined that this approach was indeed a dead end, because I can't ensure that files will always be hosted on an NTFS-formatted drive (custom file metadata rides along as NTFS alternate data streams, which other file systems drop). Especially in smaller firms a NAS is a common location, and with work from home becoming a thing, so is a local external FAT32-formatted drive.
The solution is to prevalidate the XML, get a hash of the XML as a string, and then add that hash to the root node as an attribute. The XML load code can then pass the loaded XML to a method that compares the value of that attribute to a rehash of the same XML as a string, with the attribute removed. Net result: a universally applicable way to verify whether the XML has changed since prevalidation, which was really the goal. A sketch follows below.
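A minimal sketch of that stamp/verify round trip, assuming the attribute is named validatedHash (a hypothetical name) and that both sides serialize the DOM with the same default Transformer settings, so the strings being hashed are identical:

// XmlHashStamp.java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlHashStamp {

    static final String ATTR = "validatedHash"; // hypothetical attribute name

    // Serialize the document without the stamp attribute and MD5 the string.
    static String hashOf(Document doc) throws Exception {
        doc.getDocumentElement().removeAttribute(ATTR); // hash excludes the stamp
        StringWriter sw = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        byte[] md5 = MessageDigest.getInstance("MD5")
                .digest(sw.toString().getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : md5) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Called by the prevalidation utility after the XML has been validated.
    static void stamp(Document doc) throws Exception {
        doc.getDocumentElement().setAttribute(ATTR, hashOf(doc));
    }

    // Called by the load code; true if unchanged since prevalidation.
    static boolean isUnchanged(Document doc) throws Exception {
        String stored = doc.getDocumentElement().getAttribute(ATTR);
        return !stored.isEmpty() && stored.equals(hashOf(doc));
    }
}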

Can I refresh all the Word Document metadata using DocumentFormat.OpenXml

I have an application that uses the DocumentFormat.OpenXml API to build a Word document from one or more originating documents, inserting and deleting chunks of data as it goes. In other words, the resulting document will be significantly different from the constituent documents.
I have already successfully created things like Custom Document Properties, Document Variables and Core File Properties.
But: is it possible to get the other metadata items (number of pages, words, paragraphs, etc.) refreshed, without actually having to calculate these?
Thank you to @Cindy Meister for the answer.
I was hoping that there might be some method or other in the DocumentFormat.OpenXml SDK that I could call, but it seems that is not possible: those statistics live in the Extended File Properties part (docProps/app.xml), and only Word itself recalculates them when it opens and saves the document.

How to save an image to a database using the ICEfaces 3.0.1 FileEntry component

I want to use a FileEntry component with JSF to save an image to a database (Postgres with a JPA connection). From my searching, I think I have to use an InputStream, but I don't know how to use it to convert the image to a bytea value to be saved in a column called, for example, book_cover (bytea). What code should be in the bean? Please help me. I have something like this:
Icefaces 3.0.1 FileEntry: FileEntryListener is never called
You're trying to do it all at once. Break it down step-by-step into pieces you can understand on their own.
First make sure that receiving the image and storing it to a local file works. There are lots of existing examples for this part.
Then figure out how to map bytea in your JPA implementation and store/retrieve files from disk. You didn't mention which JPA provider you're using, so it's hard to be specific about this.
Finally, connect the two up by writing directly from the stream, but only once you know both parts work correctly by themselves. At this point you will find that you can't actually send a stream directly to the database if you want to use a bytea field: you must read the whole file into memory as a byte array. A sketch of that approach follows below.
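A minimal sketch of the bytea route, assuming ICEfaces' ACE FileEntry component and a hypothetical Book entity with a book_cover column (entity, field, and bean names are placeholders; with most JPA providers a plain byte[] property maps to bytea on PostgreSQL, though Hibernate maps @Lob byte[] to an oid column instead):

// Book.java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Book {
    @Id @GeneratedValue
    private Long id;

    @Column(name = "book_cover")   // plain byte[] -> bytea on PostgreSQL
    private byte[] bookCover;

    public void setBookCover(byte[] bytes) { this.bookCover = bytes; }
}

// BookBean.java
import java.nio.file.Files;
import org.icefaces.ace.component.fileentry.FileEntry;
import org.icefaces.ace.component.fileentry.FileEntryEvent;
import org.icefaces.ace.component.fileentry.FileEntryResults;

public class BookBean {
    private Book book = new Book();

    // Wired up as <ace:fileEntry fileEntryListener="#{bookBean.uploadListener}"/>
    public void uploadListener(FileEntryEvent event) {
        FileEntry fileEntry = (FileEntry) event.getSource();
        for (FileEntryResults.FileInfo info : fileEntry.getResults().getFiles()) {
            if (info.getFile() != null) {   // upload reached the temp location
                try {
                    book.setBookCover(Files.readAllBytes(info.getFile().toPath()));
                    // then persist via your EntityManager in a transactional service
                } catch (java.io.IOException e) {
                    // surface as a FacesMessage in real code
                }
            }
        }
    }
}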
If you want to be able to do stream I/O with files in the database you can do that using PostgreSQL's support for "large objects". You're very unlikely to be able to work with this via JPA, though; you'll have to manage the files in the DB directly with JDBC using the PgJDBC extensions for large object support. You'll have to unwrap the Connection object from the connection pooler to get the underlying Connection that you can cast to PGConnection in order to access this API, roughly as sketched below.
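A minimal sketch of that large-object route (API names as in current PgJDBC; how you unwrap the raw Connection depends on your pool):

// LargeObjectDemo.java
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectDemo {
    // Streams a file into a new large object; returns its oid, which you
    // store in an ordinary table column so the object can be opened later.
    static long storeFile(Connection conn, String path) throws Exception {
        conn.setAutoCommit(false);  // the LO API only works inside a transaction
        LargeObjectManager lom =
                conn.unwrap(PGConnection.class).getLargeObjectAPI();
        long oid = lom.createLO(LargeObjectManager.READ | LargeObjectManager.WRITE);
        LargeObject lo = lom.open(oid, LargeObjectManager.WRITE);
        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) {
                lo.write(buf, 0, n);   // chunked; nothing held fully in memory
            }
        } finally {
            lo.close();
        }
        conn.commit();
        return oid;
    }
}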
If you're stuck on any individual part, feel free to post appropriately detailed and specific questions about that part; if you link back to this question, that'll help people understand the context. As it stands, this is a really broad "show me how to use technology X and Y to do Z" that would require writing (or finding) a full tutorial to answer.

In salesforce.com can you have multivalued attributes?

I am developing a Novell Identity Manager driver for Salesforce.com, and am trying to understand the Salesforce.com platform better.
I have had really good success to date. I can read pretty much arbitrary object classes out of SFDC, and create eDirectory objects for them, and what not. This is all done and working nicely. (Publisher Channel). Once I got Query events mapped out, most everything started working in the Publisher Channel.
I am now working on sending events back to SFDC (Subscriber channel) when changes occur in eDirectory.
I am using the upsert() function in the SOAP API, and with Novell Identity Manager, you basically build the SOAP doc, and can see the results as you build it. (You can do it in XSLT or you can use the various allowed tokens to build the document in DirXML Script. I am using DirXML Script which has been working well so far.).
The upshot of that comment is that I can build the SOAP document, see it, to be sure I get it right. Which is usually different than the Java/C++ approach that the sample code usually provides. Much more visual this way.
There are several things about upsert() that I do not entirely understand. I know how to blank a value, should I get that sort of event: inside the <urn:sObjects> node, add a node like this (assuming your namespaces are already declared):
<urn1:fieldsToNull>FieldName</urn1:fieldsToNull>
I know how to add a value (AttrValue) to an attribute (FieldName): add a node like:
<FieldName>AttrValue</FieldName>
All this works and is pretty straightforward; a fuller sketch is below.
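For context, a minimal sketch of a complete upsert body combining both cases, following the namespace conventions above (External_Id__c and LastName are hypothetical field names):

<urn:upsert>
  <urn:externalIDFieldName>External_Id__c</urn:externalIDFieldName>
  <urn:sObjects>
    <urn1:type>Contact</urn1:type>
    <urn1:fieldsToNull>Fax</urn1:fieldsToNull>
    <External_Id__c>12345</External_Id__c>
    <LastName>Smith</LastName>
  </urn:sObjects>
</urn:upsert>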
The question I have is: can a value in SFDC be multi-valued? In eDirectory, a change to a multi-valued attribute can happen two ways:
All values can be removed, and the new set re-added.
A single value removal can be sent as that sort of event (remove-value), or many values can be removed in one operation.
Looking at SFDC, I only ever see multi-picklist attributes, which seem to be stored in a single entry, ':' or ';' delimited. Is there another kind of multi-valued attribute managed differently in SFDC? And if so, how would one manipulate it via the SOAP API?
I still have to decide if I want to map those multi-picklists to a single string or to a multi-valued attribute of strings. The first way is easier, the second way is more useful... Hmmm... Choices...
Some references:
I have been using the page Sample SOAP messages to understand what the docs should look like.
Apex Explorer is a kicking tool for browsing the database and testing queries, much like DbVisualizer does for JDBC-connected databases. This would have been so much harder without it!
SoapUI is also required, and a lovely tool!
As far as I know there's no multi-value field other than multi-select picklists (and they map to a semicolon-separated string). Generally the platform encourages you to create a proper relationship with another (possibly new, custom) table if you need multiple values associated with your data.
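So in the upsert document a multi-select picklist travels as one field node whose value carries all selections, e.g. (Interests__c is a hypothetical custom field; sending the node replaces the whole set, and urn1:fieldsToNull clears it):

<Interests__c>Reading;Hiking;Cycling</Interests__c>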
Only other "unusual" thing I can think of is how the OwnerId field on certain objects (Case, Lead, maybe something else) can be used to point to User or Queue record. Looks weird when you are used to foreign key relationships from traditional databases. But this is not identical with what you're asking as there will be only one value at a time.
Of course you might be surprised sometimes by the values you'll see in the database depending on the viewing user's locale (stuff like the System Administrator profile becoming Systeembeheerder in Dutch). But this will still be a single value, translated on the fly just before the query results are sent back to you.
When I had to perform SOAP integration with SFDC, I always used the WSDL files and most of the time was fine with Java code generated out of them with Apache Axis. Hand-crafting the SOAP message yourself seems... wow, a bit hardcore. Are you sure you prefer visualisation of the XML over the generation of classes, exceptions and all this stuff ready for use with one of several out-of-the-box integration methods? If they ever change the WSDL, I just need to regenerate the classes from it, whereas changes to your SOAP message creation library might be painful...