I have a large number of XML-based data files with complex contents. Currently I validate the contents at every use, and that is slow. I started thinking I could have a utility that validates the XML, then gets an MD5 hash of the file and saves it to the file metadata. Then, at use, I can compare the saved hash with the current hash and only validate those files that differ.
At least, I can do a performance comparison and see if that will actually be any faster.
That said, I am not finding any way to add a custom Hash property to the file metadata. And I wonder if there is a better way to do this?
For some other XML files I am using code signing, but those are program resource XML files that I provide. These other XML files are modified by the customer for use, so I can't use code signing.
I could also include a text file that lists the XML files and their associated hashes, but storing the hash in the file's metadata seems a more elegant solution. It just seems like Windows is less than forthcoming with custom metadata options, at least for local files. Of course there are all sorts of metadata options when files are on SharePoint, AWS S3, etc. And indeed, I need to be able to hash files, save that hash as metadata on the file, and have it survive a round trip through a cloud repository too, since that is the solution I am looking at for the Work From Home problem: a company would create and validate their XML files, then upload them to an S3 bucket, and code on the user machine would download and use them.
Am I on the right track, or is this a dead end? And if it is a dead end, might a self-signed certificate solve the issue? Create a certificate, share the public key with users, then sign your XML with it. That feels... not ideal.
I determined that this approach was indeed a dead end, because I can't ensure that files will always be hosted on an NTFS-formatted drive. Especially in smaller firms a NAS is a common location, and with Work From Home becoming a thing, so is a local external FAT32-formatted drive.
The solution is to prevalidate the XML, get a hash of the XML as a string, and then add that hash to the root node as an attribute. The XML load code can then pass the loaded XML to a method that compares the value of that hash to a rehash of the same XML as a string, with the attribute removed. Net result: a universally applicable way to verify whether the XML has changed since prevalidation, which was really the goal.
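For illustration, here is a minimal sketch of the idea in Python (the attribute name and function names are invented for this example, not taken from my actual code):

    import hashlib
    import xml.etree.ElementTree as ET

    HASH_ATTR = "prevalidatedHash"  # assumed attribute name

    def hash_without_attr(root):
        # Re-serialize a copy with the hash attribute removed, so that
        # stamping and verification always hash the same bytes.
        clone = ET.fromstring(ET.tostring(root))
        clone.attrib.pop(HASH_ATTR, None)
        return hashlib.md5(ET.tostring(clone)).hexdigest()

    def stamp(path):
        # Run after the XML has passed validation: store the hash on the root.
        tree = ET.parse(path)
        root = tree.getroot()
        root.set(HASH_ATTR, hash_without_attr(root))
        tree.write(path, encoding="utf-8", xml_declaration=True)

    def is_prevalidated(path):
        # True if the stored hash matches a rehash of the XML minus the attribute.
        root = ET.parse(path).getroot()
        stored = root.get(HASH_ATTR)
        return stored is not None and stored == hash_without_attr(root)

Since the hash travels inside the document itself, the scheme works anywhere the file can live: NTFS, FAT32, a NAS, or an S3 bucket. And since the goal is detecting change since prevalidation rather than defending against attackers, MD5 is sufficient; substitute SHA-256 if tampering ever becomes a concern.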
When I try to store a 200MB+ XML file to MarkLogic using REST, it gives the following error: "XDMP-FRAGTOOLARGE: Fragment of /testdata/upload/submit.xml too large for in-memory storage".
I have tried the Fragment Roots and Fragment Parents options but still get the same error.
But when I store the file without the '.xml' extension in the URI, it saves the file, but no XQuery operations can be performed on it.
MarkLogic won't be able to derive the MIME type from a URI without an extension, so it falls back to storing the document as binary.
If you use xdmp:document-load from QConsole, you might be able to load it correctly, as that does not try to hold the entire document in memory first. It won't help you much, though; you will likely hit the same error elsewhere. The REST API has to pass the document through memory, so it won't work like this.
You could raise the memory settings in the Admin UI, but you are generally better off splitting your input. MarkLogic Content Pump (MLCP) will allow you to do so using the aggregate_record options: it splits the file into smaller pieces based on a particular element and stores these as separate documents inside MarkLogic.
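For reference, an MLCP import of that kind has roughly this shape. This is a sketch only: the host, credentials, and the repeating element name ('record') are placeholders you would replace with your own values.

    mlcp.sh import -host localhost -port 8000 \
        -username admin -password admin \
        -input_file_path /testdata/upload/submit.xml \
        -input_file_type aggregates \
        -aggregate_record_element record \
        -output_uri_prefix /testdata/upload/

Each occurrence of the record element becomes its own document under the given URI prefix, so no single fragment has to fit in memory at once.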
HTH!
I would like to set up the configuration of each level of my iPhone game through a plist or some kind of flat file. One drawback of doing this is that users can potentially open up the app and change the flat file. I am now thinking of hard-coding it as an instance of, say, a Config class. Is that a good idea? What is the conventional approach for saving/loading/configuring levels?
The approach I've often used is twofold. First, write the data out as data rather than text, simply to make it a bit less obvious what it is; if you're using a plist, you can serialize it as an NSData element. Second, create a hash (SHA-1, etc.) of the data, salted and/or concatenated with some value internal to your program, and store the hash either alongside the data or somewhere else. Then when the data is read back in, you can validate that it hasn't been tampered with.
You could obfuscate the data in various ways, or actually encrypt it before storing it. Encryption has export ramifications, however.
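As a concept sketch, the salted-hash idea looks like this (Python for brevity, since the idea is platform-neutral; on iOS you'd compute the digest with CommonCrypto, and the salt constant here is an invented placeholder):

    import hashlib

    APP_SALT = b"some-value-compiled-into-your-binary"  # placeholder secret

    def protect(payload):
        # Prepend the salted digest so it travels alongside the data.
        digest = hashlib.sha1(APP_SALT + payload).hexdigest().encode()
        return digest + b"\n" + payload

    def load(blob):
        # Recompute and compare; a mismatch means the file was edited.
        digest, payload = blob.split(b"\n", 1)
        if hashlib.sha1(APP_SALT + payload).hexdigest().encode() != digest:
            raise ValueError("config data was tampered with")
        return payload

A determined user can still dig the salt out of the binary, so treat this as a speed bump rather than real security.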
I'm trying to loop through PDF files in a directory and send them to a SharePoint document library. When I send them, I would like to add the customer, invoice, etc. to the list as well. Anyone have recommendations?
Sure, this can be done fairly easily. Here's the article I've used in the past for reference:
http://blogs.technet.com/b/heyscriptingguy/archive/2010/09/23/use-powershell-cmdlets-to-manage-sharepoint-document-libraries.aspx
Setting metadata should be pretty easy as well, but PowerShell can't guess what a customer, invoice, etc. is, so you'll have to have some data source. If the filename contains the data, you could split it. If the data is in the file itself, there are some methods of getting plaintext strings out of a PDF, but that's going to be a bit harder than the first part of your request.
Let me know if I can help further with any specifics.
I've been checking out Facebook code lately, and all of their images and files have names made up of seemingly random letters and numbers, like "FSEB6oLTK3I.png", "cWd6w4ZgtPx.png", and "GsNJNwuI-UM.gif". What do these names mean? Are they using some sort of naming system (if so, what is it?), or are the names just random?
They are generated completely randomly, and probably for good reason: if the names were predictable, you could see someone's upload just by knowing their name or ID.
After generating a file name, they store the image on disk and store the file name in the database. Again, this is done purely for security reasons.
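Whatever Facebook actually uses is not public, but names with that shape are trivial to generate; here's a hypothetical example in Python (the 8-byte length is just a guess to match the 11-character examples above):

    import secrets

    # 8 random bytes -> 11 URL-safe base64 characters, e.g. "cWd6w4ZgtPx"
    name = secrets.token_urlsafe(8) + ".png"

Any scheme like this, plus a database mapping names to uploads, would behave the way you're seeing.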
I think the names are generated completely at random. If that's not the case, one would need a lot more data regarding the images/files and their uploaders, not to mention additional data about... well, anything that might be relevant for an upload.
I think it is just random. They probably have a database that holds all the random filenames.
I want to ship some default data with my app. This data can be stored perfectly in a property list. The structure is simple:
Root
    0
        animalType = cat
        animalName = Tom
    1
        animalType = dog
        animalName = Rambo
I thought: if I use a property list rather than hard-coding it somewhere, I can easily provide more defaults to choose from after the app is already distributed. If I hard-coded it, I would have to ship a full update every time, and Apple would take weeks to approve it.
But there's one thing I don't get. I make that property list manually in Xcode and put it in my Resources group. As far as I know, Xcode compiles it into some binary format. In my app I would use NSPropertyListSerialization to create an NSDictionary out of it. But this property list would not get placed into the documents directory of the sandbox, right? So if the application downloaded an update of that property list some time in the future, the update would have to go into the documents directory, but the app would still look at the old plist in the bundle root, right? Where do I put it? Must I copy it to documents, just like an SQLite database?
And the other thing: when I edit the plist and provide the whole thing as XML for download/update from a server, then of course it would not be "compiled" into some binary format. How would that work? Would NSPropertyListSerialization have trouble reading it? Must I compile the thing every time with Xcode and let the app download some binary blob?
There are two commonly used property list formats: a proprietary binary format and an XML format (with a DTD). You can use either of them, and NSPropertyListSerialization will automatically detect which one is used for your data when deserializing.
The XML format is more verbose, but it's simple to generate. If you're publishing data from a server, you might consider generating an XML plist and compressing it with gzip or something.
Now to your first question, about where to store the data: to keep the application payload smaller, you might first check the documents directory for an updated plist, and if it is not present, load the default plist from your application bundle.
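The lookup-with-fallback pattern is only a few lines; here's a sketch (Python's plistlib stands in for NSPropertyListSerialization and likewise auto-detects XML vs. binary; the file name is made up):

    import os
    import plistlib

    def load_defaults(documents_dir, bundle_dir):
        # Prefer an updated plist in the writable documents directory,
        # falling back to the default shipped inside the app bundle.
        for base in (documents_dir, bundle_dir):
            path = os.path.join(base, "Animals.plist")  # invented file name
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return plistlib.load(f)  # detects XML or binary automatically
        raise FileNotFoundError("no default data found")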
One general approach is to always copy plists or other updatable elements into the application's documents directory; then you always load from there and simply replace the file when there is an update.
Or you could pre-load the data into a database, download plist updates and refresh the database entries at that time.