SophosLabs Intelix - 16MB restriction - sophoslabs-intelix

According to the documentation and to experiments with actual code, files larger than 16MB cannot be scanned by SophosLabs Intelix. Reference: https://api.labs.sophos.com/doc/changelog.html
Short video files, such as MP4, are commonly around 100MB in size, and it looks like such file types are meant to be processed by Static Analysis. What are the alternatives, given this restriction?
I attempted to scan a 50MB video and it failed.
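One pattern that stays within the cap is to pre-check the size locally and, for anything over 16MB, fall back to a SHA-256 file-hash reputation lookup instead of uploading the whole video. Below is a rough Python sketch; the endpoint paths, the multipart field name, and the bare-token Authorization header reflect my reading of the public Intelix docs and should be verified against the current API reference.

```python
import hashlib
import os

import requests

INTELIX_BASE = "https://de.api.labs.sophos.com"  # regional base URL -- adjust for your region
MAX_UPLOAD_BYTES = 16 * 1024 * 1024              # documented upload limit

def sha256_of(path: str) -> str:
    """Hash the file in 1MB pieces so a 100MB video never sits in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()

def analyse(path: str, token: str) -> requests.Response:
    headers = {"Authorization": token}  # token obtained from the Intelix auth endpoint
    if os.path.getsize(path) <= MAX_UPLOAD_BYTES:
        # Small enough: submit the file itself for Static Analysis.
        with open(path, "rb") as fh:
            return requests.post(
                f"{INTELIX_BASE}/analysis/file/static/v1",
                headers=headers,
                files={"file": fh},
            )
    # Over the 16MB cap: ask about the file's reputation by hash instead.
    return requests.get(
        f"{INTELIX_BASE}/lookup/files/v1/{sha256_of(path)}",
        headers=headers,
    )
```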

Related

PowerShell - How to compress images within multiple Word Documents

I have a large quantity of Word documents containing images that result in very large file sizes (20-50MB). This is causing storage and speed issues. I would like a PowerShell script, to add to the scripts I already run on batches of documents, that uses Word's image compression feature (or another method) to reduce the size of the images without affecting how large they appear in the documents.
I found a script on Reddit that claimed to do this, but it simply did not work. Unfortunately, I am pretty inexperienced with PowerShell and this falls well outside my ability to write myself. Any help would be appreciated.
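This isn't the PowerShell/Word-COM route the question asks for, but one common alternative is to treat each .docx as the ZIP archive it is and re-encode the embedded JPEGs at a lower quality. A rough Python/Pillow sketch, with illustrative folder names and quality setting:

```python
import io
import zipfile
from pathlib import Path

from PIL import Image  # pip install Pillow

def shrink_docx_images(src: Path, dst: Path, jpeg_quality: int = 75) -> None:
    """Copy a .docx, re-encoding embedded JPEGs at a lower quality.

    Display size is untouched: the document XML, not the image file itself,
    controls how large each picture renders on the page.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            name = item.filename.lower()
            if item.filename.startswith("word/media/") and name.endswith((".jpg", ".jpeg")):
                buf = io.BytesIO()
                Image.open(io.BytesIO(data)).save(
                    buf, format="JPEG", quality=jpeg_quality, optimize=True
                )
                if buf.tell() < len(data):  # keep the re-encoded copy only if it is smaller
                    data = buf.getvalue()
            zout.writestr(item, data)

# Example batch run over a folder (paths are illustrative):
Path("out").mkdir(exist_ok=True)
for doc in Path("in").glob("*.docx"):
    shrink_docx_images(doc, Path("out") / doc.name)
```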

Questions after reading the API doc of upload session

I'm a bit confused after reading this doc.
The doc says:
The fragments of the file must be uploaded sequentially in order. Uploading fragments out of order will result in an error.
Does that mean that, for one file divided into #1~10 fragments in order, I can only upload fragment 2 after I finish uploading fragment 1? If so, why is it possible to have multiple nextExpectedRanges? I mean, if you upload fragments one by one, you can make sure that previous fragments have already been uploaded.
According to the doc, byte range size has to be a multiple of 320 KB. Does that imply that the total file size has to be a multiple of 320 KB also?
There are currently some limitations that necessitate this sequencing requirement; the long-term goal is to remove it. The API already reflects that by supporting multiple nextExpectedRanges, but it does not currently take advantage of them.
No, multiples of 320KiB are just the ideal fragment size. You can choose other sizes, and you can mix them. So for your scenario you could use 320KiB chunks throughout, except for the last one, which would be whatever size is left to reach the overall size of your file.
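To make that concrete, here is a rough Python sketch of sequential fragment uploads, assuming a Graph-style upload session where createUploadSession has already handed back a pre-authenticated uploadUrl; the Content-Range header format follows the Microsoft docs as I recall them and should be double-checked against the current reference.

```python
import os

import requests

CHUNK = 320 * 1024  # 320 KiB, the recommended fragment granularity

def upload_in_fragments(upload_url: str, path: str) -> requests.Response:
    """Send the file to an existing upload session, one fragment at a time, in order."""
    total = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset < total:
            data = fh.read(CHUNK)          # the last read is simply whatever bytes remain
            end = offset + len(data) - 1
            resp = requests.put(
                upload_url,
                data=data,
                headers={
                    "Content-Length": str(len(data)),
                    "Content-Range": f"bytes {offset}-{end}/{total}",
                },
            )
            resp.raise_for_status()        # the final PUT returns the completed item
            offset = end + 1
    return resp
```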

How does mmap() help read information at a specific offset versus regular POSIX I/O

I'm trying to understand something a bit better about mmap. I recently read this portion of the accepted answer to the related Stack Overflow question mmap and memory usage (quoted below):
Let's say you read a 100MB chunk of data, and according to the initial 1MB of header data, the information that you want is located at offset 75MB, so you don't need anything between 1~74.9MB! You have read it for nothing but to make your code simpler. With mmap, you will only read the data you have actually accessed (rounded 4kb, or the OS page size, which is mostly 4kb), so it would only read the first and the 75th MB.
I understand most of the benefits of mmap (no need for context switches, no need to swap contents out, etc.), but I don't quite understand this offset part. If we don't mmap and we need information at the 75MB offset, can't we do that with standard POSIX file I/O calls without having to use mmap? How exactly does mmap help here?
Of course you could. You can always open a file and read just the portions you need.
mmap() can be convenient when you don't want to write said code or you need sparse access to the contents and don't want to have to write a bunch of caching logic.
With mmap(), you're "mapping" the entire contents of the file to offsets in memory. Most implementations of mmap() do this lazily, so each ~4K block of the file is read on demand, as you access those memory locations.
All you have to do is access the data in your file as if it were a huge array of chars (i.e. int* someInt = &map[750000000]; return *someInt;), and let the OS worry about which portions of the file have been read, when to read them and how much, writing the dirty data blocks back to the file, and purging pages to free up RAM.
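To make the contrast concrete, here is a small Python sketch of the two approaches on a hypothetical big.dat; Python's os.pread and mmap are thin wrappers over the POSIX calls discussed above.

```python
import mmap
import os

OFFSET = 75 * 1024 * 1024   # the 75MB mark from the quoted example
LENGTH = 4096

# Explicit POSIX-style I/O: one positioned read, nothing else is loaded.
fd = os.open("big.dat", os.O_RDONLY)
chunk = os.pread(fd, LENGTH, OFFSET)   # read LENGTH bytes starting at OFFSET
os.close(fd)

# mmap: map the whole file, then just index into it; only the pages actually
# touched (~4KiB each) are faulted in by the kernel.
with open("big.dat", "rb") as fh:
    mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
    chunk_mm = mm[OFFSET:OFFSET + LENGTH]
    mm.close()
```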

Confused about the advantage of MongoDB gridfs

MongoDB GridFS says its big advantage is that it splits a big file into chunks, so you don't have to load the entire file into memory if you just want to see part of it. But my confusion is that even if I open a big file from the local disk, I can just use the skip() API to load only the part of the file I want. I don't have to load the entire file at all. So why does MongoDB say that is the advantage?
Even though the cursor.skip() method does not return the entire file, it still has to load it into memory: the server walks from the beginning of the collection or index to reach the offset/skip position before it can begin returning results (this doesn't matter much when the collection is small).
As the offset increases, cursor.skip() becomes slower and more CPU intensive. With larger collections, cursor.skip() may become IO bound.
GridFS, on the other hand, instead of storing a file in a single document, divides the file into parts, or chunks, and stores each chunk as a separate document.
That allows the user to access information from arbitrary sections of a file, such as "skipping" to the middle of the file (using its id or filename), without the operation being CPU intensive.
Official documentation: 1. Skip, 2. GridFS.
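A minimal pymongo/gridfs sketch of that chunked storage and mid-file seek (the connection string and file names are illustrative):

```python
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative connection string
db = client["media"]
fs = gridfs.GridFS(db)

# Store a large file; GridFS splits it into ~255KB chunk documents behind the scenes.
with open("movie.mp4", "rb") as fh:
    file_id = fs.put(fh, filename="movie.mp4")

# Later: jump straight to the middle of the file without reading the earlier chunks.
grid_out = fs.get(file_id)
grid_out.seek(50 * 1024 * 1024)       # position 50MB into the stream
middle = grid_out.read(1024 * 1024)   # only the chunk documents covering this range are fetched
```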
Update:
About what Peter Brittain is suggesting:
There are many things to consider (infrastructure, presumed usage stats, file size, etc.) when choosing between the filesystem and GridFS.
For example: if you have millions of files, GridFS tends to handle them better; you also need to consider filesystem limitations such as the maximum number of files per directory.
You might want to consider going through this article:
Why use GridFS over ordinary Filesystem Storage?

iPhone XML file parsing preference, but what is Big and what is Small?

You may know this page comparing the XML parsers for iPhone:
http://www.raywenderlich.com/553/how-to-chose-the-best-xml-parser-for-your-iphone-project
The words used to compare these parsers are:
"this one is for small XML files" "for large XML files" "for relatively small XML files"
Well, what on earth does that mean? For instance, speed is important for me and my XML is expected to be around 300KB, so is that small? Big, or relatively small? And what does a large XML file mean on the iPhone? 1MB? 50MB? 100MB? Or even 500KB?
I know there is no strict distinction, but I at least need a rough idea of what those adjectives mean; I need to parse this XML in around 1-2 seconds on the iPhone.
How should I choose one parser over another based on my file size and speed requirements?
Thanks
We have an app that downloads a configuration file from a server that is frequently in excess of 1 MB. We use GDataXml to parse it, and it's relatively fast. 1MB of XML is kind of large for an XML file, but then again I'm sure large companies like WalMart, Tyson, etc. have apps that use massive XML files (possibly 50 MB). That really is a massive amount of text data though, and JSON may be a better alternative in terms of character use. Additionally, you can read the data straight from the file and shove it in an NSDictionary that you can then query. If you have control of the file output, consider JSON.