File is too large to scan - what are the alternative options? - sophoslabs-intelix

If a file is over 16MB it cannot be scanned by static or dynamic file analysis in Sophos Intelix. Is there a suggested fallback option for scanning or threat detection of larger files?

Related

SophosLabs Intelix - 16MB restriction

According to the documentation and experiments with actual code, files larger than 16MB cannot be scanned by SophosLabs Intelix. Reference: https://api.labs.sophos.com/doc/changelog.html
Short video files, such as MP4, are commonly around 100MB in size, and such file types would presumably be handled by static analysis. What are the alternatives, given this restriction?
I attempted to scan a 50MB video and it failed.
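One hedged option is to enforce the 16MB gate client-side and divert oversized files to a fallback of your choice, such as a reputation lookup by SHA-256 hash (Intelix also offers hash-based lookups alongside static/dynamic analysis). A minimal Python sketch, with the endpoint URLs and auth headers left as caller-supplied parameters rather than asserted here; consult the Intelix documentation for the actual routes:

```python
import hashlib
import os
import requests

SIZE_LIMIT = 16 * 1024 * 1024  # the static/dynamic analysis limit described above

def sha256_of(path):
    """Stream the file so large videos don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def scan_or_lookup(path, submit_url, lookup_url, headers):
    """Submit small files for analysis; fall back to a hash lookup for large ones.

    submit_url, lookup_url and headers are placeholders to be filled in from
    the Intelix documentation; they are not asserted here.
    """
    if os.path.getsize(path) <= SIZE_LIMIT:
        with open(path, "rb") as f:
            return requests.post(submit_url, headers=headers, files={"file": f}).json()
    return requests.get(f"{lookup_url}/{sha256_of(path)}", headers=headers).json()
```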

PowerShell - How to compress images within multiple Word documents

I have a large quantity of Word documents containing images that result in very large file sizes (20-50MB). This is causing storage and speed issues. I would like a PowerShell script, to add to the scripts I already run on batches of documents, that uses Word's image compression feature (or another method) to reduce the file size of the images without affecting their size as they appear in the documents.
I found a script on Reddit that claimed to do this, but it simply did not work. Unfortunately, I am fairly inexperienced with PowerShell and this falls well outside my ability to write myself. Any help would be appreciated.
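Not the requested PowerShell, but as a sketch of one workable approach: a .docx file is a ZIP archive, so the images under word/media/ can be downscaled and recompressed directly, and because Word stores each picture's display size in the document XML, shrinking the image data does not change how large it appears on the page. A minimal Python sketch, assuming Pillow is installed and using an illustrative folder name, pixel cap, and JPEG quality:

```python
import io
import zipfile
from pathlib import Path
from PIL import Image

def compress_docx_images(src: Path, dst: Path, max_px: int = 1600, quality: int = 75) -> None:
    """Copy src to dst, downscaling/recompressing images under word/media/."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.startswith("word/media/") and \
               item.filename.lower().endswith((".png", ".jpg", ".jpeg")):
                img = Image.open(io.BytesIO(data))
                fmt = img.format                     # keep PNG as PNG, JPEG as JPEG
                img.thumbnail((max_px, max_px))      # cap pixel dimensions, never upscale
                buf = io.BytesIO()
                if fmt == "JPEG":
                    img.save(buf, "JPEG", quality=quality, optimize=True)
                else:
                    img.save(buf, fmt, optimize=True)
                if buf.tell() < len(data):           # only keep it if it actually shrank
                    data = buf.getvalue()
            zout.writestr(item, data)

if __name__ == "__main__":
    # Illustrative batch run over a "docs" folder, writing *_small.docx copies.
    for doc in Path("docs").glob("*.docx"):
        compress_docx_images(doc, doc.with_name(doc.stem + "_small.docx"))
```

Run it against copies first; the sketch only swaps in the recompressed image when it is actually smaller than the original.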

Why does the moodledata directory have this structure?

I know Moodle's internal files, such as uploaded images, are stored in the moodledata directory.
Inside, there are several directories:
moodledata/filedir/1c/01/1c01d0b6691ace075042a14416a8db98843b0856
moodledata/filedir/63/
moodledata/filedir/63/89/
moodledata/filedir/63/89/63895ece79c4a91666312d0a24db82fe3017f54d
moodledata/filedir/63/3c/
moodledata/filedir/63/37/
moodledata/filedir/63/a7/
What are these hashes?
What are the reasons behind this design, as opposed to, for example, WordPress's /year/month/file.jpg structure?
https://docs.moodle.org/dev/File_API_internals#File_storage_on_disk
Simple answer - files are stored based on the hash of their content (inspired by the way Git stores files internally).
This means that if you have the same file in multiple places (e.g. the same PDF or image in multiple courses), it is stored only once on the disk, even if the original filename is different.
On real sites this can involve a huge reduction in disk usage (obviously dependent on how much duplication there is on your site).
Moodledata files are stored according to the SHA1 hash of their content, in order to prevent duplication of content (for example, when the same file is uploaded twice under a different name).
For further explanation of how to handle such files, you can read the official documentation of the File API:
https://docs.moodle.org/dev/File_API_internals
especially the "File storage on disk" section.
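To make the layout concrete, here is a small Python sketch of how such a content-addressed path can be derived (the filedir location is illustrative): the SHA1 of the file's bytes becomes the filename, and the first two pairs of hex digits become the two directory levels, so identical content always maps to the same path and is stored only once.

```python
import hashlib
from pathlib import Path

def filedir_path(content: bytes, filedir: Path = Path("moodledata/filedir")) -> Path:
    """Return the content-addressed storage path for a blob of file content."""
    contenthash = hashlib.sha1(content).hexdigest()
    # e.g. 1c01d0b6... -> moodledata/filedir/1c/01/1c01d0b6...
    return filedir / contenthash[0:2] / contenthash[2:4] / contenthash

# Two uploads with identical bytes map to the same path, however they were named.
print(filedir_path(b"example file contents"))
```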

File I/O on NoSQL - especially HBase - is it recommended or not?

I'm new at NoSQL and now I'm trying to use HBase for file storage. I'll store files in HBase as binary.
I don't need any statistics, only file storage.
Is it recommended? I am worried about I/O speed.
The reason I am using HBase for storage is that I have to use HDFS, but I cannot install Hadoop on the client computer. Because of that, I tried to find a library that lets the client connect to HDFS to fetch files, but I could not find one, so I chose HBase instead of a connection library.
In this situation, what should I do?
I don't know about Hadoop, but MongoDB has GridFS, which is designed for distributed file storage and lets you scale horizontally, get replication for "free", and so on.
http://www.mongodb.org/display/DOCS/GridFS
There will be some overhead from storing files in chunks in MongoDB, so if your load is low to medium and you need low response times, you will probably be better off using the file system directly. Performance will also vary between different driver implementations.
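For illustration, a minimal Python sketch of the GridFS approach using pymongo (the host, database, and file names are assumptions):

```python
import gridfs
from pymongo import MongoClient

# Connect to an assumed local MongoDB instance and pick a database for file storage.
client = MongoClient("mongodb://localhost:27017")
fs = gridfs.GridFS(client["filestore"])

# put() splits the file into chunks behind the scenes and returns its ObjectId.
with open("video.mp4", "rb") as f:
    file_id = fs.put(f, filename="video.mp4")

# Read the whole file back by id.
data = fs.get(file_id).read()
```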
I think the capability to mount HDFS as a regular file system should help you. http://wiki.apache.org/hadoop/MountableHDFS
You certainly can use HBase to store files. It is perhaps not ideal, and based on your file size distribution you may want to tweak some of the settings. Compared with HDFS, it is probably a much better alternative for large numbers of files.
Settings to look out for:
Max region size: you will likely want to turn this up to 4GB.
Max cell size: you will want to set this to 0 to disable this limit.
You may also want to look at other kinds of alternatives (maybe even MapR).
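As a rough sketch of what the HBase route looks like from a client machine that cannot run Hadoop itself: the happybase library talks to the cluster through an HBase Thrift gateway, so only the gateway needs to live on the cluster side. The host, table name, column family, and chunk size below are assumptions, and the table is expected to exist already; chunking keeps each cell under the max cell size setting mentioned above.

```python
import happybase

CHUNK = 8 * 1024 * 1024  # 8MB per cell, an illustrative choice

def put_file(host: str, path: str) -> None:
    """Store a local file in HBase as a sequence of fixed-size chunks."""
    conn = happybase.Connection(host)   # talks to the HBase Thrift gateway (default port 9090)
    table = conn.table('files')         # assumes a 'files' table with column family 'f'
    with open(path, 'rb') as f:
        seq = 0
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            # Row key = file path plus zero-padded chunk index, so chunks sort back in order.
            table.put(f"{path}#{seq:08d}".encode(), {b'f:data': data})
            seq += 1
    conn.close()

put_file("hbase-thrift.example.com", "somefile.bin")
```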

How To Create File System Fragmentation?

Risk factors for file fragmentation include mostly-full disks and repeated file appends. What are the other risk factors for file fragmentation? How would one write a program, in a common language like C++/C#/VB/VB.NET, that works with files and creates new files with the goal of increasing file fragmentation?
WinXP / NTFS is the target
Edit: Would something like this be a good approach?
Let FreeMB_atStart = the hard drive's free space at the start.
Create files of, say, 10MB each until 90% of the remaining hard drive space is filled.
Delete every third created file.
Create a file of size FreeMB_atStart * 0.92 / 3.
This should achieve at least some level of fragmentation on most file systems:
Write numerous small files.
Delete some of the files at random.
Write a large file, byte by byte.
Writing it byte-by-byte is important, because otherwise if the file system is intelligent, it can just write the large file to a single contiguous place.
Another possibility would be to write several files simultaneously byte-by-byte. This would probably have more effect.
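A rough Python sketch of that recipe (the scratch directory, counts, and sizes are illustrative, and it flushes small chunks rather than literal single bytes to keep it practical):

```python
import os
import random

WORK_DIR = "frag_test"            # assumed scratch directory on the target volume
SMALL_FILES = 2000                # illustrative count; tune to the free space available
SMALL_SIZE = 1 * 1024 * 1024      # 1MB per small file

os.makedirs(WORK_DIR, exist_ok=True)

# 1. Fill space with many small files.
small = []
for i in range(SMALL_FILES):
    path = os.path.join(WORK_DIR, f"small_{i:05d}.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(SMALL_SIZE))
    small.append(path)

# 2. Delete roughly a third of them at random, leaving scattered holes.
deleted = random.sample(small, SMALL_FILES // 3)
for path in deleted:
    os.remove(path)

# 3. Grow one large file in tiny flushed appends, pushing the file system to
#    allocate it piecemeal into the freed gaps instead of one contiguous run.
with open(os.path.join(WORK_DIR, "large.bin"), "wb") as f:
    for _ in range(len(deleted) * SMALL_SIZE // 4096):
        f.write(os.urandom(4096))
        f.flush()
        os.fsync(f.fileno())
```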