Why Fuzz images? - png

I am reading about fuzzing and have some basic questions. I searched but couldn't find a good explanation.
Why are image files popular and common targets for fuzzing? What is the benefit of using image files?
Why are PNG files popular and common for fuzzing?
Why is libpng popular and common for fuzzing?
Is fuzzing PNG images with libpng the best starting point for beginners? Why?
If someone can answer, it would be very helpful for me.
Thank you in advance.

You don't fuzz image files themselves; you fuzz the software that parses them. Developers typically don't write their own code to parse images but use third-party libraries like libpng. As a developer you only need to fuzz the code of your own project, not third-party libraries; as a security engineer you can fuzz the libraries too.
It is easy to set up fuzzing for such an open-source library: build it statically with instrumentation, create a small harness application that calls into it, and fuzz that with an easy-to-set-up fuzzer like AFL. This, plus the fact that such libraries are widely used (so errors in them can affect a lot of applications), makes them a good target for fuzzing.
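For illustration, here is a minimal sketch of such a harness (the file name and build commands are just examples; it assumes libpng's simplified API, available since libpng 1.6). You would compile it with an instrumenting compiler such as afl-clang-fast and then point afl-fuzz at a directory of small seed PNGs.

/* fuzz_png.c - a minimal harness sketch: decode the PNG file given on the
 * command line with libpng's simplified API (libpng >= 1.6). Build it with
 * an instrumenting compiler (e.g. afl-clang-fast) and run it under afl-fuzz. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <png.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.png\n", argv[0]);
        return 1;
    }

    png_image image;                    /* simplified-API control structure */
    memset(&image, 0, sizeof image);
    image.version = PNG_IMAGE_VERSION;

    if (png_image_begin_read_from_file(&image, argv[1])) {
        image.format = PNG_FORMAT_RGBA; /* ask libpng to convert to RGBA */
        png_bytep buffer = malloc(PNG_IMAGE_SIZE(image));
        if (buffer != NULL)
            png_image_finish_read(&image, NULL, buffer, 0, NULL);
        free(buffer);
    }
    png_image_free(&image);             /* releases any error state */
    return 0;
}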
But image files are not the only files that are widely used and have popular libraries to handle them. Most fuzzers are unaware of the input structure the tested binary expects. They mostly use mutation techniques at the bit/byte level: change the value of some bit or byte of the input, feed it to the tested application and watch its behaviour. When the input is highly structured, such a fuzzer fails to test deep into the code. For example, to test a browser by feeding HTML files to it, a fuzzer would have to create inputs with correct lexical and syntactic structure. The code for lexical and syntax handling is typically autogenerated from a language grammar, and by changing bits/bytes in HTML you most likely produce bad keywords that this autogenerated code rejects, so you mostly test that code and never get deeper. Image files are typically not that highly structured and are easier to fuzz deeply, so they can be fuzzed with better coverage.
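To make "mutation at the bit/byte level" concrete, here is a toy sketch of the simplest possible mutator: flip a few randomly chosen bits of a seed buffer and hand the result to the target. (Real fuzzers like AFL use many more mutation strategies; this is only an illustration.)

/* mutate.c - a toy bit-flip mutator, only to illustrate the idea. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Flip `count` randomly chosen bits in buf[0..len). */
static void flip_random_bits(unsigned char *buf, size_t len, int count)
{
    for (int i = 0; i < count; i++) {
        size_t byte = (size_t)rand() % len;   /* pick a byte ...         */
        int bit = rand() % 8;                 /* ... and a bit inside it */
        buf[byte] ^= (unsigned char)(1u << bit);
    }
}

int main(void)
{
    unsigned char seed[] = "seed input taken from a small valid file";
    srand((unsigned)time(NULL));
    flip_random_bits(seed, sizeof seed - 1, 3);
    fwrite(seed, 1, sizeof seed - 1, stdout); /* feed this to the target */
    return 0;
}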
It is also faster to fuzz a small input than a big one - fewer bits to change. And it is easier to create a small image file (just take a small image as a seed) than, for example, a small HTML file.
I don't know whether PNG files are more popular for fuzzing than other binary media formats, but their structure can include multiple headers/chunks of different types, which results in more distinct handling paths in the code and thus makes errors more likely.
As I said, libpng is open source, widely used, easy to set up and fast to fuzz - it is much faster to run a small harness application than, for example, a whole browser.
I'm not sure there can be a 'best' criterion, but it is easy and therefore good for beginners.

Related

Can I extract data from a .dat file the way the software does, and how?

I want to know if I can extract data from a .dat file by imitating what I do inside the software.
Normally I load the .dat file in the software through the "Load Data" option; then I am interested in obtaining 2 files, generated through the "Params to ASCII" and "Data to ASCII" options inside the software. As you can see, I obtain 2 ASCII files, which are easily read with a text editor.
The concern is that I do it all manually, and there are a lot of .dat files, so I spend a lot of hours just clicking.
So I want to know if there is some way to automate those operations, in whatever form. With my limited knowledge I am thinking of scripts that imitate what I do manually (I don't know how to do that), or something more complex that involves reverse engineering (I also don't know how to do that, or whether it's possible). Or maybe using PowerShell...
Maybe you guys could help me, surely you have more brilliant minds!
Kind regards!
There are at least four options that I can think of. Sadly, .dat is not a well-defined file format like .pdf, but a general extension used for all kinds of data files. Do you know the name of the software you open the files with? That would help in finding a solution. Anyway, some general ideas; recommending one of them, or being more practical, would require knowing the software.
Use the application vendor's API or libraries to read the file. Vendors often provide a .NET library for reading the file from disk or via an API call. This would be the clean and supported way. For example, to read dBase database files, there's a library on GitHub.
Read the file as raw binary (as explained in the article linked by Abraham Zinala). I'd rather not try this first, as it requires some reverse engineering and may produce unexpected errors. A small sketch of how you might start inspecting the raw bytes follows after this list.
Use UI automation. That is, create a script that uses SendKeys to simulate pressing keyboard keys. There are tools such as AutoIT that make this easier. This is kind of a last resort, as it is error prone and cumbersome. If the software supports macros or has internal automation capability, try that before 3rd party tools.
The system sending you the .dat files offers the data in some other, easier-to-process format. Whilst this is the easiest solution for you, the other party might not agree.
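For the raw-binary option, a first step is usually just to look at the first bytes for a recognizable magic number or readable header. A minimal, hedged sketch (the file name is only an example, and nothing here assumes anything about the actual .dat layout):

/* peek_dat.c - dump the first bytes of a file as hex and ASCII so you can
 * look for a magic number or readable header. Purely exploratory. */
#include <stdio.h>
#include <ctype.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.dat\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned char buf[256];
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    for (size_t i = 0; i < n; i += 16) {
        printf("%06zx  ", i);
        for (size_t j = i; j < i + 16 && j < n; j++)
            printf("%02x ", buf[j]);
        printf(" |");
        for (size_t j = i; j < i + 16 && j < n; j++)
            putchar(isprint(buf[j]) ? buf[j] : '.');
        printf("|\n");
    }
    return 0;
}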

Suitability of CodeNameOne for image processing application

We need to build an image processing application for smartphones (mainly for iPhone). The operations consist of:
image filtering
composite geometrical transformation
region growing
Also, the user is required to specify (touch) some parts of the image. Those parts would serve as inputs for the app. Example: eyes and lip in a face.
We have built a desktop version of this app. The processing part is quite heavy, and we extensively used the BufferedImage class.
Should we use CodeNameOne for building this app? If not then what alternatives do you suggest?
Please consider the following factors:
Performance
Ease of writing the code (for image processing)
I gave an answer for this in our discussion forum, but I think it's a worthwhile question for a duplicate post:
Generally for most platforms you should be fine in terms of performance except for iOS & arguably Windows Phone.
Codename One is optimized for common use cases. Since iOS doesn't allow JITs, it can never be as fast as Java on the desktop, because some optimizations, e.g. array bounds check elimination, are really hard to do. So every access to an array will contain a branch check, which can be pretty expensive for image processing.
Add to that the fact that we don't have any image processing APIs other than basic ARGB, and you get the "picture": it just won't be efficient or easy.
The problem is that this is a very specific field; I highly doubt you will find any solution that will help you with this sort of code. So your only approach, AFAIK, is to write native code to do the actual image processing heavy lifting.
You can do this with Codename One by using the NativeInterface API, which allows you to invoke critical code natively and use cn1libs to wrap it as libraries. You would then get native performance for that portion of the code, but this only makes sense for critical sections. If you write a lot of native code, the benefits of Codename One start to dissipate and you might as well go fully native.

Common Libraries at a Company

I've noticed that pretty much every company I've worked at has a common library that is shared across a number of projects. More often than not this is a single companyx-commons project that ends up as a dumping ground for common code, including:
Command Line Parsers
File Utilities
Framework Helpers
etc...
Some of these are well thought out and some duplicate functionality found in Apache commons-lang, commons-io, etc.
What are the things you have in your common library and more importantly how do you structure the common libraries to make them easy to improve and incorporate across other projects?
In my experience, the single biggest factor in the success of a common library is user buy-in (the users in this case being other developers), and the culture of your workplace/team(s) will play a big part.
Separate libraries (projects/assemblies if you're in .Net) for different application tiers is essential (e.g: there's obviously no point putting UI and data access code together).
Keep things as simple as possible; what you don't put in a common library is often at least as important as what you do. Users of the library won't want to have to think, so usage needs to be super easy.
The golden rule we stuck to was keeping individual functions focused on a single task - do one thing and do it well (or very very well); don't try and provide something that tries to take every possibility into account, the more reusable you think you're making it - the less likely it is to be used. Code Complete (the book) has some excellent content on common libraries.
A good approach to setting up or improving a library is to do regular code reviews and retrospectives; find good candidates that you've already come up with and consider refactoring them into a library for future projects. A good candidate is something that more than one developer has had to write on more than one project, for example.
Set up some sort of simple and clear governance of the libraries - someone who can 'own' a specific library and ensure its overall quality (such as a senior dev or team lead).
I have so far written most of the common libraries we use at our office.
We have certain button classes that are just slightly more useful to us than the standard buttons
A database management class that does some internal caching and can connect to ODBC, OLEDB, SQL, and Access databases without even the flip of a parameter
Some grid and list controls that are multithreaded, so we can add large amounts of data to them without the program slowing down and without having to write all the multithreading code every time there is a performance issue with a list box/combo box.
These classes make it easier for all of us to work on each other's code and to know exactly how it works, since we all use the exact same interfaces throughout our products.
As far as organization goes, all of the DLLs are stored along with their source code on a shared development drive in the office that we all have access to. (We're a pretty small shop.)
We split our libraries by function.
Common.Ui.dll has base classes for UI elements.
Common.Data.dll is sort of a wrapper around the Enterprise Library data access classes.
Common.Business is a dumping ground for other common classes that don't fit into one of those.
We create other specialized dlls as needs arise.

Writing my own file versioning program

There is what seems to be a plethora of version control systems. Therefore, to draw a bad conclusion, it must be easy to write one.
What are some issues that must be considered in order to write a simple file versioning system? (What are the minimum necessary functions?)
Is it a feasible task for one person?
A good place to learn about version control is Eric Sink's Weblog. His most recent article is Time and Space Tradeoffs in Version Control Storage, for one example.
Another good example is his series of articles Source Control HOWTO. Yes, it's all about how to use source control, but it has a lot of information about the decisions and tradeoffs developers have to make when designing the system. The best example of this is probably his article on Repositories, where he explains different methods of storing versions. I really learned a lot from this series.
How simple?
You could arguably write a version control system with a single-line shell script, upversion.sh:
cp -r "$WORKING_COPY" "$REPO/$(date +%s)"
For large binary assets, that is basically all you need! It could be improved quite easily, say by making the version folders read-only, or by recording metadata with each version (you could have a text file at $REPO/$(date...).meta, for example).
That sounds like a huge simplification, but it's not far off the asset-management systems many film post-production facilities use (for example).
You really need to know what you wish to version, and why.
With large binary assets (video, say), you need to focus on tools to visually compare versions. You also probably need to deal with dependencies ("I need image123.jpg and video321.avi to generate this image").
With code, you need to focus on things like making diffs between any two versions really easy. Also, since edits to source code are usually small (a few characters in a project with many thousands of lines), it would be horribly inefficient to copy the entire project for each version - so you only store the differences between versions (delta encoding).
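To make "store only the differences" concrete, here is a deliberately naive sketch of a delta: keep the common prefix and suffix of the old and new version and store only the replaced middle. Real systems (RCS, Git) use proper diff algorithms; this only illustrates the idea.

/* toy_delta.c - a deliberately naive delta: common prefix length, common
 * suffix length, and the replacement bytes in between. Real VCSs use proper
 * diff algorithms; this only illustrates the idea of delta encoding. */
#include <stdio.h>
#include <string.h>

typedef struct {
    size_t prefix;        /* bytes unchanged at the start                    */
    size_t suffix;        /* bytes unchanged at the end                      */
    const char *middle;   /* what replaces the old middle (points into newv) */
    size_t middle_len;
} Delta;

static Delta make_delta(const char *oldv, const char *newv)
{
    size_t old_len = strlen(oldv), new_len = strlen(newv);
    size_t p = 0, s = 0;

    while (p < old_len && p < new_len && oldv[p] == newv[p])
        p++;
    while (s < old_len - p && s < new_len - p &&
           oldv[old_len - 1 - s] == newv[new_len - 1 - s])
        s++;

    Delta d = { p, s, newv + p, new_len - p - s };
    return d;
}

int main(void)
{
    const char *v1 = "int main() { return 0; }";
    const char *v2 = "int main() { return 42; }";

    Delta d = make_delta(v1, v2);
    printf("keep first %zu bytes, insert \"%.*s\", keep last %zu bytes\n",
           d.prefix, (int)d.middle_len, d.middle, d.suffix);
    return 0;
}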
To version a database, you probably want to store information about the schema: tracking new tables, new columns, and adjustments to existing ones (rather than calculating deltas of the database files or making copies, like the previous two systems).
There's no perfect way to version everything; you have to focus on doing one thing well. Git is great for text but not for binary files. Adobe Version Cue is great with binary files (images) but useless for text.
I suppose the things to consider can be summarised as:
What do you want to version?
Why can I not use (or extend/modify) an existing system?
How will I track differences between versions? (entire files? deltas?)
What other data do I need to attach to versions? (Author? Timestamp? Dependencies?)
What tasks would a user commonly need to do? (Diffing? Reverting specific files?)
Have a look at the question about the core concepts of a (D)VCS.
In short, writing a VCS involves making a decision about each of these core concepts (central vs. distributed, linear vs. DAG, file-centric vs. repository-centric, ...).
Not a "quick" project, I believe ;)
If you're Linus Torvalds, you can write something like Git in a month.
But "a version control system" is such a vague and stretchable concept, that your question is really unanswerable.
I'd consider asking yourself what you want to achieve (learn about VCS, learn a language, ...) and then define some clear goal. It's good to have a project, but it's also good to have a reachable goal in a small amount of time. Small successes are good for your morale.
That IS really a bad conclusion. My personal opinion here is that the problem domain is so wide and generally hard that nobody has gotten it "right" yet, so people try to solve it over and over again, from different angles and under different assumptions. That of course doesn't mean you shouldn't try. Just be warned that many smart people were there before you, so you should do your homework.
What could give you a good overview in a less technical manner is The Git Parable.
It is a nice abstraction of the principles behind Git, but it gives a very good understanding of what a VCS should be able to do. Everything beyond that is a rather "low-level" decision.
A good delta algorithm, good compression and network efficiency.
A simple one is doable by one person as a learning opportunity. One issue you might consider is how to efficiently store plain-text deltas. A very popular delta format is the one from RCS (used by many version control programs). You might want to study it to get ideas.
To write a proof of concept, you could probably pull it off by implementing or borrowing the tools Alan mentions.
IMHO, the most important aspect of a VCS is ease of use. This sounds like an odd statement, but when you think about it, hard drive space is one of the easiest IT commodities to scale horizontally, so bad compression or even really sloppy deltas will be tolerated. The main reason people demand improvement in versioning systems is to do common tasks more intuitively or to support more features that droves of people eventually demand but that weren't obvious before release. And since versioning tools tend to be monolithic and thoroughly integrated at a company, the cost of switching is high, and it may not be possible to support a new feature without breaking an existing repo.
The very minimal necessary prerequisite is an exhaustive and accurate test suite. Nobody (including you) will want to use your new system unless you can demonstrate that it works, reliably and completely error free.

How to publish a game?

I don't just mean publish, but pretty much everything between when the pure coding is finished and the first version is released. For example: how do games make their save files hidden/unhackable, how do they include their resources within the game rather than having a resource file containing all of the sprites, how do they end up with special file extensions like .rect and .screen_mode, and so on and so forth.
So does anyone know any good books, articles, websites, etc. that explain the process between completing the pure code for a game and the release of it?
I don't think developers make much of an effort to ensure saves are hidden or unhackable. PC games usually just save out to a folder, one file per save, and any obfuscation is likely the result of using a binary file format (which requires some level of effort to reverse-engineer) or plaintext values that aren't very meaningful out of context, not deliberate attempts to prevent hacking. There are probably a ton of PC games that have shipped with very easily hackable text or XML save files, but I've never been a save hacker, so I don't have any specific examples. On consoles the save files go to a memory card or the console's hard drive, which makes them inherently inconvenient to access, but beyond that I don't think console developers make much of an effort to encrypt or otherwise obfuscate save data. That energy would more likely be directed towards securing the game against cheating if it's an online game, or just making other systems work better.
Special file extensions come from just using your own extensions and/or defining your own file formats. You can use any extension for any file, so there are tons of "special" file formats that are just text files with a different extension; I've done this plenty of times myself. In other cases, where developers have defined their own binary file format, it means they also have their own file parsers to process those files at runtime.
I don't know what platforms you have in mind, but for PC and console games, resources are not embedded in the executable. You will generally see a separate executable and then various archives and configuration files. Depending on the game, it may be a single resource pack, or perhaps a handful of packs for related resources like graphics, sound, level data, etc. As a general observation console games are more aggressively archived (to minimize file operations on slow optical media, and perhaps to overcome limitations of the native file systems on more primitive platforms). Some PC games have very loose assets, with even script files hanging out in the open.
If you develop for Windows or XBox 360, Microsoft might offer some help here. Check out their Game Development tools for Visual Studio C++ Express Edition.
If you are looking for books the Game Development Essentials series should answer your questions.
To deter modification of saved files, you can implement a simple encryption algorithm, encrypt the save files when writing them, and decrypt them when loading. File extensions are simply a matter of choice.
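A minimal sketch of that idea, using a repeating XOR key (the key and the save-file contents here are made up for illustration; this only deters casual hex editing and is not real security):

/* savefile.c - toy save-file obfuscation with a repeating XOR key.
 * This only deters casual editing; it is not cryptographically secure. */
#include <stdio.h>

static const unsigned char KEY[] = "example-key";   /* illustrative key */

/* XOR is its own inverse, so the same routine encrypts and decrypts. */
static void xor_buffer(unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= KEY[i % (sizeof KEY - 1)];
}

int main(void)
{
    unsigned char save[] = "gold=250;level=3;hp=87";  /* example save data */
    size_t len = sizeof save - 1;

    xor_buffer(save, len);                 /* encrypt before writing     */
    FILE *f = fopen("game.sav", "wb");
    if (!f) { perror("fopen"); return 1; }
    fwrite(save, 1, len, f);
    fclose(f);

    xor_buffer(save, len);                 /* decrypt after reading back */
    printf("%.*s\n", (int)len, save);
    return 0;
}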
To use special file extensions in your game, just do the following:
Create some files in a format of your choice that have that extension, and then
write some code that knows how to read that format, and point it at those files (a sketch of such a reader follows below).
File extensions are conventions, nothing more; there's nothing magic about them.
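For example, here is a hedged sketch of a reader for a made-up ".rect" format (the layout, magic string and field names are invented for illustration; it also assumes a little-endian host for brevity):

/* read_rect.c - sketch of a reader for an invented ".rect" format:
 * 4-byte magic "RECT", a uint32 record count, then that many records of
 * four int32 values (x, y, width, height). Little-endian host assumed. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.rect\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    char magic[4];
    uint32_t count;
    if (fread(magic, 1, 4, f) != 4 || memcmp(magic, "RECT", 4) != 0 ||
        fread(&count, sizeof count, 1, f) != 1) {
        fprintf(stderr, "not a .rect file\n");
        fclose(f);
        return 1;
    }

    for (uint32_t i = 0; i < count; i++) {
        int32_t r[4];                       /* x, y, width, height */
        if (fread(r, sizeof r[0], 4, f) != 4)
            break;
        printf("rect %u: x=%d y=%d w=%d h=%d\n", i, r[0], r[1], r[2], r[3]);
    }
    fclose(f);
    return 0;
}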
ETA: As for embedding resources, there are a few different ways to approach that problem. One common technique is to keep all your resources bundled together in a small number of files - maybe only one (Guild Wars takes that approach).
At the other extreme, you can leave your resources spread across many files in a directory tree, maybe in a custom format that requires special tools to modify, and maybe not. Civilization 4 does things this way, as do all the Turbine games I'm familiar with. This is a matter of taste, and not very important either way.
I think a better solution is to break your images into tiles of some known size and then join them back to back in some random order in a new file. This random order is known only to you, so only you know how to rearrange the tiles to get the original image back.
The approach would be to maintain a one-dimensional array that holds the position of each tile. Then use the image-cropping functions of MIDP to extract each tile and render it back to the screen.
If you need, I can post the code for you.
I would suggest checking out the presentation from the developers of World of Goo (a great game):
http://2dboy.com/public/eyawtkagibwata.pdf.