MD5 hash of a file in ABAP

I want to generate an MD5 hash of a text file in ABAP. I have not found any standard solution for generating one for a very big file. The function module CALCULATE_HASH_FOR_CHAR does not meet my requirements because it takes a string as an input parameter. Although it works for smaller files, for a 4 GB file, for example, one cannot construct such a big string.
Does anybody know whether there is a standard piece of code for doing this (my Google searches turned up nothing), or does someone perhaps have an MD5 algorithm in ABAP that calculates the hash of a file?

It looks like implementing this algorithm directly in ABAP is impossible, because the language does not allow arithmetic overflows during calculations. That would also answer the question of why it has not been implemented in the SAP system so far. Either way, it seemed there was no option other than calling an external tool, which is, regrettably, hardly platform independent.
EDIT: OK! With the great help of René and the code of the Fast MD5 Implementation in Java, I created an implementation of the MD5 algorithm in ABAP. This implementation allows the calculated hash to be updated with more bytes, which may of course come from different sources.
There is no method that takes a file yet, but most of the work has been done anyway.
Some simple ABAP Unit tests are included in the code, which also document how to use it.

Perhaps you could read the file in data blocks of a couple of megabytes, build a hash list of those blocks using the suggested function, and then compute a single top hash over the generated hash list.
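To illustrate the idea outside ABAP, here is a minimal sketch in Scala (chosen only for brevity) built on java.security.MessageDigest; the file name and the 4 MB block size are placeholders. Note that the resulting top hash is an MD5 over the block hashes, not the plain MD5 of the whole file, so whoever verifies it must compute it the same way:

import java.io.FileInputStream
import java.security.MessageDigest

// Hash each block separately, then hash the concatenated block
// digests to obtain a single top hash (a two-level hash list).
def topHash(path: String, blockSize: Int = 4 * 1024 * 1024): String = {
  val in  = new FileInputStream(path)
  val top = MessageDigest.getInstance("MD5")
  val buf = new Array[Byte](blockSize)
  try {
    var n = in.read(buf)
    while (n > 0) {
      val block = MessageDigest.getInstance("MD5")
      block.update(buf, 0, n)
      top.update(block.digest()) // feed the block digest into the top hash
      n = in.read(buf)
    }
  } finally in.close()
  top.digest().map("%02x".format(_)).mkString
}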

The SDN is usually a very good starting point for finding ABAP-related solutions. I was able to find this post: http://scn.sap.com/thread/1483479
The author suggests:
Upload the .txt file BUT as BIN.
Calculate the hash code using function MD5_CALCULATE_HASH_FOR_RAW
Are you able to get your file in binary format and use MD5_CALCULATE_HASH_FOR_RAW?
Edit: This post even has a more detailed answer using CALCULATE_HASH_FOR_RAW: http://scn.sap.com/thread/1298723
Quote of Shivanand Kalagi's answer:
STR_LEN = XSTRLEN( DATA ).

CALL FUNCTION 'CALCULATE_HASH_FOR_RAW'
  EXPORTING
    ALG    = 'MD5'
    DATA   = DATA
    LENGTH = STR_LEN
  IMPORTING
    HASH   = L_MD5_HASH.

Related

How to protect files that will be read/written in a deployed application

I am building a Matlab application to be deployed as a compiled executable file.
This application will need to read/write files in a library.
These files contain data, and I want to protect them from being read by whoever uses this application. Without any protection, these files would be saved as .mat files and could simply be loaded into the Matlab workspace.
I've searched for encryption solutions. I found some people suggesting AES, but that method seems to have an intrinsic problem with safely storing the encryption key (though I didn't understand exactly why).
Given that I simply want to prevent the user of the application from having access to those data files, what would be the best approach? If AES is actually a good solution, is it safe to write the encryption key into the code to be compiled?
It sounds like what you're looking for is functional encryption.
In functional encryption, a user holding the master secret key msk can generate a function key skf corresponding to a function f; then, anyone having a ciphertext Enc(x) and a function key skf can compute f(x), but learns nothing else about the input x.
Note that Enc(x) is the encrypted data and f(x) is some function of the unencrypted data.
Source: https://eprint.iacr.org/2013/229.pdf
Unfortunately, even cutting-edge implementations of functional encryption are still impractically slow and not easily generalized to a MATLAB program.
When you compile an application, the MATLAB code files are encrypted; but not, as you've discovered, any extra files that you include.
If the data is not too large, consider saving it within a .m file rather than a .mat file. In other words, write a simple MATLAB function that returns your data, hard-coded within the file. As this is now a code file, it will be encrypted as part of the compilation process.
You can even use the built-in function matlab.io.saveVariablesToScript to auto-generate this file for you.

Reading & writing text in Scala, getting the encoding right?

I'm reading and writing some text files in Scala. As a complete beginner in the language, I want to make sure I find the right way to do it, e.g. get the encoding right.
Most of the material I found (also on SO) recommends using io.Source.fromFile. However, after trying it out like so, reading a UTF-8 file:
val user_list = Source.fromFile("usernames.txt").getLines.toList
val user_list = Source.fromFile("usernames.txt", enc="UTF8").getLines.toList
I looked at the docs but was left with some questions.
Get the encoding right:
the docs show that I can set an encoding in Source.fromFile, as I tried above. Looking at the documentation on Codec and the types listed there, I was wondering whether those are all my codec options - is there, e.g., no UTF-16, big-endian vs. little-endian, etc.?
I am slightly obsessed with this, since it used to trip me up in Python a lot. Is this less of a concern with Scala for some reason?
Get the reading in right:
All the examples I looked at use the getLines method and post-process the result with mkString or toList, etc. Is there any advantage to that over just reading the entire file (my files are small) in one go?
Get the writing out right:
Every source I could find tells me that Scala has no file-writing function and that I should use Java's FileWriter. I was surprised by this - is this still accurate?
Looking at it, I feel this question might be a little broad for SO, so I'd be happy to take it back if it does not meet the requirements. At this point I'm not struggling with specific examples; rather, I'm trying to set things up in a way that won't get me into trouble later.
Thanks!
Scala only has a basic IO API in the standard library. For the most part you just use the Java APIs. The fact that a decent API already exists in Java is probably why the Scala team has not prioritized a robust and fully featured IO API of its own.
There are also third-party Scala libraries you could use. I've never used Better Files, but I've heard good things about it as a Scala file API. There is also fs2, which provides functional, streaming IO. I'm sure there are others out there as well.
As for encoding, there are many encodings available; it's just that only a couple of the most common ones are exposed as static fields. The rest you typically access through Codec("Encoding Name"). Most APIs will also let you pass a String directly instead of requiring a Codec instance first. The Codec is really just a wrapper over java.nio.charset.Charset. You can run java.nio.charset.Charset.availableCharsets() to see all of the encodings available on your system.
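For example (a minimal sketch; the file name is hypothetical, and any name that Charset recognizes can be passed to Codec):

import scala.io.{Codec, Source}

// Read a file with an explicit encoding; Codec just wraps java.nio.charset.Charset.
val utf16Text = Source.fromFile("usernames.txt")(Codec("UTF-16")).mkString

// Print every charset this JVM supports, including UTF-16BE, UTF-16LE, etc.
println(java.nio.charset.Charset.availableCharsets().keySet)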
As far as reading goes, if the files are small you can load them fully into memory, if you prefer that. The only reason not to do so is to avoid the extra memory use of loading the entire file at once when reading line by line is enough. You may also want to use Vector instead of List for efficiency reasons (Vector is better in many cases and should probably be preferred as a default collection, but old habits die hard and most people and guides seem to default to List; that, however, is a whole other topic).
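Putting the reading and writing together, a small sketch (file names hypothetical; the writing side goes through the Java NIO API, as discussed above):

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.io.{Codec, Source}

// Read the whole file at once (fine for small files)...
val whole = Source.fromFile("usernames.txt")(Codec.UTF8).mkString
// ...or line by line into an immutable Vector.
val lines = Source.fromFile("usernames.txt")(Codec.UTF8).getLines().toVector

// The standard library has no writer, so use Java's, with an explicit charset.
Files.write(Paths.get("out.txt"), whole.getBytes(StandardCharsets.UTF_8))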

Running OPTICS algorithm on ELKI

I'm normally an R user (a beginning R user, but I'm starting to get the hang of it). However, I have heard positive things about ELKI--in particular, its speed. I came across this old post "How to group nearby latitude and longitude locations stored in SQL" and the answer posted by Anony-Mousse is similar to what I'd like to do. I would like to be able to replicate each step he has done up to the KML file he has shared on Google Drive.
I've downloaded ELKI and am able to run the MiniGUI.
Could someone post some steps on how to do what Anony-Mousse was able to do?
My data is very similar in nature. I have geocoded addresses in a csv file (more specifically, each tuple is an event and one of the variables/features/columns is the geocoded address of the event) and I'm looking to find clusters much like the OP in the link above.
Hopefully, Anony-Mousse will read this post and come to the rescue. But, I'd be grateful if anyone else could help get me on my way.
Sorry about not following up earlier.
I did not keep the code from the experiments you refer to. So I don't remember whether I used a Python script to rewrite the output to KML (I believe I did), or whether I just copied & pasted from the ELKI source into a custom ResultHandler to generate the file.
Probably the former, because writing XML in Java is a bit more complicated than just printing the document in Python (although the result is also more likely to be correct XML). If so, I probably used the scipy.spatial package to compute the convex hulls; reading the ELKI text output is fairly trivial (just skip the comment lines and take the two numeric columns of the other lines as coordinates).
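A sketch of that parsing step (written here in Scala for illustration; the cluster file name is hypothetical, and the assumption that the coordinates are the first two numeric columns is mine):

import scala.io.Source
import scala.util.Try

// ELKI's text writer marks comment lines with '#'.
val coords = Source.fromFile("cluster_0.txt").getLines()
  .filterNot(_.startsWith("#"))
  .flatMap { line =>
    // Keep the first two whitespace-separated tokens that parse as numbers.
    val nums = line.trim.split("\\s+").flatMap(t => Try(t.toDouble).toOption)
    if (nums.length >= 2) Some((nums(0), nums(1))) else None
  }
  .toVector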

What was the original reason for MATLAB's one function = one file and why is it still so?

What was the original reason for MATLAB's one (primary) function = one file, and why is it still so, after so many years of development?
What are the advantages of this approach, compared to its disadvantages (people put too many things in functions and scripts, when they should obviously be separated ... resulting in loss of code clarity)?
Matlab's scheme of loading one class/function per file seems to match Java's choice in this matter. I am betting that there were also technical reasons, such as speeding up the parser, when it was introduced in the 1980s. Java chose this scheme to discourage extremely large files with everything stuffed inside, which has been the primary argument for every language I've seen using one-file-per-class semantics.
However, forcing one-class-per-file semantics doesn't stop mega files -- KPIB is a perfect example of a complicated, horrifically long function/class file (though quite a useful mega file). So the one-class-per-file system is more a way of trying to make the user aware of code abstraction than a functionally useful mechanism.
A positive result of Matlab's one-function/class-per-file system is that it's very easy to know what functions are available at a quick glance of a project directory. Additionally, many of the names have to be made descriptive enough to differentiate them from other files, so naming serves as a minor form of documentation as a side effect.
In the end I don't think there are strong arguments for or against one-file classes, as it's usually just a minor semantic change to go from one to the other (unless your code is in a horribly unorganized state... in which case you should be shamed into fixing it).
EDIT!
I fixed the bad reference to Matlab adopting Java's one-class-per-file system -- after more research, it appears that both developers adopted this style independently (or at least didn't state that the other language influenced their decision). This seems especially likely since Matlab didn't bundle Java until 2000.
I don't think there is any advantage. But you can put as many functions as you need in a single file.
For example:
classdef UTILS
    methods (Static)
        function help
            % prints help for all functions
            disp(char(methods(mfilename, '-full')));
        end
        function func_01()
        end
        function func_02()
        end
        % ...more functions
    end
end
I find it very neat.
>> UTILS.help
obj UTILS
Static func_01
Static func_02
Static help
>> UTILS.func_01()

Testing strategies: generating an XML file

I'm writing a couple of classes that generate an XML file. (The details are probably not important at the moment.)
I'm wondering what the best testing strategy is.
I don't want to re-implement the XML generation code just to compare the output, when I could write the file to disk and compare it at certain milestones (the XML spec won't change often - maybe once or twice every couple of years).
I'm more interested in testing the behaviour of the architecture than the getters & setters.
Options that come to mind:
rebuilding the XML file in the testing environment and comparing the string representations
manually checking the result (writing to file, etc.)
rebuilding the XML file in memory in the testing environment and comparing the in-memory elements.
Virtual bonus if you know of any libraries for C++ and/or Google Test.
Ideas?
Have you considered using XSDs and validating your XML against the XSD? You didn't mention whether it is content or structure you are testing for (probably both).
If it validates, you have confirmed that the XML conforms to the required structure.
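The asker is after C++ tools, but as an illustration of the approach, this is roughly what schema validation looks like with the standard JAXP validation API, shown here from Scala (file names are placeholders):

import java.io.File
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory

// Validate generated.xml against schema.xsd; validate() throws if the
// document does not conform to the schema.
val factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
val schema  = factory.newSchema(new StreamSource(new File("schema.xsd")))
schema.newValidator().validate(new StreamSource(new File("generated.xml")))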
In the past I've approached this in two ways:
Compare the XML file against an expected result stored as a string in the test file. This is easy to implement, and unless you want to generate variations of the XML file for testing purposes, the string comparison method works fine.
In the case where you have an XML file writer and reader, you can compare the original with the round-trip result.
I agree that you shouldn't replicate the logic that generates the file in the test function just for the purpose of testing. I would also try to avoid the need to write to the file system: it is an unnecessary dependency on the file system and would probably result in slower-running tests.
You might consider using XML Unit: http://xmlunit.sourceforge.net. It provides JUnit extension classes which can be used to assert equality of XML files.
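For instance, a minimal sketch against the XMLUnit 1.x API (called here from Scala; in a JUnit test you would normally use its XMLAssert helpers instead):

import org.custommonkey.xmlunit.{Diff, XMLUnit}

// Treat insignificant whitespace as equal, then diff the two documents.
XMLUnit.setIgnoreWhitespace(true)
val diff = new Diff("<root><child/></root>", "<root> <child/> </root>")
assert(diff.similar(), "generated XML differs from expected: " + diff)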
You might consider an XML diff tool. There is a free one available on MSDN: XML Diff and Patch Tool.
I see you are looking for C++ tools. In that case, libxmldiff might be more suitable.