How can I set the SHA digest size in java.security.MessageDigest? - hash

I am kinda playing with the SHA-1 algorithm. I want to find out differences and variations in the results if I change few values in the SHA-1 algorithm for a college report. I have found a piece of java code to generate hash of a text. Its done by importing
java.security.MessageDigest
class. However, I want to change the h0-4 values and edit them but I don't know where can I find them? I had a look inside the MessageDigest class but couldn't find it there. Please help me out!
Thanx in advance.

I don't believe you can do that. Java doesn't provide any API for its MessageDigest Class, which can allow you change the values.
However, there are some workarounds (none of which I've ever tried). Take a look at this answer to the question "How to edit Java Platform Package (Built-in API) source code?"

If you're playing around with tweaks to an algorithm, you shouldn't be using a built-in class implementing that algorithm. The class you mention is designed to implement standard algorithms for people who just want to use them in production; if you're using SHA-1 (or any cryptographic algorithm) instead of playing around and tweaking it, it's never a good idea to change the algorithm yourself (e.g. by changing the initial hash value), so the class does not support modifying those constants.
Just implement the algorithm yourself; from Wikipedia's pseudocode, it doesn't look like it's all that complicated. I know that "don't implement your own crypto, use a standard and well-tested implementation" is a common mantra here, but that only applies to production-type code -- if you're playing around with an algorithm to see what effect tweaking it has, you should implement it yourself, so you have more flexibility in modifying it and seeing the effect of the modifications.

Basically adding to #Rahil's answer but too much for comments:
Even without API access, if MessageDigest were the implementation you could use reflection. But it's not.
Most of the java standard library is just commonly-useful classes in the usual way, e.g. java.util.ArrayList contains the implementation of ArrayList (or ArrayList<?> since 6), java.io.FileInputStream contains the implementation of FileInputStream (although it may use other classes in that implementation), etc. Java Cryptography uses a more complicated scheme where the implementations are not in the API classes but instead in "providers" that are mostly in their own jars (in JRE/lib and JRE/lib/ext) not rt.jar and mostly(?) don't have source in src.zip.
Thus the java.security.MessageDigest class does not have the code to implement SHA1, or SHA256, or MD5, etc etc. Instead it has code to search the JVM's current list of crypto providers to find an implementation of whatever algorithm is asked for, and instantiate and use that. Normally the list of providers used is set to (the list of) those included in the JRE distribution, although an admin or program can change it.
With the normal JRE7 providers, SHA1 is implemented by sun.security.provider.SHA.
In effect the API classes like MessageDigest Signature Cipher KeyGenerator etc function more like interfaces or facades by presenting the behavior that is common to possibly multiple underlying implementations, although in Java code terms they are actual classes and not interfaces.
This was designed back in 1990 or so to cope with legal restrictions on crypto in effect then, especially on export from the US. It allowed the base Java platform to be distributed easily because by itself it did no crypto. To use it -- and even if you don't do "real" crypto on user data in Java you still need things like verification of signed code -- you need to add some providers; you might have one set of providers, with complete and strong algorithms, used in US installations, and a different set, with fewer and weaker algorithms, used elsewhere. This capability is now much less needed since the US officially relaxed and in practice basically dropped enforcement about 2000, although there are periodically calls to bring it back. There is still one residual bit, however: JCE (in Oracle JREs) contains a policy that does not allow symmetric keys over 128 bits; to enable that you must download from the Oracle website and install an additional (tiny) file "JCE Unlimited Strength Policy".
TLDR: don't try to alter the JCE implementation. As #cpast says, in this case where you want to play with something different from the standard algorithm, do write your own code.

Related

How do I check valid method name for an Object in Pascal?

I have a class (character) with inherited classes (solider, medic etc) that have specific game related methods. E.g. Shoot or Heal.
I want it so that the user can type in Heal, for example, and the program can check what type of character they have and therefore see if that is a valid name of a method in that Object.
I know it's possible in other languages but can't see how to do it in Pascal. It must work in Free Pascal as well as Delphi. Thank you
You don't need to be able to check for the validity of a method name to do this, and it is probably preferable if you don't.
You could do check a method name's using RTTI, but that is implemented somewhat differently in FreePascal than Delphi, (in particular for extended RTTI).
However, it would be far more straightforward to implement your own look-up mechanism to resolve in-game entity-names, properties and verbs in a dictionary of some sort. That would be trivial in both FP and Delphi and independent of the compiler used. It would also allow the names used by the end-user to be independent of the names used in code, which would be easier internationalisation, etc. It would also avoid the problem which would arise if an in-game identifier contained a character not permitted in a Pascal identifier (such as a space, accented character or whatever).
PS: You didn't ask this, BUT ... if I were contemplating writing a text-game of any size, I would seriously consider doing it as a hybrid Delphi of Prolog: Delpi for the gui and Prolog as a far easier language in which to code in-game actions, objects and rules, and there is one paricular implementation, Amzi Prolog, which has a very rich interface for interfacing a Prolog engine with Delphi -see https://www.amzi.com/#apls. Amzi used to be commercial but is now PD, fwiw.

Representing classes and interfaces in a language neutral way

I need to define simple classes and interfaces (Ex. IClassInterface) in a language neutral way and then use a variety of code generation tools to generate the code files in a variety of languages such as C#, Java, etc... Does anyone know of a standard; ratified or otherwise; that I can use for the neutral representation. I know UML is often used for creating diagrams, but I am actually looking for something that can easily be parsed, extended, and used to drive other automated processes. Maybe this is actually possible with UML, although I am not sure what the markup language might look like if one exists.
I could create my own definition using XML or something similar, but I would prefer to avoid reinventing the wheel if possible.
UML
I think you might be looking for XMI (XML Metadata Interchange)
There is IDL (for example, Google's protocol buffers), and WSDL, which can be used to produce interfaces and classes by many web service frameworks. (You typically do not have to use the generated code as an actual webservice.)
The wikipedia entry for IDL lists a number of implementations of IDL. Although IDL is mainly for describing interfaces, some implementations also use it to describe objects (e.g. Microsoft IDL.)

Why use a post compiler?

I am battling to understand why a post compiler, like PostSharp, should ever be needed?
My understanding is that it just inserts code where attributed in the original code, so why doesn't the developer just do that code writing themselves?
I expect that someone will say it's easier to write since you can use attributes on methods and then not clutter them up boilerplate code, but that can be done using DI or reflection and a touch of forethought without a post compiler. I know that since I have said reflection, the performance elephant will now enter - but I do not care about the relative performance here, when the absolute performance for most scenarios is trivial (sub millisecond to millisecond).
Let's try to take an architectural point on the issue. Say you are an architect (everyone wants to be an architect ;)
You need to deliver the architecture to your team:
a selected set of libraries, architectural patterns, and design patterns. As a part of your design, you say: "we will implement caching using the following design pattern:"
string key = string.Format("[{0}].MyMethod({1},{2})", this, param1, param2 );
T value;
if ( !cache.TryGetValue( key, out value ) )
{
using ( cache.Lock(key) )
{
if (!cache.TryGetValue( key, out value ) )
{
// Do the real job here and store the value into variable 'value'.
cache.Add( key, value );
}
}
}
This is a correct way to do tracing. Developers are going to implement this pattern thousands of times, so you write a nice Word document telling how you want the pattern to be implemented. Yeah, a Word document. Do you have a better solution? I'm afraid you don't. Classic code generators won't help. Functional programming (delegates)? It works fairly well for some aspects, but not here: you need to pass method parameters to the pattern. So what's left? Describe the pattern in natural language and trust developers will implement them.
What will happen?
First, some junior developer will look at the code and tell "Hm. Two cache lookups. Kinda useless. One is enough." (that's not a joke -- ask the DNN team about this issue). And your patterns cease to be thread-safe.
As an architect, how do you ensure that the pattern is properly applied? Unit testing? Fair enough, but you will hardly detect threading issues this way. Code review? That's maybe the solution.
Now, what is you decide to change the pattern? For instance, you detect a bug in the cache component and decide to use your own? Are you going to edit thousands of methods? It's not just refactoring: what if the new component has different semantics?
What if you decide that a method is not going to be cached any more? How difficult will it be to remove caching code?
The AOP solution (whatever the framework is) has the following advantages over plain code:
It reduces the number of lines of code.
It reduces the coupling between components, therefore you don't have to change much things when you decide to change the logging component (just update the aspect), therefore it improves the capacity of your source code to cope with new requirements over time.
Because there is less code, the probability of bugs is lower for a given set of features, therefore AOP improves the quality of your code.
So if you put it all together:
Aspects reduce both development costs and maintenance costs of software.
I have a 90 min talk on this topic and you can watch it at http://vimeo.com/2116491.
Again, the architectural advantages of AOP are independent of the framework you choose. The differences between frameworks (also discussed in this video) influence principally the extent to which you can apply AOP to your code, which was not the point of this question.
Suppose you already have a class which is well-designed, well-tested etc. You want to easily add some timing on some of the methods. Yes, you could use dependency injection, create a decorator class which proxies to the original but with timing for each method - but even that class is going to be a mess of repetition...
... or you can add reflection to the mix and use a dynamic proxy of some description, which lets you write the timing code once, but requires you to get that reflection code just right -which isn't as easy as it might be, especially if generics are involved.
... or you can add an attribute to each method that you want timed, write the timing code once, and apply it as a post-compile step.
I know which seems more elegant to me - and more obvious when reading the code. It can be applied even in situations where DI isn't appropriate (and it really isn't appropriate for every single class in a system) and with no other changes elsewhere.
AOP (PostSharp) is for attaching code to all sorts of points in your application, from one location, so you don't have to place it there.
You cannot achieve what PostSharp can do with Reflection.
I personally don't see a big use for it, in a production system, as most things can be done in other, better, ways (logging, etc).
You may like to review the other threads on this matter:
Anyone with Postsharp experience in production?
Other than logging, and transaction management what are some practical applications of AOP?
Aspect Oriented Programming: What do you use PostSharp for?
etc (search)
Aspects take away all the copy & paste - code and make adding new features faster.
I hate nothing more than, for example, having to write the same piece of code over and over again. Gael has a very nice example regarding INotifyPropertyChanged on his website (www.postsharp.net).
This is exactly what AOP is for. Forget about the technical details, just implement what you are being asked for.
In the long run, I think we all should say goodbye to the way we are writing software now. It's tedious and plainly stupid to write boilerplate code and iterate manually.
The future belongs to declarative, functional style being held together by an object oriented framework - and the cross cutting concerns being handled by aspects.
I guess the only people who will not get it soon are the guys who are still payed for lines of code.

Code generation for Java JVM / .NET CLR

I am doing a compilers discipline at college and we must generate code for our invented language to any platform we want to. I think the simplest case is generating code for the Java JVM or .NET CLR. Any suggestion which one to choose, and which APIs out there can help me on this task? I already have all the semantic analysis done, just need to generate code for a given program.
Thank you
From what I know, on higher level, two VMs are actually quite similar: both are classic stack-based machines, with largely high-level operations (e.g. virtual method dispatch is an opcode). That said, CLR lets you get down to the metal if you want, as it has raw data pointers with arithmetic, raw function pointers, unions etc. It also has proper tailcalls. So, if the implementation of language needs any of the above (e.g. Scheme spec mandates tailcalls), or if it is significantly advantaged by having those features, then you would probably want to go the CLR way.
The other advantage there is that you get a stock API to emit bytecode there - System.Reflection.Emit - even though it is somewhat limited for full-fledged compiler scenarios, it is still generally enough for a simple compiler.
With JVM, two main advantages you get are better portability, and the fact that bytecode itself is arguably simpler (because of less features).
Another option that i came across what a library called run sharp that can generate the MSIL code in runtime using emit. But in a nicer more user friendly way that is more like c#. The latest version of the library can be found here.
http://code.google.com/p/runsharp/
In .NET you can use the Reflection.Emit Namespace to generate MSIL code.
See the msdn link: http://msdn.microsoft.com/en-us/library/3y322t50.aspx

Are there guidelines for updating C++Builder applications for C++Builder 2009?

I have a range of Win32 VCL applications developed with C++Builder from BCB5 onwards, and want to port them to ECB2009 or whatever it's now called.
Some of my applications use the old TNT/TMS unicode components, so I have a good mix of AnsiStrings and WideStrings throughout the code. The new version introduces UnicodeString, and a bunch of #defines that change the way functions like c_str behave.
I want to modify my code in a way that is as backwards-compatible as possible, so that the same code base can still be compiled and run (in a non-unicode fashion) on BCB2007 if necessary.
Particular areas of concern are:
Passing strings to/from Win32 API
functions
Interop with TXMLDocument
'Raw' strings used for RS232 comms, etc.
Rather than knife-and-fork the changes, I'm looking for guidelines that I can apply to ease the migration, while keeping backwards compatibility wherever possible.
If no such guidelines already exist, maybe we can formulate some here?
The biggest issue is compatibility for C++Builder 2009 and previous versions, the Unicode differences are some, but the project configuration files have changed as well. From the discussions I've been following on the CodeGear forums, there are not a whole lot of choices in the matter.
I think the first place to start, if you have not done so, is the C++Builder 2009 release notes.
The biggest thing seen has been the TCHAR mapping (to wchar or char); using the STL string varieties may be a help, since they shouldn't be very different between the two versions. The mapping existed in C++Builder 2007 as well (with the tchar header).
For any code that does not need to be explicitally Ansi or explitically Unicode, you should consider using the System::String, System::Char, and System::PChar typedefs as much as possible. That will help ease a lot of migration, and they work in previous versions.
When passing a System::String to an API function, you have to take into account the new "TCHAR maps to" setting in the Project options. If you try to pass AnsiString::c_str() when "TCHAR maps to" is set to "wchar_t", or UnicodeString::c_str() when "TCHAR maps to" is set to "char", you will have to perform appropriate typecasts. If you have "TCHAR maps to" set to "wchar_t". Technically, UnicodeString::t_str() does the same thing as TCHAR does in the API, however t_str() can be very dangerous if you misuse it (when "TCHAR maps to" is set to "char", t_str() transforms the UnicodeString's internal data to Ansi).
For "raw" strings, you can use the new RawByteString type (though I do not recommend it), or TBytes instead (which is an array of bytes - recommended). You should not be using Ansi/Wide/UnicodeString for non-character data to begin with. Most people used AnsiString as makeshift data buffers in past versions. Do not do that anymore. This is particularly important because AnsiString is now codepage-aware, and thus your data might get converted to other codepages when you least expect it.