Building ROM images on CP/M - intel-8080

I'm trying to use the venerable M80 and L80 tools on CP/M to build a ROM image. (It's for a CP/M emulator, hence the CP/M tools.)
Unfortunately L80 seems to be really crude --- AFAICT it just loads each object file at its absolute address, fixes it up, and then dumps everything from 0x0100 up out to disk. This means that object files that are based at addresses outside its own workspace don't appear to work at all (just producing an error message). My ROM has a base address of 0xd000, which is well outside this.
Does anyone know if it's possible to use M80 and L80 to do this, and if so, how? Alternatively can anyone recommend (and point me at!) a CP/M assembler/linker suite that will?
(Note that I'd like to avoid cross compiling, if possible.)

If you're just assembling one file, then you can use M80's .phase directive to have the assembler locate the output.
.phase 0D000h
If you want to build several source files and link them at the end, then you can still use M80, but you'll need DRI's linker LINK.COM, which can be found at http://www.cpm.z80.de/download/pli80_13.zip. The LINK command line to use would be
LINK result=module1,module2,module3[LD000
(The nearest L80 equivalent would, I think, be
L80 /P:D000,module1,module2,module3,result/N/E
but then you have to remove 0xCF00 bytes from the start of the resulting file).
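Stripping those 0xCF00 bytes (0xD000 minus the 0x0100 base) is easy to do on the emulator's host side. A minimal Perl sketch, assuming the filenames RESULT.COM/RESULT.BIN (both just examples):
# drop the first 0xCF00 bytes of L80's output to get a raw ROM image
open my $in,  '<:raw', 'RESULT.COM' or die "open RESULT.COM: $!";
open my $out, '>:raw', 'RESULT.BIN' or die "open RESULT.BIN: $!";
seek $in, 0xCF00, 0 or die "seek: $!";   # 0xD000 load address - 0x0100 base
local $/;                                # slurp mode: read the rest in one go
print {$out} scalar <$in>;
close $out or die "close: $!";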

Old question, but this may work for those who are still looking. I checked this out on my Ampro Little Board running 1980 M80/L80 on CP/M 2.2.
You can use the ASEG (absolute) directive in your starting .MAC file, specify 0D000H as the org, and then reference external modules. As long as those external modules don't include CSEG or DSEG directives, you should be able to link them all together with 0D000H as the starting address. E.g.
; TEST.MAC
ASEG
ORG 0D000H
public tstart
tstart:
...
call myfoo## ; call routine myfoo in external module foo.rel
...
end tstart
Assemble it:
M80 TEST,=TEST
Link it with foo.rel and use /X on the output to produce a .HEX file (TEST.HEX):
L80 TEST,FOO,TEST/N/X/E
If you examine the resulting .HEX file you should see the starting address is 0D000H.
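If you'd rather check that programmatically than by eye, the load address is the four hex digits after the byte count in each record. A minimal Perl sketch for the host side (TEST.HEX is just the example name):
# print the load address of the first data record in an Intel HEX file
open my $fh, '<', 'TEST.HEX' or die "open: $!";
while (<$fh>) {
    # record layout: ':' count(2) address(4) type(2) data... checksum(2)
    if (/^:[0-9A-Fa-f]{2}([0-9A-Fa-f]{4})00/) {   # type 00 = data record
        print "first data record loads at $1\n";  # expect D000 here
        last;
    }
}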
BTW: If you don't use the /X option then L80 with /N/E will make a .COM with all the code linked using an offset of 0D000H, unless you also include a .phase directive. E.g.:
; TEST.MAC
ASEG
ORG 100H
.phase 0D000H
public tstart
tstart:
...
call myfoo## ; call routine myfoo in external module foo.rel
...
end tstart
Link to make a .COM instead of a .HEX:
L80 TEST,FOO,TEST/N/E <== note no '/X'
You can't run it, but you can treat the .COM file as a .BIN padded to the nearest 128-byte boundary (assuming that your CP/M uses the typical 128-byte record size). You can confirm the result by doing a DUMP of the .COM file. If the code is very short, it may also include leftover pieces of L80 loader code that weren't overwritten by your code.
Note that you can also use the ASEG approach with ORG 0100H to make a regular CP/M .COM. In that case you don't need .phase, assuming the start of your code is at 100H.

IDA Pro string function

I have this binary file that I wish to edit, however after loading it, all strings are in some sort of gibberish symbols. Is there any way to fix that?
Why you are seeing "gibberish":
The strings are likely obfuscated. Chances are, before each of the strings is used in the program, a deobfuscation routine is run to convert the string in memory back into something meaningful. This is a common technique used to prevent static analysis tools (such as the GNU "strings" utility or IDA Pro) from properly analyzing the binary. The rest of this answer makes the assumption that this is true of your binary.
How to deobfuscate the strings (dynamic approach):
If you are able to run the binary, you can let it take care of the deobfuscation for you. All you need to do is run the binary in a debugger and analyze the memory after it has been deobfuscated.
Many binaries that obfuscate their strings never re-obfuscate them after their use, so one interesting shortcut you might want to try first is to run the binary in a debugger and break execution right before it exits. If the strings are still deobfuscated, you can do a memory dump of the appropriate section to save the deobfuscated strings. (This will not necessarily deobfuscate all of the strings for you; you'll only get the strings that were deobfuscated along the path of the binary's execution.)
If the previous method does not work for you, try setting a hardware write breakpoint on the first byte of an obfuscated string, then running the binary. If the breakpoint trips, step through the instructions to allow the rest of the string to be deobfuscated. If the deobfuscation always happens in a common routine, you can place a breakpoint near the end of that routine and possibly script your debugger to print the deobfuscated string each time execution passes through it.
Once you have a list of deobfuscated strings, you can either patch them directly into the IDA database (discussed below), or you can leave repeatable comments (use the ' key) at the addresses of each of the strings in the database, such that the deobfuscated string will display as a comment on every instruction that references it.
For small binaries, you can get away with doing the annotations by hand, but it would be worthwhile to read into scripting IDA so that you can automate this process. The IDA Pro Book contains a great reference for this.
How to deobfuscate the strings (static approach):
If you can't run the binary, or if the dynamic approach isn't deobfuscating all the strings for you, then you can deobfuscate them yourself.
Chances are good that if you view the cross-references to any of the obfuscated strings in IDA Pro (view them with the x key), you should be taken to the deobfuscation routine. If the routine isn't too complicated -- and they usually aren't -- you should be able to write a script to emulate the deobfuscation routine. This will allow you to replace the obfuscated strings with the deobfuscated strings in the IDA database.
(As a point of clarification, the IDA database is entirely separate from the binary itself. Anything you do to the database will have no effect on the actual binary, and anything you do to the binary will have no effect on the database)
Your options for scripting IDA are IDC (IDA's original built-in scripting language) and IDAPython. I highly recommend using IDAPython, as it is much easier to use, and a much more powerful language. I'm not sure if you can install IDAPython on IDA Free 5.0, but it should be bundled with all vaguely recent versions of IDA Pro.
Giving an overview of scripting IDA would be beyond the scope of this answer, but here's an example to get you started. I'm writing it in IDC in case you're using IDA Free. Let's say your deobfuscation routine simply XOR'd each successive byte with 0x1F until the null byte was decoded. Then the following loop might end up being part of your IDC script:
// *EXAMPLE*
auto addr = 0x00401000; // The address of your string
auto b;
while (1) {
    b = Byte(addr) ^ 0x1F;   // read the obfuscated byte and decode it
    PatchByte(addr, b);      // write the decoded byte into the database
    if (b == '\0') {         // the terminator decodes to 0, so stop here
        break;
    }
    addr = addr + 1;
}
Running a script can be done from File > IDC Command... or File > Script file....
As you might guess, Byte returns the byte stored at a given address, and PatchByte writes a byte to an address. Built-in functions in IDAPython share the same names with their IDC counterparts, so the IDAPython version would be nearly identical, sans the C-like syntax. As mentioned before, I highly recommend The IDA Pro Book for a walkthrough on scripting IDA. Once you have the basics down, you can use IDA's built-in help index and The IDAPython documentation as a couple other references.
Always save your database before running a script that patches code! There is no "undo" feature in IDA, so a small coding error could trash your entire database.
Good luck!

Cleanup huge Perl Codebase

I am currently working on a roughly 15-year-old web application.
It contains mainly CGI Perl scripts with HTML::Template templates.
It has over 12,000 files and roughly 260 MB of total code. I estimate that no more than 1,500 of the Perl scripts are actually needed, and I want to get rid of all the unused code.
There are practically no tests written for the code.
My questions are:
Are you aware of any CPAN module that can help me get a list of only used and required modules?
What would be your approach if you'd want to get rid of all the extra code?
I was thinking of the following approaches:
try to override the use and require Perl builtins with versions that log each loaded file name to a known location
override the warnings and/or strict modules' import functions and log the file name from there
study the Devel::Cover module, take the same approach, and analyze the code while doing manual testing instead of automated tests
replace the perl executable with a custom one that logs the name of each file it reads (I don't know how to do that yet)
some creative use of lsof (?!?)
Devel::Modlist may give you what you need, but I have never used it.
The few times I have needed to do something like this I have opted for the more brute-force approach of inspecting %INC at the end of the program.
END {
    # append so logs from many runs accumulate (the log path is an example)
    open my $log_fh, '>>', '/tmp/inc-modules.log' or die "open: $!";
    print $log_fh "$_\n" for sort keys %INC;
    close $log_fh;
}
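If the application can die or exec before END blocks run, a code reference pushed onto the front of @INC will catch each module as it is loaded instead. A sketch, assuming the module is forced in via PERL5OPT and that the log path (an example) is writable:
# LoadLogger.pm - enable with PERL5OPT=-MLoadLogger (and PERL5LIB set
# to the directory containing this file)
package LoadLogger;
BEGIN {
    unshift @INC, sub {
        my (undef, $file) = @_;
        if (open my $log, '>>', '/tmp/loaded-modules.log') {  # example path
            print {$log} "$file\n";
            close $log;
        }
        return undef;  # decline the load so the normal @INC search continues
    };
}
1;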
As a first approximation, I would simply run
egrep -r '\<(use|require)\>' /path/to/source/*
Then spend a couple of days cleaning up the output from that. That will give you a list of all of the modules used or required.
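A Perl one-liner can do the first pass of that cleanup, boiling the matches down to a sorted, unique list of module names (a sketch; reuse your own source path):
find /path/to/source -name '*.pl' -o -name '*.pm' | \
  xargs perl -lne 'print $1 if /^\s*(?:use|require)\s+([A-Za-z][\w:]*)/' | sort -u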
You might also be able to play around with @INC to exclude certain library paths.
If you're trying to determine execution path, you might be able to run the code through the debugger with 'trace' (i.e. 't' in the debugger) turned on, then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...
Assuming the relevant timestamps are turned on, you could check access times on the various script files - that should rule out any top-level script files that aren't being used.
Might be worth adding some instrumentation to CGI.pm to log the current script-name ($0) to see what's happening.
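That instrumentation doesn't have to touch CGI.pm itself: a tiny module forced into every process via PERL5OPT can log $0 at startup. A sketch, with the module name and log path both made up for illustration:
# ScriptLogger.pm - enable with: export PERL5OPT=-MScriptLogger
# (plus PERL5LIB pointing at the directory holding this file)
package ScriptLogger;
if (open my $log, '>>', '/tmp/cgi-hits.log') {  # example path
    print {$log} scalar(localtime), " $0\n";    # $0 is the running script
    close $log;
}
1;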

Is File::Spec really necessary?

I know all about the history of different OSes having different path formats, but at this point in time there seems to be a general agreement (with one sorta irrelevant holdout*) about how paths work. I find the whole File::Spec route of path management to be clunky and a useless pain.
Is it really worth having this baroque set of functions to manipulate paths? Please convince me I am being shortsighted.
* Irrelevant because even MS Windows allows forward slashes in paths, which means the only funky thing is the volume at the start and that has never really been a problem for me.
Two major systems have volumes. What's the parent of C:? In unix, it's C:/... In Windows, it's C:... (Unfortunately, most people misuse File::Spec to the point of breaking this.)
There are three different sets of path separators among the major systems. The fact that Windows supports "/" can simplify building paths, but it doesn't help in parsing them or in canonicalising them.
File::Spec also provides functions that make it worth having even if every system did use the same style of paths, such as the one that turns a path into a relative path.
That said, I never use File::Spec. I use Path::Class instead. Without sacrificing any usability or usefulness, Path::Class provides a much better interface. And it doesn't let users mishandle volumes.
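A short sketch of what that interface looks like (the paths are just examples):
use Path::Class qw(file dir);

my $log = file('/var', 'log', 'app.log');  # a Path::Class::File object
print $log->basename, "\n";                # app.log
print $log->dir, "\n";                     # /var/log

my $bin = dir('/usr', 'local')->file('bin', 'perl');
print "$bin\n";                            # stringifies to /usr/local/bin/perl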
For usual file management inside Perl, no, File::Spec is not necessary; using forward slashes everywhere causes much less pain and works on Win32 anyway.
cpanminus is a good example used by lots of people, and it has proven to work well on the Win32 platform. It doesn't use File::Spec for most file path manipulation and just uses forward slashes - that was even suggested by experienced Perl-Win32 developers.
The only place I had to use File::Spec's catfile in cpanm, though, is where I extract file paths from a Perl error message (Can't locate File\Path.pm blah blah) and create a file path to pass to the command line (i.e. cmd.exe).
Meanwhile File::Spec provides useful functions such as canonpath and rel2abs - not "necessary" per se, but really useful.
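For reference, those two look like this (a sketch; the paths are examples):
use File::Spec;

# canonpath does a lexical cleanup (it doesn't resolve symlinks or '..')
print File::Spec->canonpath('/tmp//foo/./bar/'), "\n";      # /tmp/foo/bar on Unix

# rel2abs resolves a relative path against the cwd, or against a given base
print File::Spec->rel2abs('lib/Foo.pm'), "\n";
print File::Spec->rel2abs('lib/Foo.pm', '/opt/app'), "\n";  # /opt/app/lib/Foo.pm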
Yes, absolutely.
Golden rule of programming: never hard-code string literals.
Edit: One of the best ways to avoid porting issues is to avoid OS-specific constants, especially in the form of inline literals,
e.g. drive + ":/" + path + "/" + filename
It is bad practice, yet we all commit these atrocities in the haste of the moment or because it doesn't matter for that piece of code. File::Spec is there for when a programmer is adhering to gospel programming.
In addition, it provides the values of special and often-used system directories, e.g. tmp or devnull, which can vary from one distribution/OS to another.
If anything, it could probably do with some other members added to it, like one pointing to the user's home directory.
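Those two examples are already covered by class methods; only the home directory is missing (a short sketch):
use File::Spec;

print File::Spec->tmpdir(),  "\n";  # /tmp on Unix; honours TMPDIR/%TMP%
print File::Spec->devnull(), "\n";  # /dev/null on Unix, "nul" on Windows

# No home-directory method, though; File::HomeDir from CPAN or
# $ENV{HOME} / $ENV{USERPROFILE} cover that case.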
makepp (makepp.sourceforge.net) has a makefile variable $/ which is either / or \ (on non-Cygwin Windows). The reason is that Windows accepts / in filenames, but not in command names (where it starts an option).
From http://perldoc.perl.org/File/Spec.html:
catdir
Concatenate two or more directory names to form a complete path ending with a directory. But remove the trailing slash from the resulting string, because it doesn't look good, isn't necessary and confuses OS/2. Of course, if this is the root directory, don't cut off the trailing slash :-)
So, in this example, I wouldn't need the regex to remove the trailing slash if I used catdir.
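For instance (a sketch):
use File::Spec::Functions qw(catdir);

print catdir('/var', 'www/'), "\n";  # /var/www - trailing slash stripped
print catdir('/'), "\n";             # /        - the root keeps its slash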

How to discover command line options (if any) for an undocumented executable of unknown origin?

Take an undocumented executable of unknown origin. Trying /?, -h, --help from the command line yields nothing. Is it possible to discover if the executable supports any command line options by looking inside the executable? Possibly reverse engineering? What would be the best way of doing this?
I'm talking about a Windows executable, but would be interested to hear what different approaches would be needed with another OS.
On Linux, step one would be to run strings your_file, which dumps all the strings of printable characters in the file. Any constant strings will thus be shown, including any "usage" instructions.
The next step could be to run ltrace on the file. This shows all the library calls the program makes. If they include getopt (or similar), that is a sure sign it is processing input parameters. In fact, you should be able to see exactly which options the program expects, since the option string is the third parameter to the getopt function.
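If you're somewhere without strings (a bare Windows box, say), a rough Perl stand-in is a one-liner (a sketch; suspect.exe is a placeholder name):
perl -0777 -ne 'print "$1\n" while /([\x20-\x7e]{4,})/g' suspect.exe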
For Windows, you can see this question about decompiling Windows executables. It should be relatively easy to at least discover the options (what they actually do is a different story).
If it's a .NET executable try using Reflector. This will convert the MSIL code into the equivalent C# code, which may make it easier to understand. Unfortunately private and local variable names will be lost, as these are not stored in the MSIL, but it should still be possible to follow what's going on.

WinDbg, display Symbol Server paths of loaded modules (even if the symbols did not load)?

Is there a way from WinDbg, without using the DbgEng API, to display the symbol server paths (i.e. PdbSig70 and PdbAge) for all loaded modules?
I know that
lml
does this for the modules whose symbols have loaded. I would like to know these paths for the symbols that did not load so as to diagnose the problem. Anyone know if this is possible without having to utilize the DbgEng API?
Edit:
I also realize that you can use
!sym noisy
to get error messages about symbol loading. While this does produce helpful output, it is interleaved with other output that I want, and is not simple and clear like 'lml'.
!sym noisy and !sym quiet can turn on additional output for symbol loading, i.e.:
!sym noisy
.reload <dll>
X <some symbol in that DLL to cause a load>
!sym quiet
When the debugger attempts to load the PDB you will see every path that it tries, and whether PDBs weren't found or were rejected.
To my knowledge there's no ready solution in windbg.
Your options would be to either write a nifty script or an extension, depending on which you're more comfortable with.
It is pretty doable within windbg as a script. The information you're after is described in the PE debug directory.
Here's a link to the C++ sample code that goes into detail on extracting useful information (like the name of the symbol file, in your case). Adapting it to a windbg script should be no sweat.
Here's another useful pointer with tons of information on automating windbg. In particular, it talks about ways of passing arguments to windbg scripts (which is useful in your case as well, to have a common debug info extraction code which you can invoke from within the loaded modules iteration loop).
You can use the command
lme
to show the modules that did not have any symbols loaded.
http://ntcoder.com/bab/tag/lme/