Function address not in the range of memory-mapped files for the Lisp image process

After defining and disassembling the function fn, I can see that the function (or code component) resides at memory address 0x53675216. But I don't see that address anywhere in the range of memory-mapped files attributed to the Lisp image process (I'm using SBCL).
Am I missing something about how the internals of a process work?
FWIW, my goal was to dump the entire memory of the process and inspect some of it. But if I can't even get at a function that I defined, what's the point?

Please post the actual text rather than a picture of the text.
/proc/<pid>/map_files is not the right thing to look at, because it only lists mappings that are backed by files. Instead look at /proc/<pid>/maps, which shows you all the memory maps, including anonymous ones.
In my case if I define & compile foo on SBCL on x64 / Linux as:
(defun foo ())
Then (disassemble 'foo) looks like:
; disassembly for foo
; Size: 21 bytes. Origin: #x53624A7C ; foo
[...]
And I can use my hexdump-thing function from this answer to check this:
> (hexdump-thing #'foo)
lowtags: 1011
function: 0000000053624A6B : 0000000053624A60
So the actual address of the object is #x0000000053624A60, which is compatible with what disassemble is saying.
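The two numbers hexdump-thing prints differ only in the low tag bits that SBCL keeps at the bottom of every tagged pointer. A minimal Python sketch of that arithmetic, assuming 4 lowtag bits as the "lowtags: 1011" line suggests (untag is a hypothetical helper, not part of SBCL):

```python
from __future__ import annotations

# Assumption: 64-bit SBCL with 4 low tag bits, as the hexdump output implies.
LOWTAG_BITS = 4
LOWTAG_MASK = (1 << LOWTAG_BITS) - 1  # 0b1111


def untag(tagged_pointer: int) -> tuple[int, int]:
    """Split a tagged pointer into (object address, lowtag)."""
    return tagged_pointer & ~LOWTAG_MASK, tagged_pointer & LOWTAG_MASK


address, lowtag = untag(0x53624A6B)
print(f"address=#x{address:X} lowtag={lowtag:04b}")
```

Run as-is, it prints address=#x53624A60 lowtag=1011, matching the output above.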
So then if I look at /proc/<pid>/maps I see, among all the other lines, two lines like:
52a00000-533f8000 rwxp 016a8000 fd:00 2758077 /local/environments/sbcl/lib/sbcl/sbcl.core
533f8000-5ac00000 rwxp 00000000 00:00 0
The fields in this file are address range, permissions, offset, device, inode, and pathname. You can see that the range that includes the function's address is not mapped to any file (note that p means private, i.e. copy-on-write, so the range mapped to the core file will never be written back to the core file).
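Finding which line of /proc/<pid>/maps covers a given address is easy to script. This is a Python sketch; parse_maps_line and find_mapping are hypothetical names, and it assumes the Linux maps format described above, where the pathname column is empty for anonymous mappings:

```python
from typing import Optional


def parse_maps_line(line: str) -> dict:
    """Parse one /proc/<pid>/maps line into its fields."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,  # exclusive upper bound
        "perms": fields[1],
        "offset": int(fields[2], 16),
        "device": fields[3],
        "inode": int(fields[4]),
        # Anonymous mappings have no pathname column at all.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


def find_mapping(maps_text: str, address: int) -> Optional[dict]:
    """Return the mapping whose range contains address, or None."""
    for line in maps_text.splitlines():
        if line.strip():
            mapping = parse_maps_line(line)
            if mapping["start"] <= address < mapping["end"]:
                return mapping
    return None
```

For example, feeding it the contents of /proc/<pid>/maps and 0x53624A60 should return the anonymous 533f8000-5ac00000 entry for the two lines shown above.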
The function's definition lives somewhere in this anonymous range of memory.
As a note: if you want to investigate the memory of the implementation, do it from within the implementation, which will be hugely easier than trying to investigate it from outside. SBCL has lots of support for this sort of thing, although you have to find some of it by grovelling around in the source. After all, this sort of thing is exactly what a garbage collector has to do.


How to pinpoint where in the program a crash happened using .dmp and WinDbg?

I have a huge application (made in PowerBuilder) that crashes every once in a while, so it is hard to reproduce this error. We have it set up so that when a crash like this occurs, we receive a .dmp file.
I used WinDbg to analyze my .dmp file with the command !analyze -v. From this I can deduce that the error that occurred was an Access Violation C0000005. Based on the [0] and [1] parameters, it attempted to dereference a null pointer.
WinDbg also showed me STACK_TEXT consisting of around 30 lines, but I am not sure how to read it. From what I have seen I need to use some sort of symbols.
First line of my STACK_TEXT is this:
00000000`00efca10 00000000`75d7fa46 : 00000000`10df1ae0 00000000`0dd62828 00000000`04970000 00000000`10e00388 : pbvm!ob_get_runtime_class+0xad
From this, my goal is to analyze this file to figure out where exactly in the program this error happened or which function it was in. Is this something I will be able to find after further analyzing the stack trace?
How can I pinpoint where in the program a crash happened using .dmp and WinDbg so I can fix my code?
If you analyze a crash dump with !analyze -v, the lines after STACK_TEXT are the stack trace. The output is equivalent to kb, provided you have set the correct thread and context.
The columns in the output of kb are:
Child EBP (Child-SP on 64-bit)
Return address
Return address
First 4 values on the stack
Symbol
The backticks (`) tell you that you are running in 64-bit mode; WinDbg uses them to split 64-bit values in the middle.
On 32-bit, the first 4 parameters on the stack were often equivalent to the first 4 parameters to the function, depending on the calling convention.
On 64-bit, the stack is not so relevant any more, because the 64-bit calling convention passes the first four parameters in registers (RCX, RDX, R8 and R9) rather than on the stack. Therefore you can probably ignore those values.
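Since the backtick is only a display separator, a value such as 00000000`00efca10 is read by dropping the backtick and treating the rest as one hexadecimal number. A small Python sketch (parse_windbg_u64 is a hypothetical helper name) applied to the first two columns of the frame in the question:

```python
def parse_windbg_u64(text: str) -> int:
    """Convert a WinDbg 64-bit value like 00000000`00efca10 to an int."""
    # The backtick only separates the high and low 32-bit halves visually.
    return int(text.replace("`", ""), 16)


frame = "00000000`00efca10 00000000`75d7fa46"
child_sp, return_address = (parse_windbg_u64(v) for v in frame.split())
print(hex(child_sp), hex(return_address))
```

This prints 0xefca10 0x75d7fa46.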
The interesting part is the symbol like pbvm!ob_get_runtime_class+0xad.
In front of ! is the module name, typically a DLL or EXE name. Look for something that you built. After the ! and before the + is a method name. After the + is the offset in bytes from the beginning of the function.
As long as you don't have functions with thousands of lines of code, that number should be small, like < 0x200. If the number is larger than that, it typically means that you don't have correct symbols. In that case, the method name is no longer reliable, since it's probably just the last known (the last exported) method name and a faaaar way from there, so don't trust it.
In case of pbvm!ob_get_runtime_class+0xad, pbvm is the DLL name, ob_get_runtime_class is the method name and +0xad is the offset within the method where the instruction pointer is.
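If you have many frames to go through, splitting module!function+offset and applying the rule of thumb above can be scripted. A minimal Python sketch; the regular expression and the 0x200 threshold are assumptions taken from the description above, not something WinDbg itself defines:

```python
from __future__ import annotations

import re

# module!function+0xoffset, as printed in the last column of a WinDbg stack
SYMBOL_RE = re.compile(
    r"^(?P<module>[^!]+)!(?P<function>[^+]+)\+0x(?P<offset>[0-9a-fA-F]+)$"
)


def parse_symbol(sym: str) -> tuple[str, str, int]:
    """Split a symbol into (module, function, offset in bytes)."""
    m = SYMBOL_RE.match(sym)
    if m is None:
        raise ValueError(f"not in module!function+0xoffset form: {sym!r}")
    return m["module"], m["function"], int(m["offset"], 16)


def offset_looks_suspicious(offset: int, limit: int = 0x200) -> bool:
    """Rule of thumb: a large offset usually means missing or wrong symbols."""
    return offset > limit


module, function, offset = parse_symbol("pbvm!ob_get_runtime_class+0xad")
print(module, function, hex(offset), offset_looks_suspicious(offset))
```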
To me (not knowing much about PowerBuilder), PBVM sounds like the PowerBuilder virtual machine DLL. So that's not your code; it's code compiled by Sybase. You'd need to look further down the call stack to find the culprit code in your DLL.
After reading Wikipedia, it seems that PowerBuilder does not necessarily compile to native code, but to intermediate P-Code instead. In this case you're probably out of luck, since your code is never really on the call stack and you need a special debugger or a WinDbg extension (which might not exist, like for Java). Run it with the -pbdebug command line switch or compile it to native code and let it crash again.

Programming in QuickBasic with repl.it?

I'm trying to get a "retro-computing" class open and would like to give people the opportunity to finish projects at home (without carrying a 3kb monstrosity out of 1980 with them). I've heard that repl.it supports every programming language. Does it have QuickBasic, and how do I use it online? Thanks in advance for the help!
You can do it (hint: search for QBasic; it shares syntax with QuickBASIC), but you should be aware that it has some limitations as it's running on an incomplete JavaScript implementation. For completeness, I'll reproduce the info from the original blog post:
What works
Only text mode is supported. The most common commands (enough to run nibbles) are implemented. These include:
Subs and functions
Arrays
User types
Shared variables
Loops
Input from screen
What doesn't work
Graphics modes are not supported
No statements are allowed on the same line as IF/THEN
Line numbers are not supported
Only the built-in functions used by NIBBLES.BAS are implemented
All subroutines and functions must be declared using DECLARE
This is far from being done. In the comments, AC0KG points out that P=1-1 doesn't work.
In short, it would need another 50 or 100 hours of work and there is no reason to do this.
One caveat I haven't been able to pin down involves statements like INPUT or LINE INPUT: they just don't seem to work for me on repl.it, and I don't know where else one might find qb.js hosted.
My recommendation: FreeBASIC
I would recommend FreeBASIC instead, if possible. It's essentially a modern reimplementation with additional functionality (the compiler itself is written in FreeBASIC).
Old DOS stuff like the DEF SEG statement and VARSEG function are no longer applicable since it is a modern BASIC implementation operating on a 32-bit flat address space rather than 16-bit segmented memory. I'm not sure what the difference between the old SADD function and the new StrPtr function is, if there is any, but the idea is the same: return the address of the bytes that make up a string.
You could also disable some features and maintain QB compatibility by putting #lang "qb" as the first line of a program, since there are noticeable differences when using the default "fb" dialect. Or you could embrace the new features, avoid the "qb" dialect, and focus primarily on the programming concepts instead; the choice is yours. Regardless of the dialect you choose, the basic stuff should work just fine:
DECLARE SUB collatz ()
DIM SHARED n AS INTEGER
INPUT "Enter a value for n: ", n
PRINT n
DO WHILE n <> 4
    collatz
    PRINT n
LOOP
PRINT 2
PRINT 1
SUB collatz
    IF n MOD 2 = 1 THEN
        n = 3 * n + 1
    ELSE
        n = n \ 2
    END IF
END SUB
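For comparison, here is the same logic in Python, returning the values the program above would print. It is a sketch that assumes n is a positive integer: for 0 the loop never ends in either language, and for negative n QBasic's MOD and \ round differently from Python's % and //.

```python
from __future__ import annotations


def collatz_printout(n: int) -> list[int]:
    """Values printed by the QBasic program for a positive starting value n."""
    printed = [n]                       # PRINT n
    while n != 4:                       # DO WHILE n <> 4
        if n % 2 == 1:                  # SUB collatz
            n = 3 * n + 1
        else:
            n = n // 2
        printed.append(n)               # PRINT n
    printed += [2, 1]                   # PRINT 2 / PRINT 1
    return printed


print(collatz_printout(6))
```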
A word about QB64
One might argue that there is a much more compatible transpiler known as QB64 (except for some things like DEF FN...), but I cannot recommend it if you want a tool for students to use. It's a large download for Windows users, and its syntax checking can be a bit poor at times, to the point that QB code may pass the syntax check only for the build to fail with a cryptic message like "C++ compilation failed! See internals\temp\compile.txt for details". Simply put, it's usable and highly compatible, but it needs some work, much like the qb.js script that repl.it uses.
An alternative: DOSBox and autorun
You could also find a way to run an actual copy of QB 4.5 in something like DOSBox and simply modify the autorun information in the default DOSBox.conf (or whatever it's called) to automatically launch QB. Then just repackage it with the modified DOSBox.conf in a nice installer for easy distribution (NSIS, Inno Setup, etc.). This will provide the most retro experience short of something like a FreeDOS virtual machine, as you'll be dealing with the 16-bit segmented memory, VGA, etc., all emulated of course.

How to delete variable/forms in Lisp?

In Python we have the del statement for deleting variables.
E.g:
a = 1
del a
What's the equivalent of this in Lisp?
(setq foo 1)
;; (del foo) ?
In Common Lisp.
For symbols as variables:
CL-USER 7 > (setf foo 42)
42
CL-USER 8 > foo
42
CL-USER 9 > (makunbound 'foo)
FOO
CL-USER 10 > foo
Error: The variable FOO is unbound.
See:
MAKUNBOUND (defined)
SLOT-MAKUNBOUND (defined)
FMAKUNBOUND (defined)
Python names reside in namespaces, del removes a name from a namespace. Common Lisp has a different design, one much more sympathetic to compiling efficient code.
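For contrast, a minimal Python sketch of what del actually does: it removes a name from a namespace (here the module's globals() dictionary) and never destroys the object itself, which stays alive as long as something else refers to it:

```python
# Module-level names live in the module's namespace dict, globals().
a = 1
assert "a" in globals()

del a  # removes the name "a" from that namespace...
assert "a" not in globals()

try:
    print(a)  # ...so looking it up now fails
except NameError as exc:
    print("NameError:", exc)

# del only drops one name for an object; the object survives via other references.
b = [1, 2]
alias = b
del b
print(alias)  # the list is still reachable through alias
```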
In Common Lisp we have two kinds of "variables." Lexical variables account for the majority of these. Lexical variables are analogous to a C local. At runtime a lexical variable is usually implemented as a bit of storage (on the stack, say) and the association with its name is retained only for debugging purposes. It doesn't really make any sense to talk about deleting lexical variables in the sense python uses, because the closest analogy to python's namespace that exists for lexical variables is the lexical scope, and that is purely an abstraction used by the spec and the compiler/evaluator.
The second kind of "variable" in CL is the "global" symbol. Symbols are very rich data structures, much richer than a label in python. They can have lots of information associated with them: a value, a printed name, their "home" package, a function, and other arbitrary information stored in a list of properties. Most of these are optional. When you use a name in your source code, e.g. (+ X 3), that name X will usually denote a lexical variable. But failing that, the compiler/evaluator will assume you want the value of the "global" symbol, i.e. you effectively wrote (symbol-value 'X) rather than X. Because of typos, programming conventions, and other things, some decades ago compilers started complaining about references to "global" symbols in the absence of a declaration signaling that the symbol was intended to be a "global." This declaration is known as "special." Yes, it's a stupid bit of nomenclature. And worse, special variables aren't just global; they also have a very useful feature known as dynamic binding, but that's another topic.
Symbols that are special are almost always declared using defvar, defparameter, or defconstant. There is a nearly mandatory coding convention that they are spelled distinctively, i.e. *X* rather than X. Some compilers, and most developers, will complain if you deviate from that convention.
Ok. So now we can get back to del. Special variables are denoted with a symbol, and this is analogous to how in python variables are denoted with a name. In python the names are looked up in the current namespace. In Common Lisp they are looked up in the current package. But when the lookup happens differs. In python it's done at runtime, since names can be dynamically added and removed as the program is running. In Common Lisp the names are looked up as the program is read from a file prior to compiling it. (There are exceptions, but let's avoid thinking about those.)
You can remove a symbol from a package (see unintern). But this is a rare thing and is likely to just make your brain hurt. It is a simple operation, but it gets confusing around the edges because the package system has a small dose of clever features which, while very helpful, take a bit of effort to become comfortable with. So, in a sense, the short answer to your question is that for global symbols unintern is the analogous operation. But you're probably doing something quite exceptional (and likely wrong) if you're using that.
While what @Ben writes is true, my guess is that what you are looking for is makunbound, not unintern. The former does not remove the symbol from the obarray (for Emacs Lisp) or package (for Common Lisp). It just removes its symbol-value, that is, its value as a variable. If you want the behavior that trying to get the variable value results in a not-bound (aka void) error, then try makunbound.
I am not contradicting the previous answers but adding.
Consider that Lisp has garbage collection. As you know, you get a symbol defined in many ways: setf, defvar, defparameter, make-symbol, and a lot of other ways.
But how do you get a clean system? How do you make sure that you don't use a variable by mistake? For example, you defined abc, then later decided you don't want to use abc. Instead you want an abc-1 and an abc-2. And you want the Lisp system to signal an error if you try to use abc. If you cannot somehow erase abc, then the system won't stop you by signalling an "undefined" error.
The other answers basically tell you that Lisp does not provide a way to get rid of abc. You can use makunbound so that the symbol is fresh with no value assigned to it. And you can use unintern so that the symbol is not in any package. Yet the symbol object itself still exists.
So, I think that as long as nothing else refers to abc, garbage collection will eventually get rid of it. But if, for example, you did (setf pt-to-abc 'abc), then as long as pt-to-abc is still bound to the symbol abc, abc will continue to exist even with garbage collection.
The other way to really get rid of abc is to close Lisp and restart it. Doesn't seem so desirable. But I think that closing Lisp and starting fresh is what actually gets rid of all the symbols. Then you define the symbols you want.
You probably usually want makunbound, because then the system will signal an error if you try to add the value of abc to something else, since abc won't have any value. And if you try to append abc to a string, the system will signal an error because abc has no value. Etc.

Building ROM images on CP/M

I'm trying to use the venerable M80 and L80 tools on CP/M to build a ROM image. (It's for a CP/M emulator, hence why I'm using CP/M tools.)
Unfortunately L80 seems to be really crude --- AFAICT it just loads each object file at its absolute address, fixes it up, and then dumps everything from 0x0100 up out to disk. This means that object files that are based at addresses outside its own workspace don't appear to work at all (just producing an error message). My ROM has a base address of 0xd000, which is well outside this.
Does anyone know if it's possible to use M80 and L80 to do this, and if so, how? Alternatively can anyone recommend (and point me at!) a CP/M assembler/linker suite that will?
(Note that I'd like to avoid cross compiling, if possible.)
If you're just assembling one file, then you can use M80's .phase directive to have the assembler locate the output.
.phase 0D000h
If you want to build several source files and link them at the end, then you can still use M80 but you'll need DRI's linker LINK.COM, which can be found in http://www.cpm.z80.de/download/pli80_13.zip. The LINK command line to use would be
LINK result=module1,module2,module3[LD000
(The nearest L80 equivalent would, I think, be
L80 /P:D000,module1,module2,module3,result/N/E
but then you have to remove 0xCF00 bytes from the start of the resulting file).
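The trimming step can be done on the host once the file has been copied off the CP/M disk (an assumption, since the question prefers to stay on CP/M). A minimal Python sketch; strip_to_rom and its rom_base parameter are hypothetical names, and the 0xCF00 figure is simply 0xD000 - 0x0100:

```python
CPM_TPA_START = 0x0100  # L80 writes its output starting at 0100h


def strip_to_rom(com_image: bytes, rom_base: int = 0xD000) -> bytes:
    """Drop the bytes that precede rom_base in an L80 output file."""
    skip = rom_base - CPM_TPA_START  # 0xD000 - 0x0100 == 0xCF00
    if skip < 0 or skip > len(com_image):
        raise ValueError("image does not reach rom_base")
    return com_image[skip:]
```

For example, passing it the bytes of the resulting .COM file would return only the bytes that belong at 0D000H onwards.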
Old question, but this may work for those who are still looking. I checked this out on my Ampro Little Board running 1980 M80/L80 on CP/M 2.2.
You can use the ASEG (absolute) directive in your starting .MAC file, specify 0D000H as the org, and then reference external modules. As long as those external modules don't include DSEG or PSEG directives you should be able to link them all together with 0D000H as the starting address. E.g.
; TEST.MAC
ASEG
ORG 0D000H
public tstart
tstart:
...
call myfoo## ; call routine myfoo in external module foo.rel
...
end tstart
Assemble it:
M80 TEST,=TEST
Link it with foo.rel and use /X on the output to produce a .HEX file (TEST.HEX):
L80 TEST,FOO,TEST/N/X/E
If you examine the resulting .HEX file you should see the starting address is 0D000H.
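To check the start address without reading the records by eye, a short host-side Python sketch can scan the Intel HEX data records. hex_load_range is a hypothetical helper; it assumes standard Intel HEX records (type 00 for data, 01 for end of file) and ignores the ^Z padding CP/M often leaves at the end of text files:

```python
from __future__ import annotations


def hex_load_range(hex_text: str) -> tuple[int, int]:
    """Return (lowest address, highest address + 1) covered by data records."""
    lo = hi = None
    for line in hex_text.split():
        line = line.rstrip("\x1a")  # strip CP/M end-of-file padding
        if not line.startswith(":"):
            continue
        record = bytes.fromhex(line[1:])
        if sum(record) & 0xFF != 0:  # all bytes incl. checksum sum to 0 mod 256
            raise ValueError(f"bad checksum: {line}")
        count = record[0]
        address = int.from_bytes(record[1:3], "big")
        rtype = record[3]
        if rtype == 0x00 and count:  # data record
            lo = address if lo is None else min(lo, address)
            hi = address + count if hi is None else max(hi, address + count)
        elif rtype == 0x01:  # end-of-file record
            break
    if lo is None:
        raise ValueError("no data records found")
    return lo, hi
```

For a TEST.HEX linked as above, the first value returned should be 0xD000.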
BTW: If you don't use the /X option, then L80 with /N/E will make a .COM with all the code linked using an offset of 0D000H unless you also include a .phase directive. E.g.:
; TEST.MAC
ASEG
ORG 100H
.phase 0D000H
public tstart
tstart:
...
call myfoo## ; call routine myfoo in external module foo.rel
...
end tstart
Link to make a .COM instead of a .HEX:
L80 TEST,FOO,TEST/N/E <== note no '/X'
You can't run it, but you can treat the .COM file as really a .BIN padded to the nearest 128-byte boundary (assuming that your CP/M uses the typical 128-byte records). You can confirm the result by doing a DUMP of the .COM file. If the code is very short, the file may also include leftover pieces of L80 loader code that weren't overwritten by your code.
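Rounding up to whole 128-byte records, and cutting the image back down to the length your code really occupies, can be sketched in Python as follows. Both helpers are hypothetical, and real_length is an assumption: you would take it from something like the assembler listing rather than from the file itself, since the padding may contain leftover loader bytes:

```python
RECORD_SIZE = 128  # typical CP/M 2.2 record size


def padded_size(n: int, record: int = RECORD_SIZE) -> int:
    """Size of an n-byte image once rounded up to whole CP/M records."""
    return -(-n // record) * record  # ceiling division, then scale back up


def trim_image(com_image: bytes, real_length: int) -> bytes:
    """Keep only the bytes your code occupies; the rest is record padding."""
    return com_image[:real_length]
```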
Note you can use also the ASEG approach with org 0100H to make a regular CP/M .COM. In that case you don't need to use .phase assuming the start of your code is at 100H.

IDA Pro string function

I have this binary file that I wish to edit; however, after loading it, all strings are in some sort of gibberish symbols. Is there any way to format it?
Why you are seeing "gibberish":
The strings are likely obfuscated. Chances are, before each of the strings is used in the program, a deobfuscation routine is run to convert the string in memory back into something meaningful. This is a common technique used to prevent static analysis tools (such as the GNU "strings" utility or IDA Pro) from properly analyzing the binary. The rest of this answer makes the assumption that this is true of your binary.
How to deobfuscate the strings (dynamic approach):
If you are able to run the binary, you can let it take care of the deobfuscation for you. All you need to do is run the binary in a debugger and analyze the memory after it has been deobfuscated.
Some binaries that obfuscate their strings never re-obfuscate them after use, so one interesting shortcut you might want to try first is to run the binary in a debugger and break execution right before it exits. If the strings are still deobfuscated, you can do a memory dump of the appropriate section to save the deobfuscated strings. (This will not necessarily deobfuscate all of the strings for you; you'll only get the strings that were deobfuscated along the path of the binary's execution.)
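Once you have a memory dump, something like the GNU strings utility will pull out the readable text. A Python sketch of the same idea (printable_strings is a hypothetical helper; it only looks for printable ASCII, so strings stored as UTF-16 or in other encodings would be missed):

```python
from __future__ import annotations

import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Runs of printable ASCII at least min_len bytes long, like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]
```

Reading a saved dump file in binary mode and passing its bytes to printable_strings gives a first list of candidate strings to annotate.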
If the previous method does not work for you, try setting a hardware write breakpoint on the first byte of an obfuscated string, then running the binary. If the breakpoint trips, step through the instructions to allow the rest of the string to be deobfuscated. If the deobfuscation always happens in a common routine, you can place a breakpoint near the end of that routine and possibly script your debugger to print the deobfuscated string each time execution passes through that routine.
Once you have a list of deobfuscated strings, you can either patch them directly into the IDA database (discussed below), or you can leave repeatable comments (use the ' key) at the addresses of each of the strings in the database, such that the deobfuscated string will display as a comment on every instruction that references it.
For small binaries, you can get away with doing the annotations by hand, but it would be worthwhile to read into scripting IDA so that you can automate this process. The IDA Pro Book contains a great reference for this.
How to deobfuscate the strings (static approach):
If you can't run the binary, or if the dynamic approach isn't deobfuscating all the strings for you, then you can deobfuscate them yourself.
Chances are good that if you view the cross-references to any of the obfuscated strings in IDA Pro (view them with the x key), you will be taken to the deobfuscation routine. If the routine isn't too complicated (and they usually aren't), you should be able to write a script to emulate the deobfuscation routine. This will allow you to replace the obfuscated strings with the deobfuscated strings in the IDA database.
(As a point of clarification, the IDA database is entirely separate from the binary itself. Anything you do to the database will have no effect on the actual binary, and anything you do to the binary will have no effect on the database)
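Because the database and the binary are separate, it can help to prototype the decoding on the raw bytes of the file before writing the IDA script. The Python sketch below assumes the same hypothetical scheme used in the IDC example further on (each byte XOR'd with 0x1F until a NUL is decoded); xor_deobfuscate and xor_obfuscate are illustrative names. Note that an offset into the file is generally not the same number as the virtual address IDA shows.

```python
def xor_deobfuscate(data: bytes, start: int, key: int = 0x1F) -> bytes:
    """Decode bytes from data[start:] by XOR with a one-byte key until the
    decoded byte is NUL; returns the plaintext without the terminator
    (or everything decoded so far if the data runs out first)."""
    out = bytearray()
    for b in data[start:]:
        plain = b ^ key
        if plain == 0:  # decoded terminator: stop
            break
        out.append(plain)
    return bytes(out)


def xor_obfuscate(plaintext: bytes, key: int = 0x1F) -> bytes:
    """Inverse operation (including the NUL terminator), handy for tests."""
    return bytes(c ^ key for c in plaintext + b"\x00")
```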
Your options for scripting IDA are IDC (IDA's original built-in scripting language) and IDAPython. I highly recommend using IDAPython, as it is much easier to use, and a much more powerful language. I'm not sure if you can install IDAPython on IDA Free 5.0, but it should be bundled with all vaguely recent versions of IDA Pro.
Giving an overview of scripting IDA would be beyond the scope of this answer, but here's an example to get you started. I'm writing it in IDC in case you're using IDA Free. Let's say your deobfuscation routine simply XOR'd each successive byte with 0x1F until the null byte was decoded. Then the following loop might end up being part of your IDC script:
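// *EXAMPLE*: assumes a single-byte XOR key of 0x1F and a NUL terminator
auto addr, b;
addr = 0x00401000;         // The address of your string
while (1) {
    b = Byte(addr) ^ 0x1F; // decode one byte
    PatchByte(addr, b);    // write it back into the database
    if (b == 0) {          // stop after decoding the terminator
        break;
    }
    addr = addr + 1;
}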
Running a script can be done from File > IDC Command... or File > Script file....
As you might guess, Byte returns the byte stored at a given address, and PatchByte writes a byte to an address. Built-in functions in IDAPython share their names with their IDC counterparts, so the IDAPython version would be nearly identical, sans the C-like syntax. As mentioned before, I highly recommend The IDA Pro Book for a walkthrough on scripting IDA. Once you have the basics down, you can use IDA's built-in help index and the IDAPython documentation as a couple of other references.
Always save your database before running a script that patches code! There is no "undo" feature in IDA, so a small coding error could trash your entire database.
Good luck!