How does BLC (binary lambda calculus) encode parentheses? For example, how would this:
λa.λb.λc.(a ((b c) d))
Be encoded in BLC?
Note: the Wikipedia article is not very helpful, as it uses an unfamiliar notation and provides only one simple example, which doesn't involve parentheses, and one very complex example, which is hard to analyze. The paper is similar in that respect.
If you mean the binary encoding based on De Bruijn indices discussed in the Wikipedia article, that's actually quite simple. You first need to do the De Bruijn encoding, which means replacing each variable with a natural number: 1 if it is bound by the innermost enclosing λ, 2 for the next λ out, and so on (that is, one more than the number of λ binders between the variable and its own binder). In this notation,
λa.λb.λc.(a ((b c) d))
becomes
λλλ 3 ((2 1) d)
where d is some natural number >=4. Since it is unbound in the expression, we can't really tell which number it should be.
Then the encoding itself, defined recursively as
enc(λM) = 00 + enc(M)
enc(MN) = 01 + enc(M) + enc(N)
enc(i) = 1*i + 0
where + denotes string concatenation and * means repetition. Systematically applying this, we get
enc(λλλ 3 ((2 1) d))
= 00 + enc(λλ 3 ((2 1) d))
= 00 + 00 + enc(λ 3 ((2 1) d))
= 00 + 00 + 00 + enc(3 ((2 1) d))
= 00 + 00 + 00 + 01 + enc(3) + enc((2 1) d)
= 00 + 00 + 00 + 01 + enc(3) + 01 + enc(2 1) + enc(d)
= 00 + 00 + 00 + 01 + enc(3) + 01 + 01 + enc(2) + enc(1) + enc(d)
= 000000011110010111010 + enc(d)
As you can see, each application contributes a 01 prefix, which plays the role of the open parenthesis, while the close parentheses are not needed in this encoding.
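The recursive definition above can be sketched in a few lines of Python. The class names Var, Lam and App are made up for this illustration, and the free variable d is arbitrarily set to 4, which is an assumption since the original expression does not determine it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    index: int  # 1-based De Bruijn index


@dataclass
class Lam:
    body: "Term"


@dataclass
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def enc(t: Term) -> str:
    """Encode a De Bruijn term as a BLC bit string."""
    if isinstance(t, Lam):
        return "00" + enc(t.body)             # enc(λM) = 00 + enc(M)
    if isinstance(t, App):
        return "01" + enc(t.fn) + enc(t.arg)  # enc(MN) = 01 + enc(M) + enc(N)
    return "1" * t.index + "0"                # enc(i)  = 1*i + 0


# λλλ 3 ((2 1) d), with d = 4 chosen only for illustration
term = Lam(Lam(Lam(App(Var(3), App(App(Var(2), Var(1)), Var(4))))))
bits = enc(term)
print(bits)
```

The printed string starts with the 000000011110010111010 prefix derived by hand above, followed by the encoding of whichever index is chosen for d.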
I am trying to read the following data into MATLAB:
'0.000000 1 18EFFA59x Rx D 8 AD 09 02 00 00 00 00 30'
'0.004245 1 14EFF01Cx Rx D 6 DB 00 FF FF 00 71'
'0.004640 1 CEF801Cx Rx D 3 3F 00 3B'
'0.005130 1 14EF131Cx Rx D 6 DB 00 FF FF 00 71'
'0.005630 1 CEF801Cx Rx D 3 3F 00 C3'
'0.010015 1 18EFFA59x Rx D 8 AD 07 01 00 00 00 00 30'
'0.014145 1 CF004F0x Rx D 8 F0 FF 7D 00 00 FF FF FF'
'0.015060 1 18EFFA59x Rx D 8 AD 07 02 00 00 00 00 30'
'0.018235 1 18EF1CF0x Rx D 8 F2 1E 05 FF FF 00 71 FF'
'0.018845 1 18EA5941x Rx D 3 09 FF 00'
I can easily read in each line as a string - but to make post-processing more efficient I'd like to separate each line by its delimiter - which is whitespace. In other words, the end result should be a non-singleton cell array. I can't seem to find a very efficient way of doing this. Efficiency is important because these files are several million lines long and processing in MATLAB with strings/cells takes a long time.
Any help would be appreciated. Thanks.
You appear to have fixed-width fields, so I would treat the data as such and let textscan do most of the pre-processing for you by turning off delimiters and whitespace and defining the field widths and types explicitly:
test = {...
'0.000000 1 18EFFA59x Rx D 8 AD 09 02 00 00 00 00 30'
'0.004245 1 14EFF01Cx Rx D 6 DB 00 FF FF 00 71'
'0.004640 1 CEF801Cx Rx D 3 3F 00 3B'
'0.005130 1 14EF131Cx Rx D 6 DB 00 FF FF 00 71'
'0.005630 1 CEF801Cx Rx D 3 3F 00 C3'
'0.010015 1 18EFFA59x Rx D 8 AD 07 01 00 00 00 00 30'
'0.014145 1 CF004F0x Rx D 8 F0 FF 7D 00 00 FF FF FF'
'0.015060 1 18EFFA59x Rx D 8 AD 07 02 00 00 00 00 30'
'0.018235 1 18EF1CF0x Rx D 8 F2 1E 05 FF FF 00 71 FF'
'0.018845 1 18EA5941x Rx D 3 09 FF 00'};
test = strjoin(test', '\n');
C = textscan(test, '%8.6f %2u %11s %4s %2s %2u %33s', 'delimiter', '','whitespace','');
col1 = C{1};
col2 = C{2};
col3 = strtrim(C{3});
col3 = cellfun(@(x)hex2dec(x(1:end-1)), col3); % for instance: drop the trailing 'x', then convert hex to decimal
col4 = strtrim(C{4});
col5 = strtrim(C{5});
col6 = C{6};
col7 = strtrim(C{7});
In the real world, you'd substitute the text string for a file id. For the last variable-length field, just read the whole thing in, making sure you specify the maximum possible length. MATLAB will read a field until it gets to the end or reaches a newline character (in fact, I made the last field width 1 larger, just to make sure). Each field is then aggregated into a cell. I also took the liberty of converting the third field from hex to decimal to show how you might post-process the numbers further.
As a further note, if you really do have gigantic files and need maximum speed, you could skip the strtrim step on the character fields by specifying %*ns, where n is the desired field width, for any known gaps such as the 2-character gap between columns 3 and 4. The star says to ignore that field. I find the approach shown above a bit more readable and intuitive, however, and it leaves a small margin of error in case one of the fields, such as the 4th, occasionally has a 3-character entry.
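For readers who want to see the intended field layout outside MATLAB, here is a minimal Python sketch that splits one of the sample lines and converts the fields. The field names (can_id, dlc, payload and so on) are guesses at what the columns mean and are not taken from the question.

```python
line = "0.000000 1 18EFFA59x Rx D 8 AD 09 02 00 00 00 00 30"

fields = line.split()                    # split on runs of whitespace
timestamp = float(fields[0])             # seconds
channel = int(fields[1])
can_id = int(fields[2].rstrip("x"), 16)  # drop the trailing 'x', parse as hex
direction = fields[3]                    # e.g. "Rx"
frame_type = fields[4]                   # e.g. "D"
dlc = int(fields[5])                     # number of data bytes that follow
payload = bytes(int(b, 16) for b in fields[6:6 + dlc])

print(timestamp, channel, hex(can_id), direction, frame_type, dlc, payload.hex(" "))
```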
In WinDbg I can search the memory for bytes using the s command, e.g.
s 0012ff40 L?2000 48 65 6c 6c 6f
Is there also a way to include unknown bytes in the search sequence, e.g.
s 0012ff40 L?2000 48 65 ?? ?? ?? 6c 6f
where ?? is a byte with an arbitrary value?
Idea
How about doing ((memory XOR 48 65 00 00 00 6c 6f) AND FF FF 00 00 00 FF FF) and compare that against 00 00 00 00 00 00 00? But I don't know how to do that in WinDbg either.
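The XOR/AND idea can be sketched outside the debugger in plain Python. This only illustrates the masking logic on a bytes object; masked_find is a hypothetical helper, not a WinDbg or pykd function.

```python
def masked_find(memory: bytes, pattern: bytes, mask: bytes, start: int = 0) -> int:
    """Return the first offset where (memory XOR pattern) AND mask is all zero, or -1."""
    n = len(pattern)
    for off in range(start, len(memory) - n + 1):
        if all(((memory[off + i] ^ pattern[i]) & mask[i]) == 0 for i in range(n)):
            return off
    return -1


# 48 65 ?? ?? ?? 6c 6f: wildcard positions hold 00 in the pattern and 00 in the mask
pattern = bytes.fromhex("48 65 00 00 00 6c 6f")
mask = bytes.fromhex("ff ff 00 00 00 ff ff")

memory = b"..He123lo.."
print(masked_find(memory, pattern, mask))
```

A mask byte of 00 makes the corresponding memory byte irrelevant, while ff requires an exact match.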
I am not sure whether the search command supports wildcards, but you can use the .foreach command to achieve what you want.
Here is a sample that I used to search for a memory pattern such as ff ?? 00:
.foreach (hit {s -[1]b 00007ffabc520000 L100 ff }) {db hit L3; s ${hit}+2 L1 00}
Here is a brief description of how it works:
NOTE - Open the debugger help from within WinDbg (Help | Contents) to get the complete documentation.
{s -[1]b 00007ffabc520000 L100 ff }
Use -[1] flag with s, so that only the memory address is given as the output.
s ${hit}+2 L1 00
For each hit, pass that memory address to the next search command. Increase the address by the number of bytes that you want to skip and specify the last part of the search pattern.
db hit L3
Starting at the address where the pattern begins, dump the full length of the pattern. This is just to confirm that we are getting the right results!
Hope this helps. In case you need further clarification, I can try to provide that as well.
We can use pykd to achieve this. Find the downloads linked from PyKD Wiki or PyKD Downloads. When using WinDbg Preview, copy the DLLs into
%LOCALAPPDATA%\DBG\EngineExtensions
for 64 bit or
%LOCALAPPDATA%\DBG\EngineExtensions32
for 32 bit.
Since this is only the WinDbg extension, you also need the Python module:
pip install pykd
Use the power of Python to do what WinDbg can't do. Save the following script in a good place for WinDbg, i.e. in a short path without spaces.
from pykd import *
import sys
import re
import struct

if len(sys.argv) < 4:
    print("Wildcard search for memory")
    print("Usage:", sys.argv[0], "<address> <length> <pattern> [-v]", sep=" ")
    print(" <address>: Memory address where searching begins.")
    print(" This can be a WinDbg expression like ntdll!NtCreateThreadEx.")
    print(" <length> : Number of bytes that will be considered as the haystack.")
    print(" <pattern>: Bytes that you're looking for. May contain ?? for unknown bytes.")
    print(" [-v] : (optional) Verbose output")
    print()
    print("Examples:")
    print(" ", sys.argv[0], "00770000 L50 01 02 03 ?? 05")
    print(" will find 01 02 03 04 05 or 01 02 03 FF 05, if present in memory")
    sys.exit(0)

verbose = False
if sys.argv[-1][0:2] == "-v":
    verbose = True

if verbose:
    for n in range(1, len(sys.argv)):
        print(f"param {n}: " + sys.argv[n])

address = expr(sys.argv[1])
if verbose: print("Start address:", "0x{:08x}".format(address), sep=" ")

length = sys.argv[2]
length = length.replace("L?", "")  # consider large address range syntax
length = length.replace("L", "")   # consider address range syntax
length = expr(length)
if verbose: print("Length:", "0n" + str(length), "bytes", sep=" ")

regex = b""
# Pattern bytes start at argv[3]; skip the trailing -v flag if it is present
for n in range(3, len(sys.argv) - (1 if verbose else 0)):
    if sys.argv[n] == "??":
        regex += b"."  # wildcard: any single byte
    else:
        char = struct.pack("B", expr(sys.argv[n]))
        regex += re.escape(char)  # escape bytes that are regex metacharacters
if verbose: print("Regex:", regex, sep=" ")

memorycontent = loadBytes(address, length)
if verbose: print("Memory:", memorycontent, sep=" ")

# re.DOTALL lets the wildcard also match byte 0x0a (newline)
result = re.search(regex, bytes(memorycontent), re.DOTALL)
if result is None:
    print("Pattern not found")
else:
    print("Found:", ' '.join("0x{:02x}".format(x) for x in result.group(0)), "at address", "0x{:08x}".format(address + result.start()))
The script constructs a regex for a bytes object. It uses . for the wildcard (with re.DOTALL, so that it also matches the byte 0x0a) and escapes literal bytes that have a special meaning in a regex, e.g. . becomes \..
Let's prepare a proper sample in WinDbg:
0:006> .dvalloc 1000
Allocated 1000 bytes starting at 00900000
0:000> eu 0x00900000 "Test.with.regex"
0:000> db 0x00900000 L0n30
00900000 54 00 65 00 73 00 74 00-2e 00 77 00 69 00 74 00 T.e.s.t...w.i.t.
00900010 68 00 2e 00 72 00 65 00-67 00 65 00 78 00 h...r.e.g.e.x.
Load the PyKD extension, so we'll be able to run the script:
0:006> .load pykd
and run the script:
0:000> !py d:\debug\scripts\memwild.py 00900000 L10 2e ?? 77
Found: 0x2e 0x00 0x77 at address 0x00900008
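The regex technique itself can be tried without WinDbg or pykd. The following sketch rebuilds the sample memory from the eu command above as a bytes object and runs the same search; build_regex is a hypothetical helper, and the use of re.escape and re.DOTALL is a choice made for this sketch.

```python
import re


def build_regex(tokens):
    """Turn hex byte tokens such as ["2e", "??", "77"] into a compiled bytes regex."""
    parts = []
    for tok in tokens:
        if tok == "??":
            parts.append(b".")  # wildcard: any single byte
        else:
            parts.append(re.escape(bytes([int(tok, 16)])))  # literal byte
    # DOTALL so that the wildcard also matches the byte 0x0a
    return re.compile(b"".join(parts), re.DOTALL)


base = 0x00900000                                # address reported by .dvalloc above
memory = "Test.with.regex".encode("utf-16-le")   # what eu writes to memory

match = build_regex(["2e", "??", "77"]).search(memory)
print("Found at address 0x{:08x}".format(base + match.start()))
```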
If the search range is not insanely large, you could copy and paste the hex dump into Sublime Text and do a find with regex mode enabled. For example, I was looking for the following, with 1200 < X < 2400:
add esp, X
ret
In Sublime Text I searched using the regex 81 c4 .. .. .. 00 c3 and found an address with instructions for
add esp,600h
ret
While writing this post, I attempted b = fread(s, 1, 'uint32')
This would work great, but my poor data is sent LSB first! (no I can not change this)
Before, I was using b = fread(s, 4)' which gives me a vector similar to [47 54 234 0].
Here is my input stream:
0A
0D 39 EA 00 04 39 EA 00
4B 39 EA 00 D0 38 EA 00
0A
etc...
I can successfully delimit by 0x0A by
while ~isequal(fread(s, 1), 10) end
Basically I need to get the array of uint32s represented by [00EA390D 00EA3904 00EA394B 00EA38D0]
The documentation for swapbytes doesn't help me much and the uint32 operator operates on individual elements!!
MATLAB's fread function directly supports the little-endian machine format. Just set the 5th argument of fread to 'l'.
b = fread(s, 4, 'uint32',0,'l');
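To double-check the expected values, the same little-endian interpretation can be reproduced in Python with the struct module. This is only a cross-check on the sample bytes from the question, not a replacement for the fread call.

```python
import struct

# The 16 payload bytes between the two 0x0A delimiters in the question
frame = bytes.fromhex("0D 39 EA 00 04 39 EA 00 4B 39 EA 00 D0 38 EA 00")

# "<" selects little-endian, "4I" reads four unsigned 32-bit integers
values = struct.unpack("<4I", frame)
print([f"{v:08X}" for v in values])
# prints ['00EA390D', '00EA3904', '00EA394B', '00EA38D0']
```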
I am currently working on converting our PowerBuilder 12.1 application, which does not currently support Unicode, into a Unicode supporting application.
I have made some modifications to save Unicode data to our database, as well to files, but I have hit a slight snag in processing strings.
For example, the character 𠆾 is a Surrogate Pair and PowerBuilder interprets this as 2 characters (similar to how .NET operates). Thus:
LEN("𠆾") = 2
To me, this part makes sense, as it is counting each UTF-16 code unit as a character.
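The difference between the two counts can be illustrated in Python, which is used here only because it makes both numbers easy to inspect. Python's len() counts code points, while the length of the UTF-16 encoding divided by two gives the code-unit count that PowerBuilder and .NET report.

```python
s = "𠆾"

code_points = len(s)                            # Python strings count code points
utf16_units = len(s.encode("utf-16-le")) // 2   # two bytes per UTF-16 code unit

print(code_points, utf16_units)
```

A character outside the Basic Multilingual Plane is one code point but needs a surrogate pair, so it occupies two UTF-16 code units.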
Currently we have come up with two solutions to handle doing string functions with Unicode characters:
Callable OLEObjects written in C# .NET
using the PBNI interface to call C# .NET (want to stay away from this solution if possible)
An example of the .NET code we are thinking of using for determining the string length is:
StringInfo.ParseCombiningCharacters("𠆾").Length = 1
We are just worried about the impact on performance with constantly calling the OLEObjects/PBNI to do all of our string processing. Have any of the other PowerBuilder developers here done Unicode string manipulation (LEN, MID, POS, etc), and how did you do it?
Thank you.
This is in response to Seki's hex conversion function. I'm posting it as an answer so I can include source code. I use the Microsoft cryptographic functions to display blobs in my debugging tools. Here's a simplified version of my blob window. The one I use is PFC-based and uses an object that wraps the MS Crypto library. It's from PB 12.5 but should import into any Unicode version of PB.
HA$PBExportHeader$w_show_blob.srw
forward
global type w_show_blob from window
end type
type sle_1 from singlelineedit within w_show_blob
end type
type mle_1 from multilineedit within w_show_blob
end type
end forward
global type w_show_blob from window
integer width = 3081
integer height = 1988
boolean titlebar = true
boolean controlmenu = true
boolean minbox = true
boolean maxbox = true
boolean resizable = true
boolean center = true
sle_1 sle_1
mle_1 mle_1
end type
global w_show_blob w_show_blob
type prototypes
FUNCTION boolean CryptBinaryToString ( &
Blob pbBinary, &
ulong cbBinary, &
ulong dwFlags, &
Ref string pszString, &
Ref ulong pcchString ) &
LIBRARY "crypt32.dll" ALIAS FOR "CryptBinaryToStringW"
end prototypes
type variables
CONSTANT Ulong CRYPT_STRING_HEXASCIIADDR = 11
end variables
forward prototypes
public subroutine of_showblob (ref blob abl_data)
end prototypes
public subroutine of_showblob (ref blob abl_data);unsignedlong lul_size, lul_bufsize
string ls_hex
try
lul_size = len(abl_data)
sle_1.text = string(lul_size)
setnull(ls_hex)
cryptbinarytostring( abl_data, lul_size, CRYPT_STRING_HEXASCIIADDR, ls_hex, lul_bufsize)
ls_hex = space(lul_bufsize)
if not cryptbinarytostring( abl_data, lul_size, CRYPT_STRING_HEXASCIIADDR , ls_hex, lul_bufsize) then
mle_1.text = "error converting blob data"
else
mle_1.text = ls_hex
end if
catch(runtimeerror re)
messagebox("oops", re.text)
end try
end subroutine
on w_show_blob.create
this.sle_1=create sle_1
this.mle_1=create mle_1
this.Control[]={this.sle_1,&
this.mle_1}
end on
on w_show_blob.destroy
destroy(this.sle_1)
destroy(this.mle_1)
end on
type sle_1 from singlelineedit within w_show_blob
integer x = 73
integer width = 517
integer height = 88
integer taborder = 10
integer textsize = -10
integer weight = 400
fontcharset fontcharset = ansi!
fontpitch fontpitch = variable!
fontfamily fontfamily = swiss!
string facename = "Arial"
long textcolor = 33554432
long backcolor = 553648127
string text = "none"
boolean displayonly = true
borderstyle borderstyle = stylelowered!
end type
type mle_1 from multilineedit within w_show_blob
integer x = 64
integer y = 96
integer width = 2898
integer height = 1716
integer taborder = 10
integer textsize = -10
integer weight = 400
fontcharset fontcharset = ansi!
fontpitch fontpitch = fixed!
fontfamily fontfamily = modern!
string facename = "Courier New"
long textcolor = 33554432
string text = "none"
boolean hscrollbar = true
boolean vscrollbar = true
boolean displayonly = true
borderstyle borderstyle = stylelowered!
end type
To use it, assuming your blob is lbl_myBlob:
open(w_show_blob)
w_show_blob.of_showblob(lbl_myBlob)
The output in the MLE looks like this:
0000 42 4d ee 00 00 00 00 00 00 00 76 00 00 00 28 00 BM........v...(.
0010 00 00 10 00 00 00 0f 00 00 00 01 00 04 00 00 00 ................
0020 00 00 78 00 00 00 00 00 00 00 00 00 00 00 00 00 ..x.............
0030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 80 ................
0040 00 00 00 80 80 00 80 00 00 00 80 00 80 00 80 80 ................
0050 00 00 80 80 80 00 c0 c0 c0 00 00 00 ff 00 00 ff ................
0060 00 00 00 ff ff 00 ff 00 00 00 ff 00 ff 00 ff ff ................
0070 00 00 ff ff ff 00 88 88 88 88 88 88 88 88 88 88 ................
0080 80 88 88 88 88 88 88 88 80 08 88 88 88 88 88 88 ................
0090 80 00 88 88 88 88 88 88 80 00 08 88 88 88 88 88 ................
00a0 80 00 00 88 88 88 88 88 80 00 00 08 88 88 88 88 ................
00b0 80 00 00 00 88 88 88 88 80 00 00 08 88 88 88 88 ................
00c0 80 00 00 88 88 88 88 88 80 00 08 88 88 88 88 88 ................
00d0 80 00 88 88 88 88 88 88 80 08 88 88 88 88 88 88 ................
00e0 80 88 88 88 88 88 88 88 88 88 88 88 88 88 ..............
Since release 10, PB has been Unicode (UTF-16LE) aware, so the legacy Len() is implicitly LenW(), and the same goes for the other string functions; dealing with legacy data may require the explicit LenA().
Are you sure that you are getting UTF-16LE encoding? Given the following function, what does it return for a string containing your data if you call it with hexdump_blob(blob(your_string), true)?
Paste this code into the source of a new global function named hexdump_blob to get a hexadecimal display (like a hex editor) of blob contents.
global type hexdump_blob from function_object
end type
forward prototypes
global function string hexdump_blob (blob abl_data, boolean ab_fill_lastline)
end prototypes
global function string hexdump_blob (blob abl_data, boolean ab_fill_lastline);//hexify a blob content
string ls_tohex = "0123456789ABCDEF"
string ls_msg = "", ls_line, ls_binary
long i, j, length
byte b
string ls_fill
if isnull( abl_data ) then return ""
if ab_fill_lastline then
ls_fill = " __"
else
ls_fill = " "
end if
length = len( abl_data )
for i = 1 to length
GetByte( abl_data, i, b )
ls_line += mid( ls_tohex, 1+ mod(int(b/16),16), 1)
ls_line += mid( ls_tohex, 1+ mod(b,16), 1)
ls_line += " "
ls_binary += string( iif(b>31 and b<128,char(b)," "))
if mod(i,16) = 0 and i > 0 then
ls_binary = replaceall( ls_binary, "~r", "·") //no cr/lf
ls_binary = replaceall( ls_binary, "~n", "·")
ls_binary = replaceall( ls_binary, "~t", "·")
ls_msg += "[" + string( i - 16, "0000") + "] " + ls_line + "~t" + ls_binary + "~r~n"
ls_line = ""
ls_binary = ""
end if
next
i -- // i - 1 due to the last loop in for
ls_line += fill(ls_fill, 3 * ( 16 - mod(i, 16) ) )
ls_msg += "[" + string( i - mod(i,16), "0000") + "] " + ls_line + "~t" + ls_binary
return ls_msg
end function
Also, here is the replaceall() function that is used by hexdump_blob()
global type replaceall from function_object
end type
forward prototypes
global function string replaceall (string as_source, string as_pattern, string as_replace)
end prototypes
global function string replaceall (string as_source, string as_pattern, string as_replace);//replace all occurrences of as_pattern in as_source with as_replace
string ls_target
long i, j
ls_target=""
i = 1
j = 1
do
i = pos( as_source, as_pattern, j )
if i>0 then
ls_target += mid( as_source, j, i - j )
ls_target += as_replace
j = i + len( as_pattern )
else
ls_target += mid( as_source, j )
end if
loop while i>0
return ls_target
end function
and the iif() function, which simulates the C ternary operator or the Visual Basic IIf():
global type iif from function_object
end type
forward prototypes
global function any iif (boolean ab_cond, any aa_true, any aa_false)
end prototypes
global function any iif (boolean ab_cond, any aa_true, any aa_false);
// simulates the VB iif or C ternary operator
if ab_cond then
return aa_true
else
return aa_false
end if
end function
Wouldn't you want to use the LenA() method?
http://www.techno-kitten.com/Changes_to_PowerBuilder/New_in_PowerBuilder_10/PB10New_-_Unicode_Support/PB10New_-_Unicode_Related_Chan/PB10New_-_String-Related_Funct/pb10new_-_modified_processing_.html
The following byte sequence is encoded as Little Endian Unsigned Int.
F0 00 00 00
I just read about endianness. Just wanted to verify if it is 240 decimal.
Translating the byte sequence to bits...
[1111 0000] [0000 0000] [0000 0000] [0000 0000]
Converting the first byte to decimal...
= 0*2^0 + 0*2^1 + 0*2^2 + 0*2^3 + 1*2^4 + 1*2^5 + 1*2^6 + 1*2^7
Doing the math...
= 16 + 32 + 64 + 128 = 240
Yes, 0x000000F0 = 240.
If it were big-endian, it would be 0xF0000000 = 4026531840 (or -268435456 if signed).
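These three values can be checked with Python's int.from_bytes, shown here as a quick verification of the arithmetic above.

```python
raw = bytes.fromhex("F0 00 00 00")

little = int.from_bytes(raw, "little")                # least significant byte first
big = int.from_bytes(raw, "big")                      # most significant byte first
big_signed = int.from_bytes(raw, "big", signed=True)  # two's complement, 32 bits

print(little, big, big_signed)
# prints 240 4026531840 -268435456
```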