ASN1 module compiled by asn1c fails to decode? - rsa

I'm trying to parse RSA (private) keys using asn1c. I based the ASN.1 module on https://www.rfc-editor.org/rfc/rfc3447, and it looks as follows (I tried to use only the private and public key parts):
-- ===================
-- Main structures
-- ===================
RSAPrivateKey DEFINITIONS ::=
BEGIN
--
-- Representation of RSA private key with information for the CRT
-- algorithm.
--
RSAPublicKey ::= SEQUENCE {
modulus INTEGER, -- n
publicExponent INTEGER -- e
}
RSAPrivateKey ::= SEQUENCE {
version Version,
modulus INTEGER, -- n
publicExponent INTEGER, -- e
privateExponent INTEGER, -- d
prime1 INTEGER, -- p
prime2 INTEGER, -- q
exponent1 INTEGER, -- d mod (p-1)
exponent2 INTEGER, -- d mod (q-1)
coefficient INTEGER, -- (inverse of q) mod p
otherPrimeInfos OtherPrimeInfos OPTIONAL
}
Version ::= INTEGER { two-prime(0), multi(1) }
(CONSTRAINED BY {
-- version must be multi if otherPrimeInfos present --
})
OtherPrimeInfos ::= SEQUENCE SIZE(1..MAX) OF OtherPrimeInfo
OtherPrimeInfo ::= SEQUENCE {
prime INTEGER, -- ri
exponent INTEGER, -- di
coefficient INTEGER -- ti
}
END -- PKCS1Definitions
However, when I compile the module and try to parse a .der private key using the following code, decoding ends with RC_FAIL:
RSAPrivateKey_t *rsa_p_key;
rsa_p_key= (RSAPrivateKey_t*)calloc(1, sizeof *rsa_p_key);
asn_dec_rval_t rval = ber_decode(
0,
&asn_DEF_RSAPrivateKey,
(void**)&rsa_p_key,
buffer, // buffer containing key (unsigned char*)
buffer_len); // buffer length (amount of read bytes)
I've been trying to find the error in the ASN.1 module, but with no luck. When I print the .der file with openssl, its structure seems to match RSAPrivateKey. I also ruled out errors in reading the file: the buffer contents match the file, and binary read mode is used.
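Before suspecting the ASN.1 module, a quick sanity check on the buffer can help: a DER-encoded PKCS#1 RSAPrivateKey starts with a SEQUENCE tag (0x30) whose length should cover the rest of the file. The following is a minimal sketch of such a check, assuming single-byte tags and definite lengths only; `der_read_header` is a hypothetical helper, not part of asn1c.

```c
#include <stddef.h>

/*
 * Hypothetical helper: parse the DER tag/length header at the start of buf.
 * Handles single-byte tags and definite lengths with up to 4 length octets.
 * On success stores the tag, the content length and the header size and
 * returns 0; returns -1 on malformed or truncated input.
 */
int der_read_header(const unsigned char *buf, size_t buflen,
                    unsigned *tag, size_t *content_len, size_t *header_len)
{
    if (buflen < 2)
        return -1;
    if ((buf[0] & 0x1F) == 0x1F)
        return -1;                      /* multi-byte tags are not handled */
    *tag = buf[0];

    size_t pos = 1;
    unsigned char first = buf[pos++];
    size_t len = 0;
    if (first < 0x80) {
        len = first;                    /* short form: length in one octet */
    } else {
        unsigned n = first & 0x7F;      /* long form: n length octets follow */
        if (n == 0 || n > 4 || pos + n > buflen)
            return -1;                  /* n == 0 is indefinite length, not DER */
        for (unsigned i = 0; i < n; i++)
            len = (len << 8) | buf[pos++];
    }
    if (len > buflen - pos)
        return -1;                      /* content runs past the buffer */
    *content_len = len;
    *header_len = pos;
    return 0;
}
```

For a whole-file key, the tag should be 0x30 and `header_len + content_len` should equal `buffer_len`; a mismatch points to a reading problem rather than to the ASN.1 module.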

You need to compile with the -fwide-types flag:
-fwide-types Use the wide integer types (INTEGER_t, REAL_t) instead of the machine's native data types (long, double).
The problem is that most of the INTEGER fields of RSAPrivateKey contain very large numbers that do not fit into the long type used by default.
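To illustrate why the native long fields cannot hold these values: any decoder that maps an INTEGER onto a long must reject a value with more significant octets than `sizeof(long)`. The following is a minimal sketch of that check, assuming big-endian two's complement content octets as in BER/DER; `der_integer_to_long` is a hypothetical helper, not asn1c's own NativeInteger code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode the content octets of a BER/DER INTEGER
 * (big-endian two's complement) into a native long.
 * Returns 0 on success, -1 if the value does not fit into a long.
 */
int der_integer_to_long(const unsigned char *buf, size_t len, long *out)
{
    if (len == 0)
        return -1;                      /* an INTEGER has at least one octet */

    /* Drop redundant sign octets (BER allows them, DER forbids them):
     * 0x00 before an octet < 0x80, or 0xFF before an octet >= 0x80. */
    while (len > 1 &&
           ((buf[0] == 0x00 && buf[1] < 0x80) ||
            (buf[0] == 0xFF && buf[1] >= 0x80))) {
        buf++;
        len--;
    }

    if (len > sizeof(long))
        return -1;                      /* too many significant octets */

    /* Start from all ones for negative values so the sign is extended. */
    unsigned long acc = (buf[0] & 0x80) ? ~0UL : 0UL;
    for (size_t i = 0; i < len; i++)
        acc = (acc << 8) | buf[i];

    *out = (long)acc;                   /* two's complement reinterpretation */
    return 0;
}
```

A 2048-bit modulus has 257 content octets (a leading 0x00 plus 256 value octets), so it fails this check, whereas publicExponent (65537, encoded as 01 00 01) fits easily.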
Without -fwide-types
/* RSAPrivateKey */
typedef struct RSAPrivateKey {
Version_t version;
long modulus;
long publicExponent;
long privateExponent;
With -fwide-types
/* RSAPrivateKey */
typedef struct RSAPrivateKey {
Version_t version;
INTEGER_t modulus;
INTEGER_t publicExponent;
INTEGER_t privateExponent;
$ asn1c -fwide-types rfc3447.asn1
$ make -f converter-example.mk
$ ./converter-example -p RSAPrivateKey -iber ./temp/key.der
<RSAPrivateKey>
<version>0</version>
<modulus>00:C1:3D:F5:BD:51:55:B8:94:45:B9:93:53:46:81:58:77:F0:3A:30:75:30:40:F5:84:C2:6C:11:00:6A:00:CC:15:5D:FD:6F:04:1F:3F:92:39:84:AF:8D:DC:3C:58:84:2A:54:9B:AC:E7:77:55:FF:C5:95:94:F6:D7:16:C3:AA:B7:31:0B:BB:72:1A:A4:45:50:D3:D7:3A:98:C6:A0:8E:1F:66:AE:5F:40:94:46:26:64:22:07:F5:84:D8:AD:BD:C6:D1:00:4B:50:FB:66:BF:D1:05:61:84:3B:0D:6D:1E:9C:1F:CB:0C:35:FC:A5:FA:AF:8A:DD:14:0A:3E:1F:8F:E2:40:C0:4F:A2:3D:90:97:10:7F:A1:92:74:C9:97:6A:AD:E9:66:C8:0A:DE:91:88:8B:CB:DD:09:7D:35:CD:78:00:88:0D:A6:CE:1E:71:8C:7D:96:68:74:9E:F4:7E:CA:1A:82:AF:5E:42:A3:6A:60:47:98:00:CA:2A:59:D4:86:CA:AF:B5:CA:C2:38:1A:04:3E:33:86:81:17:78:27:BA:E3:17:D1:52:5A:DF:B5:31:9C:D4:4B:5C:06:EC:93:80:06:86:FB:33:55:68:30:72:91:36:B4:C7:6B:EB:3A:A6:40:0F:C3:78:ED:1A:22:0E:F9:77:44:65:91:E5:C5:B1:53</modulus>
<publicExponent>65537</publicExponent>
<privateExponent>00:9E:85:B7:8B:80:A7:73:6D:9E:ED:27:60:4F:1C:58:78:BB:86:E0:AD:A1:D2:08:16:CA:6F:60:5B:18:9A:62:D0:BC:73:E4:98:5B:12:09:60:49:EA:C1:D3:03:66:11:B5:B0:06:AD:06:8C:AC:ED:CF:26:70:37:36:27:24:88:6D:13:3C:EE:9E:22:20:D4:04:04:64:31:5B:96:C5:AB:11:33:68:A4:17:14:0B:9F:FE:D0:B3:FA:C2:EA:05:4D:03:45:FC:99:CC:6B:0F:D5:17:20:F4:E8:46:91:33:0C:C3:42:89:8D:10:D4:9B:4C:54:A8:F3:C7:36:C7:D3:98:71:B3:3E:FC:37:CB:E4:7D:68:11:BF:42:6B:C4:1E:F9:0C:B8:AD:10:F1:58:C8:66:11:CF:84:09:B0:B2:BE:1E:50:3C:80:F1:09:D4:C7:FD:35:3B:EE:AB:AB:14:1F:D1:78:56:EC:98:AE:37:C6:7E:DF:87:EF:76:A6:C4:E5:21:84:73:12:41:F7:C7:17:69:85:B0:C1:FA:9F:E5:11:A0:29:F5:95:22:64:BB:5D:4E:6E:52:A6:BA:E6:50:23:8C:D7:BB:69:C4:D1:36:B3:62:37:0E:C5:76:B0:D3:14:D7:7C:B3:4A:4F:58:59:F0:16:F1:19:42:12:6B:A3:D6:73:71</privateExponent>
<prime1>00:E7:38:46:89:D6:43:E5:A2:80:F7:EB:03:8B:15:B0:24:CE:9E:0F:4F:4D:1C:C5:7F:15:9B:12:05:D5:4C:20:C8:D2:9C:7C:30:3F:99:84:3A:6E:4B:BA:A0:BF:34:40:C3:30:80:D1:AE:6B:04:C6:DB:E0:FA:5F:65:14:96:0B:3B:9A:75:8E:84:C2:F5:96:98:09:09:87:2F:62:19:87:AB:BB:C7:87:67:67:07:61:5B:8B:CC:B2:19:52:74:1C:91:F7:12:F0:E5:9B:B3:3C:81:2E:F4:2A:E4:AC:56:6D:38:95:DB:18:5E:7B:4F:96:B1:E5:80:41:C2:28:AC:2A:E7</prime1>
<prime2>00:D5:F3:BA:B4:3C:3B:B0:02:0C:E9:BA:21:CC:03:23:26:F5:0B:2B:27:B0:74:C6:E2:F8:FD:3F:CB:1F:CA:1A:B9:12:4C:B6:7E:56:D4:AB:A8:F2:8D:81:54:63:0C:0E:16:79:54:0B:7C:13:7F:E6:66:12:BD:A0:62:F3:D6:8A:AC:B5:2A:58:70:58:8C:16:94:95:97:2D:9C:2A:A8:30:3F:35:43:57:D7:79:3D:9B:EF:56:95:A0:81:24:DC:67:C5:DA:66:F0:7E:02:94:59:4C:1B:EB:AB:67:0E:B6:C3:BD:92:0B:7C:B4:8E:44:AE:32:1A:42:A1:C7:93:6A:44:B5</prime2>
<exponent1>00:82:E3:65:72:E3:9A:FD:E4:36:D3:A0:F3:19:89:C6:73:9F:8E:F4:25:B5:06:43:7A:84:55:8B:27:48:2E:57:24:B7:AC:A3:D4:80:3C:3C:11:03:9C:D4:E1:E8:3B:01:2A:3D:4B:BE:E6:D8:68:14:D6:25:8E:35:F0:37:6E:14:9F:C1:F9:28:1B:59:6D:C2:B8:FF:EC:A7:DD:17:D0:51:EF:D2:55:C9:FD:AB:E2:0E:A7:CF:04:AA:11:11:8E:EF:19:65:DF:10:05:3A:55:85:3B:AF:C3:C2:80:3E:5A:92:6B:84:D1:49:03:3B:14:BB:BE:AA:A7:27:12:6D:09:C1:23</exponent1>
<exponent2>6E:22:70:D1:A6:CF:F2:E2:9B:53:15:85:A0:47:5D:29:08:AB:1F:23:E7:29:B5:D7:D0:E4:4C:9A:7B:5A:C6:36:CE:BC:BE:94:7A:8E:2F:6F:60:AC:87:0E:B1:8D:DB:12:A6:92:24:F7:51:F2:5C:DF:DE:75:CE:C2:21:53:27:3F:90:62:A3:F3:F1:20:EB:DE:C0:C2:79:B0:12:25:51:F0:B7:B2:5A:DD:88:83:B6:69:95:E0:A0:26:DA:9A:BA:B0:96:A4:B6:D7:A6:EC:46:AB:6F:13:F9:BF:AB:4B:59:A7:94:2E:65:9B:6C:40:DE:8A:DC:09:C0:CD:C3:8C:C8:A1</exponent2>
<coefficient>43:9A:41:22:B1:F4:15:A9:C7:95:FD:F7:7E:55:BC:24:16:5F:E2:9D:B0:D5:74:54:1B:F6:C9:76:C4:6A:4E:5E:6C:AE:71:E1:9A:DE:F1:26:47:B4:41:45:BD:0A:2E:E4:02:DE:AD:28:21:2D:50:59:99:DA:26:E0:90:1A:84:2B:22:46:48:CC:DB:1F:7E:9B:9B:F5:02:D8:24:6F:7E:F3:D9:30:91:1F:83:22:9A:94:C7:F4:29:B2:93:68:CB:57:BC:C5:60:96:0E:42:42:55:D2:3A:71:B5:31:78:D3:D1:2A:74:03:C2:45:A2:9A:A2:89:6F:62:63:C6:42:7B:2D</coefficient>
</RSAPrivateKey>


Inserting many rows causes locking conflicts with Hibernate and Postgres, leaving the table empty

We are benchmarking some queries to see whether they still work reliably for "a lot of" data. (One million rows isn't really that much, but Postgres already fails here, so evidently it is.)
Our Java code that calls these queries looks something like this:
@PersistenceContext
private EntityManager em;
@Resource
private UserTransaction utx;
for (int i = 0; i < 20; i++) {
this.utx.begin();
for (int inserts = 0; inserts < 50_000; inserts ++) {
em.createNativeQuery(SQL_INSERT).executeUpdate();
}
this.utx.commit();
for (int parameter = 0; parameter < 25; parameter ++) {
long time = System.currentTimeMillis();
Assert.assertNotNull(this.em.createNativeQuery(SQL_SELECT).getResultList());
System.out.println(i + " iterations \t" + parameter + "\t" + (System.currentTimeMillis() - time) + "ms");
}
}
Or with plain JDBC:
Connection connection = //...
for (int i = 0; i < 20; i++) {
for (int inserts = 0; inserts < 50_000; inserts ++) {
try (Statement statement = connection.createStatement();) {
statement.execute(SQL_INSERT);
}
}
for (int parameter = 0; parameter < 25; parameter ++) {
long time = System.currentTimeMillis();
try (Statement statement = connection.createStatement();) {
statement.execute(SQL_SELECT);
}
System.out.println(i + " iterations \t" + parameter + "\t" + (System.currentTimeMillis() - time) + "ms");
}
}
The queries we tried were a simple INSERT into a table with JSON and an INSERT into two tables with about 25 rows. The SELECT has one or two JOINs and is pretty simple. One set of queries is as follows (I had to anonymize the SQL, otherwise I wouldn't have been allowed to post it):
CREATE TABLE ts1.p (
id integer NOT NULL,
CONSTRAINT p_pkey PRIMARY KEY ("id")
);
CREATE TABLE ts1.m(
pId integer NOT NULL,
mId character varying(100) NOT NULL,
a1 character varying(50),
a2 character varying(50),
CONSTRAINT m_pkey PRIMARY KEY (pId, mId)
);
CREATE SEQUENCE ts1.seq_p;
/*
* SQL_INSERT
*/
WITH p AS (
INSERT INTO ts1.p (id)
VALUES (nextval('ts1.seq_p'))
RETURNING id AS pId
)
INSERT INTO ts1.m(pId, mId, a1, a2)
VALUES ((SELECT pId from p), 'M1', '11', '12'),
((SELECT pId from p), 'M2', '13', '14'),
/* ... about 20 to 25 rows of values */
/*
* SQL_SELECT
*/
WITH userInput (mId, a1, a2) AS (
VALUES
('M1', '11', '11'),
('M2', '12', '15'),
/* ... about "parameter" rows of values */
)
SELECT m.pId, COUNT(m.a1) AS matches
FROM userInput u
LEFT JOIN ts1.m m ON (m.mId) = (u.mId)
WHERE (m.a1 IS NOT DISTINCT FROM u.a1) AND
(m.a2 IS NOT DISTINCT FROM u.a2) OR
(m.a1 IS NULL AND m.a2 IS NULL)
GROUP BY m.pId
/* plus HAVING, additional WHERE clauses etc. according to the use case, but that just speeds up the query */
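The WHERE clause in SQL_SELECT relies on IS NOT DISTINCT FROM, which, unlike =, treats two NULLs as equal and never yields "unknown". A small C model of the difference, assuming a NULL pointer stands in for SQL NULL; `sql_equals` and `sql_not_distinct` are illustrative names only:

```c
#include <stdbool.h>
#include <string.h>

/* Nullable VARCHAR modeled as a C string; a NULL pointer means SQL NULL. */

/* SQL "a = b": unknown if either side is NULL. In a WHERE clause unknown
 * filters the row out just like false, so it is modeled as false here. */
bool sql_equals(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* SQL "a IS NOT DISTINCT FROM b": true if both are NULL or both are equal. */
bool sql_not_distinct(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;          /* true only when both sides are NULL */
    return strcmp(a, b) == 0;
}
```

Note also that AND binds more tightly than OR, so the WHERE clause is evaluated as `(a1 match AND a2 match) OR (m.a1 IS NULL AND m.a2 IS NULL)`.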
When executing, we get the following output (the values are supposed to rise steadily and linearly):
271ms
414ms
602ms
820ms
995ms
1192ms
1396ms
1594ms
1808ms
1959ms
110ms
33ms
14ms
10ms
11ms
10ms
21ms
8ms
13ms
10ms
As you can see, after some point (usually at around 300,000 to 500,000 inserts) the time needed for the query drops significantly. Sadly, we can't really debug what the result is at that point (other than that it's not null), but we assume it's an empty list, because the database tables are empty.
Let me repeat that: after half a million INSERTs, Postgres empties the tables.
Of course that's not acceptable at all.
We tried different queries, all of easy to medium complexity, and all of them produced this behavior, so we assume it's not the queries.
We thought that maybe the sequence returned a value too large for an integer column, so we dropped and recreated the sequence.
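The sequence theory can be checked with simple arithmetic: the benchmark loops run 20 × 50,000 INSERTs, each calling nextval('ts1.seq_p') once, while a PostgreSQL integer column holds values up to 2,147,483,647. A small C sketch of that calculation, assuming the sequence uses the defaults (start 1, increment 1) and was freshly created:

```c
#include <stdint.h>

/* Loop bounds taken from the benchmark code above. */
enum { OUTER_ITERATIONS = 20, INSERTS_PER_ITERATION = 50000 };

/* Each SQL_INSERT calls nextval('ts1.seq_p') exactly once. */
int64_t sequence_values_used(void)
{
    return (int64_t)OUTER_ITERATIONS * INSERTS_PER_ITERATION;
}

/* A PostgreSQL "integer" column is a signed 32-bit value. */
int fits_in_pg_integer(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

The last id is therefore around 1,000,000, more than three orders of magnitude below the limit, which makes an overflow of ts1.p.id an unlikely explanation.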
Once there was this exception (translated from the German server message):
org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 1620 waits for AccessExclusiveLock on relation 2001098 of database 1937678; blocked by process 2480.
But I don't think this error has anything to do with the tables being cleared. That time we had tested against the wrong database, so multiple queries were run against the same table. Normally we have one database per benchmark test.
Of course it's important that we find out what the error is, so that we can decide whether there is any risk of our customers losing their data (because, again, on error the database empties some table of its choosing).
Postgres version: PostgreSQL 10.6, compiled by Visual C++ build 1800, 64-bit
We also tried PostgreSQL 9.6.11, compiled by Visual C++ build 1800, 64-bit, and never had the same problem there (although that could just be luck, since it's not 100% reproducible).
Do you have any idea what the error is? Or how we could debug it? The entire benchmark test runs for an hour, so there is no immediate feedback.

How to define an IIO device in kernel in order to call the probe of the corresponding driver? [duplicate]

On my x86_64 board, there is an I2C bus provided by an MFD device, with devices on it. I am able to detect these devices using the i2cdetect program.
# i2cdetect -y 0
0 1 2 3 4 5 6 7 8 9 a b c d e f
00: -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- 4c -- -- --
50: -- -- -- -- -- -- -- 57 -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
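The addresses for I2C_BOARD_INFO() come straight from this table: in every cell where a device acknowledged, i2cdetect prints the address itself, "--" where nothing answered, and "UU" where a driver already owns the address. A minimal userspace sketch that collects them; `i2cdetect_addresses` is a hypothetical helper, not part of i2c-tools:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: collect the addresses reported by `i2cdetect -y N`.
 * Every whitespace-separated token made of exactly two hex digits is a
 * responding address; row labels ("40:"), the single-digit column header,
 * "--" and "UU" are skipped. Returns the number of addresses stored.
 */
size_t i2cdetect_addresses(const char *text, unsigned *out, size_t max)
{
    size_t count = 0;
    const char *p = text;

    while (*p != '\0' && count < max) {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;                                /* skip separators */
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;                                /* find the end of the token */
        if (p - start == 2 &&
            isxdigit((unsigned char)start[0]) &&
            isxdigit((unsigned char)start[1])) {
            char tok[3] = { start[0], start[1], '\0' };
            out[count++] = (unsigned)strtoul(tok, NULL, 16);
        }
    }
    return count;
}
```

Run over the dump above, it yields 0x4c and 0x57, the two addresses used in the board info that follows.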
I need the kernel to detect these devices automatically, so I tried writing i2c_board_info as in the code below, but the kernel still does not detect these devices automatically.
#include <linux/init.h>
#include <linux/i2c.h>
#define BUS_NUMBER 0
static struct i2c_board_info tst_i2c0_board_info[] __initdata = {
{
I2C_BOARD_INFO("ltc2990", 0x4c),
},
{
I2C_BOARD_INFO("24c128", 0x57),
},
};
static int __init tst_i2c_board_setup(void)
{
int ret=-1;
ret = i2c_register_board_info(BUS_NUMBER, tst_i2c0_board_info, ARRAY_SIZE(tst_i2c0_board_info));
return ret;
}
device_initcall(tst_i2c_board_setup);
Any suggestions on how I can solve this?
Since you have an ACPI-enabled platform, the best approach is to provide ASL excerpts for the given devices.
Thanks to the Intel Galileo IoT platform, the Atmel 24-series EEPROM has its own ACPI ID, so the excerpt is simple:
DefinitionBlock ("at24.aml", "SSDT", 5, "", "AT24", 1)
{
External (_SB_.PCI0.I2C2, DeviceObj)
Scope (\_SB.PCI0.I2C2)
{
Device (EEP0) {
Name (_HID, "INT3499")
Name (_DDN, "Atmel AT24 compatible EEPROM")
Name (_CRS, ResourceTemplate () {
I2cSerialBusV2 (
0x0057, // I2C Slave Address
ControllerInitiated,
400000, // Bus speed
AddressingMode7Bit,
"\\_SB.PCI0.I2C2", // Link to ACPI I2C host controller
0
)
})
Name (_DSD, Package () {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () {
Package () {"size", 1024},
Package () {"pagesize", 32},
}
})
}
}
}
Note: the size property is being added by a pending patch series (the patches "add eeprom 'size' property" and "add support to fetch eeprom device property 'size'").
Note: the address width is currently hard-coded to 8 bits. If you need 16-bit addressing, you need to create patches similar to the ones mentioned above.
For LTC2990 power monitor you need the following excerpt:
DefinitionBlock ("ltc2990.aml", "SSDT", 5, "", "PMON", 1)
{
External (\_SB_.PCI0.I2C2, DeviceObj)
Scope (\_SB.PCI0.I2C2)
{
Device (PMON)
{
Name (_HID, "PRP0001")
Name (_DDN, "Linear Technology LTC2990 power monitor")
Name (_CRS, ResourceTemplate () {
I2cSerialBus (
0x4c, // Bus address
ControllerInitiated, // Don't care
400000, // Fast mode (400 kHz)
AddressingMode7Bit, // 7-bit addressing
"\\_SB.PCI0.I2C2", // I2C host controller
0 // Must be 0
)
})
Name (_DSD, Package () {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () {
Package () {"compatible", "lltc,ltc2990"},
}
})
}
}
}
Note: unfortunately the driver has no compatible string, so one needs to add it, as is done here.
In the examples above, \\_SB.PCI0.I2C2 is the absolute path to the I2C host controller.
How to get these files applied:
First of all, create a folder:
mkdir -p kernel/firmware/acpi
Save the files in that folder under the names given in the DefinitionBlock() macro.
Create the uncompressed cpio archive and append the original initrd to it:
find kernel | cpio -H newc --create > /boot/instrumented_initrd
cat /boot/initrd >> /boot/instrumented_initrd
More details are available in SSDT Overlays.
Other examples and a description of the idea behind this approach can be found on the meta-acpi GitHub page; some of the material here is copied from there.
After going through Documentation/i2c/instantiating-devices, I understand there are several ways to do this (e.g. an ACPI table, as 0andriy suggested). I used the i2c_new_probed_device() method. Below is the code I used:
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/i2c.h>
#define BUS_NUMBER 0
#define NUM_DEVICE 2
/* one address list per device, each terminated by I2C_CLIENT_END */
static const unsigned short normal_i2c[][2] = {
{0x4c, I2C_CLIENT_END},
{0x57, I2C_CLIENT_END},
};
static struct i2c_board_info tst_i2c0_board_info[NUM_DEVICE] = {
{I2C_BOARD_INFO("ltc2990", 0x4c), },
{I2C_BOARD_INFO("24c128", 0x57), },
};
static int __init tst_i2c_board_setup(void)
{
int i;
struct i2c_adapter *i2c_adap;
i2c_adap = i2c_get_adapter(BUS_NUMBER);
if (!i2c_adap)
return -ENODEV; /* adapter for bus 0 is not registered (yet) */
for (i = 0; i < NUM_DEVICE; i++) {
/* probe the address and instantiate the device if it answers */
i2c_new_probed_device(i2c_adap, &tst_i2c0_board_info[i],
normal_i2c[i], NULL);
}
i2c_put_adapter(i2c_adap); /* drop the reference taken by i2c_get_adapter() */
return 0;
}
late_initcall(tst_i2c_board_setup);

Creating user defined function for firebird 2.5 with c++builder 2010

I tried to create a simple user defined function (UDF) for Firebird 2.5 with C++ Builder 2010, but I can't get it to work in Firebird.
1. I created a DLL project with the default settings in C++ Builder 2010.
2. I added a unit with my example UDF, including "ibase.h" and "ib_util.h":
extern "C" __declspec(dllexport) int __stdcall MYFUNC ( int i )
{
int result = 2 * i;
return result;
}
3. I built the DLL FBUDFMBD.dll into the folder C:\Program Files (x86)\Firebird\Firebird_2_5\UDF.
4. I registered my UDF via IBExpert in a sample database with
DECLARE EXTERNAL FUNCTION F_MYFUNC
INTEGER
RETURNS INTEGER
ENTRY_POINT 'MYFUNC' MODULE_NAME 'FBUDFMBD';
5. Calling the UDF with
select F_MYFUNC( 3 ) from RDB$DATABASE;
results in the error message
Invalid token.
invalid request BLR at offset 36.
function F_MYFUNC is not defined.
module name or entrypoint could not be found.
With the tool GExperts - PE Information I can see my UDF as DLL-Export MYFUNC ordinal $1 and entry point $1538.
What am I doing wrong? Why can't Firebird register my DLL and its UDF correctly?
Is there anything in my DLL project I need to change regarding the default compiler options?
Thanks a lot! I got it working with your help.
Step 2: the corrected C++ code is:
extern "C" __declspec(dllexport) int MYFUNC ( int * val )
{
int result = 2 * *val;
return result;
}
Note that the input parameter is now passed by reference (as a pointer).
Step 4: Register the UDF in a Firebird 2.5 database with
DECLARE EXTERNAL FUNCTION F_MYFUNC
INTEGER
RETURNS INTEGER BY VALUE
ENTRY_POINT '_MYFUNC' MODULE_NAME 'FBUDFMBD';
Pay attention to the leading underscore in the function name!
Step 5: select F_MYFUNC( 3 ) from RDB$DATABASE; now works fine!
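For reference, here is the corrected function as plain C. It is a minimal sketch, assuming (as the working DECLARE EXTERNAL FUNCTION statement suggests) that Firebird passes the INTEGER argument by reference and expects the result by value because of RETURNS INTEGER BY VALUE. The UDF_EXPORT and UDF_CALL macros are our own names, added so the code also compiles outside C++ Builder.

```c
/* Export/calling-convention macros (hypothetical names, not from Firebird). */
#if defined(_WIN32)
#define UDF_EXPORT __declspec(dllexport)
#define UDF_CALL   __cdecl
#else
#define UDF_EXPORT
#define UDF_CALL
#endif

/* The INTEGER argument arrives as a pointer; the result is returned by value. */
UDF_EXPORT int UDF_CALL MYFUNC(int *val)
{
    /* Defensive check (our addition): never dereference a NULL argument. */
    if (val == 0)
        return 0;
    return 2 * *val;
}
```

With C++ Builder, the cdecl export gets a leading underscore, which is why the entry point is declared as '_MYFUNC'. Other compilers may decorate names differently, so it is worth checking the DLL's export table (e.g. with the PE information tool mentioned in the question) before writing ENTRY_POINT.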
In Delphi, you can declare the function as cdecl instead of stdcall,
i.e.
function ExisteBase(const aBase:PChar):Integer; cdecl;
not
function ExisteBase(const aBase:PChar):Integer; stdcall;
In C++ that would presumably be __cdecl.
I hope this helps a bit.

libpqxx postgresql utf8 strings

Is it possible to insert a UTF-8 (Unicode) string into a PostgreSQL database table?
pqxx::work tr(*_conn.get(), "notify");
std::stringstream ss;
ss << "INSERT INTO tbl (msg) VALUES ('" << msg << "');";
tr.exec(ss.str());
tr.commit();
I want the message content to be, for example, キエオイウカクケコサシスセソタチツテア. But the exec method expects a char string, not a wchar_t string. How can I encode a UTF-8 string to pass it into the query?
Additional question: how can I encode a UTF-8 string using the wchar_t type? I assume that wchar_t represents 2-byte characters, but a UTF-8 character may take up to 4 bytes (up to 6 under the original, pre-RFC 3629 definition).
It's possible to convert a wide-character string into UTF-8 like this:
#include <codecvt>
#include <locale>
std::wstring_convert<std::codecvt_utf8<wchar_t>> conv; // deprecated since C++17, but still available
std::string u8str = conv.to_bytes(msg);
or, on Windows, like this:
#include <windows.h>
std::wstring wmsg_text = L"キエオイウカクケコサシスセソタチツテア";
char buffer[100] = { 0 };
// leave room for the terminating NUL
WideCharToMultiByte(CP_UTF8, 0, wmsg_text.data(), (int)wmsg_text.size(), buffer, (int)sizeof(buffer) - 1, NULL, NULL);
Of course, after reading the string back from the database, you need to convert it back:
std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
std::wstring wmsg = conv.from_bytes(message);
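Regarding the additional question: a wchar_t holds one code unit (16 bits on Windows, 32 bits on Linux), while UTF-8 uses 1 to 4 bytes per code point, so the two are simply different encodings of the same characters. A minimal sketch of the UTF-8 encoding rules for a single code point; `utf8_encode` is a hypothetical helper, not part of libpqxx:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written (1-4), or 0 for invalid code points
 * (surrogates U+D800..U+DFFF and values above U+10FFFF).
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                                /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                               /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                   /* UTF-16 surrogates */
    if (cp < 0x10000) {                             /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                           /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, キ (U+30AD) encodes as the three bytes E3 82 AD, so each katakana character in the question takes three bytes in a std::string. libpqxx sends those bytes as they are, and the server interprets them according to the connection's client encoding, so this assumes that encoding is UTF8.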

SAS hash join in form LIKE or =:

Is it possible to do a SAS hash lookup on a partial substring?
So the hash table key will contain: 'LongString' but my target table key has: 'LongStr'
(the target table key string length may vary)
You can, but it's not pretty, and you may not get the performance benefits you're looking for. Also, depending on the length of your strings and the size of your table, you may not be able to fit all the hash table elements into memory.
The trick is to first generate all possible prefixes (leading substrings) of each key and then to use the 'multidata' option on the hash table.
Create a dataset containing words we want to match against:
data keys;
length key $10 value $1;
input key;
cards;
LongString
LongOther
;
run;
Generate all possible substrings:
data abbreviations;
length abbrev $10;
set keys;
do cnt=1 to length(key);
abbrev = substr(key,1,cnt);
output;
end;
run;
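The prefix-generation step above translates directly into C. A minimal sketch; `generate_prefixes` and `MAX_KEY` are illustrative names, with `MAX_KEY` mirroring `length key $10`:

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEY 10   /* mirrors "length key $10" in the SAS code */

/*
 * Write every leading substring of key (lengths 1..strlen(key)) into out,
 * like the "abbreviations" DATA step. Returns the number of prefixes written.
 */
size_t generate_prefixes(const char *key, char out[][MAX_KEY + 1], size_t max)
{
    size_t n = strlen(key);
    if (n > MAX_KEY)
        n = MAX_KEY;                 /* keys are at most $10 in the SAS code */
    size_t count = 0;
    for (size_t len = 1; len <= n && count < max; len++) {
        memcpy(out[count], key, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

This also shows the memory cost mentioned above: the lookup table holds one row per prefix, i.e. the sum of all key lengths (10 + 9 = 19 rows for LongString and LongOther).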
Create a dataset containing terms we want to search for:
data match_attempts;
length abbrev $10;
input abbrev ;
cards;
L
Long
LongO
LongSt
LongOther
;
run;
Perform the lookup:
data match;
length abbrev key $10;
set match_attempts;
if _n_ = 1 then do;
declare hash h1(dataset:'abbreviations', multidata: 'y');
h1.defineKey('abbrev');
h1.defineData('abbrev', 'key');
h1.defineDone();
call missing(abbrev, key);
end;
if h1.find() eq 0 then do;
output;
h1.has_next(result: r);
do while(r ne 0);
h1.find_next();
output;
h1.has_next(result: r);
end;
end;
drop r;
run;
Output (notice how 'Long' returns 2 matches):
Obs abbrev key
=== ========= ==========
1 Long LongString
2 Long LongOther
3 LongO LongOther
4 LongSt LongString
5 LongOther LongOther
A few more notes. The hash table cannot support something like the LIKE operator because it 'hashes' the key before inserting a record. When a lookup is performed, the lookup value is 'hashed' and then matched against the stored hashed values. Even a small change in a value yields a completely different hash. In the example below, hashing two almost identical strings yields two completely different values:
data _null_;
length hashed_value $16;
hashed_value = md5("String");
put hashed_value= hex32.;
hashed_value = md5("String1");
put hashed_value= hex32.;
run;
Output:
hashed_value=27118326006D3829667A400AD23D5D98
hashed_value=0EAB2ADFFF8C9A250BBE72D5BEA16E29
For this reason, the hash table cannot use the like operator.
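The same effect can be reproduced with any hash function. The following sketch uses 32-bit FNV-1a, a well-known non-cryptographic hash swapped in here purely for illustration; it is not necessarily what SAS uses internally.

```c
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime. */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis (0x811C9DC5) */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                  /* FNV prime (0x01000193), wraps mod 2^32 */
    }
    return h;
}
```

Because "LongStr" and "LongString" hash to unrelated values, a hash lookup can only find exact matches, which is why the first answer precomputes every prefix.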
Finally, thanks to @vasja for some sample data.
You have to use an iterator object (hiter) to loop through the keys and do the matching yourself.
data keys;
length key $10 value $1;
input key value;
cards;
LongString A
LongOther B
;
run;
proc sort data=keys;
by key;
run;
data data;
length short_key $10;
input short_key ;
cards;
LongStr
LongSt
LongOther
LongOth
LongOt
LongO
LongSt
LongOther
;
run;
data match;
set data;
length key $20 outvalue value $1;
drop key value rc;
if _N_ = 1 then do;
call missing(key, value);
declare hash h1(dataset:"work.keys", ordered: 'yes');
declare hiter iter ('h1');
h1.defineKey('key');
h1.defineData('key', 'value');
h1.defineDone();
end;
rc = iter.first();/* reset to beginning */
do while (rc = 0);/* loop through the long keys and find a match */
if index(key, trim(short_key)) > 0 then do;
outvalue = value;
iter.last(); /* leave after match */
end;
rc = iter.next();
end;
run;
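The iterator loop above amounts to a linear scan over the sorted keys. A minimal C sketch of the same logic; `find_containing` is an illustrative name, and `strstr` stands in for `index(key, trim(short_key)) > 0`:

```c
#include <stddef.h>
#include <string.h>

/*
 * Linear scan mirroring the hiter loop: return the value of the first key
 * (in sorted order) that contains short_key anywhere, or NULL if none does.
 */
const char *find_containing(const char *short_key,
                            const char *const keys[],
                            const char *const values[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strstr(keys[i], short_key) != NULL)
            return values[i];            /* stop at the first match, like iter.last() */
    }
    return NULL;                         /* no key contains short_key */
}
```

Each lookup costs time proportional to the number of keys, so in this approach the hash object serves only as an ordered container; the constant-time hash lookup is not used.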