Does Kyoto Cabinet support searching for a range of keys?
If so, what types of keys support range search?
Can I do range search on a long (64bit) key?
Thanks
RG
It supports key prefix queries; however, the efficiency of a prefix query depends on the internal storage structure. If you are using the hash database, it may not be a good idea, as keys and values are scattered around the underlying file.
Yes, for integers.
B+ tree database supports sequential access in order of the keys, which realizes forward matching search for strings and range search for integers - from docs
Yes you can, you just need a forward jump.
An example using C. It stores 5 records with 64-bit keys (from 1 to 5) and then applies a range filter (from 2 to 4):
#include <kclangc.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    KCDB *db;
    KCCUR *cur;
    char *kbuf;
    size_t ksiz, vsiz;
    const char *cvbuf;
    int64_t i, val, min, max;
    int64_t keys[] = {1, 2, 3, 4, 5};
    const char *values[] = {"one", "two", "three", "four", "five"};
    char i64[8]; /* A buffer to store byte sequences */

    /* create the database object */
    db = kcdbnew();

    /* open the database */
    if (!kcdbopen(db, "db64.kct", KCOWRITER | KCOCREATE)) {
        fprintf(stderr, "open error: %s\n", kcecodename(kcdbecode(db)));
        exit(EXIT_FAILURE);
    }

    /* store records */
    for (i = 0; i < 5; i++) {
        memcpy(i64, &keys[i], 8);
        if (!kcdbset(db, i64, 8, values[i], strlen(values[i]))) {
            fprintf(stderr, "set error: %s\n", kcecodename(kcdbecode(db)));
            exit(EXIT_FAILURE);
        }
    }

    /* traverse records in the range [min, max] */
    min = 2;
    max = 4;
    printf("Range from %" PRId64 " to %" PRId64 "\n", min, max);
    memcpy(i64, &min, 8);
    cur = kcdbcursor(db);
    kccurjumpkey(cur, i64, 8);
    while ((kbuf = kccurget(cur, &ksiz, &cvbuf, &vsiz, 1)) != NULL) {
        memcpy(&val, kbuf, 8);
        if (val > max) {
            kcfree(kbuf);
            break;
        }
        printf("Found %s\n", cvbuf);
        kcfree(kbuf);
    }
    kccurdel(cur);

    /* close the database */
    if (!kcdbclose(db)) {
        fprintf(stderr, "close error: %s\n", kcecodename(kcdbecode(db)));
    }

    /* delete the database object */
    kcdbdel(db);
    return 0;
}
LevelDB supports binary keys and range queries.
Edit: I forgot to mention that in order for range queries to work, the binary value needs to be packed in a comparable way. For your long example, you need to make sure it is big-endian encoded.
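For illustration, here is a minimal C sketch of packing a 64-bit key big-endian so that byte-wise (memcmp) order matches numeric order; the function name is made up, not part of LevelDB or Kyoto Cabinet:

#include <stdint.h>

/* Pack a 64-bit unsigned key most-significant-byte first, so that
   lexicographic comparison of the 8 bytes equals numeric comparison. */
static void pack_key_be(uint64_t key, unsigned char out[8])
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(key & 0xFF);
        key >>= 8;
    }
}

For signed keys you would additionally flip the sign bit before packing, so that negative values sort before positive ones.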
I realised that my MongoDB backend stores UUID data as BinData type 3 and I'm finding it hard to manually trace or query documents because the UUIDs encoded through my application seem to differ from what I can see in the database. I might have to consider migrating all these to type 4, but I'm not sure how.
For example the UUID b36148dd-e185-428d-94d9-35dacabfa635 would normally encode in base64 to s2FI3eGFQo2U2TXayr+mNQ==. However, it is represented in my MongoDB as jUKF4d1IYbM1pr/K2jXZlA== with BinData type 3 (BinData(3, "jUKF4d1IYbM1pr/K2jXZlA==")).
I tried creating a UUID from the given UUID string above:
> var uuid = UUID("b36148dde185428d94d935dacabfa635")
> uuid
BinData(3,"s2FI3eGFQo2U2TXayr+mNQ==")
So, if I understand it correctly, BinData(3, "s2FI3eGFQo2U2TXayr+mNQ==") is not the same as BinData(3, "jUKF4d1IYbM1pr/K2jXZlA=="). However, based on my tests, BinData(4, "s2FI3eGFQo2U2TXayr+mNQ==") (note the type 4) seems to translate to the same UUID as BinData(3, "jUKF4d1IYbM1pr/K2jXZlA=="). If I have a BinData type 3 object, how do I convert it to type 4 correctly? Another question: if I have a UUID string, how do I properly initialise a type 3 BinData?
If it is not supported by the driver, https://studio3t.com/knowledge-base/articles/mongodb-best-practices-uuid-data/#mongodb-best-practices has a Java example which can easily be translated to your target language:
/**
 * Convert a UUID object to a Binary with a subtype 0x04
 */
public static Binary toStandardBinaryUUID(java.util.UUID uuid) {
    long msb = uuid.getMostSignificantBits();
    long lsb = uuid.getLeastSignificantBits();
    byte[] uuidBytes = new byte[16];

    for (int i = 15; i >= 8; i--) {
        uuidBytes[i] = (byte) (lsb & 0xFFL);
        lsb >>= 8;
    }

    for (int i = 7; i >= 0; i--) {
        uuidBytes[i] = (byte) (msb & 0xFFL);
        msb >>= 8;
    }

    return new Binary((byte) 0x04, uuidBytes);
}

/**
 * Convert a Binary with a subtype 0x04 to a UUID object
 * Please note: the subtype is not being checked.
 */
public static UUID fromStandardBinaryUUID(Binary binary) {
    long msb = 0;
    long lsb = 0;
    byte[] uuidBytes = binary.getData();

    for (int i = 8; i < 16; i++) {
        lsb <<= 8;
        lsb |= uuidBytes[i] & 0xFFL;
    }

    for (int i = 0; i < 8; i++) {
        msb <<= 8;
        msb |= uuidBytes[i] & 0xFFL;
    }

    return new UUID(msb, lsb);
}
Basically type 3 differs from type 4 by byte order.
It's an internal data storage format, so it might be enough to change the way you explore your data; Robo 3T, for example, has options for which legacy UUID encoding to display.
UPDATE
A quick and dirty way to find documents matching a given UUID string using mongo shell:
var binData3 = UUID("b36148dde185428d94d935dacabfa635")
var binData4 = UUID("b36148dd-e185-428d-94d9-35dacabfa635")
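Outside the shell, the reordering itself is simple: the legacy (subtype 3) layout used by the Java driver stores each 8-byte half of the UUID in reverse order, which is exactly what the two base64 strings above show. A minimal C sketch for illustration (the function name is made up):

#include <stddef.h>

/* Convert 16 raw UUID bytes between the Java-driver legacy layout (subtype 3)
   and the standard layout (subtype 4) by reversing each 8-byte half.
   Applying it twice gives back the original bytes. */
static void swap_uuid_halves(unsigned char uuid[16])
{
    for (size_t i = 0; i < 4; i++) {
        unsigned char t;
        t = uuid[i];     uuid[i]     = uuid[7 - i];  uuid[7 - i]  = t;
        t = uuid[8 + i]; uuid[8 + i] = uuid[15 - i]; uuid[15 - i] = t;
    }
}

Note that the C# and Python drivers used different legacy layouts, so this particular swap only applies to data written by the Java driver.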
I am new to C and writing code to help with my data analysis. Part of it opens predetermined files.
This piece of code is giving me problems and I cannot understand why.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLOGGERS 26

// Declare the input files
char inputfile[];
char inputfile_hum[MAXLOGGERS][8];

// Declare the output files
char newfile[];
char newfile_hum[MAXLOGGERS][8];

int main()
{
    int n = 2;

    while (n > MAXLOGGERS)
    {
        printf("n error, n must be < %d: ", MAXLOGGERS);
        scanf("%d", &n);
    }

    // Initialize the input and output file names
    strncpy(inputfile_hum[1], "Ahum.csv", 8);
    strncpy(inputfile_hum[2], "Bhum.csv", 8);
    strncpy(newfile_hum[1], "Ahum.txt", 8);
    strncpy(newfile_hum[2], "Bhum.txt", 8);

    for (int i = 1; i < n + 1; i++)
    {
        strncpy(inputfile, inputfile_hum[i], 8);
        FILE* file1 = fopen(inputfile, "r");
        // Safety check
        while (file1 == NULL)
        {
            printf("\nError: %s == NULL\n", inputfile);
            printf("\nPress enter to exit:");
            getchar();
            return 0;
        }

        strncpy(newfile, newfile_hum[i], 8);
        FILE* file2 = fopen(newfile, "w");
        // Safety check
        if (file2 == NULL)
        {
            printf("Error: file2 == NULL\n");
            getchar();
            return 0;
        }

        for (int c = fgetc(file1); c != EOF; c = fgetc(file1))
        {
            fprintf(file2, "%c", c);
        }

        fclose(file1);
        fclose(file2);
    }

    // system("Ahum.txt");
    // system("Bhum.txt");
}
This code produces two files but instead of the names:
Ahum.txt
Bhum.txt
the files are named:
Ahum.txtv
Bhum.txtv
The reason I am using strncpy in the for loop is because n will actually be inputted by the user later.
I see at least three problems here.
The first problem is that your character array is too small for your strings.
"ahum.txt", etc. will need to take nine characters. Eight for the actual text plus one more for the null terminating character.
The second problem is that you have declared the character arrays "newfile" and "inputfile" as empty arrays. These also need to be a number able to contain the strings (at least 9).
You're lucky to have not had a crash from overwriting memory out the program space.
The third and final problem is your use of strcpy().
strncpy(dest, src, n) will copy n characters from src to dest, but it won't copy final null terminator character if n is equal or less than size of the src string.
From strncpy() manpage: https://linux.die.net/man/3/strncpy
The strncpy() function ... at most n bytes of src are copied.
Warning: If there is no null byte among the first n bytes of src,
the string placed in dest will not be null-terminated.
Normally what you would want to do is have "n" be the size of the destination buffer minus 1 to allow for the null character.
For example:
strncpy(dest, src, sizeof(dest) - 1); // assuming dest is char array
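Since strncpy() still leaves dest unterminated when src fills the whole buffer, a common follow-up (my addition, not part of the answer above) is to terminate explicitly:

char dest[9];
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0'; /* guarantee termination even if src is too long */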
There are a couple of problems with your code.
inputfile_hum and newfile_hum need to be one char bigger for the trailing '\0' on strings.
char inputfile_hum[MAXLOGGERS][9];
...
char newfile_hum[MAXLOGGERS][9];
strncpy expects the first argument to be a char * region big enough to hold the expected result, so inputfile[] and newfile[] need to be declared:
char inputfile[9];
char newfile[9];
I found this in some existing code. It looks like it has some problems, but it works fine. Can you help me figure out whether there is anything tricky going on in this piece of code?
Why does it ignore two unsigned fields when calculating the size of the structure?
tmsg_sz = sizeof(plfm_xml_header_t) + sizeof(oid_t) + sizeof(char*)
          + sizeof(unsigned) + sizeof(snmp_varbind_t)*5;
tmsg = (snmp_trap_t*) malloc(tmsg_sz);
if (!tmsg) {
    PRINTF("malloc failed \n");
    free(trap_msg);
    return -1;
}
memset(tmsg, 0, tmsg_sz);
tmsg->hdr.type = PLFM_SNMPTRAP_MSG;
copy_oid_oidt(clog_msg_gen_notif_oid, OID_LENGTH(clog_msg_gen_notif_oid), &tmsg->oid);
tmsg->trap_type = SNMP_TRAP_ENTERPRISESPECIFIC;
tmsg->trap_specific = 1;
tmsg->trapmsg = strdup("Trap Message");
tmsg->numofvar = 5;
build_snmp_varbind(&(tmsg->vars[0]), facility, STR_DATA_TYPE, sizeof(facility)+1, clog_hist_facility_oid, 14);
build_snmp_varbind(&(tmsg->vars[1]), &sev, U32_DATA_TYPE, sizeof(sev),clog_hist_severity_oid, 14);
build_snmp_varbind(&(tmsg->vars[2]), name, STR_DATA_TYPE, sizeof(name)+1, clog_hist_msgname_oid, 14);
build_snmp_varbind(&(tmsg->vars[3]), trap_msg, STR_DATA_TYPE, strlen(trap_msg)+1,clog_hist_msgtext_oid, 14);
// get system uptime
long uptime = get_uptime();
build_snmp_varbind(&(tmsg->vars[4]), (long*)&uptime, TMR_DATA_TYPE, sizeof(uptime),clog_hist_timestamp_oid, 14);
typedef struct snmp_trap_s {
    plfm_xml_header_t hdr;
    oid_t             oid;            /* trap oid */
    unsigned          trap_type;
    unsigned          trap_specific;
    char             *trapmsg;        /* text message for this trap */
    unsigned          numofvar;
    snmp_varbind_t    vars[0];
} __attribute__((__packed__)) snmp_trap_t;
Compilers try hard to put multibyte data aligned in various ways. For example, an int variable, in an architecture where sizeof int == 4, may need to be placed in a location divisible by 4. This may be a hard requirement, or this may just make the system more efficient; it depends on the computer. So, consider
typedef struct combo {
    char c;
    int i;
} combo;

Depending on the architecture, sizeof combo may be 5, 6, or most often 8. Swap the two members and only 5 bytes of data remain, so you might expect the size to shrink:

typedef struct combo2 {
    int i;
    char c;
} combo2;
However, an array of combo2s may have a size you do not expect:
combo2 cb[2];
The size of cb could very well be 16: 3 bytes of padding follow combo2[0].c and combo2[1].c (and that padding is counted in sizeof combo2 itself), so that combo2[1].i starts at a location divisible by 4.
A recommendation is to order the members of a structure by size; the 8-byte members should precede the 4-byte members, then the 2-byte members, then the 1-byte members. Of course, you have to be aware of typical sizes, and you can't be working on an oddball architecture where characters are not packed into larger words. Cray? cough-cough.
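To see this concretely, here is a small check using the combo/combo2 structs from above (the exact numbers depend on the compiler and target):

#include <stdio.h>

typedef struct combo  { char c; int i; } combo;
typedef struct combo2 { int i; char c; } combo2;

int main(void)
{
    combo2 cb[2];
    /* On a typical compiler with 4-byte int and 4-byte alignment, both
       structs report 8 and the array reports 16; the "missing" 3 bytes
       per element are padding. */
    printf("sizeof(combo)  = %zu\n", sizeof(combo));
    printf("sizeof(combo2) = %zu\n", sizeof(combo2));
    printf("sizeof(cb)     = %zu\n", sizeof cb);
    return 0;
}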
I'm trying to draw a line, broken out into segments dependent on values. For example, if there are 5 fields, and all 5 fields were true, then my Line would look like
-----
If say the first and last fields were true, and everything else would be false, then it would be
- -
I thought I could do this with a bit mask of some sort. First of all, I've never done a bit mask before, but I think I've seen them here and there. I was wondering how I could go about this, and use enumerations instead of 1/0 for readability. As far as I can see from my data, I would only need values of either 1 or 0 for the different properties. However, it would be good to know how to have one of the values be a three level or higher enumeration for future reference. Thanks!
Trying to do something like:
enum CodingRegions {
    Coding = 0x01,
    NonCoding = 0x02
};

enum Substitution {
    Synonymous = 0x04,
    NonSynonmous = 0x05
};
Then based on the value of the object, I could do
bitmask???? = object.CodingRegion | object.Substitution;
Then later, check the value of the bitmask somehow, and then draw the line accordingly based on what the values are.
Not sure exactly what your requirements are, but here is one way it might be written in C:
#include <stdio.h>

typedef enum MyField_ {
    hasWombat      = 1 << 0,
    hasTrinket     = 1 << 1,
    hasTinRoof     = 1 << 2,
    hasThreeWheels = 1 << 3,
    myFieldEnd     = 1 << 4,
} MyField;

void printMyField(MyField data) {
    MyField field = 1;
    while (field != myFieldEnd) {
        printf("%c", data & field ? '-' : ' ');
        field <<= 1;
    }
    printf("\n");
}

int main() {
    MyField data = hasTrinket | hasThreeWheels;
    printMyField(data);

    data |= hasWombat;    // set a field
    data &= ~hasTrinket;  // clear a field
    printMyField(data);

    return 0;
}
Not sure this is what you want, but:
// assumed Coding/NonCoding, Synonymous/NonSynonymous are opposites of each other. If not, add more bit fields
enum CodingRegions
{
    Coding = 1 << 0
} ;

enum Substitution
{
    Synonymous = 1 << 1
} ;

void PrintBitmask( NSUInteger bitmask )
{
    printf( "%s", ( bitmask & Coding ) != 0 ? "-" : " " ) ;
    printf( "%s", ( bitmask & Synonymous ) != 0 ? "-" : " " ) ;
    printf( "\n" ) ;
}
Your PrintBitmask() could also look like this:
void PrintBitmask( NSUInteger bitmask )
{
    printf( "%s", ( bitmask & Coding ) != 0 ? "Coding" : "Noncoding" ) ;
    printf( "|" ) ;
    printf( "%s", ( bitmask & Synonymous ) != 0 ? "Synonymous" : "Nonsynonymous" ) ;
    printf( "\n" ) ;
}
/* I prefer macros over enums (at least for something this simple) */
#include <stdio.h>

#define SPACE 0x0
#define DASH  0x1

int main(void) {
    /* input fields */
    int fields[5] = {DASH, SPACE, SPACE, SPACE, DASH};
    int mask = 0;

    /* create bitmask */
    for (int i = 0; i < 5; i++) {
        mask |= (fields[i] << i);
    }

    /* interpret bitmask and print the line */
    for (int i = 0; i < 5; i++) {
        if (mask & (1 << i)) {
            printf("%c", '-');
        } else {
            printf("%c", ' ');
        }
    }
    printf("\n");
    return 0;
}
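For the three-level (or wider) values the question mentions, one option is to reserve more than one bit per field and extract it with a shift and mask. A minimal sketch with made-up names (not taken from the answers above):

#include <stdio.h>

/* Field 0: 1 bit (coding / non-coding).  Field 1: 2 bits, room for up to
   four substitution states.  Names and layout are illustrative only. */
#define CODING_SHIFT        0
#define CODING_MASK         (0x1u << CODING_SHIFT)
#define SUBSTITUTION_SHIFT  1
#define SUBSTITUTION_MASK   (0x3u << SUBSTITUTION_SHIFT)

enum Substitution { Synonymous = 0, NonSynonymous = 1, Nonsense = 2 };

int main(void)
{
    unsigned mask = 0;
    mask |= CODING_MASK;                                /* set the 1-bit field */
    mask |= ((unsigned)Nonsense << SUBSTITUTION_SHIFT); /* store a 3-level value */

    unsigned sub = (mask & SUBSTITUTION_MASK) >> SUBSTITUTION_SHIFT;
    printf("coding=%u substitution=%u\n", (mask & CODING_MASK) ? 1u : 0u, sub);
    return 0;
}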
Imagine a std::vector, say, with 100 things in it (0 to 99) currently. You are treating it as a loop. So the 105th item is index 4; forward 7 from index 98 is index 5.
You want to delete N items after index position P.
So, delete 5 items after index 50; easy.
Or 5 items after index 99: you delete index 0 five times (or indices 4 down to 0), noting that position 99 will then be erased from existence.
Worst, 5 items after index 97 - you have to deal with both modes of deletion.
What's the elegant and solid approach?
Here's a boring routine I wrote
-(void)knotRemovalHelper:(NSMutableArray*)original
                   after:(NSInteger)nn howManyToDelete:(NSInteger)desired
{
    #define ORCO ((NSInteger)[original count])
    static NSInteger kount, howManyUntilLoop, howManyExtraAferLoop;

    if ( ... our array is NOT a loop ... )
        // trivial, if messy...
    {
        for ( kount = 1; kount<=desired; ++kount )
        {
            if ( (nn+1) >= ORCO )
                return;
            [original removeObjectAtIndex:( nn+1 )];
        }
        return;
    }
    else // our array is a loop
        // messy, confusing and inelegant. how to improve?
        // here we go...
    {
        howManyUntilLoop = (ORCO-1) - nn;
        if ( howManyUntilLoop > desired )
        {
            for ( kount = 1; kount<=desired; ++kount )
                [original removeObjectAtIndex:( nn+1 )];
            return;
        }

        howManyExtraAferLoop = desired - howManyUntilLoop;
        for ( kount = 1; kount<=howManyUntilLoop; ++kount )
            [original removeObjectAtIndex:( nn+1 )];
        for ( kount = 1; kount<=howManyExtraAferLoop; ++kount )
            [original removeObjectAtIndex:0];
        return;
    }
    #undef ORCO
}
Update!
Invariant's second answer leads to the following excellent solution: "starting with" is much better than "starting after", so the routine now uses "start with". It gives this very simple solution...
N times do if P < currentsize remove P else remove 0
-(void)removeLoopilyFrom:(NSMutableArray*)ra
     startingWithThisOne:(NSInteger)removeThisOneFirst
         howManyToDelete:(NSInteger)countToDelete
{
    // exception if removeThisOneFirst > ra highestIndex
    // exception if countToDelete is > ra size
    // so easy thanks to Invariant:
    for ( NSInteger i = 0; i < countToDelete; ++i )
    {
        if ( removeThisOneFirst < [ra count] )
            [ra removeObjectAtIndex:removeThisOneFirst];
        else
            [ra removeObjectAtIndex:0];
    }
}
Update!
Toolbox has pointed out the excellent idea of working to a new array - super KISS.
Here's an idea off the top of my head.
First, generate an array of integers representing the indices to remove. So "remove 5 from index 97" would generate [97,98,99,0,1]. This can be done with the application of a simple modulus operator.
Then, sort this array descending giving [99,98,97,1,0] and then remove the entries in that order.
Should work in all cases.
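For illustration, a C sketch of that idea on a plain int array; the helper names are made up, and with NSMutableArray you would call removeObjectAtIndex: in the final loop instead:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sort helper: descending order for size_t indices. */
static int desc(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x < y) - (x > y);
}

/* Remove `count` elements from `arr` (length *len), wrapping, starting at `start`. */
static void remove_circular(int *arr, size_t *len, size_t start, size_t count)
{
    size_t idx[count];
    for (size_t i = 0; i < count; i++)
        idx[i] = (start + i) % *len;        /* e.g. start=97, count=5 -> 97,98,99,0,1 */
    qsort(idx, count, sizeof idx[0], desc); /* -> 99,98,97,1,0 */
    for (size_t i = 0; i < count; i++) {    /* remove from the back so earlier indices stay valid */
        memmove(&arr[idx[i]], &arr[idx[i] + 1], (*len - idx[i] - 1) * sizeof arr[0]);
        (*len)--;
    }
}

int main(void)
{
    int a[10];
    size_t n = 10;
    for (size_t i = 0; i < n; i++) a[i] = (int)i;
    remove_circular(a, &n, 7, 5);                        /* removes 7,8,9,0,1 */
    for (size_t i = 0; i < n; i++) printf("%d ", a[i]);  /* prints: 2 3 4 5 6 */
    printf("\n");
    return 0;
}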
This solution seems to work, and it copies all remaining elements in the vector only once (to their final destination).
Assume kNumElements, kStartIndex, and kNumToRemove are defined as const size_t values.
vector<int> my_vec(kNumElements);
for (size_t i = 0; i < my_vec.size(); ++i) {
    my_vec[i] = i;
}

for (size_t i = 0, cur = 0; i < my_vec.size(); ++i) {
    // What is the "distance" from the current index to the start, taking
    // into account the wrapping behavior?
    size_t distance = (i + kNumElements - kStartIndex) % kNumElements;
    // If it's not one of the ones to remove, then we keep it by copying it
    // into its proper place.
    if (distance >= kNumToRemove) {
        my_vec[cur++] = my_vec[i];
    }
}
my_vec.resize(kNumElements - kNumToRemove);
There's nothing wrong with two loop solutions as long as they're readable and don't do anything redundant. I don't know Objective-C syntax, but here's the pseudocode approach I'd take:
if (Len <= after + howManyToDelete)       // will need a second loop
    firstpass = Len - after - 1;          // handle the end in the first loop, the beginning in the second
else
    firstpass = howManyToDelete;          // the first loop will get them all

for (kount = 0; kount < firstpass; kount++)
    remove after+1
for ( ; kount < howManyToDelete; kount++) // if firstpass < howManyToDelete, clean up leftovers
    remove 0
This solution doesn't use mod, does the limit calculation outside the loop, and touches the relevant samples once each. The second for loop won't execute if all the samples were handled in the first loop.
The common way to do this in DSP is with a circular buffer. This is just a fixed length buffer with two associated counters:
//make sure BUFSIZE is a power of 2 for quick mod trick
#define BUFSIZE 1024
int CircBuf[BUFSIZE];
int InCtr, OutCtr;
void PutData(int *Buf, int count) {
    int srcCtr;
    int destCtr = InCtr & (BUFSIZE - 1); // if BUFSIZE is a power of 2, equivalent to and faster than destCtr = InCtr % BUFSIZE

    for (srcCtr = 0; (srcCtr < count) && (destCtr < BUFSIZE); srcCtr++, destCtr++)
        CircBuf[destCtr] = Buf[srcCtr];
    for (destCtr = 0; srcCtr < count; srcCtr++, destCtr++) // wrap around to the start of the buffer
        CircBuf[destCtr] = Buf[srcCtr];

    InCtr += count;
}

void GetData(int *Buf, int count) {
    int srcCtr = OutCtr & (BUFSIZE - 1);
    int destCtr = 0;

    for (destCtr = 0; (srcCtr < BUFSIZE) && (destCtr < count); srcCtr++, destCtr++)
        Buf[destCtr] = CircBuf[srcCtr];
    for (srcCtr = 0; destCtr < count; srcCtr++, destCtr++) // wrap around to the start of the buffer
        Buf[destCtr] = CircBuf[srcCtr];

    OutCtr += count;
}

int BufferOverflow() {
    return ((InCtr - OutCtr) > BUFSIZE);
}
This is pretty lightweight, but effective. And aside from the ctr = BigCtr & (SIZE-1) stuff, I'd argue it's highly readable. The only reason for the & trick is in old DSP environments, mod was an expensive operation so for something that ran often, like every time a buffer was ready for processing, you'd find ways to remove stuff like that. And if you were doing FFT's, your buffers were probably a power of 2 anyway.
These days, of course, you have 1 GHz processors and magically resizing arrays. You kids get off my lawn.
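For completeness, a quick usage sketch that could be appended to the code above; it assumes count never exceeds BUFSIZE, as the answer implies:

#include <stdio.h>

int main(void)
{
    int in[6] = {1, 2, 3, 4, 5, 6};
    int out[4];

    PutData(in, 6);  /* write 6 samples into the circular buffer */
    GetData(out, 4); /* read 4 back: 1 2 3 4; 2 samples remain buffered */
    printf("pending: %d, overflow: %d\n", InCtr - OutCtr, BufferOverflow());
    return 0;
}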
Another method:
N times do {remove entry at index P mod max(ArraySize, P)}
Example:
N=5, P=97, ArraySize=100
1: max(100, 97)=100 so remove at 97%100 = 97
2: max(99, 97)=99 so remove at 97%99 = 97 // array size is now 99
3: max(98, 97)=98 so remove at 97%98 = 97
4: max(97, 97)=97 so remove at 97%97 = 0
5: max(96, 97)=97 so remove at 97%97 = 0
I don't program for the iPhone, so I'll imagine std::vector; it's quite easy, simple and elegant enough:
#include <iostream>
using std::cout;
#include <vector>
using std::vector;
#include <cassert> // no need for using, assert is a macro

template<typename T>
void eraseCircularVector(vector<T> & vec, size_t position, size_t count)
{
    assert(count <= vec.size());
    if (count > 0)
    {
        position %= vec.size(); // normalize position
        size_t positionEnd = (position + count) % vec.size();
        if (positionEnd < position)
        {
            vec.erase(vec.begin() + position, vec.end());
            vec.erase(vec.begin(), vec.begin() + positionEnd);
        }
        else
            vec.erase(vec.begin() + position, vec.begin() + positionEnd);
    }
}

int main()
{
    vector<int> values;
    for (int i = 0; i < 10; ++i)
        values.push_back(i);

    cout << "Values: ";
    for (vector<int>::const_iterator cit = values.begin(); cit != values.end(); cit++)
        cout << *cit << ' ';
    cout << '\n';

    eraseCircularVector(values, 5, 1);  // remains 9: 0,1,2,3,4,6,7,8,9
    eraseCircularVector(values, 16, 5); // remains 4: 3,4,6,7

    cout << "Values: ";
    for (vector<int>::const_iterator cit = values.begin(); cit != values.end(); cit++)
        cout << *cit << ' ';
    cout << '\n';

    return 0;
}
However, you might consider:
creating a new loop_vector class, if you use this kind of functionality often enough
using a list if you perform many deletions (or only a few, but on a large array; deleting only from the end is a simple pop_back)
If your container (NSMutableArray or whatever) is not a list but a vector (i.e. a resizable array), you most definitely don't want to delete items one by one, but as a whole range (e.g. std::vector's erase(begin, end))!
Edit: reacting to a comment, to fully realize what a vector must do if you erase any element other than the last one: it has to copy all values after that element (e.g. with 1000 items in the array, erasing the first one means 999 copies (moves) of items, which is very costly).
Example:
#include <iostream>
#include <vector>
#include <ctime>
using namespace std;

int main()
{
    clock_t start, end;
    vector<int> vec;
    const int items = 64 * 1024;
    cout << "using " << items << " items in vector\n";

    for (size_t i = 0; i < items; ++i) vec.push_back(i);
    start = clock();
    while (!vec.empty()) vec.erase(vec.begin());
    end = clock();
    cout << "Inefficient method took: "
         << (end - start) * 1.0 / CLOCKS_PER_SEC << " s\n";

    for (size_t i = 0; i < items; ++i) vec.push_back(i);
    start = clock();
    vec.erase(vec.begin(), vec.end());
    end = clock();
    cout << "Efficient method took: "
         << (end - start) * 1.0 / CLOCKS_PER_SEC << " s\n";

    return 0;
}
Produces output:
using 65536 items in vector
Inefficient method took: 1.705 s
Efficient method took: 0 s
Note that it's very easy to be inefficient; have a look, for example, at http://www.cplusplus.com/reference/stl/vector/erase/