MongoDB: How can I add a new hash field directly from the console?

I have objects like:
{ "_id" : ObjectId( "4e00e83608146e71e6edba81" ),
....
"text" : "Text now exists in the database"}
and I can add hash fields through java using the com.mongodb.util.Hash.longHash method to create
{ "_id" : ObjectId( "4e00e83608146e71e6edba81" ),
....
"text" : "Text now exists in the database",
"tHash" : -4375633875013353634 }
But this is quite slow. I would like to be able to do something within the database like:
db.foo.find( {} ).forEach( function (x) {
x.tHash = someFunction(x.text); // create a long hash compatible with com.mongodb.util.Hash.longHash
db.foo.save(x);
});
Does anyone know how I can call this long hash within the Javascript function?

First define a nice hashCode function to use. JavaScript does not have a hashCode function by default on all objects so you will need to write one. Or just use this one:
var hashCode = function(s) {
if (s == null) return 0;
if (s.length == 0) return 1;
var hash = 0;
for (var i = 0; i < s.length; i++) {
hash = ((hash << 5) - hash) + s.charCodeAt(i);
hash = hash & hash; // Convert to 32bit integer
}
return hash;
};
Alternatively use another hash function like MD5 - there are scripts that can generate them for you.

I gave up trying to replicate the Mongo Java driver's Hash.longHash method in JavaScript, since JavaScript stores every number as a 64-bit float and does not overflow the way Java's integer types do. I found some examples that replicate Java's String.hashCode in JavaScript, so I did this:
longHash = function(s){
var hash = 0;
if (s.length == 0) return hash;
for (var i = 0; i < s.length; i++) {
var ch = s.charCodeAt(i); // "char" is a reserved word in older JavaScript engines
hash = ((hash<<5)-hash)+ch;
hash = hash & hash; // Convert to 32bit integer
}
return NumberInt(hash);
};
db.foo.find( {} ).forEach( function (x) {
x.cHash = longHash(x.c);
db.foo.save(x);
});
which at least lets me compute an integer-level hash code on the existing data. This will be enough to narrow down data for indexing.
Update: I changed the function to return a NumberInt instead. By default the hash was a JavaScript number and was stored in Mongo as a Double, taking much more space than necessary. NumberInt is a 32-bit signed integer, and NumberLong is the 64-bit version.
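To check that the 32-bit wrap-around behaves like Java's, here is a minimal standalone sketch of the same loop. The name javaStringHash is only illustrative, and the function returns a plain number so it can run outside the mongo shell; inside the shell you would wrap the result in NumberInt as above. It mirrors Java's String.hashCode, not the driver's Hash.longHash.

```javascript
// Sketch: 32-bit string hash that mirrors Java's String.hashCode().
// javaStringHash is a hypothetical name, not a MongoDB or driver function.
function javaStringHash(s) {
    var hash = 0;
    for (var i = 0; i < s.length; i++) {
        // (hash << 5) - hash equals hash * 31; "| 0" wraps to a signed 32-bit integer
        hash = ((hash << 5) - hash + s.charCodeAt(i)) | 0;
    }
    return hash;
}

var h = javaStringHash("hello"); // 99162322, the same value as "hello".hashCode() in Java
```

Because every intermediate value is forced back into the signed 32-bit range, the result always fits in a NumberInt.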

CS50 pset 3: Tideman sort_pairs function

I need some assistance in understanding the logic behind this function. This is my current sort_pairs function in Tideman:
// Sort pairs in decreasing order by the strength of victory
void sort_pairs(void)
{
qsort(pairs, pair_count, sizeof(pair), compare);
return;
}
// Function for sort_pairs
int compare(const void *a, const void *b)
{
const pair *p1 = (const pair *) a;
const pair *p2 = (const pair *) b;
if (p1->winner < p2->winner)
{
return -1;
}
else if (p1->winner > p2->winner)
{
return 1;
}
else
{
return 0;
}
}
This does not clear check50 and I looked online to find how to approach this problem. It seems that most functions compare the values from the preferences array instead (eg preferences[pairs[i].winner][pairs[i].loser]) . My previous functions vote, record_preferences, and add_pairs all clear check50. I have not advanced beyond sort_pairs yet.
Why can't I compare the strength of victory directly from the pairs array instead since I already have the data stored there?
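Your compare function orders the pairs by the winner's candidate index, not by strength of victory. The pair struct stores only the two candidate indices, so the strength has to be looked up in preferences. You don't need to make this so complex, either; you can write your own sort here. Let's try a simple insertion sort:
void sort_pairs(void)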
{
pair temp;
for (int i = 1, j; i < pair_count; i++)
{
temp = pairs[i];
j = i - 1;
for (; j >= 0 && preferences[pairs[j].winner][pairs[j].loser] < preferences[temp.winner][temp.loser]; j--)
{
pairs[j + 1] = pairs[j];
}
pairs[j + 1] = temp;
}
}
The pair struct looks like this:
typedef struct
{
int winner;
int loser;
}
pair;
Explanation:
We go through each element of the pairs array, starting at index 1 since each element is compared with the ones before it (j = i - 1).
Then we walk back through the previous elements and compare each of them with the current one by the key preferences[pairs[INDEX].winner][pairs[INDEX].loser].
This is the key you should be sorting by. preferences[WINNER_ID][LOSER_ID] is the number of voters who prefer WINNER_ID over LOSER_ID.
And that's pretty much it! It's simply an insertion sort, but the key is the important part.

How can this repeated string concatenation function work?

using UnityEngine;
using System.Collections;
public class NewMonoBehaviour1 : MonoBehaviour
{
void ConcatExample(int[] intArray)
{
string line = intArray[0].ToString(); // the line is the var of the first in array
for(i =1;i <intArray.Length; i++) // the length is unknown ?
{
line += ", " + intArray[i].ToString(); //
}
return line;
//each time allocate new in original place
}
}
How can this function work? The length of the array is unknown, so how does the for loop work? Besides, this is a void function, so it shouldn't return anything, right? Or is there some exceptional case? Finally, according to the Unity manual, the function will keep producing a new string with new contents each time, which consumes a large amount of memory. Why? Thanks.
What makes you think that the Length should be unknown? It is a property that every array has:
Gets the total number of elements in all the dimensions of the Array.
It is certainly known at the moment you call your method with an actual array as the argument.
The return line; will not even compile since, as you say, the method is of type void and so cannot return a value. It should probably be declared as private string ConcatExample.
As for what the Unity manual means (I don't know where exactly you read this), it comes down to this line:
line += ", " + intArray[i].ToString();
Under the hood, every string in C# is immutable, much like a read-only char[]. So every time you concatenate strings via stringC = stringA + stringB, what happens is roughly equivalent to
char[] stringC = new char[stringA.Length + stringB.Length];
for(var iA = 0; iA < stringA.Length; iA++)
{
stringC[iA] = stringA[iA];
}
for(var iB = 0; iB < stringB.Length; iB++)
{
stringC[iB + stringA.Length] = stringB[iB];
}
So whenever you build strings in a loop, especially over large data, it is strongly recommended to use a StringBuilder (from System.Text) instead, like
private string ConcatExample(int[] intArray)
{
// ToString() matters here: new StringBuilder(int) would set the capacity, not the initial text
var builder = new StringBuilder(intArray[0].ToString());
for (int i = 1; i < intArray.Length; i++)
{
builder.Append(", ").Append(intArray[i].ToString());
}
return builder.ToString();
}
The length of the array will be the length of the array of ints you pass into the function as an argument.
Say you pass it
int[] ints = {1, 2, 3};
ConcatExample(ints); // the length of the array is now 3
Add a Debug.Log() call to the ConcatExample method:
void ConcatExample(int[] intArray)
{
string line = intArray[0].ToString();
for (int i = 1; i < intArray.Length; i++)
{
line += ", " + intArray[i].ToString(); //
Debug.Log(line);
}
}
Debug.Log would produce the following in the console:
1, 2
1, 2, 3
And finally, the return line; at the end would just result in an error because, as you say, a void method returns nothing.
This function cannot work unless it gets the data it expects. Passing null to it, for example, would throw a NullReferenceException at runtime. Passing a valid integer array of length zero would throw an IndexOutOfRangeException on the first line.
You are correct: a void function returns nothing, so the return here is pointless. In fact, I would expect return line; to generate a compiler error.
The string type appears "dynamic", meaning it will allocate more and more memory as needed. Technically, it is the string "+" operator (a function that takes two strings as parameters) that allocates this space. It returns a new string of the appropriate size. The garbage collector will deallocate "old" strings once they are no longer referenced by any variable.

Support for basic datatypes in H5Attributes?

I am trying out the beta HDF5 toolkit of ILNumerics.
Currently I see that H5Attributes support only ILNumerics arrays. Is there any plan to extend this to basic datatypes (such as string) as part of the final release?
Do the ILNumerics H5 wrappers provide any way to extend functionality for a particular datatype?
ILNumerics internally uses the official HDF5 libraries from the HDF Group. H5Attributes in HDF5 correspond to datasets, with the limitation that they do not support partial I/O. Besides that, H5Attributes are plain arrays. Basic (scalar) element types are supported by treating the stored array as a scalar.
Strings are a completely different story: strings are in general variable-length datatypes. In terms of HDF5, strings are arrays of element type Char. The number of characters in the string determines the length of the array. In order to store a string in a dataset or attribute, you have to store its individual characters as elements of the array. In ILNumerics, you can convert your string into an ILArray<Char> or ILArray<byte> (for ASCII data) and store that in the dataset/attribute.
Please consult the following test case, which stores a string as the value of an attribute and reads the content back into a string.
Disclaimer: This is part of our internal test suite. You will not be able to compile the example directly, since it depends on several functions which may not be available to you. However, it shows how to store strings in datasets and attributes:
public void StringASCIAttribute() {
string file = "deleteA0001.h5";
string val = "This is a long string to be stored into an attribute.\r\n";
// transfer string into ILArray<Char>
ILArray<Char> A = ILMath.array<Char>(' ', 1, val.Length);
for (int i = 0; i < val.Length; i++) {
A.SetValue(val[i], 0, i);
}
// store the string as attribute of a group
using (var f = new H5File(file)) {
f.Add(new H5Group("grp1") {
Attributes = {
{ "title", A }
}
});
}
// check by reading back
// read back
using (var f = new H5File(file)) {
// must exist in the file
Assert.IsTrue(f.Get<H5Group>("grp1").Attributes.ContainsKey("title"));
// check size
var attr = f.Get<H5Group>("grp1").Attributes["title"];
Assert.IsTrue(attr.Size == ILMath.size(1, val.Length));
// read back
ILArray<Char> titleChar = attr.Get<Char>();
ILArray<byte> titleByte = attr.Get<byte>();
// compare byte values (sum)
int origsum = 0;
foreach (var c in val) origsum += (Byte)c;
Assert.IsTrue(ILMath.sumall(ILMath.toint32(titleByte)) == origsum);
StringBuilder title = new StringBuilder(attr.Size[1]);
for (int i = 0; i < titleChar.Length; i++) {
title.Append(titleChar.GetValue(i));
}
Assert.IsTrue(title.ToString() == val);
}
}
This stores arbitrary strings as 'Char-array' into HDF5 attributes and would work just the same for H5Dataset.
As an alternative solution you may use HDF5DotNet (http://hdf5.net/default.aspx) wrapper to write attributes as strings:
H5.open();
Uri destination = new Uri(@"C:\yourFileLocation\FileName.h5");
//Create an HDF5 file
H5FileId fileId = H5F.create(destination.LocalPath, H5F.CreateMode.ACC_TRUNC);
//Add a group to the file
H5GroupId groupId = H5G.create(fileId, "groupName");
string myString = "String attribute";
byte[] attrData = Encoding.ASCII.GetBytes(myString);
//Create an attribute of type STRING attached to the group
H5AttributeId attrId = H5A.create(groupId, "attributeName", H5T.create(H5T.CreateClass.STRING, attrData.Length),
H5S.create(H5S.H5SClass.SCALAR));
//Write the string into the attribute
H5A.write(attrId, H5T.create(H5T.CreateClass.STRING, attrData.Length), new H5Array<byte>(attrData));
H5A.close(attrId);
H5G.close(groupId);
H5F.close(fileId);
H5.close();

Querying Mongodb Key and Value using C driver

mongo_cursor *cursor=mongo_find(conn,TEST_NS,query,NULL,0,0,0);
count_matched=0;
bson *doc;
while(mongo_cursor_next(cursor)==MONGO_OK)
{
count_matched++;
doc=(bson *)mongo_cursor_bson(cursor);
bson_iterator_init(&it,doc);
while(bson_iterator_next(&it) != BSON_EOO)
{
fprintf(stderr,"%s : %s\n\n",bson_iterator_key(&it),bson_iterator_string(&it));
}
}
This code is working perfectly and I can see the matched documents (key + value), but now I want to save each matched document's keys and values into strings. Can anyone tell me how to save the values returned for the key and the value into a string?
One document includes (all strings):
Total keys = 10
Total values = 10
and I want to save the keys and values of 10 documents at one time. I am using the C driver for MongoDB.
The following code shows how to copy the keys and values from the bson iterator into your key-value arrays temp_key and temp_value. The relevant block of code is between the comments marked START and END.
Additionally, you can find documentation for accessing BSON document contents at http://api.mongodb.org/c/current/bson.html .
mongo_cursor *cursor = mongo_find(&conn, TEST_NS, &query, NULL, 0, 0, 0);
int count_matched = 0;
bson *doc;
// Assuming you collect at most 100 key/value pairs, each at most 105 characters long
const unsigned KV_ARRAY_LENGTH = 100;
const unsigned MAX_KV_LENGTH = 105;
char temp_key[KV_ARRAY_LENGTH][MAX_KV_LENGTH + 1], temp_value[KV_ARRAY_LENGTH][MAX_KV_LENGTH + 1];
int i = 0;
while (mongo_cursor_next(cursor) == MONGO_OK) {
count_matched++;
doc=(bson *)mongo_cursor_bson(cursor);
bson_iterator it;
bson_iterator_init(&it,doc);
while (bson_iterator_next(&it) != BSON_EOO) {
fprintf(stderr,"%s : %s\n", bson_iterator_key(&it), bson_iterator_string(&it));
/******* START - Code to capture key-value into appropriate array */
if (i < KV_ARRAY_LENGTH) {
/* - Collect key-value pairs only if there is space in the array
* - Key / Value are copied only up to the space available for them, i.e. MAX_KV_LENGTH in this case
* */
strncpy(temp_key[i], bson_iterator_key(&it), MAX_KV_LENGTH);
strncpy(temp_value[i], bson_iterator_string(&it), MAX_KV_LENGTH);
temp_key[i][MAX_KV_LENGTH] = temp_value[i][MAX_KV_LENGTH] = '\0';
++i;
} else {
/* whatever need to be done if there is no room in the array */
}
/******* END - Code to capture key-value into appropriate array */
}
}
/* Test iterating through the key-value pair constructed in query iteration */
fprintf(stdout, "--- Fields collected ---\n");
int keyIndex = 0;
for ( ; keyIndex < i; ++keyIndex) {
fprintf(stdout, "{key: %s, value: %s}\n", temp_key[keyIndex], temp_value[keyIndex]);
}
mongo_cursor *cursor=mongo_find(conn,TEST_NS,query,NULL,0,0,0);
count_matched=0;
bson *doc;
//Answer
const char *temp_key[100][100], *temp_value[100][100];
int i=0;
while(mongo_cursor_next(cursor)==MONGO_OK)
{
count_matched++;
doc=(bson *)mongo_cursor_bson(cursor);
bson_iterator_init(&it,doc);
while(bson_iterator_next(&it) != BSON_EOO)
{
fprintf(stderr,"%s : %s\n\n",bson_iterator_key(&it),bson_iterator_string(&it));
temp_key[i][0]=bson_iterator_key(&it); //Answer
temp_value[i][0]=bson_iterator_string(&it); //Answer
i++; //Answer
}
}
Just for the record, this is a rough sketch. I know the temp variables can be corrupted or overflow here, and I will deal with that in my actual code.

Is there any way to use .indexOf to search a JavaScript array in Mirth?

I am trying to find a string in a JavaScript array in the transformer of a Mirth channel. Mirth throws an error when I try to use the indexOf function. My understanding is that indexOf is something that browsers add in, rather than a native part of the JavaScript language itself. ( How do I check if an array includes an object in JavaScript? )
So is array.indexOf just not supported in Mirth? Is there any way to use .indexOf in Mirth? Maybe an alternate syntax? Or do I just need to loop through the array to search?
This is how I search arrays in a Mirth js transformer:
var Yak = [];
Yak.push('test');
if(Yak.indexOf('test') != -1)
{
// do something
}
Does this give you an error?
Mirth uses the Rhino engine for JavaScript, and on some earlier versions of the JVM, indexOf appeared not to be supported on arrays. Since we upgraded our JVM to 1.6.23 (or higher), indexOf has started working. However, we still have legacy code in which I just use a loop each time I search an array of strings:
var compareString = "blah";
var index = -1;
for (var i = 0; i < myArray.length; ++i)
{
if (myArray[i] == compareString)
{
index = i;
break;
}
}
If you need to do this frequently, you should be able to use a code template to manually add the indexOf function to Array.
Set the code template to global access, and try out something like this (untested code):
// Only define it when the engine does not already provide it
if (!Array.prototype.indexOf)
{
Array.prototype.indexOf = function(compareObject)
{
for (var i = 0; i < this.length; ++i)
{
// loose equality (==); the built-in indexOf uses strict equality (===)
if (this[i] == compareObject)
{
return i;
}
}
return -1;
};
}
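If you would rather not modify Array.prototype, the same search can live in a plain helper function in the code template. The sketch below is only an illustration: the name indexOfValue is made up, and it assumes that comparing values by their string form is acceptable, which may help when array entries are string-like objects rather than primitive strings. Note that under this assumption the number 1 and the string '1' count as equal.

```javascript
// Hypothetical helper; indexOfValue is not part of Mirth or Rhino.
// Compares entries by their string form, so 1 and '1' are treated as equal.
function indexOfValue(list, target) {
    var wanted = String(target);
    for (var i = 0; i < list.length; i++) {
        if (String(list[i]) === wanted) {
            return i;
        }
    }
    return -1; // same "not found" convention as Array.prototype.indexOf
}

var names = ['john', 1, 'Peter'];
var found = indexOfValue(names, 'Peter');   // 2
var missing = indexOfValue(names, 'Mary');  // -1
```

A helper like this avoids overriding a built-in when the engine already provides indexOf.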
var arr = ['john',1,'Peter'];
if(arr.indexOf('john') > -1)
{
//match. what to do?
console.log("found");
}
else
{
console.log("not found");//not found .. do something
}
var i = ['a', 'b', 'c']
if(i.indexOf('a') > -1)
{
// do this if the array contains a match for the value passed to indexOf()
}
else
{
// do something else if there is no match in the array
}