Strange object value instead of float by using mapReduce in mongodb with Doctrine - mongodb

I use mongo query for calculating sum price for every item.
My query looks like so
$queryBuilder = new Query\Builder($this, $documentName);
$queryBuilder->field('created')->gte($startDate);
$queryBuilder->field('is_test_value')->notEqual(true);
..........
$queryBuilder->map('function() {emit(this.item, this.price)}');
$queryBuilder->reduce('function(item, valuesPrices) {
return {sum: Array.sum(valuesPrices)}
}');
And this works, no problem. But I found that in some cases (approximately 20 cases from 200 results) I have strange result in field sum - instead of sum value I see construction like
[objectObject]444444444444444
4 - is price for item.
I tried to replace reduce block to block like this:
var sum = 0;
for (var i = 0; i < valuesPrices.length; i++) {
sum += parseFloat(valuesPrices[i]);
}
return {sum: sum}
In that case I see NAN value.
I suspected that some data in field price was inserted incorrectly (not as float, but as string, object etc). I tried execute my query from mongo cli and I see that all price values are integer.

It's not "strange" at all. You "broke the rules" and now you are paying for it.
"MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key."
The primary rule of mapReduce (as cited ) is that you must return exactly the same structure from the "reducer" as you do from the "mapper". This is because the "reducer" can actually run several times for the same "key". This is how mapReduce processes large lists.
You fix this by just returning a singular value, just like you did in the emit:
return Array.sum(values);
And then there will not be a problem. Adding an object key to that makes the data inconsistent, and thus you get an error when the "reduced" result gets fed back into the "reducer" again.

Related

Why Iam getting ReferenceOutOfRangeException while PlayerPref a list in Unity [duplicate]

I have some code and when it executes, it throws a IndexOutOfRangeException, saying,
Index was outside the bounds of the array.
What does this mean, and what can I do about it?
Depending on classes used it can also be ArgumentOutOfRangeException
An exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll but was not handled in user code Additional information: Index was out of range. Must be non-negative and less than the size of the collection.
What Is It?
This exception means that you're trying to access a collection item by index, using an invalid index. An index is invalid when it's lower than the collection's lower bound or greater than or equal to the number of elements it contains.
When It Is Thrown
Given an array declared as:
byte[] array = new byte[4];
You can access this array from 0 to 3, values outside this range will cause IndexOutOfRangeException to be thrown. Remember this when you create and access an array.
Array Length
In C#, usually, arrays are 0-based. It means that first element has index 0 and last element has index Length - 1 (where Length is total number of items in the array) so this code doesn't work:
array[array.Length] = 0;
Moreover please note that if you have a multidimensional array then you can't use Array.Length for both dimension, you have to use Array.GetLength():
int[,] data = new int[10, 5];
for (int i=0; i < data.GetLength(0); ++i) {
for (int j=0; j < data.GetLength(1); ++j) {
data[i, j] = 1;
}
}
Upper Bound Is Not Inclusive
In the following example we create a raw bidimensional array of Color. Each item represents a pixel, indices are from (0, 0) to (imageWidth - 1, imageHeight - 1).
Color[,] pixels = new Color[imageWidth, imageHeight];
for (int x = 0; x <= imageWidth; ++x) {
for (int y = 0; y <= imageHeight; ++y) {
pixels[x, y] = backgroundColor;
}
}
This code will then fail because array is 0-based and last (bottom-right) pixel in the image is pixels[imageWidth - 1, imageHeight - 1]:
pixels[imageWidth, imageHeight] = Color.Black;
In another scenario you may get ArgumentOutOfRangeException for this code (for example if you're using GetPixel method on a Bitmap class).
Arrays Do Not Grow
An array is fast. Very fast in linear search compared to every other collection. It is because items are contiguous in memory so memory address can be calculated (and increment is just an addition). No need to follow a node list, simple math! You pay this with a limitation: they can't grow, if you need more elements you need to reallocate that array (this may take a relatively long time if old items must be copied to a new block). You resize them with Array.Resize<T>(), this example adds a new entry to an existing array:
Array.Resize(ref array, array.Length + 1);
Don't forget that valid indices are from 0 to Length - 1. If you simply try to assign an item at Length you'll get IndexOutOfRangeException (this behavior may confuse you if you think they may increase with a syntax similar to Insert method of other collections).
Special Arrays With Custom Lower Bound
First item in arrays has always index 0. This is not always true because you can create an array with a custom lower bound:
var array = Array.CreateInstance(typeof(byte), new int[] { 4 }, new int[] { 1 });
In that example, array indices are valid from 1 to 4. Of course, upper bound cannot be changed.
Wrong Arguments
If you access an array using unvalidated arguments (from user input or from function user) you may get this error:
private static string[] RomanNumbers =
new string[] { "I", "II", "III", "IV", "V" };
public static string Romanize(int number)
{
return RomanNumbers[number];
}
Unexpected Results
This exception may be thrown for another reason too: by convention, many search functions will return -1 (nullables has been introduced with .NET 2.0 and anyway it's also a well-known convention in use from many years) if they didn't find anything. Let's imagine you have an array of objects comparable with a string. You may think to write this code:
// Items comparable with a string
Console.WriteLine("First item equals to 'Debug' is '{0}'.",
myArray[Array.IndexOf(myArray, "Debug")]);
// Arbitrary objects
Console.WriteLine("First item equals to 'Debug' is '{0}'.",
myArray[Array.FindIndex(myArray, x => x.Type == "Debug")]);
This will fail if no items in myArray will satisfy search condition because Array.IndexOf() will return -1 and then array access will throw.
Next example is a naive example to calculate occurrences of a given set of numbers (knowing maximum number and returning an array where item at index 0 represents number 0, items at index 1 represents number 1 and so on):
static int[] CountOccurences(int maximum, IEnumerable<int> numbers) {
int[] result = new int[maximum + 1]; // Includes 0
foreach (int number in numbers)
++result[number];
return result;
}
Of course, it's a pretty terrible implementation but what I want to show is that it'll fail for negative numbers and numbers above maximum.
How it applies to List<T>?
Same cases as array - range of valid indexes - 0 (List's indexes always start with 0) to list.Count - accessing elements outside of this range will cause the exception.
Note that List<T> throws ArgumentOutOfRangeException for the same cases where arrays use IndexOutOfRangeException.
Unlike arrays, List<T> starts empty - so trying to access items of just created list lead to this exception.
var list = new List<int>();
Common case is to populate list with indexing (similar to Dictionary<int, T>) will cause exception:
list[0] = 42; // exception
list.Add(42); // correct
IDataReader and Columns
Imagine you're trying to read data from a database with this code:
using (var connection = CreateConnection()) {
using (var command = connection.CreateCommand()) {
command.CommandText = "SELECT MyColumn1, MyColumn2 FROM MyTable";
using (var reader = command.ExecuteReader()) {
while (reader.Read()) {
ProcessData(reader.GetString(2)); // Throws!
}
}
}
}
GetString() will throw IndexOutOfRangeException because you're dataset has only two columns but you're trying to get a value from 3rd one (indices are always 0-based).
Please note that this behavior is shared with most IDataReader implementations (SqlDataReader, OleDbDataReader and so on).
You can get the same exception also if you use the IDataReader overload of the indexer operator that takes a column name and pass an invalid column name.
Suppose for example that you have retrieved a column named Column1 but then you try to retrieve the value of that field with
var data = dr["Colum1"]; // Missing the n in Column1.
This happens because the indexer operator is implemented trying to retrieve the index of a Colum1 field that doesn't exist. The GetOrdinal method will throw this exception when its internal helper code returns a -1 as the index of "Colum1".
Others
There is another (documented) case when this exception is thrown: if, in DataView, data column name being supplied to the DataViewSort property is not valid.
How to Avoid
In this example, let me assume, for simplicity, that arrays are always monodimensional and 0-based. If you want to be strict (or you're developing a library), you may need to replace 0 with GetLowerBound(0) and .Length with GetUpperBound(0) (of course if you have parameters of type System.Array, it doesn't apply for T[]). Please note that in this case, upper bound is inclusive then this code:
for (int i=0; i < array.Length; ++i) { }
Should be rewritten like this:
for (int i=array.GetLowerBound(0); i <= array.GetUpperBound(0); ++i) { }
Please note that this is not allowed (it'll throw InvalidCastException), that's why if your parameters are T[] you're safe about custom lower bound arrays:
void foo<T>(T[] array) { }
void test() {
// This will throw InvalidCastException, cannot convert Int32[] to Int32[*]
foo((int)Array.CreateInstance(typeof(int), new int[] { 1 }, new int[] { 1 }));
}
Validate Parameters
If index comes from a parameter you should always validate them (throwing appropriate ArgumentException or ArgumentOutOfRangeException). In the next example, wrong parameters may cause IndexOutOfRangeException, users of this function may expect this because they're passing an array but it's not always so obvious. I'd suggest to always validate parameters for public functions:
static void SetRange<T>(T[] array, int from, int length, Func<i, T> function)
{
if (from < 0 || from>= array.Length)
throw new ArgumentOutOfRangeException("from");
if (length < 0)
throw new ArgumentOutOfRangeException("length");
if (from + length > array.Length)
throw new ArgumentException("...");
for (int i=from; i < from + length; ++i)
array[i] = function(i);
}
If function is private you may simply replace if logic with Debug.Assert():
Debug.Assert(from >= 0 && from < array.Length);
Check Object State
Array index may not come directly from a parameter. It may be part of object state. In general is always a good practice to validate object state (by itself and with function parameters, if needed). You can use Debug.Assert(), throw a proper exception (more descriptive about the problem) or handle that like in this example:
class Table {
public int SelectedIndex { get; set; }
public Row[] Rows { get; set; }
public Row SelectedRow {
get {
if (Rows == null)
throw new InvalidOperationException("...");
// No or wrong selection, here we just return null for
// this case (it may be the reason we use this property
// instead of direct access)
if (SelectedIndex < 0 || SelectedIndex >= Rows.Length)
return null;
return Rows[SelectedIndex];
}
}
Validate Return Values
In one of previous examples we directly used Array.IndexOf() return value. If we know it may fail then it's better to handle that case:
int index = myArray[Array.IndexOf(myArray, "Debug");
if (index != -1) { } else { }
How to Debug
In my opinion, most of the questions, here on SO, about this error can be simply avoided. The time you spend to write a proper question (with a small working example and a small explanation) could easily much more than the time you'll need to debug your code. First of all, read this Eric Lippert's blog post about debugging of small programs, I won't repeat his words here but it's absolutely a must read.
You have source code, you have exception message with a stack trace. Go there, pick right line number and you'll see:
array[index] = newValue;
You found your error, check how index increases. Is it right? Check how array is allocated, is coherent with how index increases? Is it right according to your specifications? If you answer yes to all these questions, then you'll find good help here on StackOverflow but please first check for that by yourself. You'll save your own time!
A good start point is to always use assertions and to validate inputs. You may even want to use code contracts. When something went wrong and you can't figure out what happens with a quick look at your code then you have to resort to an old friend: debugger. Just run your application in debug inside Visual Studio (or your favorite IDE), you'll see exactly which line throws this exception, which array is involved and which index you're trying to use. Really, 99% of the times you'll solve it by yourself in a few minutes.
If this happens in production then you'd better to add assertions in incriminated code, probably we won't see in your code what you can't see by yourself (but you can always bet).
The VB.NET side of the story
Everything that we have said in the C# answer is valid for VB.NET with the obvious syntax differences but there is an important point to consider when you deal with VB.NET arrays.
In VB.NET, arrays are declared setting the maximum valid index value for the array. It is not the count of the elements that we want to store in the array.
' declares an array with space for 5 integer
' 4 is the maximum valid index starting from 0 to 4
Dim myArray(4) as Integer
So this loop will fill the array with 5 integers without causing any IndexOutOfRangeException
For i As Integer = 0 To 4
myArray(i) = i
Next
The VB.NET rule
This exception means that you're trying to access a collection item by index, using an invalid index. An index is invalid when it's lower than the collection's lower bound or greater than equal to the number of elements it contains. the maximum allowed index defined in the array declaration
Simple explanation about what a Index out of bound exception is:
Just think one train is there its compartments are D1,D2,D3.
One passenger came to enter the train and he have the ticket for D4.
now what will happen. the passenger want to enter a compartment that does not exist so obviously problem will arise.
Same scenario: whenever we try to access an array list, etc. we can only access the existing indexes in the array. array[0] and array[1] are existing. If we try to access array[3], it's not there actually, so an index out of bound exception will arise.
To easily understand the problem, imagine we wrote this code:
static void Main(string[] args)
{
string[] test = new string[3];
test[0]= "hello1";
test[1]= "hello2";
test[2]= "hello3";
for (int i = 0; i <= 3; i++)
{
Console.WriteLine(test[i].ToString());
}
}
Result will be:
hello1
hello2
hello3
Unhandled Exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
Size of array is 3 (indices 0, 1 and 2), but the for-loop loops 4 times (0, 1, 2 and 3). So when it tries to access outside the bounds with (3) it throws the exception.
A side from the very long complete accepted answer there is an important point to make about IndexOutOfRangeException compared with many other exception types, and that is:
Often there is complex program state that maybe difficult to have control over at a particular point in code e.g a DB connection goes down so data for an input cannot be retrieved etc... This kind of issue often results in an Exception of some kind that has to bubble up to a higher level because where it occurs has no way of dealing with it at that point.
IndexOutOfRangeException is generally different in that it in most cases it is pretty trivial to check for at the point where the exception is being raised. Generally this kind of exception get thrown by some code that could very easily deal with the issue at the place it is occurring - just by checking the actual length of the array. You don't want to 'fix' this by handling this exception higher up - but instead by ensuring its not thrown in the first instance - which in most cases is easy to do by checking the array length.
Another way of putting this is that other exceptions can arise due to genuine lack of control over input or program state BUT IndexOutOfRangeException more often than not is simply just pilot (programmer) error.
These two exceptions are common in various programming languages and as others said it's when you access an element with an index greater than the size of the array. For example:
var array = [1,2,3];
/* var lastElement = array[3] this will throw an exception, because indices
start from zero, length of the array is 3, but its last index is 2. */
The main reason behind this is compilers usually don't check this stuff, hence they will only express themselves at runtime.
Similar to this:
Why don't modern compilers catch attempts to make out-of-bounds access to arrays?

magento2 - How to get a product's stock status enabled/disabled?

I'm trying to get whether the product's stock status is instock/outofstock (Integers representing each state are fine. i don't necessarily need the "in stock"/"out of stock" strings per se).
I've tried various things to no avail.
1)
$inStock = $obj->get('Magento\CatalogInventory\Api\Data\StockItemInterface')->getisInStock()'
// Magento\CatalogInventory\Api\Data\StockItemInterface :: getisInStock returns true no matter what, even for 0qty products
// summary: not useful. How do you get the real one?
2)
$inStock = $obj->get('\Magento\CatalogInventory\Api\StockStateInterface')->verifyStock($_product->getId());
// test results for "verifyStock":
// a 0 qty product is in stock
// a 0 qty product is out of stock
// summary: fail. find correct method, with tests.
3)
$stockItemRepository = $obj->get('Magento\CatalogInventory\Model\Stock\StockItemRepository');
stockItem = $stockItemRepository->get($_product->getId());
$inStock = $stockItem->getIsInStock();
// Uncaught Magento\Framework\Exception\NoSuchEntityException: Stock Item with id "214"
// summmary: is stockitem not 1to1 with proudctid?
The weird thing is, getting stock quantities works just fine.
$availability = (String)$obj->get('\Magento\CatalogInventory\Api\StockStateInterface')->getStockQty($_product->getId(), $_product->getStore()->getWebsiteId());
So why isn't getIsInStock working?
This was one way I did it.
$stockItemResource = $obj->create('Magento\CatalogInventory\Model\ResourceModel\Stock\Item');
// grab ALL stock items (i.e. object that contains stock information)
$stockItemSelect = $stockItemResource->getConnection()->select()->from($stockItemResource->getMainTable());
$stockItems = $stockItemResource->getConnection()->fetchAll($stockItemSelect);
$inStock = null;
foreach($stockItems as $k => $item) {
if ($item['product_id'] == $_productId) {
$inStock = $item['is_in_stock'];
break; // not breaking properly. 'qz' still prints
}
}
Notes on efficiency:
I'm sure there are another ways to target the single item specifically, instead of getting all. Either through a method, or by adjusting the query passed in somehow.
But this method is probably more efficient for large n, avoiding the n+1 query problem.
You do still end up iterating through a lot, but perhaps theta(n) of iterating through a cached PHP variable is probably lower than n+1 querying the database. Haven't tested, just a hypothesis.
The returned structure is an array of arrays, where the sub-array (which also happens to be a stock item) has the product ID and the stock status value. And because the product ID and the stock status value is on the same level of nesting, we have no choice but to iterate through each sub-array to check the product_id, choose that sub-array, and grab the stock value. In short, we can't just utilize the hashmap, since the keys of the sub-array are not product IDs.
Ultimately, the efficiency of this depends on your use case. Rarely will you grab all stock items, unless doing mass exports. So the ultimate goal is to really just stay within the configured time limit is allowed for a request to persist.

MongoDB Query advice for weighted randomized aggregation

By far I have encountered ways for selecting random documents but my problem is a bit more of a pickle.So here goes
I have a collection which contains say a 1000+ documents (products)
say each document has a more or less generic format of .Say for simplicity it is
{"_id":{},"name":"Product1","groupid":5}
The groupid is a number say between 1 to 20 denoting the product belongs to that group.
Now if my query input is something like an array of {groupid->weight} for eg {[{"2":4},{"7":6}]} and say another parameter n(=10 say) Then I need to be able to pick 4 random documents that belong to groupid 2 and 6 random documents that belong to groupid 7.
The only solution i can think of is to run 'm' subqueries where m is the array length in the query input.
How do I accomplish this an efficient manner in MongoDB using probably a Mapreduce.
Picking up n random documents for each group.
Group the records by the groupid field. Emit the groupid as key
and the record as value.
For each group pick n random documents from the values array.
Let,
var parameter = {"5":1,"6":2}; //groupid->weight, keep it as an Object.
be the input to the map reduce functions.
The map function, emit only those group ids which we have provided as the parameter.
var map = function map(){
if(parameter.hasOwnProperty(this.groupid)){
emit(this.groupid,this);
}
}
The reduce function, for each group, get random records based on the parameter object in scope.
var reduce = function(key,values){
var length = values.length;
var docs = [];
var added = [];
var i= 1;
while(i<=parameter[key]){
var index = Math.floor(Math.random()*length);
if(added.indexOf(index) == -1){
docs.push(values[index]);
added.push(index);
i++;
}
else{
i--;
}
}
return {result:docs};
}
Invoking map reduce on the collection, by passing the parameter object in scope.
db.collection.mapReduce(map,
reduce,
{out: "sam",
scope:{"parameter":{"5":1,"6":2,"n":10}}})
To get the dumped output:
db.sam.find({},{"_id":0,"value.result":1}).pretty()
When you bring the parameter n into picture, you need to specify the number of documents for each group as a ratio, or else that parameter is not necessary at all.

Number of items in the aggregation with MongoDB 2.6

My query looks like that:
var x = db.collection.aggregate(...);
I want to know the number of items in the result set. The documentation says that this function returns a cursor. However it contains far less methods/fields than when using db.collection.find().
for (var k in x) print(k);
Produces
_firstBatch
_cursor
hasNext
next
objsLeftInBatch
help
toArray
forEach
map
itcount
shellPrint
pretty
No count() method! Why is this cursor different from the one returned by find()? itcount() returns some type of count, but the documentation says "for testing only".
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count, like that:
var cnt = x.hasNext() ? x.next().cnt : 0;
Is there a more straight forward way to get this count? As in db.collection.find(...).count()?
Barno's answer is correct to point out that itcount() is a perfectly good method for counting the number of results of the aggregation. I just wanted to make a few more points and clear up some other points of confusion:
No count() method! Why is this cursor different from the one returned by find()?
The trick with the count() method is that it counts the number of results of find() on the server side. itcount(), as you can see in the code, iterates over the cursor, retrieving the results from the server, and counts them. The "it" is for "iterate". There's currently (as of MongoDB 2.6), no way to just get the count of results from an aggregation pipeline without returning the cursor of results.
Using a group stage in my aggregation ({$group:{_id:null,cnt:{$sum:1}}}), I can get the count
Yes. This is a reasonable way to get the count of results and should be more performant than itcount() since it does the work on the server and does not need to send the results to the client. If the point of the aggregation within your application is just to produce the number of results, I would suggest using the $group stage to get the count. In the shell and for testing purposes, itcount() works fine.
Where have you read that itcount() is "for testing only"?
If in the mongo shell I do
var p = db.collection.aggregate(...);
printjson(p.help)
I receive
function () {
// This is the same as the "Cursor Methods" section of DBQuery.help().
print("\nCursor methods");
print("\t.toArray() - iterates through docs and returns an array of the results")
print("\t.forEach( func )")
print("\t.map( func )")
print("\t.hasNext()")
print("\t.next()")
print("\t.objsLeftInBatch() - returns count of docs left in current batch (when exhausted, a new getMore will be issued)")
print("\t.itcount() - iterates through documents and counts them")
print("\t.pretty() - pretty print each document, possibly over multiple lines")
}
If I do
printjson(p)
I find that
"itcount" : function (){
var num = 0;
while ( this.hasNext() ){
num++;
this.next();
}
return num;
}
This function
while ( this.hasNext() ){
num++;
this.next();
}
It is very similar var cnt = x.hasNext() ? x.next().cnt : 0; And this while is perfect for count...

Random Sampling from Mongo

I have a mongo collection with documents. There is one field in every document which is 0 OR 1. I need to random sample 1000 records from the database and count the number of documents who have that field as 1. I need to do this sampling 1000 times. How do i do it ?
For people coming to the answer, you should now use the new $sample aggregation function, new in 3.2.
https://docs.mongodb.org/manual/reference/operator/aggregation/sample/
db.collection_of_things.aggregate(
[ { $sample: { size: 15 } } ]
)
Then add another step to count up the 0s and 1s using $group to get the count. Here is an example from the MongoDB docs.
For MongoDB 3.0 and before, I use an old trick from SQL days (which I think Wikipedia use for their random page feature). I store a random number between 0 and 1 in every object I need to randomize, let's call that field "r". You then add an index on "r".
db.coll.ensureIndex(r: 1);
Now to get random x objects, you use:
var startVal = Math.random();
db.coll.find({r: {$gt: startVal}}).sort({r: 1}).limit(x);
This gives you random objects in a single find query. Depending on your needs, this may be overkill, but if you are going to be doing lots of sampling over time, this is a very efficient way without putting load on your backend.
Here's an example in the mongo shell .. assuming a collection of collname, and a value of interest in thefield:
var total = db.collname.count();
var count = 0;
var numSamples = 1000;
for (i = 0; i < numSamples; i++) {
var random = Math.floor(Math.random()*total);
var doc = db.collname.find().skip(random).limit(1).next();
if (doc.thefield) {
count += (doc.thefield == 1);
}
}
I was gonna edit my comment on #Stennies answer with this but you could also use a seprate auto incrementing ID index here as an alternative if you were to skip over HUGE amounts of record (talking huge here).
I wrote another answer to another question a lot like this one where some one was trying to find nth record of the collection:
php mongodb find nth entry in collection
The second half of my answer basically describes one potential method by which you could approach this problem. You would still need to loop 1000 times to get the random row of course.
If you are using mongoengine, you can use a SequenceField to generate an incremental counter.
class User(db.DynamicDocument):
counter = db.SequenceField(collection_name="user.counters")
Then to fetch a random list of say 100, do the following
def get_random_users(number_requested):
users_to_fetch = random.sample(range(1, User.objects.count() + 1), min(number_requested, User.objects.count()))
return User.objects(counter__in=users_to_fetch)
where you would call
get_random_users(100)