$pull operation in MongoDB not working for me

I have a document with the following key-array pair:
"home" : [
"Kevin Garnett",
"Paul Pierce",
"Rajon Rondo",
"Brandon Bass",
" 5 sec inbound",
"Kevin Seraphin"
]
I want to remove the element " 5 sec inbound" from the array and use the following command (in the MongoDB shell):
>coll.update({},{"$pull":{"home":" 5 sec inbound"}})
This is not working, as verified by a query:
>coll.findOne({"home":/5 sec inbound/})
"home" : [
"Kevin Garnett",
"Paul Pierce",
"Rajon Rondo",
"Brandon Bass",
" 5 sec inbound",
"Kevin Seraphin"
]
Any help would be greatly appreciated!

That very same statement works for me:
> db.test.insert({"home" : [
... "Kevin Garnett",
... "Paul Pierce",
... "Rajon Rondo",
... "Brandon Bass",
... " 5 sec inbound",
... "Kevin Seraphin"
... ]})
> db.test.find({"home":/5 sec inbound/}).count()
1
> db.test.update({},{"$pull":{"home":" 5 sec inbound"}})
> db.test.find({"home":/5 sec inbound/}).count()
0
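
One thing worth adding, as an assumption about the original problem rather than a confirmed diagnosis: by default, update() modifies only the first document matching the filter, so with an empty filter {} and several documents in the collection, the $pull may be applied to a document other than the one holding this array. A minimal pymongo sketch of the multi-document form (the database and collection names are hypothetical):
from pymongo import MongoClient

# Hypothetical database/collection names; adjust to your setup.
coll = MongoClient()["test"]["coll"]

# update_many applies the $pull to every document matching the (empty)
# filter, unlike update()/update_one(), which stop at the first match.
result = coll.update_many({}, {"$pull": {"home": " 5 sec inbound"}})
print(result.modified_count)  # number of documents actually modified
In the shell, the equivalent is passing {multi: true} as the third argument to update().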


How to loop in NetLogo?

I have the following problem: I need to loop through the code, but it doesn't work.
Contextualizing the problem: I have 3 files (in .asc format) that represent data for 3 turtle ages (age 2, age 4, and age 8). If the user puts the value 1 in num-user, the simulation should run with only one file (L_2), corresponding to [ "2" ]; if the user puts the value 2 in num-user, it should run with the files L_2 and L_4, corresponding to [ "2" "4" ]; and if the user puts the value 3 in num-user, it should run with the files L_2, L_4, and L_8, corresponding to [ "2" "4" "8" ]. The problem is that the loop does not work and produces various errors, such as:
Extension exception: ascii file ./L_8.asc not found, or Can't find element 3 of the list [2 4 8], which is only of length 3, or go running more than 3 simulations.
I was unable to attach the .asc files here in the question, but if anyone can look at the code and identify the error I would be very grateful. I can't use BehaviorSpace for this; I need the loop in the code.
Thanks in advance!
extensions [ gis ]

globals [ num turtle-ages num-ages files random-seeds num-user repetitions ]

to setup
  ca
  set random-seeds 1
  random-seed random-seeds
  set num 0
  set turtle-ages [ "2" "4" "8" ]
  set num-ages item num turtle-ages
  setup-asc
  setup-turtles
  reset-ticks
end

to setup-2
  clear
  random-seed random-seeds
  set num-ages item num turtle-ages
  setup-asc
  setup-turtles
  reset-ticks
end

to setup-turtles
  ask n-of 5 patches [ sprout 1 ]
end

to clear
  set files 0
  clear-ticks
  clear-turtles
end

to setup-asc
  let number1 num-ages
  ;; this loads one raster file; there are 3 files in the folder,
  ;; named L_2.asc, L_4.asc, and L_8.asc
  set files gis:load-dataset ( word "./L_" number1 ".asc" )
end

to go
  move
  tick
  let n count turtles
  if n = 0 or ticks = 10
  [
    set random-seeds random-seeds + 1
    set repetitions 1
    if random-seeds = repetitions + 1
    [
      set random-seeds 1
      set num num + 1
      ;; if the user puts the value 1 in num-user, the simulation should run with only one file (L_2), i.e. [ "2" ];
      ;; if the user puts the value 2 in num-user, it should run with the files L_2 and L_4, i.e. [ "2" "4" ];
      ;; and if the user puts the value 3 in num-user, it should run with L_2, L_4, and L_8, i.e. [ "2" "4" "8" ]
      set num-user 1
      if num = num-user [ stop ]
    ]
    setup-2
  ]
end

to move
  ask turtles [
    right random 360
    fd 1
    if ticks = 5 [ die ]
  ]
end
Whenever possible, I'd suggest paring down your code to an MRE (minimal reproducible example) to a) make sure that users here can run your code (without your files, for example, this is not really viable) and b) see if reframing your question/goals in simpler terms helps get things working; at least that's what works for me!
I think that you might find foreach useful here as a way to loop through your desired simulations instead of manually tracking the number of iterations. For this example, assuming num-user is a numeric input widget on the interface, the setup below will determine which of the ages to process:
globals [ ages-to-run ]

to base-setup
  ca
  ; Determine how many simulations to run, based on user input into the
  ; 'num-user' numerical input widget on the interface
  ifelse num-user < 1 or num-user > 3 [
    print "Incorrect number of simulations indicated"
  ] [
    let possible-sims [ "2" "4" "8" ]
    set ages-to-run sublist possible-sims 0 num-user
  ]
  reset-ticks
end
After running the above, the ages-to-run variable will contain ["2"], ["2" "4"], or ["2" "4" "8"]. Next, you can iterate over those desired ages to run your simulations (a little more detail in comments):
to run-simulations
  if ages-to-run = 0 [
    print "Simulation setup not complete"
    stop
  ]
  foreach ages-to-run [ current-age ->
    ; This is the loop proper where each "age" is iterated.
    ; All of your simulation calls (manual variable resetting,
    ; etc.) should go within this loop.
    print word "Running simulations for age: " current-age
    let file-to-load ( word "./L_" current-age ".asc" )
    print file-to-load
    clear-turtles
    ask n-of 5 patches [
      sprout 1 [
        pd
        set color ( runresult current-age ) + 55
      ]
    ]
    repeat 20 [
      ask turtles [
        rt random 60 - 30
        fd 1
      ]
      tick
    ]
    print ( word "Simulations complete for age: " current-age "\n" )
  ]
end
Running the code above with 3 entered into num-user will run a simulation for each age (in the view, each run draws in a different color).
So, to run your simulations proper, all your per-age code should go within the foreach loop indicated above, and be careful not to reset your global variables as was happening in your question.

MongoDB query to compute percentage

I am new to MongoDB and kind of stuck on this query; any help/guidance will be highly appreciated. I am not able to calculate the percentage in the desired way: there is something wrong with my pipeline, and the inputs the percentage needs are not computed correctly. Below I provide my unsuccessful attempt along with the desired output.
Single entry in the collection looks like below:
_id : ObjectId("602fb382f060fff5419fd0d1")
time : "2019/05/02 00:00:00"
station_id : 3544
station_name : "Underhill Ave & Pacific St"
station_status : "In Service"
latitude : 40.6804836
longitude : -73.9646795
zipcode : 11238
borough : "Brooklyn"
neighbourhood : "Prospect Heights"
available_bikes : 5
available_docks : 21
The query I am trying to solve is:
Given a station_id (e.g., 522) and a num_hours (e.g., 3) passed as parameters:
- Consider only the measurements where station_status = "In Service".
- Consider only the measurements for that concrete station_id.
- Compute the percentage of measurements with available_bikes = 0 for each hour of the day (e.g., for the period [8am, 9am) the percentage is 15.06%, and for the period [9am, 10am) the percentage is 27.32%).
- Sort the percentage results in decreasing order.
- Return the top num_hours documents.
The desired output is:
--- DOCUMENT 0 INFO ---
---------------------------------
hour : 19
percentage : 65.37
total_measurements : 283
zero_bikes_measurements : 185
---------------------------------
--- DOCUMENT 1 INFO ---
---------------------------------
hour : 21
percentage : 64.79
total_measurements : 284
zero_bikes_measurements : 184
---------------------------------
--- DOCUMENT 2 INFO ---
---------------------------------
hour : 00
percentage : 63.73
total_measurements : 284
zero_bikes_measurements : 181
My attempt is:
command_1 = {"$match": {"station_status": "In Service", "station_id": station_id, "available_bikes": 0}}
my_query.append(command_1)
command_2 = {"$group": {"_id": "null", "total_measurements": {"$sum": 1}}}
my_query.append(command_2)
command_3 = {"$project": {"_id": 0,
                          "station_id": 1,
                          "station_status": 1,
                          "hour": {"$substr": ["$time", 11, 2]},
                          "available_bikes": 1,
                          "total_measurements": {"$sum": 1}}}
my_query.append(command_3)
command_4 = {"$group": {"_id": "$hour", "zero_bikes_measurements": {"$sum": 1}}}
my_query.append(command_4)
command_5 = {"$project": {"percent": {"$multiply": [{"$divide": ["$total_measurements", "$zero_bikes_measurements"]}, 100]}}}
my_query.append(command_5)
I've taken a look at this and I'm going to offer some sincere advice:
Don't try and do this in an aggregate query. Just go back to basics and pull the numbers out using find()s and then calculate the numbers in python.
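For illustration, a minimal sketch of that find()-and-count approach, assuming a pymongo collection object named coll and the document layout shown above:
from collections import Counter

def top_zero_bike_hours(coll, station_id, num_hours):
    total = Counter()  # measurements per hour
    zero = Counter()   # zero-bike measurements per hour
    cursor = coll.find(
        {"station_status": "In Service", "station_id": station_id},
        {"time": 1, "available_bikes": 1},
    )
    for doc in cursor:
        hour = doc["time"][11:13]  # "2019/05/02 00:00:00" -> "00"
        total[hour] += 1
        if doc["available_bikes"] == 0:
            zero[hour] += 1
    results = [
        {
            "hour": h,
            "percentage": round(100.0 * zero[h] / total[h], 2),
            "total_measurements": total[h],
            "zero_bikes_measurements": zero[h],
        }
        for h in total
    ]
    results.sort(key=lambda r: r["percentage"], reverse=True)
    return results[:num_hours]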
If you do want to persist with an aggregate query, note that your match command filters on available_bikes equal to zero, so the total number of measurements is never available and the percentage can never be computed. Also, once your first $group runs, you "lose" your projection: at that point in the pipeline you only have total_measurements and nothing else (comment out commands 3 to 5 to see what I mean).
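Concretely, the fix is to match on station and status only and defer the zero-bikes test into the $group stage, so both counts survive. A sketch, assuming station_id and num_hours are Python variables as in your code ($round requires MongoDB 4.2+):
pipeline = [
    {"$match": {"station_status": "In Service", "station_id": station_id}},
    {"$project": {"hour": {"$substr": ["$time", 11, 2]},
                  "is_zero": {"$cond": [{"$eq": ["$available_bikes", 0]}, 1, 0]}}},
    {"$group": {"_id": "$hour",
                "total_measurements": {"$sum": 1},
                "zero_bikes_measurements": {"$sum": "$is_zero"}}},
    {"$project": {"_id": 0,
                  "hour": "$_id",
                  "total_measurements": 1,
                  "zero_bikes_measurements": 1,
                  # $round requires MongoDB 4.2+; round client-side otherwise
                  "percentage": {"$round": [{"$multiply": [
                      {"$divide": ["$zero_bikes_measurements", "$total_measurements"]},
                      100]}, 2]}}},
    {"$sort": {"percentage": -1}},
    {"$limit": num_hours},
]
results = list(coll.aggregate(pipeline))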

Import Amazon txt dataset into Neo4j

I am fairly new to Neo4j and until now have just loaded some CSV files.
Now I am trying to load the Amazon "Product co-purchasing network" dataset from:
https://snap.stanford.edu/data/#amazon
More precisely this one:
https://snap.stanford.edu/data/amazon-meta.html
I am wondering what the correct way to load this file is.
The file is a simple text file in which products are separated by a blank line, so it looks like this:
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5
Id: 2
ASIN: 0738700797
...
So far I have tried to load it like a normal txt file:
LOAD CSV FROM "file:///data/amazon-meta.txt" AS line
RETURN line
SKIP 2
LIMIT 10
The code returns the data in the following format:
["Id: 1"]
["ASIN: 0827229534"]
[" title: Patterns of Preaching: A Sermon Sampler"]
[" group: Book"]
[" salesrank: 396585"]
[" similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X"]
[" categories: 2"]
[" |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]"]
[" |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]"]
[" reviews: total: 2 downloaded: 2 avg rating: 5"]
[" 2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9"]
[" 2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5"]
["Id: 2"]
["ASIN: 0738700797"]
So my next thought was to just merge all the lines from "id" to the next "id" together, but I am not sure if this is even possible or a good solution, since it seems quite complicated.
What would be a good way to load this dataset?
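One way to realize the idea of merging the lines from one "Id:" to the next is to preprocess the file outside Neo4j into a CSV that LOAD CSV handles natively. A minimal Python sketch of that preprocessing step (the subset of kept fields and the file names here are assumptions; extend as needed):
import csv

def parse_products(path):
    # Group the lines belonging to one product: a record starts at "Id:"
    # and ends when the next "Id:" line (or end of file) is reached.
    product = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if line.startswith("Id:"):
                if product:
                    yield product
                product = {"Id": line.split(":", 1)[1].strip()}
            elif product and ":" in line:
                key, value = line.split(":", 1)
                if key in ("ASIN", "title", "group", "salesrank"):
                    product[key] = value.strip()
        if product:
            yield product

with open("amazon-meta.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=["Id", "ASIN", "title", "group", "salesrank"])
    writer.writeheader()
    for p in parse_products("amazon-meta.txt"):
        writer.writerow(p)
After that, LOAD CSV WITH HEADERS FROM "file:///amazon-meta.csv" AS row yields one row per product instead of one row per line.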

MongoDB benchmarking inserts

I am trying to benchmark MongoDB inserts with the JS benchmarking harness, following the example given on the mongo website.
The insert operation itself works totally fine, but the reported queries/sec is wrong.
ops = [{op: "insert", ns: "benchmark.bench", safe: false, doc: {"a": 1}}]
The above works fine. Then I ran the following in the mongo shell:
for ( x = 1; x<=128; x*=2){
res = benchRun( { parallel : x ,
seconds : 5 ,
ops : ops
} )
print( "threads: " + x + "\t queries/sec: " + res.query )
}
It gives out:
threads: 1 queries/sec: 0
threads: 2 queries/sec: 0
threads: 4 queries/sec: 0
threads: 8 queries/sec: 0
threads: 16 queries/sec: 0
threads: 32 queries/sec: 1.4
threads: 64 queries/sec: 0
threads: 128 queries/sec: 0
I don't understand why the queries/sec is 0 and why it seems not a single doc has been inserted. Is this the right way of testing performance for inserts?
Answering because I just encountered a similar problem.
Try replacing your print statement with printjson(res).
You will see that res has the following fields:
{
"note" : "values per second",
"errCount" : NumberLong(0),
"trapped" : "error: not implemented",
"insertLatencyAverageMicros" : 8.173300153139357,
"totalOps" : NumberLong(130600),
"totalOps/s" : 25366.173139864142,
"findOne" : 0,
"insert" : 25366.173139864142,
"delete" : 0,
"update" : 0,
"query" : 0,
"command" : 0
}
As you can see, the query count is 0, hence when you print res.query it gives 0. To get the number of insert operations per second you would want to print res.insert. I believe res.query corresponds to the "find" operation.

Mongo stalled with too many insertions

I am trying to use MongoDB to run a multi-agent simulation.
I have one mongo instance on the same server that runs the simulation program, but when I have too many agents (~100,000 in 10 simulation steps) mongodb stalls for several seconds.
The code for inserting data into mongo is similar to:
if( mongo_client( &m_conn, m_dbhost.c_str(), m_dbport ) != MONGO_OK ) {
    cout << "failed to connect '" << m_dbhost << ":" << m_dbport << "'\n";
    cout << " mongo error: " << m_conn.err << endl;
    return;
}
bson_init( &b );
bson_append_new_oid( &b, "_id" );
bson_append_double( &b, "time", time );
bson_append_double( &b, "x", posx );
bson_append_double( &b, "y", posy );
bson_finish( &b );
if( mongo_insert( &m_conn, ns.c_str(), &b, NULL ) != MONGO_OK ) {
    cout << "failed to insert in mongo\n";
}
bson_destroy( &b );
mongo_disconnect( &m_conn );
Also, during the simulation, if I try to access the database using the mongo shell, I get errors as well:
$ mongo
MongoDB shell version: 2.4.1
connecting to: test
Wed Apr 3 10:10:24.870 JavaScript execution failed: Error: couldn't connect to server 127.0.0.1:27017 at src/mongo/shell/mongo.js:L112
exception: connect failed
After the simulation has ended, the mongo shell becomes responsive again, and I can check that there is data in the database, but it is discontinuous. In the example, the agent m0n999 saved only 6 of its 10 steps:
> show dbs
dB0B7F527F0FA45518712C8CB27611BD7 5.951171875GB
local 0.078125GB
> db.ins.m0n999.find()
{ "_id" : ObjectId("515bdf564c60ec1e000003e7"), "time" : 1, "x" : 1.1, "y" : 8.1 }
{ "_id" : ObjectId("515be0214c60ec1e0001075f"), "time" : 2, "x" : 1.2000000000000002, "y" : 8.2 }
{ "_id" : ObjectId("515be1c04c60ec1e0002da3a"), "time" : 4, "x" : 1.4000000000000004, "y" : 8.399999999999999 }
{ "_id" : ObjectId("515be2934c60ec1e0003b82c"), "time" : 5, "x" : 1.5000000000000004, "y" : 8.499999999999998 }
{ "_id" : ObjectId("515be3664c60ec1e000497cf"), "time" : 6, "x" : 1.6000000000000005, "y" : 8.599999999999998 }
{ "_id" : ObjectId("515be6cc4c60ec1e000824b2"), "time" : 10, "x" : 2.000000000000001, "y" : 8.999999999999996 }
>
How can I solve this problem? How can I avoid the lost connections and recover from the mongo stalls?
UPDATE
I'm getting errors in the global log like:
"Wed Apr 3 11:53:00.379 [conn1378573] error: hashtable namespace index max chain reached:1335",
"Wed Apr 3 11:53:00.379 [conn1378573] error: hashtable namespace index max chain reached:1335",
"Wed Apr 3 11:53:00.379 [conn1378573] error: hashtable namespace index max chain reached:1335",
"Wed Apr 3 11:53:00.379 [conn1378573] error: hashtable namespace index max chain reached:1335",
"Wed Apr 3 11:53:00.379 [conn1378573] end connection 127.0.0.1:40748 (1 connection now open)",
I solved the problem; there were two errors:
I was creating too many collections. I changed from one collection per agent to only one collection per simulation process.
I was creating too many connections. I changed from one connection per agent iteration to only one connection per simulation step.