MongoDB doesn't retrieve all documents in a collection with 2 million records using a cursor

I have a collection of 2,000,000 records:
> db.events.count();
2000000
I use the Go MongoDB driver to connect to the database:
package main
import (
"context"
"encoding/json"
"log"
"time"
"github.com/Shopify/sarama"
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27888").SetAuth(options.Credential{
Username: "mongoadmin",
Password: "secret",
}))
if err != nil {
panic(err)
}
defer func() {
if err = client.Disconnect(ctx); err != nil {
panic(err)
}
}()
collection := client.Database("test").Collection("events")
var bs int32 = 10000
var b = true
cur, err := collection.Find(context.Background(), bson.D{}, &options.FindOptions{
BatchSize: &bs, NoCursorTimeout: &b})
if err != nil {
log.Fatal(err)
}
defer cur.Close(ctx)
s, n := runningtime("retrive db from mongo and publish to kafka")
count := 0
for cur.Next(ctx) {
var result bson.M
err := cur.Decode(&result)
if err != nil {
log.Fatal(err)
}
bytes, err := json.Marshal(result)
if err != nil {
log.Fatal(err)
}
count++
msg := &sarama.ProducerMessage{
Topic: "hello",
// Key: sarama.StringEncoder("aKey"),
Value: sarama.ByteEncoder(bytes),
}
asyncProducer.Input() <- msg
}
But the program retrieves only about 600,000 records instead of 2,000,000 every time I run it.
$ go run main.go
done
count = 605426
nErrors = 0
2020/09/18 11:23:43 End: retrive db from mongo and publish to kafka took 10.080603336s
I don't know why. I want to retrieve all 2,000,000 records. Thanks for any help.

The loop that fetches the results may end early because you iterate over the results with the same ctx context, which has a 10-second timeout.
This means that if retrieving and processing the 2 million records (including connecting) takes more than 10 seconds, the context is cancelled and the cursor reports an error.
Note that setting FindOptions.NoCursorTimeout to true is only to prevent cursor timeout for inactivity, it does not override the used context's timeout.
Use another context for executing the query and iterating over the results, one that does not have a timeout, e.g. context.Background().
Also note that for constructing the options for find, use the helper methods, so it may look as simple and as elegant as this:
options.Find().SetBatchSize(10000).SetNoCursorTimeout(true)
So the working code:
ctx2 := context.Background()
cur, err := collection.Find(ctx2, bson.D{},
options.Find().SetBatchSize(10000).SetNoCursorTimeout(true))
// ...
for cur.Next(ctx2) {
// ...
}
// Also check error after the loop:
if err := cur.Err(); err != nil {
log.Printf("Iterating over results failed: %v", err)
}

Related

MongoDB | Go using a pipeline to listen to updates on a document by id

I'm trying to write a function that watches a collection for updates to the document with a given id, but it does not work. The function should return once the document is updated; instead it keeps waiting while I update the document. I've tried multiple things, and the rest of the code works fine. When I remove the id part and listen for all document updates in that collection, the function behaves as it should.
func iterateChangeStream(routineCtx context.Context,stream *mongo.ChangeStream, chn chan string) {
defer stream.Close(routineCtx)
for stream.Next(routineCtx) {
var data bson.M
if err := stream.Decode(&data); err != nil {
fmt.Println(err)
}
chn <- "updated"
err := stream.Close(routineCtx)
if err != nil {
return
}
return
}
return
}
func (s Storage) ListenForScannerUpdateById(id primitive.ObjectID) {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
chn := make(chan string)
coll := s.db.Collection("scanners")
scan, err := s.GetScannerById(id)
fmt.Println(scan)
matchPipeline := bson.D{
{
"$match", bson.D{
{"operationType", "update"},
{"fullDocument._id", bson.D{
{"$eq", id},
}},
},
},
}
scannerStream, err := coll.Watch(ctx, mongo.Pipeline{matchPipeline})
if err != nil {
// Watch failed, so scannerStream is nil and must not be closed.
fmt.Printf("err: %v", err)
return
}
routineCtx, cancelRoutine := context.WithCancel(context.Background())
defer cancelRoutine() // release the context when this function returns
go iterateChangeStream(routineCtx, scannerStream, chn)
msg, _ := <- chn
defer close(chn)
fmt.Println(msg)
return
}
OK, after reading the documentation a second time I found this about the fullDocument field:
For update operations, this field only appears if you configured the change stream with fullDocument set to updateLookup. This field then represents the most current majority-committed version of the document modified by the update operation. This document may differ from the changes described in updateDescription if other majority-committed operations modified the document between the original update operation and the full document lookup.
After setting the fullDocument option to updateLookup like this, it works perfectly:
scannerStream, err := coll.Watch(ctx, mongo.Pipeline{matchPipeline}, options.ChangeStream().SetFullDocument(options.UpdateLookup))

Why I get an error "client disconnected" when trying to get documents from mongo collection in go?

I have a MongoDB capped collection and a simple API written in Go. I build and run it. When I send a GET request, or simply open localhost:8000/logger in a browser, my process exits. Debugging shows this happens while executing Find on the collection; it produces the error "client is disconnected". The collection has 1 document, and the debugger shows it is connected through my helper.
Go version 1.13
My code:
func main() {
r := mux.NewRouter()
r.HandleFunc("/logger", getDocs).Methods("GET")
r.HandleFunc("/logger", createDoc).Methods("POST")
log.Fatal(http.ListenAndServe("localhost:8000", r))
}
func getDocs(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
var docs []models.Logger
//Connection mongoDB with helper class
collection := helper.ConnectDB()
cur, err := collection.Find(context.TODO(), bson.M{})
if err != nil {
helper.GetError(err, w)
return
}
defer cur.Close(context.TODO())
for cur.Next(context.TODO()) {
var doc models.Logger
err := cur.Decode(&doc)
if err != nil {
log.Fatal(err)
}
docs = append(docs, doc)
}
if err := cur.Err(); err != nil {
log.Fatal(err)
}
json.NewEncoder(w).Encode(docs)
}
func ConnectDB() *mongo.Collection {
client, err := mongo.NewClient(options.Client().ApplyURI("mongodb://127.0.0.1:27017"))
if err != nil {
log.Fatal(err)
}
fmt.Println("Connected to MongoDB!")
logCollection := client.Database("local").Collection("loggerCollection")
return logCollection
}
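According to the documentation, mongo.NewClient only creates the client; it neither connects to the server nor verifies that the server is reachable. Call client.Connect first, then client.Ping to verify that you can actually reach the database.
client, err := mongo.NewClient(options.Client().ApplyURI("mongodb://127.0.0.1:27017"))
if err != nil {
log.Fatal(err)
}
// NewClient does not connect; without Connect, operations fail with "client is disconnected".
if err := client.Connect(context.TODO()); err != nil {
log.Fatal(err)
}
// readpref comes from go.mongodb.org/mongo-driver/mongo/readpref.
if err := client.Ping(context.TODO(), readpref.Primary()); err != nil {
// Can't connect to Mongo server
log.Fatal(err)
}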
There could be several reasons for failing to connect; the most obvious one is an incorrect port setup. Is your MongoDB server up and listening on port 27017? Is there any chance you're running MongoDB with Docker and it's not forwarding to the correct port?
I faced a similar issue, and Jay's answer definitely helped. I checked that MongoDB was running using MongoDB Compass, then I changed the location of my insert statement; previously I was calling it before the call to context.WithTimeout. Below is the working code.
package main
import (
"context"
"log"
"time"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
type Book struct {
Name string `json:"name,omitempty"`
PublisherID string `json:"publisherid,omitempty"`
Cost string `json:"cost,omitempty"`
StartTime string `json:"starttime,omitempty"`
EndTime string `json:"endtime,omitempty"`
}
func main() {
client, err := mongo.NewClient(options.Client().ApplyURI("mongodb://localhost:27017"))
if err != nil {
log.Fatal(err)
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel() // release the timeout's resources when main returns
err = client.Connect(ctx)
if err != nil {
log.Fatal(err)
}
defer client.Disconnect(ctx)
testCollection := client.Database("BooksCollection").Collection("BooksRead")
inserRes, err := testCollection.InsertOne(context.TODO(), Book{Name: "Harry Potter", PublisherID: "IBN123", Cost: "1232", StartTime: "2013-10-01T01:11:18.965Z", EndTime: "2013-10-01T01:11:18.965Z"})
log.Println("InsertResponse : ", inserRes)
log.Println("Error : ", err)
}
I can see the document inserted in the console as well as in MongoDB Compass.
In the helper function ConnectDB, after NewClient, I must call client.Connect(context.TODO()) before any other use of the client.
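Since ConnectDB in the question is called on every request, it may also help to create and connect the client only once and share it between handlers; the driver's client is meant to be reused. Here is a minimal sketch of that pattern with sync.Once. The fakeClient type, getClient, and the connects counter are hypothetical stand-ins so the example runs without a database; in real code the body of once.Do would call mongo.NewClient and client.Connect.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeClient is a hypothetical stand-in for *mongo.Client. Like the client
// in the question, it is unusable until connect has been called.
type fakeClient struct{ connected bool }

func (c *fakeClient) connect() error { c.connected = true; return nil }

func (c *fakeClient) find() error {
	if !c.connected {
		return errors.New("client is disconnected")
	}
	return nil
}

var (
	once       sync.Once
	shared     *fakeClient
	connectErr error
	connects   int // counts how often connect actually ran
)

// getClient creates and connects the client on first use and returns the
// same instance on every later call.
func getClient() (*fakeClient, error) {
	once.Do(func() {
		shared = &fakeClient{}
		connectErr = shared.connect()
		connects++
	})
	return shared, connectErr
}

func main() {
	// A client that was created but never connected fails on first use.
	fresh := &fakeClient{}
	fmt.Println(fresh.find())

	// Every handler call gets the same, already connected client.
	for i := 0; i < 3; i++ {
		c, err := getClient()
		if err != nil {
			panic(err)
		}
		fmt.Println(c.find(), connects)
	}
}
```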

mongodb getting 10_000 rows at a time

I'm trying to get 10,000 documents at a time from MongoDB, but I got:
Information:
Driver: https://github.com/mongodb/mongo-go-driver
opt.SetBatchSize(15_000)
opt.SetAllowPartialResults(false)
Index on timestamp
Code:
package main
import (
"context"
"fmt"
"time"
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
var database *mongo.Database
func main() {
ctx, cancelConnect := context.WithTimeout(context.Background(), 10*time.Second)
defer cancelConnect()
client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://20.20.20.43:27017"))
if err != nil {
panic(err)
}
database = client.Database("chat_data")
chatText := make([]chat, 0)
now := time.Now().Unix()
ctx, cancelQuery := context.WithTimeout(context.Background(), 30*time.Second)
defer cancelQuery()
// mongodb batch option
opt := options.Find()
opt.SetBatchSize(15_000)
opt.SetAllowPartialResults(false)
// mongodb filter
filter := bson.M{"timestamp": bson.M{"$gte": now - 108000}}
cur, err := database.Collection("chat").Find(ctx, filter, opt)
if err != nil {
// fmt.Fprint(w, err)
fmt.Println(err)
return
}
defer cur.Close(ctx)
for cur.Next(ctx) {
var result chat
err := cur.Decode(&result)
if err != nil {
fmt.Println(err)
continue
}
// do something with result....
// fmt.Println(result)
chatText = append(chatText, result)
}
if err := cur.Err(); err != nil {
// fmt.Fprint(w, cur.Err())
fmt.Println(err)
return
}
fmt.Println("done")
fmt.Println(len(chatText))
}
Can I achieve this with MongoDB and the Go driver? The 30-second timeout is always reached.
Edit 1
I tried it in Python (with pymongo); it needs only 0m2.159s to query 36k documents with that filter.
Try a batch size of 7000; if it works, try 12000; if it doesn't work, try 4000, and so on.
Make note of how long these requests take to figure out if the execution time is proportional to batch size.
You are querying on just the timestamp field. If you create an index on that collection with the timestamp field first, you should get faster results, and get a free sort in the process.

Find all documents in a collection with mongo go driver

I checked out the answer here but this uses the old and unmaintained mgo. How can I find all documents in a collection using the mongo-go-driver?
I tried passing a nil filter, but this does not return any documents and instead returns nil. I also checked the documentation but did not see any mention of returning all documents. Here is what I've tried with aforementioned result.
client, err := mongo.Connect(context.TODO(), "mongodb://localhost:27017")
coll := client.Database("test").Collection("albums")
if err != nil { fmt.Println(err) }
// we can assume we're connected...right?
fmt.Println("connected to mongodb")
var results []*Album
findOptions := options.Find()
cursor, err := coll.Find(context.TODO(), nil, findOptions)
if err != nil {
fmt.Println(err) // prints 'document is nil'
}
Also, I'm confused about why I need to specify findOptions when I've called the Find() function on the collection (or do I not need to specify them?).
Here is what I came up with using the official MongoDB driver for golang. I am using godotenv (https://github.com/joho/godotenv) to pass the database parameters.
//Find multiple documents
func FindRecords() {
err := godotenv.Load()
if err != nil {
fmt.Println(err)
}
//Get database settings from env file
//dbUser := os.Getenv("db_username")
//dbPass := os.Getenv("db_pass")
dbName := os.Getenv("db_name")
docCollection := "retailMembers"
dbHost := os.Getenv("db_host")
dbPort := os.Getenv("db_port")
dbEngine := os.Getenv("db_type")
//set client options
clientOptions := options.Client().ApplyURI("mongodb://" + dbHost + ":" + dbPort)
//connect to MongoDB
client, err := mongo.Connect(context.TODO(), clientOptions)
if err != nil {
log.Fatal(err)
}
//check the connection
err = client.Ping(context.TODO(), nil)
if err != nil {
log.Fatal(err)
}
fmt.Println("Connected to " + dbEngine)
db := client.Database(dbName).Collection(docCollection)
//find records
//pass these options to the Find method
findOptions := options.Find()
//Set the limit of the number of record to find
findOptions.SetLimit(5)
//Define an array in which you can store the decoded documents
var results []Member
//Passing bson.D{{}} as the filter matches all documents in the collection
cur, err := db.Find(context.TODO(), bson.D{{}}, findOptions)
if err != nil {
log.Fatal(err)
}
//Finding multiple documents returns a cursor
//Iterate through the cursor allows us to decode documents one at a time
for cur.Next(context.TODO()) {
//Create a value into which the single document can be decoded
var elem Member
err := cur.Decode(&elem)
if err != nil {
log.Fatal(err)
}
results =append(results, elem)
}
if err := cur.Err(); err != nil {
log.Fatal(err)
}
//Close the cursor once finished
cur.Close(context.TODO())
fmt.Printf("Found multiple documents: %+v\n", results)
}
Try passing an empty bson.D instead of nil:
cursor, err := coll.Find(context.TODO(), bson.D{})
Also, FindOptions is optional.
Disclaimer: I've never used the official driver, but there are a few examples at https://godoc.org/go.mongodb.org/mongo-driver/mongo
Seems like their tutorial is outdated :/

bulkwrite does not support multi-document transaction by using mongo-go-driver

I'm using mongo-go-driver 0.0.18 to build a bulk write that consists of a "NewUpdateManyModel" and several "NewInsertOneModel" operations. My MongoDB server is an Atlas M10 replica set. I built some goroutines to test whether the transactions are atomic; the result shows that each bulk write is not atomic and that they interfere with each other. Does mongo-go-driver support multi-document transactions?
func insertUpdateQuery(counter int, col *mongo.Collection, group *sync.WaitGroup){
var operations []mongo.WriteModel
var items = []item{}
items=append(items,item{"Name":strconv.Itoa(counter),"Description":"latest one"})
for _,v := range items{
operations = append(operations, mongo.NewInsertOneModel().Document(v))
}
updateOperation := mongo.NewUpdateManyModel()
updateOperation.Filter(bson.D{
{"Name", bson.D{
{"$ne", strconv.Itoa(counter)},
}},
})
updateOperation.Update(bson.D{
{"$set", bson.D{
{"Description", strconv.Itoa(counter)},
}},
},)
operations = append(operations,updateOperation)
bulkOps:=options.BulkWrite()
result, err := col.BulkWrite(
context.Background(),
operations,
bulkOps,
)
if err != nil{
fmt.Println("err:",err)
}else{
fmt.Printf("IU: %+v \n",result)
}
group.Done()
}
func retrieveQuery(group *sync.WaitGroup, col *mongo.Collection){
var results []item
qctx:=context.Background()
qctx, c := context.WithTimeout(qctx, 10*time.Second)
defer c()
cur, err := col.Find(qctx, nil)
if err != nil {
log.Fatal(err)
}
defer cur.Close(context.Background())
res := item{}
for cur.Next(context.Background()) {
err := cur.Decode(&res)
if err != nil {
log.Println(err)
}else {
results=append(results,res)
}
}
if err := cur.Err(); err != nil {
log.Println(err)
}
fmt.Println("res:",results)
group.Done()
}
func main() {
ctx := context.Background()
ctx, cancel := context.WithTimeout(ctx,10*time.Second)
defer cancel()
uri := "..."
client, err := mongo.NewClient(uri)
if err != nil {
fmt.Printf("todo: couldn't connect to mongo: %v", err)
}
defer cancel()
err = client.Connect(ctx)
if err != nil {
fmt.Printf("todo: mongo client couldn't connect with background context: %v", err)
}
col:=client.Database("jistest").Collection("Rules")
wg :=&sync.WaitGroup{}
for i:=0; i<100; i++{
wg.Add(2)
go insertUpdateQuery(i,col,wg)
go retrieveQuery(wg,col)
}
wg.Wait()
fmt.Println("All Done!")
}
I am wondering if mongo-go-driver supports for multi-document transaction?
mongo-go-driver has supported multi-document transactions since v0.0.12 (the current version is beta 0.1.0).
MongoDB multi-document transactions are associated with a session. That is, you start a transaction for a session. When using any of MongoDB officially supported drivers, you must pass the session to each operation in the transaction.
Your example does not seem to utilise session nor transactions. An example of multi-document transaction in mongo-go-driver (v0.1.0) is as below:
client, err := mongo.NewClient("<MONGODB URI>")
if err != nil { return err }
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
defer cancel()
err = client.Connect(ctx)
if err != nil { return err }
session, err := client.StartSession()
if err != nil { return err }
database := client.Database("databaseName")
collection := database.Collection("collectionName")
err = mongo.WithSession(ctx, session, func(sctx mongo.SessionContext) error {
// Start a transaction in the session
if err := sctx.StartTransaction(); err != nil {
return err
}
var operations []mongo.WriteModel
// Create an insert one operation
operations = append(operations,
mongo.NewInsertOneModel().Document(
bson.D{{"Name", counter},
{"Description", "latest"}}))
// Create an update many operation
updateOperation := mongo.NewUpdateManyModel()
updateOperation.Filter(bson.D{{"Name", bson.D{
{"$ne", counter},
}}})
updateOperation.Update(bson.D{
{"$set", bson.D{
{"Description", counter},
}},
})
operations = append(operations, updateOperation)
// Execute bulkWrite operation in a transactional session.
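_, err := collection.BulkWrite(sctx, operations)
if err != nil {
fmt.Println(err)
// Roll back so the failed transaction does not stay open.
_ = session.AbortTransaction(sctx)
return err
}
// Commit the transaction and report any commit error to the caller.
return session.CommitTransaction(sctx)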
})
session.EndSession(ctx)
See also Transactions and Retryable Writes for examples to retry transactions.
I built some goroutines to test if the transactions are atomic
Just be mindful of how the goroutines are executed. Depending on how they race, the transaction that commits last may overwrite the result of an earlier one, and concurrent transactions that touch the same documents may conflict, for example:
transaction 1 finished
transaction 2 finished
transaction 3 and transaction 4 conflict
transaction 5 finished
...