Following is the code for fetching the results from the db providing collection, filter query, sorting query and number of limit.
func DBFetch(collection *mongo.Collection, filter interface{}, sort interface{}, limit int64) ([]bson.M, error) {
findOptions := options.Find()
findOptions.SetLimit(limit)
findOptions.SetSort(sort)
cursor, err := collection.Find(context.Background(), filter, findOptions)
var result []bson.M
if err != nil {
logger.Client().Error(err.Error())
sentry.CaptureException(err)
cursor.Close(context.Background())
return nil, err
}
if err = cursor.All(context.Background(), &result); err != nil {
logger.Client().Error(err.Error())
sentry.CaptureMessage(err.Error())
return nil, err
}
return result, nil
}
I am using mongo-go driver version 1.8.2
mongodb community version 4.4.7 sharded mongo with 2 shards
Each shard is with 30 CPU in k8 with 245Gb memory having 1 replica
200 rpm for the api
Api fetches the data from mongo and format it and the serves it
We are reading and writing both on primary.
Heavy writes occur every hour approximately.
Getting timeouts in milliseconds ( 10ms-20ms approx. )
As pointed out by #R2D2 in the comment, no cursor timeout error occurs when the default timeout (10 minutes) exceeds and there was no request from go for next set of data.
There are couple of workarounds you can do to mitigate getting this error.
First option is to set batch size for your find query by using the below option. By doing do, you are instructing MongoDB to send data in specified chunks rather than sending more data. Note that this will usually increase the roundtrip time between MongoDB and Go server.
findOptions := options.Find()
findOptions.SetBatchSize(10) // <- Batch size is set to `10`
cursor, err := collection.Find(context.Background(), filter, findOptions)
Furthermore, you can set the NoCursorTimeout option which will keep your MongoDB find query result cursor pointer to stay alive unless you manually close it. This option is a double edge sword since you have to manually close the cursor once you no longer need that cursor, else that cursor will stay in memory for a prolonged time.
findOptions := options.Find()
findOptions.SetNoCursorTimeout(true) // <- Applies no cursor timeout option
cursor, err := collection.Find(context.Background(), filter, findOptions)
// VERY IMPORTANT
_ = cursor.Close(context.Background()) // <- Don't forget to close the cursor
Combine the above two options, below will be your complete code.
func DBFetch(collection *mongo.Collection, filter interface{}, sort interface{}, limit int64) ([]bson.M, error) {
findOptions := options.Find()
findOptions.SetLimit(limit)
findOptions.SetSort(sort)
findOptions.SetBatchSize(10) // <- Batch size is set to `10`
findOptions.SetNoCursorTimeout(true) // <- Applies no cursor timeout option
cursor, err := collection.Find(context.Background(), filter, findOptions)
var result []bson.M
if err != nil {
//logger.Client().Error(err.Error())
//sentry.CaptureException(err)
_ = cursor.Close(context.Background())
return nil, err
}
if err = cursor.All(context.Background(), &result); err != nil {
//logger.Client().Error(err.Error())
//sentry.CaptureMessage(err.Error())
return nil, err
}
// VERY IMPORTANT
_ = cursor.Close(context.Background()) // <- Don't forget to close the cursor
return result, nil
}
Related
I am trying something simple using the mongo-go-driver.
I insert some datas in a collection, and I want them to be automaticaly deleted after a number of seconds.
I have read the following documentation : https://docs.mongodb.com/manual/tutorial/expire-data/#expire-documents-after-a-specified-number-of-seconds
Then I have wrote something in GO, but it does not seems to work as I expected. Maybe there is something I did not get, or I am doing the wrong way.
package main
import (
"bytes"
"context"
"fmt"
"log"
"text/tabwriter"
"time"
"github.com/Pallinder/go-randomdata"
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/bson/primitive"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
ctx := context.TODO()
client, err := mongo.NewClient(options.Client().ApplyURI("mongodb://localhost:27017"))
if err != nil {
log.Fatal(err)
}
err = client.Connect(ctx)
if err != nil {
log.Fatal(err)
}
db := client.Database("LADB")
col := db.Collection("LACOLL")
// add index to col
// the goal is to set a TTL for datas to only 1 secondes (test purpose)
model := mongo.IndexModel{
Keys: bson.M{"createdAt": 1},
Options: options.Index().SetExpireAfterSeconds(1),
}
ind, err := col.Indexes().CreateOne(ctx, model)
if err != nil {
log.Fatal(err)
}
fmt.Println(ind)
// insert some datas each seconds
for i := 0; i < 5; i++ {
name := randomdata.SillyName()
res, err := col.InsertOne(ctx, NFT{Timestamp: time.Now(), CreatedAt: time.Now(), Name: name})
if err != nil {
log.Fatal(err)
}
fmt.Println("Inserted", name, "with id", res.InsertedID)
time.Sleep(1 * time.Second)
}
// display all
cursor, err := col.Find(ctx, bson.M{}, nil)
if err != nil {
log.Fatal(err)
}
var datas []NFT
if err = cursor.All(ctx, &datas); err != nil {
log.Fatal(err)
}
// I expect some datas not to be there (less than 5)
fmt.Println(datas)
}
type NFT struct {
ID primitive.ObjectID `bson:"_id,omitempty"`
CreatedAt time.Time `bson:"createdAt,omitempty"`
Timestamp time.Time `bson:"timestamp,omitempty"`
Name string `bson:"name,omitempty"`
}
There's nothing wrong with your example, it works.
Please note that the expireAfterSeconds you specify is the duration after createdAt when the document expires, and that instant is the earliest time at which the document may be deleted, but there is no guarantee that the deletion will happen "immediately", exactly at that time.
Quoting from MongoDB docs: TTL indexes: Timing of the Delete Operation:
The TTL index does not guarantee that expired data will be deleted immediately upon expiration. There may be a delay between the time a document expires and the time that MongoDB removes the document from the database.
The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a collection during the period between the expiration of the document and the running of the background task.
Because the duration of the removal operation depends on the workload of your mongod instance, expired data may exist for some time beyond the 60 second period between runs of the background task.
As you can see, if a document expires, at worst case it may take 60 seconds for the background task to kick in and start removing expired documents, and if there are many (or the database is under heavy load), it may take some time to delete all expired documents.
I am trying something simple using the mongo-go-driver.
I insert some datas in a collection, and I want them to be automaticaly deleted after a number of seconds.
I have read the following documentation : https://docs.mongodb.com/manual/tutorial/expire-data/#expire-documents-after-a-specified-number-of-seconds
Then I have wrote something in GO, but it does not seems to work as I expected. Maybe there is something I did not get, or I am doing the wrong way.
package main
import (
"bytes"
"context"
"fmt"
"log"
"text/tabwriter"
"time"
"github.com/Pallinder/go-randomdata"
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/bson/primitive"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
ctx := context.TODO()
client, err := mongo.NewClient(options.Client().ApplyURI("mongodb://localhost:27017"))
if err != nil {
log.Fatal(err)
}
err = client.Connect(ctx)
if err != nil {
log.Fatal(err)
}
db := client.Database("LADB")
col := db.Collection("LACOLL")
// add index to col
// the goal is to set a TTL for datas to only 1 secondes (test purpose)
model := mongo.IndexModel{
Keys: bson.M{"createdAt": 1},
Options: options.Index().SetExpireAfterSeconds(1),
}
ind, err := col.Indexes().CreateOne(ctx, model)
if err != nil {
log.Fatal(err)
}
fmt.Println(ind)
// insert some datas each seconds
for i := 0; i < 5; i++ {
name := randomdata.SillyName()
res, err := col.InsertOne(ctx, NFT{Timestamp: time.Now(), CreatedAt: time.Now(), Name: name})
if err != nil {
log.Fatal(err)
}
fmt.Println("Inserted", name, "with id", res.InsertedID)
time.Sleep(1 * time.Second)
}
// display all
cursor, err := col.Find(ctx, bson.M{}, nil)
if err != nil {
log.Fatal(err)
}
var datas []NFT
if err = cursor.All(ctx, &datas); err != nil {
log.Fatal(err)
}
// I expect some datas not to be there (less than 5)
fmt.Println(datas)
}
type NFT struct {
ID primitive.ObjectID `bson:"_id,omitempty"`
CreatedAt time.Time `bson:"createdAt,omitempty"`
Timestamp time.Time `bson:"timestamp,omitempty"`
Name string `bson:"name,omitempty"`
}
There's nothing wrong with your example, it works.
Please note that the expireAfterSeconds you specify is the duration after createdAt when the document expires, and that instant is the earliest time at which the document may be deleted, but there is no guarantee that the deletion will happen "immediately", exactly at that time.
Quoting from MongoDB docs: TTL indexes: Timing of the Delete Operation:
The TTL index does not guarantee that expired data will be deleted immediately upon expiration. There may be a delay between the time a document expires and the time that MongoDB removes the document from the database.
The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a collection during the period between the expiration of the document and the running of the background task.
Because the duration of the removal operation depends on the workload of your mongod instance, expired data may exist for some time beyond the 60 second period between runs of the background task.
As you can see, if a document expires, at worst case it may take 60 seconds for the background task to kick in and start removing expired documents, and if there are many (or the database is under heavy load), it may take some time to delete all expired documents.
I'm having some issues getting the fully expected results from a mongoDB query using the Golang driver.
I'm currently querying a collection with 5791 documents totaling around ~150MB. It seems that when the query gets a large amount of data as the result the cursor does not iterate over the complete set of documents expected.
For example:
Query returns 2290 documents instead of 5791 expected with no error and cursor iterates without error.
Is there anything in the FindOptions for the Collection.Find() perhaps to remove a byte size limit on the query results?
Here is the code I'm using:
func (db *Database) ExecuteQuery(coll string, query bson.M) ([]bson.M, error) {
// Retrieve the appropriate database and collection to query on
collection, ctx, cancel := database.getCollection(coll)
defer cancel()
cursor, err := collection.Find(ctx, query)
if err != nil {
return nil, err
}
var res []bson.M
for cursor.Next(ctx) {
//Create a value into which the single document can be decoded
var elem bson.M
err := cursor.Decode(&elem)
if err != nil {
return nil, err
}
res = append(res, elem)
}
cursor.Close(ctx)
return res, nil
}
Turns out the issue was when I implemented a method to get the a collection from the database I was getting the collection using context.WithTimeout which had a 10 second timeout. This basically capped the time I could execute a query using this context on this collection and so only the number of documents it could fit in that time period were returned.
The code below shows the change where the context is retrieved using context.WithCancel instead in order to let queries take as long as they want.
// getCollection retrieves the appropriate collection to query on and returns the context and context cancelling function tied to it.
func (db *Database) getCollection(coll string) (*mongo.Collection, context.Context, context.CancelFunc) {
mongoDB := db.client.Database(db.Config.DatabaseName)
collection := mongoDB.Collection(coll)
ctx, cancel := context.WithCancel(context.Background())
return collection, ctx, cancel
}
I have a collections of 2,000,000 records
> db.events.count(); │
2000000
and I use golang mongodb client to connect to the database
package main
import (
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27888").SetAuth(options.Credential{
Username: "mongoadmin",
Password: "secret",
}))
if err != nil {
panic(err)
}
defer func() {
if err = client.Disconnect(ctx); err != nil {
panic(err)
}
}()
collection := client.Database("test").Collection("events")
var bs int32 = 10000
var b = true
cur, err := collection.Find(context.Background(), bson.D{}, &options.FindOptions{
BatchSize: &bs, NoCursorTimeout: &b})
if err != nil {
log.Fatal(err)
}
defer cur.Close(ctx)
s, n := runningtime("retrive db from mongo and publish to kafka")
count := 0
for cur.Next(ctx) {
var result bson.M
err := cur.Decode(&result)
if err != nil {
log.Fatal(err)
}
bytes, err := json.Marshal(result)
if err != nil {
log.Fatal(err)
}
count++
msg := &sarama.ProducerMessage{
Topic: "hello",
// Key: sarama.StringEncoder("aKey"),
Value: sarama.ByteEncoder(bytes),
}
asyncProducer.Input() <- msg
}
But the the program only retrives only about 600,000 records instead of 2,000,000 every times I ran the program.
$ go run main.go
done
count = 605426
nErrors = 0
2020/09/18 11:23:43 End: retrive db from mongo and publish to kafka took 10.080603336s
I don't know why? I want to retrives all 2,000,000 records. Thanks for any help.
Your loop fetching the results may end early because you are using the same ctx context for iterating over the results which has a 10 seconds timeout.
Which means if retrieving and processing the 2 million records (including connecting) takes more than 10 seconds, the context will be cancelled and thus the cursor will also report an error.
Note that setting FindOptions.NoCursorTimeout to true is only to prevent cursor timeout for inactivity, it does not override the used context's timeout.
Use another context for executing the query and iterating over the results, one that does not have a timeout, e.g. context.Background().
Also note that for constructing the options for find, use the helper methods, so it may look as simple and as elegant as this:
options.Find().SetBatchSize(10000).SetNoCursorTimeout(true)
So the working code:
ctx2 := context.Background()
cur, err := collection.Find(ctx2, bson.D{},
options.Find().SetBatchSize(10000).SetNoCursorTimeout(true))
// ...
for cur.Next(ctx2) {
// ...
}
// Also check error after the loop:
if err := cur.Err(); err != nil {
log.Printf("Iterating over results failed: %v", err)
}
IN setLimit() method what should i keep to fetch all the records in data
packages - used : go.mongodb.org/mongo-driver/bson
go.mongodb.org/mongo-driver/mongo
go.mongodb.org/mongo-driver/mongo/options
findOption := options.Find()
findOption.SetLimit(?)
var res1 []Person
cur, err := collection.Find(context.TODO(), bson.D{}, findOption)
if err != nil {
log.Fatal(err)
}
for cur.Next(context.TODO()) {
var elem Person
err := cur.Decode(&elem)
if err != nil {
log.Fatal(err)
}
res1 = append(res1, elem)
}
if err := cur.Err(); err != nil {
log.Fatal(err)
}
// Close the cursor once finished
cur.Close(context.TODO())
fmt.Printf("Found multiple documents (array of pointers): %+v\n", res1)
Easiest is to not call FindOptions.SetLimit() if you don't want to limit the number of results. If you don't pass a FindOptions, or you pass one where you did not set a limit, by default, results are not limited.
If you have a FindOptions value where limit has been set previously, you may set a limit of 0 to "undo" the limitation.
Quoting from FindOptions.Limit:
// The maximum number of documents to return. The default value is 0, which means that all documents matching the
// filter will be returned. A negative limit specifies that the resulting documents should be returned in a single
// batch. The default value is 0.
Limit *int64