I'm trying to run a Pipe that doesn't return any results, because the last pipeline stage is $out.
// { $out: "y" }
pipeline := DB.C("x").Pipe(stages).AllowDiskUse()
result := []bson.M{}
err := pipeline.All(&result)
When I run the pipe, I get a timeout. I assume mgo is waiting forever for results to be read.
Solved. Instead of calling All(&result), call Iter().
All would call Next on an iterator that is empty from the beginning, obviously leading to the timeout.
Iter returns an iterator that just gets discarded. No calls to Next, no timeouts.
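For reference, here is a minimal sketch of the Iter() approach, assuming an mgo session and a pipeline ending in $out as in the snippet above; the $match stage and the connection string are placeholders:

package main

import (
    "log"

    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// runOutPipeline runs an aggregation whose final stage is $out, so the
// server writes the results to collection "y" and returns no documents.
func runOutPipeline(c *mgo.Collection) error {
    stages := []bson.M{
        {"$match": bson.M{"active": true}}, // placeholder stage
        {"$out": "y"},
    }
    // Iter() executes the aggregation; we never call Next(), so nothing
    // blocks waiting for documents that will never arrive.
    iter := c.Pipe(stages).AllowDiskUse().Iter()
    return iter.Close()
}

func main() {
    session, err := mgo.Dial("localhost") // placeholder connection string
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    if err := runOutPipeline(session.DB("dbname").C("x")); err != nil {
        log.Fatal(err)
    }
}

Closing the iterator is still worthwhile, since it surfaces any error from running the aggregation command.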
I want a few jobs executed every day at specific times.
The first job acquires data from the database and stores it in a global variable.
The second job runs a few minutes after the first and uses the data that the first job stored in that global variable.
global dataacq
dataacq = None

def condb():
    global check
    global dataacq
    conn = psycopg2.connect(#someinformation)
    cursor = conn.cursor()
    query = "SELECT conversation_id FROM tablename"
    cursor.execute(query)
    dataacq = cursor.fetchall()
    print(dataacq)
    cursor.close()
    conn.close()
    check = True
    print(check)
    return dataacq

def printresult(result):
    print(result)

schedule.every().day.at("08:59").do(condb)
schedule.every().day.at("09:00").do(printresult, dataacq)
Above is part of the code I am using for testing. The problem is that when the printresult function is called, it displays None. But if I execute all the functions without any scheduling, it works and displays what I need. Why is this happening?
I have a query written using the "labix.org/v2/mgo" library:
err = getCollection.Find(bson.M{}).Sort("department").Distinct("department", &listedDepartment)
This works fine. Now I'm moving to the official Go MongoDB driver "go.mongodb.org/mongo-driver/mongo" and I want to run the same query there, but there is no direct function that chains Find, then Sort, then Distinct. How can I achieve this with the official driver? The variable listedDepartment is of type []string. Please suggest a solution.
You may use Collection.Distinct() but it does not yet support sorting:
// Obtain collection:
c := client.Database("dbname").Collection("collname")
ctx := context.Background()
results, err := c.Distinct(ctx, "department", bson.M{})
It returns a value of type []interface{}. If you know it contains string values, you may use a loop and type assertions to obtain the string values like this:
listedDepartment = make([]string, len(results))
for i, v := range results {
    listedDepartment[i] = v.(string)
}
And if you need it sorted, simply sort the slice:
sort.Strings(listedDepartment)
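Putting it all together, here is a minimal sketch with basic error handling; the connection URI, database name and collection name are placeholders:

package main

import (
    "context"
    "fmt"
    "log"
    "sort"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    ctx := context.Background()

    client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(ctx)

    c := client.Database("dbname").Collection("collname")

    // Distinct returns []interface{}; an empty bson.M filter matches all documents.
    results, err := c.Distinct(ctx, "department", bson.M{})
    if err != nil {
        log.Fatal(err)
    }

    // Convert to []string with type assertions, then sort client-side.
    listedDepartment := make([]string, len(results))
    for i, v := range results {
        listedDepartment[i] = v.(string)
    }
    sort.Strings(listedDepartment)

    fmt.Println(listedDepartment)
}

Note that Distinct issues a single distinct command to the server, so the sorting has to happen client-side, as shown.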
How can I avoid a CoRB timeout when running a large batch data pull over 10 million PDF/XML documents? Do I need to reduce the thread count and batch size?
uris-module:
let $uris := cts:uris(
  (),
  (),
  cts:and-query((
    cts:collection-query("/sites"),
    cts:field-range-query("cdate", "<", "2019-10-01"),
    cts:not-query(
      cts:or-query((
        cts:field-word-query("dcax", "200"),
        more code...,
      ))
    )
  ))
)
return (fn:count($uris), $uris)
process.xqy:
declare variable $URI as xs:string external;

let $uris := fn:tokenize($URI, ";")
let $outputJson := "/output/json/"
let $outputPdf := "/output/pdf/"
for $uri1 in $uris
  let $accStr := fn:substring-before(fn:substring-after($uri1, "/sites/"), ".xml")
  let $pdfUri := fn:concat("/pdf/iadb/", $accStr, ".pdf")
  let $doc := fn:doc($uri1)
  let $obj := json:object()
  let $_ := map:put($obj, "PaginationOrMediaCount", fn:number($doc/rec/MediaCount))
  let $_ := map:put($obj, "Abstract", fn:replace($doc/rec/Abstract/text(), "[^a-zA-Z0-9 ,.\-\r\n]", ""))
  let $_ := map:put($obj, "Descriptors", json:to-array($doc/rec/Descriptor/text()))
  let $_ := map:put($obj, "FullText", fn:replace($doc/rec/FullText/text(), "[^a-zA-Z0-9 ,.\-\r\n]", ""))
  let $_ := xdmp:save(
    fn:concat($outputJson, $accStr, ".json"),
    xdmp:to-json($obj)
  )
  let $_ :=
    if (fn:doc-available($pdfUri))
    then xdmp:save(
      fn:concat($outputPdf, $accStr, ".pdf"),
      fn:doc($pdfUri)
    )
    else ()
return $URI
It would be easier to diagnose and suggest improvements if you shared the CoRB job options and the code for your URIS-MODULE and PROCESS-MODULE.
The general concept of a CoRB job is that it splits the work into multiple module executions rather than trying to do all of the work in a single execution, in order to avoid timeout issues and excessive memory consumption.
For instance, if you wanted to download 10 million documents, the URIS-MODULE would select the URIs of all of those documents, and each URI would then be sent to the PROCESS-MODULE, which is responsible for retrieving it. Depending upon the THREAD-COUNT, you could be downloading several documents at a time, but each execution should return very quickly.
Is the execution of the URIs module what is timing out, or the process module?
You can increase the timeout from the default up to the maximum timeout limit using xdmp:request-set-time-limit().
Generally, the process modules should execute quickly and shouldn't be timing out. One possible reason would be performing too much work in the transform (e.g. setting BATCH-SIZE really large and doing too much at once), or maybe a misconfiguration or a poorly written query (e.g. rather than fetching the single doc identified by the $URI value, performing a search and retrieving all of the docs each time the process module is executed).
I've tried to implement a T flip-flop (I think that's what it's called) in my program, but I'm having some issues with it. The idea is to have the program start and stop using the same hotkey. This is what I have so far.
looping := false
pass = 0
max = 2

^r::
    pass++
    looping := true
    while (looping = true AND pass < max)
    {
        Send, stack overflow, save me!
    }
    looping := false
    pass = 0
return
When I run the program and hit the hotkey, the while loop starts. However, when I attempt to break the loop by pressing ^r, I get no response and the program keeps looping.
I think you are referring to a "toggle" script. I am not sure what exactly you are trying to achieve, but the key is flipping the flag with a logical not: looping := !looping. More about it here.
#MaxThreadsPerHotkey 2   ; allow a second ^r press to fire while the loop is running
looping := false
pass = 0
max = 2

^r::
    pass++
    looping := !looping   ; toggle the flag on every press
    while (looping && pass < max)
    {
        Send, stack overflow, save me!
    }
    pass = 0
return
There are a lot of resources for this; here are a few:
https://autohotkey.com/boards/viewtopic.php?t=11952
http://maul-esel.github.io/ahkbook/en/toggle-autofire.html
https://www.reddit.com/r/AutoHotkey/comments/6wqgbu/how_do_i_toggle_hold_down_a_key/dmad0xx
I'm trying to read a collection dump generated by mongodump. The file is a few gigabytes, so I want to read it incrementally.
I can read the first object with something like this:
buf := make([]byte, 100000)
f, _ := os.Open(path)
f.Read(buf)
var m bson.M
bson.Unmarshal(buf, &m)
However, I don't know how much of buf was consumed, so I don't know where to start reading the next document.
Is this possible with mgo?
Using mgo's bson.Unmarshal() alone is not enough -- that function is designed to take a []byte representing a single document, and unmarshal it into a value.
You will need a function that can read the next whole document from the dump file, then you can pass the result to bson.Unmarshal().
Comparing this to encoding/json or encoding/gob, it would be convenient if mgo.bson had a Reader type that consumed documents from an io.Reader.
Anyway, from the source for mongodump, it looks like the dump file is just a series of bson documents, with no file header/footer or explicit record separators.
BSONTool::processFile shows how mongorestore reads the dump file. Their code reads 4 bytes to determine the length of the document, then uses that size to read the rest of the document. I confirmed that the size prefix is part of the BSON spec.
Here is a playground example that shows how this could be done in Go: read the length field, read the rest of the document, unmarshal, repeat.
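A minimal sketch of that loop, assuming the dump file is named collection.bson and that each document unmarshals into a bson.M:

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "log"
    "os"

    "gopkg.in/mgo.v2/bson"
)

// readDocument reads a single BSON document from r. The first 4 bytes of
// a BSON document are its total length (little-endian, and the length
// includes the length field itself), so read those first, then the rest.
func readDocument(r io.Reader) (bson.M, error) {
    var header [4]byte
    if _, err := io.ReadFull(r, header[:]); err != nil {
        return nil, err // io.EOF once the dump is exhausted
    }
    size := binary.LittleEndian.Uint32(header[:])

    doc := make([]byte, size)
    copy(doc, header[:])
    if _, err := io.ReadFull(r, doc[4:]); err != nil {
        return nil, err
    }

    var m bson.M
    if err := bson.Unmarshal(doc, &m); err != nil {
        return nil, err
    }
    return m, nil
}

func main() {
    f, err := os.Open("collection.bson") // placeholder dump file name
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    for {
        m, err := readDocument(f)
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(m)
    }
}

io.ReadFull is used so that a short read from the file doesn't silently produce a truncated document.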
The method File.Read returns the number of bytes read.
File.Read
Read reads up to len(b) bytes from the File. It returns the number of bytes read and an error, if any. EOF is signaled by a zero count with err set to io.EOF.
So you can get the number of bytes read by simply storing the return values of your read:
n, err := f.Read(buf)
I managed to solve it with the following code:
for len(buf) > 0 {
    var r bson.Raw
    var m userObject
    bson.Unmarshal(buf, &r)
    r.Unmarshal(&m)
    fmt.Println(m)
    buf = buf[len(r.Data):]
}
Niks Keets' answer did not work for me. Somehow len(r.Data) was always the whole buffer length, so I came up with this other code:
for len(buff) > 0 {
    messageSize := binary.LittleEndian.Uint32(buff)
    err = bson.Unmarshal(buff, &myObject)
    if err != nil {
        panic(err)
    }
    // Do your stuff
    buff = buff[messageSize:]
}
Of course you have to handle truncated structs at the end of the buffer. In my case I could load the whole file into memory.