I am using GridFS to store video files in my database. I have updated to MongoDB 4.0 and am trying to use the multi-collection transaction model. The problem I am facing is that the GridFS put() command hangs the system. I am using it as follows:
client = pymongo.MongoClient(mongo_url)
db = client[db_name]
fs = gridfs.GridFS(db)
Now I try to use the transaction model as follows:
with db.client.start_session() as session:
    try:
        file_path = "video.mp4"
        session.start_transaction()
        with open(file_path, 'rb') as f:
            fid = self.fs.put(f, metadata={'sequence_id': '0001'})
        session.commit_transaction()
    except Exception as e:
        raise
    finally:
        session.end_session()
The issue is that the put command hangs for about a minute. It then returns, but the commit fails. I have a feeling this is because the session object is not passed to the put command, but I do not see any parameter in the help that takes a session as input. After the hang, the test fails with the following stack trace:
Traceback (most recent call last):
    session.commit_transaction()
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/client_session.py", line 393, in commit_transaction
    self._finish_transaction_with_retry("commitTransaction")
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/client_session.py", line 457, in _finish_transaction_with_retry
    return self._finish_transaction(command_name)
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/client_session.py", line 452, in _finish_transaction
    parse_write_concern_error=True)
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/database.py", line 514, in _command
    client=self.__client)
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/pool.py", line 579, in command
    unacknowledged=unacknowledged)
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/network.py", line 150, in command
    parse_write_concern_error=parse_write_concern_error)
  File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/helpers.py", line 155, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Transaction 1 has been aborted.
EDIT
I tried replacing the put block with:
try:
    gf = self.fs.new_file(metadata={'sequence_id': '0000'})
    gf.write(f)
finally:
    gf.close()
However, the hang happens again, this time at gf.close().
I also tried instantiating GridIn directly so that I could provide the session object:
gin = gridfs.GridIn(root_collection=self.db["fs.files"], session=session)
gin.write(f)
gin.close()
This fails with the error message:
It is illegal to provide a txnNumber for command createIndexes
The issue is that the put command hangs for about a minute
The first attempt with self.fs.put() does not actually use transactions; it just took a while to upload the file.
Then, after the upload completed, the attempt to commit the (empty) transaction failed because the transaction had exceeded its maximum lifetime by the time the upload finished. See transactionLifetimeLimitSeconds, which defaults to 60 seconds and sets the maximum transaction runtime.
If you are considering raising this limit, keep in mind that as writes enter MongoDB after a transaction's snapshot has been created, WiredTiger cache pressure builds up. This cache pressure can only be released once the transaction commits, which is the reason behind the 60-second default limit.
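If you do decide to raise the limit despite that caveat, transactionLifetimeLimitSeconds is a runtime server parameter. A minimal sketch using the client from the question (the value of 120 seconds is just an illustrative choice, and the command requires administrative privileges on the server):

# Hedged sketch: raise the server-wide transaction lifetime limit to 120 seconds
client.admin.command({'setParameter': 1, 'transactionLifetimeLimitSeconds': 120})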
It is illegal to provide a txnNumber for command createIndexes
First, operations that affect the database catalog, such as creating or dropping a collection or an index, are not allowed in multi-document transactions.
The PyMongo GridFS code attempts to create indexes for the GridFS collections, which the server prohibits when used with a transactional session (you may use a session, just not a transaction).
I have updated to MongoDB 4.0 and trying to use the multi collection transaction model
I'd recommend using normal database operations with GridFS. MongoDB multi-document transactions are designed for multi-document atomicity, which I don't think is necessary in the file-uploading case.
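Concretely, that means dropping the transaction around the upload. A minimal sketch based on the code in the question, reusing its mongo_url, db_name, and metadata:

import gridfs
import pymongo

client = pymongo.MongoClient(mongo_url)
db = client[db_name]
fs = gridfs.GridFS(db)

# A plain GridFS upload: each chunk write and the files-collection insert
# are individually atomic, which is enough for this use case.
with open("video.mp4", 'rb') as f:
    fid = fs.put(f, metadata={'sequence_id': '0001'})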
I have started working on a Spring WebFlux and R2DBC project. My code mainly works fine, but after some number of elements I receive this warning:
r2dbc.mssql.client.ReactorNettyClient : Connection has been closed by peer
After this warning I get the following exception, and the program stops reading from the Flux whose source is the R2DBC driver:
ReactorNettyClient$MssqlConnectionClosedException: Connection unexpectedly closed
My main pipeline looks like this:
Sinks.Empty<Void> completionSink = Sinks.empty();
Flux<Event> events = service.getPairs(
        taskProperties.A,
        taskProperties.B);
events
    .flatMap(some operation)
    .doOnComplete(() -> {
        log.info("Finished Job");
        completionSink.emitEmpty(Sinks.EmitFailureHandler.FAIL_FAST);
    })
    .subscribe();
completionSink.asMono().block();
After the run, flatMap requests 256 elements by default, then after fetching tries to request(1) for the next signal.
Somewhere between the 280th and the 320th element it gets the above error. It is not deterministic; sometimes it reads 280 elements, sometimes 303, 315, etc.
I think it may be network-related, but I am not sure and cannot find the reason. Do I need a pool or something different?
Sorry if I missed anything; if you want, I will try to update here.
Thank you in advance.
I have tried changing the flatMap request size to unbounded, adding a scheduler, and the default r2dbc pool, but so far I don't have any clue.
I’m trying to submit a JCL through an SQR program using the Call System Command on MVS z/OS. The JCL resides in a specific dataset.
What I’m trying to do is something like this:
let $jclcmd= 'SUBMIT PSLIBDSN.O92.CUST7.JCLSRC(UTILI)'
call system using $jclcmd #rtnstat
Up to this point, I have not been able to submit the JCL. What I get from the mainframe is this error:
**** WARNING **** ERRNO = ESYS
Generated in SYSTEM called from line 389 of SYS(UCALL) , offset 000118
Program SUBMIT was abnormally terminated with a system code of 66D.
I also tried let $jclcmd= 'TSO SUBMIT PSLIBDSN.O92.CUST7.JCLSRC(UTILI)', but got this:
Program TSO was abnormally terminated with a system code of 806.
SYSTEM COMPLETION CODE=806 REASON CODE=00000004
Up to this point I have come to think that the Call System function does not allow operating system commands to be executed, for reasons of incompatibility with MVS. The SQR documentation does not actually say so, but it only ever gives Windows and UNIX examples. I have made a thousand attempts to execute a REXX program, submit a JCL, and so on, but it looks like the function is not assembling the command correctly.
Any ideas will be welcome.
I have a question regarding the parameter_variation.py script provided on GitHub.
I'm using FMPy functions (https://github.com/CATIA-Systems/FMPy) and have a specific error that occurs only when I run a certain FMU, one that is only slightly different from the other FMUs I've been using with a modified version of the parameter_variation.py example script.
Errors:
...
  File "c:\parameter_variation.py", line 136, in simulate_fmu
    fmu.terminate()
  File "C:\AppData\Local\Continuum\anaconda3\lib\site-packages\fmpy\fmi2.py", line 231, in terminate
    return self.fmi2Terminate(self.component)
  File "C:\AppData\Local\Continuum\anaconda3\lib\site-packages\fmpy\fmi2.py", line 169, in w
    res = f(*args, **kwargs)
OSError: exception: access violation reading 0xFFFFFFFE1CD34660
End
I’m running 100 simulations of this FMU in 20 chunks, although the same FMU in the parameter_variation.py script appears to provide results if I run fewer than ~30 simulations in ~6 chunks.
Do you have any guesses as to why the access violation error may be occurring, and how a solution can be forged? Let me know if this is enough information.
Thanks in advance.
In the title you mention multi-threading (multiple instances of the same FMU in the same process), which is not supported by many FMUs and can lead to unexpected side effects (e.g. through access to shared resources). If this is the case here, you should be able to run your variation with a synchronized scheduler by setting the variable sync = True in parameter_variation.py (line 27).
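For reference, a minimal sketch of what that toggle amounts to, assuming the example's Dask-based setup; parameter_sets and the chunking details are hypothetical stand-ins for whatever your modified script builds, and simulate_fmu is the function named in the traceback above:

import dask.bag as dbag

sync = True  # run chunks sequentially instead of in parallel workers

# parameter_sets: your list of parameter combinations (hypothetical name)
bag = dbag.from_sequence(parameter_sets, npartitions=20).map(simulate_fmu)

if sync:
    # synchronous scheduler: one FMU instance at a time, no shared-resource races
    results = bag.compute(scheduler='synchronous')
else:
    results = bag.compute()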
I wrote the file-transferring code as follows:
val fileContent: Enumerator[Array[Byte]] = Enumerator.fromFile(file)
val size = file.length.toString
file.delete // (1) THE FILE IS TEMPORARY SO SHOULD BE DELETED
SimpleResult(
  header = ResponseHeader(200, Map(CONTENT_LENGTH -> size, CONTENT_TYPE -> "application/pdf")),
  body = fileContent)
This code works successfully, even if the file size is rather large (2.6 MB), but I'm confused: my understanding is that .fromFile() is a wrapper around fromCallBack(), and SimpleResult actually reads the file buffered, yet the file is deleted before that happens.
My easy assumption is that java.io.File.delete waits until the file is released after the chunked reading completes, but I have never heard of the Java File class behaving that way.
Or .fromFile() has already loaded the whole content into the Enumerator instance, but that would be against the fromCallBack() spec, I think.
Does anybody know about this mechanism?
I'm guessing you are on some kind of Unix system, OS X or Linux for example.
On a Unix-y system you can actually delete a file that is open: any filesystem entry is just a link to the actual file, and so is the file handle you get when you open a file. The file contents won't become unreachable/deleted until the last link to them is removed.
So: the file will no longer show up in the filesystem after you do file.delete, but you can still read it using the InputStream that was created in Enumerator.fromFile(file), since that created a file handle. (On Linux you can actually find it through the special /proc filesystem, which, among other things, contains the file handles of each running process.)
On Windows I think you will get an error though, so if it is to run on multiple platforms you should probably test your webapp on Windows as well.
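The behavior is easy to demonstrate outside the JVM as well. Here is a small Python sketch of the same unlink-while-open semantics on a Unix-like system (the file path is arbitrary):

import os

# create a throwaway file
with open("/tmp/unlink_demo.txt", "w") as f:
    f.write("still readable after unlink\n")

fh = open("/tmp/unlink_demo.txt")  # open handle = one more link to the inode
os.unlink("/tmp/unlink_demo.txt")  # remove the directory entry

print(os.path.exists("/tmp/unlink_demo.txt"))  # False: gone from the filesystem
print(fh.read())  # but the contents are still readable via the open handle
fh.close()  # last link dropped; the space is reclaimed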
I am pulling URLs from my database with a Perl script, where I employ fetchrow_array to pull the URL from the database. This worked fine until I encountered a very long URL: georgelog24.blog.iskreni.net/?bid=6744d9dcf85991ed2e4b8a258153a1ab&lid=ff9963b9a798ea335b75b5f7c0c295d1
Then it started giving me this error:
DBD::ODBC::st fetchrow_array failed: st_fetch/SQLFetch (long truncated DBI attribute LongTruncOk not set and/or LongReadLen too small) (SQL-HY000) [state was HY000 now 01004]
[Microsoft][ODBC SQL Server Driver]String data, right truncation (SQL-01004) at C:\test\multihashtest2.pl line 44.
I believe this is on the database side, as the code I have been using to pull URLs has worked before. The database I am using is MS SQL Server 2005.
The URL column in the database currently uses the text type, but I have tried changing it to varchar(max) and nvarchar(max), and the error still stands.
After a bit of trial and error I found that the maximum length of URL I could query successfully with fetchrow_array was 81 characters. And since URLs can sometimes span ridiculous lengths, I cannot put a restriction on URL length.
Can anybody help me understand this and suggest a fix?
FYI: line 44 is the first line in my code below.
while (($myid, $url) = $statement_handle->fetchrow_array()) { # executes as many threads as there are jobs to do
    my $thread = threads->create(\&webcrawl); # initiate thread
    my $tid = $thread->tid;
    print " - Thread $tid started\n"; # obtain thread no. and print
    push(@Threads, $thread); # push thread into array for "housekeeping" later on
}
Try with:
# no more errors if content is truncated - you don't necessarily want this
$statement_handle->{'LongTruncOk'} = 1;

# hard-coded constant for the maximum length of data to be read from long columns
$statement_handle->{'LongReadLen'} = 20000;

while (($myid, $url) = $statement_handle->fetchrow_array()) { # executes as many threads as there are jobs to do
    my $thread = threads->create(\&webcrawl); # initiate thread
    my $tid = $thread->tid;
    print " - Thread $tid started\n"; # obtain thread no. and print
    push(@Threads, $thread); # push thread into array for "housekeeping" later on
}
Also, I'd recommend you try Parallel::ForkManager for parallelizing jobs; I find it much more intuitive and easier to use than threads.
Please look at the DBI attributes LongTruncOk and LongReadLen.
You will NEED to either accept truncation or set a maximum size, as text and varchar(max) columns can be massive; if it were left to the DBD, it would have no choice but to allocate massive amounts of memory in case a column held its maximum size.
Important point: you need to set the LongReadLen and/or LongTruncOk attributes on the database handle prior to preparing the statement, as noted here.
Attempting to set them on the prepared statement handle prior to fetching data will have no effect on truncation of the returned data.