I'm trying to save a large file (160 MB) from the network to disk. I'm saving the chunks as soon as they arrive, but the saved file on disk is always corrupted and has a different size each time I re-run the script.
The reason I'm saving each chunk as soon as it arrives is to save memory: I tried fetching the entire file and then saving it, which caused my Flutter app to consume about 2 GB of memory.
This is my code:
final request = http.Request('GET', uri);
final streamedResponse = await request.send();
File pathToSave = File('C:\\Downloads\\test.zip');
streamedResponse.stream.listen((value) async {
  await pathToSave.writeAsBytes(value, mode: FileMode.writeOnlyAppend, flush: true);
});
I'm not sure, but it seems the listen function doesn't wait for the previous callback to finish: it fires the next callback immediately, so the file writes interleave and corrupt the output. The same script works fine with small files.
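Would a single IOSink that consumes the stream be the right fix? Something like this (an untested sketch; downloadToFile is just a name I made up):
import 'dart:io';
import 'package:http/http.dart' as http;

Future<void> downloadToFile(Uri uri, String path) async {
  final request = http.Request('GET', uri);
  final streamedResponse = await request.send();
  // addStream applies back-pressure: the next chunk is not pulled until
  // the previous one has been handed to the file sink, so writes never interleave.
  final sink = File(path).openWrite();
  await sink.addStream(streamedResponse.stream);
  await sink.flush();
  await sink.close();
}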
Flutter 3.0.5 (desktop)
Windows 11
I am using the excel package version 2.0.0-null-safety-3 to read an Excel file.
For small files it works fine, but when reading a large file the interface freezes until the file has been read:
Excel.decodeBytes(_bytes);
The decodeBytes method is synchronous; an async variant is not supported.
Is there a way to run the decoding asynchronously, so that I can show a download bar or waiting dialog to the user?
Thanks in advance.
Use compute for large bytes of data:
await compute(function, param)
This runs the function on a background isolate and returns the result. See the Flutter docs: Flutter Compute.
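A minimal sketch (assumptions: decodeExcel is a hypothetical helper name, and the decoded Excel object can be sent back across the isolate boundary, which needs Dart 2.15+ for arbitrary objects):
import 'package:excel/excel.dart';
import 'package:flutter/foundation.dart';

// compute() only accepts top-level or static functions.
Excel decodeExcel(List<int> bytes) => Excel.decodeBytes(bytes);

Future<Excel> decodeWithoutBlockingUi(List<int> bytes) {
  // The heavy, synchronous decode runs on a background isolate while the
  // UI isolate stays free to render a progress bar or waiting dialog.
  return compute(decodeExcel, bytes);
}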
I'm uploading files, in chunks, to SharePoint Online through the REST API.
I am doing this via the StartUpload, ContinueUpload and FinishUpload methods and passing them a chunk as a byte[].
I have used this as the example to work from: Using chunked upload/StartUpload with sharepoint REST api
Documentation is here: https://msdn.microsoft.com/library/office/microsoft.sharepoint.client.file.startupload.aspx
(I have not included the code as I don't think it's relevant, please correct me if I'm mistaken)
This works as long as the total file size is larger than the chunk size.
For example, if the chunk size is 1MB, but the total file size is 4MB, then the method will work.
If the chunk size is 4MB, but the total file size is 1MB, then I end up with empty or corrupt files once uploaded.
This is because the initial call to StartUpload contains the entire file in one chunk, so FinishUpload never gets called to close the file.
If I call FinishUpload with an empty byte[0] then I get:
Error 500 - Internal Server Error: The upload was incomplete. Try to save again.
Under normal circumstances I would simply check if the total file size was smaller than the chunk size and instead add the file directly using /Files/add(...).
Unfortunately I am being passed the file chunks in a stream and I don't know the total file size beforehand. I also can't save all chunks before processing, I have to pass each chunk straight to SharePoint as I am given them.
So how do I use FinishUpload to close a file upload when I have no more bytes to upload?
You could consider the following modifications:
1) If the total file size is smaller than the chunk size, upload the file via a single request. For example:
var fi = new FileInfo(fileName);
if (fi.Length <= chunkSize)
{
    this.UploadFile(address, fileName);
    return;
}
where the WebClient.UploadFile method is used to upload the file via SharePoint RPC.
2) Otherwise the file is uploaded via a chunked upload session.
Refer SharePointClient.cs for a complete example.
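If the total size isn't known up front because the chunks arrive in a stream (as in the question), one way to adapt the same two-branch idea is to read one chunk ahead: if the stream ends inside the first chunk, fall back to the single-request upload; otherwise run the chunked session and hold back the last buffered chunk for FinishUpload. A hedged sketch, where UploadWhole, StartUpload, ContinueUpload and FinishUpload are hypothetical wrappers around the corresponding REST calls:
using System;
using System.IO;

// Sketch only: the four upload methods are assumed to exist on this class.
public void UploadFromStream(Stream source, int chunkSize)
{
    byte[] first = ReadChunk(source, chunkSize);
    byte[] second = ReadChunk(source, chunkSize);

    if (second.Length == 0)
    {
        UploadWhole(first); // fits in one chunk: use /Files/add(...)
        return;
    }

    Guid uploadId = Guid.NewGuid();
    long offset = StartUpload(uploadId, first);

    byte[] current = second;
    byte[] next = ReadChunk(source, chunkSize);
    while (next.Length > 0)
    {
        offset = ContinueUpload(uploadId, offset, current);
        current = next;
        next = ReadChunk(source, chunkSize);
    }

    FinishUpload(uploadId, offset, current); // the last chunk closes the file
}

private static byte[] ReadChunk(Stream source, int chunkSize)
{
    // Read until the buffer is full or the stream ends, so the last
    // chunk may be shorter than chunkSize.
    var buffer = new byte[chunkSize];
    int total = 0, read;
    while (total < chunkSize &&
           (read = source.Read(buffer, total, chunkSize - total)) > 0)
    {
        total += read;
    }
    Array.Resize(ref buffer, total);
    return buffer;
}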
Meteor is great but it lacks native support for traditional file uploading. There are several options to handle file uploading:
From the client, data can be sent using:
Meteor.call('saveFile',data) or collection.insert({file:data})
'POST' form or HTTP.call('POST')
In the server, the file can be saved to:
a mongodb file collection by collection.insert({file:data})
file system in /path/to/dir
mongodb GridFS
What are the pros and cons for these methods and how best to implement them? I am aware that there are also other options such as saving to a third party site and obtain an url.
You can achieve file uploading with Meteor without using any extra packages or a third party service.
Option 1: DDP, saving file to a mongo collection
/*** client.js ***/
// assign a change event to the input tag
'change input' : function(event,template){
var file = event.target.files[0]; //assuming 1 file only
if (!file) return;
var reader = new FileReader(); //create a reader according to HTML5 File API
reader.onload = function(event){
var buffer = new Uint8Array(reader.result) // convert to binary
Meteor.call('saveFile', buffer);
}
reader.readAsArrayBuffer(file); //read the file as arraybuffer
}
/*** server.js ***/
Files = new Mongo.Collection('files');
Meteor.methods({
'saveFile': function(buffer){
Files.insert({data:buffer})
}
});
Explanation
First, the file is grabbed from the input using the HTML5 File API. A reader is created with new FileReader and the file is read as an ArrayBuffer (readAsArrayBuffer). This ArrayBuffer, if you console.log it, shows up as {}, and DDP can't send it over the wire, so it has to be converted to a Uint8Array.
When you pass this to Meteor.call, Meteor automatically runs EJSON.stringify on the Uint8Array and sends it over DDP. You can inspect the websocket traffic in the Chrome console; you will see a string resembling base64.
On the server side, Meteor calls EJSON.parse() and converts it back to a buffer.
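A quick way to see this round trip for yourself (a sketch; EJSON encodes binary as a base64 payload):
var bytes = new Uint8Array([72, 105]); // "Hi"
var wire = EJSON.stringify(bytes);     // the binary comes out as a base64 payload
var back = EJSON.parse(wire);          // and parses back into a Uint8Array
console.log(wire, back);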
Pros
Simple, no hacky way, no extra packages
Stick to the Data on the Wire principle
Cons
More bandwidth: the resulting base64 string is ~ 33% larger than the original file
File size limit: can't send big files (limit ~ 16 MB?)
No caching
No gzip or compression yet
Take up lots of memory if you publish files
Option 2: XHR, post from client to file system
/*** client.js ***/
// assign a change event to the input tag
'change input' : function(event,template){
var file = event.target.files[0];
if (!file) return;
var xhr = new XMLHttpRequest();
xhr.open('POST', '/uploadSomeWhere', true);
xhr.onload = function(event){...}
xhr.send(file);
}
/*** server.js ***/
var fs = Npm.require('fs');
//using internal webapp or iron:router
WebApp.connectHandlers.use('/uploadSomeWhere',function(req,res){
//var start = Date.now()
var file = fs.createWriteStream('/path/to/dir/filename');
file.on('error',function(error){...});
file.on('finish',function(){
res.writeHead(...)
res.end(); //end the response
//console.log('Finish uploading, time taken: ' + Date.now() - start);
});
req.pipe(file); //pipe the request to the file
});
Explanation
The file in the client is grabbed, an XHR object is created and the file is sent via 'POST' to the server.
On the server, the data is piped to the underlying file system. You can additionally determine the filename, perform sanitisation, or check whether the file already exists before saving.
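A minimal sanitisation sketch (safeName is just an illustrative helper, and the x-file-name header is a hypothetical way for the client to send the desired name):
var path = Npm.require('path');

function safeName(name){
  // Drop any directory components, then whitelist characters so a
  // crafted filename can't escape the upload directory.
  var base = path.basename(name).replace(/[^a-zA-Z0-9._-]/g, '_');
  return base || 'upload';
}

// inside the connect handler:
var target = path.join('/path/to/dir', safeName(req.headers['x-file-name'] || 'upload'));
var file = fs.createWriteStream(target);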
Pros
Taking advantage of XHR 2 so you can send arraybuffer, no new FileReader() is needed as compared to option 1
Arraybuffer is less bulky compared to base64 string
No size limit, I sent a file ~ 200 MB in localhost with no problem
File system is faster than mongodb (more of this later in benchmarking below)
Cachable and gzip
Cons
XHR 2 is not available in older browsers (e.g. below IE 10), but of course you can implement a traditional <form> POST. I only used xhr = new XMLHttpRequest() rather than HTTP.call('POST') because the current HTTP.call in Meteor is not yet able to send an ArrayBuffer (point me to it if I am wrong).
/path/to/dir/ has to be outside meteor, otherwise writing a file in /public triggers a reload
Option 3: XHR, save to GridFS
/*** client.js ***/
//same as option 2
/*** version A: server.js ***/
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
var GridStore = MongoInternals.NpmModule.GridStore;
WebApp.connectHandlers.use('/uploadSomeWhere',function(req,res){
//var start = Date.now()
var file = new GridStore(db,'filename','w');
file.open(function(error,gs){
file.stream(true); //true will close the file automatically once piping finishes
file.on('error',function(e){...});
file.on('end',function(){
res.end(); //send end response
//console.log('Finish uploading, time taken: ' + Date.now() - start);
});
req.pipe(file);
});
});
/*** version B: server.js ***/
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
var GridStore = Npm.require('mongodb').GridStore; //also need to add Npm.depends({mongodb:'2.0.13'}) in package.js
WebApp.connectHandlers.use('/uploadSomeWhere',function(req,res){
//var start = Date.now()
var file = new GridStore(db,'filename','w').stream(true); //start the stream
file.on('error',function(e){...});
file.on('end',function(){
res.end(); //send end response
//console.log('Finish uploading, time taken: ' + Date.now() - start);
});
req.pipe(file);
});
Explanation
The client script is the same as in option 2.
According to the last line of Meteor 1.0.x's mongo_driver.js, a global object called MongoInternals is exposed; you can call defaultRemoteCollectionDriver() to get the current database db object, which the GridStore requires. In version A, the GridStore is also exposed by MongoInternals. The mongo driver used by current Meteor is v1.4.x.
Then, inside a route, you can create a new write object by calling var file = new GridStore(...) (API). You then open the file and create a stream.
I also included a version B. In this version, GridStore is loaded from a newer mongodb driver via Npm.require('mongodb'); this mongo is the latest, v2.0.13 as of this writing. The new API doesn't require you to open the file first; you can call stream(true) directly and start piping.
Pros
Same as in option 2, sent using arraybuffer, less overhead compared to base64 string in option 1
No need to worry about file name sanitisation
Separation from file system, no need to write to temp dir, the db can be backed up, rep, shard etc
No need to implement any other package
Cachable and can be gzipped
Store much larger sizes compared to normal mongo collection
Using pipe to reduce memory overload
Cons
Unstable Mongo GridFS. I included version A (mongo 1.x) and version B (mongo 2.x). In version A, when piping large files > 10 MB, I got lots of errors, including corrupted files and unfinished pipes. This problem is solved in version B using mongo 2.x; hopefully Meteor will upgrade to mongodb 2.x soon.
API confusion. In version A you need to open the file before you can stream, but in version B you can stream without calling open. The API doc is also not very clear, and the stream is not 100% syntax-interchangeable with Npm.require('fs'): in fs you listen for file.on('finish'), but in GridFS it is file.on('end') when writing finishes.
GridFS doesn't provide write atomicity, so with multiple concurrent writes to the same file the final result may be very different.
Speed. Mongo GridFS is much slower than file system.
Benchmark
As mentioned in options 2 and 3, I included var start = Date.now() and, when the write ends, console.log the elapsed time in ms. The results below are from a dual-core, 4 GB RAM, HDD, Ubuntu 14.04 machine.
file size    GridFS (ms)    FS (ms)
100 KB       50             2
1 MB         400            30
10 MB        3500           100
200 MB       80000          1240
You can see that FS is much faster than GridFS: a 200 MB file takes ~80 s with GridFS but only ~1 s with FS. I haven't tried an SSD; the results may differ. In real life, though, bandwidth usually dictates how fast a file streams from client to server: a 200 MB/s transfer speed is not typical, whereas ~2 MB/s (about what GridFS sustains here) is more the norm.
Conclusion
By no means is this comprehensive, but it should help you decide which option best fits your needs.
DDP is the simplest and sticks to the core Meteor principle, but the data is bulkier, not compressible during transfer, and not cachable. This option may still be good if you only need small files.
XHR coupled with the file system is the 'traditional' way: stable API, fast, 'streamable', compressible, cachable (ETag etc.), but the files need to live in a separate folder.
XHR coupled with GridFS gives you the benefits of replica sets and scalability, no touching of the file system dir, and support for large files and for many files if the file system restricts their number; it is also cachable and compressible. However, the API is unstable, you get errors on multiple writes, and it's s..l..o..w..
Hopefully soon, meteor DDP can support gzip, caching etc and GridFS can be faster...
Hi, just to add on to Option 1 regarding viewing of the file: I did it without EJSON.
<template name='tryUpload'>
<p>Choose file to upload</p>
<input name="upload" class='fileupload' type='file'>
</template>
Template.tryUpload.events({
  'change .fileupload': function(event, template){
    console.log('change & view');
    var f = event.target.files[0]; //assuming upload 1 file only
    if (!f) return;
    var r = new FileReader();
    r.onload = function(event){
      var buffer = new Uint8Array(r.result); //convert to binary
      // build a string preview of the bytes (note: apply() can overflow
      // the call stack for very large files)
      var toString = String.fromCharCode.apply(null, buffer);
      console.log(toString);
      //Meteor.call('saveFiles', buffer);
    };
    r.readAsArrayBuffer(f);
  }
});
I'm trying to download a 2 GB page blob using the Java SDK, and it fails with a StorageException because the downloaded file size does not match the actual blob size.
On multiple tries I see the same result, although the downloaded file size varies slightly each time. Setting the timeout to the maximum value does not help either.
Also, when I download the same VHD via the Azure portal, the download completes but only partially; the resulting size is usually comparable to the one downloaded with the SDK.
In the SDK code I can see that HttpURLConnection is being used. Could that be a problem? The same code on a Windows machine gives similar results, only the downloaded file is a few MB larger, but still incomplete.
Any thoughts on how to get it working?
The code snippet used is
URI blobEndpoint = null;
String uriString = "http://" + "sorageaccount" + ".blob.core.windows.net";
blobEndpoint = new URI(uriString);
CloudBlobClient blobClient = new CloudBlobClient(blobEndpoint,
new StorageCredentialsAccountAndKey("abcd", "passed"));
CloudBlobContainer container = blobClient.getContainerReference(Constants.STORAGE_CONTAINER_NAME);
CloudPageBlob pageBlob = container.getPageBlobReference("http://abcd.blob.core.windows.net/sc/someimg.vhd");
System.out.println("Page Blob Name: " + pageBlob.getName());
OutputStream outStream = new FileOutputStream(new File("/Users/myself/Downloads/TestDownload.vhd"));
System.out.println("Starting download now ... ");
BlobRequestOptions options = new BlobRequestOptions();
options.setUseTransactionalContentMD5(true);
options.setStoreBlobContentMD5 (true); // Set full blob level MD5
options.setTimeoutIntervalInMs(Integer.MAX_VALUE);
options.setRetryPolicyFactory(new RetryLinearRetry());
pageBlob.download(outStream, null, options, null);
outStream.close();
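One direction I'm considering (a sketch, not verified against this setup) is to pull the blob down in fixed-size ranges with downloadRange, so each HTTP request stays short-lived and a transient failure only costs one range. This reuses the pageBlob and options objects from the snippet above, plus the usual java.io imports:
// Populate properties, including the blob length.
pageBlob.downloadAttributes();
long blobLength = pageBlob.getProperties().getLength();
long rangeSize = 4L * 1024 * 1024; // 4 MB per request: an arbitrary choice

OutputStream out = new BufferedOutputStream(
        new FileOutputStream("/Users/myself/Downloads/TestDownload.vhd"));
try {
    for (long offset = 0; offset < blobLength; offset += rangeSize) {
        long length = Math.min(rangeSize, blobLength - offset);
        // Each iteration issues a bounded GET for just this range.
        pageBlob.downloadRange(offset, length, out, null, options, null);
    }
} finally {
    out.close();
}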