MongoDB 3.2 C# driver version 2.2.3.3 GridFS: download large files of more than 2 GB

I am uploading files using the following code:
using (var s = File.OpenRead(@"C:\2gbDataTest.zip"))
{
var t = Task.Run<ObjectId>(() =>
{
return fs.UploadFromStreamAsync("2gbDataTest.zip", s);
});
return t.Result;
}
//works for the files below 2gb
var t1 = fs.DownloadAsBytesAsync(id);
Task.WaitAll(t1);
var bytes = t1.Result;
I am getting an error.
I am new to MongoDB and C#. Can anyone please show me how to download files greater than 2 GB in size?

You are hitting the size limit of a byte array (which is kept in memory), so your only choice is to download to a Stream instead, as you already do when you upload. Something like this (with a valid destination):
IGridFSBucket fs;
ObjectId id;
FileStream destination;
await fs.DownloadToStreamAsync(id, destination);

// Complete code for others; this works.
// Thanks to "Adam Comerford"
var fs = new GridFSBucket(database);
using (var newFs = new FileStream(filePathToDownload, FileMode.Create))
{
//id is file objectId
var t1 = fs.DownloadToStreamAsync(id, newFs);
Task.WaitAll(t1);
newFs.Flush();
newFs.Close();
}

Related

Use mongodb BsonSerializer to serialize and deserialize data

I have complex classes like this:
abstract class Animal { ... }
class Dog: Animal{ ... }
class Cat: Animal{ ... }
class Farm{
public List<Animal> Animals {get;set;}
...
}
My goal is to send objects from computer A to computer B
I was able to achieve my goal by using BinaryFormatter serialization. It enabled me to serialize complex classes like Animal in order to transfer objects from computer A to computer B. Serialization was very fast, and I only had to worry about placing a serializable attribute on top of my classes. But BinaryFormatter is now obsolete, and from what I read on the internet, future versions of .NET may remove it.
As a result I have these options:
1. Use System.Text.Json. This approach does not work well with polymorphism. In other words, I cannot deserialize an array of cats and dogs, so I will try to avoid it.
2. Use protobuf. I do not want to create protobuf map files for every class. I have over 40 classes, so this is a lot of work. Or maybe there is a converter that I am not aware of? But even then, how would the converter be smart enough to know that my array of animals can contain cats and dogs?
3. Use Newtonsoft (Json.NET). I could use this solution and build something like this: https://stackoverflow.com/a/19308474/637142 . Or, even better, serialize the objects with a type like this: https://stackoverflow.com/a/71398251/637142. So this will probably be my go-to option.
4. Use MongoDB.Bson.Serialization.BsonSerializer. Because I am dealing with a lot of complex objects, we are using MongoDB. MongoDB is able to store a Farm object easily. My goal is to retrieve objects from the database in binary format, send that binary data to another computer, and use BsonSerializer to deserialize it back into objects.
5. Have computer B connect to the database remotely. I cannot use this option because one of our requirements is to do everything through an API. For security reasons we are not allowed to connect remotely to the database.
I am hoping I can use option 4. It will be the most efficient because we are already using MongoDB. If we use option 3, which will work, we are doing extra steps. We do not need the data in JSON format. Why not just send it in binary and deserialize it once it is received by computer B? MongoDB.Driver is already doing this. I wish I knew how it does it.
This is what I have worked out so far:
MongoClient m = new MongoClient("mongodb://localhost:27017");
var db = m.GetDatabase("TestDatabase");
var collection = db.GetCollection<BsonDocument>("Farms");
// I have 1s and 0s in here.
var binaryData = collection.Find("{}").ToBson();
// this is not readable
var t = System.Text.Encoding.UTF8.GetString(binaryData);
Console.WriteLine(t);
// how can I convert those 0s and 1s to a Farm object?
var collection = db.GetCollection<RawBsonDocument>(nameof(this.Calls));
var sw = new Stopwatch();
var sb = new StringBuilder();
sw.Start();
// get items
IEnumerable<RawBsonDocument>? objects = collection.Find("{}").ToList();
sb.Append("TimeToObtainFromDb: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
var ms = new MemoryStream();
var largestSize = 0;
// write data to a memory stream for demo purposes; in a real application I will write this to a TCP socket
foreach (var item in objects)
{
var bsonType = item.BsonType;
// write object
var bytes = item.ToBson();
// 4-byte length prefix (a ushort prefix would overflow for documents larger than 65,535 bytes)
int sizeOfBytes = bytes.Length;
if (bytes.Length > largestSize)
largestSize = bytes.Length;
var size = BitConverter.GetBytes(sizeOfBytes);
ms.Write(size);
ms.Write(bytes);
}
sb.Append("time to serialize into bson to memory: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
// now on the client side on computer B let's pretend we are deserializing the stream
ms.Position = 0;
var clones = new List<Call>();
byte[] sizeOfArray = new byte[4];
byte[] buffer = new byte[102400]; // make this large, because a document larger than 102400 bytes will not fit
while (true)
{
var i = ms.Read(sizeOfArray, 0, 4);
if (i < 1)
break;
var sizeOfBuffer = BitConverter.ToInt32(sizeOfArray);
int position = 0;
while (position < sizeOfBuffer)
{
// Read returns the number of bytes read by this call, so accumulate it
var read = ms.Read(buffer, position, sizeOfBuffer - position);
if (read == 0)
throw new EndOfStreamException();
position += read;
}
//using var test = new RawBsonDocument(buffer);
using var test = new RawBsonDocumentWrapper(buffer , sizeOfBuffer);
var identityBson = test.ToBsonDocument();
var cc = BsonSerializer.Deserialize<Call>(identityBson);
clones.Add(cc);
}
sb.Append("time to deserialize from memory into clones: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
var serializedjs = new List<string>();
foreach(var item in clones)
{
var foo = item.SerializeToJsStandards();
if (foo.Contains("jaja"))
throw new Exception();
serializedjs.Add(foo);
}
sb.Append("time to serialize into js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
foreach(var item in serializedjs)
{
try
{
var obj = item.DeserializeUsingJsStandards<Call>();
if (obj is null)
throw new Exception();
if (obj.IdAccount.Contains("jsfjklsdfl"))
throw new Exception();
}
catch(Exception ex)
{
Console.WriteLine(ex);
throw;
}
}
sb.Append("time to deserialize js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

Create and download Word file in Blazor

I am trying to create a Word file and download the created file in the client's browser.
The creation part seems to work fine, and I can open the file manually from its folder.
But the downloaded file in browser does not open correctly and produces an error
"The file is corrupt and cannot be opened"
I am using the code from Microsoft's instructions for downloading a file in Blazor.
My code looks like this:
private async Task CreateAndDownloadWordFile()
{
var destination = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
var fileName = destination + "\\test12.docx";
//SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Create(destination, SpreadsheetDocumentType.Workbook);
using (WordprocessingDocument doc = WordprocessingDocument.Create
(fileName, DocumentFormat.OpenXml.WordprocessingDocumentType.Document))
{
// Add a main document part.
MainDocumentPart mainPart = doc.AddMainDocumentPart();
// Create the document structure and add some text.
mainPart.Document = new Document();
Body body = mainPart.Document.AppendChild(new Body());
Paragraph para = body.AppendChild(new Paragraph());
Run run = para.AppendChild(new Run());
// String msg contains the text, "Hello, Word!"
run.AppendChild(new Text("New text in document"));
}
var fileStream = GetFileStream();
using var streamRef = new DotNetStreamReference(stream: fileStream);
await JS.InvokeVoidAsync("downloadFileFromStream", fileName, streamRef);
}
private Stream GetFileStream()
{
var randomBinaryData = new byte[50 * 1024];
var fileStream = new MemoryStream(randomBinaryData);
return fileStream;
}
And I use this Javascript code
async function downloadFileFromStream(fileName, contentStreamReference) {
const arrayBuffer = await contentStreamReference.arrayBuffer();
const blob = new Blob([arrayBuffer]);
const url = URL.createObjectURL(blob);
triggerFileDownload(fileName, url);
URL.revokeObjectURL(url);
}
function triggerFileDownload(fileName, url) {
const anchorElement = document.createElement('a');
anchorElement.href = url;
anchorElement.download = fileName ?? '';
anchorElement.click();
anchorElement.remove();
}
Any ideas?
"But the downloaded file in browser does not open correctly"
That is probably because you:
1. First create a Word document
2. And then download var randomBinaryData = new byte[50 * 1024]; (the buffer returned by GetFileStream(), not the .docx you just wrote)
Check the files downloaded in the browser. Are they exactly 50 * 1024 bytes?
--
Also, you shouldn't pass the full C:\... path to the download function.
var fileStream = File.OpenRead(fileName);
using var streamRef = new DotNetStreamReference(stream: fileStream);
//await JS.InvokeVoidAsync("downloadFileFromStream", fileName, streamRef);
await JS.InvokeVoidAsync("downloadFileFromStream", "suggestedName", streamRef);
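As a complement to passing a plain suggested name from C#, the JavaScript side could also defensively strip any directory part before using the name. The following is only a minimal sketch: suggestedDownloadName is a hypothetical helper that is not part of the Microsoft sample, and the 'download' fallback is an assumption.

```javascript
// Hypothetical helper (not part of the Microsoft sample): keep only the last
// segment of a Windows or POSIX path so that the browser is offered a plain
// file name rather than a full path.
function suggestedDownloadName(path) {
  const segments = String(path).split(/[\\/]/);
  const last = segments[segments.length - 1];
  // Fall back to a generic name if the path ends with a separator or is empty.
  return last || 'download';
}
```

In triggerFileDownload this could be used as anchorElement.download = suggestedDownloadName(fileName ?? '').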

Word found unreadable content in xxx.docx after split a docx using openxml

I have a full.docx which includes two math questions. The docx embeds some pictures and MathType equations (OLE objects). I split the doc according to this and get two files (first.docx, second.docx). first.docx works fine; second.docx, however, pops up a warning dialog when I try to open it:
"Word found unreadable content in second.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."
After clicking "Yes", the doc can be opened and the content is also correct. I want to know what is wrong with second.docx. I have checked it with the "Open XML SDK 2.5 Productivity Tool", but found no cause. Any help is much appreciated. Thanks.
The three files have been uploaded here.
Here is some code:
byte[] templateBytes = System.IO.File.ReadAllBytes(TEMPLATE_YANG_FILE);
using (MemoryStream templateStream = new MemoryStream())
{
templateStream.Write(templateBytes, 0, (int)templateBytes.Length);
string guidStr = Guid.NewGuid().ToString();
using (WordprocessingDocument document = WordprocessingDocument.Open(templateStream, true))
{
document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
MainDocumentPart mainPart = document.MainDocumentPart;
mainPart.Document = new Document();
Body bd = new Body();
foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph clonedParagrph in lst)
{
bd.AppendChild<DocumentFormat.OpenXml.Wordprocessing.Paragraph>(clonedParagrph);
clonedParagrph.Descendants<Blip>().ToList().ForEach(blip =>
{
var newRelation = document.CopyImage(blip.Embed, this.wordDocument);
blip.Embed = newRelation;
});
clonedParagrph.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().ToList().ForEach(imageData =>
{
var newRelation = document.CopyImage(imageData.RelationshipId, this.wordDocument);
imageData.RelationshipId = newRelation;
});
}
mainPart.Document.Body = bd;
mainPart.Document.Save();
}
string subDocFile = System.IO.Path.Combine(this.outDir, guidStr + ".docx");
this.subWordFileLst.Add(subDocFile);
File.WriteAllBytes(subDocFile, templateStream.ToArray());
}
The lst contains paragraphs cloned from the original docx using:
(DocumentFormat.OpenXml.Wordprocessing.Paragraph)p.Clone();
Using the productivity tool, I found that oleobjectx.bin was not copied, so I added the code below after copying Blip and ImageData:
clonedParagrph.Descendants<OleObject>().ToList().ForEach(ole =>
{
var newRelation = document.CopyOleObject(ole.Id, this.wordDocument);
ole.Id = newRelation;
});
This solved the issue.

Concatenate multiple PDF/A with different conformance levels

Is it possible to concatenate a number of PDF/A files (with possibly different conformance levels: some PDF/A-1b, some PDF/A-3b, etc.) into a single PDF/A?
I was thinking that using the latest level (3a or 3b) would be OK, but I get errors when validating with VeraPDF:
Here is my code:
public static byte[] CreateConformantCopy(List<byte[]> sourcePdfs)
{
var version = PdfVersion.PDF_1_7;
var type = PdfAType.PDF_A_3B;
WriterProperties wp = new WriterProperties();
wp.UseSmartMode();
wp.SetPdfVersion(version.ToPdfVersion());
PdfOutputIntent oi = new PdfOutputIntent("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", Assembly.GetExecutingAssembly().GetManifestResourceStream("xxx.Resources.sRGB_CS_profile.icm"));
using (var mergedPdf = new MemoryStream())
{
var writer = new PdfWriter(mergedPdf, wp);
using (PdfADocument newDoc = new PdfADocument(writer, type.ToPdfAConformanceLevel(), oi, new DocumentProperties() { }))
{
Document document = new Document(newDoc, PageSize.A4.Rotate());
newDoc.SetTagged();
newDoc.GetCatalog().SetLang(new PdfString(Thread.CurrentThread.CurrentUICulture.Name));
newDoc.GetCatalog().SetViewerPreferences(
new PdfViewerPreferences()
.SetDisplayDocTitle(true)
.SetCenterWindow(true)
);
PdfMerger merger = new PdfMerger(newDoc);
for (int k = 0; k < sourcePdfs.Count; k++)
{
using (var inDoc = PdfHelper.GetDocument(sourcePdfs[k]))
{
var numberOfPages = inDoc.GetNumberOfPages();
merger.Merge(inDoc, 1, numberOfPages);
}
}
newDoc.Close();
}
return mergedPdf.ToArray();
}
}
PDF/A-1 and PDF/A-2 differ in several requirements, so merging them together might not be possible. Looking at your validation errors, I think this is exactly the case. For example, the very first one is about XMP metadata. PDF/A-2 is stricter here, and you get this error because your first file (which is probably a valid PDF/A-1) does not actually satisfy the PDF/A-2 rules.
What is possible, however, is to attach a PDF/A-1 document to a PDF/A-2 one. This does not even require the use of PDF/A-3, which allows arbitrary attachments. The PDF/A-2 standard does allow attaching valid PDF/A-1 documents (as well as PDF/A-2 documents).

Persisting a Modified Database in Chrome APP using chrome.storage API

I am using SQL.js for SQLite in my Chrome app. I load an external db file to run queries, and now I want to save my changes to local storage to make them persistent. How to do this is already described here:
https://github.com/kripken/sql.js/wiki/Persisting-a-Modified-Database
I am doing it the same way as described in the article:
function toBinString (arr) {
var uarr = new Uint8Array(arr);
var strings = [], chunksize = 0xffff;
// There is a maximum stack size. We cannot call String.fromCharCode with as many arguments as we want
for (var i=0; i*chunksize < uarr.length; i++){
strings.push(String.fromCharCode.apply(null, uarr.subarray(i*chunksize, (i+1)*chunksize)));
}
return strings.join('');
}
function toBinArray (str) {
var l = str.length,
arr = new Uint8Array(l);
for (var i=0; i<l; i++) arr[i] = str.charCodeAt(i);
return arr;
}
Save data to storage:
var data = toBinString(db.export());
chrome.storage.local.set({"localDB":data});
And to get data from storage:
chrome.storage.local.get('localDB', function(res) {
var data = toBinArray(res.localDB);
//sample example usage
db = new SQL.Database(data);
var result = db.exec("SELECT * FROM user");
});
Now when I run a query, I get this error:
Error: file is encrypted or is not a database
Is there any difference between storing values in chrome.storage and in localStorage? It works fine using localStorage; see the working example here:
http://kripken.github.io/sql.js/examples/persistent.html
As the documentation suggests here:
https://developer.chrome.com/apps/storage
we don't need to use stringify and parse with the chrome.storage API, unlike localStorage; we can save objects and arrays directly.
When I try to save the result returned from db.export without any conversion, I get this error:
Cannot serialize value to JSON
So what would be the right approach to saving the db export in Chrome's storage? Is there anything I am doing wrong?
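One thing that might be worth trying is to store the export as a base64 string, so that the stored value contains only ASCII characters and survives JSON serialization unchanged. This is a sketch under an untested assumption, namely that the raw binary string is what gets altered on its way through chrome.storage. The helper names bytesToBase64 and base64ToBytes are hypothetical; btoa and atob are standard browser functions.

```javascript
// Hypothetical helpers: round-trip a Uint8Array through a base64 string.

// Encode bytes as base64 so that the stored value is a plain ASCII string.
function bytesToBase64(bytes) {
  const chunkSize = 0x8000; // keep each apply() call well below the argument limit
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Decode a base64 string back into the original bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

With these, saving would become chrome.storage.local.set({"localDB": bytesToBase64(db.export())}), and loading would pass base64ToBytes(res.localDB) to new SQL.Database. Base64 makes the stored string about a third larger than the raw bytes, which counts against the storage quota.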