Use MongoDB BsonSerializer to serialize and deserialize data

I have complex classes like this:
abstract class Animal { ... }
class Dog : Animal { ... }
class Cat : Animal { ... }
class Farm
{
    public List<Animal> Animals { get; set; }
    ...
}
My goal is to send objects from computer A to computer B.
I was able to achieve this by using BinaryFormatter serialization. It enabled me to serialize complex classes like Animal in order to transfer objects from computer A to computer B. Serialization was very fast and I only had to place a [Serializable] attribute on top of my classes. But BinaryFormatter is now obsolete, and if you read around on the internet, future versions of .NET may remove it altogether.
As a result I have these options:
Use System.Text.Json
This approach does not work well with polymorphism. In other words I cannot deserialize an array of cats and dogs. So I will try to avoid it.
Use protobuf
I do not want to create protobuf map files for every class. I have over 40 classes, so this is a lot of work. Or maybe there is a converter that I am not aware of? But even then, how would the converter be smart enough to know that my array of animals can contain both cats and dogs?
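For what it's worth, protobuf-net can describe this kind of hierarchy with attributes instead of .proto files. A minimal sketch, assuming the protobuf-net NuGet package (the tag numbers are arbitrary choices for illustration, not something from the original classes):
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
[ProtoInclude(10, typeof(Dog))]   // tells protobuf-net which concrete types an Animal reference may hold
[ProtoInclude(11, typeof(Cat))]
abstract class Animal { /* ... */ }

[ProtoContract]
class Dog : Animal { /* ... */ }

[ProtoContract]
class Cat : Animal { /* ... */ }

[ProtoContract]
class Farm
{
    [ProtoMember(1)]
    public List<Animal> Animals { get; set; }
}

static class ProtobufDemo
{
    public static byte[] Serialize(Farm farm)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, farm);   // writes the [ProtoInclude] tags so Dog/Cat survive the round trip
            return ms.ToArray();
        }
    }

    public static Farm Deserialize(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
            return Serializer.Deserialize<Farm>(ms);
    }
}
Every serialized member still needs a [ProtoMember] tag, so with 40+ classes it is not free, but it avoids maintaining separate .proto files.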
Use Newtonsoft (json.net)
I could use this solution and build something like this: https://stackoverflow.com/a/19308474/637142 . Or, even better, serialize the objects along with their type, like this: https://stackoverflow.com/a/71398251/637142 . So this will probably be my go-to option.
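For reference, a minimal sketch of what option 3 could look like with Json.NET's TypeNameHandling; the exact settings are an illustration rather than a copy of the linked answers, and the usual caveat applies that type-name handling should only be used with data you trust:
using Newtonsoft.Json;

static class NewtonsoftDemo
{
    static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        // Emits a $type property where needed, so a List<Animal> can round-trip Dog and Cat instances.
        TypeNameHandling = TypeNameHandling.Auto
    };

    public static string Serialize(Farm farm) =>
        JsonConvert.SerializeObject(farm, Settings);

    public static Farm Deserialize(string json) =>
        JsonConvert.DeserializeObject<Farm>(json, Settings);
}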
Use MongoDB.Bson.Serialization.BsonSerializer. Because I am dealing with a lot of complex objects, we are using MongoDB. MongoDB is able to store a Farm object easily. My goal is to retrieve objects from the database in binary format, send that binary data to another computer, and use BsonSerializer to deserialize it back into objects.
Have computer B connect to the database remotely. I cannot use this option because one of our requirements is to do everything through an API. For security reasons we are not allowed to connect remotely to the database.
I am hoping I can use option 4. It will be the most efficient because we are already using MongoDB. Option 3 would work, but it adds extra steps: we do not need the data in JSON format, so why not just send it in binary and deserialize it once it is received by computer B? MongoDB.Driver is already doing this; I wish I knew how it does it.
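To illustrate option 4, here is a rough sketch of a per-object round trip with the driver's own serializer. The [BsonKnownTypes] attribute is an assumption added here to cover the Cat/Dog polymorphism; it is not part of the original classes:
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Attributes;

// Added to the existing base class so the serializer knows which discriminators to expect:
[BsonKnownTypes(typeof(Dog), typeof(Cat))]
abstract class Animal { /* ... */ }

static class BsonRoundTrip
{
    // On computer A: turn the object into raw BSON bytes (the same format the driver sends over the wire).
    public static byte[] Serialize(Farm farm) => farm.ToBson();

    // On computer B: rebuild the object, including the Dog/Cat instances inside Animals.
    public static Farm Deserialize(byte[] bytes) => BsonSerializer.Deserialize<Farm>(bytes);
}
Each element of Animals is stored with a _t discriminator, and declaring the known types on the base class up front keeps the receiving side from failing on discriminators it has not yet seen registered.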
This is what I have worked out so far:
MongoClient m = new MongoClient("mongodb://localhost:27017");
var db = m.GetDatabase("TestDatabase");
var collection = db.GetCollection<BsonDocument>("Farms");
// I have 1s and 0s in here.
var binaryData = collection.Find("{}").ToBson();
// this is not readable
var t = System.Text.Encoding.UTF8.GetString(binaryData);
Console.WriteLine(t);
// how can I convert those 0s and 1s to a Farm object?

var collection = db.GetCollection<RawBsonDocument>(nameof(this.Calls));
var sw = new Stopwatch();
var sb = new StringBuilder();
sw.Start();
// get items
IEnumerable<RawBsonDocument>? objects = collection.Find("{}").ToList();
sb.Append("TimeToObtainFromDb: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
var ms = new MemoryStream();
var largestSize = 0;
// write data to memory stream for demo purposes. In the real example I will write this to a TCP socket
foreach (var item in objects)
{
var bsonType = item.BsonType;
// write object
var bytes = item.ToBson();
ushort sizeOfBytes = (ushort)bytes.Length;
if (bytes.Length > largestSize)
    largestSize = bytes.Length;
var size = BitConverter.GetBytes(sizeOfBytes);
ms.Write(size);
ms.Write(bytes);
}
sb.Append("time to serialze into bson to memory: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
// now, on the client side (computer B), let's pretend we are deserializing the stream
ms.Position = 0;
var clones = new List<Call>();
byte[] sizeOfArray = new byte[2];
byte[] buffer = new byte[102400]; // make this large because if a document is larger than 102400 bytes it will fail!
while (true)
{
var i = ms.Read(sizeOfArray, 0, 2);
if (i < 1)
break;
var sizeOfBuffer = BitConverter.ToUInt16(sizeOfArray);
int position = 0;
while (position < sizeOfBuffer)
position += ms.Read(buffer, position, sizeOfBuffer - position); // accumulate, in case of partial reads
//using var test = new RawBsonDocument(buffer);
using var test = new RawBsonDocumentWrapper(buffer , sizeOfBuffer);
var identityBson = test.ToBsonDocument();
var cc = BsonSerializer.Deserialize<Call>(identityBson);
clones.Add(cc);
}
sb.Append("time to deserialize from memory into clones: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
var serializedjs = new List<string>();
foreach(var item in clones)
{
var foo = item.SerializeToJsStandards();
if (foo.Contains("jaja"))
throw new Exception();
serializedjs.Add(foo);
}
sb.Append("time to serialze into js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
foreach(var item in serializedjs)
{
try
{
var obj = item.DeserializeUsingJsStandards<Call>();
if (obj is null)
throw new Exception();
if (obj.IdAccount.Contains("jsfjklsdfl"))
throw new Exception();
}
catch(Exception ex)
{
Console.WriteLine(ex);
throw;
}
}
sb.Append("time to deserialize js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
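Since the comment above says the real target is a TCP socket rather than a MemoryStream, here is a hedged sketch of the receiving side reading the same two-byte length-prefixed frames from a Stream. The helper names are hypothetical; the important part is that partial reads are handled explicitly, which a network stream requires, and the payload buffer is sized per frame so the fixed 102400-byte limit goes away:
using System;
using System.Collections.Generic;
using System.IO;
using MongoDB.Bson.Serialization;

static class FrameReader
{
    // Reads exactly count bytes, looping because a network stream may return partial data.
    // Returns false only on a clean end of stream before the first byte of the frame.
    static bool TryReadExactly(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
            {
                if (offset == 0)
                    return false;                      // no more frames
                throw new EndOfStreamException("Connection closed in the middle of a frame.");
            }
            offset += read;
        }
        return true;
    }

    // Receive loop for computer B: two-byte length prefix, then the BSON payload.
    public static IEnumerable<Call> ReadCalls(Stream networkStream)
    {
        var sizeBytes = new byte[2];
        while (TryReadExactly(networkStream, sizeBytes, 2))
        {
            ushort frameSize = BitConverter.ToUInt16(sizeBytes, 0);
            var payload = new byte[frameSize];         // sized per frame instead of one big shared buffer
            if (!TryReadExactly(networkStream, payload, frameSize))
                throw new EndOfStreamException("Connection closed after a length prefix.");
            yield return BsonSerializer.Deserialize<Call>(payload);
        }
    }
}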

Related

How to make sure locks are released with EF Core and Postgres?

I have a console program that moves data between two different servers (DatabaseA and DatabaseB).
Database B is a Postgres server.
It calls a lot of stored procedures and other raw queries.
I use ExecuteSqlRaw a lot.
I also use NpgsqlBulk.EfCore.
The program uses the same context instance for DatabaseB during the whole run it takes to finish.
Somehow I get locks on some of my tables on DatabaseB that never get released.
This always happens on my table mytable_fromdatabase_import.
The code run on that is the following:
protected override void AddIdsNew()
{
var toAdd = IdsNotInDatabaseB();
var newObjectsToAdd = GetByIds(toAdd).Select(Converter.ConvertAToB);
DatabaseBContext.Database.ExecuteSqlRaw("truncate mytable_fromdatabase_import; ");
var uploader = new NpgsqlBulkUploader(DatabaseBContext);
uploader.Insert(newObjectsToAdd); // inserts data into mytable_fromdatabase_import
DatabaseBContext.Database.ExecuteSqlRaw("call insert_myTable_from_importTable();");
}
After I run it, the whole table is not accessible anymore, and when I query the locks on the server I can see there is a process holding them.
How can I make sure this process always closes and releases its locks on the tables?
I thought EF Core would do that automatically.
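One thing that might be worth checking (a sketch of a possible variant, not a confirmed fix for this particular lock, and it assumes NpgsqlBulkUploader enlists in the context's current transaction): run the truncate, the bulk insert and the procedure call inside one explicit transaction, so that whatever locks they take are released on commit or rollback rather than lingering:
protected override void AddIdsNew()
{
    var toAdd = IdsNotInDatabaseB();
    var newObjectsToAdd = GetByIds(toAdd).Select(Converter.ConvertAToB);

    using (var transaction = DatabaseBContext.Database.BeginTransaction())
    {
        DatabaseBContext.Database.ExecuteSqlRaw("truncate mytable_fromdatabase_import;");

        var uploader = new NpgsqlBulkUploader(DatabaseBContext);
        uploader.Insert(newObjectsToAdd); // inserts data into mytable_fromdatabase_import

        DatabaseBContext.Database.ExecuteSqlRaw("call insert_myTable_from_importTable();");

        transaction.Commit(); // locks taken by the truncate are released here at the latest
    }
}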
-----------Edit-----------
I just wanted to add that this is not a temporary problem during the run of the console program. When I run this code and it is finished, my table is still locked and nothing can access it. My understanding was that the EF Core context would release everything after it is disposed (whether through an error or by finishing).
The problem had nothing to do with EF Core but with a misconfigured backup script. The program is now running with no changes to it and it works fine.
For a concrete task you need the right tools. You probably get locks when retrieving the Ids and also when trying not to load already-imported records. These steps are slow!
I would suggest using linq2db (disclaimer: I'm a co-author of this library).
Create two projects with models from the different databases:
Source.Model.csproj - install linq2db.SqlServer
Destination.Model.csproj - install linq2db.PostgreSQL
Follow the instructions in the T4 templates on how to generate a model from the two databases. It is easy, and you can ask questions on linq2db's GitHub site.
I'll post a helper class which I used for transferring tables in a previous project. It additionally uses the CodeJam library for mapping, but in your project you can certainly use AutoMapper instead.
public class DataImporter
{
private readonly DataConnection _source;
private readonly DataConnection _destination;
public DataImporter(DataConnection source, DataConnection destination)
{
_source = source;
_destination = destination;
}
private long ImportDataPrepared<TSource, TDest>(IOrderedQueryable<TSource> source, Expression<Func<TSource, TDest>> projection) where TDest : class
{
var destination = _destination.GetTable<TDest>();
var tableName = destination.TableName;
var sourceCount = source.Count();
if (sourceCount == 0)
return 0;
var currentCount = destination.Count();
if (currentCount > sourceCount)
throw new Exception($"'{tableName}' what happened here?.");
if (currentCount >= sourceCount)
return 0;
IQueryable<TSource> sourceQuery = source;
if (currentCount > 0)
sourceQuery = sourceQuery.Skip(currentCount);
var projected = sourceQuery.Select(projection);
var copied =
_destination.BulkCopy(
new BulkCopyOptions
{
BulkCopyType = BulkCopyType.MultipleRows,
RowsCopiedCallback = (obj) => RowsCopiedCallback(obj, currentCount, sourceCount, tableName)
}, projected);
return copied.RowsCopied;
}
private void RowsCopiedCallback(BulkCopyRowsCopied obj, int currentRows, int totalRows, string tableName)
{
var percent = (currentRows + obj.RowsCopied) / (double)totalRows * 100;
Console.WriteLine($"Copied {percent:N2}% \tto {tableName}");
}
public class ImporterHelper<TSource>
{
private readonly DataImporter _importer;
private readonly IOrderedQueryable<TSource> _sourceQuery;
public ImporterHelper(DataImporter importer, IOrderedQueryable<TSource> sourceQuery)
{
_importer = importer;
_sourceQuery = sourceQuery;
}
public long To<TDest>() where TDest : class
{
var mapperBuilder = new MapperBuilder<TSource, TDest>();
return _importer.ImportDataPrepared(_sourceQuery, mapperBuilder.GetMapper().GetMapperExpressionEx());
}
public long To<TDest>(Expression<Func<TSource, TDest>> projection) where TDest : class
{
return _importer.ImportDataPrepared(_sourceQuery, projection);
}
}
public ImporterHelper<TSource> ImportData<TSource>(IOrderedQueryable<TSource> source)
{
return new ImporterHelper<TSource>(this, source);
}
}
So, begin transferring. Note that I have used OrderBy/ThenBy to specify the Id order so that already-transferred records are not imported again - it is important that the ordering fields form a unique key combination. This makes the sample reentrant: it can be re-run when the connection is lost.
var sourceBuilder = new LinqToDbConnectionOptionsBuilder();
sourceBuilder.UseSqlServer(SourceConnectionString);
var destinationBuilder = new LinqToDbConnectionOptionsBuilder();
destinationBuilder.UsePostgreSQL(DestinationConnectionString);
using (var source = new DataConnection(sourceBuilder.Build()))
using (var destination = new DataConnection(destinationBuilder.Build()))
{
var dataImporter = new DataImporter(source, destination);
dataImporter.ImportData(source.GetTable<Source.Model.FirstTable>()
.OrderBy(e => e.Id1)
.ThenBy(e => e.Id2))
.To<Dest.Model.FirstTable>();
dataImporter.ImportData(source.GetTable<Source.Model.SecondTable>().OrderBy(e => e.Id))
.To<Dest.Model.SecondTable>();
}
The boring part with OrderBy can certainly be generated automatically, but that would blow up this already not-so-short answer.
Also play with BulkCopyOptions. The native Npgsql COPY may fail, in which case the multi-row variant should be used.

Concatenate multiple PDF/A with different conformance levels

Is it possible to concatenate a number of PDF/A documents (with possibly different conformance levels: some PDF/A-1b, some PDF/A-3b, etc.) into a single PDF/A?
I was thinking that using the latest level (3a or 3b) would be OK, but I get errors when validating with VeraPDF.
Here is my code:
public static byte[] CreateConformantCopy(List<byte[]> sourcePdfs)
{
var version = PdfVersion.PDF_1_7;
var type = PdfAType.PDF_A_3B;
WriterProperties wp = new WriterProperties();
wp.UseSmartMode();
wp.SetPdfVersion(version.ToPdfVersion());
PdfOutputIntent oi = new PdfOutputIntent("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", Assembly.GetExecutingAssembly().GetManifestResourceStream("xxx.Resources.sRGB_CS_profile.icm"));
using (var mergedPdf = new MemoryStream())
{
var writer = new PdfWriter(mergedPdf, wp);
using (PdfADocument newDoc = new PdfADocument(writer, type.ToPdfAConformanceLevel(), oi, new DocumentProperties() { }))
{
Document document = new Document(newDoc, PageSize.A4.Rotate());
newDoc.SetTagged();
newDoc.GetCatalog().SetLang(new PdfString(Thread.CurrentThread.CurrentUICulture.Name));
newDoc.GetCatalog().SetViewerPreferences(
new PdfViewerPreferences()
.SetDisplayDocTitle(true)
.SetCenterWindow(true)
);
PdfMerger merger = new PdfMerger(newDoc);
for (int k = 0; k < sourcePdfs.Count; k++)
{
using (var inDoc = PdfHelper.GetDocument(sourcePdfs[k]))
{
var numberOfPages = inDoc.GetNumberOfPages();
merger.Merge(inDoc, 1, numberOfPages);
}
}
newDoc.Close();
}
return mergedPdf.ToArray();
}
}
PDF/A-1 and PDF/A-2 have several differences in their requirements, so merging them together might not be possible. Looking at your validation errors, I think this is exactly the case. For example, the very first one is about XMP metadata. PDF/A-2 is stricter here, and you get this error because your first file (which is probably a valid PDF/A-1) does not actually satisfy the PDF/A-2 rules.
What is possible, however, is to attach a PDF/A-1 document to a PDF/A-2 one. This does not even require the use of PDF/A-3, which allows arbitrary attachments; the PDF/A-2 standard does allow attaching valid PDF/A-1 (as well as PDF/A-2) documents.
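For illustration, a rough sketch of embedding one source file into the PDF/A-2 output instead of merging its pages; the CreateEmbeddedFileSpec overload, the file names and the PdfName values here are assumptions to verify against your iText 7 version, and the PDF/A checker will still require the embedded file itself to be a conforming PDF/A:
// Hypothetical sketch (verify the overload against your iText 7 version).
byte[] pdfA1Bytes = File.ReadAllBytes("source-pdfa1.pdf");   // hypothetical path to a valid PDF/A-1 file
PdfFileSpec spec = PdfFileSpec.CreateEmbeddedFileSpec(
    newDoc,                           // the PdfADocument being written (PDF/A-2)
    pdfA1Bytes,
    "Original PDF/A-1 document",      // description
    "source-pdfa1.pdf",               // display name (hypothetical)
    new PdfName("application/pdf"),   // MIME type
    null,                             // file parameter dictionary
    null);                            // AFRelationship is only needed for PDF/A-3 associated files
newDoc.AddFileAttachment("source-pdfa1.pdf", spec);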

Entity Framework is too slow when mapping up to 100k rows

I have at least 100,000 rows in a Job_Details table and I'm using Entity Framework to map the data.
This is the code:
public GetJobsResponse GetImportJobs()
{
GetJobsResponse getJobResponse = new GetJobsResponse();
List<JobBO> lstJobs = new List<JobBO>();
using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
{
var lstJob = dbContext.Job_Details.ToList();
foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
{
JobBO job = MapBEJobforSearchObj(dbJob);
lstJobs.Add(job);
}
}
getJobResponse.Jobs = lstJobs;
return getJobResponse;
}
I found that this line takes about 2-3 minutes to execute:
var lstJob = dbContext.Job_Details.ToList();
How can I solve this issue?
To outline the performance issues with your example: (see inline comments)
public GetJobsResponse GetImportJobs()
{
GetJobsResponse getJobResponse = new GetJobsResponse();
List<JobBO> lstJobs = new List<JobBO>();
using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
{
// Loads *ALL* entities into memory. This effectively takes all fields for all rows across from the database to your app server. (Even though you don't want it all)
var lstJob = dbContext.Job_Details.ToList();
// Filters from the data in memory.
foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
{
// Maps the entity to a DTO and adds it to the return collection.
JobBO job = MapBEJobforSearchObj(dbJob);
lstJobs.Add(job);
}
}
// Returns the DTOs.
getJobResponse.Jobs = lstJobs;
return getJobResponse;
}
First, pass your WHERE clause to EF so it is sent to the DB server, rather than loading all entities into memory:
public GetJobsResponse GetImportJobs()
{
GetJobsResponse getJobResponse = new GetJobsResponse();
using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
{
// Will pass the Where expression to the DB server to be executed. Note: no .ToList() yet, so this stays IQueryable.
var jobs = dbContext.Job_Details.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null);
Next, use Select to project into your DTOs. Typically these won't contain as much data as the main entity, and as long as you're working with IQueryable you can load related data as needed. Again, this will be translated and sent to the DB server, so you cannot use functions like MapBEJobforSearchObj here because the DB server does not know about that function. You can Select into a simple DTO object, or an anonymous type to pass to a dynamic mapper.
var dtos = jobs.Select(ie => new JobBO
{
JobId = ie.JobId,
// ... populate remaining DTO fields here.
}).ToList();
getJobResponse.Jobs = dtos;
return getJobResponse;
}
}
Moving the .ToList() to the end will materialize the data into your JobBO DTOs/ViewModels, pulling just enough data from the server to populate the desired rows with the desired fields.
In cases where you may have a large amount of data, you should also consider supporting server-side pagination, where you pass a page number and page size and then use .Skip() + .Take() to load a single page of entries at a time, as in the sketch below.
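A minimal sketch of that pagination, with pageNumber and pageSize as hypothetical parameters (a stable OrderBy is required before Skip/Take):
var pagedJobs = dbContext.Job_Details
    .Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null)
    .OrderBy(ie => ie.Job_No)                 // deterministic order so pages don't overlap
    .Skip((pageNumber - 1) * pageSize)        // e.g. pageNumber = 1, pageSize = 50
    .Take(pageSize)
    .Select(ie => new JobBO
    {
        JobId = ie.JobId
        // ... populate remaining DTO fields here.
    })
    .ToList();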

Testing With A Fake DbContext and Autofixture and Moq

So, following this example on how to make a fake DbContext for testing, my test works fine using just this:
[Test]
public void CiudadIndex()
{
var ciudades = new FakeDbSet<Ciudad>
{
new Ciudad {CiudadId = 1, EmpresaId =1, Descripcion ="Santa Cruz", FechaProceso = DateTime.Now, MarcaBaja = null, UsuarioId = 1},
new Ciudad {CiudadId = 2, EmpresaId =1, Descripcion ="La Paz", FechaProceso = DateTime.Now, MarcaBaja = null, UsuarioId = 1},
new Ciudad {CiudadId = 3, EmpresaId =1, Descripcion ="Cochabamba", FechaProceso = DateTime.Now, MarcaBaja = null, UsuarioId = 1}
};
//// Create mock unit of work
var mockData = new Mock<IContext>();
mockData.Setup(m => m.Ciudades).Returns(ciudades);
// Setup controller
var homeController = new CiudadController(mockData.Object);
// Invoke
var viewResult = homeController.Index();
var ciudades_de_la_vista = (IEnumerable<Ciudad>)viewResult.Model;
// Assert..
}
I am now trying to use AutoFixture-Moq to create "ciudades", but I can't. I tried this:
var fixture = new Fixture();
var ciudades = fixture.Build<FakeDbSet<Ciudad>>().CreateMany<FakeDbSet<Ciudad>>();
var mockData = new Mock<IContext>();
mockData.Setup(m => m.Ciudades).Returns(ciudades);
I get this error:
Cannot convert System.Collections.Generic.IEnumerable<FakeDbSet<Ciudad>> to System.Data.Entity.IDbSet<Ciudad>
Implementation of IContext and FakeDbSet
public interface IContext
{
IDbSet<Ciudad> Ciudades { get; }
}
public class FakeDbSet<T> : IDbSet<T> where T : class
How can I make this work?
A minor point... In stuff like:
var ciudades_fixture = fixture.Build<Ciudad>().CreateMany<Ciudad>();
The second type arg is unnecessary and should be:
var ciudades_fixture = fixture.Build<Ciudad>().CreateMany();
I don't really understand why you need a FakeDbSet, and the article is a bit TL;DR... In general, I try to avoid faking and mucking with ORM bits and instead deal with interfaces returning POCOs to the maximum degree possible.
That aside... The reason the normal syntax for initialising the list works is that FakeDbSet has an Add (and implements IEnumerable). AutoFixture doesn't have a story for that pattern directly (after all, it is compiler syntactic sugar and not particularly amenable to reflection, or in line with any other conventions), but you can use AddManyTo as long as there is an ICollection in play. Luckily, within the implementation of FakeDbSet as in the article, the following gives us an in:
public ObservableCollection<T> Local
{
get { return _data; }
}
As ObservableCollection<T> implements ICollection<T>, you should be able to:
var ciudades = new FakeDbSet<Ciudad>();
fixture.AddManyTo(ciudades.Local);
var mockData = new Mock<IContext>();
mockData.Setup(m => m.Ciudades).Returns(ciudades);
It's possible to wire up a customization to make this prettier (a sketch follows below), but at least you have a way to manage it. The other option is to have something implement ICollection<T> (or add a property with a setter taking IEnumerable<T>) and have AF generate the parent object, causing said collection to be filled in.
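For example, a hedged sketch of such a customization, assuming the FakeDbSet<T> from the article exposes the Local collection shown above:
var fixture = new Fixture();

// Whenever a FakeDbSet<Ciudad> is requested, build it by filling its Local collection.
fixture.Customize<FakeDbSet<Ciudad>>(composer => composer
    .FromFactory(() =>
    {
        var set = new FakeDbSet<Ciudad>();
        fixture.AddManyTo(set.Local);   // uses the ICollection<T> exposed by Local
        return set;
    })
    .OmitAutoProperties());

var mockData = new Mock<IContext>();
mockData.Setup(m => m.Ciudades).Returns(fixture.Create<FakeDbSet<Ciudad>>());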
Long superseded side note: In your initial question, you effectively have:
fixture.Build<FakeDbSet<Ciudad>>().CreateMany()
The problem becomes clearer then - you are asking AF to generate Many FakeDbSet<Ciudad>s, which is not what you want.
I haven't used AutoFixture in a while, but shouldn't it be:
var ciudades = new FakeDbSet<Ciudad>();
fixture.AddManyTo(ciudades);
For the moment I ended up doing this; I will keep reading about how to use AutoMoq, since I'm new to this:
var fixture = new Fixture();
var ciudades_fixture = fixture.Build<Ciudad>().CreateMany<Ciudad>();
var ciudades = new FakeDbSet<Ciudad>();
foreach (var item in ciudades_fixture)
{
ciudades.Add(item);
}
var mockData = new Mock<IContext>();
fixture.Create<Mock<IContext>>();
mockData.Setup(r => r.Ciudades).Returns(ciudades);

A better way of representing file attachments in a list (C# 3.0)

I have written:
List<Attachment> lstAttachment = new List<Attachment>();
//Check if any error file is present, in which case it needs to be sent
if (new FileInfo(Path.Combine(errorFolder, errorFileName)).Exists)
{
Attachment unprocessedFile = new Attachment(Path.Combine(errorFolder, errorFileName));
lstAttachment.Add(unprocessedFile);
}
//Check if any processed file is present, in which case it needs to be sent
if (new FileInfo(Path.Combine(outputFolder, outputFileName)).Exists)
{
Attachment processedFile = new Attachment(Path.Combine(outputFolder, outputFileName));
lstAttachment.Add(processedFile);
}
This works fine and gives the expected output.
Basically, I am adding the file to the list based on whether the file is present or not.
I am looking for a more elegant solution than the one I have written.
Reason: I want to learn different ways of representing the same program.
I am using C# 3.0.
Thanks.
Does this look better?
...
var lstAttachment = new List<Attachment>();
string errorPath = Path.Combine(errorFolder, errorFileName);
string outputPath = Path.Combine(outputFolder, outputFileName);
AddAttachmentToCollection(lstAttachment, errorPath);
AddAttachmentToCollection(lstAttachment, outputPath);
...
public static void AddAttachmentToCollection(ICollection<Attachment> collection, string filePath)
{
if (File.Exists(filePath))
{
var attachment = new Attachment(filePath);
collection.Add(attachment);
}
}
How about a little LINQ?
var filenames = new List<string>()
{
Path.Combine(errorFolder, errorFileName),
Path.Combine(outputFolder, outputFileName)
};
var attachments = filenames.Where(f => File.Exists(f))
.Select(f => new Attachment(f));
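As a follow-up note, the LINQ query above is lazily evaluated, so the File.Exists checks and Attachment constructions only run when it is enumerated. A hypothetical usage that materializes it and attaches everything to a System.Net.Mail.MailMessage:
using (var message = new MailMessage())
{
    // ToList() runs the File.Exists checks and creates the Attachment objects right here.
    foreach (var attachment in attachments.ToList())
    {
        message.Attachments.Add(attachment);
    }
    // ... set From, To, Subject and Body, then send the message.
}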