A: greendao method loadAll() to Large return - greendao

I use loadAll method, my list real on db is 10, but the return list size is too large, around 100 more
entityDao = daoSession.getEntityDao();
List<Entity> listEntities = entityDao.loadAll();
listEntities.Size() = too large

Related

Spark mapPartitionsToPai execution time

In the current project I am working, we are using spark as computation engine for one of workflows.
Workflow is as follows
We have product catalog being served from several pincodes. User logged in from any particular pin code should be able to see least available cost from all available serving pincodes.
Least cost is calculated as follows
product price+dist(pincode1,pincode2) -
pincode2 being user pincode and pincode1 being source pincode. Apply the above formula for all source pincodes and identify the least available one.
My Core spark logic looks like this
pincodes.javaRDD().cartesian(pincodePrices.javaRDD()).mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Row,Row>>, Row, Row>() {
#Override
public Iterator<Tuple2<Row, Row>> call(Iterator<Tuple2<Row, Row>> t)
throws Exception {
MongoClient mongoclient = MongoClients.create("mongodb://localhost");
MongoDatabase database = mongoclient.getDatabase("catalogue");
MongoCollection<Document>pincodeCollection = database.getCollection("pincodedistances");
List<Tuple2<Row,Row>> list =new LinkedList<>();
while (t.hasNext()) {
Tuple2<Row, Row>tuple2 = t.next();
Row pinRow = tuple2._1;
Integer srcPincode = pinRow.getAs("pincode");
Row pricesRow = tuple2._2;
Row pricesRow1 = (Row)pricesRow.getAs("leastPrice");
Integer buyingPrice = pricesRow1.getAs("buyingPrice");
Integer quantity = pricesRow1.getAs("quantity");
Integer destPincode = pricesRow1.getAs("pincodeNum");
if(buyingPrice!=null && quantity>0) {
BasicDBObject dbObject = new BasicDBObject();
dbObject.append("sourcePincode", srcPincode);
dbObject.append("destPincode", destPincode);
//System.out.println(srcPincode+","+destPincode);
Number distance;
if(srcPincode.intValue()==destPincode.intValue()) {
distance = 0;
}else {
Document document = pincodeCollection.find(dbObject).first();
distance = document.get("distance", Number.class);
}
double margin = 0.02;
Long finalPrice = Math.round(buyingPrice+(margin*buyingPrice)+distance.doubleValue());
//Row finalPriceRow = RowFactory.create(finalPrice,quantity);
StructType structType = new StructType();
structType = structType.add("finalPrice", DataTypes.LongType, false);
structType = structType.add("quantity", DataTypes.LongType, false);
Object values[] = {finalPrice,quantity};
Row finalPriceRow = new GenericRowWithSchema(values, structType);
list.add(new Tuple2<Row, Row>(pinRow, finalPriceRow));
}
}
mongoclient.close();
return list.iterator();
}
}).reduceByKey((priceRow1,priceRow2)->{
Long finalPrice1 = priceRow1.getAs("finalPrice");
Long finalPrice2 = priceRow2.getAs("finalPrice");
if(finalPrice1.longValue()<finalPrice2.longValue())return priceRow1;
return priceRow2;
}).collect().forEach(tuple2->{
// Business logic to push computed price to mongodb
}
I am able to get the answer correctly, however mapPartitionsToPair is taking a bit of time(~22 secs for just 12k records).
After browsing internet I found that mapPartitions performs better than mapPartitionsToPair, but I am not sure how to emit (key,value) from mapPartitions and then sort it.
Is there any alternative for above transformations or any better approach is highly appreciated.
Spark Cluster: Standalone(1 executor, 6 cores)

Use mongodb BsonSerializer to serialize and deserialize data

I have complex classes like this:
abstract class Animal { ... }
class Dog: Animal{ ... }
class Cat: Animal{ ... }
class Farm{
public List<Animal> Animals {get;set;}
...
}
My goal is to send objects from computer A to computer B
I was able to achieve my goal by using BinaryFormatter serialization. It enabled me to serialize complex classes like Animal in order to transfer objects from computer A to computer B. Serialization was very fast and I only had to worry about placing a serializable attribute on top of my classes. But now BinaryFormatter is obsolete and if you read on the internet future versions of dotnet may remove that.
As a result I have these options:
Use System.Text.Json
This approach does not work well with polymorphism. In other words I cannot deserialize an array of cats and dogs. So I will try to avoid it.
Use protobuf
I do not want to create protobuf map files for every class. I have over 40 classes this is a lot of work. Or maybe there is a converter that I am not aware of? But still how will the converter be smart enough to know that my array of animals can have cats and dogs?
Use Newtonsoft (json.net)
I could use this solution and build something like this: https://stackoverflow.com/a/19308474/637142 . Or even better serialize the objects with a type like this: https://stackoverflow.com/a/71398251/637142. So this will probably be my to go option.
Use MongoDB.Bson.Serialization.BsonSerializer Because I am dealing with a lot of complex objects we are using MongoDB. MongoDB is able to store a Farm object easily. My goal is to retrieve objects from the database in binary format and send that binary data to another computer and use BsonSerializer to deserialize them back to objects.
Have computer B connect to the database remotely. I cannot use this option because one of our requirements is to do everything through an API. For security reasons we are not allowed to connect remotely to the database.
I am hopping I can use step 4. It will be the most efficient because we are already using MongoDB. If we use step 3 which will work we are doing extra steps. We do not need the data in json format. Why not just sent it in binary and deserialize it once it is received by computer B? MongoDB.Driver is already doing this. I wish I knew how it does it.
This is what I have worked so far:
MongoClient m = new MongoClient("mongodb://localhost:27017");
var db = m.GetDatabase("TestDatabase");
var collection = db.GetCollection<BsonDocument>("Farms");
// I have 1s and 0s in here.
var binaryData = collection.Find("{}").ToBson();
// this is not readable
var t = System.Text.Encoding.UTF8.GetString(binaryData);
Console.WriteLine(t);
// how can I convert those 0s and 1s to a Farm object?
var collection = db.GetCollection<RawBsonDocument>(nameof(this.Calls));
var sw = new Stopwatch();
var sb = new StringBuilder();
sw.Start();
// get items
IEnumerable<RawBsonDocument>? objects = collection.Find("{}").ToList();
sb.Append("TimeToObtainFromDb: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
var ms = new MemoryStream();
var largestSixe = 0;
// write data to memory stream for demo purposes. on real example I will write this to a tcpSocket
foreach (var item in objects)
{
var bsonType = item.BsonType;
// write object
var bytes = item.ToBson();
ushort sizeOfBytes = (ushort)bytes.Length;
if (bytes.Length > largestSixe)
largestSixe = bytes.Length;
var size = BitConverter.GetBytes(sizeOfBytes);
ms.Write(size);
ms.Write(bytes);
}
sb.Append("time to serialze into bson to memory: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
// now on the client side on computer B lets pretend we are deserializing the stream
ms.Position = 0;
var clones = new List<Call>();
byte[] sizeOfArray = new byte[2];
byte[] buffer = new byte[102400]; // make this large because if an document is larger than 102400 bytes it will fail!
while (true)
{
var i = ms.Read(sizeOfArray, 0, 2);
if (i < 1)
break;
var sizeOfBuffer = BitConverter.ToUInt16(sizeOfArray);
int position = 0;
while (position < sizeOfBuffer)
position = ms.Read(buffer, position, sizeOfBuffer - position);
//using var test = new RawBsonDocument(buffer);
using var test = new RawBsonDocumentWrapper(buffer , sizeOfBuffer);
var identityBson = test.ToBsonDocument();
var cc = BsonSerializer.Deserialize<Call>(identityBson);
clones.Add(cc);
}
sb.Append("time to deserialize from memory into clones: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
var serializedjs = new List<string>();
foreach(var item in clones)
{
var foo = item.SerializeToJsStandards();
if (foo.Contains("jaja"))
throw new Exception();
serializedjs.Add(foo);
}
sb.Append("time to serialze into js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();
foreach(var item in serializedjs)
{
try
{
var obj = item.DeserializeUsingJsStandards<Call>();
if (obj is null)
throw new Exception();
if (obj.IdAccount.Contains("jsfjklsdfl"))
throw new Exception();
}
catch(Exception ex)
{
Console.WriteLine(ex);
throw;
}
}
sb.Append("time to deserialize js: ");
sb.AppendLine(sw.Elapsed.TotalMilliseconds.ToString());
sw.Restart();

Entity Framework is too slow during mapping data up to 100k

I have min 100 000 data into a Job_Details table and I'm using Entity Framework to map the data.
This is the code:
public GetJobsResponse GetImportJobs()
{
GetJobsResponse getJobResponse = new GetJobsResponse();
List<JobBO> lstJobs = new List<JobBO>();
using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
{
var lstJob = dbContext.Job_Details.ToList();
foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
{
JobBO job = MapBEJobforSearchObj(dbJob);
lstJobs.Add(job);
}
}
getJobResponse.Jobs = lstJobs;
return getJobResponse;
}
I found to this line is taking about 2-3 min to execute
var lstJob = dbContext.Job_Details.ToList();
How can i solve this issue?
To outline the performance issues with your example: (see inline comments)
public GetJobsResponse GetImportJobs()
{
GetJobsResponse getJobResponse = new GetJobsResponse();
List<JobBO> lstJobs = new List<JobBO>();
using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
{
// Loads *ALL* entities into memory. This effectively takes all fields for all rows across from the database to your app server. (Even though you don't want it all)
var lstJob = dbContext.Job_Details.ToList();
// Filters from the data in memory.
foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
{
// Maps the entity to a DTO and adds it to the return collection.
JobBO job = MapBEJobforSearchObj(dbJob);
lstJobs.Add(job);
}
}
// Returns the DTOs.
getJobResponse.Jobs = lstJobs;
return getJobResponse;
}
First: pass your WHERE clause to EF to pass to the DB server rather than loading all entities into memory..
public GetJobsResponse GetImportJobs()
{
GetJobsResponse getJobResponse = new GetJobsResponse();
using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
{
// Will pass the where expression to be DB server to be executed. Note: No .ToList() yet to leave this as IQueryable.
var jobs = dbContext.Job_Details..Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null));
Next, use SELECT to load your DTOs. Typically these won't contain as much data as the main entity, and so long as you're working with IQueryable you can load related data as needed. Again this will be sent to the DB Server so you cannot use functions like "MapBEJobForSearchObj" here because the DB server does not know this function. You can SELECT a simple DTO object, or an anonymous type to pass to a dynamic mapper.
var dtos = jobs.Select(ie => new JobBO
{
JobId = ie.JobId,
// ... populate remaining DTO fields here.
}).ToList();
getJobResponse.Jobs = dtos;
return getJobResponse;
}
Moving the .ToList() to the end will materialize the data into your JobBO DTOs/ViewModels, pulling just enough data from the server to populate the desired rows and with the desired fields.
In cases where you may have a large amount of data, you should also consider supporting server-side pagination where you pass a page # and page size, then utilize a .Skip() + .Take() to load a single page of entries at a time.

MongoDB - combining multiple Numeric Range queries (C# driver)

*Mongo Newbie here
I have a document containing several hundred numeric fields which I need to query in combination.
var collection = _myDB.GetCollection<MyDocument>("collection");
IMongoQuery mongoQuery; // = Query.GT("field", value1).LT(value2);
foreach (MyObject queryObj in Queries)
{
// I have several hundred fields such as Height, that are in queryObj
// how do I build a "boolean" query in C#
mongoQuery = Query.GTE("Height", Convert.ToInt16(queryObj.Height * lowerbound));
}
I have several hundred fields such as Height (e.g. Width, Area, Perimeter etc.), that are in queryObj how do I build a "boolean" query in C# that combines range queries for each field in conjunction.
I have tried to use the example Query.GT("field", value1).LT(value2);, however the compiler does not accept the LT(Value) construct. In any event I need to be able to build a complex boolean query by looping through each of the numeric field values.
Thanks for helping a newbie out.
EDIT 3:
Ok, it looks like you already have code in place to build the complicated query. In that case, you just needed to fix the compiler issue. Am assuming you want to do the following (x > 20 && x < 40) && (y > 30 && y < 50) ...
var collection = _myDB.GetCollection<MyDocument>("collection");
var queries = new List<IMongoQuery>();
foreach (MyObject queryObj in Queries)
{
//I have several hundred fields such as Height, that are in queryObj
//how do I build a "boolean" query in C#
var lowerBoundQuery = Query.GTE("Height", Convert.ToInt16(queryObj.Height * lowerbound));
var upperBoundQuery = Query.LTE("Height", Convert.ToInt16(queryObj.Height * upperbound));
var query = Query.And(lowerBoundQuery, upperBoundQuery);
queries.Add(query);
}
var finalQuery = Query.And(queries);
/*
if you want to instead do an OR,
var finalQuery = Query.Or(queries);
*/
Original Answer.
var list = _myDb.GetCollection<MyDoc>("CollectionName")
.AsQueryable<MyDoc>()
.Where(x =>
x.Height > 20 &&
x.Height < 40)
.ToList();
I have tried to use the example Query.GT("field", value1).LT(value2);,
however the compiler does not accept the LT(Value) construct.
You can query MongoDB using linq, if you are using the official C# driver. That ought to solve the compiler issue I think.
The more interesting question I have in mind is, how are you going to construct that complicated boolean query?
One option is to dynamically build an Expression and then pass that to the Where
My colleague is using the following code for something similar...
public static IQueryable<T> Where<T>(this IQueryable<T> query,
string column, object value, WhereOperation operation)
{
if (string.IsNullOrEmpty(column))
return query;
ParameterExpression parameter = Expression.Parameter(query.ElementType, "p");
MemberExpression memberAccess = null;
foreach (var property in column.Split('.'))
memberAccess = MemberExpression.Property
(memberAccess ?? (parameter as Expression), property);
//change param value type
//necessary to getting bool from string
ConstantExpression filter = Expression.Constant
(
Convert.ChangeType(value, memberAccess.Type)
);
//switch operation
Expression condition = null;
LambdaExpression lambda = null;
switch (operation)
{
//equal ==
case WhereOperation.Equal:
condition = Expression.Equal(memberAccess, filter);
lambda = Expression.Lambda(condition, parameter);
break;
//not equal !=
case WhereOperation.NotEqual:
condition = Expression.NotEqual(memberAccess, filter);
lambda = Expression.Lambda(condition, parameter);
break;
//string.Contains()
case WhereOperation.Contains:
condition = Expression.Call(memberAccess,
typeof(string).GetMethod("Contains"),
Expression.Constant(value));
lambda = Expression.Lambda(condition, parameter);
break;
}
MethodCallExpression result = Expression.Call(
typeof(Queryable), "Where",
new[] { query.ElementType },
query.Expression,
lambda);
return query.Provider.CreateQuery<T>(result);
}
public enum WhereOperation
{
Equal,
NotEqual,
Contains
}
Currently it only supports == && !=, but it shouldn't be that difficult to implement >= or <= ...
You could get some hints from the Expression class: http://msdn.microsoft.com/en-us/library/system.linq.expressions.expression.aspx
EDIT:
var props = ["Height", "Weight", "Age"];
var query = _myDb.GetCollection<MyDoc>("CName").AsQueryable<MyDoc>();
foreach (var prop in props)
{
query = query.Where(prop, GetLowerLimit(queryObj, prop), WhereOperation.Between, GetUpperLimit(queryObj, prop));
}
// the above query when iterated over, will result in a where clause that joins each individual `prop\condition` with an `AND`.
// The code above will not compile. The `Where` function I wrote doesnt accept 4 parameters. You will need to implement the logic for that yourself. Though it ought to be straight forward I think...
EDIT 2:
If you don't want to use linq, you can still use Mongo Query. You will just need to craft your queries using the Query.And() and Query.Or().
// I think this might be deprecated. Please refer the release notes for the C# driver version 1.5.0
Query.And(Query.GTE("Salary", new BsonDouble(20)), Query.LTE("Salary", new BsonDouble(40)), Query.GTE("Height", new BsonDouble(20)), Query.LTE("Height", new BsonDouble(40)))
// strongly typed version
new QueryBuilder<Employee>().And(Query<Employee>.GTE(x => x.Salary, 40), Query<Employee>.LTE(x => x.Salary, 60), Query<Employee>.GTE(x => x.HourlyRateToClients, 40), Query<Employee>.LTE(x => x.HourlyRateToClients, 60))

Merging two ArrayLists with 2 different objects(which are the instance of the same class)

I just want to merge two ArrayLists and have the contents in one ArrayList. Both lists contain the object which is an instance of the same class.
The object reference themselves are different though. However I am getting this unexpected size for the combined arraylist. I use JAVA 1.4
ArrayList a1 = new ArrayList();
ArrayList b1 = new ArrayList();
ClassA cls1A = new ClassA();
ClassA cls1B = new ClassA();
a1.add(cls1A);
b1.add(cls1B);
a1.size() = 100;
b1.size() = 50;
//merge the two arraylist contents into one
//1st method and its result
a1.addAll(b1);
//Expected Result
a1.size = 150
//but
//Obtained result
a1.size = 6789
//2nd method and its result
Collections.copy(a1, b1)
//Expected result
a1.size() = 150
//but
//Obtained result
a1.size = 6789
How can I have an ArrayList which displays the combined size??
I came up with the following solution. First get the size of both the arraylists, then increase the capacity(by using ensureCapacity method of ArrayList class) of the arraylist to which the two needs to be merged. Then add the objects of the 2nd arraylist from the last index of the first arraylist.
a1.add(cls1A);
b1.add(cls1B);
int p = a1.size();
int w = b1.size();
int j = (p + w);
a1.ensureCapacity(j);
for(int r = 0; r <w; r++)
{
ClassA cls1B = new ClassA();
Object obj = b1.get(r);
cls1B = (ClassA)obj;
a1.add(p, cls1B);
p++;
}