MyBatis cursor cannot solve memory issue

I am using Spring 2 and a MyBatis Cursor to minimize the risk of running out of memory when selecting and processing more than 100k records at once, but I am not sure I am doing the right thing.
Mapper:
@SelectProvider(type = provider.class, method = "retrieveTx")
Cursor<TxModel> retrieveTx();
DAO:
@Transactional(readOnly = true)
public List<TxModel> retrieveTx() {
    Iterator<TxModel> iterator = mapper.retrieveTx().iterator();
    List<TxModel> actualList = new ArrayList<>();
    iterator.forEachRemaining(actualList::add);
    return actualList;
}
Using it:
List<TxModel> txs = dao.retrieveTx();
for (TxModel tx : txs) {
    ....
}
Can anyone tell me whether what I am doing is right? I feel that this approach does not solve my problem when the number of DB records runs into six digits.
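For comparison, here is a minimal sketch of consuming the cursor without copying it into a list, so that only one row at a time needs to stay reachable. It assumes the per-record work can run inside the DAO's read-only transaction; the method name forEachTx and the Consumer callback are illustrative only, not part of the original code.

@Transactional(readOnly = true)
public void forEachTx(Consumer<TxModel> action) {
    // Sketch only: iterate the cursor directly instead of collecting it into a List,
    // so each row can be garbage-collected as soon as it has been processed.
    try (Cursor<TxModel> cursor = mapper.retrieveTx()) {
        for (TxModel tx : cursor) {
            action.accept(tx);
        }
    } catch (IOException e) {
        // Cursor extends Closeable, so close() may throw IOException.
        throw new UncheckedIOException(e);
    }
}

Collecting everything into a List, as retrieveTx above does, materializes all rows in memory anyway, which is why the cursor alone does not reduce the peak footprint.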

Related

Spark mapPartitionsToPair execution time

In the project I am currently working on, we are using Spark as the computation engine for one of our workflows.
The workflow is as follows.
We have a product catalogue being served from several pincodes. A user logged in from any particular pincode should be able to see the least available cost across all serving pincodes.
The least cost is calculated as
product price + dist(pincode1, pincode2)
where pincode2 is the user's pincode and pincode1 is the source pincode. The formula is applied for every source pincode and the least value is picked.
My core Spark logic looks like this:
pincodes.javaRDD().cartesian(pincodePrices.javaRDD())
    .mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Row, Row>>, Row, Row>() {
        @Override
        public Iterator<Tuple2<Row, Row>> call(Iterator<Tuple2<Row, Row>> t) throws Exception {
            // One MongoDB connection per partition.
            MongoClient mongoclient = MongoClients.create("mongodb://localhost");
            MongoDatabase database = mongoclient.getDatabase("catalogue");
            MongoCollection<Document> pincodeCollection = database.getCollection("pincodedistances");
            List<Tuple2<Row, Row>> list = new LinkedList<>();
            while (t.hasNext()) {
                Tuple2<Row, Row> tuple2 = t.next();
                Row pinRow = tuple2._1;
                Integer srcPincode = pinRow.getAs("pincode");
                Row pricesRow = tuple2._2;
                Row pricesRow1 = (Row) pricesRow.getAs("leastPrice");
                Integer buyingPrice = pricesRow1.getAs("buyingPrice");
                Integer quantity = pricesRow1.getAs("quantity");
                Integer destPincode = pricesRow1.getAs("pincodeNum");
                if (buyingPrice != null && quantity > 0) {
                    BasicDBObject dbObject = new BasicDBObject();
                    dbObject.append("sourcePincode", srcPincode);
                    dbObject.append("destPincode", destPincode);
                    Number distance;
                    if (srcPincode.intValue() == destPincode.intValue()) {
                        distance = 0;
                    } else {
                        // One MongoDB lookup per (source, destination) pincode pair.
                        Document document = pincodeCollection.find(dbObject).first();
                        distance = document.get("distance", Number.class);
                    }
                    double margin = 0.02;
                    Long finalPrice = Math.round(buyingPrice + (margin * buyingPrice) + distance.doubleValue());
                    StructType structType = new StructType();
                    structType = structType.add("finalPrice", DataTypes.LongType, false);
                    structType = structType.add("quantity", DataTypes.LongType, false);
                    Object[] values = {finalPrice, quantity};
                    Row finalPriceRow = new GenericRowWithSchema(values, structType);
                    list.add(new Tuple2<Row, Row>(pinRow, finalPriceRow));
                }
            }
            mongoclient.close();
            return list.iterator();
        }
    })
    .reduceByKey((priceRow1, priceRow2) -> {
        // Keep the row with the lower final price.
        Long finalPrice1 = priceRow1.getAs("finalPrice");
        Long finalPrice2 = priceRow2.getAs("finalPrice");
        if (finalPrice1.longValue() < finalPrice2.longValue()) return priceRow1;
        return priceRow2;
    })
    .collect()
    .forEach(tuple2 -> {
        // Business logic to push the computed price to MongoDB.
    });
I get the correct answer, but mapPartitionsToPair is taking quite a bit of time (~22 seconds for just 12k records).
While browsing the internet I found that mapPartitions performs better than mapPartitionsToPair, but I am not sure how to emit (key, value) pairs from mapPartitions and then reduce or sort the result (see the sketch after the cluster details below).
Is there any alternative to the above transformations? Any better approach is highly appreciated.
Spark cluster: Standalone (1 executor, 6 cores)
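On emitting (key, value) pairs from mapPartitions: one possible sketch is shown below, under the assumption that the per-partition body stays exactly as above, moved into a hypothetical computePartition helper that returns List<Tuple2<Row, Row>>. mapPartitions can return Tuple2 elements directly, and the plain RDD of tuples can then be wrapped back into a pair RDD so reduceByKey is available again.

// Sketch only: computePartition(...) is a hypothetical helper containing the same
// MongoDB lookup loop shown above, returning List<Tuple2<Row, Row>>.
JavaRDD<Tuple2<Row, Row>> priced = pincodes.javaRDD()
        .cartesian(pincodePrices.javaRDD())
        .mapPartitions((FlatMapFunction<Iterator<Tuple2<Row, Row>>, Tuple2<Row, Row>>)
                partition -> computePartition(partition).iterator());

// Wrap the RDD of tuples back into a pair RDD to get reduceByKey.
JavaPairRDD<Row, Row> cheapest = JavaPairRDD.fromJavaRDD(priced)
        .reduceByKey((r1, r2) -> {
            Long p1 = r1.getAs("finalPrice");
            Long p2 = r2.getAs("finalPrice");
            return p1 < p2 ? r1 : r2;
        });

Note that mapPartitions and mapPartitionsToPair do essentially the same work per partition, so the choice between them is unlikely to account for much of the runtime on its own.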

Can someone help me with the Apex code below? I want to update this code to comply with governor limits.

public static void updatecasefields(List<Case> lstcase) {
    ID devRecordTypeId = Schema.SObjectType.Case.getRecordTypeInfosByDeveloperName().get('CRM_CSR_Case').getRecordTypeId();
    for (Case cs : lstcase) {
        if (cs.Id != null && cs.RecordTypeId == devRecordTypeId) {
            List<CRM_CasePick__c> casp = [
                SELECT Id, CRM_Carrier_Name__c, CRM_LOB__c, CRM_SLA_Turnaround_Time__c, CRM_Category__c,
                       CRM_Issue_Sub_Type__c, CRM_Issue_Type__c, CRM_Turnaround_Time_Days__c
                FROM CRM_CasePick__c
                WHERE CRM_Carrier_Name__c = :cs.GiDP_CarrierName__c
                  AND CRM_Category__c = :cs.CRM_Category__c
                  AND CRM_Issue_Type__c = :cs.CRM_Issue_Type__c
                  AND CRM_Issue_Sub_Type__c = :cs.CRM_Issue_Sub_Type__c
                  AND CRM_LOB__c = :cs.CRM_Line_of_Business__c];
            for (CRM_CasePick__c cp : casp) {
                cs.CRM_Turnaround_Time_Days__c = cp.CRM_Turnaround_Time_Days__c;
                cs.CRM_SLA_Turnaround_time__c = cp.CRM_SLA_Turnaround_Time__c;
            }
        }
    }
}
Remove the SOQL query from the for loop; best practice is to never run a query inside a loop.
Right now the query runs once for every element of the initial list. If the list has more than 100 records, it will exceed the governor limit of 100 SOQL queries per transaction.

Entity Framework is too slow when mapping up to 100k rows

I have at least 100,000 rows in a Job_Details table and I'm using Entity Framework to map the data.
This is the code:
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    List<JobBO> lstJobs = new List<JobBO>();
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        var lstJob = dbContext.Job_Details.ToList();
        foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
        {
            JobBO job = MapBEJobforSearchObj(dbJob);
            lstJobs.Add(job);
        }
    }
    getJobResponse.Jobs = lstJobs;
    return getJobResponse;
}
I found that this line takes about 2-3 minutes to execute:
var lstJob = dbContext.Job_Details.ToList();
How can I solve this issue?
To outline the performance issues with your example (see inline comments):
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    List<JobBO> lstJobs = new List<JobBO>();
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        // Loads *ALL* entities into memory. This effectively takes all fields for all rows
        // across from the database to your app server, even though you don't want it all.
        var lstJob = dbContext.Job_Details.ToList();
        // Filters from the data in memory.
        foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
        {
            // Maps the entity to a DTO and adds it to the return collection.
            JobBO job = MapBEJobforSearchObj(dbJob);
            lstJobs.Add(job);
        }
    }
    // Returns the DTOs.
    getJobResponse.Jobs = lstJobs;
    return getJobResponse;
}
First: pass your WHERE clause to EF so it is executed on the DB server, rather than loading all entities into memory.
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        // Passes the Where expression to the DB server to be executed.
        // Note: no .ToList() yet, to leave this as IQueryable.
        var jobs = dbContext.Job_Details.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null);
Next, use a Select to project into your DTOs. Typically these won't contain as much data as the main entity, and as long as you're working with IQueryable you can load related data as needed. Again, this will be sent to the DB server, so you cannot use functions like "MapBEJobforSearchObj" here because the DB server does not know about them. You can select into a simple DTO object, or into an anonymous type to pass to a dynamic mapper.
        var dtos = jobs.Select(ie => new JobBO
        {
            JobId = ie.JobId,
            // ... populate remaining DTO fields here.
        }).ToList();
        getJobResponse.Jobs = dtos;
        return getJobResponse;
    }
}
Moving the .ToList() to the end will materialize the data into your JobBO DTOs/ViewModels, pulling just enough data from the server to populate the desired rows with just the desired fields.
In cases where you may have a large amount of data, you should also consider supporting server-side pagination where you pass a page # and page size, then utilize a .Skip() + .Take() to load a single page of entries at a time.

Rules are skipped in KnowledgeBase

We are using Drools 5.5 Final. We have thousands of objects and two rules, so we fetch the objects in chunks of 100, create a knowledge base for every chunk, and fire the rules. Since creating a KnowledgeBase is expensive, this causes a performance issue, so we now create the KnowledgeBase once and reuse it for every chunk. In this case, after 4 to 5 chunks have been processed, from the 6th chunk onwards the rules are no longer fired even though there are matches. Please suggest what can be done.
Sample code:
public static KnowledgeBase getPackageKnowledgeBase(PackageDescr pkg) {
    KnowledgeBuilderConfiguration builderConf = KnowledgeBuilderFactory.newKnowledgeBuilderConfiguration();
    KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder(builderConf);
    kbuilder.add(ResourceFactory.newDescrResource(pkg), ResourceType.DESCR);
    Collection<KnowledgePackage> kpkgs = kbuilder.getKnowledgePackages();
    if (kbuilder.hasErrors()) {
        LOGGER.error(kbuilder.getErrors());
    }
    KnowledgePackage knowledgePackage = kpkgs.iterator().next();
    KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
    kbase.addKnowledgePackages(Collections.singletonList(knowledgePackage));
    return kbase;
}
Calling code:
int chunkSize = 100;
int start = 0;
int Count = -1;
// pkgdscr contains all rules fetched from the DB
KnowledgeBase kbase = getPackageKnowledgeBase(pkgdscr);
while (Count != 0 && Count <= chunkSize) {
    LOGGER.debug("Deduction not getting " + mappedCustomerId);
    Objects inputObjects = handler.getPaginatedInputObjects(start);
    Count = inputObjects.size();
    start = start + chunkSize;
    StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
    for (Object object : inputObjects) {
        ksession.insert(object);
    }
    ksession.fireAllRules();
    ksession.dispose();
}
Below is the essential part of your loop. It looks to me as if this loop terminates as soon as Count exceeds chunkSize (100). Are you sure this never happens?
while (Count != 0 && Count <= chunkSize) {
    Objects inputObjects = ...;
    Count = inputObjects.size();
    ...
    StatefulKnowledgeSession ksession = ...;
    for (Object object : inputObjects) {
        ksession.insert(object);
    }
    ksession.fireAllRules();
    ...
}
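If the intent is simply to keep paging until the data source is exhausted, a loop shaped like the sketch below avoids the Count condition entirely. This is only an illustration, and it assumes handler.getPaginatedInputObjects(start) returns an empty collection once there is no more data.

int start = 0;
while (true) {
    Objects inputObjects = handler.getPaginatedInputObjects(start);
    if (inputObjects.size() == 0) {
        break;  // no more data to process
    }
    start = start + chunkSize;
    StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
    try {
        for (Object object : inputObjects) {
            ksession.insert(object);
        }
        ksession.fireAllRules();
    } finally {
        ksession.dispose();  // always release the session, even if a rule throws
    }
}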

IronRuby performance issue while using Variables

Here is the code of a very simple expression evaluator using IronRuby:
public class BasicRubyExpressionEvaluator
{
    ScriptEngine engine;
    ScriptScope scope;

    public Exception LastException
    {
        get; set;
    }

    private static readonly Dictionary<string, ScriptSource> parserCache = new Dictionary<string, ScriptSource>();

    public BasicRubyExpressionEvaluator()
    {
        engine = Ruby.CreateEngine();
        scope = engine.CreateScope();
    }

    public object Evaluate(string expression, DataRow context)
    {
        ScriptSource source;
        parserCache.TryGetValue(expression, out source);
        if (source == null)
        {
            source = engine.CreateScriptSourceFromString(expression, SourceCodeKind.SingleStatement);
            parserCache.Add(expression, source);
        }
        var result = source.Execute(scope);
        return result;
    }

    public void SetVariable(string variableName, object value)
    {
        scope.SetVariable(variableName, value);
    }
}
And here is the problem:
var evaluator = new BasicRubyExpressionEvaluator();
evaluator.SetVariable("a", 10);
evaluator.SetVariable("b", 1);
evaluator.Evaluate("a+b+2", null);
vs
var evaluator = new BasicRubyExpressionEvaluator();
evaluator.Evaluate("10+1+2", null);
The first is 25 times slower than the second. Any suggestions? String.Replace is not a solution for me.
I do not think the performance difference you are seeing is due to variable setting; the first execution of IronRuby in a program is always going to be slower than the second, regardless of what you're doing, since most of the compiler isn't loaded until code is actually run (for startup performance reasons). Please try that example again, perhaps running each version of your code in a loop, and you'll see the performance is roughly equivalent. The variable version does have some method-dispatch overhead to get the variables, but that should be negligible if you run it enough times.
Also, in your hosting code, why are you holding onto ScriptSources in a dictionary? I would hold onto CompiledCode (the result of engine.CreateScriptSourceFromString(...).Compile()) instead, as that will help a lot more on repeat runs.
You can of course first build the string, something like
evaluator.Evaluate(string.Format("a={0}; b={1}; a + b + 2", 10, 1), null);
Or you can make it a method:
if your script defines a method instead, you should be able to use it like a regular C# Func object.
var script = @"
def self.addition(a, b)
  a + b + 2
end
";
engine.Execute(script, scope);
var func = scope.GetVariable<Func<object, object, object>>("addition");
func(10, 1);
This is probably not a working snippet but it shows the general idea.