Bulk removal of Edges on Titan 1.0 - titan

I have a long list of edge IDs (about 12 billion) that I want to remove from my Titan graph (which is hosted on an HBase backend).
How can I do it quickly and efficiently?
I tried removing the edges via Gremlin, but that is too slow for that number of edges.
Is it possible to perform Delete commands directly on HBase? How can I do that? (How do I assemble the key to delete?)
Thanks

After two days of research, I came up with a solution.
The main goal: given a very large collection of string edge IDs, implement logic that removes them from the graph.
The implementation has to support removal of billions of edges, so it must be efficient in both memory and time.
Using Titan directly is ruled out, since Titan performs a lot of unnecessary instantiations; we don't want to load the edges at all, we just want to remove them from HBase.
/**
* Deletes the given edge IDs, by splitting them into chunks of 100,000
* @param edgeIds Collection of edge IDs to delete
* @throws IOException
*/
public static void deleteEdges(Iterator<String> edgeIds) throws IOException {
IDManager idManager = new IDManager(NumberUtil.getPowerOf2(GraphDatabaseConfiguration.CLUSTER_MAX_PARTITIONS.getDefaultValue()));
byte[] columnFamilyName = "e".getBytes(); // 'e' is your edgestore column-family name
long deletionTimestamp = System.currentTimeMillis();
int chunkSize = 100000; // Contact HBase only once per chunk of edge IDs (each edge produces two Deletes: one for the IN direction and one for the OUT direction)
org.apache.hadoop.conf.Configuration config = new org.apache.hadoop.conf.Configuration();
config.set("hbase.zookeeper.quorum", "YOUR-ZOOKEEPER-HOSTNAME");
config.set("hbase.table", "YOUR-HBASE-TABLE");
List<Delete> deletions = Lists.newArrayListWithCapacity(chunkSize);
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf(config.get("hbase.table")));
Iterators.partition(edgeIds, chunkSize)
.forEachRemaining(edgeIdsChunk -> deleteEdgesChunk(edgeIdsChunk, deletions, table, idManager,
columnFamilyName, deletionTimestamp));
}
/**
* Given a collection of edge IDs and a list of Delete objects (which is cleared on entry),
* creates two Delete objects for each edge (one for IN and one for OUT),
* and deletes them via the given Table instance
*/
public static void deleteEdgesChunk(List<String> edgeIds, List<Delete> deletions, Table table, IDManager idManager,
byte[] columnFamilyName, long deletionTimestamp) {
deletions.clear();
for (String edgeId : edgeIds)
{
RelationIdentifier identifier = RelationIdentifier.parse(edgeId);
deletions.add(createEdgeDelete(idManager, columnFamilyName, deletionTimestamp, identifier.getRelationId(),
identifier.getTypeId(), identifier.getInVertexId(), identifier.getOutVertexId(),
IDHandler.DirectionID.EDGE_IN_DIR));
deletions.add(createEdgeDelete(idManager, columnFamilyName, deletionTimestamp, identifier.getRelationId(),
identifier.getTypeId(), identifier.getOutVertexId(), identifier.getInVertexId(),
IDHandler.DirectionID.EDGE_OUT_DIR));
}
try {
table.delete(deletions);
}
catch (IOException e)
{
logger.error("Failed to delete a chunk due to inner exception: " + e);
}
}
/**
* Creates an HBase Delete object for a specific edge
* @return HBase Delete object to be used against HBase
*/
private static Delete createEdgeDelete(IDManager idManager, byte[] columnFamilyName, long deletionTimestamp,
long relationId, long typeId, long vertexId, long otherVertexId,
IDHandler.DirectionID directionID) {
byte[] vertexKey = idManager.getKey(vertexId).getBytes(0, 8); // Size of a long
byte[] edgeQualifier = makeQualifier(relationId, otherVertexId, directionID, typeId);
return new Delete(vertexKey)
.addColumn(columnFamilyName, edgeQualifier, deletionTimestamp);
}
/**
* Cell Qualifier for a specific edge
*/
private static byte[] makeQualifier(long relationId, long otherVertexId, IDHandler.DirectionID directionID, long typeId) {
WriteBuffer out = new WriteByteBuffer(32); // Default length of array is 32, feel free to increase
IDHandler.writeRelationType(out, typeId, directionID, false);
VariableLong.writePositiveBackward(out, otherVertexId);
VariableLong.writePositiveBackward(out, relationId);
return out.getStaticBuffer().getBytes(0, out.getPosition());
}
Keep in mind that I do not handle system types here; I assume the given edge IDs are user edges.
Using this implementation I was able to remove 20 million edges in about 2 minutes.
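For completeness, a minimal usage sketch (the file path and the one-ID-per-line format are my own assumptions; adapt it to wherever your edge IDs are stored):
// Hypothetical caller: streams edge IDs from a text file, one ID per line
try (BufferedReader reader = Files.newBufferedReader(Paths.get("/path/to/edge-ids.txt"))) {
    Iterator<String> edgeIds = reader.lines().iterator();
    deleteEdges(edgeIds); // processes the IDs in chunks of 100,000 without materializing the full list
}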

Related

Spark mapPartitionsToPair execution time

In the project I am currently working on, we are using Spark as the computation engine for one of our workflows.
The workflow is as follows:
We have a product catalog served from several pincodes. A user logged in from any particular pincode should be able to see the least available cost across all serving pincodes.
The least cost is calculated as follows:
product price + dist(pincode1, pincode2)
where pincode2 is the user's pincode and pincode1 is the source pincode. Apply the above formula for all source pincodes and identify the lowest one.
My Core spark logic looks like this
pincodes.javaRDD().cartesian(pincodePrices.javaRDD()).mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Row,Row>>, Row, Row>() {
@Override
public Iterator<Tuple2<Row, Row>> call(Iterator<Tuple2<Row, Row>> t)
throws Exception {
MongoClient mongoclient = MongoClients.create("mongodb://localhost");
MongoDatabase database = mongoclient.getDatabase("catalogue");
MongoCollection<Document>pincodeCollection = database.getCollection("pincodedistances");
List<Tuple2<Row,Row>> list =new LinkedList<>();
while (t.hasNext()) {
Tuple2<Row, Row>tuple2 = t.next();
Row pinRow = tuple2._1;
Integer srcPincode = pinRow.getAs("pincode");
Row pricesRow = tuple2._2;
Row pricesRow1 = (Row)pricesRow.getAs("leastPrice");
Integer buyingPrice = pricesRow1.getAs("buyingPrice");
Integer quantity = pricesRow1.getAs("quantity");
Integer destPincode = pricesRow1.getAs("pincodeNum");
if(buyingPrice!=null && quantity>0) {
BasicDBObject dbObject = new BasicDBObject();
dbObject.append("sourcePincode", srcPincode);
dbObject.append("destPincode", destPincode);
//System.out.println(srcPincode+","+destPincode);
Number distance;
if(srcPincode.intValue()==destPincode.intValue()) {
distance = 0;
}else {
Document document = pincodeCollection.find(dbObject).first();
distance = document.get("distance", Number.class);
}
double margin = 0.02;
Long finalPrice = Math.round(buyingPrice+(margin*buyingPrice)+distance.doubleValue());
//Row finalPriceRow = RowFactory.create(finalPrice,quantity);
StructType structType = new StructType();
structType = structType.add("finalPrice", DataTypes.LongType, false);
structType = structType.add("quantity", DataTypes.LongType, false);
Object values[] = {finalPrice,quantity};
Row finalPriceRow = new GenericRowWithSchema(values, structType);
list.add(new Tuple2<Row, Row>(pinRow, finalPriceRow));
}
}
mongoclient.close();
return list.iterator();
}
}).reduceByKey((priceRow1,priceRow2)->{
Long finalPrice1 = priceRow1.getAs("finalPrice");
Long finalPrice2 = priceRow2.getAs("finalPrice");
if(finalPrice1.longValue()<finalPrice2.longValue())return priceRow1;
return priceRow2;
}).collect().forEach(tuple2->{
// Business logic to push computed price to mongodb
});
I am able to get the correct answer, however mapPartitionsToPair is taking quite a bit of time (~22 seconds for just 12k records).
After browsing the internet I found that mapPartitions performs better than mapPartitionsToPair, but I am not sure how to emit (key, value) pairs from mapPartitions and then sort them.
Any alternative to the above transformations or any better approach would be highly appreciated.
Spark Cluster: Standalone(1 executor, 6 cores)
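For reference, here is a rough sketch (simplified and untested; variable names are just placeholders) of what I think the mapPartitions + mapToPair version would look like:
JavaRDD<Tuple2<Row, Row>> partitionResults =
    pincodes.javaRDD().cartesian(pincodePrices.javaRDD())
        .mapPartitions(iter -> {
            List<Tuple2<Row, Row>> list = new LinkedList<>();
            while (iter.hasNext()) {
                Tuple2<Row, Row> t = iter.next();
                // ... same per-record pricing logic as above, building finalPriceRow ...
                // list.add(new Tuple2<>(t._1, finalPriceRow));
            }
            return list.iterator();
        });
// re-key so that reduceByKey can be applied
JavaPairRDD<Row, Row> pairs = partitionResults.mapToPair(t -> t);
pairs.reduceByKey((r1, r2) -> {
    Long p1 = r1.getAs("finalPrice");
    Long p2 = r2.getAs("finalPrice");
    return p1 < p2 ? r1 : r2;
});
I suspect the per-record MongoDB lookup inside the partition dominates the time either way, so I am not sure switching the transformation alone would help much.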

how to make sure locks are released with ef core and postgres?

I have a console program that moves data between two different servers (DatabaseA and DatabaseB).
Database B is a Postgres server.
It calls a lot of stored procedures and other raw queries.
I use ExecuteSqlRaw a lot.
I also use NpgsqlBulk.EfCore.
The program uses the same context instance for DatabaseB during the whole run, until it finishes.
Somehow I get locks on some of my tables on DatabaseB that never get released.
This always happens on my table mytable_fromdatabase_import.
The code run on that is the following:
protected override void AddIdsNew()
{
var toAdd = IdsNotInDatabaseB();
var newObjectsToAdd = GetByIds(toAdd).Select(Converter.ConvertAToB);
DatabaseBContext.Database.ExecuteSqlRaw("truncate mytable_fromdatabase_import; ");
var uploader = new NpgsqlBulkUploader(DatabaseBContext);
uploader.Insert(newObjectsToAdd); // inserts data into mytable_fromdatabase_import
DatabaseBContext.Database.ExecuteSqlRaw("call insert_myTable_from_importTable();");
}
After I run it the whole table is not accessible anymore, and when I query the locks on the server I can see there is a process holding them.
How can I make sure this process always closes and releases its locks on the tables?
I thought EF Core would do that automatically.
-----------Edit-----------
I just wanted to add that this is not a temporary problem during the run of the console program. When I run this code and it has finished, my table is still locked and nothing can access it. My understanding was that the EF Core context would release everything after it is disposed (whether by error or by finishing).
The problem had nothing to do with EF Core but with a wrongly configured backup script. The program is now running with no changes to it and it works fine.
For a concrete task you need the right tools. You probably get locks when retrieving IDs and also when trying not to load already imported records. These steps are slow!
I would suggest using linq2db (disclaimer: I'm a co-author of this library).
Create two projects with models from different databases:
Source.Model.csproj - install linq2db.SQLServer
Destination.Model.csproj - install linq2db.PostgreSQL
Follow the instructions in the T4 templates on how to generate models from the two databases. It is easy, and you can ask questions on linq2db's GitHub site.
I'll post a helper class which I used for transferring tables on a previous project. It additionally uses the CodeJam library for mapping, but in your project you could of course use AutoMapper instead.
public class DataImporter
{
private readonly DataConnection _source;
private readonly DataConnection _destination;
public DataImporter(DataConnection source, DataConnection destination)
{
_source = source;
_destination = destination;
}
private long ImportDataPrepared<TSource, TDest>(IOrderedQueryable<TSource> source, Expression<Func<TSource, TDest>> projection) where TDest : class
{
var destination = _destination.GetTable<TDest>();
var tableName = destination.TableName;
var sourceCount = source.Count();
if (sourceCount == 0)
return 0;
var currentCount = destination.Count();
if (currentCount > sourceCount)
throw new Exception($"'{tableName}' what happened here?.");
if (currentCount >= sourceCount)
return 0;
IQueryable<TSource> sourceQuery = source;
if (currentCount > 0)
sourceQuery = sourceQuery.Skip(currentCount);
var projected = sourceQuery.Select(projection);
var copied =
_destination.BulkCopy(
new BulkCopyOptions
{
BulkCopyType = BulkCopyType.MultipleRows,
RowsCopiedCallback = (obj) => RowsCopiedCallback(obj, currentCount, sourceCount, tableName)
}, projected);
return copied.RowsCopied;
}
private void RowsCopiedCallback(BulkCopyRowsCopied obj, int currentRows, int totalRows, string tableName)
{
var percent = (currentRows + obj.RowsCopied) / (double)totalRows * 100;
Console.WriteLine($"Copied {percent:N2}% \tto {tableName}");
}
public class ImporterHelper<TSource>
{
private readonly DataImporter _improrter;
private readonly IOrderedQueryable<TSource> _sourceQuery;
public ImporterHelper(DataImporter improrter, IOrderedQueryable<TSource> sourceQuery)
{
_improrter = improrter;
_sourceQuery = sourceQuery;
}
public long To<TDest>() where TDest : class
{
var mapperBuilder = new MapperBuilder<TSource, TDest>();
return _improrter.ImportDataPrepared(_sourceQuery, mapperBuilder.GetMapper().GetMapperExpressionEx());
}
public long To<TDest>(Expression<Func<TSource, TDest>> projection) where TDest : class
{
return _improrter.ImportDataPrepared(_sourceQuery, projection);
}
}
public ImporterHelper<TSource> ImprortData<TSource>(IOrderedQueryable<TSource> source)
{
return new ImporterHelper<TSource>(this, source);
}
}
So let's begin transferring. Note that I have used OrderBy/ThenBy to specify the ID order so that already transferred records are not imported again; importantly, the order fields should be a unique key combination. This makes the sample reentrant: it can be re-run when the connection is lost.
var sourceBuilder = new LinqToDbConnectionOptionsBuilder();
sourceBuilder.UseSqlServer(SourceConnectionString);
var destinationBuilder = new LinqToDbConnectionOptionsBuilder();
destinationBuilder.UsePostgreSQL(DestinationConnectionString);
using (var source = new DataConnection(sourceBuilder.Build()))
using (var destination = new DataConnection(destinationBuilder.Build()))
{
var dataImporter = new DataImporter(source, destination);
dataImporter.ImprortData(source.GetTable<Source.Model.FirstTable>()
.OrderBy(e => e.Id1)
.ThenBy(e => e.Id2))
.To<Dest.Model.FirstTable>();
dataImporter.ImprortData(source.GetTable<Source.Model.SecondTable>().OrderBy(e => e.Id))
.To<Dest.Model.SecondTable>();
}
The boring OrderBy part could surely be generated automatically, but that would blow up this already long answer.
Also play with BulkCopyOptions: native Npgsql COPY may fail, in which case the multi-row variant (BulkCopyType.MultipleRows) should be used.

How to calculate mean of distributed data?

How can I calculate the arithmetic mean of a large vector (series) in a distributed setting where I partition the data across multiple nodes? I do not want to use the MapReduce paradigm. Is there any distributed algorithm to efficiently compute the mean, besides the trivial approach of computing an individual sum on each node, bringing the results to a master node, and dividing by the size of the vector (series)?
Distributed average consensus is an alternative.
The problem with the trivial master-based approach is that for a vast dataset, making everything depend on a single aggregation can take so long that the result is already out of date, and therefore wrong, by the time it is computed, unless you lock the entire dataset, which is impractical for a massive set of distributed data. Using distributed average consensus (the same methods work for algorithms other than the mean), you get a more up-to-date, better guess at the current value of the mean without locking the data, and in real time.
Here is a link to a paper on it, but it's math heavy:
http://web.stanford.edu/~boyd/papers/pdf/lms_consensus.pdf
You can google for many papers on it.
The general concept is like this: say on each node you have a socket listener. You evaluate your local sum and average, then publish them to the other nodes. Each node listens for the other nodes and receives their sums and averages on a timescale that makes sense. You can then evaluate a good guess at the overall average as sumForAllNodes(storedAverage[node] * storedCount[node]) / sumForAllNodes(storedCount[node]). If you have a truly large dataset, you could just listen for new values as they are stored in the node, amend the local count and average, then publish them.
If even this is taking too long, you could average over a random subset of the data in each node.
Here is some C# code that gives you an idea (it uses Fleck, which runs on more versions of Windows than the Windows-10-only Microsoft WebSockets implementation). Run this on two nodes, one with
<appSettings>
<add key="thisNodeName" value="UK" />
</appSettings>
in the app.config, and with "EU-North" in the other. The two instances exchange means using WebSockets; you just need to add your back-end enumeration of the database.
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using Fleck;
namespace WebSocketServer
{
class Program
{
static List<IWebSocketConnection> _allSockets;
static Dictionary<string,decimal> _allMeans;
static Dictionary<string,decimal> _allCounts;
private static decimal _localMean;
private static decimal _localCount;
private static decimal _localAggregate_count;
private static decimal _localAggregate_average;
static void Main(string[] args)
{
_allSockets = new List<IWebSocketConnection>();
_allMeans = new Dictionary<string, decimal>();
_allCounts = new Dictionary<string, decimal>();
var serverAddresses = new Dictionary<string,string>();
//serverAddresses.Add("USA-WestCoast", "ws://127.0.0.1:58951");
//serverAddresses.Add("USA-EastCoast", "ws://127.0.0.1:58952");
serverAddresses.Add("UK", "ws://127.0.0.1:58953");
serverAddresses.Add("EU-North", "ws://127.0.0.1:58954");
//serverAddresses.Add("EU-South", "ws://127.0.0.1:58955");
foreach (var serverAddress in serverAddresses)
{
_allMeans.Add(serverAddress.Key, 0m);
_allCounts.Add(serverAddress.Key, 0m);
}
var thisNodeName = ConfigurationSettings.AppSettings["thisNodeName"]; //for example "UK"
var serverSocketAddress = serverAddresses.First(x=>x.Key==thisNodeName);
serverAddresses.Remove(thisNodeName);
var websocketServer = new Fleck.WebSocketServer(serverSocketAddress.Value);
websocketServer.Start(socket =>
{
socket.OnOpen = () =>
{
Console.WriteLine("Open!");
_allSockets.Add(socket);
};
socket.OnClose = () =>
{
Console.WriteLine("Close!");
_allSockets.Remove(socket);
};
socket.OnMessage = message =>
{
Console.WriteLine(message + " received");
var parameters = message.Split('~');
var remoteHost = parameters[0];
var remoteMean = decimal.Parse(parameters[1]);
var remoteCount = decimal.Parse(parameters[2]);
_allMeans[remoteHost] = remoteMean;
_allCounts[remoteHost] = remoteCount;
};
});
while (true)
{
//evaluate my local average and count
Random rand = new Random(DateTime.Now.Millisecond);
_localMean = 234.00m + (rand.Next(0, 100) - 50)/10.0m;
_localCount = 222m + rand.Next(0, 100);
//evaluate my local aggregate average using means and counts sent from all other nodes
//could publish aggregate averages to other nodes, if you wanted to monitor disagreement between nodes
var total_mean_times_count = 0m;
var total_count = 0m;
foreach (var server in serverAddresses)
{
total_mean_times_count += _allCounts[server.Key]*_allMeans[server.Key];
total_count += _allCounts[server.Key];
}
//add on local mean and count which were removed from the server list earlier, so won't be processed
total_mean_times_count += (_localMean * _localCount);
total_count = total_count + _localCount;
_localAggregate_average = (total_mean_times_count/total_count);
_localAggregate_count = total_count;
Console.WriteLine("local aggregate average = {0}", _localAggregate_average);
System.Threading.Thread.Sleep(10000);
foreach (var serverAddress in serverAddresses)
{
using (var wscli = new ClientWebSocket())
{
var tokSrc = new CancellationTokenSource();
using (var task = wscli.ConnectAsync(new Uri(serverAddress.Value), tokSrc.Token))
{
task.Wait();
}
using (var task = wscli.SendAsync(new ArraySegment<byte>(Encoding.UTF8.GetBytes(thisNodeName+"~"+_localMean + "~"+_localCount)),
WebSocketMessageType.Text,
false,
tokSrc.Token
))
{
task.Wait();
}
}
}
}
}
}
}
Don't forget to add a static lock or to separate the activity by synchronising at given times (not shown for simplicity).
There are two simple approaches you can use.
One is, as you correctly noted, to calculate the sum on every node and then combine the sums and divide by the total amount of data:
avg = (sum1+sum2+sum3)/(cnt1+cnt2+cnt3)
Another possibility is to calculate the average on every node and then use weighted average:
avg = (avg1*cnt1 + avg2*cnt2 + avg3*cnt3) / (cnt1+cnt2+cnt3)
= avg1*cnt1/(cnt1+cnt2+cnt3) + avg2*cnt2/(cnt1+cnt2+cnt3) + avg3*cnt3/(cnt1+cnt2+cnt3)
I don't see anything wrong with these trivial ways and am wondering why you would want to use a different approach.
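For illustration, a minimal sketch in Java (the per-node sums, averages and counts below are made-up numbers) showing that both combinations give the same result:
public class DistributedMeanDemo {
    public static void main(String[] args) {
        // hypothetical partial results reported by three nodes
        double[] sums   = {1200.0, 950.0, 1430.0};
        double[] avgs   = {12.0, 9.5, 11.0};
        long[]   counts = {100, 100, 130};

        double totalSum = 0, weighted = 0;
        long totalCount = 0;
        for (int i = 0; i < counts.length; i++) {
            totalSum   += sums[i];               // combine raw sums
            weighted   += avgs[i] * counts[i];   // or combine averages weighted by count
            totalCount += counts[i];
        }
        System.out.println(totalSum / totalCount);  // avg = (sum1+sum2+sum3)/(cnt1+cnt2+cnt3)
        System.out.println(weighted / totalCount);  // avg = (avg1*cnt1+avg2*cnt2+avg3*cnt3)/(cnt1+cnt2+cnt3)
    }
}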

Load the graph from the Titan DB for a specified depth in a single or efficient query

We are using Titan DB to store the graph information. We have Cassandra + ES as the backend storage and index. We are trying to load the graph data in order to render the graph in the web UI.
This is the approach I am following.
public JSONObject getGraph(long vertexId, final int depth) throws Exception {
JSONObject json = new JSONObject();
JSONArray vertices = new JSONArray();
JSONArray edges = new JSONArray();
final int currentDepth = 0;
TitanGraph graph = GraphFactory.getInstance().getGraph();
TitanTransaction tx = graph.newTransaction();
try {
GraphTraversalSource g = tx.traversal();
Vertex parent = graphDao.getVertex(vertexId);
loadGraph(g, parent, currentDepth + 1, depth, vertices, edges);
json.put("vertices", vertices);
json.put("edges", edges);
return json;
} catch (Throwable e) {
log.error(e.getMessage(), e);
if (tx != null) {
tx.rollback();
}
throw new Exception(e.getMessage(), e);
} finally {
if (tx != null) {
tx.close();
}
}
}
private void loadGraph(final GraphTraversalSource g, final Vertex vertex, final int currentDepth,
final int maxDepth, final JSONArray vertices, final JSONArray edges) throws Exception {
vertices.add(toJSONvertex(vertex));
List<Edge> edgeList = g.V(vertex.id()).outE().toList();
if (edgeList == null || edgeList.size() <= 0) {
return;
}
for (Edge edge : edgeList) {
Vertex child = edge.inVertex();
edges.add(Schema.toJSON(vertex, edge, child));
if (currentDepth < maxDepth) {
loadGraph(g, child, currentDepth + 1, maxDepth, vertices, edges);
}
}
}
But this is taking quite a lot of time for depth 3; when more nodes exist in the tree, it takes about 1 minute to load the data.
Are there any better mechanisms to load the graph efficiently?
You might see better performance by running your full query as a single traversal execution - e.g., g.V(vertex.id()).out().out().outE() for depth 3 - but any vertices with very high edge cardinality are going to make this query slow no matter how you execute it.
To add to the answer by @Benjamin: performing one traversal, as opposed to many little ones which are constantly expanding, will indeed be faster. Titan uses lazy loading, so you should take advantage of that.
The next thing I would recommend is to also multithread each of your traversals and writes. Titan actually supports simultaneous writes very nicely; you can achieve this using transactions.
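For example, here is a rough sketch (assuming the TinkerPop 3 traversal API that Titan 1.0 ships with; __ is org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__, Path is org.apache.tinkerpop.gremlin.process.traversal.Path, and maxDepth and the JSON conversion are placeholders) of collecting the whole neighbourhood in one traversal instead of one query per vertex:
GraphTraversalSource g = tx.traversal();
List<Path> paths = g.V(vertexId)
        .repeat(__.outE().inV())  // walk outgoing edges one hop at a time
        .emit()                   // also return the intermediate depths, not just the deepest level
        .times(maxDepth)
        .path()                   // each Path alternates vertex, edge, vertex, ...
        .toList();
// convert the collected vertices and edges to JSON once, instead of issuing a new traversal per vertex
for (Path path : paths) {
    // path.objects() returns the alternating vertices and edges along this path
}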

please help me to convert trigger to batch apex

Please help me convert my after trigger to Batch Apex.
This trigger fires when the opportunity stage changes to Closed Won.
It runs through the line items and checks whether a forecast (custom object) exists for that account. If yes, it links to it; if not, it creates a new forecast.
My trigger works fine for some records, but on mass updates I get a timed-out error. So I am opting for Batch Apex, but I have never written one before. Please help me.
trigger Accountforecast on Opportunity (after insert,after update) {
List<Acc__c> AccproductList = new List<Acc__c>();
List<Opportunitylineitem> opplinitemlist =new List<Opportunitylineitem>();
list<opportunitylineitem > oppdate= new list<opportunitylineitem >();
List<Acc__c> accquery =new List<Acc__c>();
List<date> dt =new List<date>();
Set<Id> sProductIds = new Set<Id>();
Set<Id> sAccountIds = new Set<Id>();
Set<id> saccprodfcstids =new set<Id>();
Acc__c accpro =new Acc__c();
string aname;
Integer i;
Integer myIntMonth;
Integer myIntyear;
Integer myIntdate;
opplinitemlist=[select Id,PricebookEntry.Product2.Name,opp_account__c,Opp_account_name__c,PricebookEntry.Product2.id, quantity,ServiceDate,Acc_Product_Fcst__c from Opportunitylineitem WHERE Opportunityid IN :Trigger.newMap.keySet() AND Acc__c=''];
for(OpportunityLineItem oli:opplinitemlist) {
sProductIds.add(oli.PricebookEntry.Product2.id);
sAccountIds.add(oli.opp_account__c);
}
accquery=[select id,Total_Qty_Ordered__c,Last_Order_Qty__c,Last_Order_Date__c,Fcst_Days_Period__c from Acc__c where Acc__c.product__c In :sproductids and Acc__c.Account__c in :saccountids];
for(Acc__c apf1 :accquery){
saccprodfcstids.add(apf1.id);
}
if(saccprodfcstids!=null){
oppdate=[select servicedate from opportunitylineitem where Acc__c IN :saccprodfcstids ];
i =[select count() from Opportunitylineitem where acc_product_fcst__c in :saccprodfcstids];
}
for(Opportunity opp :trigger.new)
{
if(opp.Stagename=='Closed Won')
{
for(opportunitylineitem opplist:opplinitemlist)
{
if(!accquery.isempty())
{
for(opportunitylineitem opldt :oppdate)
{
string myDate = String.valueOf(opldt);
myDate = myDate.substring(myDate.indexof('ServiceDate=')+12);
myDate = myDate.substring(0,10);
String[] strDate = myDate.split('-');
myIntMonth = integer.valueOf(strDate[1]);
myIntYear = integer.valueOf(strDate[0]);
myIntDate = integer.valueOf(strDate[2]);
Date d = Date.newInstance(myIntYear, myIntMonth, myIntDate);
dt.add(d);
}
dt.add(opp.closedate);
dt.sort();
integer TDays=0;
system.debug('*************dt:'+dt.size());
for(integer c=0;c<dt.size()-1;c++)
{
TDays=TDays+dt[c].daysBetween(dt[c+1]);
}
for(Acc__c apf : accquery)
{
apf.Fcst_Days_Period__c = TDays/i;
apf.Total_Qty_Ordered__c =apf.Total_Qty_Ordered__c +opplist.quantity;
apf.Last_Order_Qty__c=opplist.quantity;
apf.Last_Order_Date__c=opp.CloseDate ;
apf.Fcst_Qty_Avg__c=apf.Total_Qty_Ordered__c/(i+1);
Opplist.Acc__c =apf.Id;
}
}
else{
accpro.Account__c=opplist.opp_account__c;
accpro.product__c=opplist.PricebookEntry.Product2.Id;
accpro.opplineitemid__c=opplist.id;
accpro.Total_Qty_Ordered__c =opplist.quantity;
accpro.Last_Order_Qty__c=opplist.quantity;
accpro.Last_Order_Date__c=opp.CloseDate;
accpro.Fcst_Qty_Avg__c=opplist.quantity;
accpro.Fcst_Days_Period__c=7;
accproductList.add(accpro);
}
}
}
}
if(!accproductlist.isempty()){
insert accproductlist;
}
update opplinitemlist;
update accquery;
}
First of all, you should take a look at this: Apex Batch Processing
Once you get a better idea on how batches work, we need to take into account the following points:
Identify the object that requires more processing. Account? Opportunity?
Should the data be maintained across batch calls? Stateful?
Use correct data structure in terms of performance. Map, List?
From your code, we can see you have three objects: OpportunityLineItem, Account, and Opportunity. It seems that your Account object requires the most processing here.
It seems you're just keeping track of dates and not doing any aggregations. Thus, you don't need to maintain state across batch calls.
Your code has the potential to hit governor limits, especially memory limits on the heap. You have four nested loops. Our suggestion would be to keep the opportunity line items related to their Opportunities in a Map rather than in a List. Plus, we can get rid of those unnecessary for loops by refactoring the code as follows:
Note: This is just a template for the batch you will need to construct.
global class AccountforecastBatch implements Database.Batchable<sObject>
{
global Database.QueryLocator start(Database.BatchableContext BC)
{
// 1. Do some initialization here: (i.e. for(OpportunityLineItem oli:opplinitemlist) {sProductIds.add(oli.PricebookEntry.Product2.id)..}
// 2. return Opportunity object here: return Database.getQueryLocator([select id,Total_Qty_Ordered__c,Last_Order_Qty ....]);
}
global void execute(Database.BatchableContext BC, List<sObject> scope)
{
// 1. Traverse your scope which at this point will be a list of Accounts
// 2. You're adding dates inside the process for Opportunity Line Items. See if you can isolate this process outside the for loops with a Map data structure.
// 3. You have 3 potential database transactions here (insert accproductlist; update opplinitemlist; update accquery;). Ideally, you will only need one DB transaction per batch. If you can complete step 2 above, you might only need to update your opportunity line items. Otherwise, you're trying to do more than one thing in a method and you will need to redesign your solution.
}
global void finish(Database.BatchableContext BC)
{
// send email or do some other tasks here
}
}