Ehcache 2.4.2 does not write all the values in the element to a file

I'm trying to do a simple test with Ehcache: put an element into the cache, flush and shut down the cache, then reload all the beans with Spring (which also initializes the CacheManager), call cache.get, and retrieve the previously written values.
The Element's value is a serializable class called DOM, which contains a ConcurrentHashMap field.
I create 3 DOM instances: d1, d2, d3
d1 (has a map with 3 values: t1, t2, t3)
d2 (has a map with 2 values: x1, x2)
d3 (has a map with 2 values: s1, s2)
The cacheManager and cache are created with Spring. I call:
cache.put(new Element(1,d1))
cache.put(new Element(2,d2))
cache.put(new Element(3,d3))
cache.flush();
cacheManager.shutdown();
cache = null
cacheManager = null
Then I load the Spring application context again (which creates the cacheManager and cache).
I call:
actualD1 = cache.get(1)
actualD2 = cache.get(2)
actualD3 = cache.get(3)
I receive the DOM objects into the actualD1, actualD2 and actualD3 variables
But the problem is that each of them now contains only one value:
actualD1 (has a map with 1 value: t1)
actualD2 (has a map with 1 value: x1)
actualD3 (has a map with 1 value: s1)
What could be the problem?
Here is my ehcache.xml file:
<defaultCache
maxElementsInMemory="1000000"
eternal="false"
diskSpoolBufferSizeMB="100"
overflowToDisk="true"
clearOnFlush="false"
copyOnRead="false"
copyOnWrite="false"
diskExpiryThreadIntervalSeconds="300"
diskPersistent="true">
</defaultCache>
Here is how I create the cacheManager (this method is called at startup from Spring):
protected def checkAndCreateCacheManagerIfNeeded() = {
  // Double-checked locking around CacheManager creation
  if (cacheManager == null) {
    synchronized {
      if (cacheManager == null) {
        cacheManager = CacheManager.create(ehCacheConfigFile)
      }
    }
  }
}
The following code creates the cache:
protected def getOrCreateCache(cacheName: String) = {
  checkAndCreateCacheManagerIfNeeded()
  var cache = cacheManager.getEhcache(cacheName)
  if (cache == null) {
    cacheManager.synchronized {
      cache = cacheManager.getEhcache(cacheName)
      if (cache == null) {
        cache = cacheManager.addCacheIfAbsent(cacheName)
      }
    }
  }
  cache
}

The problem was adding t1, t2, t3 to d1 without putting the updated d1 back into the cache. After each addition of a value to the map, one must call:
cache.put(d1Element)

While this isn't strictly required, since it depends on your setup, it is indeed considered best practice with Ehcache. In this particular setup, your elements are being serialized to disk (though not necessarily at the time of the put). As a result, once an element has been serialized, any subsequent change to the object graph won't be reflected in the DiskStore.
This would have worked with on-heap-only storage, but it is still recommended that you put the element back, so you can change the cache configuration in the future without any need to change the code.
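For illustration, here is a minimal sketch of the put-back pattern (the getMap() accessor on the DOM class is assumed for the example; the rest is the standard Ehcache 2.x API):
// Mutate the cached value...
Element e = cache.get(1);
DOM d1 = (DOM) e.getValue();
d1.getMap().put("t2", "t2-value"); // getMap() is a hypothetical accessor
// ...then put the element back so the updated object graph is re-serialized to the DiskStore
cache.put(new Element(1, d1));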

Related

How do I update a MongoDB document with a new value using Reactor's Mono? (Kotlin)

The context is that I need to update a value in a single document. I have a Mono; the parameter object contains values such as the username (to find the correct user by unique username) and an amount value.
The problem is that this value (due to other components of my application) is the amount by which I need to increase/decrease the user's balance, as opposed to passing a new balance. I intend to do this using two Monos: one finds the user, and it is then combined with the other Mono carrying the inbound request, so I can perform a simple sum (i.e. balance + changeRequest.amount) and return the result to the document database.
override fun increaseBalance(changeRequest: Mono<ChangeBalanceRequestResource>): Mono<ChangeBalanceResponse> {
    val changeAmount: Mono<Decimal128> = changeRequest.map { it.transactionAmount }
    val user: Mono<User> = changeRequest.flatMap { rxUserRepository.findByUsername(it.username) }
    val newBalance = user.map {
        val r = changeAmount.block()
        it.balance = sumBalance(it.balance!!, r!!)
        rxUserRepository.save(it)
    }
        .flatMap { it }
        .map { it.balance!! }
    return Mono.just(ChangeBalanceResponse("success", newBalance.block()!!))
}
Obviously I'm trying to achieve this in a non-blocking fashion. I'm also open to using only a single Mono if that's possible/optimal. I also appreciate that I've truly butchered the example and used .block() as a placeholder to illustrate what I'm trying to achieve.
P.S. this is my first post, so any tips on how to express my problem more clearly would be welcome.
Here's how I would do this in Java (using Double instead of Decimal128):
public Mono<ChangeBalanceResponse> increaseBalance(Mono<ChangeBalanceRequestResource> changeRequest) {
    Mono<Double> changeAmount = changeRequest.map(a -> a.transactionAmount());
    Mono<User> user = changeRequest.map(a -> a.username()).flatMap(rxUserRepository::findByUsername);
    return Mono.zip(changeAmount, user).flatMap(t2 -> {
        Double amount = t2.getT1(); // renamed: a lambda may not shadow the outer changeAmount
        User u = t2.getT2();
        // assumes User setters are chained (fluent)
        return rxUserRepository.save(u.balance(sumBalance(amount, u.balance())));
    }).map(res -> new ChangeBalanceResponse("success", res.newBalance()));
}
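One thing to watch with this shape: changeAmount and user both derive from the same changeRequest, so Mono.zip subscribes to it twice. Depending on how that Mono is produced, you may want to call changeRequest.cache() first so the request payload is only materialized once.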

Entity Framework is too slow when mapping data of up to 100k rows

I have at least 100,000 rows in a Job_Details table and I'm using Entity Framework to map the data.
This is the code:
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    List<JobBO> lstJobs = new List<JobBO>();
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        var lstJob = dbContext.Job_Details.ToList();
        foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
        {
            JobBO job = MapBEJobforSearchObj(dbJob);
            lstJobs.Add(job);
        }
    }
    getJobResponse.Jobs = lstJobs;
    return getJobResponse;
}
I found that this line takes about 2-3 minutes to execute:
var lstJob = dbContext.Job_Details.ToList();
How can I solve this issue?
To outline the performance issues with your example (see inline comments):
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    List<JobBO> lstJobs = new List<JobBO>();
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        // Loads *ALL* entities into memory. This effectively takes all fields for all rows across from the database to your app server. (Even though you don't want it all)
        var lstJob = dbContext.Job_Details.ToList();
        // Filters from the data in memory.
        foreach (var dbJob in lstJob.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null))
        {
            // Maps the entity to a DTO and adds it to the return collection.
            JobBO job = MapBEJobforSearchObj(dbJob);
            lstJobs.Add(job);
        }
    }
    // Returns the DTOs.
    getJobResponse.Jobs = lstJobs;
    return getJobResponse;
}
First, pass your Where clause to EF so it is sent to the DB server, rather than loading all entities into memory:
public GetJobsResponse GetImportJobs()
{
    GetJobsResponse getJobResponse = new GetJobsResponse();
    using (NSEXIM_V2Entities dbContext = new NSEXIM_V2Entities())
    {
        // Passes the Where expression to the DB server to be executed. Note: no .ToList() yet, to leave this as IQueryable.
        var jobs = dbContext.Job_Details.Where(ie => ie.IMP_EXP == "I" && ie.Job_No != null);
Next, use Select to load your DTOs. Typically these won't contain as much data as the main entity, and as long as you're working with IQueryable you can load related data as needed. Again, this will be sent to the DB server, so you cannot use functions like MapBEJobforSearchObj here, because the DB server does not know that function. You can Select a simple DTO object, or an anonymous type to pass to a dynamic mapper.
        var dtos = jobs.Select(ie => new JobBO
        {
            JobId = ie.JobId,
            // ... populate remaining DTO fields here.
        }).ToList();
        getJobResponse.Jobs = dtos;
        return getJobResponse;
    }
}
Moving the .ToList() to the end will materialize the data into your JobBO DTOs/ViewModels, pulling just enough data from the server to populate the desired rows with the desired fields.
In cases where you may have a large amount of data, you should also consider supporting server-side pagination where you pass a page # and page size, then utilize a .Skip() + .Take() to load a single page of entries at a time.
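For example, with hypothetical pageIndex and pageSize parameters, the query becomes jobs.OrderBy(ie => ie.JobId).Skip(pageIndex * pageSize).Take(pageSize) before the Select. Note that EF requires an OrderBy before Skip, which also keeps the paging deterministic.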

Graph processing gets increasingly slower on Titan + DynamoDB (local) as more vertices/edges are added

I am working with Titan 1.0 using the AWS DynamoDB Local implementation as the storage backend, on a 16 GB machine. My use case involves periodically generating graphs containing vertices & edges in the order of 120K. Every time I generate a new graph in memory, I check the graph stored in the DB and either (i) add vertices/edges that do not exist, or (ii) update properties if they already exist (existence is determined by the 'Label' and a 'Value' attribute). Note that the 'Value' property is indexed. Transactions are committed in batches of 500 vertices.
Problem: I find that this process gets slower each time I process a new graph (the 1st graph finished in 45 minutes with an initially empty DB, the 2nd took 2.5 hours, the 3rd 3.5 hours, the 4th 6 hours, the 5th 10 hours and so on). In fact, when processing a given graph, it is fairly quick at the start but progressively gets slower (initial batches take 2-4 seconds, and later on it increases to hundreds of seconds for the same batch size of 500 nodes; sometimes a batch even takes 1000-2000 seconds). This is the processing time alone (see the approach below); a commit always takes between 8-10 seconds. I configured the JVM heap size to 10 GB, and I notice that when the app is running it eventually uses all of it.
Question: Is this behavior to be expected? It seems to me something is wrong here (either in my config / approach?). Any help or suggestions would be greatly appreciated.
Approach:
Starting from the root node of the in-memory graph, I retrieve all child nodes and maintain a queue
For each child node, I check to see if it exists in DB, else create new node, and update some properties
Vertex dbVertex = dbgraph.traversal().V()
        .has(currentVertexInMem.label(), "Value",
                (String) currentVertexInMem.value("Value"))
        .tryNext()
        .orElseGet(() -> createVertex(dbgraph, currentVertexInMem));
if (dbVertex != null) {
    // Update properties
    updateVertexProperties(dbgraph, currentVertexInMem, dbVertex);
}
// Add edge if necessary
if (parentDBVertex != null) {
    GraphTraversal<Vertex, Edge> edgeIt = graph.traversal().V(parentDBVertex).outE()
            .has("EdgeProperty1", eProperty1)  // eProperty1 is a String input parameter
            .has("EdgeProperty2", eProperty2); // eProperty2 is a Long input parameter
    boolean doCreateEdge = true;
    Edge e = null;
    // Scan the existing edges; only create a new edge if none points at dbVertex
    while (edgeIt.hasNext()) {
        e = edgeIt.next();
        if (e.inVertex().equals(dbVertex)) {
            doCreateEdge = false;
            break;
        }
    }
    if (doCreateEdge) {
        e = parentDBVertex.addEdge("EdgeLabel", dbVertex,
                "EdgeProperty1", eProperty1, "EdgeProperty2", eProperty2);
    }
    e = null;
    edgeIt = null;
}
...
if ((processedVertexCount.get() % 500 == 0)
        || processedVertexCount.get() == verticesToProcess.get()) {
    graph.tx().commit();
}
Create function:
public static Vertex createVertex(Graph graph, Vertex clientVertex) {
    Vertex newVertex = null;
    switch (clientVertex.label()) {
    case "Label 1":
        newVertex = graph.addVertex(T.label, clientVertex.label(), "Value",
                clientVertex.value("Value"),
                "Property1-1", clientVertex.value("Property1-1"),
                "Property1-2", clientVertex.value("Property1-2"));
        break;
    case "Label 2":
        newVertex = graph.addVertex(T.label, clientVertex.label(), "Value",
                clientVertex.value("Value"),
                "Property2-1", clientVertex.value("Property2-1"),
                "Property2-2", clientVertex.value("Property2-2"));
        break;
    default:
        newVertex = graph.addVertex(T.label, clientVertex.label(), "Value",
                clientVertex.value("Value"));
        break;
    }
    return newVertex;
}
Schema Def: (Showing some of the indexes)
Note:
"EdgeLabel" = Constants.EdgeLabels.Uses
"EdgeProperty1" = Constants.EdgePropertyKeys.EndpointId
"EdgeProperty2" = Constants.EdgePropertyKeys.Timestamp
public void createSchema() {
    // Create Schema
    TitanManagement mgmt = dbgraph.openManagement();
    mgmt.set("cache.db-cache", true);
    // Vertex Properties
    PropertyKey value = mgmt.getPropertyKey(Constants.VertexPropertyKeys.Value);
    if (value == null) {
        value = mgmt.makePropertyKey(Constants.VertexPropertyKeys.Value).dataType(String.class).make();
        mgmt.buildIndex(Constants.GraphIndexes.ByValue, Vertex.class).addKey(value).buildCompositeIndex(); // INDEX
    }
    PropertyKey shapeSet = mgmt.getPropertyKey(Constants.VertexPropertyKeys.ShapeSet);
    if (shapeSet == null) {
        shapeSet = mgmt.makePropertyKey(Constants.VertexPropertyKeys.ShapeSet).dataType(String.class).cardinality(Cardinality.SET).make();
        mgmt.buildIndex(Constants.GraphIndexes.ByShape, Vertex.class).addKey(shapeSet).buildCompositeIndex();
    }
    ...
    // Edge Labels and Properties
    EdgeLabel uses = mgmt.getEdgeLabel(Constants.EdgeLabels.Uses);
    if (uses == null) {
        uses = mgmt.makeEdgeLabel(Constants.EdgeLabels.Uses).multiplicity(Multiplicity.MULTI).make();
        PropertyKey timestampE = mgmt.getPropertyKey(Constants.EdgePropertyKeys.Timestamp);
        if (timestampE == null) {
            timestampE = mgmt.makePropertyKey(Constants.EdgePropertyKeys.Timestamp).dataType(Long.class).make();
        }
        PropertyKey endpointIDE = mgmt.getPropertyKey(Constants.EdgePropertyKeys.EndpointId);
        if (endpointIDE == null) {
            endpointIDE = mgmt.makePropertyKey(Constants.EdgePropertyKeys.EndpointId).dataType(String.class).make();
        }
        // Indexes
        mgmt.buildEdgeIndex(uses, Constants.EdgeIndexes.ByEndpointIDAndTimestamp, Direction.BOTH, endpointIDE,
                timestampE);
    }
    mgmt.commit();
}
The behavior you experience is expected. Today, DynamoDB Local is a testing tool built on SQLite. If you need to support high TPS for large and periodic data loads, I recommend you use the DynamoDB service.

Can I use non-volatile external variables in a Scala Enumeratee?

I need to group the output of my Enumerator into different ZipEntries based on a specific property (providerId). The original chartPreparations stream is ordered by providerId, so I can just keep a reference to the current provider and add a new entry when the provider changes.
Enumerator.outputStream(os => {
  val currentProvider = new AtomicReference[String]()
  // Step 1. Creating zipped output file
  val zipOs = new ZipOutputStream(os, Charset.forName("UTF8"))
  // Step 2. Processing chart preparation Enumerator
  val chartProcessingTask = (chartPreparations) run Iteratee.foreach(cp => {
    // Step 2.1. Write new entry if needed
    if (currentProvider.get() == null || cp.providerId != currentProvider.get()) {
      if (currentProvider.get() != null) {
        zipOs.write("</body></html>".getBytes(Charset.forName("UTF8")))
      }
      currentProvider.set(cp.providerId)
      zipOs.putNextEntry(new ZipEntry(cp.providerName + ".html"))
      zipOs.write(HTML_HEADER)
    }
    // Step 2.2. Write chart preparation in HTML format
    zipOs.write(toHTML(cp).getBytes(Charset.forName("UTF8")))
  })
  // Step 3. On complete, close the stream
  chartProcessingTask.onComplete(_ => zipOs.close())
})
Since the current provider reference changes during the output, I made it an AtomicReference so that I could handle references from different threads.
Could currentProvider just be a var Option[String], and why?

EF DbContext. How to avoid caching?

I've spent a lot of time on this, but still can't understand how to avoid caching in DbContext.
I attached below an entity model of an easy case to demonstrate what I mean.
The problem is that DbContext caches results. For example, I have the following code for querying data from my database:
using (TestContext ctx = new TestContext())
{
    var res = (from b in ctx.Buildings.Where(x => x.ID == 1)
               select new
               {
                   b,
                   flats = from f in b.Flats
                           select new
                           {
                               f,
                               people = from p in f.People
                                        where p.Archived == false
                                        select p
                           }
               }).AsEnumerable().Select(x => x.b).Single();
}
In this case, everything is fine: I get what I want (only persons with Archived == false).
But if I add another query after it, for example a query for buildings that have people with the Archived flag set to true, I see two things that I really can't understand:
my previous result, res, gets extended with extra data (Persons with Archived == true are added to it too)
the new result contains absolutely all Persons, no matter what Archived equals
Here is the code of this query:
using (TestContext ctx = new TestContext())
{
    var res = (from b in ctx.Buildings.Where(x => x.ID == 1)
               select new
               {
                   b,
                   flats = from f in b.Flats
                           select new
                           {
                               f,
                               people = from p in f.People
                                        where p.Archived == false
                                        select p
                           }
               }).AsEnumerable().Select(x => x.b).Single();
    var newResult = (from b in ctx.Buildings.Where(x => x.ID == 1)
                     select new
                     {
                         b,
                         flats = from f in b.Flats
                                 select new
                                 {
                                     f,
                                     people = from p in f.People
                                              where p.Archived == true
                                              select p
                                 }
                     }).AsEnumerable().Select(x => x.b).Single();
}
By the way, I set LazyLoadingEnabled to false in the constructor of TestContext.
Does anybody know how to work around this problem? How can my query return exactly what I wrote in my LINQ to Entities?
P.S. @Ladislav, maybe you can help?
You can use the AsNoTracking method on your query.
var res = (from b in ctx.Buildings.Where(x => x.ID == 1)
           select new
           {
               b,
               flats = from f in b.Flats
                       select new
                       {
                           f,
                           people = from p in f.People
                                    where p.Archived == false
                                    select p
                       }
           }).AsNoTracking().AsEnumerable().Select(x => x.b).Single();
I also want to note that your AsEnumerable is probably doing more harm than good. If you remove it, the Select(x => x.b) will be translated to SQL. As is, you are selecting everything, then throwing away everything but x.b in memory.
Have you tried something like:
ctx.Persons.Where(x => x.Flat.Building.Id == 1 && x.Archived == false);
===== EDIT =====
In this case I think your approach is, imho, really hazardous. Indeed, you are working on the data loaded by EF to interpret your query, rather than on the data resulting from the interpretation of your query. If one day EF changes its loading policy (for example with predictive pre-loading), your approach will send you into the wall.
For your goal, you will have to eager load the data you need to build your "filtered" entity: that is, select the building, then for each Flat select the non-archived persons, as sketched below.
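For example, using EF's standard explicit-loading API (entity and property names are from the asker's model): load the Building, call ctx.Entry(building).Collection(b => b.Flats).Load(), and then for each flat call ctx.Entry(flat).Collection(f => f.People).Query().Where(p => !p.Archived).Load().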
Another solution is to use two separate contexts in a "UnitOfWork"-like design.