Can't use Ignite loadCache method with IgniteBiPredicate - PostgreSQL

I'm using Apache Ignite as a cache backed by a PostgreSQL database too large to initialize all at once. Instead, I want to load selective pieces into my cache on an as-needed basis. I wanted to do this via loadCache (since readThroughEnabled mode only works for simple cache operations, which I can't guarantee we'll be using), which accepts an IgniteBiPredicate as an argument. However, whenever I try to use an IgniteBiPredicate as a filter, I get ClassCastExceptions that prevent me from loading the cache.
This is for a single node of Ignite local to my server. I have tried taking my cache and calling "loadCache(biPredicate, null)" on it to attempt to load the specific data I want. This is an example of my code with some domain specifics stripped out:
public void initFoo(int key) {
    IgniteCache<Integer, Foo> myCache = getCache(Foo.class);
    if (myCache != null) {
        myCache.loadCache(new IgniteBiPredicate<Integer, Foo>() {
            @Override
            public boolean apply(Integer integer, Foo foo) {
                return integer == key;
            }
        }, null);
    }
}
I would expect this to result in my cache being loaded, at least the part that I care about for this particular entry. Instead I see
"Failed to load cache: Foo"
...(lots of nested exceptions eventually leading to)
...
"Caused by: jvaa.lang.ClassCastException:
org.apache.ignite.internal.binary.BinaryObjectImpl cannot be cast to
com.sms.Foo"
I am confident this is not a problem in my cache configuration (at least at the level of the Java object) as when I did a simple loadCache with no arguments, the entire cache loaded fine and was usable. Am I doing something wrong?

Basically, you are getting BinaryObjects instead of your POJOs.
You can choose to work with BinaryObjects, or there should be a setting to reverse this behavior. Try cacheCfg.setStoreKeepBinary(false) when creating the cache, for example.
UPD: I have checked, and it seems that storeKeepBinary isn't used here. Anyway, you can do Foo foo = binaryObject.deserialize(); if you like. I'm still not sure whether this is configurable or whether this closure just gets whatever is thrown at it.
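For illustration, here is a minimal sketch of that deserialize() approach, adapted from the question's initFoo. It assumes the filter really does receive binary values, as the ClassCastException suggests; withKeepBinary() is standard Ignite API:
public void initFoo(final int key) {
    // View the cache in binary form so the filter's types line up.
    IgniteCache<Integer, BinaryObject> binCache =
            getCache(Foo.class).<Integer, BinaryObject>withKeepBinary();
    binCache.loadCache(new IgniteBiPredicate<Integer, BinaryObject>() {
        @Override
        public boolean apply(Integer integer, BinaryObject bin) {
            // Deserialize only if you need the POJO's fields; a key-only
            // check like this one doesn't strictly require it.
            Foo foo = bin.deserialize();
            return integer == key;
        }
    }, null);
}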

Limit instance count with Autofac?

I have a console app that will create an instance of a class and execute a method on it, and that's really all it does (but this method may do a lot of things). The class is determined at runtime based on command line args, and this is registered to Autofac so it can be correctly resolved, supplying class-specific constructor parameters extracted from the command line. All this works.
Now, I need to impose a system-wide limit on the number of instances per class that can be running at any one time. I will probably use a simple SQL database to keep track of the number of allowed and running instances per class, and I have no problem with the SQL side of things.
But how do I actually impose this limit in a nice manner using Autofac?
I am thinking that I would have some "slot service" that would do something like this:
Try to reserve a new instance "slot".
If no more slots, log a message and terminate the process.
If slot successfully reserved, create instance and return it.
My idea is also to free the instance's slot in the class' Dispose method, preferably by using another method on the slot service.
How would I fit this into Autofac?
One possibility would be to register the class I want to instantiate with a lambda/delegate that does the above steps. But in that case, how do I "terminate"? Throw an exception? That would require some code to catch the exception and either log it or simply ignore it before terminating the process. I don't like it. I'd like the entire slot reservation stuff inside the delegate, lambda or service.
Another solution might be to do the slot reservation outside of Autofac, but that also seems somewhat messy.
I would prefer a solution where the "slot service" itself can be nicely unit tested, i.e. non-static and with an interface, and preferably resolved with Autofac.
I'm sure I'm missing something obvious here... Any suggestions?
This is my "best bet" so far:
static void Main(string[] args)
{
    ReadCommandLine(args, out Type itemClass, out Type paramsClass, out Type paramsInterface, out object parameters);
    BuildContainer(itemClass, paramsClass, paramsInterface, parameters);

    IInstanceHandler ih = Container.Resolve<IInstanceHandler>();
    if (ih.RegisterInstance(itemClass, out long instanceid))
    {
        try
        {
            Container.Resolve<IItem>().Execute();
        }
        finally
        {
            ih.UnregisterInstance(itemClass, instanceid);
        }
    }
}

GWT / JSNI - "DataCloneError - An object could not be cloned" - how do I debug?

I am attempting to call out to parallel.js via JSNI. Parallel.js provides a nice API around web workers, and I wrote some lightweight wrapper code which provides a more convenient interface to workers from GWT than Elemental. However, I'm getting an error which has me stumped:
com.google.gwt.core.client.JavaScriptException: (DataCloneError) @io.mywrapper.workers.Parallel::runParallel([Ljava/lang/String;Lcom/google/gwt/core/client/JavaScriptObject;Lcom/google/gwt/core/client/JavaScriptObject;)([Java object: [Ljava.lang.String;@1922352522, JavaScript object(3006), JavaScript object(3008)]): An object could not be cloned.
This comes from, in hosted mode:
at com.google.gwt.dev.shell.BrowserChannelServer.invokeJavascript(BrowserChannelServer.java:249)
at com.google.gwt.dev.shell.ModuleSpaceOOPHM.doInvoke(ModuleSpaceOOPHM.java:136)
at com.google.gwt.dev.shell.ModuleSpace.invokeNative(ModuleSpace.java:571)
at com.google.gwt.dev.shell.ModuleSpace.invokeNativeVoid(ModuleSpace.java:299)
at com.google.gwt.dev.shell.JavaScriptHost.invokeNativeVoid(JavaScriptHost.java:107)
at io.mywrapper.workers.Parallel.runParallel(Parallel.java)
Here's my code:
Example client call to create a worker:
Workers.spawnWorker(new String[]{"hello"}, new Worker() {
    @Override
    public String[] work(String[] data) {
        return data;
    }

    @Override
    public void done(String[] data) {
        int i = data.length;
    }
});
The API that provides a general interface:
public class Workers {
    public static void spawnWorker(String[] data, Worker worker) {
        Parallel.runParallel(data, workFunction(worker), callbackFunction(worker));
    }

    /**
     * Create a reference to the work function.
     */
    public static native JavaScriptObject workFunction(Worker worker) /*-{
        return worker == null ? null : $entry(function(x) {
            worker.@io.mywrapper.workers.Worker::work([Ljava/lang/String;)(x);
        });
    }-*/;

    /**
     * Create a reference to the done function.
     */
    public static native JavaScriptObject callbackFunction(Worker worker) /*-{
        return worker == null ? null : $entry(function(x) {
            worker.@io.mywrapper.workers.Worker::done([Ljava/lang/String;)(x);
        });
    }-*/;
}
Worker:
public interface Worker extends Serializable {
    /**
     * Called to perform the work.
     * @param data
     * @return
     */
    public String[] work(String[] data);

    /**
     * Called with the result of the work.
     * @param data
     */
    public void done(String[] data);
}
And finally the Parallels wrapper:
public class Parallel {
    /**
     * @param data Data to be passed to the function
     * @param work Function to perform the work, given the data
     * @param callback Function to be called with the result
     */
    public static native void runParallel(String[] data, JavaScriptObject work, JavaScriptObject callback) /*-{
        var p = new $wnd.Parallel(data);
        p.spawn(work).then(callback);
    }-*/;
}
What's causing this?
The JSNI docs say, regarding arrays:
opaque value that can only be passed back into Java code
This is quite terse, but ultimately my arrays are passed back into Java code, so I assume these are OK.
EDIT - ok, bad assumption. The arrays, despite only ostensibly being passed back to Java code, are causing the error (which is strange, because there's very little googleability on DataCloneError). Changing them to String works; however, String isn't sufficient for my needs here. It looks like objects face the same kinds of issues as arrays do; I saw Thomas's reference to JSArrayUtils in another StackOverflow thread, but I can't figure out how to call it with an array of strings (it wants an array of JavaScriptObjects as input for non-primitive types, which does me no good). Is there a neat way out of this?
EDIT 2 - Changed to use JsArrayString wherever I was using String[]. New issue; no stacktrace this time, but in the console I get the error: Uncaught ReferenceError: __gwt_makeJavaInvoke is not defined. When I click on the URL to the generated script in developer tools, I get this snippet:
self.onmessage = function(e) {
    self.postMessage((function() {
        try {
            return __gwt_makeJavaInvoke(3)(null, 65626, jsFunction, this, arguments);
        } catch (e) {
            throw e;
        }
    })(e.data));
}
I see that __gwt_makeJavaInvoke is part of the JSNI class; so why would it not be found?
You can find a working example of GWT and WebWorkers here: https://github.com/tomekziel/gwtwwlinker/
This is preliminary work, but using this pattern I was able to pass GWT objects to and from a webworker using serialization provided by AutoBeanFactory.
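For reference, here is a rough sketch of that round trip (FooBean and Beans are hypothetical names, not taken from the linked repo; AutoBeanFactory, AutoBeanCodex and AutoBeanUtils are standard GWT API). Only a plain JSON string crosses the worker boundary, which sidesteps the cloning problem entirely:
// Hypothetical bean and factory, shown only to illustrate the round trip.
interface FooBean {
    String getName();
    void setName(String name);
}

interface Beans extends AutoBeanFactory {
    AutoBean<FooBean> fooBean();
}

static final Beans FACTORY = GWT.create(Beans.class);

// Encode to a JSON string, which is safe to postMessage to a worker...
static String encode(FooBean foo) {
    return AutoBeanCodex.encode(AutoBeanUtils.getAutoBean(foo)).getPayload();
}

// ...and decode it back into a bean on the other side.
static FooBean decode(String payload) {
    return AutoBeanCodex.decode(FACTORY, FooBean.class, payload).as();
}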
If you never use dev mode, it is currently safe to pretend that a Java String[] is a JS array with strings in it. This will break in dev mode, since arrays have to be usable in Java and Strings are treated specially, and it may break in the future if the compiler optimizes arrays differently.
Cases where this could go wrong in the future:
The semantics of Java arrays and JavaScript arrays are different - Java arrays cannot be resized, and are initialized with specific values based on the component type (the data in the array). Since you are writing Java code, the compiler could conceivably make assumptions, based on details of how you create and use that array, that could be broken by JS code that doesn't know never to modify the array.
Some arrays of primitive types could be optimized into TypedArrays in JavaScript, more closely following Java semantics in terms of resizing and Java behavior in terms of allocation. This would be a performance boost as well, but could break any use of int[], double[], etc.
Instead, you should copy your data into a JsArrayString, or just use the JS array to hold the data rather than going back and forth, depending on your use case. The various JsArray types can be resized and already exist as JavaScript objects that outside JS can understand and work with.
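A minimal sketch of that copy, using only standard GWT API (the reverse direction is included for the callback path):
// Copy a Java String[] into a real JS array that can cross JSNI safely.
public static JsArrayString toJsArray(String[] data) {
    JsArrayString out = JavaScriptObject.createArray().cast();
    for (String s : data) {
        out.push(s);
    }
    return out;
}

// And back again, e.g. when the done() callback fires.
public static String[] fromJsArray(JsArrayString data) {
    String[] out = new String[data.length()];
    for (int i = 0; i < data.length(); i++) {
        out[i] = data.get(i);
    }
    return out;
}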
Reply to EDIT 2:
At a guess, the parallel.js script is trying to run your code from another scope, such as in the webworker (that's the point of the code, right?), where your GWT code isn't present. As such, it can't call __gwt_makeJavaInvoke, which is the bridge back into dev mode (it would be a different failure with compiled JS). According to http://adambom.github.io/parallel.js/ there are specific requirements that a passed callback must meet to be passed in to spawn and perhaps then - your anonymous functions definitely do not meet them, and it may not be possible to maintain Java semantics.
Before I get much deeper, check out this answer I did a while ago addressing the basic issues with webworkers and gwt/java: https://stackoverflow.com/a/11376059/860630
As noted there, WebWorkers are effectively new processes, with no shared code or shared state with the original process. The Parallel.js code attempts to paper over this with a little bit of trickery - shared state is only available in the form of the contents passed in to the original Parallel constructor, but you are attempting to pass in instances of 'java' objects and calling methods on them. Those Java instances come with their own state, and potentially can link back to the rest of the Java app through fields in the Worker instance. If you were implementing Worker and doing something that referenced data other than what was passed in, you would see further bizarre failures.
So the functions you pass in must be completely standalone - they must not refer to external code in any way, because otherwise the function can't be passed off to the webworker, or to several webworkers, each unaware of each other's existence. See https://github.com/adambom/parallel.js/issues/32 for example:
That's not possible since it would
require a shared state across workers
require us to transmit all scope variables (I don't think there's even a possibility to read the available scopes)
The only thing which might be possible would be cache variables, but these can already be defined in the function itself with spawn() and don't make any sense in map (because there's no shared state).
Without being actually familiar with how parallel.js is implemented (all of this answer so far is reading the docs and a quick google search for "parallel.js shared state", plus having experimented with WebWorkers for a day or so and deciding that my present problem wasn't yet worth the bother), I would guess that then is unrestricted and you can pass it whatever you like, but spawn, map, and reduce must be written in such a way that their JS can be handed off to the new JS process and stand completely alone there.
This may be possible from your normal Java code when compiled, provided you have just one implementation of Worker and that impl never uses state other than what is directly passed in. In that case the compiler should rewrite your methods to be static so that they are safe to use in this context. However, that doesn't make for a very useful library, which is what it seems you are trying to achieve. With that in mind, you could keep your worker code in JSNI to ensure that you follow the parallel.js rules, as sketched below.
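For example, a hedged sketch of a worker kept entirely in JSNI, so the function handed to parallel.js references nothing outside itself (runSquares and its body are illustrative, not from the question):
// The work function lives wholly inside JSNI and touches only its argument,
// so parallel.js can serialize it into the worker unchanged.
public static native void runSquares(JsArrayInteger data, JavaScriptObject callback) /*-{
    var work = function(n) {
        return n * n; // no $wnd, no Java references, no $entry
    };
    var p = new $wnd.Parallel(data);
    p.map(work).then(callback);
}-*/;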
Finally, and against the normal GWT rules, avoid $entry for calls you expect to happen in other contexts, since those workers have no access to the normal exception handling and scheduling that $entry enables.
(and finally finally, this is probably still possible if you are very careful at writing Worker implementations and write a Generator that invokes each worker implementation in very specific ways to make sure that com.google.gwt.dev.jjs.impl.MakeCallsStatic and com.google.gwt.dev.jjs.impl.Pruner can correctly act to knock out the this in those instance methods once they've been rewritten as JS functions. I think the cleanest way to do this is to emit the JSNI in the generator itself, call a static method written in real Java, and from that static method call the specific instance method that does the heavy lifting for spawn, etc.)

Post-Initialization in extbase domain model

In one of my extbase models, I want to initialize some properties derived from the properties that are saved in the database. The computation of these virtual properties is time-consuming, so I'd like to cache them. Thus my program flow should look something like this:
Load the domain object as usual from the database
Check if the virtual property is available in cache. If so, fetch it from there, otherwise compute and cache it.
I first thought the method "initializeObject" was what I needed, but it is not: it is called before any property is initialized from the database. So I came up with two other ways:
I can call an initialization-method manually from the repository after fetching the object, but that seems weird and would break if someone adds another find* method to the general repository.
Another idea is to add a boolean "virtualPropertiesInitialized" to the model, query it whenever one of the virtual properties is accessed, and initialize the virtual properties if needed. That also seems weird, but it would not break if someone adds another find* method to the generic repository.
My question is:
Is there a default/best-practice how to do what I want to do?
If reading the final value from disk or the database is cheaper than recomputing it, store the value using the TYPO3 caching framework (or with your own caching method on a static class) and restore it in the getter of the virtual property. Doing it in the getter method, e.g. public mixed getYourProperty(), has the advantage that the value is only fetched from the cache when you actually call it.
On the second call, just return the value you stored previously:
private $myValue = NULL;

public function getMyValue() {
    // Use !== so that falsy cached values (0, '', FALSE) are not recomputed.
    if ($this->myValue !== NULL) {
        return $this->myValue;
    }
    // The expensive computation (or cache lookup) runs only once.
    $this->myValue = "test";
    return $this->myValue;
}

Are singletons automatically persisted between requests in ASP.NET MVC?

I have a lookup table (LUT) of thousands of integers that I use on a fair number of requests to compute things based on what was fetched from the database.
If I simply create a standard singleton to hold the LUT, is it automatically persisted between requests or do I specifically need to push it to the Application state?
If they are automatically persisted, then what is the difference compared to storing them in the Application state?
What would a correct singleton implementation look like? It doesn't need to be lazily initialized, but it does need to be thread-safe (thousands of theoretical users per server instance) and have good performance.
EDIT: Jon Skeet's 4th version looks promising http://csharpindepth.com/Articles/General/Singleton.aspx
public sealed class Singleton
{
    static readonly Singleton instance = new Singleton();

    // Explicit static constructor to tell C# compiler
    // not to mark type as beforefieldinit
    static Singleton()
    {
    }

    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            return instance;
        }
    }

    // randomguy's specific stuff. Does this look good to you?
    private int[] lut = new int[5000];

    public int Compute(Product p) {
        return lut[p.Goo];
    }
}
Yes, static members persist (not the same thing as persisted - it's not "saved", it never goes away), which includes implementations of a singleton. You get a degree of lazy initialisation for free: if it's created in a static assignment or static constructor, it won't be created until the relevant class is first used. That creation locks by default, but all other uses would have to be threadsafe, as you say. Given the degree of concurrency involved, unless the singleton is going to be immutable (your look-up table doesn't change for the application lifetime), you would have to be very careful as to how you update it. One way is a fake singleton - on update you create a new object and then lock around assigning it to replace the current value; not strictly a singleton, though it looks like one "from the outside".
The big danger is that anything introducing global state is suspect, and especially when dealing with a stateless protocol like the web. It can be used well though, especially as an in-memory cache of permanent or near-permanent data, particularly if it involves an object graph that cannot be easily obtained quickly from a database.
The pitfalls are considerable though, so be careful. In particular, the risk of locking issues cannot be overstated.
Edit, to match the edit in the question:
My big concern would be how the array gets initialised. Clearly this example is incomplete, as it'll only ever have 0 for each item. If it gets set at initialisation and is then read-only, fine. If it's mutable, then be very, very careful about your threading.
Also be aware of the negative effect of too many such look-ups on scaling. While you save on most requests by having pre-calculated values, the effect is a period of very heavy work whenever the singleton is updated. A long-ish start-up will likely be tolerable (it won't happen very often), but arbitrary slow-downs afterwards can be tricky to trace to their source.
I wouldn't rely on a static being persisted between requests. [There is always the, albeit unlikely, chance that the process would be reset between requests.] I'd recommend HttpContext's Cache object for persisting shared resources between requests.
Edit: See Jon's comments about read-only locking.
It's been a while since I've dealt with singletons (I prefer letting my IoC container deal with lifetimes), but here's how you can handle the thread-safety issues. You'll need to lock around anything that mutates the state of the singleton. Read-only operations, like your Compute, won't need locking.
// I typically create one lock per collection, but you really need one per set of
// atomic operations; if you ever modify two collections together, use one lock.
private object lutLock = new object();
private int[] lut = new int[5000];

public int Compute(Product p) {
    return lut[p.Goo];
}

public void SetValue(int index, int value)
{
    // Lock as little code as possible; this check is read-only, so we don't lock it.
    // (Note the >= : index == lut.Length is also out of range.)
    if (index < 0 || index >= lut.Length)
    {
        throw new ArgumentException("Index not in range", "index");
    }
    // Going to mutate state, so we need a lock now.
    lock (lutLock)
    {
        lut[index] = value;
    }
}

Class Design: Demeter vs. Connection Lifetimes

Okay, so here's a problem I'm running into.
I have some classes in my application that have methods that require a database connection. I am torn between two different ways to design the classes, both of which are centered around dependency injection:
Provide a property for the connection that is set by the caller prior to method invocation. This has a few drawbacks.
Every method relying on the connection property has to validate that property to ensure that it isn't null, that it's open, and that it's not involved in a transaction if that would muck up the operation.
If the connection property is unexpectedly closed, all the methods have to either (1.) throw an exception or (2.) coerce it open. Depending on the level of robustness you want, either case is appropriate. (Note that this is different from a connection that is passed to a method in that the reference to the connection exists for the lifetime of the object, not simply for the lifetime of the method invocation. Consequently, the volatility of the connection just seems higher to me.)
Providing a Connection property seems (to me, anyway) to scream out for a corresponding Transaction property. This creates additional overhead in the documentation, since you'd have to make it fairly obvious when the transaction was being used, and when it wasn't.
On the other hand, Microsoft seems to favor the whole set-and-invoke paradigm.
Require the connection to be passed as an argument to the method. This has a few advantages and disadvantages:
The parameter list is naturally larger. This is irksome to me, primarily at the point of call.
While a connection (and a transaction) must still be validated prior to use, the reference to it exists only for the duration of the method call.
The point of call is, however, quite clear. It's very obvious that you must provide the connection, and that the method won't be creating one behind your back automagically.
If a method doesn't require a transaction (say a method that only retrieves data from the database), no transaction is required. There's no lack of clarity due to the method signature.
If a method requires a transaction, it's very clear due to the method signature. Again, there's no lack of clarity.
Because the class does not expose a Connection or a Transaction property, there's no chance of callers trying to drill down through them to their properties and methods, thus enforcing the Law of Demeter.
I know, it's a lot. But on the one hand, there's the Microsoft Way: Provide properties, let the caller set the properties, and then invoke methods. That way, you don't have to create complex constructors or factory methods and the like. Also, avoid methods with lots of arguments.
Then, there's the simple fact that if I expose these two properties on my objects, they'll tend to encourage consumers to use them in nefarious ways. (Not that I'm responsible for that, but still.) But I just don't really want to write crappy code.
If you were in my shoes, what would you do?
Here is a third pattern to consider:
Create a class called ConnectionScope, which provides access to a connection
Any class at any time, can create a ConnectionScope
ConnectionScope has a property called Connection, which always returns a valid connection
Any (and every) ConnectionScope gives access to the same underlying connection object (within some scope, maybe within the same thread, or process)
You then are free to implement that Connection property however you want, and your classes don't have a property that needs to be set, nor is the connection a parameter, nor do they need to worry about opening or closing connections.
More details:
In C#, I'd recommend ConnectionScope implement IDisposable, that way your classes can write code like "using ( var scope = new ConnectionScope() )" and then ConnectionScope can free the connection (if appropriate) when it is destroyed
If you can limit yourself to one connection per thread (or process) then you can easily set the connection string in a [thread] static variable in ConnectionScope
You can then use reference counting to ensure that your single connection is re-used when it's already open and that connections are released when no one is using them
Updated: Here is some simplified sample code:
public class ConnectionScope : IDisposable
{
    private static Connection m_Connection;
    private static int m_ReferenceCount;

    public Connection Connection
    {
        get
        {
            return m_Connection;
        }
    }

    public ConnectionScope()
    {
        // Open the shared connection on first use.
        if ( m_Connection == null )
        {
            m_Connection = OpenConnection();
        }
        m_ReferenceCount++;
    }

    public void Dispose()
    {
        // Release the shared connection when the last scope is disposed.
        m_ReferenceCount--;
        if ( m_ReferenceCount == 0 )
        {
            m_Connection.Dispose();
            m_Connection = null;
        }
    }
}
Example code of how one (any) of your classes would use it:
using ( var scope = new ConnectionScope() )
{
    scope.Connection.ExecuteCommand( ... );
}
I would prefer the latter method. It sounds like your classes use the database connection as a conduit to the persistence layer. Making the caller pass in the database connection makes it clear that this is the case. If the connection/transaction were represented as a property of the object, then things are not so clear and all of the ownership and lifetime issues come out. Better to avoid them from the start.