In ATG, what is default caching for order repository? - atg

In ATG, what is the default cache mode for the order repository?
As per the documentation it is simple (if my understanding is correct).
Is it better to keep the order repository as simple, given that an order object is modified a lot, or is it better to make it distributed?

It is generally better to keep the cache mode simple. Distributed mode adds a lot of communication between instances, and that overhead buys little value.
It can also be set to locked cache mode so that the order will be locked using the lock server mechanism in ATG. This will prevent different server instances (or even the same instance) from making changes to the order simultaneously. You can find more information about this here - http://docs.oracle.com/cd/E41069_01/Platform.11-0/ATGRepositoryGuide/html/s1005lockedcaching01.html

Related

Best way to keep in sync data in two different applications

I have two closed-source applications that must share the same data at some point. Both use REST APIs.
A concrete example is helpdesk tickets: they can be created in both applications, and I need to update one application when the user adds or closes a ticket in the other, and vice versa.
Since they are closed-source, I can't modify the code.
I was thinking of creating a third application that, every 5 minutes or so, lists both applications' tickets, compares them with the result of the previous call, and, if the data differs, updates the other application too.
Is there a better way of doing this?
With closed-source applications it's nearly impossible to get something out of them, unless they have some plugin-based setup that you can hook into.
The most efficient way in terms of costs would be to have the first application publish a message on a queue, or call a web-hook that you set, whenever the event is triggered. But as I mentioned, the application needs to support that.
So yeah, your solution is pretty much all you can do for now, but keep in mind the challenges that you may encounter over time:
What if the results of both APIs are too large to be compared directly? Maybe you need to think about paging the results.
What if your app crashes and you lose the previous state? You need to back it up somehow in an external store.
How often should you poll the API to be sure you get the updates you need, while keeping good performance for the existing traffic?
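The polling approach and the lost-state concern above can be sketched roughly as follows. This is a minimal illustration, not either application's API: it assumes each poll has already been turned into a dict mapping ticket id to ticket fields, and the function names and the JSON state file are hypothetical.

```python
import json
import os
import tempfile


def diff_tickets(previous, current):
    """Compare two snapshots (dicts of ticket id -> fields).

    Returns three sorted lists of ids: created, changed, removed.
    """
    created = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return created, changed, removed


def load_snapshot(path):
    """Load the last persisted snapshot, or an empty one on first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_snapshot(path, snapshot):
    """Persist the snapshot so a crash does not lose the previous state.

    Writes to a temporary file first, then swaps it into place.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh)
    os.replace(tmp_path, path)
```

On each poll, the third application would load the previous snapshot, fetch the current tickets, push the created/changed/removed ids to the other application, and only then save the new snapshot.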

Preventing deletion of noncurrent objects

I'm storing backups in Cloud Storage. A desirable property of such a backup is to ensure the device being backed up cannot erase the backups, to protect against ransomware or similar threats. At the same time, it is desirable to allow the backup client to delete so old files can be pruned. (Because the backups are encrypted, it isn't possible to use lifecycle management to do this.)
The solution that immediately comes to mind is to enable object versioning and use lifecycle rules to retain object versions (deleted files) for a certain amount of time. However, I cannot see a way to allow the backup client to delete the current version, but not historical versions. I thought it might be possible to do this with an IAM condition, but the conditional logic doesn't seem flexible enough to parse out the object version. Is there another way I've missed?
The only other solution that comes to mind is to create a second bucket, inaccessible to the backup client, and use a Cloud Function to replicate the first bucket. The downside of that approach is the duplicate storage cost.
To answer this:
"However, I cannot see a way to allow the backup client to delete the current version, but not historical versions."
When you delete a live object, object versioning will retain a noncurrent version of it. When deleting the noncurrent object version, you will have to specify the object name along with its generation number.
Just to add, you may want to consider using a transfer job to replicate your data on a separate bucket.
Either way, both approaches (object versioning or replicating buckets) will incur additional storage costs.

Multiple Siddhi apps or one big one

We are building an application on top of Siddhi (using the Java library) that allows users to dynamically add rules and have all incoming information going forward be run against those rules. My question is whether it's better to have one large app with many queries, streams, windows, and partitions, or to break each query out into its own application.
We have been including everything in one single Siddhi app (SiddhiAppRuntime), but this is starting to become large and I fear things may start interacting with each other in unintended ways. We are also snapshotting the SiddhiAppRuntime and restoring state whenever our application gets restarted. This could likely lead to massive restores if we have hundreds of pattern queries to re-run.
I am considering making a separate SiddhiAppRuntime from a single SiddhiManager for each query. The benefits (as I see them) would be a reduced risk of unintentional interactions, each query being able to function on its own, and a much simpler restore after a shutdown, since each restore only needs to handle a single query. A potential downside is the increased overhead of having potentially hundreds of SiddhiAppRuntimes.
What is considered best practice for our scenario? What will offer better performance, both for running data through the rules and for restoring the rules when we have to restart?
(If this is too broad or any clarification is needed I will do my best to update this question accordingly)
From the description you have given, I assume that the rules users add do not interact with each other, meaning rules added by user1 will not interact with rules added by user2.
In that case it is recommended to use a different Siddhi app (SiddhiAppRuntime) for each user. This won't add much performance overhead, as the apps won't be interacting with each other. It will also improve the snapshotting process, as separate snapshots will be taken for each app.
Also this will make sure you will have clear separation between each collection of rules and will be easily manageable.

Are "best practices" regarding connection handle re-use and database user design mutually exclusive?

SO says this may be subjective. I'm hoping not--I just can't seem to understand how this works in practice, and it seems like a specific enough technical question with I hope a definitive answer.
Context: LAPP stack.
1. I've read that using a single database user as the login for all connections to the database, and handling security yourself from there, is a bad idea. Databases have sufficient security models and it makes sense to use them.
2. Database handles have some resource cost associated with them, hence the existence of Apache::DBI, DBIx::Connector, and DBI::connect_cached(), to re-use a recent connection to a database. Making use of them should make a web app faster by avoiding the cost of connecting to a database.
The reason these seem to be mutually exclusive best practices is that, in my understanding, #1 implies that any database connection will be made with separate per-user credentials, which implies (as Apache::DBI documents) that re-using such connections will likely quickly cause your database backend to run out of connections.
The default maximum number of connections for PostgreSQL is 100.
The default number of servers multiplied by the number of subprocesses allowed for each, for Apache 2 running with the prefork MPM, far exceeds that, so it seems Apache::DBI's docs are right.
Thus the question: What do people do then, in practice?
Does this mean people using a LAPP stack generally connect using a single database user, and implement their own security/permissions model? Or does it mean they don't pool connections? Or do they choose between these two strategies based on speed vs security needs if they go with a LAPP stack, and if they need both, go with a desktop app or some other connection model?
Or if these are not, in fact, mutually exclusive strategies, what am I missing in my understanding here?
I've read that using a single database user as the login for all connections to the database, and handling security yourself from there, is a bad idea. Databases have sufficient security models and it makes sense to use them.
You probably misread this, or read it in a highly biased location. A more balanced view is (hopefully) this:
Managing perms (ACL or RBAC or other) within the database is a bloody mess and hard to get right. It can cripple performance, too, if done improperly (think: "select * from table join perms where convoluted_permission_scenario".) Depending on who you ask, you'll get more or less extreme viewpoints, e.g. here's (the very controversial) Zed Shaw: http://vimeo.com/2723800.
Managing perms at the DB level is just as much of a bloody mess. Not all engines implement row-level permissions, and even then there occasionally are leaks. For instance, calling a function in a where clause could (can?) leak rows in Postgres (until a recent version?) if raise gets called. And frankly, if you go past a superficial analysis of what is going on, it basically amounts to the former — just standardized and (usually) in C.
Managing perms at the app level without a database is also a bloody mess. It'll cripple performance no matter what you do from the moment where you need to join outside of SQL, unless you're dealing with trivial amounts of data. If you try it, you'll do fine… until your database grows too large and you basically don't.
So, in short: it's a bloody mess no matter where you manage it. Because permissions are a mess. In addition to the casual and idealistic "Joe needs write access to this set of nodes", you also need to cope with more down to earth scenarios such as "John is going off on vacation for Christmas and needs to temporarily delegate his write permissions on this set of nodes to his assistant Jane". Moreover, whichever scenario you do pick, you need to manage read access (which is usually the most frequent) in such a way that it's fast so you can scale. There's no silver bullet.
Moreover, even in the first and last of the above scenarios, it's ideal to have three DB users. One for reads, one for read/writes, and one for schema changes. Most apps don't, because it's yet another bloody mess to configure your ORM that way, hence the typical one DB user per app.
Anyway, getting back to your question: what people do in practice is use one or two database users (read vs read/write/modify), implement RBAC or ACL within the database itself, and avoid access restriction logic like the plague on public-facing pages for performance reasons.

How to link MemCached servers together?

I'm looking into using MemCached for a web application I am developing and after researching MemCached over the past few days, I have come across a question I could not find the answer to.
How do you link MemCached servers together, or how do you replicate data between MemCached servers?
Additionally: Is this functionality controlled by the servers or the clients and how?
When you configure several servers, the client libraries hash each key to pick the server that stores that key/data pair. That means there's no replication, and also that every client has to use the same set of servers.
Pros:
Almost zero overhead; storage and bandwidth grow linearly.
Server code is kept simple and reliable.
Cons:
Any change in the set of servers (one goes down, or you add a new one) suddenly invalidates (almost) the whole cache.
You have to be sure to use the same algorithm on every client.
If you have control over the client's code, you can simply store each key/data pair twice, on two servers. Just be sure to look in the same places when reading from a different client.
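A rough sketch of the client-side selection described in this answer, under stated assumptions: SHA-256 with a plain modulo stands in for whatever hash a real client library uses, "next server in the list" is one arbitrary way to choose the second copy, and the server names and helper names are hypothetical.

```python
import hashlib


def server_index(key, server_count):
    """Map a key to a server index with a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % server_count


def pick_server(key, servers):
    """Every client using the same list and hash picks the same server."""
    return servers[server_index(key, len(servers))]


def pick_replicas(key, servers):
    """Return two distinct servers for the 'store each pair twice' idea."""
    if len(servers) < 2:
        raise ValueError("need at least two servers to store a pair twice")
    first = server_index(key, len(servers))
    second = (first + 1) % len(servers)
    return servers[first], servers[second]


def moved_fraction(keys, old_servers, new_servers):
    """Fraction of keys that land on a different server after a list change."""
    moved = sum(
        1
        for key in keys
        if pick_server(key, old_servers) != pick_server(key, new_servers)
    )
    return moved / len(keys)
```

With this modulo scheme, going from three servers to four remaps about three quarters of the keys in expectation, which is the near-total invalidation listed under the cons; `moved_fraction` lets you measure it for your own key set.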
I've used BeITMemcached and in that you create an instance of MemcacheClient and set the servers you want to use, just as strings.
At that point the client itself determines which of the available servers to put each item into. You never know which server an item will be on.
Check here to see how the servers handle failover.
The easiest thing is to have a repopulate mechanism. In my case, I store several hundred objects in memcache which come out of a database. I can just call repopulate and put them all back in there. Whenever I add, update or delete them in the database, I make the same calls to memcache.
http://repcached.lab.klab.org/
Also, the PHP PECL memcache client can replicate data to multiple servers, see memcache.redundancy.
It sounds like you wish to have caches that can cope with machines rebooting, etc. If so:
In a lot of cases (assuming you are not writing Facebook) an RDBMS is fast enough for caching. Just create a table that has a key column and a blob column. If the RDBMS server has enough RAM, all the data will be in RAM and only saved to disk to allow recovery.
Remember this could be a separate server(s) from your main database server.
If you wish to get fancier and are using a high-end RDBMS, you may be able to set up change notifications on the queries that are used to build the "cached data", so that out-of-date rows are deleted from the cache.
Sometimes you can set up triggers to clear invalid rows from the cache; however, this can become very complex very quickly.
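The key/blob cache table suggested in this answer could look roughly like the sketch below. It uses Python's built-in sqlite3 purely as a stand-in for whatever RDBMS you run; the table name, column names, and helper functions are assumptions for illustration.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open the cache database and create the key/blob table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, value BLOB NOT NULL)"
    )
    return conn


def cache_set(conn, key, value):
    """Insert or overwrite the cached bytes for a key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, value),
        )


def cache_get(conn, key):
    """Return the cached bytes, or None on a cache miss."""
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


def cache_delete(conn, key):
    """Drop a row, e.g. when the source data it was built from changes."""
    with conn:
        conn.execute("DELETE FROM cache WHERE key = ?", (key,))
```

Pointing `open_cache` at a file path instead of `:memory:` keeps the rows across restarts, which is the property this answer is after.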
Memcached does not provide replication. Instead, you add the server to the memcached client's server list and then hit the DB for the data to be stored on that particular server.
You should seriously consider CouchBase. It uses the memcached protocol, provides nearly the same speed, and delivers the automatic replication you're looking for. It also persists to disk so your cache will never be cold.