Getting a pre-configured cache from a cache container configured under the Infinispan WildFly subsystem

I'm pretty confused about the Infinispan subsystem under WildFly and am not able to get a pre-configured cache from an existing cache container. To illustrate the problem I've created a minimal sample project shared on GitHub: infinispan-wildfly-test
The test setup creates a cache container (TEST) with two caches (x, y), with x set as the default. When I then obtain the EmbeddedCacheManager through a resource lookup, I get the container I'm expecting:
@Resource(lookup = "java:jboss/infinispan/container/TEST")
private EmbeddedCacheManager cacheManager;
But when I then try to get a Cache (x or y), I always get a freshly created one whose configuration does not match the one I created using the CLI; the cache is completely unconfigured!
The point is, I can be sure that the EmbeddedCacheManager is the correct container, since it gives me cache x as the default one (but unconfigured). So what am I missing here? How can one get the pre-configured caches of a cache container?
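For reference, the failing retrieval presumably looks something like the sketch below (the exact call is not shown in the question, so this is a reconstruction). getCache() may silently create a brand-new cache with a default configuration when the manager has no matching definition, which would be consistent with the behaviour described:

import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

public class CacheLookupSketch {

    // Hypothetical reconstruction of the lookup described in the question.
    static void lookUpCaches(EmbeddedCacheManager cacheManager) {
        Cache<Object, Object> x = cacheManager.getCache();    // default cache ("x")
        Cache<Object, Object> y = cacheManager.getCache("y"); // named cache

        // Printing the effective configuration shows whether the CLI-defined
        // settings were picked up or whether a default cache was created.
        System.out.println(x.getCacheConfiguration());
        System.out.println(y.getCacheConfiguration());
    }
}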
None of the samples out there address this issue, and I'm not sure whether their authors are aware that they are getting an unconfigured cache. The samples only ever show the resource lookup of the container and getting the default cache instance; there is no check of the configuration ...
So, is there anyone out there (maybe an Infinispan contributor) who knows the answer? Thanks, and much appreciation in advance ;)

Inject your caches directly:
@Resource(lookup = "java:jboss/infinispan/cache/TEST/x")
private Cache<?, ?> cacheX;
@Resource(lookup = "java:jboss/infinispan/cache/TEST/y")
private Cache<?, ?> cacheY;
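With the caches injected directly, they carry the configuration defined in the subsystem and can be used as-is. A minimal usage sketch (the key/value types and the configuration printout are illustrative assumptions, not part of the original answer):

import javax.annotation.Resource;
import org.infinispan.Cache;

// Must be a managed component (e.g. an EJB or CDI bean) for @Resource injection to work.
public class CacheClientSketch {

    @Resource(lookup = "java:jboss/infinispan/cache/TEST/x")
    private Cache<String, String> cacheX;

    @Resource(lookup = "java:jboss/infinispan/cache/TEST/y")
    private Cache<String, String> cacheY;

    public void verifyCaches() {
        cacheX.put("greeting", "hello from x");
        cacheY.put("greeting", "hello from y");

        // Both caches should now report the configuration defined via the CLI
        // (eviction, expiration, etc.) instead of an unconfigured default.
        System.out.println(cacheX.getCacheConfiguration());
        System.out.println(cacheY.getCacheConfiguration());
    }
}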


Apache Druid Datasources creation

I have a Druid platform deployed without a router (and therefore no UI) on Kubernetes. I noticed that some datasources have disappeared (most probably erased manually). Is there a way to re-create them manually without re-deploying the full platform (for example, by restarting the ingestion server, through an API call, or otherwise)?
Thanks - Christian
It really depends on how the data was deleted and whether the segment files are still being pointed to by the Metadata Database.
Your best starting point is the full Druid API documentation.
For example, they may still be in Deep Storage and still registered in the Metadata DB but just marked as "unused", in which case you can use an API:
https://druid.apache.org/docs/latest/operations/api-reference.html#post-1
Or they could be marked "used" in the metadata database but just not being loaded, in which case check the Load Rules:
https://druid.apache.org/docs/latest/operations/api-reference.html#retention-rules
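If the segments are indeed still registered but flagged "unused", the "mark as used" call can be issued from any HTTP client. A rough sketch against the Coordinator API (host, port and datasource name are placeholders; verify the exact path against the API reference linked above for your Druid version):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MarkDatasourceUsed {
    public static void main(String[] args) throws Exception {
        String coordinator = "http://druid-coordinator:8081"; // placeholder address
        String datasource = "my_datasource";                  // placeholder datasource name

        // A POST with an empty body marks all segments of the datasource as "used"
        // again, so they can be loaded from Deep Storage.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(coordinator + "/druid/coordinator/v1/datasources/" + datasource))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}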

Property placeholder resolution precedence when using Vault and Consul

I have a question about placeholder resolution priority when using consul-config and vault-config.
I created a simple app using this information.
My dependencies are:
dependencies {
    compile('org.springframework.cloud:spring-cloud-starter-consul-config')
    compile('org.springframework.cloud:spring-cloud-starter-vault-config')
    compile('org.springframework.boot:spring-boot-starter-webflux')
    compile('org.springframework.cloud:spring-cloud-starter')
    testCompile('org.springframework.boot:spring-boot-starter-test')
}
Note that I'm not using service discovery.
As a next step, I created the property foo.prop = consul (in the Consul storage)
and foo.prop = vault (in Vault).
When using:
@Value("${foo.prop}")
private String prop;
I get vault as the output, but when I delete foo.prop from Vault and restart the app, I get consul.
I did this a few times in different combinations, and it seems the Vault config has higher priority than the Consul one.
My question is where I can find information about the resolution strategy (imagine we added zookeeper-config as a third source). The spring-core documentation seems to keep quiet about this.
From what I understood by debugging the Spring source code, Vault currently has priority.
My investigation results:
PropertySourceBootstrapConfiguration.java is responsible for initializing all property sources in the bootstrap phase. Before locating properties, it sorts all propertySourceLocators by order:
AnnotationAwareOrderComparator.sort(this.propertySourceLocators);
Vault always "wins" because the instance of LeasingVaultPropertySourceLocator (at least that is the one created during my debugging) implements the PriorityOrdered interface, while the ConsulPropertySourceLocator instance carries an @Order(0) annotation. According to OrderComparator, a PriorityOrdered instance is "more important".
If you have another PriorityOrdered property source (e.g. a custom one), you can influence this order by setting spring.cloud.vault.config.order for Vault.
For now, without customization, I don't know how to change the priority between Vault and Consul.
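To see the effect of that comparison in isolation, here is a small self-contained sketch (the two locator classes are stand-ins for illustration, not the real Spring Cloud classes) showing that AnnotationAwareOrderComparator sorts a PriorityOrdered instance ahead of a bean that only carries @Order(0):

import java.util.ArrayList;
import java.util.List;

import org.springframework.core.PriorityOrdered;
import org.springframework.core.annotation.AnnotationAwareOrderComparator;
import org.springframework.core.annotation.Order;

public class OrderingDemo {

    // Stand-in for LeasingVaultPropertySourceLocator, which implements PriorityOrdered.
    static class VaultLikeLocator implements PriorityOrdered {
        @Override
        public int getOrder() { return 0; }
        @Override
        public String toString() { return "vault"; }
    }

    // Stand-in for ConsulPropertySourceLocator, which carries @Order(0).
    @Order(0)
    static class ConsulLikeLocator {
        @Override
        public String toString() { return "consul"; }
    }

    public static void main(String[] args) {
        List<Object> locators = new ArrayList<>();
        locators.add(new ConsulLikeLocator());
        locators.add(new VaultLikeLocator());

        // The same sort used by PropertySourceBootstrapConfiguration.
        AnnotationAwareOrderComparator.sort(locators);

        // PriorityOrdered instances are sorted before plain @Order beans,
        // so the Vault-like locator ends up first and its properties win.
        System.out.println(locators); // [vault, consul]
    }
}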

How to handle recurring short-lived tasks with Kubernetes

I have a setup with a webserver (NGINX) and a react-based frontend that uses webpack to build the final static sources.
The webserver has its own kubernetes deployment + service.
The frontend needs to be built before the webserver can serve the static html/js/css files - but after that, the pod/container can stop.
My idea was to share a volume between the webserver and the frontend pod. The frontend will write the generated files to the volume and the webserver can serve them from there. Whenever there is an update to the frontend sourcecode, the files need to be regenerated.
What is the best way to accomplish that using kubernetes tools?
Right now, I'm using an init container to build - but this leads to a restart of the webserver pod as well, which wouldn't be necessary.
Is this the best/only solution to this problem, or should I use Kubernetes Jobs for this kind of task?
There are multiple ways to do this. Here's how I think about this:
Option 1: The static files represent built source code
In this case, the static files that you want to serve should actually be packaged and built into the docker image of your nginx webserver (in the html directory say). When you want to update your frontend, you update the version of the image used and update the pod.
Option 2: The static files represent state
In this case, your approach is correct. Your 'state' (like a database) is stored in a folder. You then run an init container/job to initialise 'state' and then your webserver pod works fine.
I believe option 1 to be better for 2 reasons:
You can horizontally scale your webserver trivially by increasing the pod replica count. In option 2, you're actually dealing with state, so that's a problem when you want to add more nodes to your underlying k8s cluster (you'll have to copy files/folders from one volume/folder to another).
The static files are actually the source code of your app. These are not uploaded media files or similar. In this case, it absolutely makes sense to make them part of your docker image. Otherwise, it kind of defeats the advantage of containerising and deploying.
Jobs, Init containers, or alternatively a gitRepo type of Volume would work for you.
http://kubernetes.io/docs/user-guide/volumes/#gitrepo
It is not clear in your question why you want to update the static content without simply re-deploying / updating the Pod.
Since somewhere, somehow, you have to build the webserver Docker image, it seems best to build the static content into the image: no moving parts once deployed, no need for volumes or storage. Overall it is simpler.
If you use any kind of automation tool for Docker builds, it's easy.
I personally use Jenkins to build Docker images based on a hook from git repo, and the image is simply rebuilt and deployed whenever the code changes.
Running a Job or Init container doesn't gain you much: sure, the web server keeps running, but it's just as easy to have a Deployment with rolling updates, which will deploy the new Pod before the old one is torn down, and your server will always be up too.
Keep it simple...

How to implement the "One Binary" principle with Docker

The One Binary principle explained here:
http://programmer.97things.oreilly.com/wiki/index.php/One_Binary states that one should...
"Build a single binary that you can identify and promote through all the stages in the release pipeline. Hold environment-specific details in the environment. This could mean, for example, keeping them in the component container, in a known file, or in the path."
I see many dev-ops engineers arguably violate this principle by creating one docker image per environment (i.e., my-app-qa, my-app-prod and so on). I know that Docker favours immutable infrastructure, which implies not changing an image after deployment, and therefore not uploading or downloading configuration post-deployment. Is there a trade-off between immutable infrastructure and the one binary principle, or can they complement each other? When it comes to separating configuration from code, what is the best practice in a Docker world? Which one of the following approaches should one take...
1) Creating a base binary image and then having a configuration Dockerfile that augments this image by adding environment-specific configuration (i.e. my-app -> my-app-prod).
2) Deploying a binary-only docker image to the container and passing in the configuration through environment variables and so on at deploy time.
3) Uploading the configuration after deploying the Docker image to a container
4) Downloading configuration from a configuration management server from the running docker image inside the container.
5) Keeping the configuration in the host environment and making it available to the running Docker instance through a bind mount.
Is there another better approach not mentioned above?
How can one enforce the one binary principle using immutable infrastructure? Can it be done, or is there a trade-off? What is the best practice?
I have about 2 years of experience deploying Docker containers now, so I'm going to talk about what I've done and/or know to work.
So, let me first begin by saying that containers should definitely be immutable (I even mark mine as read-only).
Main approaches:
use configuration files by setting a static entrypoint and overriding the configuration file location by overriding the container startup command - that's less flexible, since one would have to commit the change and redeploy in order to enable it; not fit for passwords, secure tokens, etc.
use configuration files by overriding their location with an environment variable - again, this depends on having the configuration files prepped in advance; not fit for passwords, secure tokens, etc.
use environment variables (see the sketch after this list) - this might only need a change in the deployment code, which shortens the time to get the config change live since it doesn't have to go through the application build phase (in most cases); deploying such a change can be pretty easy. Here's an example - if deploying a containerised application to Marathon, changing an environment variable could potentially just start a new container from the last used container image (potentially even on the same host), which means that this could be done in mere seconds; not fit for passwords, secure tokens, etc., and especially so in Docker
store the configuration in a k/v store like Consul, make the application aware of that, and let it even be dynamically reconfigurable. A great approach for launching features simultaneously - possibly even across multiple services; if implemented with a solution such as HashiCorp Vault, it provides secure storage for sensitive information, and you could even have ephemeral secrets (an example would be the PostgreSQL secret backend for Vault - https://www.vaultproject.io/docs/secrets/postgresql/index.html)
have an application or script create the configuration files before starting the main application - store the configuration in a k/v store like Consul and use something like consul-template to populate the app config; a bit more secure, since you're not carrying everything over through the whole pipeline as code
have an application or script populate the environment variables before starting the main application - an example of that would be envconsul; not fit for sensitive information - someone with access to the Docker API (either through the TCP or UNIX socket) would be able to read those
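As a tiny illustration of the environment-variable approaches above (the variable names, file path and defaults are invented for the example), an application can resolve its configuration at startup from the environment, optionally falling back to a file whose location is itself taken from an environment variable:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class AppConfig {
    public static void main(String[] args) throws Exception {
        Properties config = new Properties();

        // Optional: a config file whose location is overridden via an env var
        // (APP_CONFIG_FILE is an invented name for this example).
        String configFile = System.getenv().getOrDefault("APP_CONFIG_FILE", "/etc/myapp/app.properties");
        Path path = Path.of(configFile);
        if (Files.exists(path)) {
            try (var in = Files.newInputStream(path)) {
                config.load(in);
            }
        }

        // Plain environment variables win over the file, so the same image can be
        // promoted through environments with only the environment changing.
        String dbUrl = System.getenv().getOrDefault("DB_URL",
                config.getProperty("db.url", "jdbc:postgresql://localhost/app"));
        System.out.println("Using database: " + dbUrl);
    }
}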
I've even had a situation in which we were populating variables into AWS instance user_data and injecting them into the container on startup (with a script that modifies the container's JSON config on startup).
The main things that I'd take into consideration:
what are the variables I'm exposing, and when and where am I getting their values from (could be the CD software, or something else) - for example, you could publish the AWS RDS endpoint and credentials to the instance's user_data, potentially even EC2 tags with some IAM instance profile magic
how many variables we have to manage and how often we change some of them - if we have a handful, we could probably just go with environment variables, or use environment variables for the most commonly changed ones and variables stored in a file for those that change less often
and how fast do we want to see them changed - if it's a file, it typically takes more time to deploy it to production; if we're using environment variables, we can usually deploy those changes much faster
how do we protect some of them - where do we inject them and how - for example Ansible Vault, HashiCorp Vault, keeping them in a separate repo, etc
how do we deploy - that could be a JSON config file sent to a deployment framework endpoint, Ansible, etc.
what environment we have - is it realistic to have something like Consul as a config data store (Consul has 2 different kinds of agents - client and server)
I tend to prefer the most complex case of having them stored in a central place (k/v store, database) and having them changed dynamically, because I've encountered the following cases:
slow deployment pipelines - which makes it really slow to change a config file and have it deployed
having too many environment variables - this could really grow out of hand
having to turn on a feature flag across the whole fleet (consisting of tens of services) at once
an environment in which there is a real drive to increase security by better handling sensitive config data
I've probably missed something, but I guess that should be enough of a trigger to think about what would be best for your environment.
How I've done it in the past is to incorporate tokenization into the packaging process after a build is executed. These tokens can be managed in an orchestration layer that sits on top to manage your platform tools. For a given token, there is a matching regex or XPath expression, and that token is linked to one or many config files, depending on the relationship that is chosen. Then, when the build is deployed to a container, a platform service (i.e. config management) will replace ("poke") these tokens with the correct value for its environment. Those values would most likely be pulled from a vault.
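A minimal sketch of that token-substitution step, assuming tokens of the form @@NAME@@ inside a config file and a map of values supplied by the orchestration/config-management layer (both the token format and the file path are assumptions for illustration):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenPoker {

    // Matches tokens such as @@DB_PASSWORD@@ (the token format is an assumption).
    private static final Pattern TOKEN = Pattern.compile("@@([A-Z0-9_]+)@@");

    public static void main(String[] args) throws Exception {
        Path configFile = Path.of("app.conf"); // placeholder path
        Map<String, String> values = Map.of(   // in practice pulled from a vault
                "DB_PASSWORD", "s3cret",
                "API_ENDPOINT", "https://api.example.internal");

        String template = Files.readString(configFile);
        Matcher m = TOKEN.matcher(template);
        StringBuilder resolved = new StringBuilder();
        while (m.find()) {
            // Leave unknown tokens untouched so missing values are easy to spot.
            String replacement = values.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(resolved);

        Files.writeString(configFile, resolved.toString());
    }
}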

AppFabric cache not clearing on cluster restart

I have an AppFabric Cache installation (the cluster just has one node, viz. my local machine). I just saw the strangest behavior: I put something in the cache, then restarted the cache cluster using the AppFabric Cache PowerShell command Restart-CacheCluster. When I then try to retrieve the value from the cache, it's still there.
Has anyone seen this behavior? Is it due to some configuration setting I'm missing? This is not causing me problems, but the fact that it is not behaving the way I expect scares me in case other issues arise later.
The cache is using a SQL Server database for configuration (as opposed to an XML file).
As specified here:
The Stop-CacheCluster or Restart-CacheCluster cache cluster commands cause all data to be flushed from the memory of all cache hosts in the cluster.
No, it's not possible, and I've never seen this kind of issue. I suggest you check the whole process.
Are you using a Read-Through provider? In that scenario, the cache detects the missing item and calls a specific provider to perform the data load. The item is then seamlessly returned to the cache client and will never be null.
Other things you may have to check:
Check the result of the Restart-CacheCluster cmdlet (success/failure)
Maybe a background task is still running, putting data into the cache
By using the Get-CacheStatistics cmdlet, you can check how many items are really in the cache