Utilize Managed Disk for Service Fabric temporary storage - azure-service-fabric

Is it possible to configure and deploy a Service Fabric cluster that uses a Managed Disk as the temporary storage location for things like the replicator log and app type/versions?
For example, I can't use an A1_v2 VM instance size because the D: (Temporary Storage) drive is too small. If I could leverage a Managed Disk and configure SF to use it instead of the local SSD then this instance size would work for my dev/test scenarios.
Any idea if and how I can make this work?

Disclaimer: You can do this, but you shouldn't. Details below.
Consider changing the size of the shared log file instead if you really want to use such small VMs.
"fabricSettings": [{
"name": "KtlLogger",
"parameters": [{
"name": "SharedLogSizeInMB",
"value": "4096"
}]
}]
More info on configuration here.
Now to actually answer:
Here are the settings. You'd probably change Setup/FabricDataRoot to move the Service Fabric local installation and all of the local application working directories, and/or TransactionalReplicator/SharedLogPath to move the reliable collections shared log.
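As a minimal sketch, both can be set through fabricSettings in your cluster template. The F: drive and paths here are assumptions standing in for wherever you attach and mount the managed disk; they are not defaults:

    "fabricSettings": [{
        "name": "Setup",
        "parameters": [{
            "name": "FabricDataRoot",
            "value": "F:\\SF"
        }]
    }, {
        "name": "TransactionalReplicator",
        "parameters": [{
            "name": "SharedLogPath",
            "value": "F:\\SF\\SharedLog.log"
        }]
    }]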
Some things to consider:
Service Fabric Services (and Service Fabric itself) are built to work on local disks and generally should not be hosted on XStore backed disks (premium or not):
Reliable Collections are definitely built to operate against local drives. There's no internal testing that I'm aware of that runs them in this configuration.
Waste of IO: Assuming LRS replicates changes 3 times and you set TargetReplicaSetSize to 3, this configuration will generate 9 copies of the state. Do you need 9 copies of your state?
Impact on Latency and Performance: What should be a local disk IO will turn into network + disk IO, which has a chance to hurt your performance.
Impact on Availability: At a minimum you're adding another dependency, which usually reduces overall availability. If storage ever has an issue you're now more coupled to that other service. Today you're pretty coupled already since the VMSS drives are backed by blobs, so VM provisioning would fail, but that's different than the read/write/activation path for your services.

Related

Migrate to Kubernetes

We're planning to migrate our software to run in Kubernetes with autoscaling. This is our current infrastructure:
PHP and Apache are running in Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySQL is running in Google Cloud SQL
Data files (csv, pdf) and the code are stored in a single SSD Persistent Disk
I found many posts that recommend storing the data files in Google Cloud Storage and using the API to fetch files and upload them to the bucket. We have very limited time, so I decided to use NFS to share the data files across the pods. The problem is that NFS is slow: it's around 100 MB/s when I copy a file with pv, while the result from iperf is 1.96 Gbits/sec. Do you know how to achieve the same result without implementing cloud storage, or how to increase the NFS speed?
Data files (csv, pdf) and the code are stored in a single SSD Persistent Disk
There's nothing stopping you from volume mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, and thus you would only need to use a nodeSelector to ensure your Pods were scheduled onto machines that had said local storage available.
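As a sketch in this question's GCE context, you'd label the nodes that have the local SSD and select on that label. The label key/value, image, and mount path below are assumptions (GCE local SSDs are commonly mounted under /mnt/disks):

    {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": { "name": "php-apache" },
        "spec": {
            "nodeSelector": { "disktype": "local-ssd" },
            "containers": [{
                "name": "php-apache",
                "image": "php:7.2-apache",
                "volumeMounts": [{ "name": "data", "mountPath": "/var/www/data" }]
            }],
            "volumes": [{
                "name": "data",
                "hostPath": { "path": "/mnt/disks/ssd0" }
            }]
        }
    }

The same spec works inside a Deployment if you want replicas, but then you're back to the shared-disk problem described next.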
Where you're going to run into problems is if you are currently just using one php+apache instance and thus just one SSD, but now you want to scale the application up and it requires that all php+apache instances have access to the same SSD. That's a classic distributed application architecture problem, and something Kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try any one of the other distributed filesystems (Ceph, GlusterFS, etc) and see if they perform better for your situation. Then again, "We have very limited time" I guess pretty much means that's off the table.

Azure Service Fabric disk full

We have an Azure Service Fabric cluster running on Azure. When we created the cluster, all data was stored on the D: drive, which is a temporary disk with low disk space. It looks like Service Fabric is also using the D: drive to write its logs, and those logs take half of the space.
At some point we ran out of space on some nodes in the cluster. We freed up space, but the problem will probably come back very soon.
Does anyone know how we could safely reconfigure Service Fabric to store data elsewhere? Can we do that on an existing cluster, or do we have to reinstall a new cluster? Could we mount drives from Azure storage and use those for SF logs or for storing our data?
To fix this issue for that instance, you can reset your cluster. That will clear the cache and free up space for service execution.
Edit 1:
Here is an MSDN link I found for changing Service Fabric settings. This might help:
Customize Service Fabric cluster settings
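If it's the Service Fabric diagnostic logs that are filling the drive, one setting worth looking at on that page is the Diagnostics disk quota. This is only a sketch; the value below is an illustrative reduction, not a recommendation, so verify the section name and default against the linked page:

    "fabricSettings": [{
        "name": "Diagnostics",
        "parameters": [{
            "name": "MaxDiskQuotaInMB",
            "value": "25600"
        }]
    }]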
(Not really an answer, but I cannot comment since I am under 50 points :))

Node app staging fails at Installing App Management stage

I've been trying to push a new build of an app to Bluemix, but staging keeps failing when it's at "Installing App Management" because it can't create regular files and directories due to the disk quota being exceeded.
I've already tried pushing it with "-k 2G", but it still fails.
Is there any way to find out how or why the disk quota keeps being exceeded? There's no way I'm near using 2GB of disk space.
Switching to npm v3 is a potential solution here, as it reduces the number of duplicated dependencies.
You can do that in your package.json, for example:
"engines": { "npm": "3.x" }
By design, Cloud Foundry applications on IBM Bluemix are limited to an available disk quota of 2 GB (the default is 1 GB). Usually, if a cloud application needs more than 1 GB (even 1 GB is a lot for a cloud application...), it should be redesigned according to cloud patterns: broken down into microservices, using external storage services if it simply needs static storage (for example, the Object Storage service on Bluemix).
You also have to consider that a cloud application's filesystem is not reliable: the application itself could be automatically redeployed to a different virtual environment without any evidence of it for the end users.
Even the logs should be collected by external services (by routing the log stream) if you need to keep them safe; otherwise they will be lost as soon as the application is restarted on a different cluster node.

Persistent storage for Apache Mesos

Recently I discovered Apache Mesos.
It all looks amazing in the demos and examples, and I can easily imagine how one would run stateless jobs - that fits the whole idea naturally.
But how does one deal with long-running jobs that are stateful?
Say I have a cluster that consists of N machines (and that is scheduled via Marathon), and I want to run a PostgreSQL server there.
That's it - at first I don't even want it to be highly available, just a single job (Dockerized, actually) that hosts a PostgreSQL server.
1- How would one organize it? Constrain a server to a particular cluster node? Use some distributed FS?
2- DRBD, MooseFS, GlusterFS, NFS, CephFS: which of those plays well with Mesos and services like Postgres? (I'm thinking here of the possibility that Mesos/Marathon could relocate the service if it goes down.)
3- Please tell me if my approach is wrong in terms of philosophy (a DFS for the data servers and some kind of switchover for servers like Postgres on top of Mesos)
Question largely copied from Persistent storage for Apache Mesos, asked by zerkms on Programmers Stack Exchange.
Excellent question. Here are a few upcoming features in Mesos to improve support for stateful services, and corresponding current workarounds.
Persistent volumes (0.23): When launching a task, you can create a volume that exists outside of the task's sandbox and will persist on the node even after the task dies/completes. When the task exits, its resources -- including the persistent volume -- can be offered back to the framework, so that the framework can launch the same task again, launch a recovery task, or launch a new task that consumes the previous task's output as its input.
Current workaround: Persist your state in some known location outside the sandbox, and have your tasks try to recover it manually. Maybe persist it in a distributed filesystem/database, so that it can be accessed from any node (a sketch of the local-path variant follows this list).
Disk Isolation (0.22): Enforce disk quota limits on sandboxes as well as persistent volumes. This ensures that your storage-heavy framework won't be able to clog up the disk and prevent other tasks from running.
Current workaround: Monitor disk usage out of band, and run periodic cleanup jobs.
Dynamic Reservations (0.23): Upon launching a task, you can reserve the resources your task uses (including persistent volumes) to guarantee that they are offered back to you upon task exit, instead of going to whichever framework is furthest below its fair share.
Current workaround: Use the slave's --resources flag to statically reserve resources for your framework upon slave startup.
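To make the first workaround concrete, here's a minimal sketch of a Marathon app definition that pins PostgreSQL's data directory to a known host path outside the sandbox. The image tag, paths, and sizes are illustrative assumptions:

    {
        "id": "/postgres",
        "instances": 1,
        "cpus": 1,
        "mem": 1024,
        "container": {
            "type": "DOCKER",
            "docker": { "image": "postgres:9.4" },
            "volumes": [{
                "hostPath": "/data/postgres",
                "containerPath": "/var/lib/postgresql/data",
                "mode": "RW"
            }]
        }
    }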
As for your specific use case and questions:
1a) How would one organize it? You could do this with Marathon, perhaps creating a separate Marathon instance for your stateful services, so that you can create static reservations for the 'stateful' role, such that only the stateful Marathon will be guaranteed those resources.
1b) Constrain a server to a particular cluster node? You can do this easily in Marathon, constraining an application to a specific hostname, or any node with a specific attribute value (e.g. NFS_Access=true). See Marathon Constraints. If you only wanted to run your tasks on a specific set of nodes, you would only need to create the static reservations on those nodes. And if you need discoverability of those nodes, you should check out Mesos-DNS and/or Marathon's HAProxy integration.
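For example, a Marathon constraint pinning the app to one host looks like this (the hostname is hypothetical):

    "constraints": [["hostname", "CLUSTER", "node-1.example.com"]]

or, keyed on a slave attribute:

    "constraints": [["NFS_Access", "CLUSTER", "true"]]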
1c) Use some distributed FS? The data replication provided by many distributed filesystems would guarantee that your data can survive the failure of any single node. Persisting to a DFS would also provide more flexibility in where you can schedule your tasks, although at the cost of the difference in latency between network and local disk. Mesos has built-in support for fetching binaries from HDFS URIs, and many customers use HDFS for passing executor binaries, config files, and input data to the slaves where their tasks will run.
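As a sketch of that fetching support, a Marathon app can list HDFS URIs to be pulled into the task sandbox before launch (the path is a placeholder):

    "uris": ["hdfs://namenode:8020/apps/my-executor.tar.gz"]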
2) DRBD, MooseFS, GlusterFS, NFS, CephFS? I've heard of customers using CephFS, HDFS, and MapRFS with Mesos. NFS would seem an easy fit too. It really doesn't matter to Mesos what you use, as long as your task knows how to access it from whatever node it's placed on.
Hope that helps!

AppFabric setup in a domain

So I am a little confused by reading the documents.
I want to setup AppFabric caching and hosting.
Can I do the following?
DC
SQL Server
AppFabric1
AppFabric2
All these computers are joined to the DC.
I want AppFabric1 to be the main host but also part of the cache cluster.
What about AppFabric2, or AppFabricX? How can I make them part of the cache cluster?
Do I have to configure AppFabric1 and AppFabric2 in Windows as part of a cluster (i.e. set up the entire environment as a cluster)?
Can I install AppFabric independently on AppFabric1 and 2 and have them cluster together and "make it work"? If so - how?
I see documentation about setting it up in a web farm and also in a workgroup... and that's it; nothing about computers joined to a domain.
I want to setup AppFabric caching and hosting.
Caching and Hosting are two totally different things and generally don't share the same use cases.
AppFabric Caching provides an in-memory, distributed cache platform for Windows Server, previously named Velocity. The cache cluster is a collection of one or more instances of the Caching Service working together. You can easily add a new cache host without restarting the cluster; the cluster configuration is kept in a "storage location" (an XML file share or SQL Server).
Can I install AppFabric independently on AppFabric1 and 2 and have them cluster together and "make it work"? If so - how?
Don't worry... this can be done easily during installation. In addition, there are powerful PowerShell modules to do the same thing.
AppFabric Hosting enhances the hosting of WCF and Workflow Foundation services in WAS (autostart, monitoring of hosted services, workflow persistence, ...). There is no cluster here; basically you just have to configure the monitoring/persistence DB for each server.
Just try it!
When you are adding the second node to the AppFabric cluster, make sure to choose the option Join Cluster (instead of New Cluster) and point to the path of the share where you stored the configuration (assuming that you used a file share to store the cluster configuration). That share should be accessible from AppFabric2.