Azure Service Fabric disk full - azure-service-fabric

We have an Azure Service Fabric running on Azure. When we created the cluster, all data are stored on D: drive which is a temp folder with low disk space. It looks like Service Fabric is also using the D: drive to write his logs and those logs takes half of the space.
At some time we run out of space on some nodes in the cluster. We free up space but the problem will probably come back very soon.
Does everyone know how we could safely reconfigure Service Fabric to store data elsewhere ? Could we do that on existing cluster or do we have to reinstall a new cluster ? Could we mount drives from Azure storage and use that for SF logs or storing our data ?

To fix this issue for that instance, you can reset your cluster. So it will clear cache and free up the memory for service execution.
Edit 1:
Here I found msdn link to change settings of service fabric. This might help.
Customize service fabric cluster settings
(Not really an answer, but I cannot comment since I am under 50 points :))

Related

Run out of storage on Service Fabric scale set

I've run out of storage on my Azure Service Fabric sclesets, so can no longer deploy any updates. I'm guessing this is because SF is keeping track of all the deployments and using up space.
Can anyone tell me if there is:
1) A way to tell service fabric to delete old deployments (say older than 10 days ago.)
2) A way to increase the storage available on the scalesets (Service Fabric is currently using the OS disk for deployments)
Regarding your first question,
There is no way to tell SF to auto-delete old packages based on days, you can either:
Do upgrades using the flag -UnregisterUnusedApplicationVersionsAfterUpgrade = $true when running the Deploy-FabricApplication.ps1 script
Update the Deploy-FabricApplication.ps1 script or create a scheduled script to check for unused packages older than a specific version, something like described in this SO
Regarding the second Question:
Yes you can change the disk size via ARM template update,
But the issue might also be the LOGs size, take a look in this question might help solve the problem without bigger disks.

Utilize Managed Disk for Service Fabric temporary storage

Is it possible to configure and deploy a Service Fabric cluster that uses a Managed Disk as the temporary storage location for things like the replicator log and app type/versions?
For example, I can't use an A1_v2 VM instance size because the D: (Temporary Storage) drive is too small. If I could leverage a Managed Disk and configure SF to use it instead of the local SSD then this instance size would work for my dev/test scenarios.
Any idea if and how I can make this work?
Disclaimer: You can do this, but you shouldn't. Details below.
Consider changing the size of the shared log file instead if you really want to use such small VMs.
"fabricSettings": [{
"name": "KtlLogger",
"parameters": [{
"name": "SharedLogSizeInMB",
"value": "4096"
}]
}]
More info on configuration here.
Now to actually answer:
Here are the settings. You'd probably change Setup/FabricDataRoot to move the Service Fabric local installation and all of the local application working directories, and/or TransactionalReplicator/SharedLogPath to move the reliable collections shared log.
Some things to consider:
Service Fabric Services (and Service Fabric itself) are built to work on local disks and generally should not be hosted on XStore backed disks (premium or not):
Reliable Collections are definitely built to operate against local drives. There's no internal testing that I'm aware of that runs them in this configuration.
Waste of IO: Assuming LRS replicates changes 3 times and you set TargetReplicaSetSize to 3, this configuration will generate 9 copies of the state. Do you need 9 copies of your state?
Impact on Latency and Performance: What should be a local disk IO will turn into network + disk IO, which has a chance to hurt your performance.
Impact on Availability: At a minimum you're adding another dependency, which usually reduces overall availability. If storage ever has an issue you're now more coupled to that other service. Today you're pretty coupled already since the VMSS drives are backed by blobs, so VM provisioning would fail, but that's different than the read/write/activation path for your services.

Azure Service Fabric deployments consume a lot disk space

I operate an on-premise Azure Service Fabric cluster for testing purposes. It consists of three nodes, which are running on a single virtual machine (Windows Server 2012) with a 50 GB disk attached to it.
Further I set up continuous deployment from TFS release pipeline to the cluster. However after approx. 80 deployments, service fabric consumed all available disk space and further deployments fail.
Most of the space is taken by C:\ProgramData\SF\Data, which took around 28GB, while each code package has a size of ~130 MB. After I have unprovisioned many of the old deployments (manually via SF portal), only around 5GB were released. Many of the old files are still around in C:\ProgramData\SF\Data.
What is the best approach to improve this?
Why are the files from the old deployments still on disk after unprovisioning?
Is it possible to delete these files manually?
Is it possible to automate the deprovisioning?
On a production environment this situation should be relaxed anyhow (since there is only one node per machine and bigger disks). Nevertheless this would only put off the evil day. I would feel safer to avoid this situation at all.
Edit
It seems that SF is deleting the deployment packages with some delay. I checked the test cluster after one day, and all unprovisioned packages vanished finally.
It seems that SF is deleting the deployment packages with some delay. I checked the test cluster after one day, and all unprovisioned packages vanished finally.
Further I found the Unregister-ServiceFabricApplicationType Cmdlet to automate the unprovisioning process (https://msdn.microsoft.com/en-us/library/mt125885.aspx).

How can I specify where my local developer's service fabric cluster is created?

My problem: I am learning Service Fabric, and doing simple tutorials, and the local cluster is filling up my C drive. I run the projects in Visual Studio. It first creates a cluster in a folder SfDevCluster. That takes up 842 MB of space. Then it deploys the services and web api sites. Remember, these are trivial tutorials with almost nothing in them. Now, I notice that I have a folder with a Size = 1.22 TB and Size on Disk of 9.4 GB. I'm not sure how to interpret that. But it consumes the remaining space on my C drive and sets off alarms.
I have other drives with lots of space. I would love to specify that those be used. Is there a way to do that with the service fabric cluster used by Visual Studio? Or is there a way to constrain the overly ambitious size allocations? And if you understand this, can you explain what these unusual folder sizes mean?
In the old days, I would have a hard drive with lots of space. But now, my developer machine has a much faster, but more expensive SSD drive, and space is at a premium. So I need more control of the cluster location.
You can set up a local cluster pointing to a non-system drive by running the DevClusterSetup script in PowerShell. You can find the script under %programfiles%\Microsoft SDKs\Service Fabric\ClusterSetup\. The command line you want is:
.\DevClusterSetup.ps1 -PathToClusterDataRoot <desired_app_and_data_location> -PathToClusterLogRoot <desired_tracelog_location>
If you already have a cluster running, this script will remove it and create a new one (note that this will delete any deployed apps and their data). Once you have the new cluster running, Visual Studio will automatically use that when you deploy locally.
As for the file sizes - this is mostly due to the log file used for replication of state stored in reliable collections. A large, sparse file is preallocated up-front, which is why you see a difference between size and size on disk. We are planning to make these values configurable so that they can be dialed down on local clusters.
In the Service Fabric SDK folder (C:\Program Files\Microsoft SDKs\ServiceFabric), you will find a ClusterSetup folder.
In there you will find ClusterManifestTemplate.json files for the different configurations of the local cluster. These are json configuration files used by the powershell scripts that create and manage the local service fabric cluster.
At the bottom of these files, in "fabricSettings" it is setting the value of the FabricDataRoot and FabricLogRoot, based on the "%systemDrive%". If you replace this by "D:" it should result in a local cluster on the D drive.
After making these changes, I stopped my local fabric, deleted the current fabric folders from my C drive, and rebooted my machine. When I then start a debug session in VS.2017, it creates the local dev fabric on my D drive and deploys the application to that location. (I do notice that some empty folders are created on my C drive but these are not used.)
What you also can do is resetting the local cluster once in a while.
Can be easily done using the Service Fabric Local Cluster Manager application:

Why does Azure deployment take so long?

I'm trying to understand why it can take from 20-60min to deploy a small application to Azure (using the configuration/package upload method, not from within VS).
I've read through this situation and this one but I'm still a little unclear - is there a weird non-technology ritual that occurs while the instances are distributing, like somebody over at Microsoft lighting a candle or doing a dance?
As a fellow Azure user, I share your pain - deploying isn't "quick"/"painless" - and this hurts especially when you're in a development cycle and want to test dev iterations on Azure. However, in general deployments should take much less than 60 minutes - and less than 20 minutes too.
Steve Marx provided a brief overview of the steps involved in deployment:
http://blog.smarx.com/posts/what-happens-when-you-deploy-on-windows-azure
And he references a deeper level explanation at: http://channel9.msdn.com/blogs/pdc2008/es19
There's a lot that goes on behind the scenes when you deploy an application to the Azure cloud. I don't have any special insight into what's going on behind the curtain, but having worked on the VS tools to upload projects to the Azure cloud, these are my impressions as an outsider looking in:
Among other things:
Hardware must be allocated from the available pool of servers
The VHD of the core OS must be uploaded to the machine
A VM instance must be initialized and booted off that VHD image
Your application package must be copied to the VM and installed
The VM monitor must wait for your service to start up, or fail
The data center load balancer and firewall must be made aware of your application's service endpoints
Once all of that has synchronized, your app is accessible from the web.
The VHD image is probably gigabytes in size, much larger than your app upload. Even on a superfast datacenter network, it takes time to move that much stuff into the VM, unpack it, and boot from it. Also, the load balancer and firewall are probably optimized to make routing requests the highest priority. Reconfiguring the firewall and load balancer is lower priority, and has to be done without interrupting traffic flow.
Also note that all this work only has to be done for a new deployment. Updating an existing deployment rolls out much faster - 2 to 3 minutes instead of 20 to 30 minutes.
Check out this PDC10 video by Mark Russinovich. He goes into great detail on what's going on inside Azure with some insights into the (admittedly slow) deployment process.
Original link is no longer working. Here's another link to a version of the same presentation: https://channel9.msdn.com/events/Build/BUILD2011/SAC-853T