How can I specify where my local developer's service fabric cluster is created? - azure-service-fabric

My problem: I am learning Service Fabric, and doing simple tutorials, and the local cluster is filling up my C drive. I run the projects in Visual Studio. It first creates a cluster in a folder SfDevCluster. That takes up 842 MB of space. Then it deploys the services and web API sites. Remember, these are trivial tutorials with almost nothing in them. Now I notice that I have a folder with a Size of 1.22 TB and a Size on Disk of 9.4 GB. I'm not sure how to interpret that. But it consumes the remaining space on my C drive and sets off alarms.
I have other drives with lots of space. I would love to specify that those be used. Is there a way to do that with the service fabric cluster used by Visual Studio? Or is there a way to constrain the overly ambitious size allocations? And if you understand this, can you explain what these unusual folder sizes mean?
In the old days, I would have a hard drive with lots of space. But now, my developer machine has a much faster, but more expensive SSD drive, and space is at a premium. So I need more control of the cluster location.

You can set up a local cluster pointing to a non-system drive by running the DevClusterSetup script in PowerShell. You can find the script under %programfiles%\Microsoft SDKs\Service Fabric\ClusterSetup\. The command line you want is:
.\DevClusterSetup.ps1 -PathToClusterDataRoot <desired_app_and_data_location> -PathToClusterLogRoot <desired_tracelog_location>
If you already have a cluster running, this script will remove it and create a new one (note that this will delete any deployed apps and their data). Once you have the new cluster running, Visual Studio will automatically use that when you deploy locally.
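For example, a run from an elevated PowerShell prompt might look like this (the D:\ locations are just placeholders; use whatever drive and folders you prefer):
cd "$env:ProgramFiles\Microsoft SDKs\Service Fabric\ClusterSetup"
.\DevClusterSetup.ps1 -PathToClusterDataRoot "D:\SfDevCluster\Data" -PathToClusterLogRoot "D:\SfDevCluster\Log"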
As for the file sizes - this is mostly due to the log file used for replication of state stored in reliable collections. A large, sparse file is preallocated up-front, which is why you see a difference between size and size on disk. We are planning to make these values configurable so that they can be dialed down on local clusters.
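If you want to see the size vs. size-on-disk effect in isolation, here is a small, purely illustrative experiment with a sparse file (run from an elevated prompt on an NTFS volume; the file name is made up):
fsutil file createnew "$env:TEMP\sparse-demo.bin" 1073741824
fsutil sparse setflag "$env:TEMP\sparse-demo.bin"
fsutil sparse setrange "$env:TEMP\sparse-demo.bin" 0 1073741824
After this, Explorer reports a Size of 1 GB for the file but a much smaller Size on disk, which is the same effect you see with the preallocated replication log.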

In the Service Fabric SDK folder (C:\Program Files\Microsoft SDKs\Service Fabric), you will find a ClusterSetup folder.
In there you will find ClusterManifestTemplate.json files for the different configurations of the local cluster. These are JSON configuration files used by the PowerShell scripts that create and manage the local Service Fabric cluster.
At the bottom of these files, the "fabricSettings" section sets the values of FabricDataRoot and FabricLogRoot based on "%systemDrive%". If you replace that with "D:", the local cluster should be created on the D drive.
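If you prefer to script that edit, here is a minimal sketch, assuming the default SDK install path; it simply swaps every %systemDrive% occurrence in the templates for D: and keeps a backup of each file (run from an elevated prompt, since the files live under Program Files):
$setupDir = "C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup"
Get-ChildItem $setupDir -Recurse -Filter ClusterManifestTemplate.json | ForEach-Object {
    # keep a backup of the original template, then point the %systemDrive%-based roots at D:
    Copy-Item $_.FullName "$($_.FullName).bak"
    (Get-Content $_.FullName -Raw) -replace '%systemDrive%', 'D:' | Set-Content $_.FullName
}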
After making these changes, I stopped my local fabric, deleted the current fabric folders from my C drive, and rebooted my machine. When I then start a debug session in VS 2017, it creates the local dev fabric on my D drive and deploys the application to that location. (I do notice that some empty folders are created on my C drive, but these are not used.)

Another thing you can do is reset the local cluster once in a while.
This can be done easily using the Service Fabric Local Cluster Manager application.
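If you would rather do the reset from a script, the same ClusterSetup folder that holds DevClusterSetup.ps1 also contains a cleanup script in recent SDK versions (script names worth double-checking on your machine); a rough sketch from an elevated prompt:
cd "$env:ProgramFiles\Microsoft SDKs\Service Fabric\ClusterSetup"
.\CleanCluster.ps1      # tears down the local dev cluster and removes its data
.\DevClusterSetup.ps1   # recreates a fresh local cluster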

Related

kernelspec not found after setting JUPYTER_PATH

I am working in Google Vertex AI, which has a two-disk system of a boot disk and a data disk, the latter of which is mounted to /home/jupyter. I am trying to expose python venv environments with kernelspec files, and then keep those environments exposed across repeated stop-start cycles. All of the default locations for kernelspec files are on the boot disk, which is ephemeral and recreated each time the VM is started (i.e., the exposed kernels vaporize each time the VM is stopped). Conceptually, I want to use a VM start-up script to add a persistent data disk path to the JUPYTER_PATH variable, since, according to the documentation, "Jupyter uses a search path to find installable data files, such as kernelspecs and notebook extensions." During interactive testing in the Terminal, I have not found this to be true. I have also tried setting the data directory variable, but it does not help.
export JUPYTER_PATH=/home/jupyter/envs
export JUPYTER_DATA_DIR=/home/jupyter/envs
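(For what it's worth, the directories Jupyter actually searches, and the kernelspecs it currently sees, can be listed with the following commands, which should behave the same in any shell:)
jupyter --paths           # shows the config, data, and runtime search paths in effect
jupyter kernelspec list   # shows the kernelspecs Jupyter has actually found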
I have a beginner's understanding of jupyter and of the important ramifications of using two-disk systems. Could someone please help me understand:
(1) Why is Jupyter failing to search for kernelspec files on the JUPYTER_PATH or in the JUPYTER_DATA_DIR?
(2) If I am mistaken about how the search paths work, what is the best strategy for maintaining virtual environment exposure when Jupyter is installed on an ephemeral boot disk? (Note, I am aware of nb_conda_kernels, which I am specifically avoiding)
A related post focused on the start-up script can be found at this url. Here I am more interested in the general Jupyter + two-disk use case.

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files into the /home/docker/ directory using scp; however, these files are not persisted across reboots of the VM. I have read that files/directories are persisted if stored in the /data/ directory on the VM (among others), but I get permission denied when trying to copy files to those directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persisting data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either get translated to environment variables or mounted as artificial volumes in running containers.
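For illustration, a minimal sketch with kubectl (the file and resource names here are made up):
kubectl create secret generic api-credentials --from-file=credentials.json=./credentials.json
kubectl create configmap app-templates --from-file=./templates/
Both objects can then be mounted into your pods as volumes (or exposed as environment variables) instead of copying files onto the node.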
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good way to tie them together. You can check the chart into source control along with your application and deploy the whole thing in one shot. While Helm has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself), you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
For minikube, anything kept in the $HOME/.minikube/files directory on the host is copied by minikube into the / directory of the VM when it starts.
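As a quick, hedged example (the target path inside the VM is hypothetical): a file placed under $HOME/.minikube/files/etc/myapp/ on the host should show up at /etc/myapp/ inside the VM after the next start:
New-Item -ItemType Directory -Force "$HOME\.minikube\files\etc\myapp" | Out-Null
Copy-Item .\credentials.json "$HOME\.minikube\files\etc\myapp\"
minikube stop; minikube start
minikube ssh "ls /etc/myapp"    # the file is re-copied on every start, so it survives VM restarts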

Azure Service Fabric disk full

We have an Azure Service Fabric cluster running on Azure. When we created the cluster, all data is stored on the D: drive, which is a temporary drive with low disk space. It looks like Service Fabric is also using the D: drive to write its logs, and those logs take half of the space.
At some point we ran out of space on some nodes in the cluster. We freed up space, but the problem will probably come back very soon.
Does anyone know how we could safely reconfigure Service Fabric to store data elsewhere? Could we do that on an existing cluster, or do we have to reinstall a new cluster? Could we mount drives from Azure Storage and use those for SF logs or for storing our data?
To fix this issue for that instance, you can reset your cluster. That will clear the cache and free up space for service execution.
Edit 1:
Here is an MSDN link for changing Service Fabric cluster settings. This might help.
Customize service fabric cluster settings
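As a hedged sketch of what such a settings change can look like with the Az.ServiceFabric PowerShell module (the cluster and resource group names are placeholders, and you should verify the exact section/parameter, e.g. the Diagnostics disk quota, against the settings list in that article):
Connect-AzAccount
Set-AzServiceFabricSetting -ResourceGroupName "my-rg" -Name "my-cluster" `
    -Section "Diagnostics" -Parameter "MaxDiskQuotaInMB" -Value "25600"   # log quota in MB; pick a value that fits the temp drive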
(Not really an answer, but I cannot comment since I am under 50 points :))

Azure Service Fabric deployments consume a lot disk space

I operate an on-premises Azure Service Fabric cluster for testing purposes. It consists of three nodes, which run on a single virtual machine (Windows Server 2012) with a 50 GB disk attached to it.
I have also set up continuous deployment from a TFS release pipeline to the cluster. However, after approx. 80 deployments, Service Fabric had consumed all available disk space and further deployments fail.
Most of the space is taken by C:\ProgramData\SF\Data, which takes around 28 GB, while each code package has a size of ~130 MB. After I unprovisioned many of the old deployments (manually via the SF portal), only around 5 GB were released. Many of the old files are still around in C:\ProgramData\SF\Data.
What is the best approach to improve this?
Why are the files from the old deployments still on disk after unprovisioning?
Is it possible to delete these files manually?
Is it possible to automate the deprovisioning?
In a production environment this situation should be less severe anyway (since there is only one node per machine and bigger disks). Nevertheless, that would only put off the evil day. I would feel safer avoiding this situation altogether.
Edit
It seems that SF is deleting the deployment packages with some delay. I checked the test cluster after one day, and all unprovisioned packages vanished finally.
I also found the Unregister-ServiceFabricApplicationType cmdlet, which can be used to automate the unprovisioning process (https://msdn.microsoft.com/en-us/library/mt125885.aspx).
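A rough sketch of that automation (the type name and the version to keep are placeholders; double-check what is still in use before unregistering):
Connect-ServiceFabricCluster -ConnectionEndpoint "localhost:19000"
Get-ServiceFabricApplicationType -ApplicationTypeName "MyAppType" |
    Where-Object { $_.ApplicationTypeVersion -ne "1.0.80" } |
    ForEach-Object {
        # unregister every version except the one currently deployed
        Unregister-ServiceFabricApplicationType -ApplicationTypeName $_.ApplicationTypeName `
            -ApplicationTypeVersion $_.ApplicationTypeVersion -Force
    }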

Copying a virtual machine data drive in Microsoft Azure

Added more details at the bottom of the question.
We are testing deployment scenarios in Azure VM preview and have run into an issue.
Here is our scenario. We have a software stack that we use in all of our servers. We have created an image with all of that stack installed on an attached data drive. We have created an image of the VM that we can use as a template. Now what we want to do is create a VM based on that template, create a copy of the data drive, and attach it to the newly created VM in an automated manner.
Our problem is that while we have found lots of information about creating drives, we can't find any guidance on how to copy the data drive using Azure PowerShell.
Any thoughts, code, or RTFMs happily accepted.
Cheers,
Terence
We have successfully created an operating system image that we can use to create VMs. But there is a data disk that holds our standard software stack, which we want to reuse by copying it across VMs. The scenario that we are trying to implement is:
1. Create a VM from a standard VM image - PBIMaster.
2. Attach a disk to that VM as F:, called PBIMasterDisk.
3. Install all of the software required for our app on F: (too big for the OS disk, and besides, sticking it on the OS disk seems messy).
4. Build an image from PBIMaster, call it PBIMasterImage, and save it.
5. Create a new VM from that image, call it Node1.
6. Copy PBIMasterDisk to a new Azure disk, call it Node1SoftwareDisk.
7. Attach Node1SoftwareDisk to Node1 as F:.
8. Since the image has the correct registry settings from the previous installs, our stack is ready to go.
9. Add appropriate endpoints.
Rinse and repeat for each additional node.
Hopefully that makes our scenario clearer.
Thanks.
If I understood your objective correctly, you have already uploaded two VHDs to your subscription, and you have also created a VM based on your OS disk VHD1:
OS Disk (VHD1)
Data Disk (VHD2)
Now you want to copy VHD2 to VHD3 and then attach VHD3 to your VM (which is based on the OS disk) via PowerShell.
As of now, there is no single PowerShell command that will copy a data disk (VHD2) to another data disk (i.e. VHD3).
I haven't tried it, but you can use the approach described in the following article to copy your data disk:
http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-asynchronous-cross-account-copy-blob.aspx
This method copies blobs directly at the cloud storage level, so there is no bandwidth usage toward on-premises and potentially zero cost if you stay in the same data center. Try using the same subscription and see if that solves your problem.
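To make that concrete, here is a rough sketch with the classic (service management) Azure PowerShell cmdlets; every account, container, service, and disk name below is a placeholder:
# copy the data-disk VHD blob within the same storage account
$ctx = New-AzureStorageContext -StorageAccountName "mystorageacct" -StorageAccountKey "<key>"
Start-AzureStorageBlobCopy -SrcContainer "vhds" -SrcBlob "PBIMasterDisk.vhd" `
    -DestContainer "vhds" -DestBlob "Node1SoftwareDisk.vhd" -Context $ctx
# wait for the asynchronous copy to finish
Get-AzureStorageBlobCopyState -Container "vhds" -Blob "Node1SoftwareDisk.vhd" -Context $ctx -WaitForComplete
# register the copied VHD as a data disk and attach it to the new VM
Add-AzureDisk -DiskName "Node1SoftwareDisk" -MediaLocation "https://mystorageacct.blob.core.windows.net/vhds/Node1SoftwareDisk.vhd"
Get-AzureVM -ServiceName "node1-service" -Name "Node1" |
    Add-AzureDataDisk -Import -DiskName "Node1SoftwareDisk" -LUN 0 |
    Update-AzureVM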