Add custom logs in Azure HDInsight application - Scala

I am deploying my Scala + Apache Spark 2.0 application on an Azure HDInsight cluster. We can see the default YARN logs of the application through the Azure portal, but our requirement is to add our own custom logger (error and debug logs) for application-specific (business case) logs. We have not been able to create a custom logger whose output is accessible outside the cluster (for example, by storing it in Azure Blob Storage).
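For reference, the kind of application-specific logger we are trying to set up looks roughly like the sketch below (plain log4j 1.x, which ships with Spark 2.0; the logger name and the file path are just placeholders):

import org.apache.log4j.{Level, Logger, PatternLayout, RollingFileAppender}

object BusinessLogger {
  // "com.example.business" and the local path below are placeholders.
  lazy val log: Logger = {
    val logger = Logger.getLogger("com.example.business")
    logger.setLevel(Level.DEBUG)
    // Writes to a local file on the node; getting that file out of the cluster
    // (for example copying it to a wasb:// path in Blob Storage) is the part we are missing.
    val appender = new RollingFileAppender(
      new PatternLayout("%d{ISO8601} %-5p %c - %m%n"), "/tmp/business-app.log")
    appender.setMaxFileSize("10MB")
    logger.addAppender(appender)
    logger
  }
}

// Usage in the driver:
// BusinessLogger.log.debug("business case X started")
// BusinessLogger.log.error("business case X failed", exception)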

We are working towards HDInsight integration with Azure Operations Management Suite, which lets you add custom logs in addition to the common Spark logs. Let us know if this is something you are interested in and we can invite you to our preview.
Thanks,
Ashish
ashishth#microsoft.com

Related

PostgreSQL server not shown on Azure application map

I'm trying to use Application Insights to monitor an application composed of different microservices in an AKS (Azure Kubernetes Service) cluster.
As AKS does not support the auto-instrumentation scenario, I had to instrument my JS/.NET services myself with the dedicated libraries.
This works fine: I can see my different microservices on the application map.
But I can't see my database server among the dependencies, as in the documentation's example, even though those dependencies should be collected automatically as stated in the dependencies documentation.
I'm using Azure Database for PostgreSQL - Flexible Server. Is this normal? Is it because I am using PostgreSQL instead of SQL Server? Is it related to the fact that I'm using Npgsql instead of SqlClient?

Mounting Azure Blob Storage to Azure Databricks without using a cluster

We have a requirement that, while provisioning the Databricks service through a CI/CD pipeline in Azure DevOps, we should be able to mount a blob storage container to DBFS without connecting to a cluster. Is it possible to mount object storage to DBFS by using a bash script from Azure DevOps?
I looked through various forums, but they all mention doing this using dbutils.fs.mount, and the problem is that we cannot run this command in an Azure DevOps CI/CD pipeline.
I will appreciate any help on this.
Thanks
What you're asking is possible, but it requires a bit of extra work. In our organisation we've tried various approaches, and I've been working with Databricks for a while. The solution that works best for us is to write a bash script that makes use of the databricks-cli in your Azure DevOps pipeline. The approach we have is as follows:
Retrieve a Databricks token using the token API
Configure the Databricks CLI in the CI/CD pipeline
Use Databricks CLI to upload a mount script
Create a Databricks job using the Jobs API and set the mount script as the file to execute
The steps above are all contained in a bash script that is part of our Azure DevOps pipeline.
Setting up the CLI
Setting up the Databricks CLI without any manual steps is now possible since you can generate a temporary access token using the Token API. We use a Service Principal for authentication.
https://learn.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/tokens
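For illustration, the token step boils down to a single call to the Token API. The sketch below is in Scala rather than the bash/curl we actually use; it assumes you already have an AAD access token for the service principal, and the workspace URL and token lifetime are placeholders:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object CreateDatabricksToken extends App {
  val workspaceUrl = "https://adb-1234567890123456.7.azuredatabricks.net" // placeholder
  val aadToken     = sys.env("DATABRICKS_AAD_TOKEN")                      // AAD token of the service principal

  // POST /api/2.0/token/create mints a short-lived personal access token.
  val request = HttpRequest.newBuilder(URI.create(s"$workspaceUrl/api/2.0/token/create"))
    .header("Authorization", s"Bearer $aadToken")
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString("""{"lifetime_seconds": 3600, "comment": "ci-cd"}"""))
    .build()

  val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
  println(response.body()) // the JSON response contains "token_value", which we feed to `databricks configure`
}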
Create a mount script
We have a Scala script that follows the mount instructions. This can be done in Python as well. See the following link for more information:
https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html#mount-azure-data-lake-storage-gen2-filesystem.
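To give an idea of what that script looks like, here is a minimal sketch for Blob Storage (wasbs); the linked ADLS Gen2 instructions use abfss:// and OAuth extraConfigs instead. It has to run on a Databricks cluster where dbutils is available, and the container, storage account and secret scope names are placeholders:

// Placeholders: container, storage account and the secret scope/key names.
val container = "data"
val account   = "mystorageaccount"

dbutils.fs.mount(
  source = s"wasbs://$container@$account.blob.core.windows.net",
  mountPoint = "/mnt/data",
  extraConfigs = Map(
    s"fs.azure.account.key.$account.blob.core.windows.net" ->
      dbutils.secrets.get(scope = "cicd-scope", key = "storage-account-key")
  )
)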
Upload the mount script
In the Azure DevOps pipeline the databricks-cli is configured by creating a temporary token using the Token API. Once this step is done, we're free to use the CLI to upload our mount script to DBFS or import it as a notebook using the Workspace API.
https://learn.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/workspace#--import
Configure the job that actually mounts your storage
We have a JSON file that defines the job that executes the "mount storage" script. You can define a job to use the script/notebook that you've uploaded in the previous step. You can easily define a job using JSON; check out how it's done in the Jobs API documentation:
https://learn.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/jobs#--
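As a sketch of what that job definition and the Jobs API call can look like (again in Scala for illustration rather than the bash we actually use; the cluster settings, Spark version and notebook path are placeholders):

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object CreateMountJob extends App {
  val workspaceUrl = "https://adb-1234567890123456.7.azuredatabricks.net" // placeholder
  val token        = sys.env("DATABRICKS_TOKEN")

  // Job definition: a small temporary cluster that runs the uploaded mount notebook.
  val jobJson =
    """{
      |  "name": "mount-storage",
      |  "new_cluster": {
      |    "spark_version": "7.3.x-scala2.12",
      |    "node_type_id": "Standard_DS3_v2",
      |    "num_workers": 1
      |  },
      |  "notebook_task": { "notebook_path": "/Shared/mount-storage" }
      |}""".stripMargin

  val request = HttpRequest.newBuilder(URI.create(s"$workspaceUrl/api/2.0/jobs/create"))
    .header("Authorization", s"Bearer $token")
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(jobJson))
    .build()

  val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
  // The response contains the job_id; a follow-up POST to /api/2.0/jobs/run-now triggers the run.
  println(response.body())
}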
At this point, triggering the job should create a temporary cluster that mounts the storage for you. You should not need to use the web interface, or perform any manual steps.
You can apply this approach to different environments and resource groups, as we do. For this we make use of Jinja templating to fill in variables that are environment- or project-specific.
I hope this helps you out. Let me know if you have any questions!

Spring Cloud Data Flow with Azure Event Hub limitations?

We plan to use Spring Cloud Data Flow on Azure, using Azure Event Hub as the messaging binder.
On Azure Event Hub, there are hard limits:
100 namespaces
10 topics per namespace.
The Spring Cloud Azure Event Hub Stream Binder seems to be able to configure only one namespace, so how can we manage multiple namespaces?
Maybe we should use multiple binders to have multiple instances of the Spring Cloud Azure Event Hub Stream Binder?
Does anyone have any ideas? or documentation we did not find?
Regards
Rémi
Spring Cloud Data Flow and Spring Cloud Skipper support the concept of "platform accounts". Using that, you can set up multiple accounts, one for each namespace or even for other Kubernetes clusters. This opens up a lot of flexibility to work around these hard limits in the Azure stack.
We have a recipe on multi-platform deployments.
When deploying the streams from SCDF, you'd pick and choose the platform account (i.e. the namespace and other configs), so the deployed stream apps (with the Azure binder on the classpath) would automatically be running in different namespaces, effectively dodging the limits enforced in Azure.
The provenance tracking of where the apps run and the audit trail are also captured automatically in SCDF, so at any given time you'd know who did what and in which namespace.

Spring Cloud Dataflow - how to pass credentials to task

I use Spring Cloud Data Flow deployed to Pivotal Cloud Foundry to run Spring Batch jobs as Spring Cloud Tasks, and the jobs require AWS credentials to access an S3 bucket.
I've tried passing the AWS credentials as task properties, but the credentials then show up in the task's log files as arguments or properties. (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#spring-cloud-dataflow-global-properties)
For now, I am manually setting the credentials as environment variables in PCF after each deployment, but I'm trying to automate this. The tasks aren't deployed until they are actually launched, so on a deployment I have to launch the task, wait for it to fail due to missing credentials, and then set the credentials as environment variables with the cf CLI. How do I provide these credentials without them showing up in the PCF app's logs?
I've also explored using Vault and Spring Cloud Config, but again, I would need to pass credentials to the task to access Spring Cloud Config.
Thanks!
Here's a Task/Batch-Job example.
This app uses spring-cloud-starter-aws, and that starter already provides the Boot autoconfiguration and the ability to override the AWS credentials as Boot properties.
You'd override the properties while launching from SCDF like:
task launch --name S3LoaderJob --arguments "--cloud.aws.credentials.accessKey= --cloud.aws.credentials.secretKey= --cloud.aws.region.static= --cloud.aws.region.auto=false"
You can also control the log level of the task so that it doesn't log them in plain text.
Secure credentials for tasks should be configured either via environment variables in your task definition or by using something like Spring Cloud Config Server to provide them (and store them encrypted at rest). Spring Cloud Task stores all command-line arguments in the database in clear text, which is why they should not be passed that way.
After considering the approaches included in the provided answers, I continued testing and researching and concluded that the best approach is to use a Cloud Foundry "User Provided Service" to supply AWS credentials to the task.
https://docs.cloudfoundry.org/devguide/services/user-provided.html
Spring Boot auto-processes the VCAP_SERVICES environment variable included in each app's container.
http://engineering.pivotal.io/post/spring-boot-injecting-credentials/
I then used property placeholders in application-cloud.properties to map the processed properties onto spring-cloud-aws properties:
cloud.aws.credentials.accessKey=${vcap.services.aws-s3.credentials.aws_access_key_id}
cloud.aws.credentials.secretKey=${vcap.services.aws-s3.credentials.aws_secret_access_key}

Set Monitoring Level to Verbose in an Azure Web Role using Powershell

I've created some custom Performance Counters in our web application, which is deployed to an Azure Web Role. In order to see the values of those Performance Counters in the dashboard, I have to go to the portal, set the Monitoring Level to Verbose, and add the new metrics to the dashboard.
The problem is that we are creating the infrastructure by code using PowerShell, and every time we recreate the infrastructure, we lose these settings.
Can I set the Monitoring Level and the Metrics (and possibly alerts) via PowerShell?
It seems that you cannot set the monitoring level and metrics via PowerShell or the REST API. The only thing you are allowed to do via REST is to create alerts: http://msdn.microsoft.com/en-us/library/azure/dn510366.aspx
Thanks.