how to connect Power BI service to google cloud storage (Bucket) - google-cloud-storage

I was working with Azure Blob Storage, where I stored some csv files. I then build some dashboards on PowerBI using those csv files.
The connection between Power BI and Azure Blob storage is easy going.
Now I want to use the same concept, but only replacing Azure Blob Storage with Google Cloud Storage Bucket(GCS-B).
My problem is, I can't connect Power BI to GCS-B. Any Ideas?

After a long googling, documentation reading... There is What I got.
- The only way to connect Power BI service to google cloud storage, is by running a Virtual machine as a Getway, then running some script inside this virtual machine that get the data from gcs and then load it to power bi. I think this is not a useful way to do it.
- What I tried and it works fine, but unfortunately only for power BI desktop is the following R script. That I run in power Bi as data source option:
wd <-getwd()
setwd(wd)
file.create("service_account.json")
json <- '{"type": "service_account",
"project_id": "xxxxxxx",
"private_key_id": "xxxxx"
"private_key": "-----BEGIN PRIVATE KEY-----\\n...
...}'
write(json, "service_account.json")
write(j, "service_account.json")
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/cloud-
platform")
library(googleCloudStorageR)
Sys.setenv("GCS_AUTH_FILE" = "service_account.json")
### optional, if you haven't set environment argument GCS_AUTH_FILE
gcs_auth()
gcs_global_bucket("xxxxx")
gcs_get_global_bucket()
df <- gcs_get_object(gcs_list_objects()$name[[1]])
with this script you can load the first csv file from the specified bucket to power BI desktop.

Just make the bucket or the object public, how to do it read here:
Than copy link from each file (column Public access) and add it to Power Bi like Web source.

Related

ADF Copy data from Azure Data Bricks Delta Lake to Azure Sql Server

I'm trying to use the data copy activity to extract information from azure databricks delta lake, but I've noticed that it doesn't pass the information directly from the delta lake to the SQL server I need, but must pass it to an azure blob storage, when running it, it throws the following error
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key Caused by: Invalid configuration value detected for fs.azure.account.key
Looking for information I found a possible solution but it didn't work.
Invalid configuration value detected for fs.azure.account.key copy activity fails
Does anyone have any idea how the hell to pass information from an azure databricks delta lake table to a table in Sql Server??
These are some images of the structure that I have in ADF:
In the image I get a message that tells me that I must have a Storage Account to continue
These are the configuration images, and execution failed:
Conf:
Fail:
Thank you very much
The solution for this problem was the following:
Correct the way the Storage Access Key configuration was being defined:
in the instruction: spark.hadoop.fs.azure.account.key..blob.core.windows.net
The following change must be made:
spark.hadoop.fs.azure.account.key.
storageaccountname.dfs.core.windows.net
Does anyone have any idea how the hell to pass information from an azure databricks delta lake table to a table in Sql Server??
To achieve Above scenario, follow below steps:
First go to your Databricks cluster Edit it and under Advance options >> spark >> spark config Add below code if you are using blob storage.
spark.hadoop.fs.azure.account.key.<storageaccountname>.blob.core.windows.net <Accesskey>
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
After that as you are using SQL Database as a sink.
Enable staging and give same blob storage account linked service as Staging account linked service give storage path from your blob storage.
And then debug it. make sure you complete Prerequisites from official document.
My sample Input:
Output in SQL:

Reading Json file from Azure datalake as a file using Json.load in Azure databricks /Synapse notebooks

I am trying to parse Json data with multi nested level. I am using the approach is giving filename and using open(File-name) to load the data. when I am providing datalake path, it is throwing error that file path not found. I am able to read data in dataframes but How can I read file from data lake without converting to dataframes and reading it as a file and open it?
Current code approach on local machine which is working:
f = open(File_Name.Json)
data = json.load(f)
Failing scenario when provding datalake path:
f = open(Datalake path/File_Name.Json)
data = json.load(f)
You need to mount the data lake folder to a location in dbfs (in Databricks), although mounting is a security risk. Anyone with access to Databricks resource will have access to all mounted locations.
Documentation on mounting to dbfs: https://docs.databricks.com/data/databricks-file-system.html#mount-object-storage-to-dbfs
The open function works only with local files, not understanding (out of box) the cloud file paths. You can of course try to mount the cloud storage, but as it was mentioned by #ARCrow, it would be a security risk (until you create so-called passthrough mount that will control access on the cloud storage level).
But if you're able to read file into dataframe, then it means that cluster has all necessary settings for accessing the cloud storage - in this case you can just use dbutils.fs.cp command to copy file from the cloud storage to local disk, and then open it with open function. Something like this:
dbutils.fs.cp("Datalake path/File_Name.Json", "file:///tmp/File_Name.Json")
with open("/tmp/File_Name.Json", "r") as f:
data = json.load(f)

How to use file name prefix in Data Factory when importing data into Azure data lake from SAP BW Open Hub?

I have a source of SAP BW Open Hub in data factory and a sink of Azure data lake gen2 and am using a copy activity to move the data.
I am attempting to transfer the data to the lake and split into numerous files, with 200000 rows per file. I would also like to be able to prefix all of the filenames e.g. 'cust_', so the files would be something along the lines of cust_1, cust_2, cust_3 etc.
This method only seems to be an issue when using SAP BW Open Hub as a source (it works fine when using SQL Server as a source. Please see the warning message below. After checking with out internal SAP BW team, they assure me that the data is in a tabular format, and no explicit partition is enabled, so there shouldn't be an issue.
When executing the copy activity, the files are transferred to the lake but the file name prefix setting is ignored, and the filenames instead are set automatically, as below (the name seems to be automatically made up of the SAP BW Open Hub table and the request ID):
Here is the source config:
All other properties on the other tabs are set to default and have been unchanged.
QUESTION: without using a data flow, is there any way to split the files when pulling from SAP BW Open Hub and also be able to dictate the filenames in the lake?
I tried to reproduce the issue and it works fine with a work around. Instead of splitting the data while copying from SAP BW to Azure data lake storage, you can just simply copy the entire exact data (without partition) into the Azure SQL Database. Please follow copy data from SAP Business warehouse by using azure data factory (make sure to use Azure SQL Database as sink).
Now the data is in you Azure SQL Database, you can now simply use the copy activity to copy the data to Azure data lake storage.
In source configuration, keep “Partition option” as None.
Source Config:
Sink config:
Output:

Producing a CSV of Cloud Bucket files

What's the best way to create a CSV file listing images in a Google Cloud bucket to be imported into AutoML Vision?
If you want to listen the files that are saved on a bucket you can use a Google cloud function to listen the new files and create the csv file in another bucket
For example you can use this python code as starting point, this code log the details of a new uploaded file
def hello_gcs_generic(data, context):
"""Background Cloud Function to be triggered by Cloud Storage.
This generic function logs relevant data when a file is changed.
Args:
data (dict): The Cloud Functions event payload.
context (google.cloud.functions.Context): Metadata of triggering event.
Returns:
None; the output is written to Stackdriver Logging
"""
print('Event ID: {}'.format(context.event_id))
print('Event type: {}'.format(context.event_type))
print('Bucket: {}'.format(data['bucket']))
print('File: {}'.format(data['name']))
print('Metageneration: {}'.format(data['metageneration']))
print('Created: {}'.format(data['timeCreated']))
print('Updated: {}'.format(data['updated']))
Basically the function is listening the storage events "google.storage.object.finalize
" (this happen when a file is uploaded)
To deploy this function on the cloud you can use this command
gcloud functions deploy hello_gcs_generic --runtime python37 --trigger-resource [your bucket name] --trigger-event google.storage.object.finalize
or you can use the GCP console (Web UI) to deploy this function.
selecting "cloud storage" on the trigger field
select "Finalize/create" on the event type
specifiying your bucket
Even you can directly process the files using Auto ML within a cloud function as is mentioned in this example.

How do you use storage service in Bluemix?

I'm trying to insert some storage data onto Bluemix, I searched many wiki pages but I couldn't come to conclude how to proceed. So can any one tell me how to store images, files in storage of Bluemix through any language code ( Java, Node.js)?
You have several options at your disposal for storing files in your app. None of them include doing it in the app container file system as the file space is ephemeral and will be recreated from the droplet each time a new instance of your app is created.
You can use services like MongoLab, Cloudant, Object Storage, and Redis to store all kinda of blob data.
Assuming that you're using Bluemix to write a Cloud Foundry application, another option is sshfs. At your app's startup time, you can use sshfs to create a connection to a remote server that is mounted as a local directory. For example, you could create a ./data directory that points to a remote SSH server and provides a persistent storage location for your app.
Here is a blog post explaining how this strategy works and a source repo showing it used to host a Wordpress blog in a Cloud Foundry app.
Note that as others have suggested, there are a number of services for storing object data. Go to the Bluemix Catalog [1] and select "Data Management" in the left hand margin. Each of those services should have sufficient documentation to get you started, including many sample applications and tutorials. Just click on a service tile, and then click on the "View Docs" button to find the relevant documentation.
[1] https://console.ng.bluemix.net/?ace_base=true/#/store/cloudOEPaneId=store
Check out https://www.ng.bluemix.net/docs/#services/ObjectStorageV2/index.html#gettingstarted. The storage service in Bluemix is OpenStack Swift running in Softlayer. Check out this page (http://docs.openstack.org/developer/swift/) for docs on Swift.
Here is a page that lists some clients for Swift.
https://wiki.openstack.org/wiki/SDKs
As I search There was a service that name was Object Storage service and also was created by IBM. But, at the momenti I couldn't see it in the Bluemix Catalog. I guess , They gave it back and will publish new service in the future.
Be aware that pobject store in bluemix is now S3 compatible. So for instance you can use Boto or boto3 ( for python guys ) It will work 100% API comaptible.
see some example here : https://ibm-public-cos.github.io/crs-docs/crs-python.html
this script helps you to list recursively all objects in all buckets :
import boto3
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint)
for bucket in s3.buckets.all():
print(bucket.name)
for obj in bucket.objects.all():
print(" - %s") % obj.key
If you want to specify your credentials this would be :
import boto3
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
for bucket in s3.buckets.all():
print(bucket.name)
for obj in bucket.objects.all():
print(" - %s") % obj.key
If you want to create a "hello.txt" file in a new bucket. :
import boto3
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
my_bucket=s3.create_bucket('my-new-bucket')
s3.Object(my_bucket, 'hello.txt').put(Body=b"I'm a test file")
If you want to upload a file in a new bucket :
import boto3
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
my_bucket=s3.create_bucket('my-new-bucket')
timestampstr = str (timestamp)
s3.Bucket(my_bucket).upload_file(<location of yourfile>,<your file name>, ExtraArgs={ "ACL": "public-read", "Metadata": {"METADATA1": "resultat" ,"METADATA2": "1000","gid": "blabala000", "timestamp": timestampstr },},)