According to the Swift docs (https://docs.openstack.org/swift/latest/overview_expiring_objects.html), adding an X-Delete-After header to a PUT or POST on an object makes it expire after the number of seconds given in that value.
Is this feature supported in IBM Bluemix Object Storage?
IBM Bluemix Object Storage supports expiring objects, i.e., automatic deletion. You can use the X-Delete-At or X-Delete-After headers that you mentioned in your question. As usual, this is covered in the IBM Bluemix Object Storage documentation under "Managing Objects".
X-Delete-At takes a Unix epoch timestamp, and this command (taken from the docs) would have deleted the object at "2016/04/01 08:00:00":
swift post -H "X-Delete-At:1459515600" container1 file7
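If you would rather set the expiry from code than from the swift CLI, here is a minimal sketch using the python-swiftclient library; the endpoint, auth token, container, and object names are placeholders, so treat it as an illustration rather than a drop-in script.

# Minimal sketch with python-swiftclient (pip install python-swiftclient).
# The storage URL and token below are placeholders for your own credentials.
import swiftclient

conn = swiftclient.client.Connection(
    preauthurl="https://<object-storage-endpoint>/v1/AUTH_<project>",  # placeholder
    preauthtoken="<auth-token>",                                       # placeholder
)

# Expire the object 24 hours from now (X-Delete-After is in seconds).
conn.post_object("container1", "file7", headers={"X-Delete-After": "86400"})

# Or expire at an absolute Unix timestamp with X-Delete-At.
conn.post_object("container1", "file7", headers={"X-Delete-At": "1459515600"})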
This is my first post here; I'm new to Data Fusion and have little to no coding skill.
I want to get data from Zoho CRM into BigQuery, with each Zoho CRM module (e.g. accounts, contacts, ...) becoming a separate table in BigQuery.
To connect to Zoho CRM I obtained a code, token, refresh token and everything else needed, as described here: https://www.zoho.com/crm/developer/docs/api/v2/get-records.html. Then I ran a successful Get Records request via Postman, as described there, and it returned the records from the Zoho CRM Accounts module as JSON.
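For reference, the same Get Records call that worked in Postman looks roughly like this in Python; the access token value is a placeholder, and the endpoint and header format follow the Zoho CRM v2 docs linked above.

# Rough equivalent of the Postman request using the requests library.
import requests

ACCESS_TOKEN = "<zoho-oauth-access-token>"  # placeholder

resp = requests.get(
    "https://www.zohoapis.com/crm/v2/Accounts",
    headers={"Authorization": f"Zoho-oauthtoken {ACCESS_TOKEN}"},
)
resp.raise_for_status()
records = resp.json().get("data", [])  # Zoho wraps records in a "data" array
print(f"Fetched {len(records)} account records")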
I thought it would all be fine and set the parameters in Data Fusion (DataFusion_settings_1 and DataFusion_settings_2); it validated fine. Then I previewed and ran the pipeline without deploying it. It failed with the info shown in the logs (logs_screenshot). I tried manually entering a few fields in the schema while the format was JSON, and I tried changing the format to CSV; neither worked. I also tried switching Verify HTTPS Trust Certificates on and off. It did not help.
I'd be really thankful for some help. Thanks.
Update, 2020-12-03
I got in touch with a Google Cloud account manager, who took my question to their engineers, and here is the info:
The HTTP plugin can be used to "fetch Atom or RSS feeds regularly, or to fetch the status of an external system"; it does not seem to be designed for APIs.
At the moment, a more suitable tool for data collected via APIs is Dataflow: https://cloud.google.com/dataflow
"Google Cloud Dataflow is used as the primary ETL mechanism, extracting the data from the API Endpoints specified by the customer, which is then transformed into the required format and pushed into BigQuery, Cloud Storage and Pub/Sub."
https://www.onixnet.com/insights/gcp-101-an-introduction-to-google-cloud-platform
So in the coming weeks I'll be looking at Dataflow.
Can you please attach the complete logs of the preview run? Make sure to redact any PII. Also, what version of CDF are you using? Is the CDF instance private or public?
Thanks and Regards,
Sagar
Did you end up using Dataflow?
I am also experiencing the same issue with the HTTP plugin, but my temporary workaround was to use Cloud Scheduler to periodically trigger a Cloud Function that fetches my data from the API and exports it as JSON to GCS, where Data Fusion can then access it (see the sketch after this comment).
My solution is of course non-ideal, so I am still looking for a way to use the Data Fusion HTTP plugin. I was able to make it work to get sample data from public API endpoints, but for a reason still unknown to me I can't get it to work for my actual API.
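For what it's worth, the workaround looks roughly like the sketch below as an HTTP-triggered Cloud Function, with Cloud Scheduler hitting the function's trigger URL on a schedule. The API URL, bucket name, and token are hypothetical placeholders.

# Simplified sketch of the Cloud Function workaround described above.
import json
import requests
from google.cloud import storage

API_URL = "https://api.example.com/v2/Accounts"  # hypothetical endpoint
BUCKET = "my-staging-bucket"                     # hypothetical bucket
ACCESS_TOKEN = "<access-token>"                  # placeholder

def export_to_gcs(request):
    # Fetch the records from the source API.
    resp = requests.get(API_URL, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    resp.raise_for_status()

    # Land the raw JSON in GCS, where Data Fusion can pick it up.
    client = storage.Client()
    blob = client.bucket(BUCKET).blob("exports/accounts.json")
    blob.upload_from_string(json.dumps(resp.json()), content_type="application/json")
    return "ok"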
Is there a way, through either the IBM Cloud API or the SoftLayer API, to programmatically run/schedule/set up snapshots on an Endurance storage device (i.e., an iSCSI drive)?
I've looked through the documentation, but have not found anything.
You need to take a look at these methods:
https://sldn.softlayer.com/reference/services/softlayer_network_storage/createsnapshot
The method above will allow you to create a new manual snapshot.
https://sldn.softlayer.com/reference/services/softlayer_network_storage/enablesnapshots
The method above will allow you to schedule snapshots.
See below for some code examples:
https://softlayer.github.io/php/enableSnapshots/
https://softlayer.github.io/rest/createsnapshot/
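If Python is easier for you than PHP or raw REST, a rough sketch with the SoftLayer Python SDK might look like the following; the credentials, volume ID, and schedule values are placeholders, and the parameter order follows the SLDN signatures linked above.

# Sketch using the SoftLayer Python SDK (pip install SoftLayer).
import SoftLayer

client = SoftLayer.create_client_from_env(username="<user>", api_key="<api-key>")
volume_id = 12345678  # placeholder: id of the Endurance volume

# Create a manual snapshot (SoftLayer_Network_Storage::createSnapshot).
snapshot = client['SoftLayer_Network_Storage'].createSnapshot(
    "manual snapshot taken via the API", id=volume_id)
print("Created snapshot", snapshot['id'])

# Schedule hourly snapshots, keeping the last 6
# (SoftLayer_Network_Storage::enableSnapshots). hour and dayOfWeek matter
# mainly for daily/weekly schedules but are passed here for completeness.
client['SoftLayer_Network_Storage'].enableSnapshots(
    "HOURLY",   # scheduleType
    6,          # retentionCount
    30,         # minute
    0,          # hour
    "SUNDAY",   # dayOfWeek
    id=volume_id)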
I am using a Data Factory pipeline with a custom activity (configured to run on Azure Batch) that has a Data Lake Store input dataset and output dataset. The Data Lake Store linked service uses service-to-service authentication (service principal) and works fine when used in a Copy activity created through the Copy Wizard. But when used with a custom activity that tries to check whether a file is present in the data lake, the activity fails with the error "Authorization is required". When I use Azure Blob Storage as the input and output datasets, the same custom activity works fine.
It seems like an issue with the Azure Batch compute node not being able to authorize against Data Lake Store. Please help if you have solved the problem described above.
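For context, the check my custom activity performs is essentially a file-existence test against Data Lake Store. A standalone equivalent using the azure-datalake-store Python package with the same service principal would look roughly like this (tenant, client, store name, and path are placeholders):

# Standalone sketch of the file-existence check, using azure-datalake-store
# with service principal (service-to-service) authentication.
from azure.datalake.store import core, lib

token = lib.auth(
    tenant_id="<tenant-id>",        # placeholder
    client_id="<app-client-id>",    # placeholder
    client_secret="<app-secret>",   # placeholder
)
adl = core.AzureDLFileSystem(token, store_name="<datalake-store-name>")  # placeholder

print(adl.exists("/input/somefile.csv"))  # True if the file is present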
I had this exact same issue about 3 weeks ago. I feel your pain!
This is a Microsoft bug!
After much trial and error and many redeployments, I raised a support ticket with Microsoft, who confirmed that service principal authentication for Data Lake Store currently only works with copy activities, not with custom activities.
This is the official response I got on Monday 10th April.
The issue happen because of a bug that custom activity's connector schema doesn't match the latest published connector schema. Actually, we notice the issue on custom activity and have plan to fix & deploy to prod in next 2 weeks.
Be aware that if you change your linked service back to use a session token etc., you'll also need to redeploy the pipelines that contain the custom activities. Otherwise you'll get another error, something like the following...
Access is forbidden, please check credentials and try again. Code: 'AuthenticationFailed' Message: 'Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.'
Hope this helps.
What does eventual or strong mean in the context of Google Cloud Storage consistency?
From the Consistency section of the documentation:
Google Cloud Storage provides strong global consistency for all read-after-write, read-after-update, and read-after-delete operations, including both data and metadata. When you upload a file (PUT) to Google Cloud Storage, and you receive a success response, the object is immediately available for download (GET) and metadata (HEAD) operations, from any location in Google's global network.
In other words, the success response is not returned until the write has been fully committed and replicated, so the object does not become visible until that work is finished; that is what makes the consistency strong rather than eventual. This matches the statement in the docs that "When you upload an object, the object is not available until it is completely uploaded." It is also why the latency for writing to a globally consistent, replicated store may be slightly higher than for a non-replicated or non-committed store: a success response is returned only when multiple writes complete, not just one. The documentation goes into more detail.
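To make that concrete, here is a small sketch with the google-cloud-storage Python client (the bucket name is a placeholder): once the upload call returns successfully, an immediate read from anywhere is guaranteed to see the new object.

# Read-after-write demo with the google-cloud-storage client library.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")  # placeholder bucket name

blob = bucket.blob("consistency-demo.txt")
blob.upload_from_string("hello")  # a success here means the write is committed

# Strong consistency: this read immediately sees the object; there is no
# window in which the GET could return 404 or stale data.
assert bucket.blob("consistency-demo.txt").download_as_bytes() == b"hello"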
In this article, http://en.wikipedia.org/wiki/Object_storage, it says that Lustre is an object-based file system and that Ceph is hybrid storage.
I really don't understand the difference. Ceph is also a distributed file system, block storage, and object storage. Does anyone know whether Ceph's file system and block storage are based on its object store or not?
The content of a file stored on the Ceph file system (which provides a POSIX API) can be retrieved via the librados API, which is an object store API similar to Swift or S3. Although this is why Ceph deserves to be called a UFOS (Unified File and Object Storage) or hybrid storage, it is not a supported use case.
If the Ceph file system implementation changes to modify the names of the objects used to store the content of the files, the user of the librados API will need to know about it and adapt.
A hybrid storage system would allow the user to conveniently store an object named foo via the object store API and retrieve it under a similar name (for instance /objectstore/foo) via the POSIX API, without knowing the implementation details.
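To illustrate the distinction, here is a sketch using the Python rados bindings (the pool name and ceph.conf path are placeholders): the librados API addresses objects by the names you give them, whereas files written through the CephFS POSIX layer are stored as objects whose names are an internal implementation detail, which is why round-tripping between the two APIs is not a supported use case.

# Sketch of the object store path with the librados Python bindings.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # placeholder config path
cluster.connect()
try:
    ioctx = cluster.open_ioctx("mypool")  # placeholder pool name
    try:
        # Object store API: write and read back an object named "foo".
        ioctx.write_full("foo", b"hello object store")
        print(ioctx.read("foo"))
        # A file created via a CephFS mount is NOT visible here under its
        # POSIX path; it is striped into objects with internal names chosen
        # by the file system layer.
    finally:
        ioctx.close()
finally:
    cluster.shutdown()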