I am currently working on a project and trying to find the most suitable Azure option for making daily API calls to Dark Sky (https://darksky.net/dev).
In short, we are looking to bring weather data into our data lake storage (Azure Data Lake Store), and we would have to make hundreds of thousands of calls a day to Dark Sky in order to retrieve the data (unfortunately they don't provide a batch endpoint yet). My question is: which service would be best to make the calls, get the data, and write it to the data lake?
The ingestion will not be continuous, and I would like to batch the responses together in order to minimize write operations on the data lake.
I have been considering Azure Functions and Azure Batch, but I am unsure which one is more suitable and more cost-effective given these requirements.
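For what it's worth, the rough shape I have in mind for the Functions option is below: a minimal sketch assuming a timer-triggered Python function writing to ADLS Gen2 via the azure-storage-file-datalake SDK. The coordinates, environment variable names, and file layout are placeholders, nothing here is settled.

    import datetime
    import json
    import os

    import requests
    import azure.functions as func
    from azure.storage.filedatalake import DataLakeServiceClient

    def main(mytimer: func.TimerRequest) -> None:
        # Gather the many small Dark Sky responses in memory and write them
        # out as a single file, to keep data lake write operations low.
        api_key = os.environ["DARKSKY_API_KEY"]
        locations = [(40.7128, -74.0060), (51.5074, -0.1278)]  # placeholders

        records = []
        for lat, lon in locations:
            resp = requests.get(f"https://api.darksky.net/forecast/{api_key}/{lat},{lon}")
            resp.raise_for_status()
            records.append(resp.json())

        service = DataLakeServiceClient.from_connection_string(os.environ["ADLS_CONN_STR"])
        fs = service.get_file_system_client("weather")
        file = fs.get_file_client(f"darksky/{datetime.date.today().isoformat()}.json")
        file.upload_data("\n".join(json.dumps(r) for r in records).encode(), overwrite=True)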
Thank you for your help.
I have a Flutter application that is meant for collecting data on road usage, and I would like to carry out analysis after the data is collected.
I've saved the file and I am able to read and write it via readAsString and writeAsString at
'data/user/0/com.example.flutter_example_app/app_flutter/fault_report.txt'
How can I access the data quickly and easily without having to integrate too much extra machinery (for instance, building an API and writing to the cloud)?
The app is supposed to be just a collection method and all analysis will be carried out afterward.
It depends on how often you would like to retrieve your data. If you only want to get the data once in a while, you can use File Explorer, Bluetooth file sharing, a cloud drive, etc. to retrieve it. But if you want to retrieve data more often, building a simple API will not take too much time; see the sketch below.
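As a minimal sketch of such an API in Python with Flask (the /upload route, the "report" form field, and the uploads directory are all made up for illustration):

    import os
    from flask import Flask, request

    app = Flask(__name__)
    UPLOAD_DIR = "uploads"
    os.makedirs(UPLOAD_DIR, exist_ok=True)

    @app.route("/upload", methods=["POST"])
    def upload():
        # The app would POST fault_report.txt as multipart/form-data
        # under the (hypothetical) field name "report".
        f = request.files["report"]
        f.save(os.path.join(UPLOAD_DIR, "fault_report.txt"))
        return "ok", 200

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)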
We have a scenario where we need to aggregate data from several services and show it in a UI. Currently, when an agent logs in, we need to show the cases assigned to that agent. Case information needs to be aggregated from several microservices. There would be around 1K cases assigned to an agent at a time, and all of them need to be shown so that the agent can sort them based on certain case data.
What would be the best approach to show data in this scenario? Should we make API calls to the several services for each case and aggregate the results? Or are there better approaches to achieve this?
No. You should certainly not call multiple APIs to aggregate data at runtime. Even if you call the APIs in parallel, the latency will be huge.
You need to pre-aggregate the case details and cache them in a distributed caching system (e.g. Redis or Memcached) using a streaming platform (e.g. Kafka). Also, store the pre-aggregated case details in a persistent database. Basically, it's a kind of materialized view.
Caching will let you serve the case details to the user fast, without any noticeable latency, and streaming will keep the cache and DB aggregations updated in near real time. Storing the materialized view in a database saves you from keeping everything in memory: with an LRU cache, only recently used data stays in cache, and if you need to show case data that is not cached, you read it from the database and cache it for future requests.
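As a rough sketch of the streaming half, in Python with kafka-python and redis (the topic name, event fields, and key layout are all invented for illustration):

    import json

    import redis
    from kafka import KafkaConsumer

    r = redis.Redis(host="localhost", port=6379)
    consumer = KafkaConsumer(
        "case-updates",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode()),
    )

    for msg in consumer:
        event = msg.value
        # Keep all of an agent's cases under one hash so the UI can fetch
        # them with a single cache read instead of N service calls.
        r.hset(f"agent:{event['agent_id']}:cases", event["case_id"], json.dumps(event))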
I recommend you read these two Martin Kleppmann articles here and here.
I am working on the migration of a DB2 process which connects to several remote servers, exports data into our local DB, and then manipulates it (inserting computed data, calculated times, etc.). I have created some activities in Data Connect to replicate the data export from the different data sources and load it into local tables. The scripts that handle the data have to run in Db2 Warehouse on Cloud (formerly dashDB).
Currently, these scripts run automatically, triggered by the first (manual) task. However, having the new processes separated into two services does not allow me to automate the flow. Furthermore, we have many activities in Data Connect, so it keeps switching between Data Connect and Db2, and you have to go from one console to the other.
Does anyone know of a Bluemix service which allows scheduling or triggering jobs or events from other services? Is there a way to use the API and do this programmatically?
Thanks
Well, Bluemix offers Workload Scheduler, and Data Connect allows you to schedule activities.
Juan,
A couple of things come to mind for automation here. Do the databases that you are speaking of have IP line of sight to the Warehouse DB? If so, remote tables may be able to help, depending on the source database. Once the data is visible, you should be able to write a SQL process that manages everything from the Warehouse DB.
The other possibility is external tables, as long as the data is visible on the head node. There are some other choices, like S3 storage: the concept is that if you can push your data into S3, you can pull it into the Warehouse DB. You should be able to coordinate all of this from the Warehouse DB side as long as the data is visible through remote tables and/or external tables.
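Once everything is visible from the Warehouse DB, the whole flow can be driven from one place. A minimal sketch in Python with the ibm_db driver (the connection string, table names, and stored procedure are placeholders, not from the question):

    import ibm_db

    # Placeholder connection string; substitute real credentials.
    conn = ibm_db.connect(
        "DATABASE=BLUDB;HOSTNAME=host;PORT=50000;UID=user;PWD=secret;", "", ""
    )
    # Step 1: pull from the remote source (e.g. via a remote/external table).
    ibm_db.exec_immediate(conn, "INSERT INTO local_table SELECT * FROM remote_table")
    # Step 2: run the transformation that used to be a separate Db2 script.
    ibm_db.exec_immediate(conn, "CALL compute_derived_data()")
    ibm_db.close(conn)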
I'm trying to fetch all events from the entire system by using the REST API, to synchronize them with our own application. So far I have been extracting events for every user via the REST API against their own calendar file.
For example:
Fetch johndoe.nsf/api/calendar/events
Fetch jasonmartin.nsf/api/calendar/events
Fetch jeanmoore.nsf/api/calendar/events
etc.
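In Python terms, the loop amounts to something like this (host and credentials are placeholders):

    import requests

    # Placeholder user list; in reality there are ~2,500 users.
    users = ["johndoe", "jasonmartin", "jeanmoore"]
    for user in users:
        url = f"https://domino.example.com/{user}.nsf/api/calendar/events"
        resp = requests.get(url, auth=("apiuser", "secret"))
        resp.raise_for_status()
        events = resp.json()  # one HTTP round trip per user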
This works with a low number of users, but I need to do it for around 2,500 users, which kills my system.
Is there any central database from which I can extract this data?
I tried this with the resource reservations database, but all I got was an empty response.
No, there is no central database of calendar events. There couldn't be. Notes and Domino is a distributed environment. Information can be spread over dozens of servers.
But you could write a Java or C application that runs on the Domino server and aggregates the information from all the users' calendars into one central database; that application will probably run faster than your remote calls through the REST API. You'll still have to make REST API calls into that central database, though, and the sum of the activity will be greater than what you are dealing with now.
Maybe my iCal freeware tool could help you:
http://abdata.ch/publish-ibm-domino-calendar-entries-in-icalendar-format/
I am asking for advice on possibly better solutions for the part of the project I'm working on. I'll first give some background and then my current thoughts.
Background
Our clients can use my company's products to generate potentially large data sets for use in their industry. When the data sets are generated, the clients will file a processing request to us.
We want to send the clients a summary email which contains some statistical charts as well as sampling points from the data sets so they can do some initial quality control work. If the data sets are of bad quality, they don't need to file any request.
One problem is that the charts and sampling points can be too large to send in an email; both are included as pictures. Although we can use a lossy format such as JPEG to save space, we cannot control how many data sets are included in a summary email, so the total size could still exceed normal email size limits.
In terms of technologies, we are mainly developing in Python on Ubuntu 14.04.
Goals of the Solution
In general, we want to present something report-like to the clients for initial QA. The report may contain external links but does not need to be very interactive; in other words, a static report should be fine.
We want to reduce the steps our clients must take to read the report. For example, if the report can simply be an email, the user only needs to (1) log in and (2) open the email. If they use a mail client, they may skip (1) and just open it and begin reading.
We also want to minimize the burden of maintaining extra user accounts, for both us and our clients. For example, if a solution requires us to register a new user account, it is still acceptable but not ranked very high.
Security is important because our clients don't want their reports to be read by unauthorized third parties.
We want the process automated. The solution should provide a programming interface so that we can automate the report sending/sharing process.
Performance is NOT a critical issue. Our user base is not large, at most in the hundreds, and they don't generate data frequently, at most once a week. We don't need real-time response; even a delay of a few hours is acceptable.
My Current Thoughts on a Solution
Possible solution #1: in-house web service. I can set up a server machine and develop our own web service. We put the report into our database and the clients can then query it via the Internet.
Possible solution #2: Amazon Web Services. AWS is quite mature, but I'm not sure whether it could get expensive, because so far we just want to share a report with our remote clients, which doesn't seem like a big enough job for AWS.
Possible solution #3: Google Drive. I know Google Drive provides an API to upload and share files programmatically, but I think we would need to register a dedicated Google account to use it.
Any better solutions?
You could use AWS S3 and CloudFront. Files can easily be loaded into S3 using the AWS SDKs and API. You can then use the API to generate secure links to the files that can only be opened for a specific time and, optionally, from a specific IP.
Files on S3 can also be automatically cleaned up after a specific time if needed using lifecycle rules.
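For example, with boto3 a lifecycle rule like the following would expire objects automatically (bucket name and prefix are placeholders):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-report-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-reports",
                    "Filter": {"Prefix": "reports/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},  # delete reports after 30 days
                }
            ]
        },
    )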
Storage and transfer prices are fairly cheap with AWS, and remember that the S3 storage cost is billed by the month, so if you only keep an object for a few days, you only pay for those days.
S3: http://aws.amazon.com/s3/pricing
Cloudfront: https://aws.amazon.com/cloudfront/pricing/
Here's a list of the SDKs for AWS:
https://aws.amazon.com/tools/#sdk
Or you can use their command line tools for Windows batch or PowerShell scripting:
https://aws.amazon.com/tools/#cli
Here's some info on how the private content URLs are created:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
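As an illustration of the secure links, an S3 presigned URL generated with boto3 looks like this (bucket and key names are placeholders; CloudFront signed URLs, which also support the IP restriction mentioned above, take more setup but follow the same idea):

    import boto3

    s3 = boto3.client("s3")
    s3.upload_file("report.pdf", "my-report-bucket", "reports/client-a/report.pdf")

    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-report-bucket", "Key": "reports/client-a/report.pdf"},
        ExpiresIn=7 * 24 * 3600,  # the link stops working after a week
    )
    # Email this URL to the client; no extra account is needed on their side.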
I would suggest building this service using a mix of your #1 and #2 options: do the processing yourselves, and leverage AWS S3 for transferring the data, which is quite cheap.
Example: 100 GB costs approximately $3 per month.
AWS S3 will also be beneficial for disaster recovery: if anything happens to your local environment, your data will be safe in S3.
For security, you can leverage data encryption and signed URLs in AWS S3.
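For instance, server-side encryption can be requested per upload with boto3 (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    with open("report.pdf", "rb") as f:
        s3.put_object(
            Bucket="my-report-bucket",
            Key="reports/client-a/report.pdf",
            Body=f,
            ServerSideEncryption="AES256",  # S3-managed keys (SSE-S3)
        )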