Loading data from a REST API that uses a continuation token and loading it into a DWH

I am trying to load data from the Citrix Cloud REST API, which uses a continuation token, into a DWH using ADF, but the process takes much longer than it does in Python. Does anyone have a suggestion for making it faster?
The continuation token comes back in the JSON response, and when we pass it to the REST API again it fetches the next set of records.
I want to know whether there is any other way to make this run faster. In Python it takes 2 minutes; in ADF it takes 14 minutes.
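For comparison, the continuation-token loop in Python typically looks something like the sketch below; the endpoint, header, and token field names are assumptions, not the actual Citrix Cloud contract:

```python
import requests

# Hypothetical endpoint and auth header; substitute the real Citrix Cloud values.
BASE_URL = "https://api.example.com/records"
HEADERS = {"Authorization": "Bearer <token>"}

def fetch_all_records():
    """Follow the continuation token until the API stops returning one."""
    records = []
    continuation_token = None
    while True:
        params = {}
        if continuation_token:
            # Field name is an assumption for illustration.
            params["continuationToken"] = continuation_token
        response = requests.get(BASE_URL, headers=HEADERS, params=params, timeout=30)
        response.raise_for_status()
        payload = response.json()
        records.extend(payload.get("items", []))
        continuation_token = payload.get("continuationToken")
        if not continuation_token:  # no token means the last page was reached
            break
    return records

rows = fetch_all_records()
print(f"Fetched {len(rows)} rows")
```

In ADF the same loop is usually expressed with the REST source's pagination rules or an Until loop, and each page still runs as a sequential request plus activity overhead, which is typically where the extra runtime comes from.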

Related

How can I efficiently call an API that returns 2+ million records, save the data locally and show it to the user in Flutter

I am currently working on an app that works offline. When the app starts for the first time, I need to call an API to load some large data, process it and save it locally.
My first approach was to load all the data at once, but doing this returns an upstream request timeout error, as the data is just too large.
The current approach I am using is to make the API call with a limit on the number of records I get, process them and show them to the user.
Now, what is the best approach to get the remaining data? I don't want to use pagination, as I have a search feature in the app.
If you have any better alternative way to handle this, please do suggest it.
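One common pattern is to show the first chunk immediately and keep pulling the remaining chunks in the background; the server still uses limit/offset even if the UI itself never paginates, and the search feature then runs against the local store. A minimal Python sketch of that idea (the app itself would do this in Dart, e.g. in an isolate writing to sqflite or Hive); the endpoint, parameter names and local file are assumptions:

```python
import json
import requests

API_URL = "https://api.example.com/items"   # hypothetical endpoint
CHUNK_SIZE = 1000                           # tune to stay under the server timeout

def sync_all(local_path="items.jsonl"):
    """Download the data in chunks and append each chunk to a local file.

    The search feature can then query the local store instead of the API.
    """
    offset = 0
    with open(local_path, "w", encoding="utf-8") as out:
        while True:
            resp = requests.get(
                API_URL,
                params={"limit": CHUNK_SIZE, "offset": offset},
                timeout=30,
            )
            resp.raise_for_status()
            items = resp.json()
            if not items:
                break                       # no more data on the server
            for item in items:
                out.write(json.dumps(item) + "\n")
            offset += len(items)

sync_all()
```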

Cannot create a batch pipeline to get data from ZohoCRM to BigQuery with HTTP plugin 1.2.1. Returns "Spark Program 'phase-1' failed"

This is my first post here; I'm new to Data Fusion and I have low to no coding skills.
I want to get data from ZohoCRM into BigQuery, with each ZohoCRM module (e.g. accounts, contacts...) as a separate table in BigQuery.
To connect to Zoho CRM I obtained a code, token, refresh token and everything else needed, as described here: https://www.zoho.com/crm/developer/docs/api/v2/get-records.html. I then ran a successful Get Records request via Postman, as described there, and it returned the records from the Zoho CRM Accounts module as JSON.
I thought it would all be fine, so I set the parameters in Data Fusion (DataFusion_settings_1 and DataFusion_settings_2) and it validated fine. Then I previewed and ran the pipeline without deploying it. It failed with the following info from the logs (logs_screenshot). I tried manually entering a few fields in the schema when the format was JSON. I tried changing the format to CSV; neither worked. I tried switching Verify HTTPS Trust Certificates on and off. It did not help.
I'd be really thankful for some help. Thanks.
Update, 2020-12-03
I got in touch with a Google Cloud account manager, who took my question to their engineers, and here is the info:
The HTTP plugin can be used to "fetch Atom or RSS feeds regularly, or to fetch the status of an external system"; it does not seem to be designed for APIs.
At the moment a more suitable tool for data collected via APIs is Dataflow: https://cloud.google.com/dataflow
"Google Cloud Dataflow is used as the primary ETL mechanism, extracting the data from the API Endpoints specified by the customer, which is then transformed into the required format and pushed into BigQuery, Cloud Storage and Pub/Sub."
https://www.onixnet.com/insights/gcp-101-an-introduction-to-google-cloud-platform
So in the next weeks I'll be looking at Dataflow.
Can you please attach the complete logs of the preview run? Make sure to redact any PII. Also, what version of CDF are you using? Is the CDF instance private or public?
Thanks and regards,
Sagar
Did you end up using Dataflow?
I am also experiencing the same issue with the HTTP plugin, but my temporary workaround was to use Cloud Scheduler to periodically trigger a Cloud Function that fetches my data from the API and exports it as JSON to GCS, which can then be accessed by Data Fusion.
My solution is of course non-ideal, so I am still looking for a way to use the Data Fusion HTTP plugin. I was able to make it work to get sample data from public API endpoints, but for a reason still unknown to me I can't get it to work with my actual API.
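For reference, the workaround described above (Cloud Scheduler triggering a Cloud Function that fetches the API payload and drops it in GCS) can be sketched roughly like this in Python; the API URL, bucket name and object path are placeholders:

```python
import json
import requests
from google.cloud import storage

API_URL = "https://api.example.com/records"   # placeholder for the actual API endpoint
BUCKET = "my-staging-bucket"                  # placeholder bucket read by Data Fusion

def export_to_gcs(request):
    """HTTP-triggered Cloud Function: fetch the API payload and write it to GCS."""
    resp = requests.get(API_URL, headers={"Authorization": "Bearer <token>"}, timeout=60)
    resp.raise_for_status()

    client = storage.Client()
    blob = client.bucket(BUCKET).blob("exports/records.json")
    blob.upload_from_string(json.dumps(resp.json()), content_type="application/json")
    return "ok", 200
```

The Cloud Scheduler job then simply hits the function's trigger URL on whatever interval the pipeline needs, and Data Fusion reads the resulting file from the bucket.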

Most suitable Azure service to make thousands of outbound API calls

I am currently working on a project and trying to find out the most suitable Azure option for making daily API calls to Dark Sky (https://darksky.net/dev).
In short, we are looking to bring weather data into our data lake storage (Azure Data Lake Store), and we would have to make hundreds of thousands of calls a day to Dark Sky in order to retrieve the data (unfortunately they don't provide a batch endpoint yet). My question is: which service would be best to make the calls, get the data and write it to the data lake?
The ingestion will not be continuous, and I would like to combine the messages in order to minimize write operations on the data lake.
I have been considering Azure Functions and Azure Batch, but I am unsure which one is more suitable and more cost effective given these requirements.
Thank you for your help.
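Whichever service you pick, the "combine the messages" idea boils down to collecting many per-location responses and writing them out as one file per batch. A rough Python sketch of that batching step, with a hypothetical set of coordinates and a local newline-delimited JSON file standing in for the Data Lake upload:

```python
import json
import requests

API_KEY = "<dark-sky-api-key>"                        # placeholder
LOCATIONS = [(51.5074, -0.1278), (48.8566, 2.3522)]   # example coordinates

def fetch_forecast(lat, lon):
    """One outbound call per location, since there is no batch endpoint."""
    url = f"https://api.darksky.net/forecast/{API_KEY}/{lat},{lon}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def run_batch(out_path="weather_batch.ndjson"):
    """Collect all responses for the batch, then write them in a single operation."""
    lines = [json.dumps(fetch_forecast(lat, lon)) for lat, lon in LOCATIONS]
    # One combined file per batch keeps write operations on the data lake to a
    # minimum; this file would then be uploaded to ADLS with the SDK of choice.
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")

run_batch()
```

A timer-triggered function (or a batch job) running this kind of loop and uploading one combined file per run is the general shape of the solution; the cost comparison between Azure Functions and Azure Batch depends mainly on call volume and run duration.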

How to measure performance/load time of dashboard charts?

I need to do performance testing on a dashboard that contains different types of charts on a web page. All I have is JMeter for now, and I want to improve the testing for dashboards/charts.
Kindly suggest ideas on how to do performance testing of multiple charts on a web page.
We are using Jasper Reports for data fetching, and the charts are built with JS components.
I need to measure the total load time of all the charts along with their data, i.e. the load time with the complete UI loaded.
Thanks in advance.
First of all, have you seen the Performance Testing with JMeter chapter? It explains how you can use JMeter to record your scenario in a browser and convert it into a JMeter .jmx script.
JavaScript doesn't do any magic: it generates HTTP requests, which can be intercepted with JMeter's HTTP(S) Test Script Recorder and replayed with increased load later on.
I would only suggest considering the Parallel Controller, as AJAX requests are often executed asynchronously (in parallel), and a well-behaved JMeter test must act exactly like a real user with a real browser.

Need a design/approach to build a custom sync engine between Elasticsearch and REST APIs

My application interacts with an Elasticsearch instance for all data retrieval and search. Elasticsearch needs to be fed data from a REST API (bulk data, around 2-4 GB). For that I need to write my own REST client to feed Elasticsearch from the REST API, which is perfectly fine.
Since the client database is not exposed and we are integrating with REST endpoints only, how do I fetch only the data that was last updated in the database (MongoDB)? There is also no "last updated" field in MongoDB.
I need to write a custom transporter utility which will run at a fixed time, fetch the data from the REST APIs, compare it with Elasticsearch, and update only the modified content.
Please help!
Thanks
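Since there is no "last updated" field to filter on, one common approach for the transporter utility is to compute a content hash for each record pulled from the REST API, compare it with the hash stored alongside the document in Elasticsearch, and re-index only the documents whose hash changed. A minimal Python sketch of that idea; the index name, ID field and endpoints are assumptions:

```python
import hashlib
import json
import requests

ES_URL = "http://localhost:9200"              # assumed Elasticsearch endpoint
INDEX = "items"                               # assumed index name
SOURCE_API = "https://api.example.com/items"  # assumed REST endpoint to sync from

def content_hash(doc: dict) -> str:
    """Stable hash of the document body, used as a cheap change detector."""
    return hashlib.sha256(json.dumps(doc, sort_keys=True).encode()).hexdigest()

def sync_once():
    for doc in requests.get(SOURCE_API, timeout=60).json():
        doc_id = doc["id"]                    # assumed unique identifier in the payload
        new_hash = content_hash(doc)

        # Look up the previously stored hash, if the document already exists.
        existing = requests.get(f"{ES_URL}/{INDEX}/_doc/{doc_id}")
        old_hash = existing.json()["_source"].get("content_hash") if existing.ok else None

        if new_hash != old_hash:
            # Index (upsert) only documents that are new or whose content changed.
            body = {**doc, "content_hash": new_hash}
            requests.put(f"{ES_URL}/{INDEX}/_doc/{doc_id}", json=body).raise_for_status()

sync_once()
```

Running this on a schedule (cron, or whatever scheduler the stack already uses) gives the fixed-time sync described above without needing a "last updated" timestamp from MongoDB; for the 2-4 GB bulk case the per-document requests would be replaced with the bulk API, but the hashing logic stays the same.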