Create a dataset in BigQuery using the REST API

So forgive my ignorance, but I can't seem to work this out.
I want to create a "table" in BigQuery, from an API call.
I am thinking of https://developer.companieshouse.gov.uk/api/docs/search/companies/companysearch.html#here
I want to easily query the Companies House API, without writing oodles of code.
And then cross reference that with other datasets - like Facebook API, LinkedIn API.
e.g. I want to input a company ID/name on Companies House and get a fuzzy list of the people and their likely social connections (Facebook, LinkedIn and Twitter).
Maybe BigQuery is the wrong tool for this? Should I just code it?
Or it is the right tool, and adding a dataset via the API is just not obvious to me - in which case, please enlighten me.

You will not be able to use BigQuery directly to perform the task at hand. BigQuery is a web service that lets you analyze massive datasets, working in conjunction with Google Cloud Storage (or another storage system).
The correct way to go about this would be to make a curl request to collect the data you need from Companies House and save it as a CSV file. You can then upload the CSV to Google Cloud Storage and load it into BigQuery.
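If you go that route, here is a minimal sketch of the load step with the official Python client; the project, dataset, bucket and table names are placeholders I made up, not anything from the question:
from google.cloud import bigquery

client = bigquery.Client()  # uses your application default credentials

dataset = bigquery.Dataset("my-project.companies_house")   # hypothetical project/dataset
client.create_dataset(dataset, exists_ok=True)             # create the dataset if missing

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,        # let BigQuery infer the schema from the CSV
    skip_leading_rows=1,    # assumes the CSV has a header row
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/companies.csv",              # hypothetical Cloud Storage path
    "my-project.companies_house.companies",      # destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish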
If you simply wish to link clients from Companies House with social media applications such as Facebook or LinkedIn, you may not even need BigQuery. You could build a structured table in Google Cloud SQL, with fields for the necessary client information, and later compare it against the Facebook or LinkedIn API responses.

So, if you are looking to load data from various sources and run BigQuery operations through the API: yes, there is a way. Adding to the previous answer, BigQuery is meant for analytical queries on big data; otherwise it will simply cost you a lot and be slower than a regular search API if you intend to run thousands of search queries on big datasets, joining various tables, etc.
Let's try querying a public dataset through the BigQuery REST API.
To authenticate, generate an access token from your application default credentials:
gcloud auth print-access-token
You can then use the token produced by the gcloud command in your REST API calls.
POST https://www.googleapis.com/bigquery/v2/projects/<project-name>/queries
Authorization: Bearer <Token>
Body: {
  "query": "SELECT tag, SUM(c) c FROM (SELECT CONCAT('stackoverflow.com/questions/', CAST(b.id AS STRING)), title, c, answer_count, favorite_count, view_count, score, SPLIT(tags, '|') tags FROM `bigquery-public-data.stackoverflow.posts_questions` a JOIN (SELECT CAST(REGEXP_EXTRACT(text, r'stackoverflow.com/questions/([0-9]+)/') AS INT64) id, COUNT(*) c FROM `fh-bigquery.hackernews.comments` WHERE text LIKE '%stackoverflow.com/questions/%' AND EXTRACT(YEAR FROM time_ts)>=@year GROUP BY 1 ORDER BY 2 DESC) b ON a.id=b.id), UNNEST(tags) tag GROUP BY 1 ORDER BY 2 DESC LIMIT @limit",
  "queryParameters": [
    {
      "parameterType": { "type": "INT64" },
      "parameterValue": { "value": "2014" },
      "name": "year"
    },
    {
      "parameterType": { "type": "INT64" },
      "parameterValue": { "value": "5" },
      "name": "limit"
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}
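For completeness, a small sketch of sending that request from Python with the requests library; the project ID is a placeholder, the token comes from the same gcloud command shown above, and the body stands in for the full parameterized JSON body above:
import json
import subprocess
import requests

# Token from application default credentials (same as the gcloud command above).
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"]).decode().strip()

project = "my-project"   # placeholder project ID
body = {                 # substitute the full parameterized body shown above
    "query": "SELECT 1 AS x",
    "useLegacySql": False,
}

resp = requests.post(
    f"https://www.googleapis.com/bigquery/v2/projects/{project}/queries",
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
print(json.dumps(resp.json(), indent=2))   # rows come back under "rows", as in the response below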
Response:
{
  "kind": "bigquery#queryResponse",
  "schema": {
    "fields": [
      { "name": "tag", "type": "STRING", "mode": "NULLABLE" },
      { "name": "c", "type": "INTEGER", "mode": "NULLABLE" }
    ]
  },
  "jobReference": {
    "projectId": "<project-id>",
    "jobId": "<job-id>",
    "location": "<location>"
  },
  "totalRows": "5",
  "rows": [
    { "f": [ { "v": "javascript" }, { "v": "102" } ] },
    { "f": [ { "v": "c++" }, { "v": "90" } ] },
    { "f": [ { "v": "java" }, { "v": "57" } ] },
    { "f": [ { "v": "c" }, { "v": "52" } ] },
    { "f": [ { "v": "python" }, { "v": "49" } ] }
  ],
  "totalBytesProcessed": "3848945354",
  "jobComplete": true,
  "cacheHit": false
}
Query - The most popular tags on Stack Overflow questions linked from Hacker News since 2014:
#standardSQL
SELECT tag, SUM(c) c
FROM (
SELECT CONCAT('stackoverflow.com/questions/', CAST(b.id AS STRING)),
title, c, answer_count, favorite_count, view_count, score, SPLIT(tags, '|') tags
FROM `bigquery-public-data.stackoverflow.posts_questions` a
JOIN (
SELECT CAST(REGEXP_EXTRACT(text,
r'stackoverflow.com/questions/([0-9]+)/') AS INT64) id, COUNT(*) c
FROM `fh-bigquery.hackernews.comments`
WHERE text LIKE '%stackoverflow.com/questions/%'
AND EXTRACT(YEAR FROM time_ts)>=2014
GROUP BY 1
ORDER BY 2 DESC
) b
ON a.id=b.id),
UNNEST(tags) tag
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
So, we run some of our analytical queries through the API to build periodic reports. I'll leave it to you to explore the other options in the BigQuery API for creating datasets and loading data via the API.
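Since the original question was about creating a dataset via the API, here is a minimal sketch with the official Python client; the project, dataset, table, schema and sample row are made up for illustration:
from google.cloud import bigquery

client = bigquery.Client()

# Create a dataset, then a table with an explicit schema.
client.create_dataset("companies_house", exists_ok=True)    # hypothetical dataset ID
table = bigquery.Table(
    "my-project.companies_house.companies",                  # hypothetical table ID
    schema=[
        bigquery.SchemaField("company_number", "STRING"),
        bigquery.SchemaField("title", "STRING"),
    ],
)
client.create_table(table, exists_ok=True)

# Stream a few rows into it (e.g. parsed from a Companies House API response).
errors = client.insert_rows_json(
    "my-project.companies_house.companies",
    [{"company_number": "00000006", "title": "EXAMPLE LTD"}],  # made-up row
)
print(errors or "rows inserted")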

Related

In which cases will Meta's WhatsApp webhook payloads contain multiple elements in an array?

Meta's WhatsApp API integration and webhook responses:
https://developers.facebook.com/docs/whatsapp/cloud-api/webhooks/payload-examples
I am new to the WhatsApp Cloud integration and I am confused about why the inbound message webhook payload has so many nested arrays. In which cases will Facebook (Meta) send multiple elements in the nested arrays?
Is it fine to just read entry[0].changes[0].value.messages[0].text.body, or do I need to loop over every level?
Under what circumstances will we receive multiple elements?
{
  "object": "whatsapp_business_account",
  "entry": [{
    "id": "WHATSAPP_BUSINESS_ACCOUNT_ID",
    "changes": [{
      "value": {
        "messaging_product": "whatsapp",
        "metadata": {
          "display_phone_number": PHONE_NUMBER,
          "phone_number_id": PHONE_NUMBER_ID
        },
        "contacts": [{
          "profile": {
            "name": "NAME"
          },
          "wa_id": PHONE_NUMBER
        }],
        "messages": [{
          "from": PHONE_NUMBER,
          "id": "wamid.ID",
          "timestamp": TIMESTAMP,
          "text": {
            "body": "MESSAGE_BODY"
          },
          "type": "text"
        }]
      },
      "field": "messages"
    }]
  }]
}
You can read the Graph API webhooks documentation:
https://developers.facebook.com/docs/graph-api/webhooks/getting-started#validate-payloads
Event Notifications are aggregated and sent in a batch with a maximum of 1000 updates. However batching cannot be guaranteed so be sure to adjust your servers to handle each Webhook individually.
You can also check the property-wise batch possibility in the provided link.
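In practice that means looping over the arrays rather than hard-coding index 0, since a single delivery can batch several entries and changes. A minimal sketch of a handler, where payload stands for the parsed JSON body your webhook endpoint receives:
def extract_text_messages(payload):
    """Yield (sender, body) pairs from a webhook payload, however many entries it batches."""
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            if change.get("field") != "messages":
                continue
            value = change.get("value", {})
            for message in value.get("messages", []):
                if message.get("type") == "text":
                    yield message.get("from"), message["text"]["body"]

# Usage:
# for sender, body in extract_text_messages(payload):
#     handle_message(sender, body)   # hypothetical handler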

What is the best model for saving data in Elasticsearch?

I have a Rails application that uses Elasticsearch as its search engine. The app collects data from mobile applications (it could collect from any kind of mobile app). A mobile app sends two types of data: user profile details and user action details. My app admins can search over this data with multiple conditions and operators to fetch specific results, which are user profile details; after that they can communicate with that profile, for example by email, SMS, or even online chat. I have two options for saving the user data. The first is to save the user profile details and the user action details in separate documents, with this structure. Profile doc:
POST profilee-2022-06-09/_doc
{
  "profile": {
    "app_id": "abbccddeeff",
    "profile_id": "2faae1d6-5875-4b36-b119-74a14589c841",
    "whatsapp_number": "whatsapp:+61478421940",
    "phone": "+61478421940",
    "email": "user@mail.com",
    "first_name": "john",
    "last_name": "doe"
  }
}
User action details:
POST events_app_id_2022-05-17/_doc
{
  "app_id": "9vlgwrr6rg",
  "event": "Email_Sign_Up",
  "profile_id": "2faae1d6-5875-4b36-b119-74a14589c840",
  "media": "x1z1",
  "date_time": "2022-05-17T11:48:02.511Z",
  "device_id": "2faae1d6-5875-4b36-b119-74a14589c840",
  "lib": "android",
  "lib_version": "1.0.0",
  "os": "Android",
  "os_version": "12",
  "manufacturer": "Google",
  "brand": "google",
  "model": "sdk_gphone64_arm64",
  "google_play_services": "available",
  "screen_dpi": 440,
  "screen_height": 2296,
  "screen_width": 1080,
  "app_version_string": "1.0",
  "app_build_number": 1,
  "has_nfc": false,
  "has_telephone": true,
  "carrier": "T-Mobile",
  "wifi": true,
  "bluetooth_version": "ble",
  "session_id": "b1ad31ab-d440-435f-ac12-3d03c30ac44f",
  "insert_id": "1e285b51-abcf-46ae-8359-9a9d58970cdf"
}
As I said before, app admins search over these documents to fetch specific profiles and use the results to communicate with them. The problem is that a mobile user may create a profile and only generate actions days or months later, so the profile details and the action details are created on different days. If app admins want to fetch a specific result with a complex query, I end up with at least two queries against Elasticsearch, and each query must be saved for later reuse by the admin, so this does not work with my business logic. In some cases I would also need a join query, which, according to the Elasticsearch documentation, is expensive, so that is not an option either. In the second scenario I decided to save both the user profile and the actions in one document, something like this:
POST profilee-2022-06-09/_doc
{
  "profile": {
    "app_id": "abbccddeeff",
    "profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
    "whatsapp_number": "whatsapp:+61478421940",
    "phone": "+61478421940",
    "email": "user@mail.com",
    "first_name": "john",
    "last_name": "doe",
    "events": [
      {
        "app_id": "abbccddeeff",
        "event": "sign_in",
        "profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
        "media": "x1z1",
        "date_time": "2022-06-06T11:52:02.511Z"
      },
      {
        "app_id": "abbccddeeff",
        "event": "course_begin",
        "profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
        "media": "x1z1",
        "date_time": "2022-06-06T11:56:02.511Z"
      },
      {
        "app_id": "abbccddeeff",
        "event": "payment",
        "profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
        "media": "x1z1",
        "date_time": "2022-06-06T11:58:02.511Z"
      }
    ]
  }
}
In this case I have the same situation as before: I have to generate a profile index per day and append the user actions to it, which means continuous updates every day. Assume I have 100,000 profiles and each one has 50 actions; that means 100,000 * 50 updates per day, which is heavy on my server, so this is still not workable. Could you please help me find the best model for saving my data in Elasticsearch, based on this description?
Update: Is Elasticsearch suitable for my requirements at all? If I switched to another database such as MongoDB, or added Hadoop, would that be more useful in my case?

Filtering for ads with performance data in a Facebook Graph API Request

I'm pulling ad performance for an entire business account using the Graph Explorer, and I would like to pull data only for the ads that have conversion data (i.e. where the 'insights' dictionary exists).
My query so far is:
<BUSINESS_ID>?fields=client_ad_accounts{ads{name,insights{impressions,inline_link_clicks,spend}}}
but that gives me the ad IDs for every single ad in each account. Most accounts have more than 2000 ads (most of which are inactive), so it's an unnecessarily large query.
Here is a small snippet from the result of the current query, with only one ad ID actually having performance data:
"ads": {
"data": [
{
"id": "xxxxx"
},
{
"id": "xxxxx"
},
{
"id": "xxxxx"
},
{
"id": "xxxxx"
},
{
"id": "xxxxx"
},
{
"insights": {
"data": [
{
"impressions": "3000",
"spend": "41.24",
"date_start": "2020-03-08",
"date_stop": "2020-04-06"
}
],
I've tried to use
&filtering=[{field: "insights",operator:"IN", value: ["client_ad_accounts.ads"]}]
and other variants thereof to filter out the inactive ads, but none worked. How can I structure the query to cut out the inactive ads?
I finally figured out what I was doing wrong! I'll post the corrected request for anyone else suffering with the API: https://graph.facebook.com/v6.0/act_ID/insights?fields=date_start,date_stop,ad_name,campaign_name,adset_name,impressions,reach,inline_link_clicks,spend,actions&limit=3000&date_preset=yesterday&level=ad&filtering=[{field:action_type,operator:IN,value:offsite_conversion.fb_pixel_purchase}]&access_token=xxxx
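If you are scripting this rather than pasting the URL into the Graph Explorer, a small sketch of the same request with Python's requests library may be easier to read; the account ID, token and API version are placeholders, and the filtering value is passed as properly quoted JSON:
import json
import requests

ACCESS_TOKEN = "xxxx"        # placeholder access token
ACCOUNT_ID = "act_ID"        # placeholder ad account ID, as in the URL above

params = {
    "fields": "date_start,date_stop,ad_name,campaign_name,adset_name,"
              "impressions,reach,inline_link_clicks,spend,actions",
    "level": "ad",
    "date_preset": "yesterday",
    "limit": 3000,
    # Keep only ads that actually recorded pixel purchase conversions.
    "filtering": json.dumps([{
        "field": "action_type",
        "operator": "IN",
        "value": ["offsite_conversion.fb_pixel_purchase"],
    }]),
    "access_token": ACCESS_TOKEN,
}

resp = requests.get(
    f"https://graph.facebook.com/v6.0/{ACCOUNT_ID}/insights", params=params)
print(resp.json())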

druid groupBy query - json syntax - intervals

I'm attempting to recreate this query (which works as I hope):
SELECT userAgent, COUNT(*) FROM page_hour GROUP BY userAgent order by 2 desc limit 10
as a native JSON query. I've tried this:
{
  "queryType": "groupBy",
  "dataSource": "page_hour",
  "granularity": "hour",
  "dimensions": ["userAgent"],
  "aggregations": [
    { "type": "count", "name": "total", "fieldName": "userAgent" }
  ],
  "intervals": [ "2020-02-25T00:00:00.000/2020-03-25T00:00:00.000" ],
  "limitSpec": { "type": "default", "limit": 50, "columns": ["userAgent"] },
  "orderBy": {
    "dimension": "total",
    "direction": "descending"
  }
}
but instead of aggregating over the full range, it appears to pick an arbitrary time span (e.g. 2020-03-19T14:00:00Z).
If you want results from the entire interval to be combined in a single result entry per user agent, set granularity to all in the query.
A few notes on Druid queries:
You can generate a native query by entering a SQL statement in the management console and selecting the explain/plan menu option from the three-dot menu by the run button.
It's worth confirming expectations that the count query-time aggregator will return the number of database rows (not the number of ingested events). This could be the reason the resulting number is smaller than anticipated.
A granularity of all will prevent bucketing results by hour.
A fieldName spec within the count aggregator? I don't know what behavior is defined for this, so I would remove that property. See the docs:
https://druid.apache.org/docs/latest/querying/aggregations.html#count-aggregator
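Putting those notes together, here is a sketch of what the corrected query could look like, posted to the Druid router from Python; the host/port is a placeholder for your cluster, and ordering is expressed through limitSpec columns rather than a top-level orderBy:
import requests

query = {
    "queryType": "groupBy",
    "dataSource": "page_hour",
    "granularity": "all",   # combine the whole interval into a single bucket
    "dimensions": ["userAgent"],
    "aggregations": [
        {"type": "count", "name": "total"}   # no fieldName on a count aggregator
    ],
    "intervals": ["2020-02-25T00:00:00.000/2020-03-25T00:00:00.000"],
    "limitSpec": {
        "type": "default",
        "limit": 10,
        "columns": [{"dimension": "total", "direction": "descending"}],
    },
}

# Placeholder router endpoint; adjust host/port for your cluster.
resp = requests.post("http://localhost:8888/druid/v2/?pretty", json=query)
print(resp.text)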

Magento 2 REST API product filters

I am working with the Magento 2 API. I need products based on the filters below:
store id
search by product name
sorting by name
category id
a limit
I have tried this API, but the options are not available:
index.php/rest/V1/categories/{id}/products
Please can someone suggest how to achieve this.
Thanks
You are looking for the (GET) API /rest/V1/products.
The store should be detected automatically, because you can pass the store code at the start of the URL path. If you have a store with code test, the API call will start with GET /rest/test/V1/products/[...].
You can use the like condition type. Ex.: products with "sample" in their name: ?searchCriteria[filter_groups][0][filters][0][field]=name
&searchCriteria[filter_groups][0][filters][0][value]=%sample%
&searchCriteria[filter_groups][0][filters][0][condition_type]=like
You are looking for sortOrders. Ex.: searchCriteria[sortOrders][0][field]=name. You can also add the sort direction, for example DESC, with searchCriteria[sortOrders][0][direction]=DESC.
Use the category_id field and the eq condition type. Ex.: if you want products from category 10: searchCriteria[filter_groups][0][filters][0][field]=category_id&
searchCriteria[filter_groups][0][filters][0][value]=10&
searchCriteria[filter_groups][0][filters][0][condition_type]=eq
Use searchCriteria[pageSize]. Ex.: 20 products after the first 40, equivalent in SQL to LIMIT 20 OFFSET 40: &searchCriteria[pageSize]=20&searchCriteria[currentPage]=3
Of course you can perform AND and OR operations with filters.
{
  "filter_groups": [
    {
      "filters": [
        {
          "field": "type_id",
          "value": "simple",
          "condition_type": "eq"
        }
      ]
    },
    {
      "filters": [
        {
          "field": "category_id",
          "value": "611",
          "condition_type": "eq"
        }
      ]
    }
  ],
  "page_size": 100,
  "current_page": 1,
  "sort_orders": [
    {
      "field": "name",
      "direction": "ASC"
    }
  ]
}
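Putting the pieces together, here is a small sketch of how such a request could be built from Python; the base URL, store code and token are placeholders, and the searchCriteria parameters follow the examples above (name LIKE, category filter, sort, page size):
import requests

BASE_URL = "https://example.com"        # placeholder Magento base URL
STORE_CODE = "default"                  # placeholder store code
TOKEN = "<admin-or-integration-token>"  # placeholder bearer token

params = {
    # name LIKE %sample%
    "searchCriteria[filter_groups][0][filters][0][field]": "name",
    "searchCriteria[filter_groups][0][filters][0][value]": "%sample%",
    "searchCriteria[filter_groups][0][filters][0][condition_type]": "like",
    # AND category_id = 10
    "searchCriteria[filter_groups][1][filters][0][field]": "category_id",
    "searchCriteria[filter_groups][1][filters][0][value]": "10",
    "searchCriteria[filter_groups][1][filters][0][condition_type]": "eq",
    # sort by name, 20 items per page
    "searchCriteria[sortOrders][0][field]": "name",
    "searchCriteria[sortOrders][0][direction]": "ASC",
    "searchCriteria[pageSize]": 20,
    "searchCriteria[currentPage]": 1,
}

resp = requests.get(
    f"{BASE_URL}/rest/{STORE_CODE}/V1/products",
    params=params,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(resp.json().get("total_count"))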