Get Product Data from Shopify GraphQL for over 10000 Products

I have an extremely large collection of products (140,000). Fetching the data for 250 products is fine, but I need a list of tags for all 140,000, so I created a bulkOperationRunQuery to get the data. Here is the mutation I run:
mutation {
  bulkOperationRunQuery(
    query: """
    {
      products {
        edges {
          node {
            id
            tags
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}
This works but takes far too long to process. How can I make it quicker? Is there a set limit on the request?

That is all you get for a massive ask like that. If you have 140,000 products, you ask for them once. Then you have them, and speed should be of little consequence. There is no need to repeat yourself by asking for them again and again. If you are interested in changes, just listen to the product change webhooks. You will save yourself a lot of grief that way.
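The "fetch once, then listen for changes" idea can be sketched roughly as follows. This is a minimal illustration, not Shopify's own client code: it assumes the bulk operation result is a JSONL file with one product object per line, and that a product webhook payload carries an `admin_graphql_api_id` and a comma-separated `tags` string. The function names and sample data are hypothetical, so check the payload shape against your API version.

```javascript
// Build a product-id -> tags cache once from the bulk operation's JSONL
// output, then keep it current from webhooks instead of re-querying.

// Parse JSONL text: one JSON object per non-empty line.
function loadTagsFromJsonl(jsonlText) {
  const cache = new Map();
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    const product = JSON.parse(line);
    cache.set(product.id, product.tags);
  }
  return cache;
}

// Apply one product create/update webhook payload to the cache.
// Assumption: `tags` arrives as a comma-separated string and the GraphQL
// id is available as `admin_graphql_api_id`.
function applyProductWebhook(cache, payload) {
  const tags = payload.tags
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
  cache.set(payload.admin_graphql_api_id, tags);
  return cache;
}

// Hypothetical sample data, for illustration only.
const sampleJsonl = [
  '{"id":"gid://shopify/Product/1","tags":["sale","summer"]}',
  '{"id":"gid://shopify/Product/2","tags":[]}',
].join("\n");

const tagCache = loadTagsFromJsonl(sampleJsonl);
applyProductWebhook(tagCache, {
  admin_graphql_api_id: "gid://shopify/Product/2",
  tags: "new, winter",
});
```

With a cache like this, the slow bulk query runs only for the initial load, and every later change costs a single map update.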

Related

Firestore rules and data structure

I have a question regarding data structure and rules ... I have content on which users can vote. Something like this:
Firestore object:
{
name: "Cat",
description: "A cat named Cat",
votes: 56
}
Now ... I want authenticated users to be able to have update access to the votes, but not to any other values of the object and of course read rights since the content has to be displayed.
I did this because I wanted to avoid additional queries when displaying the content.
Should I create another collection "votes" maybe where the votes are kept and for each document make an additional request to get them?

In rules, you have access to the state of the data both before and after the writes - so you can test specific fields to be sure they have not changed:
function existing() {
  return resource.data;
}
function resulting() {
  return request.resource.data;
}
function matchField(fieldName) {
  return existing()[fieldName] == resulting()[fieldName];
}
....
allow update: if matchField("name") && matchField("description")
....
The functions just make the rule easier to read.
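To make the before/after comparison concrete, here is the same field check modelled in plain JavaScript. This is only an illustration of the logic and is not Firestore rules syntax; `canUpdate` and the sample objects are hypothetical, and the authentication check the question asks for is left out.

```javascript
// Plain-JavaScript model of the rule: an update is allowed only when the
// protected fields are identical before and after the write.
function matchField(existing, resulting, fieldName) {
  return existing[fieldName] === resulting[fieldName];
}

function canUpdate(existing, resulting) {
  return (
    matchField(existing, resulting, "name") &&
    matchField(existing, resulting, "description")
  );
}

// The document from the question, plus two candidate writes.
const before = { name: "Cat", description: "A cat named Cat", votes: 56 };
const voteOnly = { ...before, votes: 57 };
const renamed = { ...before, name: "Dog", votes: 57 };

const voteAllowed = canUpdate(before, voteOnly); // only votes changed
const renameAllowed = canUpdate(before, renamed); // name changed as well
```

A write that touches only `votes` passes, while one that also changes `name` is rejected, which is the behaviour the rule above is meant to enforce.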

Why am I able to bypass pagination when I call the same field twice (with different queries) in GitHub's GraphQL API

I noticed something I don't understand while trying to get the number of open issues per repository for a user.
When I use the following query I am asked to perform pagination (as expected) -
query {
  user(login: "armsp") {
    repositories {
      nodes {
        name
        issues(states: OPEN) {
          totalCount
        }
      }
    }
  }
}
The error message after running the above -
{
  "data": {
    "user": null
  },
  "errors": [
    {
      "type": "MISSING_PAGINATION_BOUNDARIES",
      "path": [
        "user",
        "repositories"
      ],
      "locations": [
        {
          "line": 54,
          "column": 5
        }
      ],
      "message": "You must provide a `first` or `last` value to properly paginate the `repositories` connection."
    }
  ]
}
However, when I do the following I actually get all the results, which doesn't make any sense to me -
query {
  user(login: "armsp") {
    repositories {
      totalCount
    }
    repositories {
      nodes {
        name
        issues(states: OPEN) {
          totalCount
        }
      }
    }
  }
}
Shouldn't I be asked for pagination in the second query too?

TL;DR: This appears to be a bug. There's no way to bypass the limit applied when fetching a list of resources.
Limiting responses like this is a common feature of public APIs: if the response could include thousands or millions of results, fulfilling it all at once ties up a lot of server resources. Allowing users to make those sorts of queries is both costly and a potential security risk.
GitHub's intent appears to be to always limit the number of results when fetching a list of resources. This isn't well documented on the GraphQL side, but it matches the behavior of their REST API:
Requests that return multiple items will be paginated to 30 items by default. You can specify further pages with the ?page parameter. For some resources, you can also set a custom page size up to 100 with the ?per_page parameter.
For connections, it looks like the check for the first or last parameter is only run when the nodes field is present in the selection set. This makes sense, since this is ultimately the field we want to limit. Requesting other fields like totalCount or totalDiskUsage, even without a limit argument, is harmless with regard to the above concerns.
Things get funky when you consider how GraphQL handles selection sets with selections that have the same name. Without getting into the nitty gritty details, GraphQL will let you request the same field multiple times. If the field in question has a selection set, it will effectively merge the selection sets into a single one. So
query {
  user(login: "armsp") {
    repositories {
      totalCount
    }
    repositories {
      totalDiskUsage
    }
  }
}
becomes and is equivalent to
query {
  user(login: "armsp") {
    repositories {
      totalCount
      totalDiskUsage
    }
  }
}
Side note: The above does not hold true if you explicitly give one of the fields an alias since then the two fields have different response names.
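The merging behaviour can be illustrated with a toy model. The sketch below is not how GitHub or any GraphQL server implements it: it represents a selection set as a plain object keyed by response name (`null` for a leaf field, a nested object for a sub-selection) and ignores arguments, aliases and fragments. `mergeSelections` is a hypothetical helper.

```javascript
// Merge selection sets that share response names.
// Key = response name; value = null (leaf) or a nested selection set.
function mergeSelections(selections) {
  const merged = {};
  for (const selection of selections) {
    for (const [name, sub] of Object.entries(selection)) {
      if (sub === null) {
        // Leaf field: requesting it twice is the same as requesting it once.
        if (!(name in merged)) merged[name] = null;
      } else {
        // Field with a selection set: merge the sub-selections recursively.
        const existing = merged[name] || {};
        merged[name] = mergeSelections([existing, sub]);
      }
    }
  }
  return merged;
}

// The two `repositories` selections from the example above.
const firstSelection = { repositories: { totalCount: null } };
const secondSelection = { repositories: { totalDiskUsage: null } };
const mergedSelection = mergeSelections([firstSelection, secondSelection]);
```

Here `mergedSelection` contains a single `repositories` entry holding both `totalCount` and `totalDiskUsage`, mirroring the equivalent query shown above.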
All that to say, technically this query:
query {
  user(login: "armsp") {
    repositories {
      totalCount
    }
    repositories {
      nodes {
        name
        issues(states: OPEN) {
          totalCount
        }
      }
    }
  }
}
should also blow up with the same MISSING_PAGINATION_BOUNDARIES error. The fact that it doesn't means the selection set merging is somehow borking the check that's in place. This is clearly a bug. However, even though this appears to "work", it still doesn't get around whatever limits GitHub has applied at the storage layer: you will always get at most 100 results even when exploiting the above bug.

Meteor Subscriptions Selecting the Entire Set?

I've defined a publication:
Meteor.publish('uninvited', function (courseId: string) {
  return Users.find({
    'profile.courses': {
      $ne: courseId
    }
  });
});
So, when a subscriber subscribes to this, I expect Users.find() to return only users that are not enrolled in that particular course. On my client, I write:
this.uninvitedSub = MeteorObservable.subscribe("uninvited", this.courseId).subscribe(() => {
  this.uninvited = Users.find().zone();
});
I expect uninvited to contain only a subset of users, however, I'm getting the entire set of users regardless of whether or not they are enrolled in a particular course. I've made sure that my data is correct and that there are users enrolled in the course that I'm concerned with. I've also verified that this.courseId is working as expected. Is there an error with my code, or should I further look into my data to see if there's anything wrong with it?
Note: when I write this:
this.uninvitedSub = MeteorObservable.subscribe("uninvited", this.courseId).subscribe(() => {
  this.uninvited = Users.find({
    'profile.courses': {}
  }).zone();
});
With this, it works as expected! Why? The difference is that my query now contains 'profile.courses': {}.

MongoDB Social Network Adding Followers

I'm implementing a social network in MongoDB and I need to keep track of Followers and Following for each User. When I search for Users I want to display a list like Facebook with the User Name, Picture and number of Followers & Following. If I just wanted to display the User Name and Picture (info that doesn't change) it would be easy, but I also need to display the number of Followers & Following (which changes fairly regularly).
My current strategy is to embed the People a User follows into each User Document:
firstName: "Joe",
lastName: "Bloggs",
follows: [
  {
    _id: ObjectId("520534b81c9aac710d000002"),
    profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
    name: "Mark Rogers",
  },
  {
    _id: ObjectId("51f26293a5c5ea4331cb786a"),
    name: "The Palace Bar",
    profilePictureUrl: "https://s3-eu-west-1.amazonaws.com/businesses/xxx.jpg",
  }
]
The question is - What is the best strategy to keep track of the number of Followers & Following for each User?
If I include the number of Follows / Following as part of the embedded document i.e.
follows: [
  {
    _id: ObjectId("520534b81c9aac710d000002"),
    profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
    name: "Mark Rogers",
    followers: 10,
    following: 400
  }
]
then every time a User follows someone, multiple updates are required across all the embedded documents.
Since the consistency of this data isn't really important (i.e. showing someone that I have 10 followers instead of 11 isn't the end of the world), I can queue this update. Is this approach OK, or can anyone suggest a better one?

You're on the right track. Think about which calculation is performed more - determining the number of followers/following or changing number of followers/following? Even if you're caching the output of the # of followers/following calculation it's still going to be performed one or two orders of magnitude more often than changing the number.
Also, think about the opposite. If you really need to display the number of followers/following for each of those users, you'll have to then do an aggregate on each load (or cache it somewhere, but you're still doing a lot of calcs).
Option 1: Cache the number of followers/following in the embedded document.
Upsides: Can display stats in O(1) time
Downsides: Requires O(N) time to follow/unfollow
Option 2: Count the number of followers/following on each page view (or cache invalidation)
Upsides: Can follow/unfollow in O(1) time
Downsides: Requires O(N) time to display
Add in the fact that follower/following stats can be eventually consistent whereas the counts have to be displayed on demand and I think it's a pretty easy decision to cache it.
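The trade-off between the two options can be sketched with a small in-memory model. This is a simplification under stated assumptions: a `Map` stands in for the users collection, the user ids are hypothetical, and the counters live on each user record only. The O(N) write cost described for Option 1 comes from also refreshing the copies embedded in other users' `follows` arrays, which this sketch leaves out.

```javascript
// In-memory stand-in for the users collection (hypothetical data).
const userStore = new Map([
  ["joe", { follows: [], followers: 0, following: 0 }],
  ["mark", { follows: [], followers: 0, following: 0 }],
  ["ann", { follows: [], followers: 0, following: 0 }],
]);

// Option 1: maintain cached counters at write time, so reads are O(1).
function follow(followerId, followeeId) {
  const follower = userStore.get(followerId);
  const followee = userStore.get(followeeId);
  if (follower.follows.includes(followeeId)) return; // already following
  follower.follows.push(followeeId);
  follower.following += 1;
  followee.followers += 1;
}

// Option 2: count followers at read time by scanning every user, O(N).
function countFollowers(userId) {
  let count = 0;
  for (const user of userStore.values()) {
    if (user.follows.includes(userId)) count += 1;
  }
  return count;
}

follow("joe", "mark");
follow("ann", "mark");
follow("joe", "mark"); // duplicate follow, ignored

const cachedFollowers = userStore.get("mark").followers; // Option 1 read
const countedFollowers = countFollowers("mark"); // Option 2 read
```

Both reads agree, but the cached value is a single field lookup while the counted value grows more expensive with every user added.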

I've gone ahead and implemented the followers/following update using the strategy recommended by Mason (Option 1). Here's my code in Node.js and Mongoose, using the Async.js waterfall pattern, in case anyone is interested or has any opinions. I haven't implemented queuing yet, but the plan is to farm most of this off to a queue.
async.waterfall([
  function (callback) {
    /** find & update the person we are following */
    Model.User
      .findByIdAndUpdate(id, {$inc: {followers: 1}}, {upsert: true, select: {fullName: 1, profilePictureUrl: 1, address: 1, following: 1, followers: 1}})
      .lean()
      .exec(callback);
  },
  function (followee, callback) {
    /** find & update the person doing the following */
    var query = {
      $inc: {following: 1},
      $addToSet: {follows: followee}
    };
    Model.User
      .findByIdAndUpdate(credentials.username, query, {upsert: true, select: {fullName: 1, profilePictureUrl: 1, address: 1, following: 1, followers: 1}})
      .lean()
      .exec(function (err, follower) {
        callback(err, follower, followee);
      });
  },
  function (follower, followee, callback) {
    /** update the following count; lean() returns plain objects, so match on _id rather than the id virtual */
    Model.User
      .update({'follows._id': follower._id}, {'follows.$.following': follower.following}, {upsert: true, multi: true}, function (err) {
        callback(err, followee);
      });
  },
  function (followee, callback) {
    /** update the followers count; again match on _id because the document is lean */
    Model.User
      .update({'follows._id': followee._id}, {'follows.$.followers': followee.followers}, {upsert: true, multi: true}, callback);
  }
], function (err) {
  if (err)
    next(err);
  else {
    res.send(HTTPStatus.OK);
    next();
  }
});

CouchDB - an outer join on a many to many relationship

I have a CouchDB database with three sets of documents: Items, Users, and Reviews.
A many-to-many relationship between Items and Users is maintained in the Review documents.
User
{"type":"user","user_id":"U1"},
{"type":"user","user_id":"U2"},
{"type":"user","user_id":"U3"}
Item
{"type":"item","item_id":"I1"},
{"type":"item","item_id":"I2"},
{"type":"item","item_id":"I3"},
{"type":"item","item_id":"I4"}
Review
{"type":"review","item_id":"I1","user_id":"U1","score":4},
{"type":"review","item_id":"I1","user_id":"U2","score":3},
{"type":"review","item_id":"I2","user_id":"U1","score":4},
{"type":"review","item_id":"I3","user_id":"U3","score":1}
I want an outer join of Items and Users through Reviews that produces the results below.
Intended Result
{"total_rows":16,"offset":0,"rows":[
{"id":"...","key":"I1","value":["item"]},
{"id":"...","key":"I1","value":["review","U1",4]},
{"id":"...","key":"I1","value":["review","U2",3]},
{"id":"...","key":"I1","value":["review","U3",0]},
{"id":"...","key":"I2","value":["item"]},
{"id":"...","key":"I2","value":["review","U1",4]},
{"id":"...","key":"I2","value":["review","U2",0]},
{"id":"...","key":"I2","value":["review","U3",0]},
{"id":"...","key":"I3","value":["item"]},
{"id":"...","key":"I3","value":["review","U1",0]},
{"id":"...","key":"I3","value":["review","U2",0]},
{"id":"...","key":"I3","value":["review","U3",1]},
{"id":"...","key":"I4","value":["item"]},
{"id":"...","key":"I4","value":["review","U1",0]},
{"id":"...","key":"I4","value":["review","U2",0]},
{"id":"...","key":"I4","value":["review","U3",0]}
]}
I am using the tips mentioned here:
http://wiki.apache.org/couchdb/EntityRelationship#Many_to_Many:_Relationship_documents
"many_join_review": {
  "map": "function(doc) {
    if (doc.type == 'review') {
      emit(doc.item_id, [doc.type, doc.user_id, doc.score]);
    } else if (doc.type == 'item') {
      emit(doc.item_id, [doc.type]);
    }
  }"
}
I am getting the below result instead
{"total_rows":8,"offset":0,"rows":[
{"id":"d5e26b9da1683232d1c208241a0056fc","key":"I1","value":["item"]},
{"id":"d5e26b9da1683232d1c208241a006bc8","key":"I1","value":["review","U1",4]},
{"id":"d5e26b9da1683232d1c208241a006be0","key":"I1","value":["review","U2",3]},
{"id":"d5e26b9da1683232d1c208241a005eb0","key":"I2","value":["item"]},
{"id":"d5e26b9da1683232d1c208241a0075cf","key":"I2","value":["review","U1",4]},
{"id":"d5e26b9da1683232d1c208241a006409","key":"I3","value":["item"]},
{"id":"d5e26b9da1683232d1c208241a008313","key":"I3","value":["review","U3",1]},
{"id":"d5e26b9da1683232d1c208241a0065d7","key":"I4","value":["item"]}
]}
So what should I change to get the intended result, i.e. how do I get 0 as the score for items that a user has not reviewed? Do I need to add something under the reduce section?
Thanks
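A map function sees one document at a time, so it cannot emit rows for reviews that do not exist. One possible workaround, offered here as an assumption rather than an established CouchDB recipe, is to fetch the three document sets and fill in the missing pairs on the client. The sketch below uses the sample documents from the question; `outerJoin` is a hypothetical helper and the rows omit the `id` field.

```javascript
// Fill in every missing (item, user) pair with a score of 0.
function outerJoin(items, users, reviews) {
  // Index existing reviews by "item|user" for constant-time lookup.
  const scores = new Map();
  for (const review of reviews) {
    scores.set(`${review.item_id}|${review.user_id}`, review.score);
  }
  const rows = [];
  for (const item of items) {
    rows.push({ key: item.item_id, value: ["item"] });
    for (const user of users) {
      const pair = `${item.item_id}|${user.user_id}`;
      const score = scores.has(pair) ? scores.get(pair) : 0;
      rows.push({ key: item.item_id, value: ["review", user.user_id, score] });
    }
  }
  return rows;
}

// Sample documents copied from the question.
const userDocs = [{ user_id: "U1" }, { user_id: "U2" }, { user_id: "U3" }];
const itemDocs = [
  { item_id: "I1" }, { item_id: "I2" }, { item_id: "I3" }, { item_id: "I4" },
];
const reviewDocs = [
  { item_id: "I1", user_id: "U1", score: 4 },
  { item_id: "I1", user_id: "U2", score: 3 },
  { item_id: "I2", user_id: "U1", score: 4 },
  { item_id: "I3", user_id: "U3", score: 1 },
];

const joinedRows = outerJoin(itemDocs, userDocs, reviewDocs);
```

The result has one `item` row plus one `review` row per user for each item, in the same order as the intended result listed above.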
