Paging through posts with `https://api.linkedin.com/rest/posts` does not work as expected - linkedin-api

I am using https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=50 to retrieve all posts for my organization.
Expected Result
According to https://learn.microsoft.com/en-us/linkedin/shared/api-guide/concepts/pagination?context=linkedin%2Fcontext&view=li-lms-2022-06 I should be able to get all posts by checking the result length against the count parameter: as long as it equals count, the result set is not yet complete and I should continue paging to get all the posts (see the sketch at the end of this question).
Observed Result
The number of posts retrieved for each page varies, even though the paging continues to return more results. The last call gives me a system error, even though there are more posts to expect, as the next links in the result also indicate (see log below). In fact, I know that there are about 700 posts for my organization. Also, calls such as
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=100 give me results again.
What can be done about this? Is there a bug in the API? How can I page successfully through my organization's posts?
Below is a log of my paging through the results. I have printed the paging metadata that came back with each result, as well as result_count, the number of elements returned.
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=0
{'count': 10, 'links': [], 'start': 0}
result_count: 9 start: 0, total: 10
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=10
{'count': 10,
 'links': [{'href': '/rest/posts?author=urn%3Ali%3Aorganization%3A267051&count=10&q=author&start=0',
            'rel': 'prev',
            'type': 'application/json'}],
 'start': 10}
result_count: 8 start: 0, total: 18
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=20
{'count': 10,
 'links': [{'href': '/rest/posts?author=urn%3Ali%3Aorganization%3A267051&count=10&q=author&start=10',
            'rel': 'prev',
            'type': 'application/json'}],
 'start': 20}
result_count: 7 start: 0, total: 25
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=30
{'count': 10,
 'links': [{'href': '/rest/posts?author=urn%3Ali%3Aorganization%3A267051&count=10&q=author&start=20',
            'rel': 'prev',
            'type': 'application/json'}],
 'start': 30}
result_count: 9 start: 0, total: 34
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=40
{'count': 10,
 'links': [{'href': '/rest/posts?author=urn%3Ali%3Aorganization%3A267051&count=10&q=author&start=30',
            'rel': 'prev',
            'type': 'application/json'},
           {'href': '/rest/posts?author=urn%3Ali%3Aorganization%3A267051&count=10&q=author&start=50',
            'rel': 'next',
            'type': 'application/json'}],
 'start': 40}
result_count: 10 start: 0, total: 44
https://api.linkedin.com/rest/posts?q=author&author=urn%3Ali%3Aorganization%3A267051&count=10&start=50
I previously used the https://api.linkedin.com/rest/shares API: its paging worked well, but it did not give me the post URNs, so I decided to switch to the Posts API. Now I am struggling with its paging, as described above.
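For reference, here is a minimal sketch of the paging loop described above (JavaScript with fetch, purely for illustration; ACCESS_TOKEN and LINKEDIN_VERSION are placeholders for my real credentials and API version):

const BASE = 'https://api.linkedin.com/rest/posts';
const AUTHOR = encodeURIComponent('urn:li:organization:267051');

async function fetchAllPosts() {
    const posts = [];
    const count = 10;
    for (let start = 0; ; start += count) {
        const res = await fetch(`${BASE}?q=author&author=${AUTHOR}&count=${count}&start=${start}`, {
            headers: {
                'Authorization': `Bearer ${ACCESS_TOKEN}`, // placeholder
                'LinkedIn-Version': LINKEDIN_VERSION,      // placeholder, e.g. a YYYYMM version string
                'X-Restli-Protocol-Version': '2.0.0',
            },
        });
        if (!res.ok) throw new Error(`HTTP ${res.status} at start=${start}`);
        const page = await res.json();
        posts.push(...page.elements);
        // Per the pagination docs: a page with fewer than `count` elements is the last one.
        if (page.elements.length < count) break;
    }
    return posts;
}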

Related

MongoDB - Best way to delete documents by query based on results of another query

I have a collection that can contain several million documents; for simplicity, let's say they look like this:
{'_id': '1', 'user_id': 1, 'event_type': 'a', 'name': 'x'}
{'_id': '2', 'user_id': 1, 'event_type': 'b', 'name': 'x'}
{'_id': '3', 'user_id': 1, 'event_type': 'c', 'name': 'x'}
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
{'_id': '8', 'user_id': 4, 'event_type': 'a', 'name': 'x'}
{'_id': '9', 'user_id': 4, 'event_type': 'b', 'name': 'x'}
{'_id': '10', 'user_id': 4, 'event_type': 'c', 'name': 'x'}
I want a daily job that runs and deletes all documents by user_id if that user_id has a doc with event_type 'c'.
So the resulting collection will be
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
I did it successfully with the mongo shell like this:
var cur = db.my_collection.find({'event_type': 'c'});
var ids = [];
while (cur.hasNext()) {
    ids.push(cur.next()['user_id']);
    if (ids.length == 5) {
        print('deleting for user_ids', ids);
        print(db.my_collection.deleteMany({user_id: {$in: ids}}));
        ids = [];
    }
}
if (ids.length) { db.my_collection.deleteMany({user_id: {$in: ids}}); }
I created a cursor over all docs with event_type 'c', grouped their user_ids into batches of 5, then deleted all docs with those ids. It works, but it looks very slow: each cur.next() seems to fetch only one doc at a time.
I wanted to know if there is a better or more correct way to achieve this. If it were Elasticsearch, I would create a sliced scroll, scan each slice in parallel, and submit parallel deleteByQuery requests with 1000 ids each. Is something like this possible/preferable with Mongo?
Scale-wise, I expect several million docs (~10M) in the collection, ~300K docs that match the query, and ~700K that should be deleted.
It sounds like you can just use deleteMany with the original query:
db.my_collection.deleteMany({
    event_type: 'c'
})
There are no size limitations on it; it might just take a couple of minutes to run, depending on instance size.
EDIT:
I would personally try to use the distinct function; it's the cleanest and easiest code. However, distinct does have a 16 MB result limit, and ~300K unique ids a day (depending on the user_id field size) sounds a bit close to that threshold, or past it.
const userIds = db.my_collection.distinct('user_id', { event_type: 'c'});
db.my_collection.deleteMany({user_id: {$in: userIds}})
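If the 16 MB limit does become a problem, one possible workaround (a sketch, not something benchmarked here) is to collect the distinct user_ids through an aggregation with $group: aggregation results stream back through a cursor as many small documents, so the single-document limit of distinct does not apply.

// Sketch: $group yields one small doc per distinct user_id via a cursor,
// sidestepping distinct()'s 16 MB single-response limit.
const userIds = db.my_collection.aggregate([
    { $match: { event_type: 'c' } },
    { $group: { _id: '$user_id' } }
]).toArray().map(doc => doc._id);
db.my_collection.deleteMany({ user_id: { $in: userIds } });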
Assuming you expect scale to increase, or this fails your tests, then the best way is to use something similar to your approach, just in much larger batches. For example:
const batchSize = 100000;
const count = await db.my_collection.countDocuments({'event_type': 'c'});
let iteration = 0;
while (iteration * batchSize < count) {
    const batch = await db.my_collection
        .find({'event_type': 'c'}, { projection: { user_id: 1 } })
        .limit(batchSize)
        .toArray();
    if (batch.length === 0) {
        break;
    }
    // Deleting by user_id also removes the matched 'c' docs themselves,
    // so the next find() naturally picks up unprocessed users.
    await db.my_collection.deleteMany({ user_id: { $in: batch.map(v => v.user_id) } });
    iteration++;
}

PostgreSQL: Return filtered rows along with total number of rows matching a condition

Say I have a table, posts, that I'm able to query and filter on a client app. Each post has a type associated with it, and I'd like to be able to see both the filtered posts on the client and the total number of rows that match the filters by type on a dashboard. Obviously, I'd like to do this in a single query. It's also important to note that I'm paginating the data so I can't just use filter(...).length in some backend logic, as there might be 100000 posts but only 10 returned to the client.
Here's my query that correctly filters the data:
knex('posts')
    .select('id', 'created_at', 'content', 'type')
    .modify((builder) => filterPosts(builder, filters))
    .paginate({ currentPage: offset, perPage: limit })
I'm wondering if there's some way to count the number of posts (by type) that match the filters, and return those counts in my existing query.
E.g. my results currently look like this:
[
    {
        id: 123,
        created_at: "Jan 1, 2022",
        content: "Lorem ipsum",
        type: "Type 1"
    },
    {
        id: 456,
        created_at: "Feb 1, 2022",
        content: "Ipsum dolor",
        type: "Type 2"
    }
    ...
]
and I'd like something like this:
[
    {
        id: 123,
        created_at: "Jan 1, 2022",
        content: "Lorem ipsum",
        type: "Type 1",
        countType1: 3, // Total rows where type = "Type 1" that match the filters
        countType2: 6  // Total rows where type = "Type 2" that match the filters
    },
    {
        id: 456,
        created_at: "Feb 1, 2022",
        content: "Ipsum dolor",
        type: "Type 2",
        countType1: 3,
        countType2: 6
    }
    ...
]
I've tried using a window function, but so far I can only get the number of posts of the current row's type, not all types:
knex('posts')
    .select(
        'id',
        'created_at',
        'content',
        'type',
        'count(*) over (partition by posts.type)' // If I could add a WHERE clause here I'd be golden
    )
    .modify((builder) => filterPosts(builder, filters))
    .paginate({ currentPage: offset, perPage: limit })
The above gives:
[
    {
        id: 123,
        created_at: "Jan 1, 2022",
        content: "Lorem ipsum",
        type: "Type 1",
        count: 3
    },
    {
        id: 456,
        created_at: "Feb 1, 2022",
        content: "Ipsum dolor",
        type: "Type 2",
        count: 6
    }
    ...
]
This isn't optimal: due to the pagination, the client might receive 10 posts that are all Type 2 and conclude there are 0 posts of Type 1.
Open to suggestions on how to improve this, any help is greatly appreciated!
This post seems to be on the right track, but I can't figure out how to get it working for my scenario.
I was able to solve this by building off that post I linked:
knex
    .with(
        'posts',
        knex
            .from(
                knex('posts').select(
                    { id: 'posts.id' },
                    { created_at: 'posts.created_at' },
                    { content: 'posts.content' },
                    { type: 'posts.type' },
                )
            )
            .modify((builder) => filterPosts(builder, filters)) // Contains joins and where clauses
    )
    .from('posts')
    .rightJoin(
        knex.raw(
            "(select count(1) filter(where posts.type = 'Type 1'), count(1) filter(where posts.type = 'Type 2') from posts) c(type_1_count, type_2_count) on true"
        )
    )
    .select(
        'posts.*',
        'type_1_count',
        'type_2_count'
    );
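As a side note, the same per-type totals can be computed without the CTE: PostgreSQL allows a FILTER clause on an aggregate used as a window function with an empty OVER (). A sketch, assuming the same filterPosts helper as above:

// Sketch: every returned row carries both totals via filtered window aggregates.
knex('posts')
    .select(
        'id',
        'created_at',
        'content',
        'type',
        knex.raw("count(*) filter (where posts.type = 'Type 1') over () as type_1_count"),
        knex.raw("count(*) filter (where posts.type = 'Type 2') over () as type_2_count")
    )
    .modify((builder) => filterPosts(builder, filters))
    .paginate({ currentPage: offset, perPage: limit });

One caveat: if filterPosts adds joins that multiply rows, the window counts multiply with them, in which case the CTE version above is the safer choice.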

Mapbox: changing polygon colors based on the area the user is viewing

I'm currently building a map using Mapbox GL. On this map there are polygons that are colored based on one metric.
The metric ranges between 1 and 25.
I have only a 12-color palette.
The goals would be to:
Retrieve the top-left, top-right, bottom-left and bottom-right corners of the user's map screen.
Get all the polygons that fit into that area (SQL request).
From all those polygons, retrieve the metric MIN and MAX.
Create 12 value ranges based on MIN and MAX.
How could I reload the color of each polygon shown on the map, based on the 12 value ranges received from the back-end? This color reload needs to run when the user stops moving the map.
Here is a sample of my code:
map.addLayer({
    'id': 'terrain1-data',
    'type': 'fill',
    'source': 'level_hight',
    'source-layer': 'sold_level_high-36rykl',
    'maxzoom': zoomThresholdZoneHtM,
    'paint': {
        'fill-color': [
            'interpolate',
            ['linear'],
            ['to-number', ['get', 'MYMETRIC']],
            0, '#FFFFFF',
            5, '#008855',
            6, '#13be00',
            7, '#75e100',
            8, '#aee500',
            9, '#dfff00',
            10, '#fff831',
            11, '#ffe82f',
            12, '#ffd500',
            13, '#ffa51f',
            14, '#ff7b16',
            15, '#ff0a02',
            16, '#c80000'
        ],
        'fill-opacity': [
            'case',
            ['boolean', ['feature-state', 'hover'], false],
            0.8,
            0.5
        ],
        'fill-outline-color': '#000000'
    }
});
Thanks in advance. Sorry, I'm just getting started with Mapbox.
If my understanding is correct, you want to set the numbers in 'interpolate' dynamically. That is, in the following snippet, you want to change 0, 5, ... according to the data, right?
'fill-color': [
    'interpolate',
    ['linear'],
    ['to-number', ['get', 'MYMETRIC']],
    0,
    '#FFFFFF',
    5,
    ...
Then, normally you will get the data from your server, and there's a chance to calculate those numbers in JavaScript code. Putting the calculated numbers into 'interpolate' should work.
Here's a sample. The numbers are generated randomly:
map.on('load', function () {
    map.addSource('maine', {
        'type': 'geojson',
        'data': {
            'type': 'FeatureCollection',
            'features': [
                {
                    'type': 'Feature',
                    'geometry': {
                        'type': 'Polygon',
                        'coordinates': [
                            [[0, 0], [0, 1], [1, 1], [1, 0]]
                        ]
                    },
                    'properties': {
                        'height': 10
                    }
                },
                {
                    'type': 'Feature',
                    'geometry': {
                        'type': 'Polygon',
                        'coordinates': [
                            [[1, 1], [1, 2], [2, 2], [2, 1]]
                        ]
                    },
                    'properties': {
                        'height': 20
                    }
                },
            ]
        }
    });
    map.addLayer({
        'id': 'maine',
        'type': 'fill',
        'source': 'maine',
        'layout': {},
        'paint': {
            'fill-color': [
                'interpolate',
                ['linear'],
                ['to-number', ['get', 'height']],
                c(10),       // lower stop: random number in [0, 10]
                '#000000',
                c(10) + 11,  // upper stop: +11 keeps it strictly above the lower one, as 'interpolate' requires ascending stops
                '#FFFFFF',
            ]
        }
    });
});

function c(n) {
    return Math.round(Math.random() * n);
}
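To cover the "when the user stops moving" part of the question, here is a sketch that refreshes the stops on moveend. The /ranges endpoint and the shape of its response are assumptions; the back-end would implement the SQL and MIN/MAX steps from the question:

// Sketch: recolor the layer whenever the user stops moving the map.
// '/ranges' and its [value, color] response shape are hypothetical.
map.on('moveend', async function () {
    const b = map.getBounds(); // gives the four corners of the visible area
    const res = await fetch(
        '/ranges?sw=' + b.getWest() + ',' + b.getSouth() +
        '&ne=' + b.getEast() + ',' + b.getNorth()
    );
    const stops = await res.json(); // e.g. [[1, '#FFFFFF'], [3, '#008855'], ...]
    map.setPaintProperty('terrain1-data', 'fill-color', [
        'interpolate',
        ['linear'],
        ['to-number', ['get', 'MYMETRIC']],
    ].concat(stops.flat())); // append the [value, color] pairs as expression stops
});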

Empty related_resources in response from credit card payment

I'm using paypalrestsdk for a credit card payment. When I switch to SANDBOX mode and make a request, the PayPal service returns me this:
{'update_time': u'2016-11-17T16:47:46Z',
 'payer': {
     'payment_method': u'credit_card',
     'funding_instruments': [
         {'credit_card': {
             'first_name': u'first_name',
             'billing_address': {'city': u'London', 'postal_code': u'123', 'line1': u'fooo', 'country_code': u'EN'},
             'expire_month': u'12',
             'number': u'xxxxxxxxxxxx1111',
             'last_name': u'last_name',
             'expire_year': u'2020',
             'type': u'visa'}}]},
 'links': [
     {'href': u'https://api.sandbox.paypal.com/v1/payments/payment/PAY-1GH35642K71421451LAW56MQ',
      'method': u'GET',
      'rel': u'self'}],
 'transactions': [
     {'item_list': {
         'items': [
             {'currency': u'USD',
              'price': u'367.77',
              'name': u'Foooo',
              'quantity': u'10'}],
         'shipping_address': {'city': u'London', 'line1': u'line1', 'recipient_name': u'name', 'phone': u'321312', 'state': u'state', 'postal_code': u'123', 'country_code': u'EN'}},
      'related_resources': [],
      'amount': {'currency': u'USD', 'total': u'3688.77', 'details': {'subtotal': u'3677.70', 'shipping': u'11.07'}},
      'description': u'Charge for order: #1'}],
 'state': u'created',
 'create_time': u'2016-11-17T16:47:46Z',
 'intent': u'sale',
 'id': u'PAY-1GH35642K71421451LAW56MQ'}
Why is related_resources empty? How can I test my code in sandbox mode? Of course, in PRODUCTION mode related_resources contains sales, as in this example: https://developer.paypal.com/docs/integration/direct/accept-credit-cards/
The credit card number I use is 4111111111111111.
4111111111111111 does not work anymore; you can try another one, like 4929931129414294.

Django/NetworkX Eliminating Repeated Nodes

I want to use d3js to visualize the connections between the users of my Django website.
I am reusing the code from the force-directed graph example, which requires that each node has two attributes (ID and Name). I created a node for each user in user_profiles_table and added an edge between already-created nodes based on each row in connections_table. It does not work: networkx creates new nodes when I start working with the connection_table.
nodeindex = 0
for user_profile in UserProfile.objects.all():
    sourcetostring = user_profile.full_name3()
    G.add_node(nodeindex, name=sourcetostring)
    nodeindex = nodeindex + 1

for user_connection in Connection.objects.all():
    target_tostring = user_connection.target()
    source_tostring = user_connection.source()
    G.add_edge(sourcetostring, target_tostring, value=1)

data = json_graph.node_link_data(G)
result:
{'directed': False,
 'graph': [],
 'links': [{'source': 6, 'target': 7, 'value': 1},
           {'source': 7, 'target': 8, 'value': 1},
           {'source': 7, 'target': 9, 'value': 1},
           {'source': 7, 'target': 10, 'value': 1},
           {'source': 7, 'target': 7, 'value': 1}],
 'multigraph': False,
 'nodes': [{'id': 0, 'name': u'raymondkalonji'},
           {'id': 1, 'name': u'raykaeng'},
           {'id': 2, 'name': u'raymondkalonji2'},
           {'id': 3, 'name': u'tester1cet'},
           {'id': 4, 'name': u'tester2cet'},
           {'id': 5, 'name': u'tester3cet'},
           {'id': u'tester2cet'},
           {'id': u'tester3cet'},
           {'id': u'tester1cet'},
           {'id': u'raykaeng'},
           {'id': u'raymondkalonji2'}]}
How can I eliminate the repeated nodes?
You probably get repeated nodes because your user_connection.target() and user_connection.source() functions return the node name, not its id. When you call add_edge, any endpoint that does not exist in the graph is created, which explains why you get duplicates.
The following code should work.
import networkx as nx
from networkx.readwrite import json_graph

G = nx.Graph()
for user_profile in UserProfile.objects.all():
    source = user_profile.full_name3()
    G.add_node(source, name=source)  # use the name itself as the node id

for user_connection in Connection.objects.all():
    target = user_connection.target()
    source = user_connection.source()
    G.add_edge(source, target, value=1)  # endpoints now match existing node ids

data = json_graph.node_link_data(G)
Also note that you should dump the data object to JSON if you want a properly formatted JSON string. You can do that as follows:
import json

json_string = json.dumps(data)  # get the string representation

with open('somefile.json', 'w') as f:
    json.dump(data, f)  # write to a file (json.dump expects a file object, not a filename)