Retrieving full list of transactions for hyperledger - ibm-cloud

May I ask whether there is a way to retrieve all the transactions I have in my blockchain?
For instance, "A" transferred 10 to "B" (first transaction), "B" transferred 10 to "A" (second transaction), so on and so forth.
Thus, is there a way to retrieve the list of transactions and display it?

If you want all of them, you can iterate over all the blocks in the blockchain using the REST endpoint. This will give you each block's information, including all the transactions in that block.
You can get the height (length) of the blockchain from the GET /chain endpoint.
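A minimal Python sketch of that loop. Only GET /chain comes from the answer above; the per-block path /chain/blocks/{n} and the field names height and transactions are assumptions based on the Fabric v0.6-style REST API and should be checked against your peer. get_json is a hypothetical helper that you supply to perform the HTTP GET and decode the JSON body.

```python
from typing import Any, Callable, Dict, List


def list_all_transactions(get_json: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Walk every block and collect its transactions.

    get_json is a caller-supplied (hypothetical) function that performs an
    HTTP GET on the given path and returns the decoded JSON body.
    """
    height = get_json("/chain")["height"]  # assumed field name
    transactions: List[Dict[str, Any]] = []
    for block_number in range(height):
        block = get_json(f"/chain/blocks/{block_number}")  # assumed path
        # Some blocks (for example the first one) may carry no transactions.
        transactions.extend(block.get("transactions") or [])
    return transactions
```

Injecting get_json keeps the loop independent of any particular HTTP client, so it can be exercised against canned responses before pointing it at a real peer.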

Mongo - Tree data with frequent full load

I am using mongodb and storing tree data( MongoDB is the only option for now ).
           10                ->>> Root node
          /  \
         /    \
        8      6             ---->> 8 & 6 child node of 10
       / \    / \
      4   5  2   1           ---->> 4 & 5 child node of 8 ...
Each node is a separate document in mongoDB and each document has bunch of fields.
Sample data,
{
  "_id": "234463456453643563456",
  "name": "Mike",
  "empId": "10",
  "managerId": "8",
  "hierarchy": [
    8,
    10
  ],
  "projects": ["123", "456", "789"]
}
Here, the hierarchy field holds the manager ids from the first level up to the top level.
Any field of any document might be updated, and a node might move to any location. Basically, an org change.
I have a use case where changes are captured in another system and my system is updated with the full active load (200k records out of 800k records) every 2 hours.
Here, if there is an org change such as 8 moving under 6, the bottom-to-top hierarchy will change for all nodes under 8. If the full load fails midway, the org hierarchy results will not be correct until the complete load is done.
The result should reflect either the state before the full update or the state after it, never something in between. I was thinking of versioning to handle this. Is there a better way to handle this with Mongo?
There are about 200k records in a full load, but the actual changes are often fewer than 1k records, which we cannot know in advance.
If you need an all-or-nothing (atomic) database update, where your database clients must not read an invalid mid-update state, then you need a transaction.
You can optimize by recognizing that some subsets of the graph are valid after you update them, so queries against those subsets are valid, and then you don't need to use the transaction feature of the database.
But you'll still be blocking or rejecting queries from some clients, and that makes your schema, queries or architecture more complex.
If this is a business issue, then I'd push on the business requirements. (You didn't say whether you're in a position to do that.)
Your clients are already reading data that are potentially 2 hours out of date. If the batch update you're applying is sorted, then you can make those updates in time-order, and your clients will always be receiving a state that was recently valid (but maybe not the most recent).

REST, Pagination with filters dependent on external system and sql

I have a REST web-service which is expected to expose a paginated GET call.
For eg: I have a list of students( "Name" , "Age" , "Class" ) in my sql table. And I have to expose a paginated API to get all students given a class. So far so good. Just a typical REST api does the job and pagination can be achieved by the sql query.
Now suppose we have the same requirement just that we need to send back students who are from particular state. This information is hosted by a web-service, S2. S2 has an API which given a list of student names and a state "X" returns the students that belong to state X.
Here is where I'm finding it difficult to support pagination.
E.g.: I get a request with page_size 10, a class C and a state X, which results in 10 students from class C from my db. Now I make a call to S2 with these 10 students and state X. In return, the result may include 0 students, all 10 students, or any number of students between 0 and 10 from state 'X'.
How do I support pagination in this case?
Brute force would be to keep making db calls and S2 calls until the page size is met and only then reply. I don't like this approach.
Is there a common practice followed for this, a general rule of thumb, or is this architecture a bad service design?
(EDIT): Also, please explain how to manage the offset value.
If we go with some approach and get the result set, how can I manage the offset for the next page request?
Thanks for reading :)
Your service should handle the pagination itself rather than hand it off to SQL. Follow these steps:
Get all students from S1 (SQL database) where class = C.
Using the result, get all students from S2 that are in the result and where state = X.
Sort the second result in a stable way.
Get the requested page you want from the sorted result.
All this is done in the code that calls both S1 and S2. Only it has the knowledge to build the pages.
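The four steps above can be sketched in Python as follows. This is only an illustration: fetch_class and filter_by_state are hypothetical stand-ins for the S1 query and the S2 call, and sorting by name is an assumption about what a stable order means here.

```python
from typing import Callable, List, Sequence


def get_page(
    fetch_class: Callable[[str], List[str]],
    filter_by_state: Callable[[Sequence[str], str], List[str]],
    class_name: str,
    state: str,
    page: int,
    page_size: int,
) -> List[str]:
    """Return one page (1-based) of students in class_name who live in state."""
    candidates = fetch_class(class_name)           # step 1: all students from S1
    matching = filter_by_state(candidates, state)  # step 2: keep those S2 confirms
    ordered = sorted(matching)                     # step 3: stable, repeatable order
    start = (page - 1) * page_size                 # step 4: slice out the page
    return ordered[start:start + page_size]
```

Because the whole filtered list is rebuilt on every request, the offset for the next page is simply the next page number.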
Not doing the pagination in SQL can lead to performance problems with large databases.
A solution in between can be applied. I assume that the pagination parameters (offset, page size) are configurable for both services, yours and the external one.
You can implement prefetch logic for both services; let's say the prefetch chunk size is 100.
The frontend can still be served with the required page size of 10.
If the prefetched chunks do not yield a full frontend page of 10, the backend should prefetch another chunk until the frontend can be served with 10 students.
This approach requires more backend logic to calculate the next offsets for prefetching, but if you want both performance and pagination solved, you must invest some effort.
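A rough Python sketch of that prefetch loop, which also shows one way to manage the offset asked about in the question. fetch_chunk and filter_by_state are hypothetical stand-ins for the SQL query (with LIMIT and OFFSET) and the S2 call. The returned offset is a row position in the SQL result, not in the filtered result, and it assumes the SQL ordering is stable between requests.

```python
from typing import Callable, List, Sequence, Tuple


def get_filtered_page(
    fetch_chunk: Callable[[int, int], List[str]],
    filter_by_state: Callable[[Sequence[str]], List[str]],
    offset: int,
    page_size: int = 10,
    chunk_size: int = 100,
) -> Tuple[List[str], int]:
    """Return (page, next_offset) where next_offset is a row position in SQL."""
    page: List[str] = []
    while len(page) < page_size:
        chunk = fetch_chunk(offset, chunk_size)  # e.g. ORDER BY name LIMIT ? OFFSET ?
        if not chunk:
            break                                # database exhausted
        kept = set(filter_by_state(chunk))       # one S2 call per chunk
        for name in chunk:
            offset += 1                          # this row is now consumed
            if name in kept:
                page.append(name)
                if len(page) == page_size:
                    break                        # leave the rest for the next page
    return page, offset
```

The client sends next_offset back with the following request, so rows that were fetched but not consumed are simply fetched again.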

Inter-microservices Communication using REST & PUB/SUB

This is still a theory in my mind.
I'm rebuilding my backend by splitting things into microservices. The microservices I'm imagining for starting off are:
- Order (stores order details and status of each order)
- Customer (stores customer details, addresses, orders booked)
- Service Provider (stores service provider details, status & location of each service provider, order(s) currently being processed by the service provider, etc.)
- Payment (stores payment info for each order)
- Channel (communicates with customers via email / SMS / mobile push)
I hope to be able to use PUB/SUB to create a message with corresponding data, which can be used by any other microservice subscribing to that message.
First off, I understand the concept that each microservice should have complete code & data isolation (thus, on different instances / VMs); and that all microservices should communicate strictly using HTTP REST API contracts.
My doubts are as follows:
To show a list of orders, I'll be using the Order DB to get all orders. In each Order document (I'll be using MongoDB for storage), I'll be having a customer_id Foreign Key. Now the issue of resolving customer_name by using customer_id.
If I need to show 100 orders on the page and go with the assumption that each order has a unique customer_id associated with it, then will I need to do a REST API call 100 times so as to get the names of all the 100 customer_ids?
Or, is data replication a good solution for this problem?
I am envisioning something like this w.r.t. PUB/SUB: The business center personnel mark an order as assigned & select the service provider to allot to that order. This creates a message on the cross-server PUB/SUB channel.
Then, the Channel microservice (which is on a totally different instance / VM) captures this message & sends a Push message & SMS to the service provider's device using the data within the message's contents.
Is this possible at all?
UPDATE TO QUESTION 2: I want the Order microservice to be completely independent of any other microservices that will be built upon / side-by-side it. Channel microservice is an example of a microservice that depends upon events taking place within Order microservice.
Also, please guide me as to what all technologies / libraries to use.
What I'll be developing on:
Java
MongoDB
Amazon AWS instances for each microservice.
Would appreciate anyone's help on this.
Thanks!
#1
If I need to show 100 orders and each order has a unique customer_id, will I need to do 100 REST API calls?
No, just make 1 request with 100 order_id(s) and return a dictionary of order_id <=> customer_id
#2
It's a single request
POST
/orders/new
{
"selected_service_provider_id" : "123"
...
}
Which can return you order_id and you can print it locally for the customer or track progress or what have you.
On the server side, you receive an order and process it. Processing can include sending an SMS at some stage. This functionality can be implemented inside original service that received this request or as a separate call to another dedicated service.
To your first question: you don't need to do 100 queries, just one with an array of your 100 ids, like the following:
db.collection.find( { _id : { $in : [1,2,3,4] } } );
https://stackoverflow.com/a/7713461/1384539
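On the calling side, the batched lookup might look like this Python sketch. fetch_customers is a hypothetical function standing in for whatever performs the single call (an $in query like the one above, or a bulk endpoint on the Customer service), and the field names are assumptions.

```python
from typing import Callable, Dict, List


def attach_customer_names(
    orders: List[Dict[str, str]],
    fetch_customers: Callable[[List[str]], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Add customer_name to each order using a single batched lookup."""
    # Deduplicate first: several orders may share one customer.
    unique_ids = sorted({order["customer_id"] for order in orders})
    names = fetch_customers(unique_ids)  # one call for all ids
    return [
        {**order, "customer_name": names.get(order["customer_id"], "")}
        for order in orders
    ]
```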
I know this question is 1 year old, but I would like to add my answer to the first point.
One option would be to use some form of CQRS and also store some of the user details in the Order DB when creating an order. This way, when you have to show the list of orders, you already have all the details you need. The order document would also represent a snapshot of the user's state at the moment the order was created.
Of course, if you don't have the user details when storing the order, you just need to make a GET call to the User Service, but that would be 1 call, not 100.

In CQRS/Event Sourcing, which is the best approach for a parent to modify the state of all its children?

Usecase: Suppose I have the following aggregates
Root aggregate - CustomerRootAggregate (manages each CustomerAggregate)
Child aggregate of the Root aggregate - CustomerAggregate (there are 10 customers)
Question: How do I send DisableCustomer command to all the 10 CustomerAggregate to update their state to be disabled ?
customerState.enabled = false
Solutions: Since CQRS does not allow the write side to query the read side to get a list of CustomerAggregate IDs I thought of the following:
1. CustomerRootAggregate always stores the IDs of all its CustomerAggregates in the database as json. When a Command for DisableAllCustomers is received by CustomerRootAggregate, it will fetch the CustomerIds json and send a DisableCustomer command to all the children, where each child will restore its state before applying the DisableCustomer command. But this means I will have to maintain the consistency of the CustomerIds json record.
2. The Client (Browser - UI) should always send the list of CustomerIds to apply DisableCustomer to. But this will be problematic for a database with thousands of customers.
3. In the REST API layer, check for the command DisableAllCustomers, fetch all the IDs from the read side, and send DisableAllCustomers(ids) with the IDs populated to the write side.
Which of these is the recommended approach, or is there a better one?
Root aggregate - CustomerRootAggregate (manages each CustomerAggregate)
Child aggregate of the Root aggregate - CustomerAggregate (there are 10 customers)
For starters, the language "child aggregate" is a bit confusing. If your model includes a "parent" entity that holds a direct reference to a "child" entity, then both of those entities must be part of the same aggregate.
However, you might have a Customer aggregate for each customer, and a CustomerSet aggregate that manages a collection of Id.
How do I send DisableCustomer command to all the 10 CustomerAggregate to update their state to be disabled ?
The usual answer is that you run a query to get the set of Customers to be disabled, and then you dispatch a disableCustomer command to each.
So both 3 and 2 are reasonable answers, with the caveat that you need to consider what your requirements are if some of the DisableCustomer commands fail.
2 in particular is seductive, because it clearly articulates that the client (human operator) is describing a task, which the application then translates into commands to be run by the domain model.
Trying to pack "thousands" of customer ids into the message may be a concern, but for several use cases you can find a way to shrink that down. For instance, if the task is "disable all", then client can send to the application instructions for how to recreate the "all" collection -- ie: "run this query against this specific version of the collection" describes the list of customers to be disabled unambiguously.
When a Command for DisableAllCustomers is received by CustomerRootAggregate it will fetch the CustomerIds json and send DisableCustomer command to all the children where each child will restore its state before applying DisableCustomer command. But this means I will have to maintain CustomerIds json record's consistency.
This is close to a right idea, but not quite there. You dispatch a command to the collection aggregate. If it accepts the command, it produces an event that describes the customer ids to be disabled. This domain event is persisted as part of the event stream of the collection aggregate.
Subscribe to these events with an event handler that is responsible for creating a process manager. This process manager is another event sourced state machine. It looks sort of like an aggregate, but it responds to events. When an event is passed to it, it updates its own state, saves those events off in the current transaction, and then schedules commands to each Customer aggregate.
But it's a bunch of extra work to do it that way. Conventional wisdom suggests that you should usually begin by assuming that the process manager approach isn't necessary, and only introduce it if the business demands it. "Premature automation is the root of all evil" or something like that.
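For illustration only, the process manager described above could be sketched in Python like this. The class, event and command names are hypothetical, and persistence of the manager's own events is left out; the point is just the shape: events in, state update, commands out.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DisableCustomersProcess:
    """State machine that reacts to events and emits commands (hypothetical names)."""

    pending: Set[str] = field(default_factory=set)
    done: Set[str] = field(default_factory=set)

    def on_customers_disable_requested(self, customer_ids: List[str]) -> List[Tuple[str, str]]:
        # Event from the collection aggregate: remember the work, schedule commands.
        new_ids = [cid for cid in customer_ids if cid not in self.pending | self.done]
        self.pending.update(new_ids)
        return [("DisableCustomer", cid) for cid in new_ids]

    def on_customer_disabled(self, customer_id: str) -> None:
        # Event from a Customer aggregate: mark that piece of work as finished.
        self.pending.discard(customer_id)
        self.done.add(customer_id)

    @property
    def completed(self) -> bool:
        return not self.pending
```

Tracking pending and done ids is what lets the manager ignore a redelivered event and answer the question of what to do when some DisableCustomer commands fail: the failed ids simply stay pending.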

API pagination best practices

I'd love some help handling a strange edge case with a paginated API I'm building.
Like many APIs, this one paginates large results. If you query /foos, you'll get 100 results (i.e. foo #1-100), and a link to /foos?page=2 which should return foo #101-200.
Unfortunately, if foo #10 is deleted from the data set before the API consumer makes the next query, /foos?page=2 will offset by 100 and return foos #102-201.
This is a problem for API consumers who are trying to pull all foos - they will not receive foo #101.
What's the best practice to handle this? We'd like to make it as lightweight as possible (i.e. avoiding handling sessions for API requests). Examples from other APIs would be greatly appreciated!
I'm not completely sure how your data is handled, so this may or may not work, but have you considered paginating with a timestamp field?
When you query /foos you get 100 results. Your API should then return something like this (assuming JSON, but if it needs XML the same principles can be followed):
{
  "data" : [
    { data item 1 with all relevant fields },
    { data item 2 },
    ...
    { data item 100 }
  ],
  "paging": {
    "previous": "http://api.example.com/foo?since=TIMESTAMP1",
    "next": "http://api.example.com/foo?since=TIMESTAMP2"
  }
}
Just a note, only using one timestamp relies on an implicit 'limit' in your results. You may want to add an explicit limit or also use an until property.
The timestamp can be dynamically determined using the last data item in the list. This seems to be more or less how Facebook paginates in its Graph API (scroll down to the bottom to see the pagination links in the format I gave above).
One problem may be if you add a data item, but based on your description it sounds like they would be added to the end (if not, let me know and I'll see if I can improve on this).
If you've got pagination you also sort the data by some key. Why not let API clients include the key of the last element of the previously returned collection in the URL and add a WHERE clause to your SQL query (or something equivalent, if you're not using SQL) so that it returns only those elements for which the key is greater than this value?
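A small runnable sketch of that WHERE clause, using Python's sqlite3 module with an in-memory table. The table and column names are made up for the example, and it assumes positive integer ids as the sort key.

```python
import sqlite3
from typing import List, Optional, Tuple


def fetch_page(conn: sqlite3.Connection, after_id: Optional[int], limit: int) -> List[Tuple[int, str]]:
    """Return up to `limit` rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, name FROM foos WHERE id > ? ORDER BY id LIMIT ?",
        (after_id if after_id is not None else 0, limit),  # ids assumed positive
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foos VALUES (?, ?)", [(i, f"foo{i}") for i in range(1, 8)])

first = fetch_page(conn, None, 3)              # ids 1, 2, 3
conn.execute("DELETE FROM foos WHERE id = 2")  # a row on page 1 disappears
second = fetch_page(conn, first[-1][0], 3)     # still starts at id 4
```

Because the second request is anchored on the last key seen rather than on a row count, the deletion does not shift the window.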
You have several problems.
First, you have the example that you cited.
You also have a similar problem if rows are inserted, but in this case the user gets duplicate data (arguably easier to manage than missing data, but still an issue).
If you are not snapshotting the original data set, then this is just a fact of life.
You can have the user make an explicit snapshot:
POST /createquery
filter.firstName=Bob&filter.lastName=Eubanks
Which results:
HTTP/1.1 301 Here's your query
Location: http://www.example.org/query/12345
Then you can page that all day long, since it's now static. This can be reasonably light weight, since you can just capture the actual document keys rather than the entire rows.
If the use case is simply that your users want (and need) all of the data, then you can simply give it to them:
GET /query/12345?all=true
and just send the whole kit.
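A toy Python sketch of the snapshot idea, with an in-memory dictionary standing in for wherever the frozen key lists would really be stored. The function names are hypothetical and only mirror the POST /createquery and GET /query/{id} calls above.

```python
import itertools
from typing import Callable, Dict, List

_snapshots: Dict[str, List[int]] = {}
_query_ids = itertools.count(1)


def create_query(all_keys: List[int], predicate: Callable[[int], bool]) -> str:
    """Freeze the matching keys and return a query id."""
    query_id = str(next(_query_ids))
    _snapshots[query_id] = [key for key in all_keys if predicate(key)]
    return query_id


def get_query_page(query_id: str, page: int, page_size: int) -> List[int]:
    """Page through the frozen key list; later inserts and deletes cannot shift it."""
    keys = _snapshots[query_id]
    start = (page - 1) * page_size
    return keys[start:start + page_size]
```

Only keys are captured, so a key whose row has since been deleted still appears in the snapshot; the server would have to skip or flag it when loading the actual rows.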
There may be two approaches depending on your server side logic.
Approach 1: When server is not smart enough to handle object states.
You could send the unique ids of all cached records to the server, for example ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"], along with a boolean parameter indicating whether you are requesting new records (pull to refresh) or old records (load more).
Your server should be responsible for returning the new records (load-more records, or new records via pull to refresh) as well as the ids of records deleted from ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"].
Example:-
If you are requesting load more then your request should look something like this:-
{
"isRefresh" : false,
"cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"]
}
Now suppose you are requesting old records (load more), the "id2" record has been updated by someone, and the "id5" and "id8" records have been deleted from the server. Then your server response should look something like this:-
{
"records" : [
{"id" :"id2","more_key":"updated_value"},
{"id" :"id11","more_key":"more_value"},
{"id" :"id12","more_key":"more_value"},
{"id" :"id13","more_key":"more_value"},
{"id" :"id14","more_key":"more_value"},
{"id" :"id15","more_key":"more_value"},
{"id" :"id16","more_key":"more_value"},
{"id" :"id17","more_key":"more_value"},
{"id" :"id18","more_key":"more_value"},
{"id" :"id19","more_key":"more_value"},
{"id" :"id20","more_key":"more_value"}],
"deleted" : ["id5","id8"]
}
But in this case, if you have a lot of locally cached records, say 500, then your request string will be too long, like this:-
{
"isRefresh" : false,
"cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10",………,"id500"]//Too long request
}
Approach 2: When server is smart enough to handle object states according to date.
You could send the id of the first record, the id of the last record, and the epoch time of the previous request. This way your request is always small, even if you have a large number of cached records.
Example:-
If you are requesting load more then your request should look something like this:-
{
"isRefresh" : false,
"firstId" : "id1",
"lastId" : "id10",
"last_request_time" : 1421748005
}
Your server is responsible for returning the ids of the records deleted after last_request_time, as well as the records between "id1" and "id10" that were updated after last_request_time.
{
"records" : [
{"id" :"id2","more_key":"updated_value"},
{"id" :"id11","more_key":"more_value"},
{"id" :"id12","more_key":"more_value"},
{"id" :"id13","more_key":"more_value"},
{"id" :"id14","more_key":"more_value"},
{"id" :"id15","more_key":"more_value"},
{"id" :"id16","more_key":"more_value"},
{"id" :"id17","more_key":"more_value"},
{"id" :"id18","more_key":"more_value"},
{"id" :"id19","more_key":"more_value"},
{"id" :"id20","more_key":"more_value"}],
"deleted" : ["id5","id8"]
}
It may be tough to find best practices since most systems with APIs don't accommodate for this scenario, because it is an extreme edge, or they don't typically delete records (Facebook, Twitter). Facebook actually says each "page" may not have the number of results requested due to filtering done after pagination.
https://developers.facebook.com/blog/post/478/
If you really need to accommodate this edge case, you need to "remember" where you left off. jandjorgensen's suggestion is just about spot on, but I would use a field guaranteed to be unique, like the primary key. You may need to use more than one field.
Following Facebook's flow, you can (and should) cache the pages already requested and just return those with deleted rows filtered if they request a page they had already requested.
Option A: Keyset Pagination with a Timestamp
In order to avoid the drawbacks of offset pagination you have mentioned, you can use keyset based pagination. Usually, the entities have a timestamp that states their creation or modification time. This timestamp can be used for pagination: Just pass the timestamp of the last element as the query parameter for the next request. The server, in turn, uses the timestamp as a filter criterion (e.g. WHERE modificationDate >= receivedTimestampParameter)
{
  "elements": [
    {"data": "data", "modificationDate": 1512757070},
    {"data": "data", "modificationDate": 1512757071},
    {"data": "data", "modificationDate": 1512757072}
  ],
  "pagination": {
    "lastModificationDate": 1512757072,
    "nextPage": "https://domain.de/api/elements?modifiedSince=1512757072"
  }
}
This way, you won't miss any element. This approach should be good enough for many use cases. However, keep the following in mind:
You may run into endless loops when all elements of a single page have the same timestamp.
You may deliver many elements multiple times to the client when elements with the same timestamp are overlapping two pages.
You can make those drawbacks less likely by increasing the page size and using timestamps with millisecond precision.
Option B: Extended Keyset Pagination with a Continuation Token
To handle the mentioned drawbacks of the normal keyset pagination, you can add an offset to the timestamp and use a so-called "Continuation Token" or "Cursor". The offset is the position of the element relative to the first element with the same timestamp. Usually, the token has a format like Timestamp_Offset. It's passed to the client in the response and can be submitted back to the server in order to retrieve the next page.
{
  "elements": [
    {"data": "data", "modificationDate": 1512757070},
    {"data": "data", "modificationDate": 1512757072},
    {"data": "data", "modificationDate": 1512757072}
  ],
  "pagination": {
    "continuationToken": "1512757072_2",
    "nextPage": "https://domain.de/api/elements?continuationToken=1512757072_2"
  }
}
The token "1512757072_2" points to the last element of the page and states "the client already got the second element with the timestamp 1512757072". This way, the server knows where to continue.
Please mind that you have to handle cases where the elements got changed between two requests. This is usually done by adding a checksum to the token. This checksum is calculated over the IDs of all elements with this timestamp. So we end up with a token format like this: Timestamp_Offset_Checksum.
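To make the Timestamp_Offset part concrete, here is a simplified Python sketch over an in-memory list. It is not the continuation-token library's implementation, it leaves out the checksum, and it assumes elements are (modificationDate, id) pairs sorted by timestamp and then id.

```python
from typing import List, Optional, Tuple

Element = Tuple[int, str]  # (modificationDate, id)


def next_page(
    elements: List[Element], token: Optional[str], page_size: int
) -> Tuple[List[Element], Optional[str]]:
    """Return (page, continuation_token) using a Timestamp_Offset token."""
    ordered = sorted(elements)  # by timestamp, then id
    timestamp: Optional[int] = None
    offset = 0
    remaining = ordered
    if token is not None:
        ts_text, offset_text = token.split("_")
        timestamp, offset = int(ts_text), int(offset_text)
        candidates = [e for e in ordered if e[0] >= timestamp]
        remaining = candidates[offset:]  # skip what the client already has
    page = remaining[:page_size]
    if not page:
        return [], None
    last_ts = page[-1][0]
    # Total number of elements with last_ts that the client has now received.
    delivered = sum(1 for e in page if e[0] == last_ts)
    if last_ts == timestamp:
        delivered += offset
    return page, f"{last_ts}_{delivered}"
```

With three elements stamped 1512757070, 1512757072 and 1512757072 in the first page, the returned token is "1512757072_2", matching the example above.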
For more information about this approach check out the blog post "Web API Pagination with Continuation Tokens". A drawback of this approach is the tricky implementation as there are many corner cases that have to be taken into account. That's why libraries like continuation-token can be handy (if you are using Java/a JVM language). Disclaimer: I'm the author of the post and a co-author of the library.
Pagination is generally a "user" operation and to prevent overload both on computers and the human brain you generally give a subset. However, rather than thinking that we don't get the whole list it may be better to ask does it matter?
If an accurate live scrolling view is needed, REST APIs which are request/response in nature are not well suited for this purpose. For this you should consider WebSockets or HTML5 Server-Sent Events to let your front end know when dealing with changes.
Now if there's a need to get a snapshot of the data, I would just provide an API call that provides all the data in one request with no pagination. Mind you, you would need something that would do streaming of the output without temporarily loading it in memory if you have a large data set.
For my case I implicitly designate some API calls to allow getting the whole information (primarily reference table data). You can also secure these APIs so it won't harm your system.
Just to add to this answer by Kamilk : https://www.stackoverflow.com/a/13905589
It depends a lot on how large a dataset you are working with. Offset pagination works effectively on small datasets, but large real-time datasets require cursor pagination.
I found a wonderful article on how Slack evolved its API's pagination as their datasets grew, explaining the positives and negatives at every stage: https://slack.engineering/evolving-api-pagination-at-slack-1c1f644f8e12
I think your API is currently responding the way it should: it returns the first 100 records on the page in the overall order of the objects you maintain. Your explanation suggests that you are using some kind of ordering ids to define the order of your objects for pagination.
Now, if you want page 2 to always start at 101 and end at 200, then you must let the number of entries on a page vary, since entries are subject to deletion.
You should do something like the below pseudocode:
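page_max = 100

def get_page_results(page_no):
    # Fixed id window per page, so page 2 always covers ids 101-200.
    start = (page_no - 1) * page_max + 1
    end = page_no * page_max
    # fetch_results_by_id_between is a placeholder for your data-access call.
    return fetch_results_by_id_between(start, end)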
Another option for pagination in RESTful APIs is to use the HTTP Link header. For example, GitHub uses it as follows:
Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
<https://api.github.com/user/repos?page=50&per_page=100>; rel="last"
The possible values for rel are: first, last, next, previous. But by using Link header, it may be not possible to specify total_count (total number of elements).
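On the client side, a header like the one above can be split into a rel-to-URL mapping. The Python sketch below is deliberately simplified: it only handles the form shown here, where each link has a quoted rel parameter right after the URL, and it is not a full parser for the Link header grammar.

```python
import re
from typing import Dict

# Matches one <url>; rel="name" pair as used in the example above.
_LINK_PART = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel value (next, last, ...) to its URL."""
    return {rel: url for url, rel in _LINK_PART.findall(value)}
```

A client pulling all pages would keep following the "next" URL until the mapping no longer contains that key.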
I've thought long and hard about this and finally ended up with the solution I'll describe below. It's a pretty big step up in complexity but if you do make this step, you'll end up with what you are really after, which is deterministic results for future requests.
Your example of an item being deleted is only the tip of the iceberg. What if you are filtering by color=blue but someone changes item colors in between requests? Fetching all items in a paged manner reliably is impossible... unless... we implement revision history.
I've implemented it and it's actually less difficult than I expected. Here's what I did:
I created a single table changelogs with an auto-increment ID column
My entities have an id field, but this is not the primary key
The entities have a changeId field which is both the primary key as well as a foreign key to changelogs.
Whenever a user creates, updates or deletes a record, the system inserts a new record in changelogs, grabs the id and assigns it to a new version of the entity, which it then inserts in the DB
My queries select the maximum changeId (grouped by id) and self-join that to get the most recent versions of all records.
Filters are applied to the most recent records
A state field keeps track of whether an item is deleted
The max changeId is returned to the client and added as a query parameter in subsequent requests
Because only new changes are created, every single changeId represents a unique snapshot of the underlying data at the moment the change was created.
This means that you can cache the results of requests that have the parameter changeId in them forever. The results will never expire because they will never change.
This also opens up exciting features such as rollback / revert, syncing the client cache, etc.: any feature that benefits from change history.
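A compact sketch of this revision-history scheme using Python's sqlite3 module. The table and column names follow the description above; the color column, the state values and the helper names are assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE changelogs (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE entities (
    changeId INTEGER PRIMARY KEY REFERENCES changelogs(id),
    id INTEGER NOT NULL,
    color TEXT,
    state TEXT NOT NULL
);
""")


def write_version(entity_id: int, color: str, state: str = "active") -> int:
    """Append a new version of an entity and return its changeId."""
    change_id = conn.execute("INSERT INTO changelogs DEFAULT VALUES").lastrowid
    conn.execute("INSERT INTO entities VALUES (?, ?, ?, ?)", (change_id, entity_id, color, state))
    return change_id


def query_at(change_id: int, color: str) -> List[Tuple[int, str]]:
    """Latest non-deleted versions as of change_id, filtered by color."""
    return conn.execute(
        """SELECT e.id, e.color FROM entities e
           JOIN (SELECT MAX(changeId) AS changeId FROM entities
                 WHERE changeId <= ? GROUP BY id) latest
             ON e.changeId = latest.changeId
           WHERE e.state <> 'deleted' AND e.color = ?
           ORDER BY e.id""",
        (change_id, color),
    ).fetchall()
```

Note that the filter is applied after the latest version per id has been selected; filtering first would resurrect an old blue version of an item that has since turned red.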
Referring to API Pagination Design, we could design the pagination API around a cursor:
They have this concept, called cursor — it’s a pointer to a row. So you can say to a database “return me 100 rows after that one”. And it’s much easier for a database to do since there is a good chance that you’ll identify the row by a field with an index. And suddenly you don’t need to fetch and skip those rows, you’ll go directly past them.
An example:
GET /api/products
{"items": [...100 products],
"cursor": "qWe"}
API returns an (opaque) string, which you can use then to retrieve the next page:
GET /api/products?cursor=qWe
{"items": [...100 products],
"cursor": "qWr"}
Implementation-wise there are many options. Generally, you have some ordering criteria, for example, product id. In this case, you’ll encode your product id with some reversible algorithm (let’s say hashids). And on receiving a request with the cursor you decode it and generate a query like WHERE id > :cursor LIMIT 100.
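As a rough Python sketch of that encode/decode round trip: base64 is swapped in here for hashids purely to stay within the standard library, and an in-memory list stands in for the WHERE id > :cursor LIMIT query. The function names are made up for the example.

```python
import base64
from typing import Dict, List, Optional


def encode_cursor(last_id: int) -> str:
    """Turn the last seen id into an opaque-looking string."""
    return base64.urlsafe_b64encode(str(last_id).encode("ascii")).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Recover the id from a cursor produced by encode_cursor."""
    return int(base64.urlsafe_b64decode(cursor.encode("ascii")).decode("ascii"))


def list_products(product_ids: List[int], cursor: Optional[str], limit: int = 100) -> Dict[str, object]:
    """In-memory stand-in for WHERE id > :cursor ORDER BY id LIMIT :limit."""
    after = decode_cursor(cursor) if cursor else None
    items = [pid for pid in sorted(product_ids) if after is None or pid > after][:limit]
    # Only hand out a cursor when a full page came back, so clients know when to stop.
    next_cursor = encode_cursor(items[-1]) if len(items) == limit else None
    return {"items": items, "cursor": next_cursor}
```

base64 is trivially reversible by clients, so treat the cursor as opaque by convention only; it should not carry anything you would not put in a plain query parameter.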
Advantage:
The query performance of the database can be improved by using a cursor
It copes well when new content is inserted into the database while querying
Disadvantage:
It’s impossible to generate a previous page link with a stateless API