How to check random behavior with Postman?

I have an API which exhibits some randomness, e.g., there are two responses which occur randomly but equally often.
How can I test something like this with Postman?
One simple way would be to send the request multiple times and check the statistics of the responses (roughly half each in the example above), but this becomes problematic when there are many possible responses and the probability of each one is small.
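Something like this sketch of a Postman test script, run repeatedly via the Collection Runner (the response field "variant" is a stand-in for whatever distinguishes the responses, and the tolerance is arbitrary):

    // Runs in the Postman sandbox; execute the request N times with the
    // Collection Runner. "variant" is a hypothetical field that
    // distinguishes the possible responses.
    declare const pm: any; // provided by the Postman sandbox

    const counts = JSON.parse(pm.globals.get("variantCounts") || "{}");
    const variant = pm.response.json().variant;
    counts[variant] = (counts[variant] || 0) + 1;
    pm.globals.set("variantCounts", JSON.stringify(counts));

    // In a real run you would assert only on the final iteration.
    pm.test("variants are roughly balanced", () => {
        const totals = Object.values(counts) as number[];
        const total = totals.reduce((a, b) => a + b, 0);
        // 35-65% is an arbitrary tolerance; tighten it for larger runs.
        totals.forEach(v => pm.expect(v / total).to.be.within(0.35, 0.65));
    });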
An alternative would be to include a debug key in the request which sets the random seed. This would make it possible to "freeze" certain expected responses so only a few requests need to be made.
Are there any other alternatives? What is the usual way to do this?

Related

Multiple Dialogflow commands asked at same time

I have an Action where the user can set values of different parameters. Currently this is implemented with an Intent that captures one attribute and one value per phrase, and it works well.
Now I want to make the conversation less robot-like and more flexible, so I would like to allow users to set or change more than one value at a time. They should be able to say things like
Change the Interest Rate to 4% and the Term to 15 years.
or
Change the Interest Rate to 4%, the Term to 15 years, and the Years to Average Principal to 3.
There are a couple of ways to do this, but none of them are great, and all of them have issues of some sort when you try to scale them. (So they might work well for two or three parameters entered, but they probably won't work well for more than that.)
(It is worth noting, just for reference, that the Assistant itself has only recently started accepting more than one instruction at a time. But it only handles two, and this doesn't work for all commands.)
Add phrases with additional parameters
With this solution, you would supplement the phrases you have that collect one parameter with a similar set of phrases that collect two parameters, and then another set that collects three. You should be able to do these all as a single Intent and, in your fulfillment, determine which parameters have been set.
It might look something like this:
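(Illustrative phrases only; the @attribute entity and the parameter names are invented for this sketch:)

    Change the @attribute:attr1 to @sys.percentage:percent1
    Change the @attribute:attr1 to @sys.number:number1
    Change the @attribute:attr1 to @sys.percentage:percent1 and the @attribute:attr2 to @sys.number:number1
    Change the @attribute:attr1 to @sys.number:number1 and the @attribute:attr2 to @sys.percentage:percent1
    Change the @attribute:attr1 to @sys.number:number1, the @attribute:attr2 to @sys.number:number2, and the @attribute:attr3 to @sys.number:number3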
That looks like it starts getting complicated, doesn't it? You need to list each combination of absolute values and percentages. If you have other types, you need to include each of those combinations as well. That gets unwieldy at 3 possible parameters, and certainly beyond that. You also run the risk that it might get confused about which parameter should be set with which value (I haven't tested this; it is a theoretical concern).
Add an optional continuation phrase and handle that recursively
You can also treat this as the user saying "set a value, and then do something else" and treat the "do something else" part as another statement made to Dialogflow. The Intent might look something like this:
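(Again illustrative; additionalQuery would be an @sys.any parameter capturing the optional rest of the utterance:)

    Change the @attribute:attr1 to @sys.number:number1
    Change the @attribute:attr1 to @sys.number:number1 and @sys.any:additionalQuery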
You can implement the "another statement made to Dialogflow" using the Dialogflow API. With Dialogflow V1, you'd use the Query endpoint. With Dialogflow V2, you'd use the detectIntent endpoint. In either case, you'd send the additional part of the query (if the user said something) and would get back the results from that. You'd add the resulting message from the call to the message from setting the current set of values and send the whole thing back.
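A hedged sketch of that recursive call against the V2 detectIntent REST endpoint; the project ID, session ID, and token handling are placeholders you would supply yourself:

    // Send the leftover part of the user's utterance back through
    // Dialogflow and return the fulfillment text it produces.
    async function detectIntent(query: string, accessToken: string): Promise<string> {
        const url = "https://dialogflow.googleapis.com/v2/projects/PROJECT_ID" +
                    "/agent/sessions/SESSION_ID:detectIntent";
        const res = await fetch(url, {
            method: "POST",
            headers: {
                "Authorization": `Bearer ${accessToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                queryInput: { text: { text: query, languageCode: "en-US" } },
            }),
        });
        const data = await res.json();
        // Append this to the message built for the values already set.
        return data.queryResult?.fulfillmentText ?? "";
    }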
As a recursive call, however, this does take up time. Since the initial call to Dialogflow really needs to be answered within 5 seconds, every additional call to Dialogflow (and then to your fulfillment) needs to be handled as quickly as possible. But even so, you probably won't be able to handle more than 2 or 3 of these before things time out on the front end.
It also runs the risk (or benefit) that other intents besides the edit.attribute Intent might be called in the "additional" portion. If you want to limit the risk of this, you could set a context to make sure that only Intents that have that incoming context would be called.
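For example, a V2 webhook response that sets such a context might look like this (the IDs are placeholders; the context name format is fixed by Dialogflow):

    const webhookResponse = {
        fulfillmentText: "Interest Rate set to 4%. Anything else?",
        outputContexts: [{
            name: "projects/PROJECT_ID/agent/sessions/SESSION_ID" +
                  "/contexts/edit-attribute-followup",
            lifespanCount: 1, // only the very next turn can match on it
        }],
    };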
Summary
This really isn't an easy problem to solve. On one hand, you have the problem of having to list out every combination. On the other hand, recursion takes time, and you don't have a lot of time to process everything. In both cases, there is a real possibility of the phrase being understood incorrectly and you'll need to figure out error handling in the case where some values have been changed and others haven't.
You may need to experiment a lot, and the results may still not be satisfactory.
You can implement the "another statement made to Dialogflow" using the Dialogflow API. With Dialogflow V1, you'd use the Query endpoint. With Dialogflow V2, you'd use the detectIntent endpoint. In either case, you'd send the additional part of the query (if the user said something) and would get back the results from that. You'd add the resulting message from the call to the message from setting the current set of values and send the whole thing back.
As a recursive call, however, this does take up time. Since the initial call to Dialogflow really needs to be answered within 5 seconds, every additional call to Dialogflow (and then to your fulfillment) needs to be handled as quickly as possible. But even so, you probably won't be able to handle more than 2 or 3 of these before things time out on the front end.
The first thing that came to mind after reading those two paragraphs was batch requests.
A batch request allows a client application to pack multiple API calls into a single HTTP request (this batching technique is also known as a multi-part request).
Many Google APIs support a batch endpoint, and I was able to verify that Dialogflow has one by checking its API Discovery document. This batch endpoint is not formally documented in Dialogflow's API reference, but you can use the documentation of other APIs (like this one) to get a feel for how it works. This link should also be instructive now that the global batch endpoint is no longer supported.
Assuming your queries are independent (i.e., they don't rely on the results of other queries), you should be able to use a batch request to fetch more data.
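A sketch of what such a batch request might look like, assuming Dialogflow's batch endpoint follows the multipart/mixed convention documented for other Google APIs (the /batch path and the part shape here are extrapolated from those docs, not from Dialogflow's own):

    // Each part wraps one ordinary API call as an application/http payload.
    const boundary = "batch_example";
    const part = (path: string, payload: object) =>
        `--${boundary}\r\n` +
        "Content-Type: application/http\r\n\r\n" +
        `POST ${path} HTTP/1.1\r\n` +
        "Content-Type: application/json\r\n\r\n" +
        JSON.stringify(payload) + "\r\n";

    const query = (text: string) =>
        ({ queryInput: { text: { text, languageCode: "en-US" } } });

    const body =
        part("/v2/projects/PROJECT_ID/agent/sessions/S1:detectIntent",
             query("change the interest rate to 4%")) +
        part("/v2/projects/PROJECT_ID/agent/sessions/S1:detectIntent",
             query("change the term to 15 years")) +
        `--${boundary}--`;

    await fetch("https://dialogflow.googleapis.com/batch", {
        method: "POST",
        headers: {
            "Authorization": "Bearer ACCESS_TOKEN", // placeholder
            "Content-Type": `multipart/mixed; boundary=${boundary}`,
        },
        body,
    });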

Why is the Google Analytics API v3 ALWAYS triggering sampling at 50%?

I have built a very simple crawler for Google Analytics (v3), and it used to work well until this week, when I started to get sampled data in all queries.
I used to overcome sampling by simply reducing the date range of the queries, but now I get roughly 50% sampling of all sessions, even for sample spaces of fewer than 100 sessions.
It seems that something is triggering sampling, but I cannot figure out what it might be. Has anyone run into similar issues?
EDITED
We are also seeing sampling when querying the "Users Overview" standard report (along with others) from the GA web interface, even when there are only 883 sessions and we are asking for a single day.
A sample query is below, where we query several metrics over 3 dimensions, with a sample size of 883 sessions and sampling of around 50% (the query URL is cropped, but the parameters are listed under the "query" key).
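Since the query itself is cropped above, here is an illustrative reconstruction of how the v3 API reports sampling; the view ID, dates, and the exact dimension list are placeholders:

    // GA Core Reporting API v3 flags sampling on the response via
    // containsSampledData, sampleSize and sampleSpace.
    const params = new URLSearchParams({
        "ids": "ga:XXXXXXXX",             // placeholder view ID
        "start-date": "2018-05-01",       // a single day, as in the post
        "end-date": "2018-05-01",
        "metrics": "ga:users,ga:sessions",
        "dimensions": "ga:date,ga:appId", // third dimension not shown above
    });
    const res = await fetch(
        `https://www.googleapis.com/analytics/v3/data/ga?${params}`,
        { headers: { "Authorization": "Bearer ACCESS_TOKEN" } });
    const report = await res.json();
    if (report.containsSampledData) {
        console.log(`sampled: ${report.sampleSize} of ${report.sampleSpace}`);
    }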
It seems that the reason could be related to querying the ga:users metric with several dimensions, including ga:appId.
I have tried different combinations, and only ga:users returns sampled data when queried with dimensions beyond ga:date.
In summary, if I query any other metric from the example with the same 3 dimensions, it returns full-space data.
Two weeks ago this was not happening, so I suppose that Google has recently changed the way ga:users is computed.
Moreover, as a side effect I realized that querying users in batches is somewhat misleading if you plan to compute the total number of users, because you cannot simply sum them. That is, ga:users behaves like ga:1dayUsers when queried with ga:date, so the per-day counts cannot be aggregated (a user active on two days would be counted twice). Also odd is the fact that you cannot use ga:appId with ga:1dayUsers, but you can with ga:users.
We have also detected another problem after dropping ga:users from the crawler. This issue is related to the segment parameter, which also triggers sampling when used in combination with the remaining metrics and dimensions.
We collect data from several apps in the same view (not recommended, but it is that way for legacy reasons). Therefore we use a segment defined on the fly, like "sessions::condition::ga:appId=#com.xxx.yyy.zzz".
The fact is that when we filter that way we get sampled results, but with a common filter like "ga:appId=com.xxx.yyy.zzz" we do not.
The obvious question is why we use the segment-based filter instead of a standard filter. The reason is that we need it for specific metrics like ga:7dayUsers and related ones, which cannot be combined with ga:appId as a dimension, so ga:appId cannot be used in their filters either. Confusingly, for those metrics the segment-based filter does not produce sampled results.
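Side by side, the two querying styles described above as request parameters (I am writing them with the API's == exact-match operator; the app ID is the placeholder from the post):

    // Segment-based filter: required for ga:7dayUsers and related metrics
    // (which reject ga:appId as a dimension or filter); for those it did
    // NOT trigger sampling, though it did with other metric/dimension mixes.
    const segmentParams = {
        "metrics": "ga:7dayUsers",
        "dimensions": "ga:date",
        "segment": "sessions::condition::ga:appId==com.xxx.yyy.zzz",
    };

    // Standard filter: usable with the other metrics, and it returned
    // unsampled data.
    const filterParams = {
        "metrics": "ga:sessions",
        "dimensions": "ga:date,ga:appId",
        "filters": "ga:appId==com.xxx.yyy.zzz",
    };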
Now it seems that all our API calls are returning real data.
I am still not sure, however, why a default report in the web interface like "Users Overview" returns sampled data for a single day with fewer than 1000 sessions.
I hope this information helps anyone else having similar issues with sampling.

Best workflow for any RESTful operation in web CRUD

As a general rule for any RESTful CRUD operation, I follow these steps:
Validating information on client-side
Sending required information in JSON format to the server (possibly a web service)
Validating information on the server
Doing the operation
Returning JSON as the result of operation
Updating DOM based on server's response
Though this list is general, I think it's essentially complete. The only problem is that I do it for any and every operation. I mean, DRY (don't repeat yourself) tells us to stop repeating things. Is this considered repetition? Or should we always follow these steps?
Well, you can skip validating the data client-side if you wish…
Seriously, those steps are the necessary minimum for doing a lot of things: you must validate server-side to prevent a whole host of potential problems, and the other parts are simply fundamental. OK, you could skip sending to the server, but then you're not interacting with a REST service in the first place. You could also skip updating the DOM, but then you're not showing the results. In other words, every step of that sequence serves its own purpose, independent of the others: they're not redundant.
But that doesn't mean that you should ignore DRY. Not at all. Instead, you should factor out as much of that code as possible into a single place so as to keep the number of repetitions to a minimum. (Maybe even find a framework to do some of that for you.)
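As a sketch of that factoring (the endpoint paths and error handling here are illustrative), a single JSON helper keeps each call site down to the parts that actually differ:

    // Generic JSON round-trip used by every CRUD operation.
    async function api<T>(method: string, path: string, payload?: unknown): Promise<T> {
        const res = await fetch(path, {
            method,
            headers: { "Content-Type": "application/json" },
            body: payload === undefined ? undefined : JSON.stringify(payload),
        });
        if (!res.ok) throw new Error(`${method} ${path} failed: ${res.status}`);
        return res.json() as Promise<T>;
    }

    // Call sites: only the method, path and payload vary.
    const created = await api("POST", "/api/items", { name: "example" });
    const items = await api("GET", "/api/items");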

What is a RESTful resource in the context of large data sets, e.g., weather data?

So I am working on a web service to access our weather forecast data (10,000 locations, 40 parameters each, hourly values for the next 14 days = about 130 million values).
So I read all about RESTful services and their ideology.
I understand that a URL addresses a resource.
But what is a resource in my case?
The common use case is that you want to get the data for a couple of parameters over a timespan at one or more locations. So clearly giving every value its own URL is not practical and would result in hundreds of requests. I have the feeling that my specific problem doesn't exactly fit into the RESTful pattern.
Update: To clarify, there are two usage patterns of the service:
1. Raw data: rows and rows of data for several locations and parameters.
2. Interpreted data: the raw data computed into symbols (suns & clouds, for example) and other derived parameters.
There is not one 'forecast'. Different clients have different needs for data.
The reason I think this doesn't fit the REST pattern is that while I can have a "forecast" resource, I still have to submit a lot of request parameters. So a simple GET request on a resource doesn't work; I end up POSTing data all over the place.
So I am working on a web service to access our weather forecast data (10,000 locations, 40 parameters each, hourly values for the next 14 days = about 130 million values). ... But what is a resource in my case?
That depends on the details of your problem domain. Simply having a large amount of data is not a good reason to avoid REST. There are smart ways and dumb ways to model and expose that data.
As you rightly see, your main goal at this point should be to understand what exactly a resource is. Knowing only enough about weather forecasting to follow the Weather Channel, I won't be much help here. It's for domain experts like yourself to make that call.
If you were to explain in a little more detail the major domain concepts you're working with, it might make it a little easier to give specific advice.
For example, one resource might be Forecast. When weatherpeople talk about Forecasts, what words keep coming up? When you think about breaking a forecast down into smaller elements, what words do you use to describe the pieces?
Do this process recursively, and you'll probably be able to make a list of important terms. Don't forget that these terms can describe things or actions. Think about what these terms really mean, what data you can use to model them, how they can be aggregated.
At this point you'll have the makings of something you can start building a RESTful system around - but not before.
Don't forget that a RESTful system is not a data dump wrapped in HTTP - it's a hypertext-driven system.
Also don't forget that media types are the point of contact between your server and its clients. A media type is only limited by your imagination and can model datasets of any size if you're clever about it. It can contain XML, JSON, YAML, binary elements such as a Bloom Filter, or whatever works for the problem.
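For instance, a single forecast representation could carry many parameters and timesteps at once; the field names below are invented purely for illustration:

    // Hypothetical shape of a forecast media type.
    interface ForecastDocument {
        location: { id: string; name: string };
        issued: string;                            // ISO 8601 timestamp
        series: {
            parameter: string;                     // e.g. "temperature"
            unit: string;                          // e.g. "degC"
            values: { time: string; value: number }[];
        }[];
        links: { rel: string; href: string }[];    // hypertext controls
    }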
Firstly, there is no once-and-for-all right answer.
Each valid URL is something that makes sense to query; think of them as equivalents to providing query forms for people looking for your data - that might help you narrow down the scenarios.
It is a matter of personal taste and possibly the toolkit you use, as to what goes into the basic url path and what parameters are encoded. The debate is a bit like the XML debate over putting values in elements vs attributes. It is not always a rational or logically decided issue nor will everybody be kind in their comments on your decisions.
If you are using a backend like Rails, that implies certain conventions. Even if you're not using Rails, it makes sense to work in the same way unless you have a strong reason to change. That way, people writing clients to talk to Rails-based services will find yours easier to understand and it saves you on documentation time ;-)
Maybe you can use forecast as the resource and go deeper to fine-grained services with XLink.
Would it be possible to do something like this? Since you have so many parameters, I was thinking you could relate them to a mix of id/parameter combinations to decrease the URL size:
/WeatherForeCastService//day/hour
www.weatherornot.com/today/days/x // (where x is number of days)
www.weatherornot.com/today/9am/hours/h // (where h is number of hours)

A simple/practical example of the fuzzy c-means algorithm

I am writing my master thesis on the subject of dynamic keystroke authentication. To support ongoing research, I am writing code to test out different methods of feature extraction and feature matching.
My current simple approach just checks whether the reference password keycodes match the currently typed keycodes, and also checks whether the keypress times (dwell) and the key-to-key times (flight) are within +/- 100 ms of the reference times (tolerance). This is of course very limited, and I want to extend it with some sort of fuzzy c-means pattern matching.
For each key the features look like: keycode, dwelltime, flighttime (first flighttime is always 0).
Obviously the keycodes can be taken out of the fuzzy algorithm because they have to be exactly the same.
In this context, what would a practical implementation of fuzzy c-means look like?
Generally, you would do the following (a code sketch follows the list):
1. Determine how many clusters you want (2? "Authentic" and "Fake"?)
2. Determine what elements you want to cluster (individual keystrokes? login attempts?)
3. Determine what your feature vectors will look like (dwell time, flight time?)
4. Determine what distance metric you will be using (how will you measure the distance of each sample from each cluster?)
5. Create exemplar training data for each cluster type (what does an authentic login look like?)
6. Run the FCM algorithm on the training data to generate the clusters
7. To create the membership vector for any given login attempt sample, run it through the FCM algorithm using the clusters you found in step 6
8. Use the resulting membership vector to determine (based on some threshold criteria) whether the login attempt is authentic
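A minimal sketch of steps 6-7, assuming two-dimensional feature vectors of [dwell time, flight time]; the fuzzifier m = 2 and the convergence threshold are conventional defaults, not requirements:

    type Vec = number[];

    // Euclidean distance between two feature vectors.
    const dist = (a: Vec, b: Vec): number =>
        Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));

    function fcm(points: Vec[], c: number, m = 2, iters = 100, eps = 1e-5) {
        // Random initial membership matrix U (rows sum to 1).
        let U = points.map(() => {
            const row = Array.from({ length: c }, () => Math.random());
            const sum = row.reduce((a, b) => a + b, 0);
            return row.map(u => u / sum);
        });
        let centroids: Vec[] = [];
        for (let it = 0; it < iters; it++) {
            // Centroids: mean of all points weighted by membership^m.
            centroids = Array.from({ length: c }, (_, j) => {
                const w = U.map(row => row[j] ** m);
                const wSum = w.reduce((a, b) => a + b, 0);
                return points[0].map((_, d) =>
                    points.reduce((s, p, i) => s + w[i] * p[d], 0) / wSum);
            });
            // Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
            const next = points.map(p =>
                centroids.map(vj => {
                    const dj = dist(p, vj) || 1e-12;
                    const denom = centroids.reduce((s, vk) =>
                        s + (dj / (dist(p, vk) || 1e-12)) ** (2 / (m - 1)), 0);
                    return 1 / denom;
                }));
            const delta = Math.max(...next.map((row, i) =>
                Math.max(...row.map((u, j) => Math.abs(u - U[i][j])))));
            U = next;
            if (delta < eps) break; // converged
        }
        return { centroids, U };
    }

    // Usage: cluster a handful of login-attempt samples into c = 2 groups.
    const { centroids, U } = fcm([[95, 0], [100, 12], [240, 30]], 2);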
I'm not an expert, but this seems like an odd approach to determining whether a login attempt is authentic. I've seen FCM used for pattern recognition (e.g., which facial expression am I making?), which makes sense because you're dealing with several categories (e.g., happy, sad, angry, etc.) with defining characteristics. In your case, you really only have one category (authentic) with defining characteristics. Non-authentic keystrokes are simply "not like" authentic keystrokes, so they won't cluster.
Perhaps I am missing something?
I don't think you really want to do clustering here. You might want to do some proper fuzzy matching, though (a sketch follows below), instead of just allowing some delta on each value.
For clustering, you need to have many data points. Additionally, you'd need to know the proper number of means you need.
But what are these multiple objects meant to be? You have one data point for every keycode. You don't want to have the user type the password 100 times to see if they can do it consistently. And even then, what do you expect the clusters to be? You already know which keycode comes at which position; you don't want to discover which keycodes the user uses for their password...
Sorry, I really don't see any clustering here. The term "fuzzy" seems to have misled you to this clustering algorithm. Try "fuzzy logic" instead.
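To make "proper fuzzy matching" concrete, here is one hedged sketch: score each timing against a Gaussian membership function centred on the reference value. The reference means and sigmas would come from enrolment samples, and the 0.5 acceptance threshold is arbitrary:

    // Membership in [0, 1]: 1 at the reference mean, falling off smoothly.
    const membership = (x: number, mean: number, sigma: number) =>
        Math.exp(-((x - mean) ** 2) / (2 * sigma ** 2));

    function scoreAttempt(attempt: number[], means: number[], sigmas: number[]): number {
        const scores = attempt.map((x, i) => membership(x, means[i], sigmas[i]));
        return scores.reduce((a, b) => a + b, 0) / scores.length; // average
    }

    // Accept the login attempt if the average membership is high enough.
    const authentic = scoreAttempt([95, 120], [100, 110], [15, 20]) > 0.5;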