Is Bloomfilter one way hash? - bloom-filter

I am planning to distribute a bloomfilter through S3 for one of client application. There are couple of options I can take here.
Allow client to download the file directly from S3 through a pre-signed URL.
Respond the whole bloomfilter content to Client via API response.
Can someone please point to the documentation which explains about Bloomfilter decoding or decrypting? Can any decode the Bloomfilter and get the data back if they have access to Bloomfilter file or is it one way hashing?
Thanks,
Hareesh

Bloom filters do not decode or decrypt.
Bloom filers are a fast way to tell if a key is in a set, with the special property that they return false positives but no false negatives.
This picture explains this very well: https://en.wikipedia.org/wiki/Bloom_filter#/media/File:Bloom_filter_speed.svg

Related

What is the best practice to HTTP GET only the list of objects that I require

I am in a situation where in I want to REST GET only the objects that I require by addressing them with one of the parameters that I know about those objects.
E.g., I want to GET all the USERS in my system with ids 111, 222, 333.
And the list can be bigger, so I don't think it is best way to append the URL with what is required, but use payload with json.
But I am skeptical using JSON payload in a GET request.
Please suggest a better practice in the REST world.
I am skeptical using JSON payload in a GET request.
Your skepticism is warranted; here's what the HTTP specification has to say about GET
A payload within a GET request message has no defined semantics
Trying to leverage Undefined Behavior is a Bad Idea.
Please suggest a better practice in the REST world.
The important thing to recognize is that URI are identifiers; the fact that we sometimes use human readable identifiers (aka hackable URI) is a convenience, not a requirement.
So instead of a list of system ids, the URI could just as easily be a hash digest of the list of system ids (which is probably going to be unique).
So your client request would be, perhaps
GET /ea3279f1d71ee1e99249c555f3f8a8a8f50cd2b724bb7c1d04733d43d734755b
Of course, the hash isn't reversible - if there isn't already agreement on what that URI means, then we're stuck. So somewhere in the protocol, we're going to need to make a request to the server that includes the list, so that the server can store it. "Store" is a big hint that we're going to need an unsafe method. The two candidates I would expect to see here are POST or PUT.
A way of thinking about what is going on is that you have a single resource with two different representations - the "query" representation and the "response" representation. With PUT and POST, you are delivering to the server the query representation, with GET you are retrieving the response representation (for an analog, consider HTML forms - we POST application/x-www-form-urlencoded representations to the server, but the representations we GET are usually friendlier).
Allowing the client to calculate a URI on its own and send a message to it is a little bit RPC-ish. What you normally do in a REST API is document a protocol with a known starting place (aka a bookmark) and a sequence of links to follow.
(Note: many things are labeled "REST API" that, well, aren't. If it doesn't feel like a human being navigating a web site using a browser, it probably isn't "REST". Which is fine; not everything has to be.)
But I believe POST or PUT are for some requests that modify the data. Is it a good idea to use query requests with them ?
No, it isn't... but they are perfect for creating new resources. Then you can make safe GET calls to get the current representation of the resource.
REST (and of course HTTP) are optimized for the common case of the web: large grain hypermedia, caching, all that good stuff. Various use cases suffer for that, one of which is the case of a transient message with safe semantics and a payload.
TL;DR: if you don't have one of the use cases that HTTP is designed for, use POST -- and come to terms with the fact that you aren't really leveraging the full power of HTTP. Or use a different application - you don't have to use HTTP if its a bad fit.

What's the best best way to send complex option data in a GET request without using body payload?

so I finished a server in Node using Express (developed through testing) and while developing my frontend I realized that Java doesn't allow any body payload in GET requests. I've done some reading around and understand that the Http specs do allow this, but most often the standard is to not put any payload in GET. So if not this, then what's the best way to put this into a GET request. The data I include is mostly simple fields except in my structure some of them are nested and some of them are arrays, that's why sending JSON seemed so easy. How else should I do this?
For example I have a request with this data
{
token,
filter: {
author_id,
id
},
limit,
offset
}
I've done some reading around and understand that the Http specs do allow this, but most often the standard is to not put any payload in GET.
Right - the problem is that there are no defined semantics, which means that even if you control both the client and the server, you can't expect intermediaries participating in the discussion to cooperate. See RFC-7231
The data I include is mostly simple fields except in my structure some of them are nested and some of them are arrays, that's why sending JSON seemed so easy. How else should I do this?
The HTTP POST method is the appropriate way to deliver a payload to a resource. POST is the most general method in the HTTP vocabulary, it covers all use cases, even those covered by other cases.
What you lose in POST is the fact that the request is safe and idempotent, and you don't get any decent caching behavior.
On the other hand, if the JSON document is being used to constrain the representation that is returned by the resource, then it is correct to say that the JSON is part of the identifier for that document, in which case you encode it into the query
/some/hierarchical/part?{encoded json goes here}
This gives you back the safe semantic, supports caching, and so on.
Of course, if your json structures are complicated, then you may find yourself running into various implicit limits on URI length.
I found some interesting specs for GET that allow more complex objects to be posted (such as arrays and objects with properties inside). Many frameworks that support GET queries seem to parse this.
For arrays, redefine the field. For example for the array ids=[1,2,3]
test.com?ids=1&ids=2&ids=3
For nested objects such as
{
filter.id: 5,
filter.post: 2
}
test.com?filter[id]=5&filter[post]=2

ESAPI for email address "blabla#example.com"

I saw some related questions. But I was not getting what exactly I was looking for. Sorry, if this turns out to be a silly request. Hopefully, I am having this specific query:
So I am trying to make a ReST API with MySQL database.
I am trying to read data from a table which is basically pulling out the valid email addresses of the users.
The output is going to be displayed on a HTML page.
temp = blabla#example.com
temp = ESAPI.encoder().canonicalize(temp);
temp = ESAPI.encoder().encodeForHTML(temp);
OUTPUT: temp = blabla#gmail.com
How can I avoid this from happening? and get blabla#email.com
I think the behavior here is as expected. But I just wanted to know if there is a work around other can Conditional Handling (if..else)
Also, what if someone can point me to the reasoning behind some of design choices for ESAPI. I t should be interesting read.
SHORT ANSWER:
If you can truly trust what's coming from your database, you don't need to perform canonicalize. If you know your data isn't going to be used by a browser, don't encode for HTML. If however you suspect your data will be used by a browser, encode it, have the caller deal with the results. If that's deemed unacceptable, expose an "unsafe" version of your webservice, one whose URL will explicitly use warning words to flag as "potentially malicious," forcing your caller to be aware that they're engaging in unsafe activity.
LONG ANSWER:
Well first, according to your use-case, you're essentially providing data to a calling client. My first instinct upon reading your question is that I don't think you're comfortable with your data contexts.
So, typically you're going to see a call to canonicalize() when you need safe data to perform validation against. So, the first questions to ask are these:
q1: Can I trust the data coming from my database?
Guidelines for q1: If the data is appropriately validated and neutralized, say by using a call to ESAPI.validator().getValidInput( args ); by the process that stores the data, then the application will store a safe email string into the database. If you can provably trust your input data at this point, it should be completely safe for you to not canonicalize your output as you're doing here.
If however, you cannot trust the data at this point, then you're in a scenario where before you pass along data to a downstream system, you'll need to validate it. A call to ESAPI.validator().getValidInput( args ); will BOTH canonicalize the input and ensure that its a valid email address. However this comes with the baggage that your caller is going to have to properly transform the neutralized input, which according to your question is what you want to avoid.
If you want to send safe data downstream, and you cannot defensibly trust your data source, you have no choice but to send safe data to your caller and have them work with it on their end--except perhaps to expose an unsafe method, which I will discuss shortly.
q2: Will browsers be used to consume my data?
Guidelines for q2: the encoder.encodeForHTML() method is designed to neutralize browser interpretation. Since you're talking about RESTful web services, I don't understand why you think you need to use it, because a browser should correctly interpret blabla#gmail.com to the correct canonical form--unless perhaps its being correctly trapped as a data element, such as in a dropdown box. But this is something I'm guessing you have NO control over?
As you can now tell, there are no fast answers to questions like this. You have to have some idea of how the data will be used by your caller. Since you have the possibility of having your data treated correctly as data by the browser, and the possibility of the data treated as code, you might be forced to offer a "safe" and "unsafe" call to retrieve your data, assuming that you have no control over how the client uses your service. That puts you in a bad spot, because a lazy caller might simply only ever use the unsafe version. When this happens in my industry, I'll usually make it so that the URL to call for an unsafe function looks something like mywebservice.com/unSafeNonPCICompliantMethod or something similar, so that you force your caller to explicitly accept the risk. If its being used in the correct context on the browser... the unsafe method might actually be safe. You just won't know.

post body in REST

I was referring to the O'Reilly book on REST api design, that clearly lays down the message format specifically around the areas of how links should be used to represent interrelated resources and stuff. But all the examples are for reading a resource (GET) and how the server structures the message. But what about a Create (POST) ? Should the message structure for create of a similarily inter-connected object be similar i.e through links ??
By the way of an example, let us consider we want to create a Person object with a Parent field . Should the json message format sent to server thru POST (Post msg body) be like :-
{
name:'test',
age:12,
links:[
{
rel:'parent',
href:'/people/john'
}
]
}
Here is a media type you could look at
http://stateless.co/hal_specification.html
Yes, that is one way of doing it. GET information might be usefully made human-readable, but POST/PUT information targets the machine.
Adding information to reduce the server's need to process information (e.g. by limiting itself to verifying information makes sense rather than recovering it all from scratch) also makes a lot of sense, performance-wise. As long as you do verify: keep in mind that user data must be treated as suspect on general principles. You don't want the first ExtJS-savvy guy being able to forge requests to your services.
You might also format data in XML or CSV, depending on what's best for the specific application. And keeping in mind that you might want to refactor or reuse the code, so adhering to a single standard also makes sense. All things considered, JSON is probably the best option.

RESTful API design: should unchangable data in an update (PUT) be optional?

I'm in the middle of implementing a RESTful API, and I am unsure about the 'community accepted' behavior for the presence of data that can not change. For example, in my API there is a 'file' resource that when created contains a number of fields that can not be modified after creation, such as the file's binary data, and some metadata associated with it. Additionally, the 'file' can have a written description, and tags associated.
My question concerns doing an update to one of these 'file' resources. A GET of a specific 'file' will return all the metadata, description & tags associated with the file, plus the file's binary data. Should a PUT of a specific 'file' resource include the 'read only' fields? I realize that it can be coded either way: a) include the read only fields in the PUT data and then verify they match the original (or issue an error), or b) ignore the presence of the read only fields in the PUT data because they can't change, never issuing an error if they don't match or are missing because the logic ignores them.
Seems like it could go either way and be acceptable. The second method of ignoring the read only fields can be more compact, because the API client can skip sending that read only data if they want; which seems good for people who know what they are doing...
Personally, both ways are acceptable.... however, if I were you, I'll opt for option A (check read-only fields to ensure they are not changed, else throw an error). Depending on the scope of your project, you cannot assume what the consumers know about your Restful WS in depth because most of them don't read documentations or WADL, even if they are the experienced ones. :)
If you don't provide immediate feedback to the consumers that certain fields are read-only, they will have a false assumption that your web service will take care all the changes they have made without double checking, OR once they find out the "inconsistent" updates, they complain to others that your web service is buggy.
You can approach this in two different ways if the read-only field doesn't match the original values...
Don't process the request. Send a 409 Conflict code and specific error message.
Process the request, send a 200 OK and a message stating that changes made the read-only fields are ignored.
Unless the read-only data makes up a significant portion of the data (to the extreme that transmitting the read-only data has a noticeable impact on network traffic and response times), you should write the service to accept the read only fields in the PUT and check them for changes. It's just simpler to have the same data going in and out.
Look at it this way: You could make inclusion of the read only fields optional in the PUT, but you will still have to / should write the code in the service to check that any read only fields that were received contain the expected values. You have to write the read only checking either way.
Prohibiting the read-only fields in the PUT is a bad idea because it will require the clients to strip away fields they received from you in the GET. This requires that the client get more intimately involved with your data and semantics than they really need to be. The clients will consider this a headache, an unnecessary complication, and downright mean of you to add to their burden. Taking data received from your GET, modifying one field of interest, and sending it back to you with a PUT should be a brain-dead simple round-trip for the client. Don't complicate things when you don't have to.