I'm working with a service that will forward data to a URL of your choosing via HTTP POST requests.
Is there a simple way to publish to a Pubsub topic with a POST? The service I'm using (Hologram.io's Advanced Webhook Builder) can't store any files, so I can't upload a Google Cloud service account JSON key file.
Thanks,
Ryan
You have two challenges in your use case:
Format
Authentication
Format
You need to customize the webhook to comply with the PubSub format. Some webhook builders are customizable enough for that, but not all of them. If you can't customize the webhook call the way PubSub expects, you need to use an intermediary layer (Cloud Functions or Cloud Run, for example).
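For reference, a publish call against the Pub/Sub REST API has to look roughly like the sketch below. The project, topic, and token values are placeholders, not anything from the question; the point is the base64-encoded `messages[].data` envelope that PubSub expects. If the webhook builder can't produce this shape, that's exactly where the intermediary layer comes in.

```typescript
// Runs in Node 18+ (built-in fetch). All values below are placeholders.
const project = "my-project";          // assumption: your GCP project ID
const topic = "webhook-events";        // assumption: your topic name
const accessToken = "<oauth2-token>";  // see the Authentication section below

async function publishRaw(payload: unknown): Promise<void> {
  const body = {
    messages: [
      {
        // PubSub requires the message payload to be base64-encoded.
        data: Buffer.from(JSON.stringify(payload)).toString("base64"),
        attributes: { source: "hologram-webhook" },
      },
    ],
  };

  const res = await fetch(
    `https://pubsub.googleapis.com/v1/projects/${project}/topics/${topic}:publish`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    }
  );
  if (!res.ok) throw new Error(`Publish failed: ${res.status}`);
}
```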
Authentication
Whether you call PubSub directly or go through an intermediary layer, the situation is the same: the requester (the webhook) needs to be authenticated and authorized to access the Google Cloud service.
One bad, though possible, practice is to grant allUsers access to your resources, for example by granting allUsers the Publisher role on a PubSub topic.
Don't do that. Even if you improve "your" process's security by defining a schema (and thus rejecting all messages that don't comply with it), leaving a resource publicly accessible on the wild internet, without authentication, is criminal!
In the webhook context (I've had this case before at my company), I recommend using static authentication (a long-lived authentication header, not a short-lived one (1h) like a Google OAuth2 token); an API key, for example. It's not perfect, because if the API key leaks, bad actors will be able to exploit the breach for a long time (rotate your API keys as often as you can!), but it's better than nothing!
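For illustration, here is a minimal sketch of such an intermediary layer (Cloud Functions here; Cloud Run works the same way): it checks a static API key sent by the webhook in a header and republishes the payload to PubSub. The header name, environment variable, and topic name are assumptions for the sketch, not anything prescribed by Hologram.io or Google.

```typescript
import * as functions from "@google-cloud/functions-framework";
import { PubSub } from "@google-cloud/pubsub";

const pubsub = new PubSub();
const topic = pubsub.topic("webhook-events");            // assumption
const EXPECTED_KEY = process.env.WEBHOOK_API_KEY ?? "";  // long-lived key, rotate regularly

functions.http("relay", async (req, res) => {
  // Static authentication: the webhook sends the key in a custom header.
  if (EXPECTED_KEY === "" || req.get("x-api-key") !== EXPECTED_KEY) {
    res.status(401).send("Unauthorized");
    return;
  }

  // The function authenticates to PubSub with its runtime service account,
  // so no key file ever has to leave Google Cloud.
  const messageId = await topic.publishMessage({
    data: Buffer.from(JSON.stringify(req.body)),
  });
  res.status(200).send(messageId);
});
```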
I wrote a fairly old article on this use case (with ESPv2 and Cloud Run), but the principle, and the configuration, is almost the same on API Gateway, a Google Cloud managed service. In the article, I create a proxy for Cloud Run, Cloud Functions and App Engine, but you can do the same thing with PubSub by setting the correct target URL.
Related
Context: We are hosting an online shop that needs to track customer behaviour. To achieve this tracking we have integrated several tracking events based on the customer journey in our shop. Based on the GDPR requirements in Europe we are forced to send the tracking events to infrastructure that is controlled by us as a company. Sending data via the Google Analytics Tag Manager directly to Google servers is forbidden under the GDPR. Sidenote: To simplify this question, I intentionally leave out everything regarding user consent management.
Problem statement: We need each client to send every tracking event directly from the browser to a Pub/Sub endpoint. Now, my question is what a best practice for proper security would look like.
Current proposal: The Pub/Sub endpoint doesn't require authentication --> allUsers have been granted the Pub/Sub Publisher permission. In addition, I've created an API key that is restricted to
the Pub/Sub API only
specific HTTP referrers (basically the domain our webshop operates on)
Are there other strategies that could be applied? Is the current proposal a valid (aka secure) way to go?
Giving Pub/Sub Publisher access to allUsers is not recommended. Create a service account, give Publisher access to that, and send messages using that service account.
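A minimal sketch of what that looks like in code, assuming the events are relayed through a backend you control (a service account key can't safely live in the browser) and that the service account has already been granted roles/pubsub.publisher on the topic. The project ID, key file path, and topic name are placeholders.

```typescript
import { PubSub } from "@google-cloud/pubsub";

// Authenticate as the dedicated publisher service account instead of allUsers.
const pubsub = new PubSub({
  projectId: "my-shop-project",                    // assumption
  keyFilename: "/secrets/tracking-publisher.json", // assumption: SA key path
});

export async function publishTrackingEvent(event: object): Promise<string> {
  // Returns the PubSub message ID on success.
  return pubsub
    .topic("tracking-events")                      // assumption
    .publishMessage({ data: Buffer.from(JSON.stringify(event)) });
}
```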
I want to use Dialogflow for my enterprise. So I want to know whether Dialogflow will be able to hit non-public URLs.
Since Dialogflow is a service hosted by Google, fulfillment requests specified by Webhook URLs must be reachable by Dialogflow for them to be invoked. In addition, the webhook endpoints must expose themselves using SSL/TLS and must be associated with a non-self-signed certificate. When a request is made, Dialogflow can provide authentication credentials so you can verify that it is indeed Dialogflow making the request.
One pattern for your usage is to expose the Webhooks to the Internet and only allow connections from the Google IP address range and also require authentication (known only to Dialogflow). This would go a long way in preventing malicious access to your Webhook.
An alternative would be to define your Webhook as a GCP-hosted endpoint, and then you would own the routing back to your internal system from there. That could use a variety of technologies beyond HTTP, including Pub/Sub. For example, when Dialogflow invokes the Webhook, a GCP application could be called that posts a message to PubSub. Your enterprise application could be a subscriber and be notified that it has work to do. It does the work and responds with a new message, which is received by your GCP-hosted Webhook, which then returns the response to Dialogflow. As such, there is no surface area for an attacker to try to penetrate.
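A rough sketch of that pattern, assuming a Cloud Functions webhook, a shared-secret header configured on the Dialogflow agent, and a placeholder topic name; a production version would also wait for the subscriber's reply (for example via a response topic or a correlation ID) before answering Dialogflow.

```typescript
import * as functions from "@google-cloud/functions-framework";
import { PubSub } from "@google-cloud/pubsub";

const pubsub = new PubSub();
const requests = pubsub.topic("dialogflow-fulfillment-requests"); // assumption

functions.http("dialogflowWebhook", async (req, res) => {
  // Authentication known only to Dialogflow (configured on the agent).
  const secret = process.env.WEBHOOK_SECRET;
  if (!secret || req.get("x-webhook-secret") !== secret) {
    res.status(401).send("Unauthorized");
    return;
  }

  // Hand the fulfillment request to the enterprise application via PubSub.
  await requests.publishMessage({ data: Buffer.from(JSON.stringify(req.body)) });

  // Dialogflow ES-style response; replace with the real answer once the
  // internal system has replied.
  res.json({ fulfillmentText: "Your request has been received." });
});
```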
I am developing a web application with Spring Boot and a React.js SPA, but my question is not specific to those libraries/frameworks, as I assume reporting client-side JS errors to the server (for logging and analysis) must be a common operation for many modern web applications.
So, suppose we have a JS client application that catches an error and a REST endpoint /errors that takes a JSON object holding the relevant information about what happened. The client app sends the data to the server, it gets stored in a database (or whatever) and everyone's happy, right?
Now I am not, really. Because now I have an open (as in allowing unauthenticated create/write operations) API endpoint that anyone with just a little knowledge could easily spam.
I might validate the structure of the JSON data the endpoint accepts, but that doesn't really solve the problem.
In questions like "Open REST API attached to a database- what stops a bad actor spamming my db?" or "Secure Rest-Service before user authentification", there are suggestions such as:
access quotas (but I don't want to save IPs or anything to identify clients)
Captchas (useless for error reporting, obviously)
e-mail verification (same, just imagine that)
So my questions are:
Is there an elegant, commonly used strategy to secure such an endpoint?
Would a lightweight solution like validating the structure of the data be enough in practice?
Is all this even necessary? After all, I won't advertise my error-handling API endpoint with a banner in the app...
I’ve seen it done three different ways:
1. Assuming you are using OAuth 2 to secure your API, stand up two error endpoints. For a logged-in user, if an error occurs you would hit the /error endpoint and authenticate using the existing user auth token. For a visitor, you can expose a /clientError (or named in a way that makes sense to you) endpoint that takes the client_credentials token for the client app. (Sketched below.)
2. Secure the /error endpoint using an API key that is scoped for access to the error endpoint only. This key would be specific to the client and would be passed in the header.
3. Use a 3rd-party tool such as Raygun.io, or any APM tool, such as New Relic.
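As an illustration of the first option from the browser side (a sketch only): getUserToken() and getClientCredentialsToken() are hypothetical helpers standing in for however your SPA obtains its OAuth 2 tokens, and the payload fields are just an example.

```typescript
async function reportError(error: Error): Promise<void> {
  const userToken = getUserToken();          // null if nobody is logged in
  const endpoint = userToken ? "/error" : "/clientError";
  const token = userToken ?? (await getClientCredentialsToken());

  await fetch(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      message: error.message,
      stack: error.stack,
      url: window.location.href,
      timestamp: new Date().toISOString(),
    }),
  });
}

// Hypothetical token helpers -- wire these to your auth library.
function getUserToken(): string | null {
  return null; // e.g. read from your auth provider / session storage
}
async function getClientCredentialsToken(): Promise<string> {
  return "<client-credentials-token>"; // e.g. fetched from your auth server
}
```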
I have some data stored in my Firebase Realtime Database. I would like to expose some of this data via a REST API to my B2B customers.
I know that Firebase is itself a REST API, but its authentication mechanisms don't fit my needs. I want my customers to access the API with a simple API key passed in the HTTP request headers.
To summarize, I need an API layer sitting on top of my Firebase real-time database with the following properties:
Basic Authentication via an API key passed in the HTTP request headers
Some custom logic that makes sure customers respect the API limits (maximum requests per day for example)
The only thing I can think of is implementing this layer in AWS Lambda, but that also sounds a bit off. From the Lambda, I would have to access my Firebase database and serve that data. That seems like too many network requests; something native to Firebase would be great.
Thanks,
Guven.
Why not have a simple API which provides them an OAuth token for the original Firebase REST API if they have the correct API key?
It'll be more secure, as only you'll be able to mint the tokens, since only you have the service account private key. It also saves you the headache of building a whole REST API. And the OAuth tokens expire relatively quickly, so they're less of a risk than a long-lived key that you hand out.
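A sketch of what that token-vending endpoint could look like, assuming a Cloud Function, a deliberately naive API-key lookup, and the documented Realtime Database OAuth scopes; the key file path, header name, and environment variable are placeholders.

```typescript
import * as functions from "@google-cloud/functions-framework";
import { GoogleAuth } from "google-auth-library";

const auth = new GoogleAuth({
  keyFilename: "/secrets/firebase-service-account.json", // assumption
  scopes: [
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/firebase.database",
  ],
});

// Hypothetical lookup -- in practice this would hit your customer database
// and also enforce per-customer rate limits.
const VALID_KEYS = new Set((process.env.CUSTOMER_API_KEYS ?? "").split(","));

functions.http("token", async (req, res) => {
  if (!VALID_KEYS.has(req.get("x-api-key") ?? "")) {
    res.status(401).send("Invalid API key");
    return;
  }

  // Mint a short-lived access token from the service account credentials.
  const client = await auth.getClient();
  const { token } = await client.getAccessToken();

  // The customer uses this token against the Realtime Database REST API
  // (e.g. ?access_token=... on https://<your-db>.firebaseio.com/...)
  // until it expires (about an hour).
  res.json({ access_token: token });
});
```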
I personally have created my own servlets where a user posts their data if they are authenticated using an ID/password combo.
In the servlets I use the default REST API provided by Firebase with the OAuth token generated in my servlet. This way, I can have the DB security rules set to false for all writes from any client API, while the REST API and the Admin SDK on my server ignore the security rules by default.
After some research, I have decided that AWS is the best platform for such API-related features.
API Gateway lets you set up your API interface in a matter of seconds
DynamoDB stores your API data; you can easily populate the data here
AWS Lambda lets you write the integration code between API Gateway and DynamoDB
On top of these, the platform offers these features out of the box:
Creation, handling, and verification of API keys for authentication
Usage plans to make sure that API consumers don't exceed your API usage limits
Most of what I was looking for is offered in these AWS services.
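For completeness, a sketch of the Lambda piece of that setup (TypeScript, AWS SDK v3): API Gateway enforces the API key and usage plan before invoking the handler, which just reads an item from DynamoDB. The table name, key field, and route parameter are assumptions.

```typescript
import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  // Route assumed to be GET /items/{id}; API Gateway has already checked the API key.
  const id = event.pathParameters?.id;
  if (!id) {
    return { statusCode: 400, body: JSON.stringify({ error: "missing id" }) };
  }

  const result = await ddb.send(
    new GetCommand({ TableName: "ApiData", Key: { id } }) // assumption
  );

  return result.Item
    ? { statusCode: 200, body: JSON.stringify(result.Item) }
    : { statusCode: 404, body: JSON.stringify({ error: "not found" }) };
};
```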
I built a package in R which basically wraps around the Cloud Storage JSON API. I included a default OAuth app (that is, a client ID and client secret, see documentation) in the package. The client ID and secret are created and hosted in my own cloud platform project with my billing details. The R package uses the OAuth app to ask for the end user's authentication before any API calls and stores the token for the end user. Any subsequent API calls are sent with the retrieved token.
I noticed that the stats about the end users' API calls are showing up in my own project because it hosts the OAuth app. In this case, do I get charged for those API calls made by end users?
All calls to GCS are always billed to the bucket that they reference. Calls that don't specify a specific bucket, like "list buckets in a project", are billed to the project in question.