How to measure per-user bandwidth usage on Google Cloud Storage?

We want to charge users based on the amount of traffic their data generates, specifically the downstream bandwidth their data consumes.
I have exported the Google Cloud Storage access logs. From the logs I can count the number of times each file is accessed, so filesize * count gives the bandwidth usage (a rough sketch of this counting is shown after the example log entry below).
The problem is that this doesn't work well with cached content: my calculated value is much higher than the actual usage.
I went with this method because our traffic would be new and wouldn't hit the cache, so the difference shouldn't matter. In reality it turns out to be a real problem.
This seems like a common use case, and I think there should be a better way to solve it with Google Cloud Storage.
{
"insertId": "-tohip8e1vmvw",
"logName": "projects/bucket/logs/cloudaudit.googleapis.com%2Fdata_access",
"protoPayload": {
"#type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {
"principalEmail": "firebase-storage#system.gserviceaccount.com"
},
"authorizationInfo": [
{
"granted": true,
"permission": "storage.objects.get",
"resource": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"resourceAttributes": {}
},
{
"granted": true,
"permission": "storage.objects.getIamPolicy",
"resource": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"resourceAttributes": {}
}
],
"methodName": "storage.objects.get",
"requestMetadata": {
"destinationAttributes": {},
"requestAttributes": {
"auth": {},
"time": "2019-07-02T11:58:36.068Z"
}
},
"resourceLocation": {
"currentLocations": [
"eu"
]
},
"resourceName": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"serviceName": "storage.googleapis.com",
"status": {}
},
"receiveTimestamp": "2019-07-02T11:58:36.412798307Z",
"resource": {
"labels": {
"bucket_name": "bucket.appspot.com",
"location": "eu",
"project_id": "project-id"
},
"type": "gcs_bucket"
},
"severity": "INFO",
"timestamp": "2019-07-02T11:58:36.062Z"
}
An example entry from the log.
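Here is roughly how that counting works (a sketch only; the export file name and the object-size lookup are placeholders, and the sizes have to come from somewhere else, e.g. a bucket listing):

import json
from collections import Counter

# Count storage.objects.get calls per object from the exported audit log
# entries (assumed here to be newline-delimited JSON; file name is a placeholder).
access_counts = Counter()
with open("data_access_export.json") as f:
    for line in f:
        entry = json.loads(line)
        payload = entry.get("protoPayload", {})
        if payload.get("methodName") == "storage.objects.get":
            access_counts[payload["resourceName"]] += 1

# Object sizes are looked up separately (placeholder dict here).
object_sizes = {}  # resourceName -> size in bytes

estimated_bytes = {
    name: count * object_sizes.get(name, 0)
    for name, count in access_counts.items()
}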
We are using a single bucket for now, but we can also use multiple buckets if that helps.

One possibility is to have a separate bucket for each user and get each bucket's bandwidth usage through the Cloud Monitoring time series API.
The endpoint for this purpose is:
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.timeSeries/list
The following parameters fetch the bytes sent over one hour (any alignment period above 60s can be specified); summing the points gives the total bytes sent from the bucket. A client-library sketch of the same query follows the JSON.
{
"dataSets": [
{
"timeSeriesFilter": {
"filter": "metric.type=\"storage.googleapis.com/network/sent_bytes_count\" resource.type=\"gcs_bucket\" resource.label.\"project_id\"=\"<<<< project id here >>>>\" resource.label.\"bucket_name\"=\"<<<< bucket name here >>>>\"",
"perSeriesAligner": "ALIGN_SUM",
"crossSeriesReducer": "REDUCE_SUM",
"secondaryCrossSeriesReducer": "REDUCE_SUM",
"minAlignmentPeriod": "3600s",
"groupByFields": [
"resource.label.\"bucket_name\""
],
"unitOverride": "By"
},
"targetAxis": "Y1",
"plotType": "LINE",
"legendTemplate": "${resource.labels.bucket_name}"
}
],
"options": {
"mode": "COLOR"
},
"constantLines": [],
"timeshiftDuration": "0s",
"y1Axis": {
"label": "y1Axis",
"scale": "LINEAR"
}
}
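Here is that sketch, using the google-cloud-monitoring client library (untested; the project and bucket names are placeholders; with one bucket per user, the result is that user's egress):

import time
from google.cloud import monitoring_v3

project_id = "project-id"           # placeholder
bucket_name = "bucket.appspot.com"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
        "cross_series_reducer": monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
        "group_by_fields": ["resource.label.bucket_name"],
    }
)
results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": (
            'metric.type="storage.googleapis.com/network/sent_bytes_count" '
            'resource.type="gcs_bucket" '
            f'resource.label."bucket_name"="{bucket_name}"'
        ),
        "interval": interval,
        "aggregation": aggregation,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    total = sum(point.value.int64_value for point in series.points)
    print(series.resource.labels["bucket_name"], total, "bytes sent in the last hour")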

GA4 (Google Analytics) sessions based on UTM params

I am trying to fetch sessions from GA4 that match specific UTM params.
In GA3 we were able to use segments (sessions::condition::ga:source==X;ga:medium==Y), but I cannot find a way to do this in GA4.
POST https://analyticsdata.googleapis.com/v1beta/#{property}:runReport
Payload like this:
body = {
"metrics": [
{
"name": "sessions::condition::ga:source==X;ga:medium==Y"
}
],
"dimensions": [
{
"name": "date"
}
],
"dateRanges": [
{
"startDate": '2022-01-01',
"endDate": '2022-01-30',
"name": "current_year"
}
]
}
This returns: Field sessions::condition::ga:source==X;ga:medium==Y is not a valid metric. Is there a way to do this via the new API?
Should I use a dimension filter to achieve that? I need to query on both source & medium, but it is not clear how to do this:
"dimensionFilter": {
"filter": {
"fieldName": "firstUserMedium",
"stringFilter": {
"value": "Y"
}
}
}
A dimension filter on sessionSource & sessionMedium returns sessions that have those specific utm_source & utm_medium values. See the dimensions & metrics page for a description of these and other dimensions & metrics.
The needed dimension filter is similar to the following (a client-library sketch of the full request follows the JSON). See Dimension Filters in Creating a Report for more info.
"dimensionFilter": {
"andGroup": {
"expressions": [
{
"filter": {
"fieldName": "sessionSource",
"stringFilter": {
"value": "X"
}
}
},
{
"filter": {
"fieldName": "sessionMedium",
"stringFilter": {
"value": "Y"
}
}
}
]
}
},
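Here is that sketch with the google-analytics-data Python client (untested; the property ID is a placeholder):

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Filter, FilterExpression, FilterExpressionList,
    Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()
request = RunReportRequest(
    property="properties/GA4_PROPERTY_ID",  # placeholder
    date_ranges=[DateRange(start_date="2022-01-01", end_date="2022-01-30")],
    dimensions=[Dimension(name="date")],
    metrics=[Metric(name="sessions")],
    # AND of sessionSource == X and sessionMedium == Y
    dimension_filter=FilterExpression(
        and_group=FilterExpressionList(
            expressions=[
                FilterExpression(
                    filter=Filter(
                        field_name="sessionSource",
                        string_filter=Filter.StringFilter(value="X"),
                    )
                ),
                FilterExpression(
                    filter=Filter(
                        field_name="sessionMedium",
                        string_filter=Filter.StringFilter(value="Y"),
                    )
                ),
            ]
        )
    ),
)
response = client.run_report(request)
for row in response.rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)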
Segments are not yet available today in the GA4 Data API.
I think you should check the dimensions and metrics list for GA4; they don't start with ga:.
POST https://analyticsdata.googleapis.com/v1beta/properties/GA4_PROPERTY_ID:runReport
{
"dateRanges": [{ "startDate": "2020-09-01", "endDate": "2020-09-15" }],
"dimensions": [{ "name": "country" }],
"metrics": [{ "name": "activeUsers" }]
}
Also, at this time I don't think it supports segments.

Google Storage AuditLogs - finding who is trying to access

I have a Google Storage bucket with Audit Logs enabled. Every one or two days I get logs about PERMISSION_DENIED. The log specifies what kind of access the requestor is asking for, but it does not give me enough information to answer the question: who is requesting?
This is the log message:
{
"insertId": "rr6wsd...",
"logName": "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access",
"protoPayload": {
"#type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {},
"authorizationInfo": [
{
"permission": "storage.buckets.get",
"resource": "projects//buckets/BUCKET_NAME",
"resourceAttributes": {}
}
],
"methodName": "storage.buckets.get",
"requestMetadata": {
"callerSuppliedUserAgent": "Blob/1 (cr/340918833)",
"destinationAttributes": {},
"requestAttributes": {
"auth": {},
"reason": "8uSywAZKWkhOZWVkZWQg...",
"time": "2021-01-20T03:43:38.405230045Z"
}
},
"resourceLocation": {
"currentLocations": [
"us-central1"
]
},
"resourceName": "projects//buckets/BUCKET_NAME",
"serviceName": "storage.googleapis.com",
"status": {
"code": 7,
"message": "PERMISSION_DENIED"
}
},
"receiveTimestamp": "2021-01-20T03:43:38.488787956Z",
"resource": {
"labels": {
"bucket_name": "BUCKET_NAME",
"location": "us-central1",
"project_id": "PROJECT_ID"
},
"type": "gcs_bucket"
},
"severity": "ERROR",
"timestamp": "2021-01-20T03:43:38.399417759Z"
}
As you can see, the only piece of information that says anything about "who is trying to access" is
"callerSuppliedUserAgent": "Blob/1 (cr/340918833)",
But what does that mean? It means nothing to me.
How can I find out who is trying to get this permission?
The callerSuppliedUserAgent can be anything the client application puts in its request headers; ignore it, as this header can be faked. Only legitimate applications put anything meaningful in the header.
This is an unauthenticated request. There is no identity to record. Most likely it is a troll scanning the Internet looking for open buckets.
Notice that the auth key is empty: no credentials were provided in the request. A quick way to confirm this across recent entries is sketched after the snippet.
"requestAttributes": {
"auth": {},
"reason": "8uSywAZKWkhOZWVkZWQg...",
"time": "2021-01-20T03:43:38.405230045Z"
}
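For example, with the google-cloud-logging client (an untested sketch; the bucket name is a placeholder):

from google.cloud import logging

client = logging.Client()
log_filter = (
    'resource.type="gcs_bucket" '
    'resource.labels.bucket_name="BUCKET_NAME" '
    'protoPayload.status.code=7'  # 7 == PERMISSION_DENIED
)
for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING, max_results=20):
    # For anonymous requests the audit payload's authenticationInfo is empty,
    # as in the entry above; print the payload and inspect it.
    print(entry.timestamp, entry.payload)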

What thresholds should be set in Service Fabric Placement / Load balancing config for Cluster with large number of guest executable applications?

What thresholds should be set in Service Fabric Placement / Load balancing config for Cluster with large number of guest executable applications?
I am having trouble with Service Fabric trying to place too many services onto a single node too fast.
To give an example of cluster size: there are 2-4 worker node types, there are 3-6 worker nodes per node type, each node type may run 200 guest executable applications, and each application will have at least 2 replicas. The nodes are more than capable of running the services once they are up; it is just at startup that CPU is too high.
The problem seems to be the thresholds or defaults for placement and load balancing rules set in the cluster config. As examples of what I have tried: I have turned on InBuildThrottlingEnabled and set InBuildThrottlingGlobalMaxValue to 100, I have set the Global Movement Throttle settings to be various percentages of the total application count.
At this point there are two distinct scenarios I am trying to solve for. In both cases, the nodes go to 100% CPU for long enough that Service Fabric declares the node as down.
1st: Starting an entire cluster from all nodes being off without overwhelming nodes.
2nd: A single node being overwhelmed by too many services starting after a host comes back online
Here are my current parameters on the cluster:
"Name": "PlacementAndLoadBalancing",
"Parameters": [
{
"Name": "UseMoveCostReports",
"Value": "true"
},
{
"Name": "PLBRefreshGap",
"Value": "1"
},
{
"Name": "MinPlacementInterval",
"Value": "30.0"
},
{
"Name": "MinLoadBalancingInterval",
"Value": "30.0"
},
{
"Name": "MinConstraintCheckInterval",
"Value": "30.0"
},
{
"Name": "GlobalMovementThrottleThresholdForPlacement",
"Value": "25"
},
{
"Name": "GlobalMovementThrottleThresholdForBalancing",
"Value": "25"
},
{
"Name": "GlobalMovementThrottleThreshold",
"Value": "25"
},
{
"Name": "GlobalMovementThrottleCountingInterval",
"Value": "450"
},
{
"Name": "InBuildThrottlingEnabled",
"Value": "false"
},
{
"Name": "InBuildThrottlingGlobalMaxValue",
"Value": "100"
}
]
},
Based on the discussion in the answer below, I wanted to leave a graph: if a node goes down, the act of shuffling services onto the remaining nodes causes a second node to go down, as noted there. In the graph, the green node goes down, then the purple node goes down due to too much load being shuffled onto it.
From SF's perspective, 1 & 2 are the same problem. Also, as a note, SF doesn't evict a node just because CPU consumption is high. So "the nodes go to 100% for an amount of time such that Service Fabric declares the node as down" needs some more explanation. The machines might be failing for other reasons, or they could be so loaded that the kernel-level failure detectors can't ping other machines, but that isn't very common.
For config changes: I would remove all of these and go with the defaults:
{
"Name": "PLBRefreshGap",
"Value": "1"
},
{
"Name": "MinPlacementInterval",
"Value": "30.0"
},
{
"Name": "MinLoadBalancingInterval",
"Value": "30.0"
},
{
"Name": "MinConstraintCheckInterval",
"Value": "30.0"
},
For the in-build throttle to work, this needs to flip to true:
{
"Name": "InBuildThrottlingEnabled",
"Value": "false"
},
Also, since these are likely constraint violations and placement (not proactive rebalancing), we need to explicitly instruct SF to throttle those operations as well. There is config for this in SF; although it is not documented or publicly supported at this time, you can see it in the settings. By default only balancing is throttled, but you should be able to turn on throttling for all phases and set appropriate limits via something like the below.
These first two settings are also within PlacementAndLoadBalancing, like the ones above.
{
"Name": "ThrottlePlacementPhase",
"Value": "true"
},
{
"Name": "ThrottleConstraintCheckPhase",
"Value": "true"
},
These next settings, which set the limits, are in their own sections; each is a map from node type name to the limit you want for that node type.
{
"name": "MaximumInBuildReplicasPerNodeConstraintCheckThrottle",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
},
{
"name": "MaximumInBuildReplicasPerNodePlacementThrottle",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
},
{
"name": "MaximumInBuildReplicasPerNodeBalancingThrottle",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
},
{
"name": "MaximumInBuildReplicasPerNode",
"parameters": [
{
"name": "YourNodeTypeNameHere",
"value": "100"
},
{
"name": "YourOtherNodeTypeNameHere",
"value": "100"
}
]
}
I would make these changes and then try again. Additional information like what is actually causing the nodes to be down (confirmed via events and SF health info) would help identify the source of the problem. It would probably also be good to verify that starting 100 instances of the apps on the node actually works and whether that's an appropriate threshold.

Cannot create new layer (featuretype) in GeoServer using REST API

So I have just spent two working days trying to figure this out. We are automating the rendering process for maps. All the data is in an SQL database, and my job is to write a "wrapper" so we can implement this in our in-house framework. I managed all but one of the needed requests.
That request is POST featuretype, since this is the way to create a layer that can later be rendered.
I have all requests saved in Postman for pre-testing against the example data provided by GeoServer itself. I can't get a response with status code 201 and always get a 500 Internal Server Error. This status is described as a possible syntax error in the request syntax. But I actually just copied and pasted the example and used GeoServer-provided data.
This is the request: http://127.0.0.1:8080/geoserver/rest/workspaces/tiger/datastores/nyc/featuretypes
and its body:
{
"name": "poi",
"nativeName": "poi",
"namespace": {
"name": "tiger",
"href": "http://localhost:8080/geoserver/rest/namespaces/tiger.json"
},
"title": "Manhattan (NY) points of interest",
"abstract": "Points of interest in New York, New York (on Manhattan). One of the attributes contains the name of a file with a picture of the point of interest.",
"keywords": {
"string": [
"poi",
"Manhattan",
"DS_poi",
"points_of_interest",
"sampleKeyword\\#language=ab\\;",
"area of effect\\#language=bg\\;\\#vocabulary=technical\\;",
"Привет\\#language=ru\\;\\#vocabulary=friendly\\;"
]
},
"metadataLinks": {
"metadataLink": [
{
"type": "text/plain",
"metadataType": "FGDC",
"content": "www.google.com"
}
]
},
"dataLinks": {
"org.geoserver.catalog.impl.DataLinkInfoImpl": [
{
"type": "text/plain",
"content": "http://www.google.com"
}
]
},
"nativeCRS": "GEOGCS[\"WGS 84\", \n DATUM[\"World Geodetic System 1984\", \n SPHEROID[\"WGS 84\", 6378137.0, 298.257223563, AUTHORITY[\"EPSG\",\"7030\"]], \n AUTHORITY[\"EPSG\",\"6326\"]], \n PRIMEM[\"Greenwich\", 0.0, AUTHORITY[\"EPSG\",\"8901\"]], \n UNIT[\"degree\", 0.017453292519943295], \n AXIS[\"Geodetic longitude\", EAST], \n AXIS[\"Geodetic latitude\", NORTH], \n AUTHORITY[\"EPSG\",\"4326\"]]",
"srs": "EPSG:4326",
"nativeBoundingBox": {
"minx": -74.0118315772888,
"maxx": -74.00153046439813,
"miny": 40.70754683896324,
"maxy": 40.719885123828675,
"crs": "EPSG:4326"
},
"latLonBoundingBox": {
"minx": -74.0118315772888,
"maxx": -74.00857344353275,
"miny": 40.70754683896324,
"maxy": 40.711945649065406,
"crs": "EPSG:4326"
},
"projectionPolicy": "REPROJECT_TO_DECLARED",
"enabled": true,
"metadata": {
"entry": [
{
"#key": "kml.regionateStrategy",
"$": "external-sorting"
},
{
"#key": "kml.regionateFeatureLimit",
"$": "15"
},
{
"#key": "cacheAgeMax",
"$": "3000"
},
{
"#key": "cachingEnabled",
"$": "true"
},
{
"#key": "kml.regionateAttribute",
"$": "NAME"
},
{
"#key": "indexingEnabled",
"$": "false"
},
{
"#key": "dirName",
"$": "DS_poi_poi"
}
]
},
"store": {
"#class": "dataStore",
"name": "tiger:nyc",
"href": "http://localhost:8080/geoserver/rest/workspaces/tiger/datastores/nyc.json"
},
"cqlFilter": "INCLUDE",
"maxFeatures": 100,
"numDecimals": 6,
"responseSRS": {
"string": [
4326
]
},
"overridingServiceSRS": true,
"skipNumberMatched": true,
"circularArcPresent": true,
"linearizationTolerance": 10,
"attributes": {
"attribute": [
{
"name": "the_geom",
"minOccurs": 0,
"maxOccurs": 1,
"nillable": true,
"binding": "com.vividsolutions.jts.geom.Point"
},
{},
{},
{}
]
}
}
So it is the example case, and I can't get any useful response from the server. I get code 500 with the body name (the first item in the JSON). Similarly, I get the same code with the body FeatureTypeInfo when trying with an XML body (first tag).
I already tried the request against a new instance of GeoServer in Docker (changed the port) and still had no success.
I checked that the datastore and workspace are available and that the layer "poi" doesn't exist yet.
Here are also some logs of the request (similar for the XML body):
2018-08-03 07:35:02,198 ERROR [geoserver.rest] -
com.thoughtworks.xstream.mapper.CannotResolveClassException: name at
com.thoughtworks.xstream.mapper.DefaultMapper.realClass(DefaultMapper.java:79)
at .....
Does anyone know the solution to this and has gotten it working? I am using GeoServer 2.13.1.
So I was still looking for the answer, and using this post (https://gis.stackexchange.com/questions/12970/create-a-layer-in-geoserver-using-rest) I got to the right content to POST a featureType and hence create a layer in GeoServer.
The REST API documentation is off here.
Using the link above I found out that when using JSON a wrapper is missing from the body. For the API to work we need to add:
{
"featureType": {
"name": "...",
"nativeName": "...",
...
}
}
So the body doesn't start with the "name" attribute; instead everything is contained in "featureType".
I didn't try this with XML, but I guess it could be similar.
Hope this helps someone out there struggling like I did.
Blaz is correct here: you need an outer featureType object and then an inner object with your config. So:
{
"featureType": {
"name": "layer",
"nativeName": "poi",
"your config": "stuff"
}
}
I find, though, that with a POST request I get very little if any response, and it's not obvious whether the layer creation worked. But you can call http://IP:8080/geoserver/rest/layers.json to check whether your new layer is there. A rough sketch of both steps is below.
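For example (using the Python requests library; host, credentials and the trimmed-down body are placeholders, untested):

import requests

base = "http://127.0.0.1:8080/geoserver/rest"
auth = ("admin", "geoserver")  # placeholder credentials

# Note the outer "featureType" wrapper around the actual config.
payload = {
    "featureType": {
        "name": "poi",
        "nativeName": "poi",
        "title": "Manhattan (NY) points of interest",
        "srs": "EPSG:4326",
    }
}
resp = requests.post(
    f"{base}/workspaces/tiger/datastores/nyc/featuretypes",
    json=payload,
    auth=auth,
)
print(resp.status_code)  # expect 201 Created

# The POST body gives little feedback, so verify the layer is actually there.
print(requests.get(f"{base}/layers.json", auth=auth).json())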
It cost me a lot of time to create FeatureTypes using the REST API. Using JSON like this really works:
{
"featureType": {
"name": "layer",
"nativeName": "poi",
"otherProperties...": "values..."
}
}
And use the JSON below to create a workspace (a similar sketch follows it):
{
"workspace": {
"name": "test_workspace"
}
}
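The workspace creation can be scripted the same way (a sketch with the Python requests library; host and credentials are placeholders, untested):

import requests

resp = requests.post(
    "http://127.0.0.1:8080/geoserver/rest/workspaces",
    json={"workspace": {"name": "test_workspace"}},
    auth=("admin", "geoserver"),  # placeholder credentials
)
print(resp.status_code)  # expect 201 Created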
The REST API documentation is out of date now. That's disappointing. Does anyone know how to get the latest REST API documentation?

Could I request an encrypted volume through the SoftLayer API?

Currently, I call 'SoftLayer_Virtual_Guest/getUpgradeItemPrices' to get local or SAN disks and call 'SoftLayer_Product_Package/id' (where block storage is based on package '222') to get external disks. And I noticed that the SoftLayer portal can provision an encrypted file/block volume.
My question is: how can I request an encrypted disk through the SoftLayer API with these methods?
Thank you. :)
UPDATE
Encryption will be set automatically once provisioning has completed.
Note that encryption is only available in data centers marked with an asterisk (so-called upgraded data centers). You may use the SoftLayer_Network_Storage::getFileBlockEncryptedLocations method to identify which they are; a small sketch follows.
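For example, with the SoftLayer Python client (a hedged sketch, untested; credentials come from the environment):

import SoftLayer

# Uses SL_USERNAME / SL_API_KEY from the environment.
client = SoftLayer.create_client_from_env()
locations = client['SoftLayer_Network_Storage'].getFileBlockEncryptedLocations()
for loc in locations:
    print(loc.get('id'), loc.get('longName'))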
Try the following REST requests (a Python-client equivalent is sketched after them):
For Block storage:
https://[username]:[apiKey]@api.softlayer.com/rest/v3/SoftLayer_Product_Order/verifyOrder
method: POST
{
"parameters":[
{
"complexType": "SoftLayer_Container_Product_Order_Network_Storage_AsAService",
"location": 449494,
"packageId": 759,
"volumeSize": 500,
"prices": [
{
"id": 189433
},
{
"id": 189443
},
{
"id": 193373
},
{
"id": 194633
},
{
"id": 193433
}],
"osFormatType": {
"keyName": "LINUX"
}
}
]
}
For File Storage:
https://[username]:[apiKey]@api.softlayer.com/rest/v3/SoftLayer_Product_Order/verifyOrder
method: POST
{
"parameters":[
{
"complexType": "SoftLayer_Container_Product_Order_Network_Storage_AsAService",
"location": 449600,
"packageId": 759,
"volumeSize": 250,
"prices": [
{
"id": 189433
},
{
"id": 189453
},
{
"id": 192043
},
{
"id": 193013
},
{
"id": 192053
}
],
"osFormatType": {
"keyName": "LINUX"
}
}
]
}
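The same order can also be verified through the SoftLayer Python client (a sketch based on the block-storage example above; the price IDs and location may differ for your account and package, untested):

import SoftLayer

client = SoftLayer.create_client_from_env()
order = {
    "complexType": "SoftLayer_Container_Product_Order_Network_Storage_AsAService",
    "location": 449494,
    "packageId": 759,
    "volumeSize": 500,
    "prices": [{"id": p} for p in (189433, 189443, 193373, 194633, 193433)],
    "osFormatType": {"keyName": "LINUX"},
}
# verifyOrder checks the order; switch to placeOrder to actually submit it.
print(client['SoftLayer_Product_Order'].verifyOrder(order))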
For more information please see below:
https://knowledgelayer.softlayer.com/procedure/migrate-file-storage-encrypted-file-storage
https://knowledgelayer.softlayer.com/faqs/1483#7277