If "min_index_age" is determined using the index creation time as reference what is the significance of #timestamp field in the datastream[Opensearch] - opensearch

I am currently migrating some time-series data stored in a regular index to a data stream managed using ISM policies. The policy definition is provided below:
{
  "policy": {
    "policy_id": "policy1",
    "description": "policy for ingesting sensor data",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "32gb"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": {
              "min_index_age": "30d"
            }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": [
      {
        "index_patterns": [
          "sensordata_*_history*"
        ],
        "priority": 100
      }
    ]
  }
}
I would have assumed that the deletion would happen based on the @timestamp field, i.e. if an index contains only data that is older than 30d, then the index would be deleted.
But it seems the index creation time is used as the reference time. What, then, is the purpose of the @timestamp field in the data stream?

Yes, the ISM policy triggers time-based actions with reference to the index creation time.
The @timestamp field is the required timestamp field for documents in a data stream; it is used for time-based queries and visualizations (for example, the time-based histograms in Kibana/OpenSearch Dashboards), not by ISM to compute the index age.
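For illustration, a minimal sketch (Python with requests; the host, credentials, and template/stream names below are placeholders, not the poster's setup) of how the data-stream side fits together: the index template marks matching names as data streams, and every document written to the stream must carry @timestamp, even though ISM measures min_index_age from the backing index's creation time.

import requests

OPENSEARCH = "https://localhost:9200"
AUTH = ("admin", "admin")  # placeholder credentials

# Data-stream index template; the ISM policy above attaches itself through its
# ism_template index pattern, so it does not need to be referenced here.
template = {
    "index_patterns": ["sensordata_*_history*"],
    "data_stream": {},  # matching names are created as data streams
    "priority": 200,
}
requests.put(f"{OPENSEARCH}/_index_template/sensordata_template",
             json=template, auth=AUTH, verify=False)

# Every document in a data stream must include @timestamp; ISM still measures
# min_index_age from the backing index's creation time, not from this field.
doc = {"@timestamp": "2023-01-01T00:00:00Z", "sensor_id": "s1", "value": 42}
requests.post(f"{OPENSEARCH}/sensordata_plant1_history/_doc",
              json=doc, auth=AUTH, verify=False)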

Related

GA4 (Google Analytics) sessions based on UTM params

I am trying to fetch sessions from GA4 which are relevant to specific UTM params.
In GA3 we were able to use segments (sessions::condition::ga:source==X;ga:medium==Y) but I cannot find a way to do this in GA4.
POST https://analyticsdata.googleapis.com/v1beta/#{property}:runReport
Payload like this:
body = {
"metrics": [
{
"name": "sessions::condition::ga:source==X;ga:medium==Y"
}
],
"dimensions": [
{
"name": "date"
}
],
"dateRanges": [
{
"startDate": '2022-01-01',
"endDate": '2022-01-30',
"name": "current_year"
}
]
}
Returns: Field sessions::condition::ga:source==X;ga:medium==Y is not a valid metric. Is there a way to do this via the new API?
Should I use a dimension filter to achieve that? I need to query on both source & medium, but it is not clear how to do this:
"dimensionFilter": {
"filter": {
"fieldName": "firstUserMedium",
"stringFilter": {
"value": "Y"
}
}
}
A dimension filter on sessionSource & sessionMedium returns sessions that have those specific utm_source & utm_medium values. See the dimensions & metrics page for a description of these and other dimensions & metrics.
The needed dimension filter is similar to the following. See Dimension Filters in Creating a Report for more info.
"dimensionFilter": {
"andGroup": {
"expressions": [
{
"filter": {
"fieldName": "sessionSource",
"stringFilter": {
"value": "X"
}
}
},
{
"filter": {
"fieldName": "sessionMedium",
"stringFilter": {
"value": "Y"
}
}
}
]
}
},
Segments are not yet available today in the GA4 Data API.
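A minimal sketch of the full runReport request using Python's requests (the property ID and OAuth token are placeholders; the dimension and metric names come from the GA4 schema):

import requests

PROPERTY_ID = "123456789"   # placeholder GA4 property ID
ACCESS_TOKEN = "ya29..."    # placeholder OAuth2 access token

body = {
    "dateRanges": [{"startDate": "2022-01-01", "endDate": "2022-01-30"}],
    "dimensions": [{"name": "date"}],
    "metrics": [{"name": "sessions"}],
    "dimensionFilter": {
        "andGroup": {
            "expressions": [
                {"filter": {"fieldName": "sessionSource", "stringFilter": {"value": "X"}}},
                {"filter": {"fieldName": "sessionMedium", "stringFilter": {"value": "Y"}}},
            ]
        }
    },
}

resp = requests.post(
    f"https://analyticsdata.googleapis.com/v1beta/properties/{PROPERTY_ID}:runReport",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
)
print(resp.json())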
I think you should check the dimensions and metrics list for GA4; they don't start with ga:.
POST https://analyticsdata.googleapis.com/v1beta/properties/GA4_PROPERTY_ID:runReport
{
"dateRanges": [{ "startDate": "2020-09-01", "endDate": "2020-09-15" }],
"dimensions": [{ "name": "country" }],
"metrics": [{ "name": "activeUsers" }]
}
Also, at this time I don't think it supports segments.

Acumatica Web Services - Get data from Generic Enquiry

We are following the blog post linked below. However, the Results do not populate. In addition, adding fields to the current Results screen does not pull through onto the service body.
https://www.acumatica.com/blog/contract-based-apis-in-generic-inquiries/
{
"id": "30225908-c2b0-4013-9500-93606424f85a",
"rowNumber": 1,
"note": null,
"ResultFilter": [
{
"id": "c4bfa8f3-ad41-ea11-a821-000d3a4721ed",
"rowNumber": 1,
"note": null,
"CurrentPrice": {
"value": 0.000000
},
"InventoryID": {
"value": "420000013000"
},
"LastCost": {
"value": 0.0
},
"PurchaseUnit": {
"value": "METRE"
},
"QtyDisbursed": {},
"QtyOnHand": {
"value": "0.000000"
},
"WarehouseID": {
"value": "PRD-FINN"
},
"custom": {},
"files": []
}
]
}
Looking at your mapping of the fields, you are missing the Mapped Object column.
Please add the field using the Populate button; that way the field will get mapped properly.
For GI inquiry access, to retrieve and modify the endpoint that pulls from the GI, the user's role needs to have EDIT access to the screen; 'Not Set' will be treated as revoked.
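For completeness, a hedged sketch of how a GI exposed through a contract-based endpoint is typically queried over REST once the fields are mapped. The instance URL, endpoint name/version, inquiry entity name, and credentials below are placeholders (only ResultFilter is taken from the JSON above), so adjust them to your endpoint definition:

import requests

BASE = "https://example.acumatica.com"   # placeholder instance URL
session = requests.Session()

# Sign in with placeholder credentials.
session.post(f"{BASE}/entity/auth/login",
             json={"name": "admin", "password": "***", "company": "Company"})

# PUT an empty record against the GI entity and $expand the detail mapped to
# the inquiry results; endpoint name, version and entity name are assumptions.
resp = session.put(
    f"{BASE}/entity/MyGIEndpoint/1.0/MyInquiry",
    params={"$expand": "ResultFilter"},
    json={},
)
print(resp.json())

session.post(f"{BASE}/entity/auth/logout")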

How to measure per-user bandwidth usage on Google Cloud Storage?

We want to charge users based on the amount of traffic their data generates, specifically the downstream bandwidth their data consumes.
I have exported the Google Cloud Storage access logs. From the logs, I can count the number of times a file is accessed (file size * count gives the bandwidth usage).
But the problem is that this doesn't work well with cached content; my calculated value is much higher than the actual usage.
I went with this method because I expected our traffic to be new and not hit the cache, which would make the difference negligible. But in reality, it turns out to be a real problem.
This is a common use case and I think there should be a better way to solve this problem with google cloud storage.
{
"insertId": "-tohip8e1vmvw",
"logName": "projects/bucket/logs/cloudaudit.googleapis.com%2Fdata_access",
"protoPayload": {
"#type": "type.googleapis.com/google.cloud.audit.AuditLog",
"authenticationInfo": {
"principalEmail": "firebase-storage#system.gserviceaccount.com"
},
"authorizationInfo": [
{
"granted": true,
"permission": "storage.objects.get",
"resource": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"resourceAttributes": {}
},
{
"granted": true,
"permission": "storage.objects.getIamPolicy",
"resource": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"resourceAttributes": {}
}
],
"methodName": "storage.objects.get",
"requestMetadata": {
"destinationAttributes": {},
"requestAttributes": {
"auth": {},
"time": "2019-07-02T11:58:36.068Z"
}
},
"resourceLocation": {
"currentLocations": [
"eu"
]
},
"resourceName": "projects/_/bucket/bucket.appspot.com/objects/users/2y7aPImLYeTsCt6X0dwNMlW9K5h1/somefile",
"serviceName": "storage.googleapis.com",
"status": {}
},
"receiveTimestamp": "2019-07-02T11:58:36.412798307Z",
"resource": {
"labels": {
"bucket_name": "bucket.appspot.com",
"location": "eu",
"project_id": "project-id"
},
"type": "gcs_bucket"
},
"severity": "INFO",
"timestamp": "2019-07-02T11:58:36.062Z"
}
A sample log entry is shown above.
We are using a single bucket for now; we can also use multiple buckets if that helps.
One possibility is to have a separate bucket for each user and get each bucket's bandwidth usage through the Cloud Monitoring time-series API.
The endpoint for this purpose is:
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.timeSeries/list
The following are the parameters to get the bytes sent over one hour (any alignment period of 60s or more can be specified); their sum is the total bytes sent from the bucket.
{
"dataSets": [
{
"timeSeriesFilter": {
"filter": "metric.type=\"storage.googleapis.com/network/sent_bytes_count\" resource.type=\"gcs_bucket\" resource.label.\"project_id\"=\"<<<< project id here >>>>\" resource.label.\"bucket_name\"=\"<<<< bucket name here >>>>\"",
"perSeriesAligner": "ALIGN_SUM",
"crossSeriesReducer": "REDUCE_SUM",
"secondaryCrossSeriesReducer": "REDUCE_SUM",
"minAlignmentPeriod": "3600s",
"groupByFields": [
"resource.label.\"bucket_name\""
],
"unitOverride": "By"
},
"targetAxis": "Y1",
"plotType": "LINE",
"legendTemplate": "${resource.labels.bucket_name}"
}
],
"options": {
"mode": "COLOR"
},
"constantLines": [],
"timeshiftDuration": "0s",
"y1Axis": {
"label": "y1Axis",
"scale": "LINEAR"
}
}
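As a rough sketch of running the same query directly against the timeSeries.list REST endpoint with Python (project, bucket, and the OAuth token are placeholders; the metric type, aligner, and reducer come from the configuration above):

import datetime
import requests

PROJECT = "project-id"           # placeholder
BUCKET = "bucket.appspot.com"    # placeholder
ACCESS_TOKEN = "ya29..."         # placeholder OAuth2 token with monitoring scope

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=1)

params = {
    "filter": (
        'metric.type="storage.googleapis.com/network/sent_bytes_count" '
        'resource.type="gcs_bucket" '
        f'resource.label."bucket_name"="{BUCKET}"'
    ),
    "interval.startTime": start.isoformat() + "Z",
    "interval.endTime": end.isoformat() + "Z",
    "aggregation.alignmentPeriod": "3600s",
    "aggregation.perSeriesAligner": "ALIGN_SUM",
    "aggregation.crossSeriesReducer": "REDUCE_SUM",
    "aggregation.groupByFields": 'resource.label."bucket_name"',
}

resp = requests.get(
    f"https://monitoring.googleapis.com/v3/projects/{PROJECT}/timeSeries",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params=params,
)

# Each point is the number of bytes sent from the bucket in one alignment period.
for series in resp.json().get("timeSeries", []):
    for point in series.get("points", []):
        print(point["interval"]["endTime"], point["value"]["int64Value"])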

Get all vertices having a given label

I am using IBM Graph on Bluemix and am new to this.
I created a graph named 'test' using the GUI provided by Bluemix and uploaded the sample 'Music Festival' data provided by IBM into that graph.
Now I am trying to query all the vertices having the label 'attendee' using the query below.
def gt = graph.traversal();
gt.V().hasLabel("attendee");
But I am getting this error:
Error: Error encountered evaluating script def gt = graph.traversal();gt.V().hasLabel("attendee"); with reason com.thinkaurelius.titan.core.TitanException: Could not find a suitable index to answer graph query and graph scans are disabled: [(~label = attendee)]:VERTEX
Not sure what I am doing wrong.
Can somebody tell me where I am going wrong?
How can I get rid of this error and get the expected output?
Thanks
@Radhika, your Gremlin query is a valid Gremlin query. However, some vendors (such as IBM Graph and Titan) chose to only allow users to start their traversals with an indexed step; this is to make sure your queries perform well. Calling hasLabel() by itself will give you the Could not find a suitable index... error, as you can't create indexes for labels. What you need to do is follow this step with a step that uses an indexed property, as in this query:
def gt = graph.traversal(); gt.V().hasLabel("band").has("genre","pop");
An index for genre has already been created in the schema for the sample music-festival data, as you can see below:
{
"propertyKeys": [
{ "name": "name", "dataType": "String", "cardinality": "SINGLE" },
{ "name": "gender", "dataType": "String", "cardinality": "SINGLE" },
{ "name": "age", "dataType": "Integer", "cardinality": "SINGLE" },
{ "name": "genre", "dataType": "String", "cardinality": "SINGLE" },
{ "name": "monthly_listeners", "dataType": "String", "cardinality": "SINGLE" },
{ "name":"date","dataType":"String","cardinality":"SINGLE" },
{ "name":"time","dataType":"String","cardinality":"SINGLE" }
],
"vertexLabels": [
{ "name": "attendee" },
{ "name": "band" },
{ "name": "venue" }
],
"edgeLabels": [
{ "name": "bought_ticket", "multiplicity": "MULTI" },
{ "name":"advertised_to","multiplicity":"MULTI" },
{ "name":"performing_at","multiplicity":"MULTI" }
],
"vertexIndexes": [
{ "name": "vByName", "propertyKeys": ["name"], "composite": true, "unique": false },
{ "name": "vByGender", "propertyKeys": ["gender"], "composite": true, "unique": false },
{ "name": "vByGenre", "propertyKeys": ["genre"], "composite": true, "unique": false}
],
"edgeIndexes" :[
{ "name": "eByBoughtTicket", "propertyKeys": ["time"], "composite": true, "unique": false }
]
}
That's why the above query works, and you need to do the same:
- If you don't have a schema, create one. You can model it after the one above or follow the API doc.
- Create a (vertex/edge) index for the properties that you'll start your traversals from. In this example, name, gender and genre for the vertex properties, and time for the edge index (eByBoughtTicket).
- Call the schema endpoint to add your schema to your graph (see the sketch after this list).
It's recommended to create your schema before adding any data to your graph so that you don't have to reindex later; that'll save you a lot of time. Once you create your schema, you can't modify what you already created, but you can add new properties/indexes later on.
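A hedged sketch of posting the schema over HTTP with Python; the apiURL, credentials, and graph name come from your service credentials, and the /schema path and auth scheme are assumptions here that should be checked against the API doc:

import requests

# Placeholders: take apiURL, username and password from your IBM Graph
# service credentials; "test" is the graph created in the question.
API_URL = "https://<your-ibm-graph-apiURL>/test"
AUTH = ("<username>", "<password>")

schema = {
    "propertyKeys": [
        {"name": "genre", "dataType": "String", "cardinality": "SINGLE"},
    ],
    "vertexLabels": [{"name": "band"}],
    "vertexIndexes": [
        {"name": "vByGenre", "propertyKeys": ["genre"], "composite": True, "unique": False},
    ],
}

# POST the schema before loading data so the indexed traversal above can run.
resp = requests.post(f"{API_URL}/schema", json=schema, auth=AUTH)
print(resp.status_code, resp.json())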
Look at the following code samples for Java and Nodejs for the exact code to use.
I hope that helps

How to update embedded data which is inside of another embedded data structure?

I have a document like the one below in MongoDB:
{
"_id": "test",
"tasks": [
{
"Name": "Task1",
"Parameter": [
{
"Name": "para1",
"Type": "String",
"Value": "*****"
},
{
"Name": "para2",
"Type": "String",
"Value": "*****"
}
]
},
{
"Name": "Task2",
"Parameter": [
{
"Name": "para1",
"Type": "String",
"Value": "*****"
},
{
"Name": "para2",
"Type": "String",
"Value": "*****"
}
]
}
]
}
There is an embedded data structure (Parameter) inside another embedded data structure (tasks). Now I want to update para1 in Task1's Parameter array.
I have tried many ways, but I can only use the query tasks.Parameter.Name to find para1; I cannot update it. The examples in the docs use .$. to update a value in an embedded data structure, but that doesn't work in my case.
Does anyone have any ideas?
MongoDB currently only supports the positional operator once, and only for the top-level array. There is a ticket, SERVER-831, to change this behavior for your use case; you can follow the issue there and upvote it.
However, you might be able to change your approach to accomplish what you want. One way is to change your schema: collapse the task name into the array so the document looks like this:
{
  "_id": "test",
  "tasks": [
    { "Task": 1, "Name": "para1", "Type": "String", "Value": "*****" },
    { "Task": 1, "Name": "para2", "Type": "String", "Value": "*****" },
    { "Task": 2, "Name": "para1", "Type": "String", "Value": "*****" },
    { "Task": 2, "Name": "para2", "Type": "String", "Value": "*****" }
  ]
}
Another approach that may work for you is to use $pull and $push. For instance, something like this to replace a parameter within a task (this assumes that tasks.Parameter.Name is unique within a task's Parameter array):
db.test2.update({$and: [{"tasks.Name": "Task3"}, {"tasks.Parameter.Name":"para1"}]}, {$pull: {"tasks.$.Parameter": {"Name": "para1"}}})
db.test2.update({"tasks.Name": "Task3"}, {$push: {"tasks.$.Parameter": {"Name": "para3", Type: "String", Value: 1}}})
With this solution you need to be careful with regard to concurrency, as there will be a brief moment where the parameter doesn't exist in the document.
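As a side note: on MongoDB 3.6 and later, the filtered positional operator (arrayFilters) can address both array levels in a single update. A minimal sketch with pymongo, assuming the collection is named test and holds the document from the question (connection string and database name are placeholders):

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["test"]

# $[t] matches the task whose Name is "Task1", $[p] matches the parameter
# whose Name is "para1"; only that nested element's Value is updated.
coll.update_one(
    {"_id": "test"},
    {"$set": {"tasks.$[t].Parameter.$[p].Value": "new value"}},
    array_filters=[{"t.Name": "Task1"}, {"p.Name": "para1"}],
)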