I have a log stream that has a UserID= label. I'm trying to count the number of unique users within an hour.
Here's an example of my log stream:
ts=2022-09-16T10:52:54.21344541Z level=INFO UserID=65166 elapsed=2.364015ms
ts=2022-09-16T10:52:51.580617785Z level=INFO UserID=24413 elapsed=2.324235ms
ts=2022-09-16T10:52:48.947248244Z level=INFO UserID=65166 elapsed=2.818146ms
ts=2022-09-16T10:52:41.51854716Z level=INFO UserID=24413 elapsed=2.633352ms
ts=2022-09-16T10:51:14.584272582Z level=INFO UserID=24413 elapsed=2.04884ms
ts=2022-09-16T10:51:14.45564065Z level=INFO UserID=65166 elapsed=4.889566ms
The closest I've managed is counting the number of requests for each user, but I just need the number of unique users in a given time range. Here's what I have:
count(count_over_time({app="app"} | logfmt [1h])) by (UserID)
After playing around a bit more, I realized I could just wrap the above query in another count() and get the number of unique users that way: the inner aggregation produces one series per UserID, and the outer count() counts those series. For completeness' sake, here's the query:
count(count(count_over_time({app="app"} | logfmt [1h])) by (UserID))
For the example stream above, this returns 2, since there are two distinct UserIDs (65166 and 24413).
I'm trying to query a Prometheus database to determine how many customers have recorded data for one metric with a specific label filter, but not for another metric with a different label filter. I.e., all the customer_ids that show up in
sum(usage{usage_type="type_b"}) by (customer_id)
but not in
count(service_plan{plan_type=~".*plan_b.*"}) by (customer_id)
I could run each and just mash them together outside Prometheus, but I want to do this either in a single query in Prometheus, or with some fancy transformation tricks in Grafana.
You need the unless operator; see the PromQL docs on logical/set binary operators. The following query should return the customer ids that exist in the first query and are missing from the second query:
sum(usage{usage_type="type_b"}) by (customer_id)
unless
count(service_plan{plan_type=~".*plan_b.*"}) by (customer_id)
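Both sides here are aggregated down to just the customer_id label, so the label sets match exactly. If your real queries carry extra labels on either side, you can restrict the matching explicitly; this is a variant of the same query, hedged on your actual label layout:
sum(usage{usage_type="type_b"}) by (customer_id)
unless on (customer_id)
count(service_plan{plan_type=~".*plan_b.*"}) by (customer_id)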
I have an OData query that I am using to pull data into Power BI, and I am trying to make it more efficient. I am building a report from Azure DevOps, pulling data from the WorkItemRevisions resource. Currently, I pull all the data for a work item and then filter in Power BI to just the revisions where the State changed. I would like to move this filtering into the OData query so that I can minimize the data pulled into the report.
Currently, I have a query like the following (simplified example for this question):
https://analytics.dev.azure.com/{Organization}/{Project}/_odata/v3.0-preview/WorkItemRevisions?
$select=Revision,WorkItemId,WorkItemType,Title,State,ChangedDate,LeadTimeDays,ParentWorkItemId
How can this be updated so that only Revisions where the State has changed (from New to Active, Active to Done, etc) are returned?
I am afraid an OData query cannot perfectly achieve what you need.
There is a filter expression, Revisions/any(r:r/State eq '{state}'), that matches work items that have ever had a given state.
For example:
https://analytics.dev.azure.com/<Organization>/<Project>/_odata/v2.0/WorkItems?
$filter=State eq 'Closed' and Revisions/any(r:r/State eq 'Active')
This query is similar to a Work Item query that uses the Was Ever operator.
As I said, this may not be a perfect solution. That's because it can only filter on whether the work item has ever had a specified state; it cannot verify that the transitions were specifically New to Active or Active to Done. If we change a work item's state from Active to Resolved, and then from Resolved to Closed, that work item will still appear in the query results.
In addition, even with a UI query we cannot accurately query for work items whose status changed from A to B. To achieve that, we would need to use the REST API.
So we can use the Revisions/any(r:r/State eq '{state}') expression to reduce the data pulled into the report to a certain extent.
You can use the query below to achieve what you want.
/_odata/v4.0-preview/WorkItemRevisions?$apply=
    filter(
        (WorkItem/WorkItemType eq 'Bug' or WorkItem/WorkItemType eq 'Product Backlog Item')
        and (
            (WorkItem/ChangedDate ge 2022-10-05Z and WorkItem/ChangedDate le 2022-11-04Z)
            or (WorkItem/CreatedDate ge 2022-10-05Z and WorkItem/CreatedDate le 2022-11-04Z)
        )
    )
    /groupby(
        (WorkItemId, State),
        aggregate(ChangedDate with min as MinChangedDate)
    )
To filter grouped data, you need to wrap the filter inside $apply, as shown above.
The URL above returns all states and their change dates for Bug and Product Backlog Item work item types that were added or updated within the given date range.
Hope it helps!
If you are able to use Analytics Views instead of OData, there is a dedicated field available in the Analytics Views fields settings called "State Changed Date".
I wrote the query below to get the uptime for the microservices.
base_jvm_uptime_seconds{kubernetes_name="namepspce1"}
However, it returns multiple values, so Grafana reports "Only queries that return single series/table is supported". I am wondering how I can get the first value from the query result?
I tried base_jvm_uptime_seconds{kubernetes_name="namepspce1"}[0], but it doesn't work.
Thanks!
I suggest you first inspect the label values of these multiple time series by running the query in the Prometheus Graph console.
Then you'll need to decide which one you want to display; taking an arbitrary first series usually isn't the best idea.
But you can always do topk(1, query) if it helps. Just turn Instant mode on in the Grafana query editor.
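Applied to your query, that would look like the following; note it keeps whichever series currently has the highest value, which may or may not be the one you actually want:
topk(1, base_jvm_uptime_seconds{kubernetes_name="namepspce1"})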
Scenario:
I am displaying a table of records. It initially displays the first 500 with "show more" at the bottom, which returns the next 500.
Issue:
If one record is added between the initial display and clicking "show more", that will cause "order by date, offset 500, limit 500" to overlap by 1 row.
I'd like to "order by date, offset until 'id of last row shown', limit 500"
My row IDs are UUIDs. I am open to alternative approaches that achieve the same result.
If you can order by ID, you can paginate using
where id > $last_seen_id limit 500
but that's not going to be useful where you're sorting by date.
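Spelled out as a full query (the table name records is illustrative):
SELECT *
FROM records
WHERE id > $last_seen_id
ORDER BY id
LIMIT 500;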
Sort stability!
I really hope that "date" actually means "timestamp", though; otherwise your ordering will be unstable and you can miss rows in pagination. You'll have to order by date, id to get a stable ordering if it's really a date, and you should probably do so even for a timestamp.
State on client
One option is to push the state out to the client. Have the client remember the last-seen (date, id) tuple, and use a row-value comparison so that rows sharing the last-seen date aren't skipped:
where (date, id) > ($last_seen_date, $last_seen_id) order by date, id limit 500
Cursors
Do you care about scalability? If not, you can use a server-side cursor. Declare the cursor for the full query, without the LIMIT. Then FETCH chunks of rows as requested. To do this your app must have a way to consistently bind a connection to a specific user's requests, though, and not to reset that connection or return it to the pool between requests. This might not be practical with your pool/framework, but is probably the best solution if you can do it.
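A minimal sketch of the cursor approach (table and cursor names are illustrative; WITH HOLD keeps the cursor usable after the declaring transaction commits, so later requests in the same session can keep fetching):
BEGIN;
DECLARE records_cur CURSOR WITH HOLD FOR
    SELECT * FROM records ORDER BY date, id;  -- full query, no LIMIT
COMMIT;

FETCH 500 FROM records_cur;  -- first page
FETCH 500 FROM records_cur;  -- next page, on "show more"

CLOSE records_cur;  -- when the session ends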
Temp tables
Another even less scalable option is to CREATE TABLE sessiondata.myuser_myrequest_blah AS SELECT .... then paginate that table. It's guaranteed not to change. This avoids the difficulty of needing to keep a consistent connection across requests, but will have a very slow first-request response time and is completely impractical for large user counts or large amounts of data.
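A sketch of the snapshot approach (names are illustrative; since the copy can no longer change, plain OFFSET/LIMIT paging is safe on it):
CREATE TABLE sessiondata.myuser_myrequest_blah AS
    SELECT * FROM records ORDER BY date, id;

SELECT * FROM sessiondata.myuser_myrequest_blah
ORDER BY date, id
LIMIT 500 OFFSET 500;  -- second page

DROP TABLE sessiondata.myuser_myrequest_blah;  -- when the session ends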
Related questions
Handling paging with changing sort orders
Using "Cursors" for paging in PostgreSQL
How to provide an API client with 1,000,000 database results?
I think you can use a subquery in the WHERE clause to accomplish this.
E.g., given you're paginating through a users table and you want the records after a given user:
SELECT *
FROM users
WHERE created_at < (   -- "<": the next page down a DESC-ordered list is older
    SELECT created_at
    FROM users
    WHERE users.id = '00000000-1111-2222-3333-444444444444'
    LIMIT 1
)
ORDER BY created_at DESC
LIMIT 5;
With the main purpose of posting tasks, displayed as either 'to-do' or 'done', how would one best structure a NoSQL DB of the following objects:
Task
    Datetime Created (Not Null)
    Task ID (Not Null)
    Task ID as a Str (Not Null)
    Task Title (Not Null)
    Task Description
    Time &/or Date Due
    User (Not Null)
        ID (Not Null)
        ID as a Str (Not Null)
        Name (Not Null)
        Username (Not Null)
        Location
        Contacts Count
        Date Created (Not Null)
        UTC Offset (Not Null)
        Time Zone (Not Null)
        Geo-Enabled (Not Null)
        Verified
        Task Count (Not Null)
        Language (Not Null)
    Geo-Location
    Coordinates
    Place
    Shared with Whom (?)
    Task Status
        Marked Done
        Auto-Moved to Done (because datetime-due has passed)
        Labeled (True/False)
    Edited
        Edit Count
        Edit Datetime
    Deleted
Users can post an unlimited number of Tasks, and Tasks can be shared between users. How is this relationship best captured?
Tasks can be manually 'marked done', or 'auto-labeled' and 'auto-moved-to-done' because the date-time due has passed.
Edits & Deletes are to be recorded as well.
As a starting place, what are the upsides &/or downsides of the following schema (with scalability a prime focus):
{
  "created_at": "Day Mon ## 00:00:00 +0000 20##",
  "id": #####,
  "id_str": "#####",
  "title": "This is a title",
  "description": "The description goes here..",
  "date_due": "Day Mon ## 00:00:00 +0000 20##",
  "user": {
    "id": ####,
    "id_str": "####",
    "name": "Full Name",
    "user_name": "Username",
    "location": "",
    "contacts_count": 101,
    "created_at": "Day Mon ## 00:00:00 +0000 20##",
    "utc_offset": ####,
    "time_zone": "Country",
    "geo_enabled": true,
    "verified": false,
    "task_count": 101,
    "lang": "en"
  },
  "geo": ?,
  "coordinates": ?,
  "place": ?,
  "shared_with": ?,
  "moved_done": false,
  "marked_done": false,
  "edited": false,
  "deleted": false
}
Edits & Deletes are to be recorded as well.
Do you only need to know that a task was altered, not how or by whom?
Otherwise, that will probably require versioning, i.e., for every Task there can be a number of TaskVersions. Alternatively, you could store only the modification itself; it depends on your needs. In particular, having many writers isn't easy because of locking: what if two people try to change the same object at 'the same time'? You might want to consider optimistic vs. pessimistic locking, or MVCC. Be warned that the "Tasks can be shared between users" requirement must be designed carefully.
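As a rough illustration, a TaskVersion document could look like this (field names are assumptions, following the placeholder style of your schema; it records who changed what, and when):
{
  "task_id": #####,
  "version": 2,
  "modified_at": "Day Mon ## 00:00:00 +0000 20##",
  "modified_by": ####,
  "changes": { "marked_done": true }
}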
As a starting place, what are the upsides &/or downsides of the following schema (with scalability a prime focus):
I guess that user refers to the user who logs in. I wouldn't denormalize that information. Suppose a user created a thousand tasks and then adds a new contact: now the contacts_count of 1000 documents must be updated, or it will be wrong. Denormalize only what's really necessary, e.g. the user_name.
Depending on what kind of lists you show, you can also choose to store only the user id and fetch the user object whenever you need to display the user name. While complex joins aren't supported, doing a $in query on, say, 50 or 100 ids (as you would for a list of tasks) is pretty fast.
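A sketch of that pattern in the mongo shell (collection and field names are assumptions):
// load one page of tasks, each storing only the creator's user id
var tasks = db.tasks.find().limit(50).toArray();

// collect the user ids and fetch all needed users in a single $in query
var userIds = tasks.map(function (t) { return t.user_id; });
var users = db.users.find({ _id: { $in: userIds } }).toArray();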