Variable-length path query runs forever but executes immediately when edge is bidirectional - postgresql

Problem
We have a graph in which locations are connected by services. Services that have common key-values are connected by service paths. We want to find all the service paths we can use to get from Location A to Location Z.
The following query matches services that go directly from A to Z and service paths that take one hop to get from A to Z:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..1]->
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p;
and runs fine.
But if we expand the search to service paths that may take two hops between A and Z:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..2]->
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p;
the query runs forever.
It never times out or crashes the server - it just runs continuously.
However, if we make the variable-length part of the query bidirectional:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..2]-
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p;
The same query that ran forever now executes instantaneously (~30ms on a dev database with default Postgres config).
More info
Behavior in different versions
This problem occurs in AgensGraph 2.2dev (cloned from GitHub). In AgensGraph 2.1.0, the first query, -[:service_path*0..1]->, still works fine, but both the broken query, -[:service_path*0..2]->, and the version that works in 2.2dev, -[:service_path*0..2]-, result in an error:
ERROR: btree index keys must be ordered by attribute
This leads us to believe that the problem is related to this commit, which was included as a bug fix in AgensGraph 2.1.1:
fix: Make re-scanning inner scan work
VLE threw "btree index keys must be ordered by attribute" because
re-scanning inner scan is not properly done in some cases. A regression
test is added to check it.
Duplicate paths returned, endlessly
In AgensBrowser v1.0, we are able to get results out using LIMIT on the end of the broken query. The query always returns the maximum number of rows, but the resulting graph is very sparse. Direct paths and paths with one hop show up, but only one longer path appears.
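Concretely, that is just the broken query with a LIMIT tacked onto the end, e.g.:
MATCH p=(origin:location)-[:route]->(:service)-[:service_path*0..2]->
(:service)-[:route]->(destination:location)
WHERE origin.name='A' AND destination.name='Z'
RETURN p LIMIT 100;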
In the result set, the shorter paths are returned in individual records as expected, but the first occurrence of a two-hop path is duplicated for the rest of the rows.
If we return some collection along with the paths, like RETURN p, nodes(p) LIMIT 100, the query again runs infinitely.
(Interestingly, this row duplication also occurs for a slightly different query in which we used the bidirectional fix, but the entire expected graph was returned. This may deserve its own post.)
The EXPLAIN plans are identical for the one-hop and two-hop queries
We could not compare EXPLAIN ANALYZE (because you can't ANALYZE a query that never finishes running), but the query plans were exactly identical between all of the queries - those that ran and those that didn't.
Increased logging revealed nothing
We set the logging level for Postgres to DEBUG5, the highest level, and ran the infinitely-running query. The logs showed nothing amiss.
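(For anyone reproducing this: DEBUG5 is set through standard PostgreSQL configuration, nothing AgensGraph-specific - for example log_min_messages = debug5 in postgresql.conf, or from a superuser session:
ALTER SYSTEM SET log_min_messages = 'debug5';
SELECT pg_reload_conf();
)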
Is this a bug or a problem with our data model?

Related

Race condition in Amplify DataStore

When updating an object, how can I handle a race condition?
final objects = await Amplify.DataStore.query(Object.classType, where: Object.ID.eq('aa'));
final object = objects.first;
await Amplify.DataStore.save(object.copyWith(count: object.count + 1));
user A : execute first statement
user B : execute first statement
user A : execute second statement
user B : execute second statement
=> the count ends up incremented by only 1 (one of the two updates is lost)
Apparently the way to resolve this is to either:
1 - Use conflict resolution, available from DataStore 0.5.0
One of your users (whichever is slowest) gets sent back the rejected version plus the latest version from the server; you get both objects back, resolve the discrepancy locally, and retry the update.
2 - Use a custom resolver (see here..) and check ADD expressions
You save versions locally, and your VTL is configured to provide additive values to the pipeline instead of set values.
This nice article might also help in understanding that.
Neither really worked for me: one of my devices could be offline for days at a time, and I would need multiple updates to objects to be applied in order, not just the last current version of the local object.
What really confuses me is that there is no immediate way to just increment values and keep all of the incremented objects' updates in the outbox (instead of just the latest object), then apply them in order when a connection is made.
I basically wrote to a separate table to do just that and solve my problem, but of course with more tables and rows come more reads and writes, and therefore more expense.
Have a look at my attempts here if you want the full code; let me know.
And then, I guess, hope for an update to Amplify that includes increment logic to update values atomically out of the box, to avoid these common race conditions.
Here is some more context

MySQL Workbench Results

How do I disable the Result in Workbench?
The reason I am asking is that after a certain number of results are displayed, MySQL Workbench pops up with a message "The maximum result is reached for results. Do you want to continue?", or something along those lines.
You don't disable results or result sets as these are the meat of any database client tool. A result is what you get when you query your database. Disabling that would mean you don't do anything meaningful anymore with the database.
The error you get, however, is shown because you got too many results to display. Each query that returns a result set creates a tab in the result area. Creating too many tabs not only makes no sense, since you cannot really see them, but also consumes a lot of system resources. So there's this sanity check in MySQL Workbench to avoid creating too many.
Solution: make sure your scripts/queries don't return more result sets than what you have set in your settings, or increase that amount via Preferences -> SQL Editor -> Query Editor -> Max number of result sets, but use a sane value there!

Can you calculate active users using time series

My atomist client exposes metrics on commands that are run. Each command is a metric with a username element as well as a status element.
I've been scraping this data for months without resetting the counts.
My requirement is to show the number of active users over a time period, i.e. 1h, 1d, 7d and 30d, in Grafana.
The original query was:
count(count({Username=~".+"}) by (Username))
This is an issue because I don't clear the metrics, so it's always a count since inception.
I then tried this:
count(
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w])
  -
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w] offset 1w)
  > 0
)
which works, but only for one command; I have about 50 other commands that need to be added to that count.
I also tried:
{__name__=~".+_command",job="app name"}[1w] offset 1w
but this is obviously very expensive (it times out in the browser), and it has issues integrating with max_over_time, which doesn't support it.
Any help? Am I using the metric in the wrong way? Is there a better way to query? My only option at the moment is to repeat the working count format above for each of the commands.
Thanks in advance.
To start, I will point out a number of issues with your approach.
First, the Prometheus documentation recommends against using arbitrarily large sets of values for labels (as your usernames are). As you can see (based on your experience with the query timing out) they're not entirely wrong to advise against it.
Second, Prometheus may not be the right tool for analytics (such as active users). Partly due to the above, partly because it is inherently limited by the fact that it samples the metrics (which does not appear to be an issue in your case, but may turn out to be).
Third, you collect separate metrics per command (e.g. help_command, foo_command) instead of a single metric with the command name as a label (e.g. command_usage{command="help"}, command_usage{command="foo"}).
To get back to your question though, you don't need the max_over_time, you can simply write your query as:
count by(__name__)(
  (
    {__name__=~".+_command",job="Application Name"}
    -
    {__name__=~".+_command",job="Application Name"} offset 1w
  ) > 0
)
This only works, though, because you say that whatever exports the counts never resets them. If that is simply because the exporter has never restarted, and the counts will drop to zero when it does, then you'd need to use increase instead of minus and you'd run into the exact same performance issues as with max_over_time.
count by(__name__)(
  increase({__name__=~".+_command",job="Application Name"}[1w]) > 0
)
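For completeness, if the metrics were reorganized into a single metric with a command label (the third point above), the same idea would look roughly like this; command_usage and its command label are illustrative names, not your existing metrics:
# active users per command over the last week
count by(command)(
  (
    command_usage{job="Application Name"}
    -
    command_usage{job="Application Name"} offset 1w
  ) > 0
)
# distinct active users across all commands
count(
  sum by(Username)(
    command_usage{job="Application Name"}
    -
    command_usage{job="Application Name"} offset 1w
  ) > 0
)
The same caveat applies: if the counters can reset, the subtraction has to become increase().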

Can I debug a PostgreSQL query sent from an external source that I can't edit?

I see how to debug queries stored as Functions in the database. But my problem is with an external QGIS plugin that connects to my Postgres 10.4 via network and does a complex query and calculations, and stores the results back into PostGIS tables:
FOR r IN c LOOP
SELECT
(1 - ST_LineLocatePoint(path.geom, ST_Intersection(r.geom, path.geom))) * ST_Length(path.geom)
INTO
station
(continues ...)
When it errors, it just returns that line number as the failing location, but gives no clue where it was in the loop through hundreds of features. (And any features it has processed are not stored to the output tables when it fails.) I totally don't know enough about the plugin and about SQL to hack the external query, and I suspect that if it were a reasonable task, the plugin author would have included more revealing debug messages.
So is there some way I could use pgAdmin4 (or anything) from the server side to watch the query process? Even being able to see if it fails the first time through the loop or later would help immensely. Knowing the loop count at failure would point me to the exact problem feature. Being able to see "station" or "r.geom" would make it even easier.
It's perfectly fine if the process is miserably slow or interferes with other queries; I'm the only user on this server.
This is not actually a way to watch the RiverGIS query in action, but it is the best I have found. It extracts the failing ST_Intersects() call from the RiverGIS code and runs it under your control, where you can display any clues you want.
When you're totally mystified where the RiverGIS problem might be, run this SQL query:
SELECT
xs."XsecID" AS "XsecID",
xs."ReachID" AS "ReachID",
xs."Station" AS "Station",
xs."RiverCode" AS "RiverCode",
xs."ReachCode" AS "ReachCode",
ST_Intersection(xs.geom, riv.geom) AS "Fraction"
FROM
"<your project name>"."StreamCenterlines" AS riv,
"<your project name>"."XSCutLines" AS xs
WHERE
ST_Intersects(xs.geom, riv.geom)
ORDER BY xs."ReachID" ASC, xs."Station" DESC
Obviously replace <your project name> with the QGIS project name.
Also works for the BankLines step if you replace "StreamCenterlines" with "BankLines". Probably could be adapted to other situations where ST_Intersects() fails without a clue.
You'll get a listing with shorter geometry strings for good cross sections and double-length strings for bad ones. Probably need to widen your display column a lot to see this.
Works for me in pgAdmin4, or in QGIS3 -> Database -> DB Manager -> (click the wrench icon). You could select only bad lines, but I find the background info helpful.
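Separately, if all you need is to confirm from the server side that the plugin's query is still running, and to see which top-level statement the backend is on and for how long, you can poll pg_stat_activity from another session (it won't show loop variables like station or r.geom, but it does confirm progress). A minimal sketch:
-- run as a superuser (or the same role) so the query text is visible
SELECT pid,
       application_name,
       state,
       now() - query_start AS running_for,
       left(query, 200)    AS current_query
FROM pg_stat_activity
WHERE datname = current_database()
  AND state <> 'idle'
ORDER BY query_start;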

Preserve the "everything" count and get filtered results in T-SQL?

I have created a complex SQL Server 2008/ColdFusion search page that searches through a variety of tables.
On the left is a list of the categories, plus an "everything" category; next to each category or type of result is the total number of results of that type found in the current search.
I have everything working fine, but I am hoping there is a more optimal approach.
Every time I filter the search to a specific category, I still have to get all the results, just to make sure the "everything" category has the correct totals.
Because of this, I have realized it is a problem I've had in lots of other ColdFusion/SQL programs:
you want to reduce the number of results by some field in the SELECT, but you need to keep the original record count total,
and you really don't want to re-run the whole massive query every time when you just need the trimmed results.
The program is 1 CFC, 1 CFM, 1 stored procedure, and jQuery/AJAX inside the CFM to call the CFC.
The CFM calls the CFC when it originally gets a form-submitted search request, and any subsequent filtering does the same thing.
However, if there are more than 20 results, it shows a button at the bottom to fetch 20 more records via AJAX.
My main goal is to improve performance and to keep an accurate record of what the record count is before any filtering is done, without having to rerun the unfiltered query every time.
This is a kind of complex problem, so there might not be any answers...
Thank you all for trying..
I would run the "big" query once, then pop it into a SESSION variable. Then I'd use Query-of-Query to return subsets based on filters.
The main query always exists, so you can query against that or use metadata like bigQuery.recordCount. Your QofQ is a smaller set of data you can use for display. And you can re-apply filters without having to return to the database.
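A rough sketch of that idea; the datasource, table, and field names (myDSN, SearchResults, Category, form.searchTerm, url.category) are all placeholders:
<!--- run the expensive search once and keep the full result in the session --->
<cfif NOT structKeyExists(session, "bigQuery")>
    <cfquery name="session.bigQuery" datasource="myDSN">
        SELECT ResultID, Category, Title
        FROM SearchResults
        WHERE Title LIKE <cfqueryparam value="%#form.searchTerm#%" cfsqltype="cf_sql_varchar">
    </cfquery>
</cfif>

<!--- the unfiltered "everything" total is always available --->
<cfset everythingCount = session.bigQuery.recordCount>

<!--- filter the cached result in memory with a query-of-queries --->
<cfset bigQuery = session.bigQuery>
<cfquery name="filtered" dbtype="query">
    SELECT *
    FROM bigQuery
    WHERE Category = '#url.category#'
</cfquery>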
Well, you need to run the query (or a COUNT(*)) at least once to get the total number. You could:
Cache this query and refer to the cached query's recordCount again and again
Store the record count in the session scope until the next time the query is run for this user
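If you would rather keep it in the database, one round trip can return both the filtered rows and the unfiltered totals by computing window aggregates before the category filter is applied (SQL Server 2008 supports COUNT(*) OVER()). A sketch with made-up table, column, and parameter names:
WITH results AS (
    -- the expensive search, WITHOUT the category filter
    SELECT r.ResultID,
           r.Category,
           r.Title,
           COUNT(*) OVER ()                        AS EverythingCount,
           COUNT(*) OVER (PARTITION BY r.Category) AS CategoryCount
    FROM dbo.SearchResults AS r
    WHERE r.Title LIKE '%' + @SearchTerm + '%'
)
SELECT *
FROM results
WHERE @Category IS NULL OR Category = @Category
ORDER BY Title;
The outer WHERE trims to one category, but EverythingCount was computed over the whole unfiltered search inside the CTE, so the totals stay accurate.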