Multi-Select Query Through PHRETS RETS System - forms

I've got a system running RETS through the PHRETS library. I have a form that runs a query to pull out results, and we're adding multi-select boxes.
So far, my code looks like this for the query: (SUB_AREA_NAME=|AreaA,AreaB,AreaC,AreaD)
This works for allowing many results to come up. The problem is this:
For some reason, the system is doing an 'and' operation instead of an 'or' operation. So any time we search more than one place, if any of the areas comes up empty, they will all come up empty.
For example:
Let's say AreaA has 3 houses, AreaB has 0 houses, AreaC has 10 houses, and AreaD has 1 house.
If you look up:
AreaA + AreaC you will get 13 results.
AreaA + AreaC + AreaD you will get 14 results.
AreaD by itself you will get 1 result.
AreaA + AreaB you will get 0 results.
AreaA + AreaB + AreaC + AreaD you will get 0 results.
Basically, because AreaB has no results, if you query that area with any other area that does have results, it will still come up as no results.
I need to know how to query multiple selections from one category, while showing all the results even if one area doesn't have any.
Thanks.

Some (most) RETS server implementations are not done correctly. Your query is right according to the RETS spec; you just need to find out what will work for your particular situation.
For example, you could try ((SUB_AREA_NAME=AreaA)|(SUB_AREA_NAME=AreaB)|(SUB_AREA_NAME=AreaC)|(SUB_AREA_NAME=AreaD)) and see if that works.
In some cases I've seen this work (notice I removed the pipe even though that is the OR conjunction): (SUB_AREA_NAME=AreaA,AreaB,AreaC,AreaD)
Other times it won't work with the commas and you need to use four separate queries.
And other times I have seen the server foul up and not encode the commas properly, so you have to do something like this: (SUB_AREA_NAME=|AreaA%2CAreaB%2CAreaC%2CAreaD)
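If you need to find out which form your particular server accepts, a quick comparison loop can help. Below is a rough sketch using the PHRETS 1.x API (Connect/SearchQuery/TotalRecordsFound/FreeResult); the login URL, credentials, and the 'Property'/'RES' resource and class names are placeholders for whatever your MLS uses.
<?php
// Rough sketch: run each candidate query syntax and compare the record counts.
require_once 'phrets.php';

$rets = new phRETS();
$rets->Connect('http://rets.example.com/login', 'username', 'password');

$variants = array(
    '(SUB_AREA_NAME=|AreaA,AreaB,AreaC,AreaD)',        // spec-compliant OR list
    '(SUB_AREA_NAME=AreaA,AreaB,AreaC,AreaD)',         // pipe removed
    '((SUB_AREA_NAME=AreaA)|(SUB_AREA_NAME=AreaB)|(SUB_AREA_NAME=AreaC)|(SUB_AREA_NAME=AreaD))', // explicit ORs
    '(SUB_AREA_NAME=|AreaA%2CAreaB%2CAreaC%2CAreaD)',  // pre-encoded commas
);

foreach ($variants as $query) {
    $search = $rets->SearchQuery('Property', 'RES', $query);
    echo $query . ' => ' . $rets->TotalRecordsFound($search) . " records\n";
    $rets->FreeResult($search);
}

$rets->Disconnect();
Whichever variant returns the sum of the individual area counts (14 in your example) is the one your server actually treats as an OR.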

Related

Overpass query. Absence of [maxsize] returns significantly smaller results

I have two overpass queries.
node(33.68336,-117.89466,34.14946,-117.03498);
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"](bn);
(._;>;);
out;
The query above returns an osm.xml file that is 167.306 kB.
[out:xml][maxsize:2000000000];
(
node(33.68336,-117.89466,34.14946,-117.03498);
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|seconda ry_link|tertiary|tertiary_link|road|residential|service"](bn);
(._;>;);
);
out;
The second query returns a file that is 618.994 kB. Why does the second query return a significantly bigger result? Does the first query not give me the full dataset? Is there a way to get the same result with both queries? (The absence of [maxsize] sometimes leads to an error…)
I feel that there is something missing about your query:
node(33.68336,-117.89466,34.14946,-117.03498); should return all the nodes in this area, which is a lot of data.
then the second line:
way"highway"~“motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service”;
gives an error, as it should be written with brackets and straight quotes as so:
way["highway"~"motorway|motorway_link|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|road|residential|service"];
but this looks for all the roads in the world, and your first query is not used any more, as your output is only the second query. But that is a huge amount of data, probably in the GB range.
So I don't see how you would get only 167 kB. I assume you must have a bounding box or some other filter that you did not mention.
But in your second example, you make a union of the two queries, as you put them in brackets:
(... ; ...;); out; so you would get all the nodes in the area and all the roads in the world. And again, if you have an extra bounding box or filter, you might get only 619 kB. Supposing that there are a lot of non-road nodes, it makes sense that you get more data, as you get the union of the two searches (all the nodes plus the nodes from the roads).

Sphinx Multi-Level Sort with Randomize

Here is my challenge with Sphinx Sort where I have Vendors who pay for premium placement and those who don't:
I already do a multi-level order including the PaidVendorStatus which is either 0 or 1 as:
order by PaidVendorStatus,Weight()
So in essence I end up with multiple sort groups:
PaidVendorStatus=1, Weight1
....
PaidVendorStatus=1, WeightN
PaidVendorStatus=0, Weight1
...
PaidVendorStatus=0, WeightN
The problem is I have three goals:
Randomly prioritize each vendor in any given sort group
Have each vendor's 'odds' of being randomly assigned top position be equal regardless of how many records they have returned in the group (so if Vendor A has 50 results and VendorB has 2 results they still both have 50% odds of being randomly assigned any given spot)
Ideally, maintain the same results order in any given search (so that if the user searches again, the same order will be displayed)
I've tried various solutions:
Select CRC32(Vendor) as RANDOM...Order by PaidVendorStatus,Weight(),RANDOM
which solves 2 and 3, except that due to the nature of CRC32 it ALWAYS puts the same vendor first (and second, third, etc.), so in essence it does not solve the issue at all.
I tried making a Sphinx sql_attr_string in my Sphinx configuration which was a concatenation of Vendor and the record Title (Select... concat(Vendor,Title) as RANDOMIZER...) and then used that to randomize:
Select CRC32(RANDOMIZER) as RANDOM...
which solves 1 and 3, as now the Title field gets thrown into the randomization mix so that the same vendor does not always get first billing. However, it fails at 2, since in essence I am only sorting by Title, and thus Vendor B with two results now has a very low chance of being sorted first.
In an ideal world, naturally, I could just order this way:
Order by PaidVendorStatus,Weight(),RAND(Vendor)
but that is not possible.
Any thoughts on this appreciated. I did, by the way, check out this thread on UDFs as per Barry Hunter's suggestion, but unless I am not understanding it at all (possible), it does not seem to be the solution for this problem.
Well one idea is:
SELECT * FROM (
SELECT *,uniqueserial(vendor_id) AS sorter FROM index WHERE MATCH(...)
ORDER BY PaidVendorStatus DESC ,Weight() DESC LIMIT 1000
) ORDER BY sorter DESC, WEIGHT() DESC;
This exploits Sphinx's 'multiple sort' feature with a pseudo subquery.
This works because the inner query is sorted by PaidVendorStatus first, so those items come first, which affects the order in which uniqueserial() is called.
It's NOT really 'randomising' the results as such; it seems you're just randomising them to mix up the vendors (so a single vendor doesn't dominate the results). uniqueserial() works by 'spreading' a particular vendor's results out - the results will tend to cycle through the vendors.
This is tricky as it exploits a relatively undocumented Sphinx feature - subqueries.
For the UDF see http://svn.geograph.org.uk/svn/modules/trunk/sphinx/
Still don't have an answer for your biased random (as in 2.),
but I just remembered another feature that can help with 3.: you can supply a specific seed to the random number generator. Typically random generators are seeded from the current time, which gives ever-changing values, but you can use a specific seed instead.
The seed is, however, a number, so you need a predictable but changing number. Could CRC the query?
... Sphinx doesn't support expressions in OPTION, so you would have to calculate the hash in the app.
<?php
// Seed Sphinx's RAND() from a CRC of the search text so the same search
// always produces the same "random" order.
$query = $db->Quote($_GET['q']);
$crc = crc32($query);
$sql = "SELECT id, IDIV(WEIGHT(),100) AS i, RAND() AS r FROM index WHERE MATCH($query)
        ORDER BY PaidVendorStatus DESC, i DESC, r ASC OPTION random_seed=$crc";
If you wanted the results to only slowly evolve, add the current date, so each day is a new selection...
$crc = crc32($query.date('Ymd'));

Gremlin query to find the count of a label for all the nodes

Sample query
The following query returns me the count of a label, say "Asset", under a particular id (0):
g.V().hasId(0).repeat(out()).emit().hasLabel('Asset').count()
But I need to find the count for all the nodes that are present in the graph with a condition as above.
I am able to do it individually, but my requirement is to get the count for all the nodes for that label, say 'Asset'.
So I am expecting something like:
{v[0]: 2}
{v[1]: 1}
{v[2]: 1}
where v[1] and v[2] each have a node under them with the label "Asset", making the overall count for v[0] = 2.
There are a few ways you could do it. It's maybe a little weird, but you could use group():
g.V().
group().
by().
by(repeat(out()).emit().hasLabel('Asset').count())
or you could do it with select() and then you don't build a big Map in memory:
g.V().as('v').
map(repeat(out()).emit().hasLabel('Asset').count()).as('count').
select('v','count')
if you want to maintain hierarchy you could use tree():
g.V(0).
repeat(out()).emit().
tree().
by(project('v','count').
by().
by(repeat(out()).emit().hasLabel('Asset').count()).select(values))
Basically you get a tree from vertex 0 and then apply a project() over that to build that structure per vertex in the tree. I had a different way to do it using union but I found a possible bug and had to come up with a different method (actually Gremlin Guru, Daniel Kuppitz, came up with the above approach). I think the use of project is more natural and readable so definitely the better way. Of course as Mr. Kuppitz pointed out, with project you create an unnecessary Map (which you just get rid of with select(values)). The use of union would be better in that sense.

Combine data from several queries

We are looking into a more powerful way of collecting and processing data to be processed in our reports. For one advanced report on a big database, we need to run two independent SQL queries (on the same data source) and combine them afterwards.
Query1 returns:
user id#1 ... 3 columns
user id#2 ... 3 columns
user id#4 ... 3 columns
Query 2 returns:
user id#1 ... 5 columns
user id#3 ... 5 columns
user id#4 ... 5 columns
What we want to show:
user id#1 ... 3 columns + 5 columns
user id#2 ... 3 columns
user id#3 ... 5 columns
user id#4 ... 3 columns + 5 columns
Although it's counter-intuitive, we found that combining the results from both queries in SQL leads to considerably worse runtime of the SQL query.
We have looked at subdatasets, but from my understanding it's not possible to mix the data from two subdatasets (or the main data+one subdataset) in a single table.
We have looked at subreports, but from my understanding a subreport will call the query once for each row in the report, if I put the subreport in the Details area as we intend to. But for performance reasons we want to run the two queries that we prepared, and each only once.
We think the most reasonable approach is for us to write such advanced reports in Java, and it's possible; however, the JavaBean data source cannot access the report parameters. Our database is huge, and therefore we can't just run queries without a WHERE clause and filter afterwards; the Java code needs access to the report parameters.
We are currently looking into implementing JRQueryExecutor as recommended there and there (last comment), or even taking advantage of scriptlets.
But it sounds really quite advanced, and we are wondering: are we thinking the wrong way or heading in the wrong direction? And if JRQueryExecutor is the correct way, any example or documentation would be welcome.
We are also considering trying to refactor our SQL to achieve the result with only one query, but we do feel that the reporting system ought to allow us to manipulate the data also in Java.
In the end we made it with a scriptlet. In afterReportInit, inheriting from JRDefaultScriptlet, you get the parameters and the data source from parametersMap, and you can then fill in the data source from Java.
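For reference, here is a rough sketch of what such a scriptlet can look like. The parameter names, the "MERGED_ROWS" collection, and the merge helper are hypothetical; only JRDefaultScriptlet, afterReportInit() and getParameterValue() come from the JasperReports API.
import java.util.Collection;
import java.util.List;

import net.sf.jasperreports.engine.JRDefaultScriptlet;
import net.sf.jasperreports.engine.JRScriptletException;

public class CombineQueriesScriptlet extends JRDefaultScriptlet {

    @Override
    public void afterReportInit() throws JRScriptletException {
        // Report parameters are reachable from the scriptlet.
        Object fromDate = getParameterValue("FromDate"); // hypothetical parameter
        Object toDate   = getParameterValue("ToDate");   // hypothetical parameter

        // An empty, mutable collection passed in as a parameter backs the
        // report's data source expression; fill it here, once, from Java.
        @SuppressWarnings("unchecked")
        Collection<Object> rows = (Collection<Object>) getParameterValue("MERGED_ROWS"); // hypothetical

        // Run the two prepared queries once each, join the rows on user id,
        // and hand the merged beans to the report.
        List<Object> merged = runAndMergeQueries(fromDate, toDate); // hypothetical helper
        rows.addAll(merged);
    }

    private List<Object> runAndMergeQueries(Object from, Object to) {
        // JDBC / DAO code that executes the two queries and merges rows by user id.
        throw new UnsupportedOperationException("not shown here");
    }
}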

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, not the usual paging on the number of results.
And I'm now faced with the choice of whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is SQL server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = @firstletter
That's because most databases are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
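A rough T-SQL sketch of that setup (table and column names follow the question; a persisted computed column is shown as one way to keep the extra column in sync without hand-writing the trigger):
ALTER TABLE [table]
    ADD name_first_char_lower AS LOWER(LEFT(name, 1)) PERSISTED;

CREATE INDEX IX_table_name_first_char_lower
    ON [table] (name_first_char_lower);

-- The paging query can then seek on the new index:
SELECT *
FROM [table]
WHERE name_first_char_lower = @firstletter;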
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
My data is faster with the LEFT(...,5) version. As an aside, my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column has an index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)='ABA' OR ... up to 9 OR clauses. My count returned 7,301,477 records in 4 seconds with LEFT and 1 second with LIKE, i.e. where column_name like 'AAA%' OR column_name like 'ABA%' OR ... up to 9 LIKE clauses.
Calling a function in the WHERE clause is not best practice. Refer to http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE function in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
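For example, a minimal sketch (dbContext, People, Name and firstLetter are placeholder names):
// using Microsoft.EntityFrameworkCore;
var results = dbContext.People
    .Where(p => EF.Functions.Like(p.Name, firstLetter + "%"))
    .ToList();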
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non-Core) EntityFunctions, so I'm not sure how to do it for EF6.