Parsing of the WHERE clause in Phoenix - apache-phoenix

I want to know the complete lifecycle of a WHERE clause condition in Phoenix.
How does the value of the WHERE condition get parsed, which classes does it pass through, and what are the intermediate values?
Where exactly do the WHERE clause logic and the Scan object reside in the code?
Executing a command such as
select ID,NAME from "table_name" where salary >= 45678;
OR
select ID,SALARY from "table_name" where name like '%abcd%';
Is there some kind of reference?
Thanks in advance.

You can look for WhereCompiler.compile(...) in the QueryCompiler.compileSingleFlatQuery() function.
The usual trace path for a normal SELECT query is:
PhoenixStatement.executeQuery() -> ExecutableSelectStatement.compilePlan() ->
QueryCompiler.compile() -> compileSelect() -> compileSingleQuery() ->
compileSingleFlatQuery()
The Scan is kept in a context and handed around between the various compilers and iterators so that it gets set up properly. Look at the WhereCompiler code to see where filters are pushed into the Scan.
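A quick way to see where the predicate ends up, without reading the compiler code, is to run EXPLAIN from a Phoenix client. The exact plan text varies by Phoenix version, but it shows whether the condition became a server-side filter or a key range on the underlying scan:
EXPLAIN select ID,NAME from "table_name" where salary >= 45678;
-- typically prints a scan plan with a line such as
--     SERVER FILTER BY SALARY >= 45678
-- i.e. WhereCompiler pushed the predicate into the HBase scan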

Related

Cannot use Named Parameters with SSRS and PostgreSQL

I'm trying to add named parameters to a dataset query in a SSRS report (I'm using Report Builder), but I have had no luck discovering the correct syntax. I have tried #parameter, $1, $parameter and others, all without success. I suspect the syntax is just different for PostgreSQL versus normal SQL.
The only success I have had with passing parameters was based on this answer.
It involves using ? for every single parameter.
My query might look something like this:
SELECT address, code, remarks FROM table_1 WHERE date BETWEEN ? AND ? AND apt_num IS NULL AND ADDRESS = ?
This does work, but in the case of a query where I pass the same parameter to more than one part of the SELECT statement, I have to add the same parameter to the list multiple times, as shown here. The parameters are bound in positional order, so adding a new parameter to an existing query means reshuffling, and sometimes completely rebuilding, the query parameters tab.
What are the proper syntax and naming requirements for adding named Parameters when using a PostgreSQL data source in SSRS?
From my comment, this is what it would look like with a regular join:
with inparms as (
  select ? as from_date, ? as to_date, ? as address
)
select t.address, t.code, t.remarks
from inparms i
join table_1 t
  on t.date between i.from_date and i.to_date
  and t.apt_num is null
  and t.address = i.address;
I said cross join in my comment because it is sometimes quicker when retrofitting somebody else's SQL instead of trying to untangle things (thinking of a friend who uses right join sometimes just to ruin my day).

Sort data within a subquery with another subquery?

I am trying to sort the OUN.note column by using OUN.outcomeKey, since
the way it is working right now puts the notes in the wrong order (sorted alphabetically). Any idea on how to go about this? I've been trying to sort the data using another sub-query within, but I haven't had much luck (I don't have a plethora of experience).
Here's my current query:
SELECT DISTINCT OC.outcomeKey [Outcome Key], OC.outcome [Result],
    STUFF((SELECT ', ' + OUN.note
           FROM Outcome AS OUT
           JOIN OutcomeNote AS OUN
             ON OUT.outcomeKey = OUN.outcomeKey
           WHERE OUN.outcomeKey = OC.outcomeKey
           GROUP BY OUN.note
           FOR XML PATH ('')), 1, 1, '') [Outcome Note]
FROM Outcome AS OC
Any help or tips would be greatly appreciated! Also, please let me know if any more info is needed.
You may replace the line
GROUP BY OUN.note
with the line
ORDER BY OUN.outcomeKey
Also, because the concatenation starts with ', ', you may want to use 1, 2, '' as the additional arguments of the STUFF function. Otherwise, the values in your [Outcome note] column always start with a space.
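Putting both changes together, the query would look something like this (a sketch based on your original query):
SELECT DISTINCT OC.outcomeKey [Outcome Key], OC.outcome [Result],
    STUFF((SELECT ', ' + OUN.note
           FROM Outcome AS OUT
           JOIN OutcomeNote AS OUN
             ON OUT.outcomeKey = OUN.outcomeKey
           WHERE OUN.outcomeKey = OC.outcomeKey
           ORDER BY OUN.outcomeKey -- but see the note below
           FOR XML PATH ('')), 1, 2, '') [Outcome Note]
FROM Outcome AS OC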
Edit:
By the way, sorting the notes by outcomeKey in the subquery that generates the values for the [Outcome note] column has no effect... since all the notes in each subquery result will have the same outcomeKey value...
But you may sort on any column you want, of course. Perhaps there are other columns in your OutcomeNotes table that can serve as a useful sorting column of your outcome notes.
If I misunderstood your question, please provide the definitions of the Outcome and OutcomeNote tables, together with a demo population of those tables and the desired/expected query result.
Edit 2:
Starting with SQL Server 2017, Transact-SQL contains a function called STRING_AGG, which seems to be functionally equivalent (more or less) to MySQL's GROUP_CONCAT function. Using this function, your query would become something like this:
SELECT
OUN.outcomeKey [Outcome Key],
OC.outcome [Result],
STRING_AGG(OUN.[Note], ', ') WITHIN GROUP (ORDER BY OUN.outcomeKey) [Outcome Note]
FROM
Outcome AS OC
JOIN OutcomeNote AS OUN ON OUN.outcomeKey = OC.outcomeKey
GROUP BY
OUN.outcomeKey,
OC.outcome;
When using SQL Server 2017 or SQL Azure, this might be a more fitting choice, since it not only makes the query more readable, but it also eliminates the use of the (far less efficient) XML functions in your query.
I too have used the XML functionality for field concatenation (the way you use it) intensively in the past, but I noticed a considerable drop in the performance of my queries (which sometimes contained up to 10 columns of concatenated data). Since then, I tend to go for recursive common table expressions or scalar UDFs with recursion in pre-SQL Server 2017 environments.
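For what it's worth, the plain (non-recursive) variable-concatenation variant of the scalar-UDF idea looks roughly like this. This is only a sketch: the function name is hypothetical, and SQL Server does not formally guarantee the ordering of this kind of variable concatenation.
CREATE FUNCTION dbo.GetOutcomeNotes (@outcomeKey INT)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @notes NVARCHAR(MAX);
    -- append one note per row to the variable
    SELECT @notes = COALESCE(@notes + ', ', '') + OUN.note
    FROM OutcomeNote AS OUN
    WHERE OUN.outcomeKey = @outcomeKey;
    RETURN @notes;
END;
-- usage:
-- SELECT OC.outcomeKey, OC.outcome, dbo.GetOutcomeNotes(OC.outcomeKey) FROM Outcome AS OC;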

Construct SQL where clause to pull data within Power Query

Basically I want to retrieve rows of data that meet my clause conditions using Power Query.
I have 400 rows of lookup values in my spreadsheet.
Each row represents one lookup code, for example AAA1, AAB2 and so on.
So let's say I have a SELECT statement and I want to construct the WHERE clause using the above codes, so my final SQL statement will look like:
select * from MyTable where Conditions in ('AA1', 'AAB2')
So far I have this:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID",
Int64.Type}}),
test = Sql.Database("myserver", "myDB", [Query="SELECT * FROM myTable where" & #"Changed Type" & "])"
in
test
Obviously that didn't work, but that's my pseudo scenario anyway.
Please could you advise what to do?
Thank you
Peddie
I would create a "lookup" Power Query based on the Excel table. I would set the "Load To" properties to "Only Create Connection".
Then I would start the main Query by connecting to the SQL server using the Navigator to select "MyTable". Then I would add a Merge step to the main Query, to join to the "lookup" Query, matching the "Conditions" column to the "lookup" code. I would set the Join Type to "Inner". The Merge properties window will show you visually if the 2 columns you select actually contain matching data.
This approach does not require any coding, and is easier to build, extend and maintain.
Mike Honey's join is best for your problem, but here's a more general solution if you find yourself needing other logic in your where clause.
Normally Power Query only generates row filters on an equality expression, but you can put any code you want in a Table.SelectRows filter, like each List.Contains({"AA1", "AAB2"}, [Conditions])
So for your table, your query would look something like:
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Form ID", Int64.Type}}),
test = Sql.Database("myserver", "myDB"),
yourTable = test{[Name="myTable"]}[Data],
filtered = Table.SelectRows(yourTable, each List.Contains(#"Changed Type"[Form ID], [Conditions]))
in
filtered
The main downside to using the library functions is that Table.SelectRows only knows how to generate SQL WHERE clauses for specific expression patterns, so the row filter probably runs on your machine after downloading the whole table, instead of having SQL Server run the filter.

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to get a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function I try to check if there are any rows matching the given values:
SELECT COUNT(s.id)
INTO firsttry
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom, plocation, 0.0017)
  AND s.name <-> pname < 0.8
  AND s.type ~ stype;
The firsttry variable now contains the value 1. If I use the following (slightly extended) SELECT statement, I get no results:
RETURN QUERY SELECT
    s.id, s.name, s.type, s.the_geom,
    similarity(
        regexp_replace(s.name::text, '(Hauptbahnhof|Hbf)', 'Hbf'),
        regexp_replace(pname::text, '(Hauptbahnhof|Hbf)', 'Hbf')
    )::double precision AS sml,
    st_distance(s.the_geom, plocation) AS dist
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom, plocation, 0.0017)
  AND s.name <-> pname < 0.8
  AND s.type ~ stype
ORDER BY dist ASC, sml DESC
LIMIT 1;
The parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
The tuple I need to be returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF returns quite well for a lot of other stations, but this is one (of more) which just won't work. Any suggestions?
P.S. The use of the <-> operator comes from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible: no aggregates, just joins and no filters. Then add filters. Then add ORDER BY, then add aggregates. Look at exactly where the behavior changes.
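For example, a stripped-down version of the failing query might look like this (names taken from the question; no computed columns, ORDER BY, or LIMIT yet):
SELECT s.id, s.name, s.type
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom, plocation, 0.0017)
  AND s.name <-> pname < 0.8
  AND s.type ~ stype;
-- if this returns the row, add the similarity()/st_distance() columns,
-- then ORDER BY, then LIMIT back one at a time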
Try reindexing the database.
One possibility that occurs to me, based on this, is a corrupted index that is used by the second query but not the first. I have seen corrupted indexes in the past; usually they throw errors, but at least in theory they could create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are your disks doing?
A second possibility is that there is a typo in a join condition or filter expression. Normally this is something I would suspect first, but it is easy enough to weed out index problems, so start there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.
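Reindexing the table from the question can be as simple as the statement below (note that REINDEX locks the table while it runs):
REINDEX TABLE geographic.stations;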

What's the utility of the array type?

I'm a total newbie with PostgreSQL, but I have good experience with MySQL. I was reading the documentation and discovered that PostgreSQL has an array type. I'm quite confused, since I can't understand in which context this type could be useful within an RDBMS. Why would I choose this type instead of a classical one-to-many relationship?
Thanks in advance.
I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array; each number in the array is the branch number for that node. Then, you can do things like this:
SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.
PostgreSQL will compare arrays element by element in the natural fashion so ORDER BY path will dump the tree in a sensible linear display order; then, you check the length of path to figure out a node's depth and that gives you the indentation to get the rendering right.
The above approach gets you from the database to the rendered page with one pass through the data.
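For instance, the depth can come straight from the array length; a small sketch against the same hypothetical nodes table as above:
SELECT id,
       content,
       array_length(path, 1) AS depth -- indentation level when rendering
FROM nodes
WHERE tree = X
ORDER BY path;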
PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.
Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.
One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.
I've been using them successfully to aggregate recursive tree references using triggers.
For instance, suppose you have a tree of categories, and you want to find products in any of the categories (1,2,3) or any of their subcategories.
One way to do it is to use an ugly WITH RECURSIVE statement. Doing so will produce a plan stuffed with merge/hash joins on entire tables and an occasional materialize.
with recursive categories as (
select id
from categories
where id in (1,2,3)
union all
...
)
select products.*
from products
join product2category on...
join categories on ...
group by products.id, ...
order by ... limit 10;
Another is to pre-aggregate the needed data:
categories (
id int,
parents int[] -- (array_agg(parent_id) from parents) || id
)
products (
id int,
categories int[] -- array_agg(category_id) from product2category
)
index on categories using gin (parents)
index on products using gin (categories)
select products.*
from products
where categories && array(
select id from categories where parents && array[1,2,3]
)
order by ... limit 10;
One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.
To work around it, I increased the statistics on the gin-indexed column, and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveals that using && for the specified values will return an incorrect plan, I rewrite the applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:
select products.*
from products
where arrayoverlap(cat_id, array(
select id from categories where arrayoverlap(parents, array[1,2,3])
))
order by ... limit 10;
(The same goes for the <# operator...)
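For reference, the trigger that maintains categories.parents could be sketched roughly as follows. This assumes a hypothetical parent_id column, and a production version would also have to cascade changes to the descendants' arrays when a subtree is moved:
create or replace function categories_set_parents() returns trigger as $$
begin
  if new.parent_id is null then
    new.parents := array[new.id]; -- root node: the path is just itself
  else
    select c.parents || new.id into new.parents
    from categories c
    where c.id = new.parent_id; -- parent's path plus this node
  end if;
  return new;
end;
$$ language plpgsql;

create trigger categories_parents
before insert or update of parent_id on categories
for each row execute procedure categories_set_parents();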