Precedence/weight for a column using FREETEXTTABLE in dynamic T-SQL - tsql

I have dynamic SQL that performs paging and a full text search using CONTAINSTABLE, which works fine. The problem is that I would like to use FREETEXTTABLE but weight the rank of some columns over others.
Here is my original SQL and the ranking weight I would like to integrate
(I have changed names for privacy reasons):
SELECT * FROM
    (SELECT TOP 10 Things.ID, ROW_NUMBER()
        OVER(ORDER BY KEY_TBL.RANK DESC) AS Row FROM [Things]
    INNER JOIN
        CONTAINSTABLE([Things], (Features, Description, Address),
            'ISABOUT("cow" weight (.9), "cow" weight(.1))') AS KEY_TBL
        ON [Things].ID = KEY_TBL.[KEY]
    WHERE TypeID IN (91, 48, 49, 50, 51, 52, 53)
    AND
        dbo.FN_CalcDistanceBetweenLocations(51.89249, -8.493376,
            Latitude, Longitude) <= 2.5
    ORDER BY KEY_TBL.RANK DESC) x
WHERE x.Row BETWEEN 1 AND 10
Here is what I would like to integrate
select [key], sum(rnk) as weightRank
from
    (select
        Rank * 2.0 as rnk,
        [key]
    from freetexttable(Things, Address, 'cow')
    union all
    select
        Rank * 1.0 as rnk,
        [key]
    from freetexttable(Things, (Description, Features), 'cow')) as t
group by [key]
order by weightRank desc
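For reference, a minimal sketch of how that weighted union could be folded into the paging query in place of CONTAINSTABLE, assuming the same [Things] columns and filters as above:

SELECT * FROM
    (SELECT TOP 10 Things.ID, ROW_NUMBER()
        OVER(ORDER BY KEY_TBL.weightRank DESC) AS Row FROM [Things]
    INNER JOIN
        (SELECT [key], SUM(rnk) AS weightRank
         FROM (SELECT Rank * 2.0 AS rnk, [key]
               FROM FREETEXTTABLE(Things, Address, 'cow')
               UNION ALL
               SELECT Rank * 1.0 AS rnk, [key]
               FROM FREETEXTTABLE(Things, (Description, Features), 'cow')) AS t
         GROUP BY [key]) AS KEY_TBL
        ON [Things].ID = KEY_TBL.[key]
    WHERE TypeID IN (91, 48, 49, 50, 51, 52, 53)
    AND dbo.FN_CalcDistanceBetweenLocations(51.89249, -8.493376,
        Latitude, Longitude) <= 2.5
    ORDER BY KEY_TBL.weightRank DESC) x
WHERE x.Row BETWEEN 1 AND 10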

Unfortunately, the algorithm used by the freetext engine (FREETEXTTABLE) has no way to specify the significance of the various input columns. If this is critical, you may need to consider using a different product for your freetext needs.

You can create a column with the concatenation of:
Less_important_field &
More_important_field & More_important_field (2x)
This might look really stupid, but it's effectively what BM25F does to simulate structured documents. The only downside of this hack is that you can't dynamically change the weights. It bloats up the table a bit, but not necessarily the index, which only needs term counts.
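A minimal sketch of that hack in T-SQL, assuming the [Things] table from the question; the column name WeightedText and the 2x repetition of Address are illustrative choices, not part of the original post:

-- Hypothetical weighted-text column: repeating Address doubles the
-- contribution of its terms to frequency-based ranking.
ALTER TABLE [Things] ADD WeightedText NVARCHAR(MAX) NULL;

UPDATE [Things]
SET WeightedText = Address + N' ' + Address + N' '
                 + Features + N' ' + [Description];

-- Add WeightedText to the full-text index, then rank against it:
-- SELECT [KEY], RANK FROM FREETEXTTABLE([Things], WeightedText, 'cow');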

Related

Finding x rows with distinct value in col1 but same value in col2

Using postgres 11 I have a data table int_state that looks like this:
dev      int
=============
sw01     gi1
sw01     gi2
sw01     gi3
sw01     gi4
sw02     gi3
sw02     gi4
sw02     gi5
sw03     gi3
sw03     gi4
sw03     gi6
sw03     gi7
I need a query that will return a single row for every instance of dev where int is the same across every instance of dev. Using the sample above, the query should only return "gi3" as the value for int, since that value is shared between all instances of dev and the result is limited to a single row for every dev. Desired results from the sample data above:
dev      int
=============
sw01     gi3
sw02     gi3
sw03     gi3
The number of dev instances (x) to match against will vary, but the results should always contain x rows, with a row for every dev asked for in the query. If there are multiple matches against int, the query should only return the first one found.
Using a JOIN doesn't seem like it will work for me, since I need x rows of results, not a single row with all results combined. I have been playing around with CTEs and UNION but have so far been unable to get it working. I thought nesting CTEs would help, but apparently that's not supported in Postgres.
I plan on constructing this query on-demand in code ahead of time, so if I have to construct something that contains where dev in ('sw01','sw02','x','y','z') that's okay.
WITH distinctdevs AS (SELECT DISTINCT dev FROM int_state)
SELECT distinctdevs.dev, (SELECT int FROM int_state ints WHERE NOT EXISTS
    (SELECT dev FROM distinctdevs WHERE NOT EXISTS (
        SELECT * FROM int_state WHERE
            int_state.int = ints.int AND int_state.dev = distinctdevs.dev
        )
    ) LIMIT 1) FROM distinctdevs;
You might actually get better performance if you don't reuse the CTE on line 3, as that might trick the optimizer into thinking this is a correlated subquery rather than a constant. The core trick here is the doubly nested WHERE NOT EXISTS, which converts "an int that appears for every dev" into the logically equivalent "an int for which there is no dev under which it does not appear."
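The same relational division can also be phrased with plain counting; a hedged sketch against the int_state table above, hard-coding the dev list as the asker suggested:

-- One int shared by every listed dev, emitted once per dev.
WITH shared_int AS (
    SELECT int
    FROM int_state
    WHERE dev IN ('sw01', 'sw02', 'sw03')
    GROUP BY int
    HAVING count(DISTINCT dev) = 3  -- must appear under all 3 devs
    ORDER BY int
    LIMIT 1                         -- keep only the first match
)
SELECT d.dev, s.int
FROM (SELECT DISTINCT dev FROM int_state
      WHERE dev IN ('sw01', 'sw02', 'sw03')) d
CROSS JOIN shared_int s;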
OK, so if I understand your question, you want to find all the instances of dev (01, 02, 03, etc.), and then check that the value for int is the same for all the instances?
You can do a clever workaround I think....
---- get the count of how many distinct devs there are...
WITH distinct_dev AS (
    SELECT
        COUNT(DISTINCT dev) AS dev_count
    FROM
        int_state
)
----- get the count of how many devs each int appears under, per dev/int...
, dev_int AS (
    SELECT
        t.dev
        , t.int
        , COUNT(DISTINCT t2.dev) AS count_dev_int
    FROM
        int_state t
    LEFT JOIN
        int_state t2
        ON t.int = t2.int
    GROUP BY
        1, 2
)
----- then keep only the dev/int pairs whose per-int dev count equals the total number of devs...
SELECT
    di.dev
    , di.int
FROM
    dev_int di
CROSS JOIN
    distinct_dev dd
WHERE
    di.count_dev_int = dd.dev_count
You can build a CTE with an array of distinct dev values and another with an array of dev values for each int using array_agg, then JOIN them on matching arrays. (see demo) Note that both aggregations need the same ordering, or the array equality comparison can fail:
with dev_array as
    ( select array_agg(distinct dev order by dev) dev
        from data_tab
    )
, int_match as
    ( select int, array_agg(dev order by dev) dev
        from data_tab
       group by int
    )
select int, unnest(dev) dev
  from (select im.*
          from dev_array da
          join int_match im
            on (im.dev = da.dev)
         order by im.int
         limit 1
       ) sq;

Postgres query for report

I'm trying to solve this problem:
I have a query/view that joins ~10 tables to extract some fields for a report (if any). The query doesn't use any grouping function, only joins, and cuts off some unneeded data.
I have to take this one big view, group by the first column, take the max of a date in the second column, and take all the other fields from the record that holds that max value.
I have not been able to do this in Postgres.
As a pseudo code I can give this:
select 1
, max(2)
, 3 referred to the record from max(2)
, 4 referred to the record from max(2)
, ...
, 20 referred to the record from max(2)
from (ViewWithAllJoins) a
group by 1
For privacy and business reasons I had to obfuscate some information; 1/2/3/4... are the names of the columns from the view "ViewWithAllJoins". I hope the problem is still understandable and solvable!
I've tried the WINDOW approach shown in Convert keep dense_rank from Oracle query into postgres, but I cannot combine it with the GROUP BY that I need. Other attempts revolved around dense_rank as shown in Dense_rank first Oracle to Postgresql convert, but I can't make any assumptions about the order of the data in any fields except 1 and 2, so I can't use any aggregate function on them.
Any ideas? Preferably without adding too many subqueries.
Thank you!
EDIT:
As suggested I'll add some synthetic data to better understand the problem and what I want.
Start:
ID DATE COLUMN1 COLUMN2 COLUMN3
=====================================================================
88888888;"2016-04-02 09:00:00";"aaaaaaaaaaa";"TEXT89" ; 999999999
88888888;"2018-08-21 09:00:00";"a" ;"TEXT1" ; 988888888
88888888;"2017-11-09 09:00:00";"zzzz" ;"TEXT80000" ; 850580582
75858585;"2017-01-31 09:00:00";"~~~~~~~~~~~";"TEXT10" ; 101010101
75858585;"2018-04-02 09:00:00";"eeeeeeeeeee";"TEXT1000" ; 111111111
99999999;"2016-04-02 09:00:00";"8d2ecafd866";"TEXT808911"; 777777777
What I want:
ID DATE COLUMN1 COLUMN2 COLUMN3
===================================================================
88888888;"2018-08-21 09:00:00";"a" ;"TEXT1" ; 988888888
75858585;"2018-04-02 09:00:00";"eeeeeeeeeee";"TEXT1000" ; 111111111
99999999;"2016-04-02 09:00:00";"8d2ecafd866";"TEXT808911"; 777777777
So the group by id, the max of the date and the other fields related to the row of the max date.
-- So you have duplicate records per ID, and for every ID you want to select the record with the most recent date?
Use NOT EXISTS:
SELECT id,zdate,column1,column2,column3 -- , ...
FROM queryview t
WHERE NOT EXISTS (
SELECT *
FROM queryview x
WHERE x.id=t.id
AND x.zdate > t.zdate
);
Or, use row_number() over a window, and pick only the row with the final date:
SELECT id,zdate,column1,column2,column3 -- , ...
FROM ( SELECT *
, row_number() OVER(PARTITION BY id ORDER BY zdate DESC) AS rn
FROM queryview
) q
WHERE q.rn = 1
;
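Postgres also has DISTINCT ON, which is tailor-made for this greatest-row-per-group pattern; a short sketch using the same queryview and column names as above:

SELECT DISTINCT ON (id)
       id, zdate, column1, column2, column3 -- , ...
FROM queryview
ORDER BY id, zdate DESC;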

How to create two virtual columns by different conditions from repeating row data using SQL?

My database table has three columns: Date, Client, Value.
I want a new table with only three clients in it - "Technics", "Metal Inc", "Sunny Day" and two virtual total columns - "Price" / August, and "Price" / 2017.
What I tried so far and what I get: (see the linked screenshot)
Why does the "For August" total SUM goes down the next row, but not in a new column?
After all, my code says: SELECT SUM(Value) AS 'For August'.
Any ideas?
Union will give you rows instead of columns; for this case you can use a correlated subquery for the second total. Try something like this:
SELECT
    t1.Client,
    SUM(t1.Value) AS 'For 2017',
    (SELECT SUM(t2.Value)
       FROM Test AS t2
      WHERE t2.Client = t1.Client
        AND t2.Date LIKE '%2017-08%') AS 'For August'
FROM Test AS t1
WHERE
    t1.Client LIKE '%Technics%'
    OR t1.Client LIKE '%Metal Inc%'
    OR t1.Client LIKE '%Sunny Day%'
GROUP BY t1.Client
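Alternatively, conditional aggregation computes both totals in a single pass over the table; a sketch assuming the same Test table and that Date is stored so the LIKE patterns above match it:

SELECT
    Client,
    SUM(CASE WHEN Date LIKE '%2017-08%' THEN Value ELSE 0 END) AS 'For August',
    SUM(Value) AS 'For 2017'
FROM Test
WHERE Client LIKE '%Technics%'
   OR Client LIKE '%Metal Inc%'
   OR Client LIKE '%Sunny Day%'
GROUP BY Client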

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn, only to discover that ABAP doesn't seem to support them, and being unable to find any equivalents, I'm starting to think that I'll have no choice but to dump the data of the subquery into an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, a more specific requirement would get you a better answer. As it happens, I bumped into this question while working on a program that uses 3 distinct methods of pseudo-grouping (while looking for alternatives), and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects highest "from" date for each bname
If you need to use that max value as a filter condition within a query, you can do it by performing pseudo-grouping using a sub-query with max inside it, like this (notice how I move the BNAME check out into the sub-query, which means I don't have to check both fields using the in (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= @sy-datum )
The query above lets you select all the user info from the base query (whereas the first one only lets you take aggregated and grouped data).
Finally, if you need to check based on a composite key and compare it to multiple aggregate function results, the implementation will heavily depend on the specifics of your requirement (and since your question has none, I'll provide a generic one). The easiest option is to use exists / not exists instead of in (subquery), in exactly the same way, and form the subquery to check for the existence of a specific key or condition rather than pull a list (you can nest subqueries if you have to):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you the max value of a column within a group, and in fact all 3 can use joins to achieve identical results.
Please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery into your WHERE condition should do the trick: SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE x~a IN ( SELECT b FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y ) (note that this is weaker than a true composite-key IN, since it also matches rows whose key1 and key2 come from different rows of y)
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, that's right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And it has been valid at least since 7.40 SP08, released in July 2013, so it was valid at the time you asked this question as well.

T-SQL Query to process data in batches without breaking groups

I am using SQL 2008 and trying to process the data I have in a table in batches, however, there is a catch. The data is broken into groups and, as I do my processing, I have to make sure that a group will always be contained within a batch or, in other words, that the group will never be split across different batches. It's assumed that the batch size will always be much larger than the group size. Here is the setup to illustrate what I mean (the code is using Jeff Moden's data generation logic: http://www.sqlservercentral.com/articles/Data+Generation/87901)
DECLARE @NumberOfRows INT = 1000,
        @StartValue INT = 1,
        @EndValue INT = 500,
        @Range INT
SET @Range = @EndValue - @StartValue + 1
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
    DROP TABLE #SomeTestTable;
SELECT TOP (@NumberOfRows)
    GroupID = ABS(CHECKSUM(NEWID())) % @Range + @StartValue
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
This will create a table with about 435 groups of records containing between 1 and 7 records in each. Now, let's say I want to process these records in batches of 100 records per batch. How can I make sure that my GroupID's don't get split between different batches? I am fine if each batch is not exactly 100 records, it could be a little more or a little less.
I appreciate any suggestions!
This will result in batches slightly smaller than 100 entries; it'll remove all groups that aren't entirely in the selection:
WITH cte AS (SELECT TOP 100 * FROM (
        SELECT GroupID, ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY GroupID) r
        FROM #SomeTestTable) a
    ORDER BY GroupID, r DESC)
SELECT c1.GroupID FROM cte c1
JOIN cte c2
    ON c1.GroupID = c2.GroupID
    AND c2.r = 1
It'll select the groups with the lowest GroupIDs, limited to 100 entries, into a common table expression along with the row number; then it'll use the row number to throw away any groups that aren't entirely in the selection (row number 1 needs to be in the selection for the group to be, since the rows are ordered by descending row number before cutting with TOP).
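For a full batching loop, one hedged alternative is to pre-compute a batch number per group from a running row count, so every group lands wholly in one batch. This is a sketch against the #SomeTestTable above; the running total is built with a self-join so it also works on SQL 2008 (on 2012+ SUM() OVER (ORDER BY ...) would be cheaper):

-- Rows per group, then a running total of rows up to and including each group.
;WITH counted AS (
    SELECT GroupID, COUNT(*) AS cnt
    FROM #SomeTestTable
    GROUP BY GroupID
), running AS (
    SELECT c1.GroupID, SUM(c2.cnt) AS running_total
    FROM counted c1
    JOIN counted c2 ON c2.GroupID <= c1.GroupID
    GROUP BY c1.GroupID
)
-- Integer division buckets whole groups into batches of roughly 100 rows.
SELECT t.GroupID,
       (r.running_total - 1) / 100 AS BatchNo
FROM #SomeTestTable t
JOIN running r ON r.GroupID = t.GroupID
ORDER BY BatchNo, t.GroupID;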