sphinxsearch Delta index updates - sphinx

I have a problem with delta index updates.
If a document's id is less than or equal to max_doc_id, it is not included in the delta index, so changes to that document are not picked up until the main index is rebuilt.
Suppose we have 1000 documents.
If the fiftieth document is changed, the delta index does not reflect the change.
How can the delta index include changes to documents whose id is below max_doc_id?
Is there a way for the delta index to include updated documents, so that we do not have to wait for the main index to be rebuilt?
CREATE TABLE sph_counter
(
counter_id INTEGER PRIMARY KEY NOT NULL,
max_doc_id INTEGER NOT NULL
);
source main
{
# ...
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
sql_query = SELECT id, title, body FROM documents \
WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}
source delta : main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, title, body FROM documents \
WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

A really simple approach I like for this is to add a timestamp column that automatically tracks changed documents.
Add a column:
ALTER TABLE documents
ADD updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX(updated);
The default is also important, so that newly created documents are included too.
Then use that column in the queries, together with a kill list. The main index includes all documents as of indexing time, while the delta includes new and changed documents. The kill list means the old version of a document in main is ignored.
CREATE TABLE sph_counter
(
counter_id INTEGER PRIMARY KEY NOT NULL,
max_doc_id INTEGER NOT NULL,
indexing_time DATETIME NOT NULL
);
source main
{
# ...
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id), NOW() FROM documents
sql_query = SELECT id, title, body FROM documents
}
source delta : main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT id, title, body FROM documents \
WHERE updated > ( SELECT indexing_time FROM sph_counter WHERE counter_id=1 )
sql_query_killlist = SELECT id FROM documents \
WHERE updated > ( SELECT indexing_time FROM sph_counter WHERE counter_id=1 )
}
(Because there is a kill list, there is no point filtering the main query; duplicates won't matter. You also don't need max_doc_id any more, so sph_counter could be simplified along with the sql_query_pre. In many ways it's a shame you have to repeat the query in the kill list; you can't just tell Sphinx to use all the docs in the index as a kill list.)
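To see why the kill list matters, here is a minimal Python sketch. It is not Sphinx code: it only models main and delta as dictionaries, and the names `build_delta` and `search` are made up for illustration. It assumes the delta query and the kill list query share the same `updated > indexing_time` condition, as in the config above.

```python
from datetime import datetime

def build_delta(documents, indexing_time):
    """Return (delta, kill_list) for rows changed after the last main index run."""
    delta = {doc_id: row for doc_id, row in documents.items()
             if row["updated"] > indexing_time}
    kill_list = set(delta)  # same WHERE clause as the delta query
    return delta, kill_list

def search(main, delta, kill_list, word):
    """Match `word` in titles; main hits listed in the kill list are suppressed."""
    hits = {doc_id for doc_id, row in main.items()
            if word in row["title"] and doc_id not in kill_list}
    hits |= {doc_id for doc_id, row in delta.items() if word in row["title"]}
    return sorted(hits)

indexing_time = datetime(2024, 1, 1, 12, 0)

# Snapshot of the table at the moment main was built.
main = {
    50: {"title": "old title", "updated": datetime(2024, 1, 1, 9, 0)},
    51: {"title": "other", "updated": datetime(2024, 1, 1, 9, 0)},
}

# Current table: row 50 was edited and row 1001 inserted after indexing.
documents = {
    50: {"title": "new title", "updated": datetime(2024, 1, 1, 13, 0)},
    51: {"title": "other", "updated": datetime(2024, 1, 1, 9, 0)},
    1001: {"title": "new doc", "updated": datetime(2024, 1, 1, 14, 0)},
}

delta, kill_list = build_delta(documents, indexing_time)
```

With this data, searching for "old" returns nothing, because the stale copy of document 50 in main is on the kill list, while searching for "new" finds both the edited document 50 and the new document 1001 through the delta.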

If you want to track document updates along with insertions, you should have a separate column for a document revision. Revision values should be unique across the document table, so it's a good idea to use a global sequence to generate them.
When you update an existing document or insert a new one, you should take the next value from the revision sequence and save it in the document revision column. Sometimes it's a good idea to have DB triggers for automatic revision updates.
Then in the sql_query_pre section you can save the min and max revision values into the sph_counter table and use them to create a proper delta index.
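A rough sketch of the revision idea, using Python's built-in sqlite3 module only so that it runs on its own. The `revision_seq` table, the `max_revision` column and the function names are assumptions for illustration; on a real database you would likely use a native sequence or a trigger instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, revision INTEGER NOT NULL);
    CREATE TABLE revision_seq (value INTEGER NOT NULL);  -- stand-in for a global sequence
    INSERT INTO revision_seq VALUES (0);
    CREATE TABLE sph_counter (counter_id INTEGER PRIMARY KEY, max_revision INTEGER NOT NULL);
""")

def next_revision():
    """Bump the global counter and return the new value."""
    conn.execute("UPDATE revision_seq SET value = value + 1")
    return conn.execute("SELECT value FROM revision_seq").fetchone()[0]

def save_document(doc_id, title):
    """Insert or update a document; either way it gets a fresh revision."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, title, revision) VALUES (?, ?, ?)",
        (doc_id, title, next_revision()),
    )

def index_main():
    """What sql_query_pre of the main source would record."""
    conn.execute("INSERT OR REPLACE INTO sph_counter SELECT 1, MAX(revision) FROM documents")

def delta_ids():
    """Ids the delta query would select: everything revised since main was built."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE revision > "
        "(SELECT max_revision FROM sph_counter WHERE counter_id = 1) ORDER BY id"
    )
    return [row[0] for row in rows]

save_document(1, "a")
save_document(2, "b")
index_main()
save_document(1, "a edited")  # update of an existing document
save_document(3, "c")         # new document
```

After these calls, `delta_ids()` contains both the edited document 1 and the new document 3, which is the behaviour the id-based delta cannot provide.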

Related

In PostgreSQL how can an array column be filtered and searched for unique values only?

I'm trying to compare values in a history table populated by update trigger to see if certain columns in the old and new values of the JSON fields are equal and if all are equal then be able to create a case when query. Here's what I'm trying to do:
SQL
create table history
(
id serial not null,
ts timestamp default now(),
table_schema text,
table_name text,
operation text,
updated_by text default CURRENT_USER,
new json,
old json
);
With t AS (
select id,
old->>'field1' = new->>'field1' as isMatchField1,
old->>'field2' = new->>'field2' as isMatchField2,
old->>'field3' = new->>'field3' as isMatchField3,
old->>'field4' = new->>'field4' as isMatchField4
from history)
select id, array [isMatchField1, isMatchField2, isMatchField3, isMatchField4] from t
OUTPUT
1, {true, true, false, null}
How do I filter out all nulls from the array and write a CASE WHEN query to check whether only true values exist? Basically I want to do something like:
select id,
case when array field is only true and null then 'no changes made'
else
'changes made'
end as updated
from t

How to increment value in counter table

In my table I have the following schema:
id - integer | date - text | name - text | count - integer
I just want to count some actions.
I want to insert 1 when a row with date = '30-04-2019' does not exist yet.
I want to add 1 when the row already exists.
My idea is:
UPDATE "call" SET count = (1 + (SELECT count
FROM "call"
WHERE date = '30-04-2019'))
WHERE date = '30-04-2019'
But it does not work when the row doesn't exist.
Is it possible without extra triggers, etc.?
You can use a writeable CTE to achieve this. Additionally, the UPDATE statement can be simplified to set count = count + 1; there is no need for a sub-select.
with updated as (
update "call"
set count = count + 1
where date = '30-04-2019'
returning id
)
insert into "call" (date, count)
select '30-04-2019', 1
where not exists (select *
from updated);
If the update did not find a row, the where not exists condition will be true and the insert will be executed.
Note that the above is not safe for concurrent execution from multiple transactions. If you want to make this safe, create a unique index on the date column. Then use an INSERT ... ON CONFLICT instead:
insert into "call" (date, count)
values ('30-04-2019', 1)
on conflict (date)
do update
set count = "call".count + 1;
Again: the above requires a unique index (or constraint) on the date column.
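The upsert pattern can be tried without a PostgreSQL server. The sketch below uses Python's sqlite3 module as a stand-in, assuming the bundled SQLite is version 3.24 or newer (the first with ON CONFLICT ... DO UPDATE). The `record_call` helper is hypothetical, and the dates are written in ISO format purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint on date is what makes ON CONFLICT (date) possible.
conn.execute(
    'CREATE TABLE "call" ('
    "id INTEGER PRIMARY KEY, date TEXT NOT NULL UNIQUE, count INTEGER NOT NULL)"
)

def record_call(day):
    """Insert a row with count 1, or add 1 if a row for that day already exists."""
    conn.execute(
        'INSERT INTO "call" (date, count) VALUES (?, 1) '
        "ON CONFLICT (date) DO UPDATE SET count = count + 1",
        (day,),
    )

for _ in range(3):
    record_call("2019-04-30")
record_call("2019-05-01")

counts = dict(conn.execute('SELECT date, count FROM "call"'))
```

Here `counts` ends up with 3 for the day recorded three times and 1 for the day recorded once, with a single statement handling both the first insert and the later increments.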
Unrelated to the immediate problem, but: storing dates in a text column is a really, really bad idea. You should change your table definition and change the data type for the "date" column to date.

updatexml for particular rows only

Context: I want to increase the allowance value of some employees from £1875 to £7500, and update their balance to be £7500 minus whatever they have currently used.
My Update statement works for one employee at a time, but I need to update around 200 records, out of a table containing about 6000.
I am struggling to work out how to modify the statement below so that it updates more than one record, but only the 200 records I need to update.
UPDATE employeeaccounts
SET xml = To_clob(Updatexml(Xmltype(xml),
'/EmployeeAccount/CurrentAllowance/text()',187500,
'/EmployeeAccount/AllowanceBalance/text()',
750000 - (SELECT Extractvalue(Xmltype(xml),
'/EmployeeAccount/AllowanceBalance',
'xmlns:ts=\"http://schemas.com/\", xmlns:xt=\"http://schemas.com\"'
)
FROM employeeaccounts
WHERE id = '123456')))
WHERE id = '123456'
Example of the xml column (stored as a CLOB) that I want to update. The table has an ID column that holds the employee ID as its primary key, e.g. 123456.
<EmployeeAccount>
<LastUpdated>2016-06-03T09:26:38+01:00</LastUpdated>
<MajorVersion>1</MajorVersion>
<MinorVersion>2</MinorVersion>
<EmployeeID>123456</EmployeeID>
<CurrencyID>GBP</CurrencyID>
<CurrentAllowance>187500</CurrentAllowance>
<AllowanceBalance>100000</AllowanceBalance>
<EarnedDiscount>0.0</EarnedDiscount>
<NormalDiscount>0.0</NormalDiscount>
<AccountCreditLimit>0</AccountCreditLimit>
<AccountBalance>0</AccountBalance>
</EmployeeAccount>
You don't need a subquery to get the old balance, you can use the value from the current row; which means you don't need to correlate that subquery and can just use an in() in the main statement:
UPDATE employeeaccounts
SET xml = To_clob(Updatexml(Xmltype(xml),
'/EmployeeAccount/CurrentAllowance/text()',187500,
'/EmployeeAccount/AllowanceBalance/text()',
750000 - Extractvalue(Xmltype(xml),
'/EmployeeAccount/AllowanceBalance',
'xmlns:ts=\"http://schemas.com/\", xmlns:xt=\"http://schemas.com\"')
))
WHERE id in (123456, 654321, ...);
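The per-row arithmetic can be checked outside the database. This is a small Python sketch of the same idea, not Oracle code: each selected row's new balance is computed from that row's own XML, so no lookup of another row is needed. The `apply_allowance` helper is hypothetical, and it assumes the new allowance (750000 pence) is also the value wanted in CurrentAllowance.

```python
import xml.etree.ElementTree as ET

def apply_allowance(xml_text, new_allowance):
    """Set CurrentAllowance and make AllowanceBalance = new_allowance - old balance."""
    root = ET.fromstring(xml_text)
    balance = root.find("AllowanceBalance")
    old_value = int(balance.text)
    root.find("CurrentAllowance").text = str(new_allowance)
    balance.text = str(new_allowance - old_value)
    return ET.tostring(root, encoding="unicode")

def account(balance):
    """Build a cut-down EmployeeAccount document for the example."""
    return (
        "<EmployeeAccount>"
        "<CurrentAllowance>187500</CurrentAllowance>"
        f"<AllowanceBalance>{balance}</AllowanceBalance>"
        "</EmployeeAccount>"
    )

rows = {"123456": account(100000), "654321": account(50000), "999999": account(0)}
ids_to_update = {"123456", "654321"}  # plays the role of WHERE id IN (...)

updated = {
    emp_id: apply_allowance(xml, 750000) if emp_id in ids_to_update else xml
    for emp_id, xml in rows.items()
}
```

Only the two listed employees are rewritten, each with its own balance, and the third row is left exactly as it was.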

SQL Server - How Do I Create Increments in a Query

First off, I'm using SQL Server 2008 R2
I am moving data from one source to another. In this particular case there is a field called SiteID. In the source it's not a required field, but in the destination it is. So it was my thought, when the SiteID from the source is NULL, to sort of create a SiteID "on the fly" during the query of the source data. Something like a combination of the state plus the first 8 characters of a description field plus a ten digit number incremented.
At first I thought it might be easy to use a combination of date/time + nanoseconds but it turns out that several records can be retrieved within a nanosecond leading to duplicate SiteIDs.
My second idea was to create a table that contained an identity field plus a function that would add a record to increment the identity field and then return it (the function would also delete all records where the identity field is less than the latest saving space). Unfortunately after I got it written, when trying to "CREATE" the function I got a notice that INSERTs are not allowed in functions.
I could (and did) convert it to a stored procedure, but stored procedures are not allowed in select queries.
So now I'm stuck.
Is there any way to accomplish what I'm trying to do?
This script may take time to execute depending on the data present in the table, so first execute on a small sample dataset.
DECLARE @TotalMissingSiteID INT = 0,
        @Counter INT = 0,
        @NewID BIGINT;
DECLARE @NewSiteIDs TABLE
(
    SiteID BIGINT -- Check the datatype
);
SELECT @TotalMissingSiteID = COUNT(*)
FROM SourceTable
WHERE SiteID IS NULL;
WHILE (@Counter < @TotalMissingSiteID)
BEGIN
    WHILE (1 = 1)
    BEGIN
        SELECT @NewID = RAND() * 1000000000000000; -- Add your formula to generate new SiteIDs here
        -- To check if the generated SiteID is already present in the table
        IF ( ISNULL(( SELECT 1
                      FROM SourceTable
                      WHERE SiteID = @NewID), 0) = 0 )
            BREAK;
    END
    INSERT INTO @NewSiteIDs (SiteID)
    VALUES (@NewID);
    SET @Counter = @Counter + 1;
END
INSERT INTO DestinationTable (SiteID) -- Add the extra columns here
SELECT ISNULL(MainTable.SiteID, NewIDs.SiteID) SiteID
FROM (
    SELECT SiteID, -- Add the extra columns here
           ROW_NUMBER() OVER(PARTITION BY SiteID
                             ORDER BY SiteID) SerialNumber
    FROM SourceTable
) MainTable
LEFT JOIN ( SELECT SiteID,
                   ROW_NUMBER() OVER(ORDER BY SiteID) SerialNumber
            FROM @NewSiteIDs
          ) NewIDs
    ON MainTable.SiteID IS NULL
   AND MainTable.SerialNumber = NewIDs.SerialNumber
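As an alternative to random numbers, the scheme described in the question (state, plus the first 8 characters of the description, plus a ten-digit incrementing number) can be sketched with a plain counter. The Python below only illustrates the numbering logic; the `State` and `Description` field names and the `fill_site_ids` helper are assumptions. In T-SQL the same running number could probably come from ROW_NUMBER(), which the script above already uses.

```python
from itertools import count

def fill_site_ids(rows, start=1):
    """Return copies of rows where a missing SiteID is replaced by a generated one."""
    seq = count(start)  # running number, used only for rows without a SiteID
    result = []
    for row in rows:
        site_id = row["SiteID"]
        if site_id is None:
            prefix = row["State"] + row["Description"][:8]
            site_id = f"{prefix}{next(seq):010d}"  # zero-padded to ten digits
        result.append({**row, "SiteID": site_id})
    return result

source = [
    {"SiteID": "A-100", "State": "TX", "Description": "Pump station 4"},
    {"SiteID": None, "State": "TX", "Description": "Pump station 5"},
    {"SiteID": None, "State": "TX", "Description": "Pump station 6"},
]
filled = fill_site_ids(source)
```

Because the counter never repeats within a run, two rows with the same state and description prefix still get different SiteIDs, which avoids the duplicate problem seen with timestamps.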

Create view with fields from another table as column headers

I've got two tables that I'd like to combine into a view. The first table contains the structure:
Template Table
componentID | title
======================
1000 | blue
1001 | red
1002 | orange
The second table contains the actual data that will be stored, and the columns reference the ID of the first table:
Data Table
id | field1000 | field1001 | field1002
======================================
1 | navy | ruby | vermilion
2 | midnight | crimson | amber
What I'd like to get as a result in a view:
Combined Table/View?
id | blue | red | orange
=================================
1 | navy | ruby | vermilion
2 | midnight | crimson | amber
Is this possible? I've been trying to get it to work with pivot tables, but I'm getting hung up on how to use the titles as the columns for the data.
Ok, I went a bit overboard with this one but this will do what you want. This procedure will combine all fields with the proper data table columns, and does not need to know nor care how many columns there are in the data tables.
It does not use cursors, but due to the possibility of many template tables, it does use Dynamic SQL to generate the Select statement for the final return.
Only caveat is it's not a View, it's a stored procedure, because it allows to pass the variable for the data table you want to ultimately select from.
The assumptions:
1. The template table is static.
2. There is one template table for all data tables.
3. All fields in any data tables must be unique. *
4. All data tables have a PK/identity field with the word 'id' in it that must be ignored.
5. All fields in the data tables have a corresponding title in the template table.
6. All fields in the data table are prefixed with the word 'field', and all of the reference IDs in the template table correspond to those field names with 'field' removed, based on your example.
* It can of course be improved by modifying the template table schema to also have a field for the data table that the field title belongs to, for example, which would remove assumption #3.
The process:
First we need a mapping of the field names, reference IDs, and column titles. We do this with a table variable and get our info from syscolumns. Then, we update our temp table to get the titles from the TemplateTable table.
Then, we need to build a dynamic Select list from the DataTable (which is a parameter in the SP and therefore requires some dynamic SQL to execute). My preferred method of doing this is by having a bit column in my source table that I can update, something like 'IsCompleted', and then using a regular While loop to get through each row. Inside the While loop, all we do is grab the current "TitleReference" from our temporary table variable, and append to the select list the real field name from syscolumns (from first step above).
Finally, we execute the dynamic SQL statement which has a Select, and when this is inside a stored procedure that is executed, the result is returned as the result of the stored procedure.
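The string-building step is the core of the process just described, so here is a compact Python sketch of that step alone. It is not the stored procedure: `build_select` is a hypothetical helper, the column list stands in for what syscolumns would return, and the bracket quoting imitates what QUOTENAME does for aliases.

```python
def build_select(data_table, id_column, field_columns, titles):
    """Build 'SELECT id, fieldNNNN AS [title], ... FROM table' from a title mapping."""
    parts = [id_column]
    for column in field_columns:
        # 'field1000' -> 1000, the componentID used in the template table
        component_id = int(column.replace("field", "", 1))
        # Escape ']' the way QUOTENAME does, then wrap the alias in brackets
        alias = titles[component_id].replace("]", "]]")
        parts.append(f"{column} AS [{alias}]")
    return f"SELECT {', '.join(parts)} FROM {data_table}"

titles = {1000: "blue", 1001: "red", 1002: "orange"}
query = build_select("DataTable", "id", ["field1000", "field1001", "field1002"], titles)
```

For the example tables, `query` is the statement `SELECT id, field1000 AS [blue], field1001 AS [red], field1002 AS [orange] FROM DataTable`, which is the shape of statement the procedure assembles and executes.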
The Full Working Code
Create Procedure usp_CombineTables
(
    @DataTableName varchar(50)
)
As
-- Test
-- Exec usp_CombineTables 'DataTable'
-- Set up our variables
Declare @DataTableIdFieldName varchar(50), -- The ID field of the data table, dynamic
        @IsCompleted bit, -- Used by While loop to know when to exit
        @CurrentTitleReference int, -- Used in While loop as the ID from TemplateTable that relates to the real data field name and the desired title
        @CurrentDataFieldName varchar(50), -- Used in While loop for the current actual field name in the data table
        @CurrentTitle varchar(50), -- Used in While loop for the desired field name in the resulting table of the stored proc
        @DynamicSelectQuery varchar(2000) -- Stores the SQL query that is dynamically built and executed for the final result; can increase value if needed
-- Use table variable to correlate the datatable columns, titles, and references
Declare @TitleReferences Table (
    TitleReference int,
    DataTableColumnName varchar(50),
    Title varchar(50),
    Completed bit default 0
)
-- Get the info from syscolumns about our datatable; assumes that all of the field names are prefixed with the word 'field' which needs to be removed
Insert Into @TitleReferences (
    TitleReference,
    DataTableColumnName
)
Select
    Replace(name, 'field', '') As TitleReference,
    name As DataTableColumnName
From syscolumns
Where id = OBJECT_ID(@DataTableName)
And name Not Like '%id%' -- assumes DataTable will always have a PK with 'id' in it, need to ignore/remove
-- Get the titles -- assumes only one template table for all data tables; all data fields across tables must be unique
Update @TitleReferences
Set Title = t.Title From TemplateTable As t
Where TitleReference = t.ComponentID
-- Get the ID field of the data table
Set @DataTableIdFieldName = (
    Select name From syscolumns
    Where id = OBJECT_ID(@DataTableName)
    And name Like '%id%')
-- Build a dynamic SQL query to select from the datatable with the right column names
Set @DynamicSelectQuery = 'Select ' + @DataTableIdFieldName + ', ' -- start with the ID
Set @IsCompleted = 0
While (@IsCompleted = 0)
Begin
    -- Retrieve the field name and title from the current row based on title reference
    Set @CurrentTitleReference = (Select Top 1 TitleReference From @TitleReferences Where Completed = 0)
    Set @CurrentDataFieldName = (Select DataTableColumnName From @TitleReferences Where TitleReference = @CurrentTitleReference)
    Set @CurrentTitle = (Select Title From @TitleReferences Where TitleReference = @CurrentTitleReference)
    -- Append the next select field to the dynamic query
    Set @DynamicSelectQuery = @DynamicSelectQuery +
        @CurrentDataFieldName + ' As ' + QuoteName(@CurrentTitle)
    -- Set up to move past current record in next iteration
    Update @TitleReferences Set Completed = 1 Where TitleReference = @CurrentTitleReference
    -- Exit loop or add comma for next field
    If (Select Count(Completed) From @TitleReferences Where Completed = 0) = 0
    Begin
        Set @IsCompleted = 1
    End
    Else
    Begin
        -- Add comma to select field for next column
        Set @DynamicSelectQuery = @DynamicSelectQuery + ','
    End
End
-- Now the column list is built, just add the table and exec
Set @DynamicSelectQuery = @DynamicSelectQuery +
    ' From ' + @DataTableName
Exec(@DynamicSelectQuery)
The Result
Hope this helps, it was fun writing it!
Something along these lines:
DECLARE @f0 VARCHAR(50) = (SELECT title FROM template WHERE componentID = 1000)
DECLARE @f1 VARCHAR(50) = (SELECT title FROM template WHERE componentID = 1001)
DECLARE @f2 VARCHAR(50) = (SELECT title FROM template WHERE componentID = 1002)
-- sp_executesql expects an NVARCHAR statement; the columns must be separated by commas
DECLARE @sql NVARCHAR(MAX) = N'SELECT field1000 AS ' + QUOTENAME(@f0) + N', field1001 AS ' + QUOTENAME(@f1) + N', field1002 AS ' + QUOTENAME(@f2) + N' FROM data'
EXEC sp_executesql @sql