sphinxsearch merge indexes on different column names

If I have indexA, which returns id, doc_id, attrA1, attrA2, and indexB, which returns id, group_id, attrB1, attrB2, is it possible to run indexer --merge indexA indexB so that the merged index returns id, doc_id, attrA1, attrA2, attrB1, attrB2? Assume that doc_id and group_id hold the same values, just under different column names.

No, I don't think indexer can do this directly.
The closest option might be to edit the .sph header files directly, i.e. rename the attributes. E.g. you could rename 'group_id' to 'doc_id' in the original index, so the two will merge correctly.
I'm not sure what happens to the attributes that exist in only one index or the other; I think only the common attributes are preserved.
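If renaming at index time is acceptable, here is a minimal sketch of the idea (the source name and attribute types are illustrative, not taken from the question): alias the column in indexB's source query so both indexes carry the same attribute name before merging.
# Hypothetical source for indexB: alias group_id to doc_id so the
# attribute name matches indexA before running indexer --merge.
source srcB
{
    # ... sql connection settings ...
    sql_query     = SELECT id, group_id AS doc_id, attrB1, attrB2 FROM tableB
    sql_attr_uint = doc_id
    sql_attr_uint = attrB1
    sql_attr_uint = attrB2
}
# Rebuild indexB from this source, then merge as usual:
#   indexer indexB
#   indexer --merge indexA indexB
Note that indexer --merge generally expects the attribute schemas to line up, so the per-index attrA*/attrB* columns may also need counterparts (even dummy zeros) in the other index; as noted above, what happens otherwise is uncertain.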

Related

How to Link 2 Sheets that have the same fields

I am looking for some help with linking 2 sheets that share a number of filters I have set up but which sit in separate tables. The reason for this is that I have a number of aggregated columns that differ between the 2 tables, and I want to keep them separate as I will be building more sheets as I go along.
The filters that are the same within the 2 sheets are the following:
we_date
product
manager
patch
Through the data manager I managed to create an association between the 2 tables for we_date, but from reading on this site and other searches on Google I can't make any other associations between these tables, and this is where I am stuck.
The 2 sheets will now let me filter using we_date, but if I use the filters for product, manager or patch then nothing happens on my 2nd sheet, as they are not linked.
Currently in my data load editor I have 2 sections of select queries like the following:
// Table1
QUALIFY *;
w:
SELECT *
FROM table1;
UNQUALIFY *;

// Table2
QUALIFY *;
w_c:
SELECT *
FROM table2;
UNQUALIFY *;
I would really appreciate it if somebody could advise a fix for the issue I am having.
In Qlik, fields with identical names from different tables are automatically associated.
When you call QUALIFY *, you are actually renaming all fields and explicitly saying NOT to associate them.
Take a look at the Qlik Sense documentation on Qualify *:
The automatic join between fields with the same name in different
tables can be suspended by means of the qualify statement, which
qualifies the field name with its table name. If qualified, the field
name(s) will be renamed when found in a table. The new name will be in
the form of tablename.fieldname. Tablename is equivalent to the label
of the current table, or, if no label exists, to the name appearing
after from in LOAD and SELECT statements.
We can use as to manually reassign field names.
SELECT customer_id, private_info as "private_info_1", favorite_dog from table1;
SELECT customer_id, private_info as "private_info_2", car from table2;
Or, we can use QUALIFY correctly. Example:
table1 and table2 each have a customer_id field and a private_info field. We want customer_id to be the associative value and private_info not to be. We would use QUALIFY on private_info, which Qlik would then rename based on table name.
QUALIFY private_info;
SELECT * from table1;
SELECT * from table2;
The resulting field names would then be: customer_id (associated), table1.private_info, and table2.private_info.
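Applied to the load script in the question, one possible shape (a sketch, assuming the four shared filter fields are the only ones that should associate) is to qualify everything and then exempt the shared keys:
// Qualify all fields, but leave the shared filter fields
// unqualified so they associate across both tables:
QUALIFY *;
UNQUALIFY we_date, product, manager, patch;

w:
SELECT * FROM table1;

w_c:
SELECT * FROM table2;

UNQUALIFY *;
The aggregated columns stay table-specific as w.fieldname / w_c.fieldname, while the four shared filters drive both sheets.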

Sphinx seems to force an order on ID?

I added a new field to my index (weight), an integer-based value I want to sort on.
I added it to the select and declared it as sql_attr_uint.
When I call it in my query it shows up. However, when I try to sort on it I get strange behavior: it always sorts on the record ID instead, so ordering on ID is identical to ordering on Weight.
I've checked the index pretty thoroughly and can't find a reason why. Does Sphinx auto-sort on record ID somehow?
I know the details are fairly sparse, but I'm hoping there is some basic explanation I'm missing before asking anyone to delve in further.
As an update: I don't believe an ID sort has been inadvertently "imposed" on the index, since I can order by a number of other fields, both integer and text, and the results come back independent of the ID values (e.g. sorting on last name, record #100 Adams comes before record #1 Wyatt).
Yet ordering on Weight always returns the same order as ordering by ID, whether asc or desc. There is no error about the field or index not existing or not being sortable, and the order request is not ignored (desc and asc both work); it just ignores that particular field value and uses the ID instead.
Further update: the Weight value is indexed via a join to the main table indexed by Sphinx, in the following manner:
sql_attr_multi = uint value_Weight from ranged-query; \
SELECT j.id AS ID, IF(s.Weight > 0, 1, 0) AS Weight \
FROM Customers j \
INNER JOIN CustomerSources s ON j.customer_id = s.customer_id \
AND j.id BETWEEN $start AND $end \
ORDER BY s.id; \
SELECT MIN(id), MAX(id) FROM Customers
Once indexed, sorting on id and on value_Weight returns the same order, even though Weight and ID are unrelated.
Ah yes, from
http://sphinxsearch.com/docs/current/mva.html
"Filtering and group-by (but not sorting) on MVA attributes is supported."
You can't sort by an MVA attribute (which, as noted in the comments, makes sense: MVAs usually contain many values, and sorting by many values is rather tricky).
When you try, it simply fails, so sorting falls back on the "natural" order of the index, which is usually by ID.
Use sql_attr_uint instead:
http://sphinxsearch.com/docs/current/conf-sql-attr-uint.html
(but this will probably mean rewriting sql_query to perform the JOIN on CustomerSources).
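A rough sketch of that rewrite (a sketch only; it reuses the table and column names from the question and assumes each customer has at most one CustomerSources row, otherwise the join would duplicate documents):
sql_query = \
    SELECT j.id, \
           IF(s.Weight > 0, 1, 0) AS value_weight \
    FROM Customers j \
    LEFT JOIN CustomerSources s ON j.customer_id = s.customer_id
# (plus whatever text columns the index actually searches, elided here)

# Declared as a plain integer attribute, so it is sortable:
sql_attr_uint = value_weight
With value_weight a single uint per document, sorting on it should then behave like any other integer attribute.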

How to change row order using ORDER BY & GROUP BY in pgAdmin, SQL?

I would like to re-order rows from two columns in an existing table without creating a new one. I have this script, which works, on table test.table:
SELECT value, variety
FROM test.table
group by value, variety
order by value, variety;
I have tried UPDATE and ALTER TABLE, but I cannot get it to work, e.g.:
update test.table
SELECT value, variety
FROM test.table
group by value, variety
order by value, variety;
How is this done?
I think you should have a look at this question and answer for using GROUP BY and ORDER BY together.
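One point worth adding: PostgreSQL tables have no persistent row order, so an UPDATE cannot "re-order" rows; ordering is something you request per query. If you want the ordered presentation on hand without touching the table, a sketch (the view name is illustrative):
-- A view keeps the ordered presentation; the base table is unchanged.
-- (Strictly, the order is only guaranteed when the outer query itself
-- has an ORDER BY, so repeat it there if it matters.)
CREATE VIEW test.table_sorted AS
SELECT value, variety
FROM test.table
GROUP BY value, variety
ORDER BY value, variety;

SELECT * FROM test.table_sorted;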

Create an index for json_array_elements in PostgreSQL

I need to create an index from a query that uses json_array_elements()
SELECT *, json_array_elements(nested_json_as_text::json) as elements FROM my_table
Since the json contains multiple elements, the result is that the original index is now duplicated across rows and no longer unique.
I am not very familiar with creating indices and want to avoid doing anything destructive. What is the best way to create a column of unique integers for this case?
Found an answer:
SELECT *, json_array_elements(nested_json_as_text::json) as elements, row_number() over () as my_index FROM my_table
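If the numbered result needs to be reusable elsewhere, one option (sketched here; the view name is illustrative) is to wrap that query in a view:
-- Each element row gets a unique integer via row_number().
CREATE VIEW my_table_elements AS
SELECT *,
       json_array_elements(nested_json_as_text::json) AS elements,
       row_number() OVER () AS my_index
FROM my_table;
Note that row_number() OVER () makes no ordering promise between runs; add an ORDER BY inside the OVER () if the numbering must be stable.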

Detecting duplicate values in a column of a DataTable while traversing through it

I have a DataTable with Id (Guid) and Name (string) columns. I traverse the DataTable and run a validation criterion on the Name (say, it should contain only letters and numbers), then add the corresponding Id to a List if the Name passes the validation.
Something like below:
List<Guid> validIds = new List<Guid>();
foreach (DataRow row in DataTable1.Rows)
{
    if (IsValid(row["Name"]))
    {
        validIds.Add((Guid)row["Id"]);
    }
}
In addition to this validation, I should also check that the Name is not repeated anywhere in the whole DataTable (treating different casings as the same value); if it is repeated, I should not add the corresponding Id to the List.
Things I am thinking about / have thought about:
1) I can keep another List of names and check each "Name" against it before adding the corresponding Guid.
2) I cannot use a HashSet, as that would treat "Test" and "test" as different strings and not duplicates.
3) Copy the DataTable to another one holding the distinct names (I haven't tried this and the code might be incorrect; please correct me where possible):
DataTable dataTableWithDistinctName = new DataTable();
dataTableWithDistinctName.CaseSensitive = true;
CopiedDataTable=DataTable1.DefaultView.ToTable(true,"Name");
I would then loop through the original DataTable and check for the existence of the "Name" in CopiedDataTable; if it exists, I won't add the Id to the List.
Is there a better, more optimal way to achieve this? I always need to think about performance. Although there are many related questions on SO, I didn't find a problem similar to this one; if you could point me to a similar question, that would be helpful.
EDIT: The number of records might vary from 2000-3000.
Thanks
If you are looking to prevent duplicates, it may be grueling work, and I don't know how many records you're dealing with at a time... If it's a small set, I'd consider doing a query before each attempted insert from your LIVE source, based on
select COUNT(*) as CountOnFile from ProductionTable where UPPER(name) = UPPER(name from live data)
If the resulting CountOnFile > 0, don't add.
If you are dealing with a large dataset, like a bulk import, I would pull all the data into a temp table, then do a query with NOT IN... something like
create table OkToBeAdded as
select distinct upper( TempTable.Name ) as Name, GUID
from TempTable
where upper( TempTable.Name )
NOT IN ( select upper( LiveTable.Name )
from LiveTable
where upper( TempTable.Name ) = upper( LiveTable.Name )
);
insert into LiveTable ( Name, GUID )
select Name, GUID from OkToBeAdded;
Obviously, the SQL is a sample and would need to be adjusted for your specific back-end source.
/* I did this entirely in SQL and avoided ADO.NET */
/* I pass the CSV of valid object Ids and split that into a table */
DECLARE @TableTemp TABLE
(
    TempId uniqueidentifier
)
INSERT INTO @TableTemp
SELECT CAST(Data AS uniqueidentifier) AS ID
FROM dbo.Split1(@ValidObjectIdsAsCSV, ',')
/* Self-join with Table1 for any duplicate rows and update the column value */
UPDATE A
SET IsValidated = 1
FROM Table1 AS A
INNER JOIN @TableTemp AS Temp ON A.ID = Temp.TempId
WHERE NOT EXISTS (SELECT B.Name, COUNT(B.Name)
                  FROM Table1 AS B
                  WHERE A.Name = B.Name
                  GROUP BY B.Name
                  HAVING COUNT(B.Name) > 1)
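For the in-memory side (option 1 in the question), a short sketch: despite point 2 above, HashSet<string> can compare case-insensitively when built with StringComparer.OrdinalIgnoreCase. This version keeps the first occurrence of each name; the commented GroupBy variant excludes every occurrence of a duplicated name, which matches the SQL above. DataTable1 and IsValid are the names from the question; the cast to string is an assumption about IsValid's parameter.
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

List<Guid> validIds = new List<Guid>();
var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

foreach (DataRow row in DataTable1.Rows)
{
    string name = (string)row["Name"];

    // Add returns false when the name was already seen (in any casing),
    // so duplicates after the first occurrence are skipped.
    if (IsValid(name) && seen.Add(name))
    {
        validIds.Add((Guid)row["Id"]);
    }
}

// To drop ALL occurrences of a duplicated name instead, count first:
// var dupes = DataTable1.Rows.Cast<DataRow>()
//     .GroupBy(r => (string)r["Name"], StringComparer.OrdinalIgnoreCase)
//     .Where(g => g.Count() > 1)
//     .Select(g => g.Key)
//     .ToHashSet(StringComparer.OrdinalIgnoreCase);
For 2000-3000 rows this single pass is more than fast enough, and it avoids the copied-DataTable approach in option 3 entirely.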