PostgreSQL FTS text should be combined into one line - postgresql

I have a table that has columns with title and content.
CREATE TABLE t_items (
item_id varchar PRIMARY KEY,
title varchar,
content varchar);
insert into t_items values ('111', 'Madrid title n1', 'The Madrid papers Open revelled in Reals win, with AS describing Chelsea as shipwrecked having been completely unsettled by Mendys error for');
insert into t_items values ('222', 'test title 22', 'Así fue el Mutua Madrid Open 2019, un torneo inolvidable. En la última edición, Novak Djokovic y Kiki Bertens se proclamaron campeones, David Ferrer se');
insert into t_items values ('333', 'Madrid title 33', 'Real Madrid play at Stamford Bridge for just the second time ever, and the first time with fans in the stands. Its going to be epic');
insert into t_items values ('444', 'test title 44', 'You can change the date of your tickets because of Covid-19 without any problems. A little bit of action · Zoo Aquarium de Madrid | Disfruta de tus animales');
insert into t_items values ('555', 'test title 554', 'In a former electrical substation, Kunsthalle Praha is sparking new energy in the Czech capital with inaugural show 100 year');
insert into t_items values ('661', 'test title 4554', 'Bistro Praha is the place to go if you want a warm atmosphere, authentic goulash, cabbage soup, wiener schnitzel or a range of European style dishes.');
I need to make a query that will return a prepared response for later full text search. But all the results of the found text should be combined into one line. Also, all found document IDs should be combined into one array.
SELECT to_tsvector('english', t_items.content || t_items.title),
array_agg(t_items.item_id) as id_array
FROM t_items
WHERE
to_tsvector(title) || to_tsvector(content)
## plainto_tsquery('Madrid & Open')
GROUP BY t_items.content, t_items.title;
My query incorrectly returns scattered rows.
-----------+----------
'2019':7 '22':27 'así':1 | {222}
'chelsea':12 'complet':17 'describ':11 | {111}
(2 rows)
I need the result of a union that includes all the results in one line:
-----------+----------
'2019':7 '22':27 'así':1 'chelsea':12 'complet':17 'describ':11 | {222, 111}
(1 rows)
Here is a working example:
https://onecompiler.com/postgresql/3xyf3pfq5

Related

Converting rows to columns using crosstab in PostgreSQL not working (relation “table” does not exist)

I need to use the compiled data using CTE and then convert the columns to rows using crosstab(open to other ideas) in the next select statement. Below is the query.
with checked_adgroup AS (
SELECT
ua.new_adgroup,
ua.account,
ua.campaign,
ua.ad_group,
ua."position",
cp.category,
pt.full_value,
FROM unnest_adgroup ua
LEFT JOIN taxonomy_category cp ON ua."position" = cp."position"
LEFT JOIN taxonomy pt ON ua.short_val = pt.short_value AND cp.category = pt.category AND (pt.lob IS NULL OR pt.lob = ua.lob)
)
SELECT *
from crosstab(
'select
cad.account,
cad.campaign,
cad.ad_group,
cad.category,
cad.full_value
FROM checked_adgroup cad
WHERE cad.all_correct AND cad.category IS NOT NULL
ORDER BY 1,2,3')
AS final_result(
account text, campaign text, ad_group text,
division text, lob text, match_type text );
Error message:
ERROR: relation "checked_adgroup" does not exist LINE 7: FROM checked_adgroup cad
Output of checked_adgroup cte looks like below:
enter image description here
Desired output of the final statement is:
enter image description here
Welcome to the community. First off please do not post images, they are useless to work with, and in some instances they are prohibited and cannot be viewed. Instead use formatted text.
I haven't used crosstab functionality all that much, but it does offer a second version which contains 2 queries, the second feeding into the first. There are however a couple errors in your posted query that would need correcting either way. So that first. Look for --<< tag.
with checked_adgroup AS (
SELECT
ua.new_adgroup,
ua.account,
ua.campaign,
ua.ad_group,
ua."position",
cp.category,
pt.full_value,
--<< missing column or ending , above should not be there, assumption missing column see below.
FROM unnest_adgroup ua
LEFT JOIN taxonomy_category cp ON ua."position" = cp."position"
LEFT JOIN taxonomy pt ON ua.short_val = pt.short_value AND cp.category = pt.category AND (pt.lob IS NULL OR pt.lob = ua.lob)
)
SELECT *
from crosstab(
'select
cad.account,
cad.campaign,
cad.ad_group,
cad.category,
cad.full_value
FROM checked_adgroup cad
WHERE cad.all_correct AND cad.category IS NOT NULL
--<< above line has 2 errors:
--<< Incorrectly formatted needs to cad.all_correct is not null AND cad.category IS NOT NULL
--<< column cad.all_correct does not exist (see missing column above
ORDER BY 1,2,3')
AS final_result(
account text, campaign text, ad_group text,
division text, lob text, match_type text);
Now we need to transform the CTE to a second query that crosstab might be ale to use. I have identified each with Postgres $Quoting$, not so much from necessity as standard string quote (') would be sufficant, but more from visibility standing.
select *
from crosstab(
$ct1$select
account,
campaign,
ad_group,
category,
full_value
--<< from checked_adgroup cad
--<< where cad.all_correct and cad.category is not null
--<< moved above lines to second query to avoid reference and removed qualification
order by 1,2,3
$ct1$
, $ct2$select *
from (
select
ua.new_adgroup,
ua.account,
ua.campaign,
ua.ad_group,
ua."position",
cp.category,
pt.full_value,
'mssing from orig posted query' all_correct
from unnest_adgroup ua
left join taxonomy_category cp on ua."position" = cp."position"
left join taxonomy pt on ua.short_val = pt.short_value
and cp.category = pt.category
and (pt.lob is null or pt.lob = ua.lob)
) s
where all_correct is not null and cad.category is not null
--<< move from query1
$ct2$ )
as final_result(
account text, campaign text, ad_group text,
division text, lob text, match_type text );
But at this point I get an error relationship unnest_adgroup does not exist. Which is true as you did not post the definition, not other referenced tables. But that seems to imply the syntax is correct.
Admittedly, this may be way off base if so, so be it, I can always delete later. But, I stuck at home with no other projects at the moment and this seems like an interesting question. Looking forward to the results. Good Luck.

Postgres full text search and spelling mistakes (aka fuzzy full text search)

I have a scenario, where I have data for informal communications that I need to be able to search. Therefore I want full text search, but I also to make sense of spelling mistakes. Question is how do I take spelling mistakes into account in order to be able to do fuzzy full text search??
This is very briefly discussed in Postgres Full Text Search is Good Enough where the article discusses misspelling.
So I have built a table of "documents", created indexes etc.
CREATE TABLE data (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
text TEXT NOT NULL);
I can create an additional column of type tsvector and index accordingly...
alter table data
add column search_index tsvector
generated always as (to_tsvector('english', coalesce(text, '')))
STORED;
create index search_index_idx on data using gin (search_index);
I have for example, some text where the data says "baloon", but someone may search "balloon", so I insert two rows (one deliberately misspelled)...
insert into data (text) values ('baloon');
insert into data (text) values ('balloon');
select * from data;
id | text | search_index
----+---------+--------------
1 | baloon | 'baloon':1
2 | balloon | 'balloon':1
... and perform full text searches against the data...
select * from data where search_index ## plainto_tsquery('balloon');
id | text | search_index
----+---------+--------------
2 | balloon | 'balloon':1
(1 row)
But I don't get back results for the misspelled version "baloon"... So using the suggestion in the linked article I've built a lookup table of all the words in my lexicon as follows...
"you may obtain good results by appending the similar lexeme to your tsquery"
CREATE TABLE data_words AS SELECT word FROM ts_stat('SELECT to_tsvector(''simple'', text) FROM data');
CREATE INDEX data_words_idx ON data_words USING GIN (word gin_trgm_ops);
... and I can search for similar words which may have been misspelled
select word, similarity(word, 'balloon') as similarity from data_words where similarity(word, 'balloon') > 0.4 order by similarity(word, 'balloon');
word | similarity
---------+------------
baloon | 0.6666667
balloon | 1
... but how do I actually include misspelled words in my query?
Isn't this what the article above means?
select plainto_tsquery('balloon' || ' ' || (select string_agg(word, ' ') from data_words where similarity(word, 'balloon') > 0.4));
plainto_tsquery
----------------------------------
'balloon' & 'baloon' & 'balloon'
(1 row)
... plugged into an actual search, and I get no rows!
select * from data where text ## plainto_tsquery('balloon' || ' ' || (select string_agg(word, ' ') from data_words where similarity(word, 'balloon') > 0.4));
select * from data where search_index ## phraseto_tsquery('baloon balloon'); -- no rows returned
I'm not sure where I'm going wrong here - can any shed any light? I feel like I'm super close to getting this going...?
SELECT to_tsquery('balloon |' ||
string_agg(word, ' | ')
)
FROM data_words
WHERE similarity(word, 'balloon') > 0.4;
For anyone looking at this thread, the accepted answer by #laurenz-albe needed a slight modification for me:
It required single quotes around the argument values passed to the string_agg function, which can be done using the format function along with the %L placeholder.
This updated code worked for me:
SELECT to_tsquery('balloon |' ||
string_agg(format('%L', word), ' | ')
)
FROM data_words
WHERE similarity(word, 'balloon') > 0.4;

Inserting values into multiple columns by splitting a string in PostgreSQL

I have the following heap of text:
"BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,
URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,
16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,
1.0,Version,1.0,".
What I'd like to do is extract data from this in the following manner:
BundleSize:155648
DynamicSize:204800
Identifier:com.URLConnectionSample
Name:URLConnectionSample
ShortVersion:1.0
Version:1.0
BundleSize:155648
DynamicSize:16384
Identifier:com.IdentifierForVendor3
Name:IdentifierForVendor3
ShortVersion:1.0
Version:1.0
All tips and suggestions are welcome.
It isn't quite clear what do you need to do with this data. If you really need to process it entirely in the database (looks like the task for your favorite scripting language instead), one option is to use hstore.
Converting records one by one is easy:
Assuming
%s =
BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0
SELECT * FROM each(hstore(string_to_array(%s, ',')));
Output:
key | value
--------------+-------------------------
Name | URLConnectionSample
Version | 1.0
BundleSize | 155648
Identifier | com.URLConnectionSample
DynamicSize | 204800
ShortVersion | 1.0
If you have table with columns exactly matching field names (note the quotes, populate_record is case-sensitive to key names):
CREATE TABLE data (
"BundleSize" integer, "DynamicSize" integer, "Identifier" text,
"Name" text, "ShortVersion" text, "Version" text);
You can insert hstore records into it like this:
INSERT INTO data SELECT * FROM
populate_record(NULL::data, hstore(string_to_array(%s, ',')));
Things get more complicated if you have comma-separated values for more than one record.
%s = BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,
You need to break up an array into chunks of number_of_fields * 2 = 12 elements first.
SELECT hstore(row) FROM (
SELECT array_agg(str) AS row FROM (
SELECT str, row_number() OVER () AS i FROM
unnest(string_to_array(%s, ',')) AS str
) AS str_sub
GROUP BY (i - 1) / 12) AS row_sub
WHERE array_length(row, 1) = 12;
Output:
"Name"=>"URLConnectionSample", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.URLConnectionSample", "DynamicSize"=>"204800", "ShortVersion"=>"1.0"
"Name"=>"IdentifierForVendor3", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.IdentifierForVendor3", "DynamicSize"=>"16384", "ShortVersion"=>"1.0"
And inserting this into the aforementioned table:
INSERT INTO data SELECT (populate_record(NULL::data, hstore(row))).* FROM ...
the rest of the query is the same.

Inserting many rows in treelike structure in SQL SERVER 2005

I have a few tables in SQL that are pretty much like this
A B C
ID_A ID_B ID_C
Name Name Name
ID_A ID_B
As you can see, A is linked to B and B to C. Those are basically tables that contains data models. Now, I would need to be able to create date based on those tables. For example, if I have the following datas
A B C
1 Name1 1 SubName1 1 1 SubSubName1 1
2 Name2 2 SubName2 1 2 SubSubName2 1
3 SubName3 2 3 SubSubName3 2
4 SubSubName4 3
5 SubSubName5 3
I would like to copy the 'content' of those tables in others tables. Of course, the auto numeric key that is generated when inserting into the new tables are diffirent that those one and I would like to be able to keep track so that I can copy the entire thing. The structure of the recipient table contains more information that those, but it's mainly dates and other stuff that are easy to get for me.
I would need to this entirely in TRANSACT-SQL (with built-in function if needed). Is this possible and can anyone give me a short example. I manage to do it for one level, but I get confused for the rest.
thanks
EDIT : The info above is just an example, because my actual diagram looks more like this
Model tables :
Processes -- (1-N) Steps -- (1-N) Task -- (0-N) TaskCheckList
-- (0-N) StepsCheckLists
Where as the table I need to fill looks like this
Client -- (0-N) Sequence -- (1-N) ClientProcesses -- (1-N) ClientSteps -- (1-N)ClientTasks -- (0-N) ClientTaskCheckList
-- (0-N)ClientStepCheckLists
The Client already exists and when I need to run the script, I create one sequence, which will contains all processes, which will contains its steps, taks, etc...
Ok,
So I did a lot of trials and error, and here is what I got. It seems to work fine although it sound quite big for something that seemed easy at first.
The whole this is somehow in french and english because our client is french and so are we anyway. It does insert every date in all tables that I needed. The only thing left to this will be the first lines where I need to select the date to insert according to some parameters but this is the easy part.
DECLARE #IdProcessusRenouvellement bigint
DECLARE #NomProcessus nvarchar(255)
SELECT #IdProcessusRenouvellement = ID FROM T_Ref_Processus WHERE Nom LIKE 'EXP%'
SELECT #NomProcessus = Nom FROM T_Ref_Processus WHERE Nom LIKE 'EXP%'
DECLARE #InsertedSequence table(ID bigint)
DECLARE #Contrats table(ID bigint,IdClient bigint,NumeroContrat nvarchar(255))
INSERT INTO #Contrats SELECT ID,IdClient,NumeroContrat FROM T_ClientContrat
DECLARE #InsertedIdsSeq as Table(ID bigint)
-- Séquences de travail
INSERT INTO T_ClientContratSequenceTravail(IdClientContrat,Nom,DateDebut)
OUTPUT Inserted.ID INTO #InsertedIdsSeq
SELECT ID, #NomProcessus + ' - ' + CONVERT(VARCHAR(10), GETDATE(), 120) + ' : ' + NumeroContrat ,GETDATE()
FROM #Contrats
-- Processus
DECLARE #InsertedIdsPro as Table(ID bigint,IdProcessus bigint)
INSERT INTO T_ClientContratProcessus
(IdClientContratSequenceTravail,IdProcessus,Nom,DateDebut,DelaiRappel,DateRappel,LienAvecPro cessusRenouvellement,IdStatutProcessus,IdResponsable,Sequence)
OUTPUT Inserted.ID,Inserted.IdProcessus INTO #InsertedIdsPro
SELECT I.ID,P.ID,P.Nom,GETDATE(),P.DelaiRappel,GETDATE(),P.LienAvecProcessusRenouvellement,0,0,0
FROM #InsertedIdsSeq I, T_Ref_Processus P
WHERE P.ID = #IdProcessusRenouvellement
-- Étapes
DECLARE #InsertedIdsEt as table(ID bigint,IdProcessusEtape bigint)
INSERT INTO T_ClientContratProcessusEtape
(IdClientContratProcessus,IdProcessusEtape,Nom,DateDebut,DelaiRappel,DateRappel,NomListeVeri fication,Sequence,IdStatutEtape,IdResponsable,IdTypeResponsable,ListeVerificationTermine)
OUTPUT Inserted.ID,Inserted.IdProcessusEtape INTO #InsertedIdsEt
SELECT I.ID,E.ID,
E.Nom,GETDATE(),E.DelaiRappel,GETDATE(),COALESCE(L.Nom,''),E.Sequence,0,0,E.IdTypeResponsabl e,0
FROM #InsertedIdsPro I INNER JOIN T_Ref_ProcessusEtape E ON I.IdProcessus = E.IdProcessus
LEFT JOIN T_Ref_ListeVerification L ON E.IdListeVerification = L.ID
-- Étapes : Items de la liste de vérification
INSERT INTO T_ClientContratProcessusEtapeListeVerificationItem
(IdClientContratProcessusEtape,Nom,Requis,Verifie)
SELECT I.ID,IT.Nom,IT.Requis,0
FROM #InsertedIdsEt I
INNER JOIN T_Ref_ProcessusEtape E ON I.IdProcessusEtape = E.ID
INNER JOIN T_Ref_ListeVerificationItem IT ON E.IdListeVerification = IT.IdListeVerification
-- Tâches
DECLARE #InsertedIdsTa as table(ID bigint, IdProcessusEtapeTache bigint)
INSERT INTO T_ClientContratProcessusEtapeTache
(IdClientContratProcessusEtape,IdProcessusEtapeTache,Nom,DateDebut,DelaiRappel,DateRappel,No mListeVerification,Sequence,IdStatutTache,IdResponsable,IdTypeResponsable,ListeVerificationT ermine)
OUTPUT Inserted.ID,Inserted.IdProcessusEtapeTache INTO #InsertedIdsTa
SELECT I.ID,T.ID,
T.Nom,GETDATE(),T.DelaiRappel,GETDATE(),COALESCE(L.Nom,''),T.Sequence,0,0,T.IdTypeResponsabl e,0
FROM #InsertedIdsEt I
INNER JOIN T_Ref_ProcessusEtapeTache T ON I.IdProcessusEtape = T.IdProcessusEtape
LEFT JOIN T_Ref_ListeVerification L ON T.IdListeVerification = L.ID
-- Tâches : Items de la liste de vérification
INSERT INTO T_ClientContratProcessusEtapeTacheListeVerificationItem
(IdClientContratProcessusEtapeTache,Nom,Requis,Verifie)
SELECT I.ID,IT.Nom,IT.Requis,0
FROM #InsertedIdsTa I
INNER JOIN T_Ref_ProcessusEtapeTache T ON I.IdProcessusEtapeTache = T.ID
INNER JOIN T_Ref_ListeVerificationItem IT ON T.IdListeVerification = IT.IdListeVerification

Nested SELECT statement in a CASE expression

Greetings,
Here is my problem.
I need to get data from multiple rows and return them as a single result in a larger query.
I already posted a similar question here.
Return multiple values in one column within a main query but I suspect my lack of SQL knowledge made the question too vague because the answers did not work.
I am using Microsoft SQL 2005.
Here is what I have.
Multiple tables with CaseID as the PK, CaseID is unique.
One table (tblKIN) with CaseID and ItemNum(AutoInc) as the combined PK.
Because each person in the database will likely have more than one relative.
If I run the following, in a SQL query window, it works.
DECLARE #KINList varchar(1000)
SELECT #KINList = coalesce(#KINList + ', ','') + KINRel from tblKIN
WHERE CaseID = 'xxx' and Address = 'yyy'
ORDER BY KINRel
SELECT #KINList
This will return the relation of all people who live at the same address. the results look like this...
Father, Niece, Sister, Son
Now, the problem for me is how do I add that to my main query?
Shortened to relevant information, the main query looks like this.
SELECT DISTINCT
c.CaseID,
c.Name,
c.Address,
Relatives=CASE WHEN exists(select k.CaseID from tblKIN k where c.CaseID = k.CaseID)
THEN DECLARE #KINList varchar(1000)
SELECT #KINList = coalesce(#KINList + ', ','') + KINRel from tblKIN
WHERE CaseID = 'xxx' and Address = 'yyy'
ORDER BY KINRel
SELECT #KINList
ELSE ''
END
FROM tblCase c
ORDER BY c.CaseID
The errors I receive are.
Server: Msg 156, Level 15, State 1, Line 13
Incorrect syntax near the keyword 'DECLARE'.
Server: Msg 156, Level 15, State 1, Line 18
Incorrect syntax near the keyword 'ELSE'.
I tried nesting inside parenthesis from the DECLARE to the end of the SELECT #KINList.
I tried adding a BEGIN and END to the THEN section of the CASE statement.
Neither worked.
The source table data looks something like this. (periods added for readability)
tblCase
CaseID Name Address
10-001 Jim......100 Main St.
10-002 Tom....150 Elm St.
10-003 Abe.....200 1st St.
tblKIN
CaseID ItemNum Name Relation Address
10-001 00001 Steve...Son........100 Main St.
10-002 00002 James..Father....150 Elm St.
10-002 00003 Betty....Niece......150 Elm St.
10-002 00004 Greta...Sister.....150 Elm St.
10-002 00005 Davey..Son........150 Elm St.
10-003 00006 Edgar...Brother...200 1st St.
If I run the query for CaseID = 10-002, it needs to return the following.
CaseID Name Address.......Relatives
10-002 Tom...150 Elm St. ..Father, Niece, Sister, Son
I am sure this is probably a simple fix, but I just don't know how to do it.
Thank you for your time, and I apologize for the length of the question, but I wanted to be clear.
Thanks !!!
When I did something similar I had to create a scalar function to do the coalesce that returns the varchar result. Then just call it in the select.
CREATE FUNCTION GetRelatives
(
#CaseID varchar(10)
)
RETURNS varchar(1000)
AS
BEGIN
DECLARE #KINList varchar(1000)
SELECT #KINList = coalesce(#KINList + ', ','') + KINRel from tblKIN
WHERE CaseID = #CaseID
ORDER BY KINRel
RETURN #KINList
END
Then your select
SELECT DISTINCT
c.CaseID,
c.Name,
c.Address,
database.dbo.GetRelatives(c.CaseID) AS Relatives
FROM tblCase c
ORDER BY c.CaseID
You can create a FUNCTION which takes in the caseID as the arguement and returns true or false.
Since you are calling the nested query multiple times, its definitely a performance hit. A better solution is to execute the query and store the results in a temporary table.
Then pass this temporary table and the caseID to the FUNCTION and check for containment.