How can I order a fulltext query on SQL Server by text occurrence? - sql-server-2008-r2

I've recently started using fulltext indexes in a document searching system I'm developing with SQL Server 2008 R2. I've got the binary in one column and the stripped text from the PDF file in the other; the fulltext index is on the text column.
I've come up empty searching (probably because I don't know how to phrase the question), but I need to figure out how I can order the results by text occurrence. For example, if someone searches for the string "book", I want to order the query results by how many times "book" occurs in the text.

I would recommend using a third-party library to handle searching. A good search engine is a lot more than a string finder.
That being said, I understand it may still be useful for you to implement your text finder as you've specified. You can use a function to count the number of times the substring occurs.
I modified a custom split function from Andy Robinson's T-SQL split string answer:
CREATE FUNCTION dbo.countOfSubString (@source VARCHAR(MAX), @stringToFind VARCHAR(MAX))
RETURNS INT
AS
BEGIN
    DECLARE @pos INT
    DECLARE @count INT = 0

    -- each time the search string is found, count it and discard everything
    -- up to and including the first character of the match
    WHILE CHARINDEX(@stringToFind, @source) > 0
    BEGIN
        SET @pos = CHARINDEX(@stringToFind, @source)
        SET @count = @count + 1
        SET @source = SUBSTRING(@source, @pos + 1, LEN(@source) - @pos)
    END

    RETURN @count
END
Then use it like this to find the page that has the most matches:
SELECT TOP 1 page_id
     , page_text
FROM pages
ORDER BY dbo.countOfSubString(page_text, @search_text) DESC
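To make the fulltext index do some work first (restricting the scan to pages that contain the term at all), here is a minimal sketch, assuming @search_text holds the user's search term and the fulltext index is on page_text:
DECLARE @search_text VARCHAR(100) = 'book'

SELECT TOP 1 page_id
     , page_text
FROM pages
WHERE CONTAINS(page_text, @search_text)   -- fulltext filter first
ORDER BY dbo.countOfSubString(page_text, @search_text) DESC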

Related

Running a stored procedure using a resultset as the parameter and insert into a table

I have the question below.
For example, I have a query which returns 10 rows:
SELECT CarId FROM Car
I also have a stored procedure that takes an id from the table above:
EXEC spQ_GetCar @CarId
My question is: how can I run the stored procedure using the output of that query as the parameter, and then insert the procedure's result into another temp table?
Would it be possible using a cursor and dynamic SQL? Has anyone faced this before?
NOTE: I cannot create a table type to use here.
A fast answer:
Create a table, then you can insert-exec:
INSERT INTO yourPreparedTable
EXEC spQ_GetCar @CarId
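For example, a minimal sketch (the temp-table name and result-set shape here are assumptions; adjust the column list to match what spQ_GetCar actually returns):
CREATE TABLE #CarResults (CarId INT)   -- must match the procedure's result set

DECLARE @CarId INT = 1                 -- hypothetical parameter value
INSERT INTO #CarResults
EXEC spQ_GetCar @CarId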
The true answer:
Erland Sommarskog's article is the most in-depth resource you will find; it covers many alternatives with detailed advantages and disadvantages.
I achieved this with the approach below:
CREATE TABLE #TEMP (CarId INT)

DECLARE @CarId INT
DECLARE @Car CURSOR

SET @Car = CURSOR STATIC
FOR
SELECT CarId
FROM Car

OPEN @Car

WHILE 1 = 1
BEGIN
    FETCH NEXT FROM @Car
    INTO @CarId

    IF @@FETCH_STATUS <> 0
        BREAK

    INSERT INTO #TEMP
    EXEC spQ_GetCar @CarId
END

CLOSE @Car
DEALLOCATE @Car

-- ... work with #TEMP here ...

DROP TABLE #TEMP
Notes: Performance was not relevant as it was a one off script.

Process a row with unknown structure in a cursor

I am new to using cursors for looping through a set of rows. But so far I have had prior knowledge of which columns I am about to read.
E.g.
DECLARE db_cursor CURSOR FOR
SELECT Column1, Column2
FROM MyTable

DECLARE @ColumnOne VARCHAR(50), @ColumnTwo VARCHAR(50)

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @ColumnOne, @ColumnTwo
...
But the tables I am about to read into my key/value table have no specific structure and I should be able to process them one row at a time. How, using a nested cursor, can I loop through all the columns of the fetched row and process them according to their type and name?
T-SQL cursors are not really designed to read data from tables of unknown structure. The two possibilities I can think of to achieve something in that direction are:
First read the column names of the unknown table from the information schema views (see System Information Schema Views (Transact-SQL)), then use dynamic SQL to create the cursor (a sketch follows below).
If you simply want to get all the columns as one large string value, you might also try a simple SELECT * FROM TABLE_NAME FOR XML AUTO and further process the retrieved data for your purposes (see FOR XML (SQL Server)).
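A minimal sketch of the first option (the table name and variable names are mine; the generated statement could equally be wrapped in a DECLARE ... CURSOR FOR before executing):
DECLARE @tableName SYSNAME = N'MyTable'
DECLARE @cols NVARCHAR(MAX)

-- build a comma-separated column list from the information schema
SELECT @cols = STUFF((
    SELECT N', ' + QUOTENAME(COLUMN_NAME)
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName
    ORDER BY ORDINAL_POSITION
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, N'')

DECLARE @sql NVARCHAR(MAX) = N'SELECT ' + @cols + N' FROM ' + QUOTENAME(@tableName)
EXEC sp_executesql @sql   -- or embed @sql in a dynamic cursor declaration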
SQL is not very good at dealing with sets generically. In most cases you must know column names, data types, and much more in advance. But there is XQuery. You can transform any SELECT into XML rather easily and use XQuery's mighty abilities to deal with generic structures. I would not recommend this, but it might be worth a try:
CREATE PROCEDURE dbo.Get_EAV_FROM_SELECT
(
    @SELECT NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @tmptbl TABLE(TheContent XML);
    DECLARE @cmd NVARCHAR(MAX) = N'SELECT (' + @SELECT + N' FOR XML RAW, ELEMENTS XSINIL);';
    INSERT INTO @tmptbl EXEC(@cmd);

    SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
          ,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
          ,c.value('text()[1]','nvarchar(max)') AS ColumnValue
    FROM @tmptbl t
    CROSS APPLY t.TheContent.nodes('/row') A(r)
    CROSS APPLY A.r.nodes('*[position()>1]') B(c)
END;
GO
EXEC Get_EAV_FROM_SELECT @SELECT='SELECT TOP 10 o.object_id,o.* FROM sys.objects o';
GO
--Clean-up for test purposes
DROP PROCEDURE Get_EAV_FROM_SELECT;
The idea in short
The SELECT is passed into the procedure as a string. Within the SP we build the statement dynamically and create XML from it.
The very first column is taken to be the row's ID; if it is not (as in sys.objects), we can write the SELECT so that it is.
The inner SELECT reads each row and returns a classical EAV list.

How to call a stored procedure in SQL developer where the argument is a long string with carriage returns

I am trying to run a stored procedure that takes a string with carriage returns as an argument from within a SQL Developer session. Specifically the string is itself a SQL statement which gets picked up by the procedure, processed and stored in a table.
The problem is finding a way to preserve the readable formatting of the statements i.e., text with multiple lines / carriage returns. The closest I have gotten is the following:
1) Create a new table to store the SQL statements:
CREATE TABLE sql_table
(
id NUMBER,
sql_string CLOB
);
2) Create a stored procedure:
CREATE OR REPLACE PROCEDURE
update_sql(var_id NUMBER, var_sql_string IN CLOB) IS
BEGIN
INSERT INTO sql_table (id, sql_string) VALUES (var_id, var_sql_string);
--do other stuff
END;
/
3) Run the following command to add a new row with a sql statement to the table:
EXEC update_sql(127,'&input');
4) At the prompt, paste in a multiple line statement such as:
SELECT *
FROM any_table
WHERE a = b;
5) Then query sql_table and copy and paste the content of the column sql_string into a text editor - the carriage returns are now gone:
SELECT * FROM any_table WHERE a=b;
As mentioned, I would like to preserve the carriage returns so that the statements display nicely when extracted from the table.
Any help appreciated.
Thanks,
Dennis
You didn't share any sample text you wanted to store, so I made up a very simple one
DECLARE
VAR_ID NUMBER;
VAR_SQL_STRING CLOB;
BEGIN
VAR_ID := 4;
VAR_SQL_STRING := q'[SELECT
* FROM EMPLOYEES
WHERE FIRST_NAME = 'JEFF']';
UPDATE_SQL(
VAR_ID => VAR_ID,
VAR_SQL_STRING => VAR_SQL_STRING
);
--rollback;
END;
The q feature for string literals is quite nice because it handles quoted strings for you auto-magically, which you'll have all over the place in your SQL. Steven Feuerstein talks about this here in his article covering strings in PL/SQL.
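For example, embedded single quotes need no doubling inside a q'[...]' literal (a throwaway illustration):
SELECT q'[WHERE first_name = 'JEFF']' AS literal_demo FROM dual;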
You don't say HOW you're extracting the data...
As mentioned, I would like to preserve the carriage returns so that
the statements display nicely when extracted from the table.
But browsing the table in SQL Developer, the line breaks appear preserved (screenshot omitted).

How can I generate a unique string per record in a table in Postgres?

Say I have a table like posts, which has typical columns like id, body, created_at. I'd like to generate a unique string with the creation of each post, for use in something like a url shortener. So maybe a 10-character alphanumeric string. It needs to be unique within the table, just like a primary key.
Ideally there would be a way for Postgres to handle both of these concerns:
generate the string
ensure its uniqueness
And they must go hand-in-hand, because my goal is to not have to worry about any uniqueness-enforcing code in my application.
I don't claim the following is efficient, but it is how we have done this sort of thing in the past.
CREATE FUNCTION make_uid() RETURNS text AS $$
DECLARE
new_uid text;
done bool;
BEGIN
done := false;
WHILE NOT done LOOP
new_uid := md5(''||now()::text||random()::text);
done := NOT exists(SELECT 1 FROM my_table WHERE uid=new_uid);
END LOOP;
RETURN new_uid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;
make_uid() can be used as the default for a column in my_table. Something like:
ALTER TABLE my_table ADD COLUMN uid text NOT NULL DEFAULT make_uid();
md5(''||now()::text||random()::text) can be adjusted to taste. You could consider encode(...,'base64') except some of the characters used in base-64 are not URL friendly.
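For the 10-character requirement in the question, a hedged variation is to truncate the md5 hex string inside make_uid() (this only draws from the 16 hex characters, so keep the uniqueness loop):
SELECT substr(md5(now()::text || random()::text), 1, 10) AS candidate_uid;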
All existing answers are WRONG because they rely on a SELECT to check uniqueness while generating the code. Let us assume that we need a unique code per record while inserting: imagine two concurrent INSERTs happening at the same moment by miracle (which happens more often than you think); the same code is generated for both inserts, because at the moment of the SELECT that code did not exist in the table. One instance will INSERT and the other will fail.
First let us create a table with a code field and add a unique index:
CREATE TABLE my_table
(
code TEXT NOT NULL
);
CREATE UNIQUE INDEX ON my_table (lower(code));
Then we need a function or procedure (you can also use the same code inside a trigger) where we 1. generate a new code, 2. try to insert a new record with the new code, and 3. if the insert fails, try again from step 1:
CREATE OR REPLACE PROCEDURE my_table_insert()
AS $$
DECLARE
    new_code TEXT;
BEGIN
    LOOP
        new_code := LOWER(SUBSTRING(MD5('' || NOW()::TEXT || RANDOM()::TEXT) FOR 8));
        BEGIN
            INSERT INTO my_table (code) VALUES (new_code);
            EXIT;  -- insert succeeded, leave the loop
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- another session generated the same code; loop and try again
        END;
    END LOOP;
END;
$$ LANGUAGE PLPGSQL;
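Usage, assuming PostgreSQL 11 or later (where CREATE PROCEDURE and CALL are available; on older versions declare it as a function returning void instead):
CALL my_table_insert();
SELECT code FROM my_table;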
This is a guaranteed error-free solution, unlike the other solutions in this thread.
Use a Feistel network. This technique works efficiently to generate unique random-looking strings in constant time without any collision.
For a version with about 2 billion possible strings (2^31) of 6 letters, see this answer.
For a 63 bits version based on bigint (9223372036854775808 distinct possible values), see this other answer.
You may change the round function as explained in the first answer to introduce a secret element to have your own series of strings (not guessable).
Probably the easiest way is to use a sequence to guarantee uniqueness, and append a fixed-length x-digit random number after the sequence value:
CREATE SEQUENCE test_seq;
CREATE TABLE test_table (
id bigint NOT NULL DEFAULT (nextval('test_seq')::text || (LPAD(floor(random()*100000000)::text, 8, '0')))::bigint,
txt TEXT
);
insert into test_table (txt) values ('1');
insert into test_table (txt) values ('2');
select id, txt from test_table;
However, this wastes a huge amount of the available key space. (Note: the max bigint is 9223372036854775807; if you use an 8-digit random number at the end, you can only have 922337203 records. Though 8 digits are probably not necessary. Also check the max number for your programming environment!)
Alternatively you can use varchar for the id, and even convert the above number with to_hex() or to base 36 like below (but for base 36, try not to expose it to the customer, to avoid some funny strings showing up!):
PostgreSQL: Is there a function that will convert a base-10 int into a base-36 string?
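For instance, a hedged sketch of the to_hex() variant (the table name here is mine; the id becomes text rather than bigint):
CREATE TABLE test_table_hex (
    id  text NOT NULL DEFAULT to_hex((nextval('test_seq')::text ||
        lpad(floor(random()*100000000)::text, 8, '0'))::bigint),
    txt text
);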
Check out a blog by Bruce. This gets you part way there. You will have to make sure it doesn't already exist. Maybe concat the primary key to it?
Generating Random Data Via Sql
"Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:"
SELECT
(
SELECT string_agg(x, '')
FROM (
SELECT chr(ascii('a') + floor(random() * 26)::integer)
FROM generate_series(1, 40 + b * 0)
) AS y(x)
)
FROM generate_series(1,5) as a(b);
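A hedged adaptation of the same idea for a single 10-character lowercase alphanumeric string (letters and digits; a uniqueness check or unique index is still needed on top):
SELECT string_agg(substr('abcdefghijklmnopqrstuvwxyz0123456789',
                         1 + floor(random() * 36)::integer, 1), '') AS candidate
FROM generate_series(1, 10);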
Use the primary key in your data. If you really need an alphanumeric unique string, you can use base-36 encoding. In PostgreSQL you can use this function.
Example:
select base36_encode(generate_series(1000000000,1000000010));
GJDGXS
GJDGXT
GJDGXU
GJDGXV
GJDGXW
GJDGXX
GJDGXY
GJDGXZ
GJDGY0
GJDGY1
GJDGY2

Split Cell using SSIS

I'm looking to use SSIS to transform the data held in a single source table. One of the cells has a string of characters. For example:
##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\
There's also another cell on the same row which contains a date.
Basically I want each character within that string to be transferred to a new table as a row on its own. The first two characters represent the date given in the other cell, the next two characters represent the following day, and so on. So as well as having each character on its own, I would also want to increment the date and store that too.
Any idea how I would go about doing this, or even if SSIS is the correct tool to be using?
Many Thanks
I wonder if you'd be better off running this through a split-string function in SQL first? That way you'll get a row for each character alongside the date, and then you can just output it straight to a destination.
I've created a function to facilitate this:
CREATE FUNCTION [dbo].[udf_SplitStringIntoRows](@text VARCHAR(MAX))
RETURNS @tbl TABLE ([value] CHAR(1) NOT NULL)
AS
BEGIN
    -- peel one character off the front of the string at a time and emit it as a row
    WHILE LEN(@text) > 0
    BEGIN
        INSERT INTO @tbl
        SELECT LEFT(@text, 1)

        SET @text = RIGHT(@text, LEN(@text) - 1)
    END
    RETURN
END
Then, to test, I created a quick table variable with your data in it:
DECLARE @source AS TABLE([value] VARCHAR(MAX), [date] DATETIME)

INSERT INTO @source
SELECT '##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\', GETDATE()
UNION
SELECT '##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\', GETDATE()+1
UNION
SELECT '##/\/\/\/\/\##HHHHHHBBBB##/\/\/\/\/\', GETDATE()+2
Then cross applied the function to this dataset:
SELECT d.[value], s.[date]
FROM @source s
CROSS APPLY dbo.[udf_SplitStringIntoRows](s.[value]) d
Which should give you the source dataset you require to further process in SSIS.