Extract specific words from text value - postgresql

I have a query and a returned value that looks like this:
select properties->>'text' as snippet from table where id = 31;
snippet
-----------------------------------
There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable.
(1 row)
This returns as I expect based on my query.
Is there a way that I can slice the returned text to only return words from position 5 to position 8 for example? Or alternatively, slice by character position which I will be able to use as a workaround?
I have tried using:
select properties->>'text'[0:13] as snippet from table where id = 31;
Which I hoped would return:
There are many
But it hasn't worked.
Is this possibly to slice a jsonb text field?

To "slice by character position", you can simply use the substr() function:
select substr(properties->>'text', 1, 15) as snippet
from the_table
where id = 31;
If you really want "words", you can split the text into an array using e.g. regexp_split_to_array. Once you have an array, you can use the slice syntax:
select (regexp_split_to_array(properties->>'text','\s+'))[5:8] as snippet
from the_table
where id = 31;
This returns an array, if you want it as a string, you can use array_to_string()
select array_to_string((regexp_split_to_array(properties->>'text','\s+'))[5:8],' ') as snippet
from the_table
where id = 31;
If you need that frequently, I would wrap it into a function:
create function extract_words(p_input text, p_start int, p_end int)
returns text
as
$$
select array_to_string((regexp_split_to_array(p_input,'\s+'))[p_start:p_end],' ');
$$
language sql
immutable;
Then the query is much easier to read:
select extract_words(properties->>'text', 5, 8) as snippet
from the_table
where id = 31;

Related

How can I select just the first X characters from a text column

On my PostgreSQL DB I have a table that looks like below. The description column can store a string of any size.
What I'm looking for, is a way to select just the first X chars from the content of the description column, or the whole string if X > description.length
CREATE TABLE descriptions (
id uuid NOT NULL,
description text NULL,
);
E.g.: If X = 100 chars and if in the description column I store a string containing 150+ chars, when I run select <some method on description> from descriptions, I just want to get back the first 100 chars from the description column.
Bonus if the approach proposed is extremely fast!
Use a type cast or use substring():
SELECT CAST(description AS varchar(100)), substring(description for 100)
FROM descriptions;
Or if you want to do it the "PostgreSQL way":
SELECT description::varchar(100), substr(description, 1, 100)
FROM descriptions;

How to use dynamic regex to match value in Postgres

SUMMARY: I've two tables I want to derive info out of: family_values (family_name, item_regex) and product_ids (product_id) to be able to update the property family_name in a third.
Here the plan is to grab a json array from the small family_values table and use the column value item_regex to do a test match against the product_id for every row in product_ids.
MORE DETAILS: Importing static data from CSV to table of orders. But, in evaluating cost of goods and market value I'm needing to continuously determine family from a prefix regex (item_regex from family_values) match on the product_id.
On the client this looks like this:
const families = {
FOOBAR: 'Big Ogre',
FOOBA: 'Wood Elf',
FOO: 'Valkyrie'
};
// And to find family, and subsequently COGs and Market Value:
const findFamily = product_id => Object.keys(families).find(f => new RegExp('^' + f).test(product_id));
This is a huge hit for the client so I made a family_values table in PG to include a representative: family_name, item_regex, cogs, market_value.
Then, the product_ids has a list of only the products the app cares about (out of millions). This is actually used with an insert trigger 'on before' to ignore any CSV entries that aren't in the product_ids view. So, I guess after that the product_ids view could be taken out of the equation because the orders, after inserting readonly data, has its own matching product_id. It does NOT have family_name, so I still have the issue of determining that client-side.
PSUEDO CODE: update family column of orders with family_name from family_values regex match against orders.product_id
OR update the product_ids table with a new family column and use that with the existing on insert trigger (used to left pad zeros and normalize data right now). Now I'm thinking this may be just an update as suggested, but not real good with regex in PG. I'm a PG novice.
PROBLEM: But, I'm having a hangup in doing what I thought would be like a JS Array Find operation. The family_values have been sorted on the item_regex so that the most strict match would be on top, and therefor found first.
For example, with sorting we have:
family_values_array = [
{"family_name": "Big Ogre", "item_regex": "FOOBAR"},
{"family_name": "Wood Elf", "item_regex": "FOOBA"},
{"family_name": "Valkyrie", "item_regex": "FOO"}]
So, that the comparison of product_id of ^FOOBA would yield family "Wood Elf".
SOLUTION:
The solution I finally came about using was simply using concat to write out the front-anchored regex. It was so simple in the end. The key line I was missing is:
select * into family_value_row from iol.family_values
where lvl3_id = product_row.lvl3_id and product_row.product_id
like concat(item_regex, '%') limit 1;
Whole function:
create or replace function iol.populate_families () returns void as $$
declare
product_row record;
family_value_row record;
begin
for product_row in
select product_id, lvl3_id from iol.products
loop
-- family_name is what we want after finding the BEST match fr a product_id against item_regex
select * into family_value_row from iol.family_values
where lvl3_id = product_row.lvl3_id and product_row.product_id like concat(item_regex, '%') limit 1;
-- update family_name and value columns
update iol.products set
family_name = family_value_row.family_name,
cog_cents = family_value_row.cog_cents,
market_value_cents = family_value_row.market_value_cents
where product_id = product_row.product_id;
end loop;
end;
$$
LANGUAGE plpgsql;
Use concat as updated above:
select * into family_value_row from iol.family_values
where lvl3_id = product_row.lvl3_id and product_row.product_id
like concat(item_regex, '%') limit 1;

T-SQL Join on foreign key that has leading zero

I need to link various tables that each have a common key (a serial number in this case). In some tables the key has a leading zero e.g. '037443' and on others it doesn't e.g. '37443'. In both cases the serial refers to the same product. To confound things serial 'numbers' are not always just numeric e.g. may be "BDO1234", in these cases there is never a leading zero.
I'd prefer to use the WHERE statement (WHERE a.key = b.key) but could use joins if required. Is there any way to do this?
I'm still learning so please keep it simple if possible. Many thanks.
Based on the accepted answer in this link, I've written a small tsql sample to show you what I meant by 'the right direction':
Create the test table:
CREATE TABLE tblTempTest
(
keyCol varchar(20)
)
GO
Populate it:
INSERT INTO tblTempTest VALUES
('1234'), ('01234'), ('10234'), ('0k234'), ('k2304'), ('00034')
Select values:
SELECT keyCol,
SUBSTRING(keyCol, PATINDEX('%[^0]%', keyCol + '.'), LEN(keyCol)) As trimmed
FROM tblTempTest
Results:
keyCol trimmed
-------------------- --------------------
1234 1234
01234 1234
10234 10234
0k234 k234
k2304 k2304
00034 34
Cleanup:
DROP TABLE tblTempTest
Note that the values are alpha-numeric, and only leading zeroes are trimmed.
One possible drawback is that if there is a 0 after a white space it will not be trimmed, but that's an easy fix - just add ltrim:
SUBSTRING(LTRIM(keyCol), PATINDEX('%[^0]%', LTRIM(keyCol + '.')), LEN(keyCol)) As trimmed
You need to create a function
CREATE FUNCTION CompareSerialNumbers(#SerialA varchar(max), #SerialB varchar(max))
RETURNS bit
AS
BEGIN
DECLARE #ReturnValue AS bit
IF (ISNUMERIC(#SerialA) = 1 AND ISNUMERIC(#SerialB) = 1)
SELECT #ReturnValue =
CASE
WHEN CAST(#SerialA AS int) = CAST(#SerialB AS int) THEN 1
ELSE 0
END
ELSE
SELECT #ReturnValue =
CASE
WHEN #SerialA = #SerialB THEN 1
ELSE 0
END
RETURN #ReturnValue
END;
GO
If both are numeric then it compares them as integers otherwise it compares them as strings.

select first letter of different columns in oracle

I want a query which will return a combination of characters and number
Example:
Table name - emp
Columns required - fname,lname,code
If fname=abc and lname=pqr and the row is very first of the table then result should be code = ap001.
For next row it should be like this:
Fname = efg, lname = rst
Code = er002 and likewise.
I know that we can use substr to retrieve first letter of a colume but I don't know how to use it to do with two columns and how to concatenate.
OK. You know you can use substr function. Now, to concatenate you will need a concatenation operator ||. To get the number of row retrieved by your query, you need the rownum pseudocolumn. Perhaps you will also need to use to_char function to format the number. About all those functions and operators you can read in SQL reference. Anyway I think you need something like this (I didn't check it):
select substr(fname, 1, 1) || substr(lname, 1, 1) || to_char(rownum, 'fm009') code
from emp

Splitting comma delimited cell data

I have a spreadsheet with multiple columns, one of which is an owner_id column. The problem is that this column contains a comma delimited list of owner id's and not just a single one.
I've imported this spreadsheet into my sql database (2008) and have completed other importing tasks and now have a parcel_id column as a result of this process.
I need to create an entry in my parcelOwners table for each parcelID/ownerID pair, but I'm not sure how to go about this with the owner id's being in the comma delimited list.
My tables look like this:
ImportData
=================
owner_id varchar,
parcelID int
sample row (owner_id = '13782, 21431', parcelID = 319)
ParcelOwners
=================
ownerID int,
parcelID int
row from ImportData table should look like:
ownerID = 13782, parcelID = 319
ownerID = 21431, parcelID = 319
Is this a common situation for anybody and if so, how do you go about getting around this?
The below function will split you comma sep column into a table. You will then need to iterate through the temp table and insert 1 row into your parcelOwners table using the data from your single column. To get this to work you will need an outer loop to iterate through the parcelOwners table and an inner loop to iterate through the #temptable for each row. Also, don't forget, if you come to a row in your outer loop with no comma's in the owner_id column you won't want to do anything.
CREATE FUNCTION dbo.Split(#String varchar(8000), #Delimiter char(1))
returns #temptable TABLE (items varchar(8000))
as
begin
declare #idx int
declare #slice varchar(8000)
select #idx = 1
if len(#String)<1 or #String is null return
while #idx!= 0
begin
set #idx = charindex(#Delimiter,#String)
if #idx!=0
set #slice = left(#String,#idx - 1)
else
set #slice = #String
if(len(#slice)>0)
insert into #temptable(Items) values(#slice)
set #String = right(#String,len(#String) - #idx)
if len(#String) = 0 break
end
return
end
You can do this easily leveraging SQL Server's XML functions:
WITH xmlData (xml_owner_id,parecelID) AS (
/* make into xml */
SELECT cast('<x>'+replace(owner_id,',','</x><x>')+'</x>' as XML) AS xml_owner_id, parecelID
FROM ImportData
)
SELECT x.value('.','int') AS owner_id, parecelID /* split up */
FROM xmlData
CROSS APPLY xmlData.xml_owner_id.nodes('//x') AS func(x)
(In response to #senloe's question about how to use the function supplied by #RandomBen)
This answer to a previous question shows how to use OUTER APPLY to apply a function to every row in a table. In your case, and assuming you have already run #RandomBen's code to create the dbo.Split function, the syntax would look something like this:
INSERT INTO ParcelOwners (ownerId, parcelID)
SELECT CONVERT(int, Results.items), ImportData.parcelID
FROM ImportData
OUTER APPLY dbo.Split(ImportData.owner_id, ',') AS Results
(I don't have access to SQL Server right now, so I haven't tried it yet. You can run it without the first line, i.e. just from SELECT onwards, to see what output it is going to generate before you actually do the INSERT).