Postgres crosstab confusion


I'm trying to achieve the following from my data:
docid, yyyy, tiOne, tiTwo, tiThree, tiFour
d1 2011 txtA txtB txtC
d2 2012 txtD txtE txtF txtG
d3 2013 txtH txtI txtJ
d4 2013 txtK
This is how to recreate my data:
CREATE TEMP TABLE t (
docid text
, yyyy int
, timark text
, txtmark text
);
INSERT INTO t VALUES
('d1', 2011, 'tiOne', 'txtA'),
('d1', 2011, 'tiTwo', 'txtB'),
('d1', 2011, 'tiThree', 'txtC'),
('d2', 2012, 'tiOne', 'txtD'),
('d2', 2012, 'tiTwo', 'txtE'),
('d2', 2012, 'tiThree', 'txtF'),
('d2', 2012, 'tiFour', 'txtG'),
('d3', 2013, 'tiOne', 'txtH'),
('d3', 2013, 'tiTwo', 'txtI'),
('d3', 2013, 'tiThree', 'txtJ'),
('d4', 2013, 'tiOne', 'txtK')
;
This is my code:
select *
FROM crosstab(
'SELECT docid, timark, txtmark
FROM t
ORDER BY 1,2') -- needs to be "ORDER BY 1,2" here
AS ct ("docid" text, "timark" text, "txtmark" text);
but I'm getting a completely confusing output as follows:
docid timark txtmark
d1 txtA txtC
d2 txtG txtD
d3 txtH txtJ
d4 txtK
The timark values ('tiOne' and so on) are not well structured, so it is hard to know in advance exactly what will end up in those columns, and hard-coding those values into the query won't be easy.

The column definition list after AS ct must contain the column names and types of the result set, not those of the source query.
With the one-parameter form, the result of the source query has to be ordered by two columns: the row identifier (docid) and the category identifier (timark).
Unfortunately, the alphabetical order of the category names (tiFour, tiOne, tiThree, tiTwo) is not the order you expect, and the values are simply filled into the output columns from left to right.
In this case, use the form of the crosstab function with two parameters.
The second parameter is a query that returns all categories in the expected order.
This form of crosstab also allows additional columns (yyyy) and displays incomplete data properly:
select *
FROM crosstab(
$$ SELECT docid, yyyy, timark, txtmark
FROM t
ORDER BY 1 $$,
$$ values ('tiOne'), ('tiTwo'), ('tiThree'), ('tiFour') $$)
AS ct ("docid" text, "yyyy" text, "tiOne" text, "tiTwo" text, "tiThree" text, "tiFour" text);
docid | yyyy | tiOne | tiTwo | tiThree | tiFour
-------+------+-------+-------+---------+--------
d1 | 2011 | txtA | txtB | txtC |
d2 | 2012 | txtD | txtE | txtF | txtG
d3 | 2013 | txtH | txtI | txtJ |
d4 | 2013 | txtK | | |
(4 rows)
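To make the difference between the two forms easier to see, here is a rough Python model of what each one does with the sample data. It is only a sketch of the behaviour described above, not the tablefunc implementation; the function names crosstab_one_arg and crosstab_two_arg are made up for this illustration.

```python
from itertools import groupby

# Sample data from the question: (docid, yyyy, timark, txtmark)
rows = [
    ("d1", 2011, "tiOne", "txtA"), ("d1", 2011, "tiTwo", "txtB"),
    ("d1", 2011, "tiThree", "txtC"),
    ("d2", 2012, "tiOne", "txtD"), ("d2", 2012, "tiTwo", "txtE"),
    ("d2", 2012, "tiThree", "txtF"), ("d2", 2012, "tiFour", "txtG"),
    ("d3", 2013, "tiOne", "txtH"), ("d3", 2013, "tiTwo", "txtI"),
    ("d3", 2013, "tiThree", "txtJ"),
    ("d4", 2013, "tiOne", "txtK"),
]

def crosstab_one_arg(source, n_value_cols):
    """Model of the one-parameter form (hypothetical helper).

    source: (row_id, category, value) tuples, already sorted by row_id, category.
    Values are filled left to right; the category name itself is ignored.
    """
    out = []
    for row_id, group in groupby(source, key=lambda r: r[0]):
        values = [r[2] for r in group][:n_value_cols]   # extra values are dropped
        values += [None] * (n_value_cols - len(values))  # missing values become NULL
        out.append((row_id, *values))
    return out

def crosstab_two_arg(source, categories):
    """Model of the two-parameter form (hypothetical helper).

    source: (row_id, extra, category, value) tuples sorted by row_id.
    Each value goes to the column whose category name matches.
    """
    out = []
    for row_id, group in groupby(source, key=lambda r: r[0]):
        group = list(group)
        by_category = {cat: val for _, _, cat, val in group}
        out.append((row_id, group[0][1], *(by_category.get(c) for c in categories)))
    return out

# One-parameter form with two value columns, as in the question's query.
one_arg = crosstab_one_arg(sorted((d, c, v) for d, _, c, v in rows), 2)

# Two-parameter form with the categories listed in the wanted order.
two_arg = crosstab_two_arg(
    sorted(rows, key=lambda r: r[0]),
    ["tiOne", "tiTwo", "tiThree", "tiFour"],
)

for line in one_arg + two_arg:
    print(line)
```

The first four printed rows reproduce the confusing result from the question (d2 gets txtG and txtD because tiFour sorts before tiOne), and the last four match the output of the two-parameter query.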

Related

problems with full-text search in postgres

I have the following table and data:
/* script for people table, with field tsvector and gin */
CREATE TABLE public.people (
id INTEGER,
name VARCHAR(30),
lastname VARCHAR(30),
complete TSVECTOR
)
WITH (oids = false);
CREATE INDEX idx_complete ON public.people
USING gin (complete);
/* data for people table */
INSERT INTO public.people ("id", "name", "lastname", "complete")
VALUES
(1, 'MICHAEL', 'BRYANT BRYANT', '''bryant'':2,3 ''michael'':1'),
(2, 'HENRY STEVEN', 'BUSH TIESSEN', '''bush'':3 ''henri'':1 ''steven'':2 ''tiessen'':4'),
(3, 'WILLINGTON STEVEN', 'STEPHENS FLINN', '''flinn'':4 ''stephen'':3 ''steven'':2 ''willington'':1'),
(4, 'BRET', 'MARTINEZ AROCH', '''aroch'':3 ''bret'':1 ''martinez'':2'),
(5, 'TERENCE BERT', 'CAVALIERE ENRON', '''bert'':2 ''cavalier'':3 ''terenc'':1');
I need to retrieve the names and last names based on the tsvector field. Currently I have this query:
SELECT * FROM people WHERE complete @@ to_tsquery('WILLINGTON & FLINN');
And the result is right (the third record). BUT if I try with
SELECT * FROM people WHERE complete @@ to_tsquery('STEVEN & FLINN');
/* the same record! */
I get no results. Why? What can I do?

You should search your table with the same language that was used when the values in your field 'complete' were inserted.
Compare the results for English and German with this query:
select * ,
to_tsvector('english', concat_ws(' ', name, lastname )) as english,
to_tsvector('german', concat_ws(' ', name, lastname )) as german
from public.people
So this should work for you:
SELECT * FROM people WHERE complete @@ to_tsquery('english','STEVEN & FLINN');

You are probably using a text search configuration in which either STEVEN or FLINN is modified by stemming.
I can reproduce this here:
test=> SHOW default_text_search_config;
default_text_search_config
----------------------------
pg_catalog.german
(1 row)
test=> SELECT complete FROM public.people WHERE id = 3;
complete
-------------------------------------------------
'flinn':4 'stephen':3 'steven':2 'willington':1
(1 row)
test=> SELECT * FROM ts_debug('STEVEN & FLINN');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+--------+---------------+-------------+---------
asciiword | Word, all ASCII | STEVEN | {german_stem} | german_stem | {stev}
blank | Space symbols | | {} | |
blank | Space symbols | & | {} | |
asciiword | Word, all ASCII | FLINN | {german_stem} | german_stem | {flinn}
(4 rows)
test=> SELECT * FROM public.people
WHERE complete @@ to_tsquery('STEVEN & FLINN');
id | name | lastname | complete
----+------+----------+----------
(0 rows)
So you see, the German Snowball dictionary stems STEVEN to stev.
Since complete contains the unstemmed version steven, no match is found.
You should use the same text search configuration when you populate complete and in the query.
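As a toy illustration of why the configuration has to match, the following Python sketch replaces the real stemmers with small lookup tables. The German stems are the ones shown in the ts_debug output above; the English entries are an assumption based on the lexemes stored in complete (and on the other answer, which reports that 'english' works). The names STEMS and matches are invented for this example.

```python
# Lexemes stored in "complete" for id = 3, copied from the question.
stored = {"flinn", "stephen", "steven", "willington"}

# Stand-in for the Snowball stemmers: only the stems this thread shows or implies.
STEMS = {
    "german": {"STEVEN": "stev", "FLINN": "flinn"},     # from ts_debug above
    "english": {"STEVEN": "steven", "FLINN": "flinn"},  # assumed, see lead-in
}

def matches(config, *words):
    """Model of: complete @@ to_tsquery(config, 'w1 & w2 & ...')."""
    return all(STEMS[config][word] in stored for word in words)

print(matches("german", "STEVEN", "FLINN"))   # stev is not in the stored vector
print(matches("english", "STEVEN", "FLINN"))  # steven and flinn are both stored
```

The query side and the stored side only agree when both were processed with the same configuration.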

postgres search in column with escapes

Is there a way to do something like a grep for "site" from the following select (so that only "site=*" is returned from thedata)?
rr=# select thename,encode(thedata, 'escape') from
management_data.datas limit 2;
thename | thedata
-----------------------------------------------------------------------------------
Alexander | #
+
| #Fri Mar 15 14:58:18 PDT 2014
+
| BUs=ALL
+
| site=33$36$354$380$357$360$36$353$36$38$39$34$31$355
+
Anthony | #
+
| #Mon Jan 05 13:33:00 PST 2015
+
| mem=12000
+
| site=50$5$1$50
+
|

Given test data round-tripped successfully:
WITH somerow(name, blobofgarble) AS (
SELECT
TEXT 'Alexander',
BYTEA E'#\n#Fri Mar 15 14:58:18 PDT 2014\nBUs=ALL\nsite=33$36$354$380$357$360$36$353$36$38$39$34$31$355\n'
)
SELECT name, encode(blobofgarble, 'escape') FROM somerow;
Now, I can't possibly imagine why you'd store this information as bytea rather than in a text field, but ... well, I guess there must be a reason. I'm going to rely on the simplifying assumption that the data, when escaped, can be treated as fairly sanely formed text, since otherwise the whole concept of "lines" is garbage and your question makes no sense.
With that assumption it's possible to use regexp_split_to_table to split on newlines, getting somewhat more sanely formed data:
WITH (...)
SELECT name, garblepart FROM somerow, regexp_split_to_table(encode(blobofgarble, 'escape'), E'\n') AS garblepart;
name | garblepart
-----------+------------------------------------------------------
Alexander | #
Alexander | #Fri Mar 15 14:58:18 PDT 2014
Alexander | BUs=ALL
Alexander | site=33$36$354$380$357$360$36$353$36$38$39$34$31$355
Alexander |
(5 rows)
(this is an implicit LATERAL query, so it'll only work in PostgreSQL 9.3 and above).
Now you can use pretty ordinary operations to find the row of interest and extract the desired part, in this case with some more pattern matching:
WITH (...)
SELECT
name, substring(garblepart from '=(.*$)')
FROM somerow,
regexp_split_to_table(encode(blobofgarble, 'escape'), E'\n') AS garblepart
WHERE garblepart LIKE 'site=%';
name | substring
-----------+-------------------------------------------------
Alexander | 33$36$354$380$357$360$36$353$36$38$39$34$31$355
(1 row)
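For comparison, the same split, filter and extract steps can be sketched outside the database. This is only an illustration in Python under the same assumption as above (the bytes decode to plain ASCII text); site_values is a made-up helper name, and the sample bytes are copied from the test row.

```python
import re

# Same test value as in the SQL above, as raw bytes.
blob = (
    b"#\n#Fri Mar 15 14:58:18 PDT 2014\nBUs=ALL\n"
    b"site=33$36$354$380$357$360$36$353$36$38$39$34$31$355\n"
)

def site_values(name, data):
    """Split the blob into lines, keep 'site=' lines, return the part after '='."""
    lines = data.decode("ascii").split("\n")   # like regexp_split_to_table(..., E'\n')
    return [
        (name, re.search(r"=(.*)$", line).group(1))  # like substring(... from '=(.*$)')
        for line in lines
        if line.startswith("site=")                  # like LIKE 'site=%'
    ]

result = site_values("Alexander", blob)
print(result)
```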
Now go fix your schema so that you store your data sanely and don't have to do this.

Split a string and populate a table for all records in table in SQL Server 2008 R2

I have a table EmployeeMoves:
| EmployeeID | CityIDs
+------------------------------
| 24 | 23,21,22
| 25 | 25,12,14
| 29 | 1,2,5
| 31 | 7
| 55 | 11,34
| 60 | 7,9,21,23,30
I'm trying to figure out how to expand the comma-delimited values from the EmployeeMoves.CityIDs column to populate an EmployeeCities table, which should look like this:
| EmployeeID | CityID
+------------------------------
| 24 | 23
| 24 | 21
| 24 | 22
| 25 | 25
| 25 | 12
| 25 | 14
| ... and so on
I already have a function called SplitADelimitedList that splits a comma-delimited list of integers into a rowset. It takes the delimited list as a parameter. The SQL below will give me a table with split values under the column Value:
select value from dbo.SplitADelimitedList ('23,21,1,4');
| Value
+-----------
| 23
| 21
| 1
| 4
The question is: How do I populate EmployeeCities from EmployeeMoves with a single (even if complex) SQL statement using the comma-delimited list of CityIDs from each row in the EmployeeMoves table, but without any cursors or looping in T-SQL? I could have 100 records in the EmployeeMoves table for 100 different employees.

This is how I tried to solve this problem. It seems to work and is very quick in performance.
INSERT INTO EmployeeCities
SELECT
em.EmployeeID,
c.Value
FROM EmployeeMoves em
CROSS APPLY dbo.SplitADelimitedList(em.CityIDs) c;
UPDATE 1:
This update provides the definition of the user-defined function dbo.SplitADelimitedList. This function is used in the above query to split a comma-delimited list into a table of integer values.
CREATE FUNCTION dbo.SplitADelimitedList
(
@String NVARCHAR(MAX)
)
RETURNS @SplittedValues TABLE(
Value INT
)
AS
BEGIN
DECLARE @SplitLength INT
DECLARE @Delimiter VARCHAR(10)
SET @Delimiter = ',' --set this to the delimiter you are using
WHILE len(@String) > 0
BEGIN
SELECT @SplitLength = (CASE charindex(@Delimiter, @String)
WHEN 0 THEN
datalength(@String) / 2
ELSE
charindex(@Delimiter, @String) - 1
END)
INSERT INTO @SplittedValues
SELECT cast(substring(@String, 1, @SplitLength) AS INTEGER)
WHERE
ltrim(rtrim(isnull(substring(@String, 1, @SplitLength), ''))) <> '';
SELECT @String = (CASE ((datalength(@String) / 2) - @SplitLength)
WHEN 0 THEN
''
ELSE
right(@String, (datalength(@String) / 2) - @SplitLength - 1)
END)
END
RETURN
END

Preface
This is not the right way to do it. You shouldn't create comma-delimited lists in SQL Server. This violates first normal form, which should sound like an unbelievably vile expletive to you.
It is trivial for a client-side application to select rows of employees and related cities and display this as a comma-separated list. It shouldn't be done in the database. Please do everything you can to avoid this kind of construction in the future. If at all possible, you should refactor your database.
The Right Answer
To get the list of cities, properly expanded, from a table containing lists of cities, you can do this:
INSERT dbo.EmployeeCities
SELECT
M.EmployeeID,
C.Value
FROM
EmployeeMoves M
CROSS APPLY dbo.SplitADelimitedList(M.CityIDs) C
;
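If it helps to see what CROSS APPLY does here, this is a small Python model: the split function is called once per EmployeeMoves row, and every value it returns is paired with that row's EmployeeID. It is a sketch only; split_delimited_list is a stand-in for dbo.SplitADelimitedList, and the data is the sample from the question.

```python
# Sample rows from EmployeeMoves: (EmployeeID, CityIDs)
employee_moves = [
    (24, "23,21,22"),
    (25, "25,12,14"),
    (29, "1,2,5"),
    (31, "7"),
    (55, "11,34"),
    (60, "7,9,21,23,30"),
]

def split_delimited_list(text, delimiter=","):
    """Stand-in for dbo.SplitADelimitedList: split into integers, skipping blanks."""
    return [int(part) for part in text.split(delimiter) if part.strip()]

# CROSS APPLY: for each outer row, emit one output row per value the function returns.
employee_cities = [
    (employee_id, city_id)
    for employee_id, city_ids in employee_moves
    for city_id in split_delimited_list(city_ids)
]

for pair in employee_cities:
    print(pair)
```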
The Wrong Answer
I wrote this answer due to a misunderstanding of what you wanted: I thought you were trying to query against properly-stored data to produce a list of comma-separated CityIDs. But I realize now you wanted the reverse: to query the list of cities using existing comma-separated values already stored in a column.
WITH EmployeeData AS (
SELECT
M.EmployeeID,
M.CityID
FROM
dbo.SplitADelimitedList ('23,21,1,4') C
INNER JOIN dbo.EmployeeMoves M
ON Convert(int, C.Value) = M.CityID
)
SELECT
E.EmployeeID,
CityIDs = Substring((
SELECT ',' + Convert(varchar(max), CityID)
FROM EmployeeData C
WHERE E.EmployeeID = C.EmployeeID
FOR XML PATH (''), TYPE
).value('.[1]', 'varchar(max)'), 2, 2147483647)
FROM
(SELECT DISTINCT EmployeeID FROM EmployeeData) E
;
Part of my difficulty in understanding is that your question is a bit disorganized. Next time, please clearly label your example data and show what you have, and what you're trying to work toward. Since you put the data for EmployeeCities last, it looked like it was what you were trying to achieve. It's not a good use of people's time when questions are not laid out well.

Selection formula excluding rows with columns having null values

I have a strange issue with a Crystal Reports (CR) report. In the Selection Formula I test two fields. The test is as simple as this: {field_City} = 'Paris' OR {field_Country} = 'France'.
This is a sample of the data in my table:
|---------------|---------------|---------------|
| ID_Record | Country | City |
|---------------|---------------|---------------|
| 1 | null | Paris |
|---------------|---------------|---------------|
| 2 | France | null |
|---------------|---------------|---------------|
| 3 | France | Paris |
|---------------|---------------|---------------|
The selection should return all 3 records, but it excludes the first 2 rows, which have a null value in one of the columns.
I then changed the Selection Formula to take null values into account: ({field_City} = 'Paris' AND (isnull({field_Country}) OR not(isnull({field_Country})))) OR ({field_Country} = 'France' AND (isnull({field_City}) OR not(isnull({field_City})))) but I am still getting only the last record!
To make sure my logic is correct, I generated the SQL query via the CR option 'Show sql query', added a WHERE clause with the same condition that I put in the Selection Formula, and it gave me all 3 records!
Unfortunately I can't work with the SQL query; I have to find out why the formula excludes the records that have a null value in one of the columns. I hope that you can help me. Thanks a lot!
This is the solution: ((isnull({field_Country}) AND {field_City} = 'Paris') OR (isnull({field_City}) AND {field_Country} = 'France') OR (not(isnull({field_Country})) AND {field_City} = 'Paris') OR (not(isnull({field_City})) AND {field_Country} = 'France')). Thank you so much, Craig!

You need to test for null values first:
( Not(Isnull({field_Country})) AND {field_Country}='France' )
OR
( Isnull({field_Country}) AND {field_City}='Paris' )

How do you exclude a column from showing up if there is no value?

Question about a query I'm trying to write in SQL Server Management Studio 2008. I am pulling 2 rows. The first row being the header information, the second row being the information for a certain Line Item. Keep in mind, the actual header information reads as "Column 0, 1, 2, 3, 4,.... etc."
The data looks something like this:
ROW 1: Model # | Item Description| XS | S | M | L | XL|
ROW 2: 3241 | Gray Sweatshirt| | 20 | 20 | 30 | |
Basically this shows that there are 20 smalls, 20 mediums, and 30 larges of this particular item. There are no XS's or XL's.
I want to create a subquery that puts this information in one row, but at the same time excludes the sizes with a blank quantity, such as the XS and XL sizes.
I want it to look like this when all is said and done:
ROW 1: MODEL #| 3241 | ITEM DESCRIPTION | Gray Sweatshirt | S | 20 | M | 20 | L | 30 |
Notice there are no XS or XL's included. How do I make it so those columns do not appear?

Since you did not post your query or your table structure, I am guessing that the table has the columns Id, Description and Size. If so, you could do something like this, replacing the table and column names with your own:
DECLARE @columns varchar(8000)
-- build the column list from the sizes that actually occur in the data
SELECT @columns = COALESCE (@columns + ',[' + cast(Size as varchar(50)) + ']', '[' + cast(Size as varchar(50)) + ']' )
FROM (SELECT DISTINCT Size FROM YourTableName WHERE Size IS NOT NULL) AS Sizes
DECLARE @query varchar(8000) = 'SELECT Id, Description, '
+ @columns +'
FROM
(SELECT Id, Description, Size
FROM YourTableName) AS Source
PIVOT
(
COUNT(Size)
FOR Size IN ('+ @columns +')
) AS Pvt'
EXEC(@query)
Anyhow, I also agree with @MichaelFredickson. I have implemented this pivot solution, but it is definitely better to let the presentation layer take care of this after simply pulling the raw data from SQL. Otherwise you would be processing the data twice: once in SQL to build the table, and again in the presentation layer when your C#/VB/other code reads and displays the values.
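To give an idea of how little work this is on the presentation side, here is a minimal sketch in Python (standing in for whatever client language you use). It assumes the two rows arrive as plain lists with blank quantities as None; the variable names are made up for the example.

```python
# The two rows from the question: header row and one line item.
header = ["Model #", "Item Description", "XS", "S", "M", "L", "XL"]
row = [3241, "Gray Sweatshirt", None, 20, 20, 30, None]

# Interleave header and value, skipping columns whose value is blank.
flat = [
    cell
    for name, value in zip(header, row)
    if value is not None
    for cell in (name, value)
]

print(" | ".join(str(cell) for cell in flat))
```

The XS and XL columns disappear because their values are blank; no dynamic SQL is needed.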