Only getting 1 result from a Postgres tsvector query

I am using PostgreSQL 9.3. I have built a dataset with a tsvector field called vector.
Then I execute a query against it
SELECT id, vector, relative_path, title
FROM site_server.indexed_url, plainto_tsquery('english','booking') query
WHERE vector @@ query;
Only 1 row is returned. When I look at the data there are at least 6 rows that would match. How do I get it to retrieve all matching records?
Data file

Values in the vector column of your data sample are not normalized, and loading them with COPY does not normalize them either. As the docs put it:
It is important to understand that the tsvector type itself does not
perform any word normalization; it assumes the words it is given are
normalized appropriately for the application.
If you run:
SELECT id, vector, relative_path, title
FROM site_server.indexed_url
WHERE to_tsvector('english', vector::text) @@ plainto_tsquery('english', 'booking');
it should produce the expected result.
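A minimal self-contained check of the point above, using literal values instead of the site_server.indexed_url table: a tsvector literal keeps the word 'booking' unstemmed, while plainto_tsquery with the english configuration reduces the search word to 'book', so the two cannot match until the vector is normalized with to_tsvector.

```sql
-- A tsvector literal is stored exactly as written: the lexeme stays 'booking'.
SELECT 'booking'::tsvector;

-- plainto_tsquery runs the english stemmer, so the query lexeme becomes 'book'.
SELECT plainto_tsquery('english', 'booking');

-- The unnormalized vector therefore does not match ...
SELECT 'booking'::tsvector @@ plainto_tsquery('english', 'booking');

-- ... while a vector built with to_tsvector (same configuration) does.
SELECT to_tsvector('english', 'booking') @@ plainto_tsquery('english', 'booking');
```

Rather than normalizing on every query, a longer-term option is to populate the column with to_tsvector('english', <source text>) when the data is loaded, where <source text> stands for whatever original text the vectors were built from (not shown in the question).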

Related

SELECT query returns more records than exists

Background
I have a table with raster data (grib_data) created by using raster2pgsql.
I have created a second table (turb_mod) with a subset of the points in grib_data that has a value above a certain threshold.
This subset table (turb_mod) has been created with the following query
WITH turb AS (SELECT rid, rast, (ST_PixelAsPoints(rast)).val AS val
FROM grib_data
)
SELECT rid, rast INTO turb_mod
FROM turb WHERE val > 0.5;
The response when creating the table is "SELECT 53", indicating that turb_mod now holds 53 rows.
Problem
If I now try to return the raster data from turb_mod using the query below, it returns all records from the original table, not the 53 that I am expecting:
SELECT (ST_PixelAsPoints(rast)).x AS x FROM turb_mod;
Questions
Why does my query not return only the 53 records?
Is there a better way to create a table with a selection of raster points from the original table? I want to use the subset to apply further geospatial functions like spatial clustering.
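In your final SELECT, you're calling ST_PixelAsPoints, which is a set-returning function. A set-returning function in the select list generates an output row for each element of the function's result set, so the row count can differ from that of your source table, turb_mod. The count grows further because of how turb_mod was built: the CTE carries rid and the whole rast along for every pixel above 0.5, so each of the 53 rows is a complete copy of a raster tile rather than a single point, and ST_PixelAsPoints then expands every (non-NODATA) pixel of each copy.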
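To see the row multiplication without needing PostGIS, here is a small sketch that uses the built-in set-returning function generate_series as a stand-in for ST_PixelAsPoints; the temp table tiles is hypothetical and plays the role of turb_mod.

```sql
-- Hypothetical two-row table standing in for turb_mod.
CREATE TEMP TABLE tiles (rid int);
INSERT INTO tiles (rid) VALUES (1), (2);

-- A set-returning function in the SELECT list emits one output row per element
-- it returns: 2 input rows x 3 elements = 6 output rows.
SELECT rid, generate_series(1, 3) AS x FROM tiles;

-- The same 6 rows, written with an explicit LATERAL join.
SELECT t.rid, s.x
FROM tiles AS t
CROSS JOIN LATERAL generate_series(1, 3) AS s(x);
```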
Your query is functionally equivalent to this (preferred) syntax:
SELECT points.x
FROM
turb_mod
JOIN LATERAL ST_PixelAsPoints(rast) points ON TRUE;
This syntax better shows what's happening, and also shows how you might choose to include more columns from the function's output, which may help to answer your second point.
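For the second question, one possible approach (a sketch, not tested against the original grib_data) is to filter on the pixel values inside the LATERAL join and store one row per qualifying pixel, keeping the point geometry that ST_PixelAsPoints returns alongside val, x and y. The table name turb_points is hypothetical, chosen so it does not clash with the existing turb_mod; band 1 is used because that is the function's default.

```sql
-- One row per pixel above the threshold, instead of one copy of the whole
-- raster tile per qualifying pixel.
CREATE TABLE turb_points AS
SELECT g.rid, p.x, p.y, p.val, p.geom
FROM grib_data AS g
CROSS JOIN LATERAL ST_PixelAsPoints(g.rast) AS p
WHERE p.val > 0.5;

-- A spatial index helps later geometry operations such as clustering.
CREATE INDEX ON turb_points USING gist (geom);
```

Because each row now holds a point geometry rather than a raster, geometry functions can be applied to turb_points directly.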

Postgres get all fields that are not certain values (including nulls)

I'm looking to filter a table, but the number of rows I'm expecting differs from the result.
SELECT *
FROM table
WHERE name NOT IN ('matrix', 'filters')
The name column contains string values and NULLs. It seems the NULLs are being filtered out, but I would like them included in the result.
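This is standard SQL three-valued logic: for a NULL name, NULL NOT IN ('matrix', 'filters') evaluates to NULL rather than true, and WHERE keeps only rows whose condition is true. A minimal sketch with a hypothetical temp table items (the question's table is a placeholder name) shows the difference and one way to keep the NULL rows.

```sql
-- Hypothetical demo table with one NULL name.
CREATE TEMP TABLE items (name text);
INSERT INTO items (name) VALUES ('matrix'), ('filters'), ('other'), (NULL);

-- NULL NOT IN (...) yields NULL, so only 'other' is returned.
SELECT * FROM items WHERE name NOT IN ('matrix', 'filters');

-- Explicitly keep NULLs: returns 'other' and the NULL row.
SELECT * FROM items WHERE name IS NULL OR name NOT IN ('matrix', 'filters');
```

An equivalent NULL-safe form is name IS DISTINCT FROM 'matrix' AND name IS DISTINCT FROM 'filters', which treats NULL as simply different from both values.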

Sum a field in a jsonb column with an Ecto query

Say I have a jsonb type column in a Postgres DB, called info. One of the fields is bytes, which is stored as an integer in the info field.
If I try to sum the values of the info => bytes field in an Ecto query, as below:
total_bytes = Repo.one(from job in FilesTable,
select: sum(fragment("info->>'bytes'")))
I get the error function sum(text) does not exist.
Is there a way to write the query above so that info => bytes can be summed, or would I have to just select that field from each row in the database, and then use Elixir to add up the values?
The error message says that it can't sum a text field. You need to explicitly cast the field to an integer so that sum works.
Also, it's incorrect to hardcode a column name in a fragment. It only works in this case because you're selecting from a single table; if you joined other tables that have a column with the same name, the reference would be ambiguous and the query would fail. Instead, use ? in the string and pass the column as an argument.
Here's the final version, which should work:
total_bytes = Repo.one(from job in FilesTable,
select: sum(fragment("(?->>'bytes')::integer", job.info)))
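To check the cast outside Ecto, here is roughly the plain SQL the fragment amounts to, run against a hypothetical temp table files_demo (the real table behind FilesTable is not named in the question). It casts to bigint instead of integer, on the assumption that byte counts may exceed the 32-bit integer maximum of about 2.1 billion; with ::integer such a value would raise an out-of-range error.

```sql
-- Hypothetical table standing in for the one behind FilesTable.
CREATE TEMP TABLE files_demo (info jsonb);
INSERT INTO files_demo (info) VALUES
  ('{"bytes": 100}'),
  ('{"bytes": 3000000000}'),  -- larger than the int4 maximum
  ('{}');                     -- no bytes key: ->> yields NULL, which sum() skips

-- info->>'bytes' is text, so sum() needs an explicit numeric cast.
SELECT sum((info->>'bytes')::bigint) AS total_bytes FROM files_demo;
```

One trade-off: PostgreSQL's sum(bigint) returns numeric, which Postgrex decodes as a Decimal, whereas sum over ::integer returns bigint and comes back as an Elixir integer.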

Full column metadata for views?

If you query dbc.columns for a table, you get full metadata for every column: data type, length, nullable/non-nullable, etc. When querying views, you only get the database, table, and column names; all the other fields are null.
If I have a view that's doing nothing but select * from table, it seems that the underlying table's metadata would propagate to the view when it's compiled to database objects. This makes sense even for calculated columns since my experiments have shown that Teradata analyzes all possible logic paths to determine a calculated column's type. Here's an example:
replace view mydb.testview as
select case when 1 = 1 then 'a' else 'aaaa' end a;
create table mydb.testviewtotable as (select * from mydb.testview) with data;
show table mydb.testviewtotable;
In that case statement, only the first condition will ever return true, so the result will always be 'a'. However, when you look at the table DDL, you can see that it calculates the column as VARCHAR(4) which proves that it analyzes all cases:
a VARCHAR(4) CHARACTER SET UNICODE NOT CASESPECIFIC
Therefore, it seems reasonable to assume that this view metadata exists somewhere even though querying that view through DBC results in nulls for all but the aforementioned columns.

Similarity in tsv column

I need some help getting the SQL to work here in PostgreSQL 9.5.1 using pgAdmin III. I have a column status (data type text) of Facebook statuses in the format they were typed, and another column status_tsv which stores a tsvector of the status column with stop words removed and the words stemmed.
I'd like to find similar statuses by comparing the similarity of the tsvector column in a self-join.
So far I have tried using regexp_replace combined with the pg_trgm similarity search to keep only the a-zA-Z character set in the tsvector column, but this didn't work because regexp_replace can't take tsvector columns, so I changed the data type of the tsv column to text.
The problem now is that it only compares the similarity of the first word in each row and ignores the rest, which is obviously no use; I need it to compare the whole row.
My SQL just now looks like
`SELECT * FROM status_table AS x
JOIN status_table AS y
ON ST_DWithin(x.geom54032, y.geom54032, 5000)
WHERE status_similarity (x.tsvector_status, y.tsvector_status) > 0.7
AND x.status_id != y.status_id;`
The status_similarity function does this: `(regexp_replace(x.tsvector_status, '[^a-zA-Z]', '', 'g'), regexp_replace(y.tsvector_status, '[^a-zA-Z]', '', 'g'))`, which I'm sure keeps only the a-zA-Z characters from the tsvector_status column.
What must I change to get this to return similar statuses?
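One problem worth checking: the pattern '[^a-zA-Z]' also removes the spaces between lexemes, so each row collapses into one long "word" before pg_trgm compares it. A minimal sketch, assuming pg_trgm is installed and tsvector_status is now text holding a tsvector's text form (for example 'book':1 'flight':2), keeps the space in the character class. The function body below is a hypothetical reconstruction of status_similarity, not the asker's actual code.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical status_similarity for text columns holding tsvector text.
-- '[^a-zA-Z ]' strips quotes, colons and positions but keeps the spaces,
-- so the lexemes stay separate words for pg_trgm.
CREATE OR REPLACE FUNCTION status_similarity(a text, b text)
RETURNS real
LANGUAGE sql IMMUTABLE AS $$
  SELECT similarity(regexp_replace(a, '[^a-zA-Z ]', '', 'g'),
                    regexp_replace(b, '[^a-zA-Z ]', '', 'g'));
$$;

-- What the cleaned-up text looks like for one status: book flight
SELECT regexp_replace(to_tsvector('english', 'Booking flights')::text,
                      '[^a-zA-Z ]', '', 'g');
```

In the self-join, the existing condition status_similarity(x.tsvector_status, y.tsvector_status) > 0.7 can then stay as written. If setweight was used when building the vectors, weight letters such as A would survive the pattern, so running strip() on the tsvector before casting it to text would be safer in that case.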