PostgreSQL - converting text to ts_vector

Sorry for the basic question.
I have a table with the following columns.
Column | Type | Modifiers
--------+---------+-----------
id | integer |
doc_id | bigint |
text | text |
I am trying to do text matching on the 'text' column (the 3rd column).
I receive an error message when I try to match on it, saying that the string is too long for ts_vector.
I only want observations which contain the words "other events":
SELECT * FROM eightks
WHERE to_tsvector(text) @@ to_tsquery('other_events')
I know that there is a limit on the length of a ts_vector.
Error Message
ERROR: string is too long for tsvector (2368732 bytes, max 1048575 bytes)
How do I convert the text column into a ts_vector, and will this resolve my size limit problem? Alternatively, how do I exclude observations over the maximum size?
Postgres version 9.3.5.0
Here is the reference to the limit
Thanks

Related

How to convert postgres "double precision" to "numeric" without data loss/truncation

We have an existing column (type double precision) in our Postgres table and we want to convert the data type of that column to numeric. We've tried the approaches below, but all of them caused truncation/data loss in the last decimal positions:
directly converting to numeric
converting to numeric with precision and scale
converting to text and then to numeric
converting to text only
The data loss I mentioned looks like this: if we have the value 23.291400909423828, then after altering the column data type it becomes 23.2914009094238, losing the last 2 decimal places.
Note: this happens only if the value has more than 13 decimals (digits to the right of the decimal point).
One way to possibly do this:
show extra_float_digits ;
extra_float_digits
--------------------
3
create table float_numeric(number_fld float8);
insert into float_numeric values (21.291400909423828), (23.291400909422436);
select * from float_numeric ;
number_fld
--------------------
21.291400909423828
23.291400909422435
alter table float_numeric alter COLUMN number_fld type numeric using number_fld::text::numeric;
\d float_numeric
Table "public.float_numeric"
Column | Type | Collation | Nullable | Default
------------+---------+-----------+----------+---------
number_fld | numeric | | |
select * from float_numeric ;
number_fld
--------------------
21.291400909423828
23.291400909422435
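The effect can be reproduced outside the database. The sketch below is Python, not PostgreSQL, and it assumes the direct cast formats the double with only 15 significant digits, which is what the truncated value in the question suggests. It shows that 15 digits do not identify the double uniquely, while the full text form parses back to the same value, which is the idea behind going through ::text.

```python
from decimal import Decimal

x = 23.291400909423828  # value from the question, held as a 64-bit float

# 15 significant digits (assumed to mirror the lossy direct cast)
short = format(x, ".15g")

# repr() gives the shortest text that parses back to the same double
full = repr(x)

print(short)              # 23.2914009094238
print(float(short) == x)  # False: the trailing digits were lost
print(float(full) == x)   # True: the text form round-trips

# Decimal(full) keeps every digit of the text form, analogous to ::text::numeric
print(Decimal(full))
```

The session above relies on extra_float_digits being 3, so the text form of the float8 already carries all the digits before it is cast to numeric.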

Postgresql - Get MAX Numeric Value on Character Varying Column

I have a column in a Postgresql table that is unique and character varying(10) type. The table contains old alpha-numeric values that I need to keep. Every time a new row is created from this point forward, I want it to be numeric only. I would like to get the max numeric-only value from this table for this column then create a new row with that max value incremented by 1.
Is there a way to query this table for the max numeric value only for this column?
For example, if this column currently has the values:
1111
A1111A
1234
1234A
3331
B3332
C-3333
33-D33
3**333*
Is there a query that will return 3333, AKA cut out all the non-numeric characters from the values and then perform a MAX() on them?
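This is not precisely what you are asking, but I think it will work better for you.
To go over all the rows, strip the non-digit characters from each value, cast the result to a number, and return the max (NULLIF skips values that contain no digits at all, and bigint leaves room for 10-digit values):
SELECT MAX(NULLIF(regexp_replace(my_column, '[^0-9]', '', 'g'), '')::bigint) FROM public.foobar;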
This gets you your max value... say 2999.
Now, going forward, consider making the default for your column a serial-like value, and convert it to text... that way you set the "MAX" once, and then let postgres do all the work for future values.
-- create simple integer sequence
CREATE SEQUENCE public.foobar_my_column_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
-- use new sequence as default value for column __and__ convert to text
ALTER TABLE foobar
ALTER COLUMN my_column
SET DEFAULT nextval('public.foobar_my_column_seq'::regclass)::text;
-- initialize "next value" of sequence to whatever is larger than
-- what you already have in your data ... say 3000:
ALTER SEQUENCE public.foobar_my_column_seq RESTART WITH 3000;
Because you're simply setting a default, your current alpha-numeric values are not changed.
I figured it out. The following query works (the 'g' flag makes it replace every run of non-digits, not just the first one):
select text_value, regexp_replace(text_value, '[^0-9]+', '', 'g') as new_value from the_table;
Result:
text_value | new_value
-----------------------+-------------
4*215474 | 4215474
740024 | 740024
4*100535 | 4100535
42356 | 42356
CASH |
4*215474 | 4215474
740025 | 740025
740026 | 740026
4*5089655798 | 45089655798
4*15680 | 415680
4*224034 | 4224034
4*265718708 | 4265718708
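The strip-then-max logic is easy to check outside the database. This is a minimal Python sketch, not PostgreSQL: it uses the example values from the question plus 'CASH' from the result table above, and the helper names are made up for illustration. It also shows why the 'g' flag matters, since PostgreSQL's regexp_replace replaces only the first match unless 'g' is given (mimicked here with count=1).

```python
import re

# Example values from the question, plus one value with no digits at all
values = ["1111", "A1111A", "1234", "1234A", "3331",
          "B3332", "C-3333", "33-D33", "3**333*", "CASH"]

def digits_only(s):
    """Remove every non-digit character (like regexp_replace with 'g')."""
    return re.sub(r"[^0-9]", "", s)

def first_match_only(s):
    """Remove only the first run of non-digits (like regexp_replace without 'g')."""
    return re.sub(r"[^0-9]+", "", s, count=1)

# Skip values that end up empty, as they cannot be cast to a number
numbers = [int(d) for d in map(digits_only, values) if d]

print(max(numbers))                 # 3333
print(first_match_only("3**333*"))  # 3333*
```

A value such as 'CASH' becomes an empty string after stripping, so it has to be filtered out (or turned into NULL in SQL) before the cast.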

PostgreSQL text field loads as strL

I have data stored in PostgreSQL with the data type text.
When I load this data into Stata it has type strL, even if every string in a column is only one character long. This takes up too much memory. I would like to continue using the text type in PostgreSQL.
Is there a way to specify that text data from PostgreSQL is loaded into Stata with type str8?
I also want numeric data to be loaded as numeric values, so the allstring option is not a good solution. I would also like to avoid specifying the data type on a column-by-column basis.
The command I use to load data into Stata is this:
odbc load, exec("SELECT * FROM mytable") <connect_options>
The file profile.do contains the following:
set odbcmgr unixodbc, permanently
set odbcdriver ansi, permanently
The file odbc.ini contains the following:
[database_name]
Debug = 0
CommLog = 0
ReadOnly = no
Driver = /usr/local/lib/psqlodbcw.so
Servername = <server>
FetchBufferSize = 99
Port = 5432
Database = postgres
In PostgreSQL mytable looks like this:
postgres=# \d+ mytable
Table "public.mytable"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+----------+--------------+-------------
c1 | text | | extended | |
c2 | text | | extended | |
postgres=# select * from mytable;
c1 | c2
----+-------
a | one
b | two
c | three
(3 rows)
In Stata mytable looks like this:
. describe
Contains data
obs: 3
vars: 2
size: 500
---------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------
c1 strL %9s
c2 strL %9s
---------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
I am using PostgreSQL v9.6.5 and Stata v14.2.
You can do this in Stata by compressing your data after you load the variables:
clear
input strL string
"My name is Pearly Spencer"
"I am a contributor on Stack Overflow"
"This is an example variable"
end
describe
Contains data
obs: 3
vars: 1
size: 355
------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------------------------------
string strL %9s
------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
compress, nocoalesce
describe
Contains data
obs: 3
vars: 1
size: 108
------------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------------------------------
string str36 %36s
------------------------------------------------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
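The compress command picks the shortest storage type that can hold each variable, which is why strL becomes str36 here. The option nocoalesce tells it not to search for duplicate values within strL variables, which can be slow on large datasets.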

How to convert tsvector?

A typical and relevant application of tsvector is to query and summarize information about the set of words that occur and their frequency. JSONB is the natural choice to represent the tsvector datatype for these "querying applications". So:
Is there a simple workaround to cast tsvector into JSONB?
Example: counting the global frequency of words across cached tsvectors would be something like this query:
SELECT r.key as word, SUM(r.value) as occurrences
FROM (
SELECT jsonb_each(kx_tsvectot::jsonb) as r FROM terms
) t
GROUP BY 1;
You can use the ts_stat() function, which gives you exactly what you need. It returns these columns:
word text — the value of a lexeme
ndoc integer — number of documents (tsvectors) the word occurred in
nentry integer — total number of occurrences of the word
Example may be the following:
CREATE TABLE t (
tsv TSVECTOR
);
INSERT INTO t VALUES
('word'::TSVECTOR),
('second word'::TSVECTOR),
('third word'::TSVECTOR);
SELECT * FROM
ts_stat('SELECT tsv FROM t');
Result:
word | ndoc | nentry
--------+------+--------
word | 3 | 3
third | 1 | 1
second | 1 | 1
(3 rows)
If you still want jsonb, you can convert the word column from text to jsonb, for example with to_jsonb(word).
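To make the meaning of ndoc and nentry concrete, here is a rough Python sketch of the same aggregation over the three example documents. It is only an illustration of the counting, not how ts_stat is implemented, and it treats each lexeme as occurring once per document since the example tsvectors carry no position lists.

```python
import json
from collections import Counter

# The three example tsvectors, written as plain lists of lexemes
docs = [["word"], ["second", "word"], ["third", "word"]]

ndoc = Counter()    # number of documents each lexeme appears in
nentry = Counter()  # total number of occurrences of each lexeme

for lexemes in docs:
    nentry.update(lexemes)     # count every occurrence
    ndoc.update(set(lexemes))  # count each lexeme once per document

stats = {w: {"ndoc": ndoc[w], "nentry": nentry[w]} for w in nentry}

# A JSON object keyed by word, similar in spirit to the JSONB the question asks for
print(json.dumps(stats, sort_keys=True))
```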

Convert postgresql column from character varying to integer

I'm trying to change a column type from "character varying(15)" to an integer.
If I run "=#SELECT columnX from tableY limit(10);" I get back:
columnX
----------
34.00
12.00
7.75
18.50
4.00
11.25
18.00
16.50
If I run "=#\d+ columnX" I get back:
Column | Type | Modifiers | Storage | Description
columnX | character varying(15) | not null | extended |
I've searched high and low and asked on the PostgreSQL IRC channel, but no one could figure out how to change it. I've tried:
ALTER TABLE race_horserecord ALTER COLUMN win_odds TYPE integer USING (win_odds::integer);
Also:
ALTER TABLE tableY ALTER COLUMN columnX TYPE integer USING (trim("columnX")::integer);
Every time I get back:
"ERROR: invalid input syntax for integer: "34.00""
Any help would be appreciated.
Try USING (win_odds::numeric::integer).
Note that it will round your fractional values (e.g., '7.75'::numeric::integer = 8).
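The two-step cast can be mimicked in Python to see what happens to each sample value. This is a sketch under the assumption that the numeric-to-integer cast rounds to the nearest integer with ties going away from zero, modelled here with ROUND_HALF_UP; the function name is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sample column values from the question
odds = ["34.00", "12.00", "7.75", "18.50", "4.00", "11.25", "18.00", "16.50"]

def to_int_via_numeric(s):
    """Parse as an exact decimal first, then round to an integer."""
    return int(Decimal(s.strip()).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Parsing the text directly as an integer fails, like the ::integer cast does
try:
    int("34.00")
    direct_ok = True
except ValueError:
    direct_ok = False

converted = [to_int_via_numeric(s) for s in odds]

print(direct_ok)  # False
print(converted)  # [34, 12, 8, 19, 4, 11, 18, 17]
```

If the fractional part matters, as it probably does for odds, changing the column to numeric instead of integer avoids the rounding altogether.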