to_tsvector function with strings ending with numbers - postgresql

I am trying to vectorize the value of a text column (it always ends with a number) using the to_tsvector function, so that I can use the vector for full-text search.
Sample data:
this-is-news-1
The to_tsvector function produces the following values:
'1':5 'is':3 'news':4 'this':2 'this-is-news':1
But I want the -1 to stay with the full string.
The expected result is:
'1':5 'is':3 'news':4 'this':2 'this-is-news-1':1
How can I get this result?

Related

Oracle to_char numeric masking to postgres

I'm porting a procedure from Oracle to Postgres.
In the SELECT list of a query, I have TO_CHAR(v_numeric, '990.000').
It seems that the same TO_CHAR(v_numeric, '990.000') works in Postgres with the same result.
Can someone please explain what the '990.000' in the query does?
TO_CHAR(123.4, '990.000') returns 123.400 in both Oracle and Postgres, whereas TO_CHAR(1234.400, '990.000') returns ######## in Oracle and ###.### in Postgres. Do ######## and ###.### still hold the numeric value that was passed in?
to_char is a function to format a number as string for output. The PostgreSQL function is there expressly for Oracle compatibility, but it is not totally compatible, as you see.
The format 990.000 means that there will be one to three digits before the decimal point and three digits after it. A 9 marks a digit position in which a leading zero is rendered as a blank, while a 0 marks a position that always prints a digit.
The # characters signify that the number cannot be represented in that format. The reason is that there are more than three digits before the decimal point.
The resulting string does not "hold" a number, it is the rendering of a number as a string. It doesn't hold anything but the characters it consists of.
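The behavior described above can be sketched in Python. This is a toy model of the '990.000' mask only, not the implementation used by either database; the function name format_990_000 and the fixed width of 8 (seven mask characters plus one position for the sign) are assumptions chosen to match the Oracle output quoted in the question.

```python
def format_990_000(value: float) -> str:
    """Toy model of the numeric mask '990.000' (not a database implementation)."""
    width = 8  # assumption: 7 mask characters plus one position for the sign
    digits = f"{abs(value):.3f}"         # always three decimals, at least one integer digit
    integer_part = digits.split(".")[0]
    if len(integer_part) > 3:            # more integer digits than the mask allows
        return "#" * width               # the value cannot be shown in this format
    sign = "-" if value < 0 else ""
    return (sign + digits).rjust(width)  # unused leading positions become blanks


print(repr(format_990_000(123.4)))   # ' 123.400'
print(repr(format_990_000(0.5)))     # '   0.500'
print(repr(format_990_000(1234.4)))  # '########'
```

The overflow string is built from # characters alone, which illustrates why nothing of the original number can be recovered from it.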

PostgreSQL Rolling Standard Deviation over time in single query

This may be an easily solvable question but I can't see an immediate solution. I am calling a PostgreSQL function which returns multiple columns, 2 of which are relevant to this question - a date column & a numeric field of return values. An example of the function call would be
SELECT curr_date, return_val
FROM schema.function_name($1,$2);
With example output such as
"2014-07-31";0.003767
"2014-08-07";-0.028531
"2014-08-14";0.020051
"2014-08-21";-0.003541
"2014-08-28";0.007766
"2014-09-04";-0.021926
"2014-09-11";0.026330
"2014-09-18";0.008137
"2014-09-25";-0.033303
"2014-10-02";0.030100
"2014-10-09";-0.012116
"2014-10-16";-0.017148
And so on. The function always returns the data with the dates in ascending order. What I would like to do is apply Postgres's stddev_samp function on every row, but considering only the return_val values from that row's date back in time. Something like:
SELECT curr_date, return_val,
--stddev_samp(return_val) where curr_date <= curr_date of current row
FROM schema.function_name($1,$2);
Naturally, the sample standard deviation of the return_val values from 2014-07-31 to 2014-10-02 in the sample provided differs slightly from the one calculated over 2014-07-31 to any other date present. I know I could probably write another function that takes a numeric array as input and returns the standard deviation, and then call it in my query above, but I'm hoping someone has a simpler approach that I'm just not seeing. If any other information is required, feel free to ask. I'm using version 10.7.
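Using a window function:
-- With ORDER BY, the default frame runs from the first row through the current row.
SELECT
curr_date,
return_val,
stddev_samp(return_val) OVER (ORDER BY curr_date) AS running_stddev
FROM
mytable;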
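To make clear what the window computes, here is a minimal Python sketch of the same idea: for each row, the sample standard deviation of all values up to and including that row. The helper name running_stddev is hypothetical, and only the first five rows of the sample output are used. Like stddev_samp, it yields no value (None here, NULL in SQL) for the first row, because a sample standard deviation needs at least two points.

```python
import statistics

# First rows of the sample output from the question (date, return_val).
rows = [
    ("2014-07-31", 0.003767),
    ("2014-08-07", -0.028531),
    ("2014-08-14", 0.020051),
    ("2014-08-21", -0.003541),
    ("2014-08-28", 0.007766),
]


def running_stddev(values):
    """Sample standard deviation over values[0..i] for every index i."""
    result = []
    for end in range(1, len(values) + 1):
        window = values[:end]
        # A single value has no sample standard deviation.
        result.append(statistics.stdev(window) if len(window) >= 2 else None)
    return result


stddevs = running_stddev([value for _, value in rows])
for (date, value), sd in zip(rows, stddevs):
    print(date, value, sd)
```

This recomputes each window from scratch for readability; the database does the equivalent work inside the window function.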

datenum and matrix column string conversion

I want to convert the second column of table T using datenum.
The elements of this column are '09:30:31.848', '15:35:31.325', etc. Calling datenum('09:30:31.848','HH:MM:SS.FFF') on a single value works, but applying datenum to the whole column does not. I tried datenum(T(:,2),'HH:MM:SS.FFF') and received this error message:
"The input to DATENUM was not an array of character vectors"
Here is a snapshot of T
Thank you
You are not passing the data from the table, but rather a slice of the table (so it stays a table). Refer to the data in the table using T.colName:
times_string = ['09:30:31.848'; '15:35:31.325'];
T = table(times_string)
times_num = datenum(T.times_string, 'HH:MM:SS.FFF')
Alternatively, you can slice the table using curly braces to extract the data (if you want to use the column number instead of name):
times_num = datenum(T{:,2}, 'HH:MM:SS.FFF')
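The same distinction (pass the column's data, not the container that holds it) can be sketched in Python. This is only an analogy under stated assumptions: the dictionary named table stands in for the MATLAB table, the helper time_to_day_fraction is hypothetical, and it returns just the fraction of a day, whereas datenum returns a full serial date number.

```python
from datetime import datetime


def time_to_day_fraction(text: str) -> float:
    """Convert 'HH:MM:SS.FFF' into the elapsed fraction of a day."""
    t = datetime.strptime(text, "%H:%M:%S.%f")
    seconds = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6
    return seconds / 86400.0


# Hypothetical stand-in for the table: column name -> list of values.
table = {
    "id": [1, 2],
    "times_string": ["09:30:31.848", "15:35:31.325"],
}

# Extract the column's values first, then convert each one.
times_num = [time_to_day_fraction(s) for s in table["times_string"]]
print(times_num)
```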

Logic to convert string of words to number

I am looking for logic that will help me convert a string to a number in Teradata and Hive.
It should be easy to implement in Teradata, as I don't have permission to deploy a UDF in TD. In Hive, if it is not simple, I can easily write a UDF.
My requirement: let's say I have the columns sender_country and receiver_country. I want to generate a number for concat('sender_country','_','receiver_country')
The number should always be the same if the countries appear again.
Below is an illustration:
UID  sender_country  receiver_country  concat  number
1    US              UK                US_UK   198760
2    FR              IN                FR_IN   146785
3    CH              RU                CH_RU   467892
4    US              UK                US_UK   198760
Every unique combination of countries should get a unique value. In the example above, US_UK is repeated and has the same number both times.
I tried hashbucket(hashrow('concat')) in TD, but I don't know an equivalent implementation in Hive.
Similarly, Hive has a hash() function, but I don't know an equivalent function in TD.
I also could not find any hash function that returns the same values in both TD and Hive.
You can simply convert each character into a number:
Ascii(Substr(sender_country,1,1))*1000000+
Ascii(Substr(sender_country,2,1))*10000+
Ascii(Substr(receiver_country,1,1))*100+
Ascii(Substr(receiver_country,2,1))
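This returns 85838575 for US, UK. It works as long as each country code consists of two uppercase letters, whose ASCII codes (65 to 90) each fit in two digits.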
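The arithmetic can be checked outside the database with a short Python sketch. The function name pair_to_number is hypothetical, and the validation encodes the assumption that both inputs are two-letter uppercase ASCII codes; with other input, the two-digit slots could overlap and the result would no longer be unique.

```python
def pair_to_number(sender: str, receiver: str) -> int:
    """Pack the ASCII codes of two 2-letter country codes into one integer."""
    for code in (sender, receiver):
        # Assumption: exactly two uppercase ASCII letters (codes 65 to 90).
        if not (len(code) == 2 and code.isascii() and code.isalpha() and code.isupper()):
            raise ValueError(f"expected a two-letter uppercase code, got {code!r}")
    return (
        ord(sender[0]) * 1_000_000
        + ord(sender[1]) * 10_000
        + ord(receiver[0]) * 100
        + ord(receiver[1])
    )


print(pair_to_number("US", "UK"))  # 85838575
```

Because every letter occupies its own two-digit slot, the same pair always maps to the same number and different pairs map to different numbers.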

Problems reading CSV in Octave

I have a .csv file that I can't read in Octave. In R, I just use the command below and everything is read correctly:
myData <- read.csv("myData.csv", stringsAsFactors = FALSE)
However, in Octave the command below does not read it properly:
myData = csvread('myData.csv',1,0);
When I open the file with Notepad, the data looks something like the below. Note there isn't a comma separating the last column name (i.e. Column3) from the first value (i.e. Value1) and the same thing happens with the last value of the first row (i.e. Value3) and the first value of the second row (i.e Value4)
Column1,Column2,Column3Value1,Value2,Value3Value4,Value5,Value6
Column1 holds date values (in the format yyyy-mm-dd hh:mm:ss); I don't know whether that has anything to do with the problem.
Alex's answer already explains why csvread does not work in your case. That function only reads numeric data and returns an array. Since your fields are all strings, you need something that reads a csv file into a cell array.
That function is named csv2cell and is part of the io package.
As a separate note, if you plan to do calculations with those dates, you may want to convert them from strings into serial date numbers. This lets you store the dates in a numeric array, which allows faster operations and uses less memory. Also, the financial package has many functions for dealing with dates.
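The two steps (read every field as a string, then convert the date column to numbers) can be sketched in Python for comparison. This is not Octave code and does not show how csv2cell works internally. The sample rows are made up for illustration, using the column names and date format from the question.

```python
import csv
import io
from datetime import datetime

# Made-up sample in the layout described in the question.
sample = io.StringIO(
    "Column1,Column2,Column3\n"
    "2019-01-01 10:00:00,Value2,Value3\n"
    "2019-01-02 11:30:00,Value5,Value6\n"
)

reader = csv.reader(sample)
header = next(reader)           # first line holds the column names
rows = [row for row in reader]  # every field stays a string

# Convert the date strings into plain numbers (seconds since 1970-01-01).
epoch = datetime(1970, 1, 1)
seconds = [
    (datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S") - epoch).total_seconds()
    for row in rows
]

print(header)
print(rows)
print(seconds)
```

Once the dates are numeric, differences and comparisons become ordinary arithmetic on the seconds list.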
csvread only reads numeric data, so a date does not qualify unfortunately.
In Octave you might want to check out the dataframe package. In Matlab you would do readtable.
Otherwise there are also more primitive functions you can use like textscan.