hypertable column name in where clause - nosql

I have an application with two columns, viz. city and category. I want to fit this into a Hypertable table.
There is also an id which I would like to use as the ROW key.
create table ads (city, category);
insert into ads values ("1", "city:mumbai", "1");
insert into ads values ("1", "category:cars", "1");
insert into ads values ("2", "city:pune", "1");
insert into ads values ("2", "category:bikes", "1");
My question is: how do I fetch rows where city = mumbai, which should return the 2 cells belonging to ROW = 1?
Suppose I made a similar query in MySQL:
select * from ads where city = "mumbai";
I would get 1 row having category=cars, city=mumbai and id=1. How can the same be achieved with a Hypertable query?
Thanks.

It seems to be a current limitation, as described in:
http://hypertable.com/documentation/reference_manual/hql/#select
When you use a column value predicate, the result can only contain values from that same column.
The documentation says:
(these limitations will be removed in future versions of Hypertable)
Column Value Predicate
When specifying a column value predicate, the column family must be identical to the column family used in the SELECT clause, and exactly one column family must be selected. The following examples are valid:
SELECT col FROM test WHERE col = "foo";
SELECT col FROM test WHERE col =^ "prefix";
The following examples are not valid because they select more than one column family or because the column family in the select clause is different from the one in the predicate (these limitations will be removed in future versions of Hypertable):
SELECT * FROM test WHERE col = "foo";
SELECT col, col2 FROM test WHERE col =^ "prefix";
SELECT foo FROM test WHERE bar = "value";
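Given that limitation, one possible workaround is a two-step query, sketched below under the assumption that the city name is stored as the cell value (rather than as a column qualifier) and that the row key is the ad id:
insert into ads values ("1", "city", "mumbai");
select city from ads where city = "mumbai";
select * from ads where ROW = "1";
The first SELECT satisfies the documented restriction (a single column family, the same in the SELECT clause and in the predicate) and returns the matching row keys; the second SELECT uses a row predicate, which is not restricted, to fetch every cell of that row.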

Related

Fast new row insertion if a value of a column depends on previous value in existing row

I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values
    (A, B, C, (
        select C - cnt
        from cusers
        where uid = A and lid = B
        order by ts desc
        limit 1
    ), now())
on conflict do nothing
Quite often (with a probability of about 98%) a row cannot be inserted into cusers because it violates the primary key constraint, so the expensive SELECT subquery would not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the SELECT subquery to compute the dyn value and only then rejects the row because of the (uid, lid, cnt) violation.
What is the best way to insert rows quickly in such a situation?
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2, and 5 - 2 = 3). Each time I insert a new row it should:
avoid duplicate rows (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (I take the last inserted row for the user according to the timestamp and subtract the values).
Another solution I've found is to use returning uid, lid, ts on the inserts to get the user ids that were really inserted - this is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
    select max(cnt) - min(cnt)
    from (
        select cnt
        from cusers
        where uid = A and lid = B
        order by ts desc
        limit 2
    ) t
)
where uid = A and lid = B and ts = TS
But this is not a fast approach either, as it has to search the ts column to find the two last inserted rows for each user. I need a fast insert query, as I insert millions of rows at a time (but I do not write duplicates). What could the solution be? Maybe I need a new index for this? Thanks in advance.
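One thing worth trying (a hedged sketch, not a verified fix: the table and column names come from the question, the index itself is an assumption) is a composite index that lets the correlated subquery, and the follow-up update, locate the newest rows for a given (uid, lid) pair directly instead of sorting on ts:
create index cusers_uid_lid_ts_idx on cusers (uid, lid, ts desc);
With such an index, order by ts desc limit 1 (or limit 2) for a fixed uid and lid can be answered with a short index scan.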

Handling of subselects in PostgreSQL

In the following example we want to find duplicates with different ids
-- table msg
create table msg(
id int primary key,
msg varchar(10));
insert into msg
values (1, 'single'),
(2, 'twice'),
(3, 'twice'),
(4, 'threefold'),
(5, 'threefold'),
(6, 'threefold');
select * from msg;
Counting the number of identical messages with different ids
First example:
select id, msg, (select count(*) from msg m1 where m1.msg = msg) as cnt
from msg;
=> wrong result, since the unqualified "msg" in the subselect resolves to "m1.msg", not "msg.msg" (the condition is always true, so cnt is the total row count)
Second example:
select id, msg, (select count(*) from msg m1 where msg.msg = m1.msg) as cnt
from msg;
=> correct result
Third example (may be easier to understand):
select m1.id, m1.msg, (select count(*) from msg m2 where m1.msg = m2.msg) as cnt
from msg m1;
=> correct result
Question:
Does the behaviour of the first example conform to the SQL standard?
We should always use a table alias and qualify column names with it. Otherwise the database resolves the name on its own (innermost scope first), and we may not get the desired result.
Qualifying columns with a table alias can also avoid some overhead. Assume you have joined n tables and refer to their columns in the select clause and the where clause. When you execute the query, the system parses it and, during parsing, has to work out which table each column belongs to. If you do not use an alias, it has to check every table to find the one containing that column, which takes time, and you also risk a "column ambiguously defined" error. Qualifying each column with its table alias avoids both.
In a subquery, always qualify columns with the table alias.
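As an aside, for this particular task a window function avoids the correlated subquery, and with it the scoping question, entirely. A minimal sketch against the same msg table:
select id, msg, count(*) over (partition by msg) as cnt
from msg;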

Replace rows based on a modified timestamp

I am looking for an efficient method (which I can reuse for similar situations) to drop rows which have been updated.
My table has many columns, but the important ones are:
creation_timestamp, id, last_modified_timestamp
My primary key is the creation_timestamp and the id. However, after an id has been created, it can be modified by other users, which is indicated by the last_modified_timestamp.
1) Read a daily file and add any new rows (based on creation_timestamp and id)
2) Remove old rows which have a different last_modified_timestamp and replace them with the latest versions.
I typically do most of my operations with Pandas (Python library) and psycopg2, so I am not very familiar with PostgreSQL 9.6, which is the database I am using. My initial approach was to just add the last_modified_timestamp to the primary key and then use a view to SELECT DISTINCT based on the latest changes. However, that feels like 'cheating', and I would be wasting space since I do not need to retain previous versions.
EDIT:
def create_update_query(df, table=FACT_TABLE):
    # Build an INSERT ... ON CONFLICT ... DO UPDATE (upsert) statement
    # with one named placeholder per column.
    columns = ', '.join([f'{col}' for col in DATABASE_COLUMNS])
    constraint = ', '.join([f'{col}' for col in PRIMARY_KEY])
    placeholder = ', '.join([f'%({col})s' for col in DATABASE_COLUMNS])
    updates = ', '.join([f'{col} = EXCLUDED.{col}' for col in DATABASE_COLUMNS])
    query = f"""
    INSERT INTO {table} ({columns})
    VALUES ({placeholder})
    ON CONFLICT ({constraint})
    DO UPDATE SET {updates};"""
    # Collapse the multi-line template into a single line.
    query = ' '.join(query.split())
    return query

def load_updates(df, connection=DATABASE):
    conn = connection.get_conn()
    cursor = conn.cursor()
    # Replace NaN with None so psycopg2 writes NULLs instead of raising
    # on blank Timestamp values.
    df1 = df.where((pd.notnull(df)), None)
    insert_values = df1.to_dict(orient='records')
    for row in insert_values:
        cursor.execute(create_update_query(df), row)
    conn.commit()
    cursor.close()
    del cursor
    conn.close()
This appears to work. I was running into some issues, so right now I am looping through each row of the DataFrame as a dictionary and inserting it individually. Also, I had to fill the NaN columns with None, because I was getting errors with Timestamp dtypes on blank values, etc.
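For reference, the statement the helper builds boils down to a plain upsert. A hedged sketch of what it might expand to (the table name, column list and the extra WHERE guard are illustrative assumptions, not taken from the real FACT_TABLE), with the guard ensuring an existing row is only overwritten when the incoming version is newer:
INSERT INTO fact_table (creation_timestamp, id, last_modified_timestamp)
VALUES (%(creation_timestamp)s, %(id)s, %(last_modified_timestamp)s)
ON CONFLICT (creation_timestamp, id)
DO UPDATE SET last_modified_timestamp = EXCLUDED.last_modified_timestamp
WHERE fact_table.last_modified_timestamp < EXCLUDED.last_modified_timestamp;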

Temporary Table Value into a Table-Value UDF

I was having some trouble with a SQL 2k sproc, which we moved to SQL 2k5 so we could use Table Value UDFs instead of Scalar UDFs.
This is simplified, but this is my problem.
I have a temporary table that I fill with product information. I then pass that product information into a UDF and return the information back to my main result set. It doesn't seem to work.
Am I not allowed to pass a temporary table value into a CROSS APPLY'd Table Value UDF?
--CREATE AND FILL #brandInfo
SELECT sku, upc, prd_id, cp.customerPrice
FROM products p
JOIN #brandInfo b ON p.brd_id=b.brd_id
CROSS APPLY f_GetCustomerPrice(b.priceAdjustmentValue, b.priceAdjustmentAmount, p.Price) cp
--f_GetCustomerPrice uses the AdjValue, AdjAmount, and Price to calculate the user's actual price
When I put dummy values in for b.priceAdjustmentValue and b.priceAdjustmentAmount it works great. But as soon as I try to load the temp table values in it bombs.
Msg 207, Level 16, State 1, Line 140
Invalid column name 'b.priceAdjustmentValue'.
Msg 207, Level 16, State 1, Line 140
Invalid column name 'b.priceAdjustmentAmount'.
Have you tried:
--CREATE AND FILL #brandInfo
SELECT sku, upc, prd_id, cp.customerPrice
FROM products p
JOIN #brandInfo b ON p.brd_id=b.brd_id
CROSS APPLY (
    SELECT *
    FROM f_GetCustomerPrice(b.priceAdjustmentValue, b.priceAdjustmentAmount, p.Price)
) cp
--f_GetCustomerPrice uses the AdjValue, AdjAmount, and Price to calculate the user's actual price
Giving the UDF the proper context in order to resolve the column references?
EDIT:
I have built the following UDF in my local Northwind 2005 database:
CREATE FUNCTION dbo.f_GetCustomerPrice(@adjVal DECIMAL(28,9), @adjAmt DECIMAL(28,9), @price DECIMAL(28,9))
RETURNS TABLE
AS RETURN
(
    SELECT Level = 'One', AdjustValue = @adjVal, AdjustAmount = @adjAmt, Price = @price
    UNION
    SELECT Level = 'Two', AdjustValue = 2 * @adjVal, AdjustAmount = 2 * @adjAmt, Price = 2 * @price
)
GO
And referenced it in the following query without issue:
SELECT p.ProductID,
p.ProductName,
b.CompanyName,
f.Level
FROM Products p
JOIN Suppliers b
ON p.SupplierID = b.SupplierID
CROSS APPLY dbo.f_GetCustomerPrice(p.UnitsInStock, p.ReorderLevel, p.UnitPrice) f
Are you certain that your definition of #brandInfo has the priceAdjustmentValue and priceAdjustmentAmount columns defined on it? More importantly, if you are putting this in a stored procedure as you mentioned, does a #brandInfo table already exist without those columns defined? I know #brandInfo is a temporary table, but if it exists at the time you attempt to create the stored procedure and it lacks the columns, the parsing engine may be getting tripped up. Oddly, if the table doesn't exist at all, the parsing engine simply glides past the missing table and creates the SP for you.
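For completeness, a hypothetical #brandInfo definition containing the columns the query expects (the column names come from the query above, the types are assumptions) would look like:
CREATE TABLE #brandInfo (
    brd_id INT,
    priceAdjustmentValue DECIMAL(28,9),
    priceAdjustmentAmount DECIMAL(28,9)
);
If the table that exists when the procedure is created is missing either column, recreating it with a definition like this before creating the procedure should let the parse succeed.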

Using two different rows from the same table in an expression

I'm using PostgreSQL + PostGIS.
In a table I have a point geometry and a line geometry in the same column, in different rows. To get the line I run:
SELECT the_geom
FROM filedata
WHERE id=3
If I want to get the point I run:
SELECT the_geom
FROM filedata
WHERE id=4
I want to use the point and the line together, as shown in this WITH expression, but using a real query against the table instead:
WITH data AS (
SELECT 'LINESTRING (50 40, 40 60, 50 90, 30 140)'::geometry AS road,
'POINT (60 110)'::geometry AS poi)
SELECT ST_AsText(
ST_Line_Interpolate_Point(road, ST_Line_Locate_Point(road, poi))) AS projected_poi
FROM data;
As you can see, in this example the data comes from a hand-written WITH expression. I want to take it from my filedata table. My problem is that I don't know how to work with data from two different rows of one table at the same time.
One possible way:
A subquery to retrieve another value from a different row.
SELECT ST_AsText(
           ST_Line_Interpolate_Point(
               the_geom,
               ST_Line_Locate_Point(
                   the_geom,
                   (SELECT the_geom FROM filedata WHERE id = 4)
               )
           )
       ) AS projected_poi
FROM filedata
WHERE id = 3;
Use a self-join:
SELECT ST_AsText(
           ST_Line_Interpolate_Point(
               fd_road.the_geom,
               ST_Line_Locate_Point(fd_road.the_geom, fd_poi.the_geom)
           )
       ) AS projected_poi
FROM filedata fd_road, filedata fd_poi
WHERE fd_road.id = 3 AND fd_poi.id = 4;
Alternatively, use a subquery to fetch the other row, as Erwin pointed out.
The main options for using multiple rows from one table in a single expression are:
Self-join the table with two different aliases as shown above, then filter the rows;
Use a subquery expression to get a value for all but one of the rows, as Erwin's answer shows;
Use a window function like lag() or lead() to get a row relative to the current row within the query result; or
JOIN on a subquery that returns a table.
The latter two are more advanced options that solve problems that are difficult or inefficient to solve with the simpler self-join or subquery expression; a small lag() sketch follows.
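To illustrate the window-function option, a minimal sketch (the column names come from the question; ordering by id is an assumption) that pairs each geometry with the previous row's geometry:
SELECT id,
       the_geom,
       lag(the_geom) OVER (ORDER BY id) AS prev_geom
FROM filedata;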