I am trying to build a log table in DynamoDB, and my table looks like this:
Pid[HashKey] || TableName[SecondaryIndex] || CreateDate[RangeKey] || OldValue || NewValue
10 || Product || 10.10.2013 00:00:01 || Shoe || Skirt
10 || Product || 10.10.2013 00:00:02 || Skirt || Pant
11 || ProductCategory || 10.10.2013 00:00:01 || Shoes || Skirts
19 || ProductCategory || 10.10.2013 00:00:01 || Tables || Armchairs
Pid = my main DB table's primary key
TableName = the name of the main DB table the row belongs to
CreateDate = the date the row was created
Now I want to get the list of rows
where (Pid = 10 AND TableName = "Product") OR (Pid = 11 AND TableName = "ProductCategory")
in a single request. (The condition wouldn't really be this short; it could include many tables and Pids.)
I tried BatchGetItem, but I couldn't use it because it cannot query on a secondary index; it needs the range key with the equals operator.
I tried Query, but then I couldn't send multiple hash keys in the same query.
Any ideas or suggestions?
Thank you.
The problem here is the OR. Generally, you cannot evaluate this WHERE condition with a single Query operation without modifying your rows.
Solution 1: issue two Query operations and append their results to the same result set:
where (Pid = 10 AND TableName = "Product")
union
where (Pid = 11 AND TableName = "ProductCategory")
Those operations should run in parallel to optimize performance, as in the sketch below.
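A minimal sketch of Solution 1 in Python with boto3, assuming a local secondary index on TableName (the table name 'ChangeLog' and index name 'TableNameIndex' are illustrative):

from concurrent.futures import ThreadPoolExecutor
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('ChangeLog')

def query_pair(pid, table_name):
    # One Query per (Pid, TableName) pair, issued against the secondary index.
    resp = table.query(
        IndexName='TableNameIndex',
        KeyConditionExpression=Key('Pid').eq(pid) & Key('TableName').eq(table_name),
    )
    return resp['Items']

# Run the queries in parallel and append everything to one result set.
pairs = [(10, 'Product'), (11, 'ProductCategory')]
with ThreadPoolExecutor() as pool:
    items = [item for chunk in pool.map(lambda p: query_pair(*p), pairs)
             for item in chunk]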
Solution 2: create a field (say, xxx) that describes your condition and maintain it on writes; then you can create a global secondary index on it and perform a single Query.
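Continuing the sketch above (reusing table and Key), the condition field could be a composite of Pid and TableName; the attribute name 'PidTable' and index name 'PidTableIndex' are hypothetical:

# Maintain the composite attribute on every write...
table.put_item(Item={
    'Pid': 10, 'TableName': 'Product', 'CreateDate': '10.10.2013 00:00:03',
    'OldValue': 'Pant', 'NewValue': 'Short',
    'PidTable': '10#Product',  # hypothetical condition field
})
# ...then a global secondary index on it serves a single Query.
resp = table.query(
    IndexName='PidTableIndex',
    KeyConditionExpression=Key('PidTable').eq('10#Product'),
)

Note this still costs one Query per distinct composite value; it collapses to a single request only if one field value can describe the whole condition group.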
I need to count how many records in tableA are not in tableB. How can I do this with LINQ?
With SQL I do it the following way:
select count(*) as total from produtoitemgrade g
where g.id not in (select idprodutograde from produtoestoque where idProduto = 12)
and g.idProduto = 12
My LINQ code so far:
var temp = (from a in Produtoitemgrades
join b in Produtoestoques on a.IdUnico equals b.IdUnicoGrade into g1
where g1.Count(y => y.IdProduto == 12)>0 && !g1.Any()
select a).ToList();
I tried to follow the example in LINQ get rows from a table that don't exist in another table when using group by? but an error occurs when running. How can I do this?
Thanks!
Your query should look like the following if you want the same SQL execution plan:
var query =
from a in Produtoitemgrades
where !Produtoestoques.Where(b => a.IdUnico == b.IdUnicoGrade && b.IdProduto == 12).Any()
&& a.IdProduto == 12
select a;
var result = query.Count();
The !...Any() pattern translates to NOT EXISTS in SQL, which matches your NOT IN subquery as long as the subquery column contains no NULLs, and it avoids the group-and-count construction that was failing in your attempt.
Hope you're all okay.
We hit this limit quite often. We know there is no way to raise the limit of 500 concurrent user connections in Redshift. We also know certain views (pg_user_info) provide info on a user's actual limit.
We are looking for some answers not found in this forum, plus any guidance based on your experience.
Questions:
Would recreating the cluster with bigger EC2 instances yield a higher limit?
Would adding new nodes to the existing cluster yield a higher limit?
From the app development perspective: what specific strategies/actions would you recommend for spotting or predicting a situation where this limit will be hit?
Txs - Jimmy
Okay folks, thanks to all who answered.
I posted a support ticket with AWS and this is the recommendation. I'm pasting it all here; it's long, but I hope it helps the many people running into this issue. The idea is to catch the situation before it happens:
To monitor the number of connections made to the database, you can create a CloudWatch alarm on the DatabaseConnections metric that triggers a Lambda function when a certain threshold is reached. That Lambda function can then call a procedure that terminates idle connections.
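The alarm piece might be set up with boto3 along these lines (a sketch only; the cluster identifier, threshold, and SNS topic ARN that fronts the Lambda are illustrative):

import boto3

# Alarm on Redshift's DatabaseConnections metric; when the 5-minute average
# crosses the threshold, the SNS topic wired to the Lambda is notified.
boto3.client('cloudwatch').put_metric_alarm(
    AlarmName='redshift-connections-high',
    Namespace='AWS/Redshift',
    MetricName='DatabaseConnections',
    Dimensions=[{'Name': 'ClusterIdentifier', 'Value': 'my-cluster'}],
    Statistic='Average',
    Period=300,
    EvaluationPeriods=1,
    Threshold=450,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:terminate-idle-sessions'],
)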
Below is the query that creates a procedure to log and terminate long-running inactive sessions:
1. Add a view to get all current inactive sessions in the cluster:
CREATE OR REPLACE VIEW inactive_sessions as (
select a.process,
trim(a.user_name) as user_name,
trim(c.remotehost) as remotehost,
a.usesysid,
a.starttime,
datediff(s,a.starttime,sysdate) as session_dur,
b.last_end,
datediff(s,case when b.last_end is not null then b.last_end else a.starttime end,sysdate) idle_dur
FROM
(
select starttime,process,u.usesysid,user_name
from stv_sessions s, pg_user u
where
s.user_name = u.usename
and u.usesysid>1
and process NOT IN (select pid from stv_inflight where userid>1
union select pid from stv_recents where status != 'Done' and userid>1)
) a
LEFT OUTER JOIN (
select
userid,pid,max(endtime) as last_end from svl_statementtext
where userid>1 and sequence=0 group by 1,2) b ON a.usesysid = b.userid AND a.process = b.pid
LEFT OUTER JOIN (
select username, pid, remotehost from stl_connection_log
where event = 'initiating session' and username <> 'rsdb') c on a.user_name = c.username AND a.process = c.pid
WHERE (b.last_end > a.starttime OR b.last_end is null)
ORDER BY idle_dur
);
2. Add a table for logging information about long-running sessions that were terminated:
CREATE TABLE IF NOT EXISTS terminated_inactive_sessions (
process int,
user_name varchar(50),
remotehost varchar(50),
starttime timestamp,
session_dur int,
idle_dur int,
terminated_on timestamp DEFAULT GETDATE()
);
3. Add a procedure to log and terminate any inactive sessions that have been idle for longer than n seconds:
CREATE OR REPLACE PROCEDURE terminate_and_log_inactive_sessions (n INTEGER)
AS $$
DECLARE
expired RECORD ;
BEGIN
FOR expired IN SELECT process, user_name, remotehost, starttime, session_dur, idle_dur FROM inactive_sessions where idle_dur >= n
LOOP
EXECUTE 'INSERT INTO terminated_inactive_sessions (process, user_name, remotehost, starttime, session_dur, idle_dur) values (' || expired.process || ' , ''' || expired.user_name || ''' , ''' || expired.remotehost || ''' , ''' || expired.starttime || ''' , ' || expired.session_dur || ' , ' || expired.idle_dur || ');';
EXECUTE 'SELECT PG_TERMINATE_BACKEND(' || expired.process || ')';
END LOOP ;
END ;
$$ LANGUAGE plpgsql;
4. Execute the procedure by running the following command:
call terminate_and_log_inactive_sessions(100);
Here is a sample Lambda function that attempts to close idle connections by querying the inactive_sessions view created above; you can use it as a reference. (In this version the connection settings and idle limit are assumed to come from environment variables; those variable names are illustrative.)
# Sample handler; psycopg2 must be bundled with the Lambda deployment package.
import datetime
import logging
import os
import sys

import psycopg2

# Connection settings; reading them from environment variables is one option
# (the variable names here are illustrative).
db_database = os.environ['DB_DATABASE']
db_user = os.environ['DB_USER']
db_password = os.environ['DB_PASSWORD']
db_port = os.environ['DB_PORT']
db_host = os.environ['DB_HOST']
session_idle_limit = int(os.environ.get('SESSION_IDLE_LIMIT', '100'))

# Current time
now = datetime.datetime.now()
query = "SELECT process, user_name, session_dur, idle_dur FROM inactive_sessions where idle_dur >= %d"
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    try:
        conn = psycopg2.connect("dbname=" + db_database + " user=" + db_user + " password=" + db_password + " port=" + db_port + " host=" + db_host)
        conn.autocommit = True
    except:
        logger.error("ERROR: Unexpected error: Could not connect to Redshift cluster.")
        sys.exit()
    logger.info("SUCCESS: Connection to RDS Redshift cluster succeeded")
    with conn.cursor() as cur:
        cur.execute(query % (session_idle_limit))
        row_count = cur.rowcount
        if row_count >= 1:
            result = cur.fetchall()
            for row in result:
                print("terminating session with pid %s that has been idle for %d seconds at %s" % (row[0], row[3], now))
                cur.execute("SELECT PG_TERMINATE_BACKEND(%s);" % (row[0]))
            conn.close()
        else:
            conn.close()
As you said, this is a hard limit in Redshift and there is no way to raise it. Redshift is not a high-concurrency / high-connection database.
I expect that if you need the large-data analytic horsepower of Redshift, you can get around this with connection sharing. Pgpool is a common tool for this.
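Since Pgpool speaks the Postgres wire protocol, the application side only needs to point at the pooler instead of the cluster endpoint; the host, database, and credentials below are illustrative (9999 is Pgpool's default listen port):

import psycopg2

# Connect to the Pgpool instance rather than the Redshift endpoint;
# Pgpool multiplexes many client connections onto fewer backend sessions.
conn = psycopg2.connect(host='pgpool.internal', port=9999,
                        dbname='analytics', user='app', password='secret')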
When running the following query on Postgres 10 concurrently through a connection pool, the results return in n * the length of time it takes to run one invocation.
Tested in Racket using threads and in Node with promises; both return roughly the same timings in milliseconds.
1 invocation:
8322
10 invocations:
82432, 82260, 82025, 82260, 82432, 82103, 82025, 82556, 82040, 82119
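The harness was equivalent to this Python sketch (threads + psycopg2 standing in for the Racket and Node versions; the DSN is a placeholder, and QUERY stands for the statement below with $1 rewritten to %s for psycopg2):

import time
from concurrent.futures import ThreadPoolExecutor
import psycopg2

QUERY = "..."  # the full CTE statement shown below, with $1 -> %s

def timed_run(_):
    # Each worker opens its own connection, mimicking a pool of size 10.
    with psycopg2.connect('dbname=test') as conn, conn.cursor() as cur:
        start = time.monotonic()
        cur.execute(QUERY, (['some-sf-id'],))
        cur.fetchall()
        return int((time.monotonic() - start) * 1000)  # elapsed ms

with ThreadPoolExecutor(max_workers=10) as pool:
    print(list(pool.map(timed_run, range(10))))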
Here is the query:
WITH contact_searches as (select
md.geo_point as md_point, md.sf_id as md_name, md.sf_id as md_sf_id,
md.freehold_tenure as md_fh_ten,
md.freehold_search_price md_fh_sp, md.leasehold_tenure as md_lh_ten,
md.leasehold_search_price as md_lh_sp,
md.special_tenure_search_price as md_st_sp, md.geo_point as point,
ss.freehold_tenure as ss_fh_ten, ss.leasehold_tenure as ss_lh_ten,
ss.price_from as ss_pf, ss.price_to as ss_pt, md.marketing_trades as md_trades,
ss.trades as ss_trades,
ss.sf_id as ss_sf_id, ss.person_id as person_id, ss.areas as areas
from saved_searches ss
inner join marketing_details md on md.marketing_trades && ss.trades
where md.sf_id = ANY ($1)
and
((md.freehold_tenure is not null and ss.freehold_tenure ='true'
and (md.freehold_search_price is null or md.freehold_search_price >= ss.price_from)
and (md.freehold_search_price is null or md.freehold_search_price <= ss.price_to))
or
(md.leasehold_tenure is not null and ss.leasehold_tenure ='true'
and (md.leasehold_search_price is null or md.leasehold_search_price >= ss.price_from)
and (md.leasehold_search_price is null or md.leasehold_search_price <= ss.price_to))
or
(md.special_tenure_search_price is not null
and (md.special_tenure_search_price >= ss.price_from)
and (md.special_tenure_search_price <= ss.price_to))
)
), contact_ids as (select distinct person_id as sf_id
from contact_searches inner join areas a on st_contains(a.polygon, contact_searches.point)
where a.name = ANY (contact_searches.areas))
select c.sf_id, hasoptedoutofemail, preferred_contact_method, cco_email_opt_out, cf_email_opt_out, valid_email(c)
from contacts c inner join contact_ids on c.sf_id = contact_ids.sf_id where c.applicant_status = 'Live'
Does anyone have any insight into whether complex queries don't run in parallel, whether there is a config change I could make, or anything else that could help?
I have a table with an integer column. It has 12 records numbered 1000 to 1012. Remember, these are ints.
This query returns, as expected, 12 results:
select count(*) from proposals where qd_number::text like '%10%'
as does this:
SELECT COUNT(*) FROM "proposals" WHERE (lower(first_name) LIKE '%10%' OR qd_number::text LIKE '%10%' )
but this query returns 2 records:
SELECT COUNT(*) FROM "proposals" WHERE (lower(first_name) || ' ' || qd_number::text LIKE '%10%' )
which implies using || in concatenated where expressions is not equivalent to using OR. Is that correct or am I missing something else here?
You probably have NULLs in first_name. For those records, lower(first_name) || ' ' || qd_number::text evaluates to NULL (in SQL, concatenating NULL with anything yields NULL), so the LIKE no longer matches them. Wrapping the nullable column in coalesce, e.g. coalesce(lower(first_name), '') || ' ' || qd_number::text, restores the expected matches.
"using || in concatenated where expressions is not equivalent to using OR. Is that correct or am I missing something else here?"
That is correct.
|| is the string concatenation operator in SQL, not the OR operator.
I am creating a view in PostgreSQL, but when I run the query below I get an error saying a table name is used more than once. How do I solve this problem?
Query:
SELECT
tbcitizen.firstname || '-' || tbcitizen.middlename || '-' || tbcitizen.familyname as firstname,
tbcitizen.dateofbirth,
tbcity.cityname,
tbcontact.contactdetails,
tbcitizen.citizenidp
FROM
public.tbcitizen,
public.tbaddress,
public.tbcity,
public.tbcontact
INNER JOIN
tbcontact ON tbcitizen.citizenidp = tbcontact.referenceidf AND tbcontact.referencetypeidf = 1 AND
tbcontact.isprimery = 1 INNER JOIN
tbaddress ON tbcitizen.citizenidf = tbaddress.referenceidf AND tbaddress.referencetypeidf = 1 AND
tbaddress.isprimery = 1 INNER JOIN
tbcity ON tbaddress.cityidf = tbcity.cityidp
WHERE
tbaddress.referencetypeidf = tbcitizen.citizenidp AND
tbaddress.referenceidf = tbcitizen.citizenidp AND
tbaddress.cityidf = tbcity.cityidp
Error = table name "tbcontact" specified more than once
Thanks
As the error states, your table tbcontact is used twice as a source: once in the comma-separated FROM list and again in the INNER JOIN. That is ambiguous for the Postgres engine. To resolve it, either remove tbcontact (and likewise tbaddress and tbcity) from the comma-separated FROM list, since the explicit joins already bring them in, or give the second occurrence a different alias. The duplicated join conditions in the WHERE clause can then be dropped as well.
Hope it helps.