Exception due to insert query taking a long time - postgresql

I am using the query below for an insert operation. Sometimes, when the data is large, it takes so long to execute that I get an exception. How can I reduce the execution time of the insert query for large data?
@"DO $$
DECLARE maxversionnumber integer;
BEGIN
select max(versionnumber) into maxversionnumber from reissue.holdingsclose;
IF maxversionnumber IS NULL THEN
select 1 into maxversionnumber;
ELSE
select max(versionnumber)+1 into maxversionnumber from reissue.holdingsclose;
END IF;
INSERT INTO reissue.holdingsclose (feedid,
filename,
asofdate,
returntypeid,
currencyid,
securityid,
indexid,
indexname,
securitycode,
sizemarkercode,
icbclassificationkey,
rgsclassificationkey,
securitycountry,
securitycurrency,
pricecurrency,
sizemarkerkey,
startmarketcap,
closemarketcap,
marketcapbeforeinvestability,
marketcapafterinvestability,
adjustmentfactor,
marketcapafteradjustmentfactor,
grossmarketcap,
netmarketcap,
xddividendmarketcap,
xddividendnetmarketcap,
openprice,
closeprice,
shares,
sharechg,
investabilityweight,
dividendyield,
pctindexweight,
growthfactor,
valuefactor,
marketcapgrowthfactor,
marketcapvaluefactor,
dailypriceperformance,
dailytotalperformance,
dailynetperformance,
insertiontime,
classificationnareitkey,
classificationnomurakey,
growthshares,
valueshares,
smid,
meena,
eurozone150flag,
euro40,
paneurope60,
top100flag,
top200flag,
r2500,
developedemerging,
europe,
europeexuk,
frontier,
g123,
gccflag,
gicsindustry,
methodologycode,
createdate,
createuserid,
modifieduser,
modifieddate,
reissueflag,
productcode,versionnumber)
SELECT feedid,
filename,
asofdate,
returntypeid,
currencyid,
securityid,
indexid,
indexname,
securitycode,
sizemarkercode,
icbclassificationkey,
rgsclassificationkey,
securitycountry,
securitycurrency,
pricecurrency,
sizemarkerkey,
startmarketcap,
closemarketcap,
marketcapbeforeinvestability,
marketcapafterinvestability,
adjustmentfactor,
marketcapafteradjustmentfactor,
grossmarketcap,
netmarketcap,
xddividendmarketcap,
xddividendnetmarketcap,
openprice,
closeprice,
shares,
sharechg,
investabilityweight,
dividendyield,
pctindexweight,
growthfactor,
valuefactor,
marketcapgrowthfactor,
marketcapvaluefactor,
dailypriceperformance,
dailytotalperformance,
dailynetperformance,
insertiontime,
classificationnareitkey,
classificationnomurakey,
growthshares,
valueshares,
smid,
meena,
eurozone150flag,
euro40,
paneurope60,
top100flag,
top200flag,
r2500,
developedemerging,
europe,
europeexuk,
frontier,
g123,
gccflag,
gicsindustry,
methodologycode,
createdate,
createuserid,
modifieduser,
modifieddate,
reissueflag,
productcode
,maxversionnumber as versionnumber
FROM fct.holdingsclose
WHERE feedid =" + feedId + " ;" +
"DELETE FROM fct.holdingsclose WHERE feedid =" + feedId + " ;" + "END $$;";


Union in JPA and using Alias

I have 2 entities, PERSON and PEOPLE.
PERSON has columns FN, LN
PEOPLE has columns FIRST, LAST
I want to fetch the union of both tables. How do I do it in JPA?
I have tried the following:
I created a new DTO, Human, with 2 fields, FIRSTNAME and LASTNAME (case sensitive)
@Query(value=" SELECT FIRSTNAME, LASTNAME FROM "+
"( SELECT "+
" P.FN AS FIRSTNAME, "+
" P.LN AS LASTNAME " +
" FROM PERSON P"+
" UNION "+
" SELECT "+
" A.FIRST AS FIRSTNAME, "+
" A.LAST AS LASTNAME " +
" FROM PEOPLE A"+
")", nativeQuery = true)
Pageable<Human> getEntireHumanRace() {
....
}
The SQL runs fine on its own, but the ORM always generates malformed SQL,
such as "Syntax error in SQL statement SELECT COUNT(P) FROM PERSON P ...."
InvalidDataAccessResourceUsageException: could not prepare statement
Is there any suggestion on what can be done? Why does it put the count in front of the generated query?
Thanks in advance.
Why does it put the count in front of the generated query?
Because you are requesting the data with pagination (Pageable), a count query is executed to get the total number of elements.
Is there any suggestion on what can be done?
Since you are using a class-based projection, return a List<Human> instead:
@Query(value="SELECT FIRSTNAME, LASTNAME FROM ...")
List<Human> getEntireHumanRace();
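If pagination is still needed, the count has to be taken over the whole union rather than over the first table. The sketch below illustrates that idea outside of JPA, using Python's built-in sqlite3 module with made-up rows; the table contents and the fetch_page helper are assumptions for illustration only. In Spring Data JPA the corresponding hook is the countQuery attribute of @Query, which lets you supply such a statement yourself.

```python
import sqlite3

# Hypothetical in-memory tables mirroring PERSON(FN, LN) and PEOPLE(FIRST, LAST).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (FN TEXT, LN TEXT);
    CREATE TABLE PEOPLE (FIRST TEXT, LAST TEXT);
    INSERT INTO PERSON VALUES ('Ada', 'Lovelace'), ('Alan', 'Turing');
    INSERT INTO PEOPLE VALUES ('Alan', 'Turing'), ('Grace', 'Hopper');
""")

UNION_SQL = """
    SELECT P.FN AS FIRSTNAME, P.LN AS LASTNAME FROM PERSON P
    UNION
    SELECT A.FIRST AS FIRSTNAME, A.LAST AS LASTNAME FROM PEOPLE A
"""


def fetch_page(page, size):
    """Return (rows, total) for a zero-based page of the union."""
    # The count statement wraps the whole union as a derived table,
    # so it counts the combined, de-duplicated rows.
    total = conn.execute(f"SELECT COUNT(*) FROM ({UNION_SQL}) AS H").fetchone()[0]
    rows = conn.execute(
        f"SELECT FIRSTNAME, LASTNAME FROM ({UNION_SQL}) AS H "
        "ORDER BY LASTNAME, FIRSTNAME LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()
    return rows, total


rows, total = fetch_page(0, 2)
print(total, rows)
```

The same two statements (one for the page, one for the count) are what a paginated repository method needs; only the count statement differs from what the ORM derived automatically.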

AWS Redshift: FATAL: connection limit "500" exceeded for non-bootstrap users

Hope you're all okay.
We hit this limit quite often. We know there is no way to up the 500 limit of concurrent user connections in Redshift. We also know certain views (pg_user_info) provide info as to the user's actual limit.
We are looking for some answers not found in this forum plus any guidance based on your experience.
Questions:
Would recreating the cluster with bigger EC2 instances yield a higher limit?
Would adding new nodes to the existing cluster yield a higher limit?
From the app development perspective: what specific strategies or actions would you recommend to spot or predict a situation where this limit is about to be hit?
Txs - Jimmy
Okay folks.
thanks to all who answered.
I opened a support ticket with AWS and this is their recommendation. I'm pasting all of it here; it's long, but I hope it helps others running into this issue. The idea is to catch the situation before it happens:
To monitor the number of connections made to the database, you can create a CloudWatch alarm based on the DatabaseConnections metric that triggers a Lambda function when a certain threshold is reached. The Lambda function can then call a procedure that terminates idle connections.
Please find below the queries that create a procedure to log and terminate long-running inactive sessions:
1. Add view to get all current inactive sessions in the cluster
CREATE OR REPLACE VIEW inactive_sessions as (
select a.process,
trim(a.user_name) as user_name,
trim(c.remotehost) as remotehost,
a.usesysid,
a.starttime,
datediff(s,a.starttime,sysdate) as session_dur,
b.last_end,
datediff(s,case when b.last_end is not null then b.last_end else a.starttime end,sysdate) idle_dur
FROM
(
select starttime,process,u.usesysid,user_name
from stv_sessions s, pg_user u
where
s.user_name = u.usename
and u.usesysid>1
and process NOT IN (select pid from stv_inflight where userid>1
union select pid from stv_recents where status != 'Done' and userid>1)
) a
LEFT OUTER JOIN (
select
userid,pid,max(endtime) as last_end from svl_statementtext
where userid>1 and sequence=0 group by 1,2) b ON a.usesysid = b.userid AND a.process = b.pid
LEFT OUTER JOIN (
select username, pid, remotehost from stl_connection_log
where event = 'initiating session' and username <> 'rsdb') c on a.user_name = c.username AND a.process = c.pid
WHERE (b.last_end > a.starttime OR b.last_end is null)
ORDER BY idle_dur
);
2. Add table for logging information about long running sessions that were terminated
CREATE TABLE IF NOT EXISTS terminated_inactive_sessions (
process int,
user_name varchar(50),
remotehost varchar(50),
starttime timestamp,
session_dur int,
idle_dur int,
terminated_on timestamp DEFAULT GETDATE()
);
3. Add procedure to log and terminate any inactive transactions running for longer than 'n' amount of seconds
CREATE OR REPLACE PROCEDURE terminate_and_log_inactive_sessions (n INTEGER)
AS $$
DECLARE
expired RECORD ;
BEGIN
FOR expired IN SELECT process, user_name, remotehost, starttime, session_dur, idle_dur FROM inactive_sessions where idle_dur >= n
LOOP
EXECUTE 'INSERT INTO terminated_inactive_sessions (process, user_name, remotehost, starttime, session_dur, idle_dur) values (' || expired.process || ' , ''' || expired.user_name || ''' , ''' || expired.remotehost || ''' , ''' || expired.starttime || ''' , ' || expired.session_dur || ' , ' || expired.idle_dur || ');';
EXECUTE 'SELECT PG_TERMINATE_BACKEND(' || expired.process || ')';
END LOOP ;
END ;
$$ LANGUAGE plpgsql;
4. Execute the procedure by running the following command:
call terminate_and_log_inactive_sessions(100);
Here is a sample lambda function that attempts to close idle connections by querying the view 'inactive_sessions' created above, which you can use as a reference.
import datetime
import logging
import sys

import psycopg2

# db_database, db_user, db_password, db_port, db_host and session_idle_limit
# are not shown in the original; they are assumed to be defined elsewhere
# (for example, read from the Lambda's environment variables).
query = "SELECT process, user_name, session_dur, idle_dur FROM inactive_sessions where idle_dur >= %d"
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # Current time, taken per invocation so warm starts do not reuse a stale value
    now = datetime.datetime.now()
    try:
        conn = psycopg2.connect("dbname=" + db_database + " user=" + db_user + " password=" + db_password + " port=" + db_port + " host=" + db_host)
        conn.autocommit = True
    except Exception:
        logger.error("ERROR: Unexpected error: Could not connect to Redshift cluster.")
        sys.exit()
    logger.info("SUCCESS: Connection to RDS Redshift cluster succeeded")
    with conn.cursor() as cur:
        cur.execute(query % (session_idle_limit))
        row_count = cur.rowcount
        if row_count >= 1:
            result = cur.fetchall()
            for row in result:
                # row[0] is the session pid, row[3] is idle_dur in seconds
                print("terminating session with pid %s that has been idle for %d seconds at %s" % (row[0], row[3], now))
                cur.execute("SELECT PG_TERMINATE_BACKEND(%s);" % (row[0]))
    # Close the connection whether or not any sessions were terminated
    conn.close()
As you said this is a hard limit in Redshift and there is no way to up it. Redshift is not a high concurrency / high connection database.
I expect that if you need the large data analytic horsepower of Redshift you can get around this with connection sharing. Pgpool is a common tool for this.
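To make the connection-sharing idea concrete, here is a minimal in-process sketch in Python. It is not Pgpool and not a production pool; BoundedPool, fake_connect and the timeout value are hypothetical names chosen for this illustration. The point is only that callers borrow from a fixed set of connections and wait when none is free, rather than opening new ones.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Shares at most `size` connections among any number of callers."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument factory, e.g. a wrapper around
        # psycopg2.connect; all connections are opened up front.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free instead of opening a new one;
        # raises queue.Empty if none is returned within `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


# Demo with a stand-in factory that records how many connections were opened.
opened = []


def fake_connect():
    opened.append(object())
    return opened[-1]


pool = BoundedPool(fake_connect, size=2)
for _ in range(10):
    with pool.connection() as conn:
        pass  # run a query here
print(len(opened))
```

An external pooler such as Pgpool applies the same principle across processes and hosts, which is what matters for a cluster-wide limit.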

Using IndexOf and/Or Substring to parse data into separate columns

I am working on migrating data from one database to another for a hospital. In the old database, the doctor's specialty IDs are all in one column (swvar_specialties), separated by commas. In the new database, each specialty ID will have its own column (example: Specialty1_PrimaryID, Specialty2_PrimaryID, Specialty3_PrimaryID, etc). I am trying to export the data out of the old database and split it into these separate columns. I know I can use indexof and substring to do this; I just need help with the syntax.
So this query:
Select swvar_specialties as Specialty1_PrimaryID
From PhysDirectory
might return results similar to 39,52,16. I need this query to display Specialty1_PrimaryID = 39, Specialty2_PrimaryID = 52, and Specialty3_PrimaryID = 16 in the results. Below is my query so far. I will eventually have a join to pull the specialty names from the specialties table. I just need to get this worked out first.
Select pd.ref as PrimaryID, pd.swvar_name_first as FirstName, pd.swvar_name_middle as MiddleName,
pd.swvar_name_last as LastName, pd.swvar_name_suffix + ' ' + pd.swvar_name_degree as NameSuffix,
pd.swvar_birthdate as DateOfBirth,pd.swvar_notes as AdditionalInformation, 'images/' + '' + pd.swvar_photo as ImageURL,
pd.swvar_philosophy as PhilosophyOfCare, pd.swvar_gender as Gender, pd.swvar_specialties as Specialty1_PrimaryID, pd.swvar_languages as Language1_Name
From PhysDirectory as pd
The article "Split function equivalent in T-SQL?" provides some details on how to use a split function to split a comma-delimited string.
By modifying the table-valued function presented in that article to provide an identity column, we can target a specific row, such as the one for Specialty1_PrimaryID:
/*
Splits string into parts delimitered with specified character.
*/
CREATE FUNCTION [dbo].[SDF_SplitString]
(
    @sString nvarchar(2048),
    @cDelimiter nchar(1)
)
RETURNS @tParts TABLE (id bigint IDENTITY, part nvarchar(2048) )
AS
BEGIN
    if @sString is null return
    declare @iStart int,
            @iPos int
    if substring( @sString, 1, 1 ) = @cDelimiter
    begin
        set @iStart = 2
        insert into @tParts
        values( null )
    end
    else
        set @iStart = 1
    while 1=1
    begin
        set @iPos = charindex( @cDelimiter, @sString, @iStart )
        if @iPos = 0
            set @iPos = len( @sString )+1
        if @iPos - @iStart > 0
            insert into @tParts
            values ( substring( @sString, @iStart, @iPos-@iStart ))
        else
            insert into @tParts
            values( null )
        set @iStart = @iPos+1
        if @iStart > len( @sString )
            break
    end
    RETURN
END
Your query can then utilise this split function as follows:
Select
pd.ref as PrimaryID,
pd.swvar_name_first as FirstName,
pd.swvar_name_middle as MiddleName,
pd.swvar_name_last as LastName,
pd.swvar_name_suffix + ' ' + pd.swvar_name_degree as NameSuffix,
pd.swvar_birthdate as DateOfBirth,pd.swvar_notes as AdditionalInformation,
'images/' + '' + pd.swvar_photo as ImageURL,
pd.swvar_philosophy as PhilosophyOfCare, pd.swvar_gender as Gender,
(Select part from SDF_SplitString(pd.swvar_specialties, ',') where id=1) as Specialty1_PrimaryID,
(Select part from SDF_SplitString(pd.swvar_specialties, ',') where id=2) as Specialty2_PrimaryID,
pd.swvar_languages as Language1_Name
From PhysDirectory as pd
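If part of the migration runs in application code rather than in T-SQL, the same positional split can be sketched in a few lines of Python. The function name split_specialties and the default of three columns are assumptions made for this example; like the function above, empty parts become NULL (None here).

```python
def split_specialties(raw, columns=3):
    """Map '39,52,16' to {'Specialty1_PrimaryID': '39', ...}, padding with None."""
    # Empty segments (e.g. '39,,16') become None, as in the T-SQL function.
    parts = [p.strip() or None for p in raw.split(",")] if raw else []
    # Pad so that every target column gets a value even for short lists.
    parts += [None] * (columns - len(parts))
    return {f"Specialty{i}_PrimaryID": parts[i - 1] for i in range(1, columns + 1)}


print(split_specialties("39,52,16"))
```

Values beyond the requested number of columns are ignored, so the column count should be set to the largest number of specialties any doctor has.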

How to keep the following TSQL query from running my server at 100%?

The following queries run in an sproc targeting the ItemData table in SQL Server 2008R2:
SELECT TOP(500) ItemListID, GeoCity, GeoState, GeoDisplay, Title, Link, Description, CleanDescription, OptimizedDescription, PubDateParsed, ImageBytes, DateAdded
FROM (
    SELECT TOP(500) ItemListID, GeoCity, GeoState, GeoDisplay, Title, Link, Description, CleanDescription, OptimizedDescription, PubDateParsed, ImageBytes, DateAdded,
        ROW_NUMBER() OVER( ORDER BY ItemListID DESC ) AS RowNumber
    FROM ItemData
    WHERE CONTAINS(Title, @FTSSearchTerm ) -- ' + @OriginalSearchTerm + '"')
    AND ( WebsiteID=1 AND
    (@GeoCity = '-1' OR GeoCity = @GeoCity) AND
    (@GeoState = '-1' OR GeoState = @GeoState) )
) ItemData
WHERE RowNumber >= ( @PageNum - 1) * @PageSize + 1 AND RowNumber <= @PageNum * @PageSize
ORDER BY ItemListID DESC
SELECT @NumberOfResultsReturned = @@ROWCOUNT
SELECT @ActualNumberOfResults = COUNT(*)
FROM ItemData
WHERE CONTAINS(Title, @FTSSearchTerm ) -- ' + @OriginalSearchTerm + '"')
AND ( WebsiteID=1 AND (@GeoCity = '-1' OR GeoCity = @GeoCity) AND (@GeoState = '-1' OR GeoState = @GeoState) )
Depending on the data the query uses either CONTAINS or FREETEXT.
Under load this query runs very slowly and peaks the server at 100%.
I have set the following indexes:
What do I need to do so these queries stop running so hot?
Thanks.
-- UPDATE --
The table has one clustered index which only consists of ItemListID, and FTS on Title and Description.
I have added a non-clustered index (incorrectly named in the Identity name) as follows:
Without actually looking at the execution plan, it seems like you need a non-clustered index on Title, GeoCity, GeoState, and WebsiteID with the following included columns: ItemListID, GeoDisplay, Link, Description, CleanDescription, OptimizedDescription, PubDateParsed, ImageBytes, DateAdded
This allows the execution plan to use a single non-clustered index that contains all of the information this query needs. Without it, the plan will use one of the indexes you showed and still have to go back to the table to get the remaining columns.
This won't totally fix your problem, though. Depending on how much data is in your table, doing the Contains on Title to do the searching will always be expensive. It would be best if you could leverage full text searching for the searching portion.
Hopefully this helps!
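The difference between an index that covers a query and one that does not can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so it is only an analogy: SQLite has no INCLUDE clause, so the extra columns are appended as trailing key columns, and the index name IX_ItemData_Geo and the filter values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemData (
        ItemListID INTEGER PRIMARY KEY,
        WebsiteID INTEGER, GeoCity TEXT, GeoState TEXT,
        GeoDisplay TEXT, Link TEXT, Description TEXT
    );
    -- Filter columns first, then the columns the query only reads.
    CREATE INDEX IX_ItemData_Geo
        ON ItemData (WebsiteID, GeoCity, GeoState, GeoDisplay, Link);
""")


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


WHERE = "WHERE WebsiteID = 1 AND GeoCity = 'Austin' AND GeoState = 'TX'"
# Every selected column lives in the index: no lookup into the table is needed.
covered = plan("SELECT ItemListID, GeoDisplay, Link FROM ItemData " + WHERE)
# Description is not in the index, so each match needs a trip back to the table.
not_covered = plan("SELECT ItemListID, Description FROM ItemData " + WHERE)
print(covered)
print(not_covered)
```

In SQL Server the equivalent check is to compare the execution plans and look for key lookups disappearing once the index includes every selected column.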

T-SQL if exists

I am a summer intern, new to T-SQL, and I have to run a SQL select statement on various databases. I would like to use 'if exists' to keep an error from occurring, because some of the databases on the list no longer exist. However, I cannot figure out how to apply it to my statement. Any help would be greatly appreciated. Below is the statement another intern and I wrote:
select distinct mg.MatterName, mg.ClientNumber, mg.MatterNumber,grp.groupName as SecurityGroup
from (select distinct mat.matterName, mat.clientNumber, mat.matterNumber, usr.GroupID
from <db_name>.dbo.matter mat
inner join <db_name>.dbo.usrAccount usr
on usr.NTlogin=mat.matterCreateBy) as mg
inner join <db_name>.dbo.usrGroup grp
on mg.groupID=grp.groupID
order by matterName
<db_name> is where the name of the database, passed in as a parameter, would go.
You could use sp_MSforeachdb to enumerate all of the databases on the instance.
This would be similar to:
exec sp_MSforeachdb 'select distinct mg.MatterName, mg.ClientNumber, mg.MatterNumber,grp.groupName as SecurityGroup from (select distinct mat.matterName, mat.clientNumber, mat.matterNumber, usr.GroupID from ?.dbo.matter mat inner join ?.dbo.usrAccount usr on usr.NTlogin=mat.matterCreateBy) as mg inner join ?.dbo.usrGroup grp on mg.groupID=grp.groupID order by matterName'
Alternatively, you could use dynamic sql to manufacture a script:
select 'use ' + name + ';' + char(13) + 'select distinct mg.MatterName, mg.ClientNumber, mg.MatterNumber,grp.groupName as SecurityGroup' +CHAR(13) + 'from (select distinct mat.matterName, mat.clientNumber, mat.matterNumber, usr.GroupID' + char(13) + 'from dbo.matter mat' + char(13) + 'inner join dbo.usrAccount usr on usr.NTlogin=mat.matterCreateBy) as mg' + char(13) + 'inner join dbo.usrGroup grp on mg.groupID=grp.groupID' + CHAR(13) + 'order by matterName;'
from master.sys.databases where database_id>4
If you redirect your output to "Results to Text" in SSMS then run the script, you will see a script written that you can then put into a query editor to execute.
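When the database name is passed in from application code, another option is to generate a batch that checks for the database before touching it. The Python sketch below builds such a batch using DB_ID, which returns NULL for a database that does not exist, and wraps the query in EXEC so it is only compiled when the check passes. The helper names quote_name and guarded_statement and the database name Matters2012 are hypothetical; this only builds the text and does not run it.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def sql_literal(text):
    """Return text as an N'...' string literal with single quotes doubled."""
    return "N'" + text.replace("'", "''") + "'"


QUERY_TEMPLATE = (
    "select distinct mg.MatterName, mg.ClientNumber, mg.MatterNumber, "
    "grp.groupName as SecurityGroup "
    "from (select distinct mat.matterName, mat.clientNumber, mat.matterNumber, usr.GroupID "
    "from {db}.dbo.matter mat "
    "inner join {db}.dbo.usrAccount usr on usr.NTlogin = mat.matterCreateBy) as mg "
    "inner join {db}.dbo.usrGroup grp on mg.groupID = grp.groupID "
    "order by matterName"
)


def guarded_statement(db_name):
    """Build a batch that runs the query only if the database exists."""
    query = QUERY_TEMPLATE.format(db=quote_name(db_name))
    return (
        f"IF DB_ID({sql_literal(db_name)}) IS NOT NULL\n"
        f"    EXEC({sql_literal(query)});"
    )


print(guarded_statement("Matters2012"))
```

If the database is missing, the batch simply returns no result set, so the calling code can tell the user that nothing was found without relying on an exception.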
I got it to work. I think this is a bit hacky, but what I did was catch the exception thrown and just change a label on the page to reflect that the database doesn't exist.
DataAccess dal = new DataAccess();
dal.SelectedConnectionString = "WebServer08";
String exNetName = Request.QueryString["name"];
if (exNetName != null && !exNetName.Equals(""))
{
try
{
gvMatters.DataSource = dal.GetMatters(exNetName);
gvMatters.DataBind();
}
catch (Exception ex)
{
noDB.Text = "This database doesn't exist.";
gvMatters.Visible = false;
}
}
And I just left the SQL statement the way it was rather than trying to mess around with it.