Improve performance of a native SQL query - PostgreSQL

I'm building a location-based application in Java EE that returns providers within a certain radius of a given pair of coordinates. Since I use JPA, I couldn't find a way to express this as a named query, because JPQL doesn't support functions like SIN, COS, etc. I found a nice read here, sql-haversine, which shows how to do this with a native query instead. The query executes fast (33 ms) directly on the database (Postgres), but very slowly when executed from the application (> 800 ms).
Is there a way to speed things up from the application end, or do I have to use something like a stored procedure in the database?
My method looks like this:
@SuppressWarnings("unchecked")
public List<Company> findCompaniesByProximity(Coordinate coordinate, int distance) {
    double latitude = coordinate.getLatitude();
    double longitude = coordinate.getLongitude();
    // Bounding-box pre-filter on latitude/longitude keeps the expensive ACOS() to nearby rows
    String sql = "SELECT * FROM ("
            + "SELECT *, c.distance_unit * DEGREES(ACOS(COS(RADIANS(c.lat)) * COS(RADIANS(companies.latitude)) * "
            + "COS(RADIANS(c.lng) - RADIANS(companies.longitude)) + SIN(RADIANS(c.lat)) * SIN(RADIANS(companies.latitude)))"
            + ") AS distance "
            + "FROM Companies "
            + "JOIN (SELECT ?1 AS lat, ?2 AS lng, ?3 AS search_radius, 111.045 AS distance_unit) "
            + "AS c ON 1=1 "
            + "WHERE companies.latitude "
            + "BETWEEN c.lat - (c.search_radius / c.distance_unit) "
            + "AND c.lat + (c.search_radius / c.distance_unit) "
            + "AND companies.longitude "
            + "BETWEEN c.lng - (c.search_radius / (c.distance_unit * COS(RADIANS(c.lat)))) "
            + "AND c.lng + (c.search_radius / (c.distance_unit * COS(RADIANS(c.lat))))"
            + ") AS proximity "
            + "WHERE distance <= search_radius "
            + "ORDER BY distance "
            + "LIMIT 25";
    List<Company> companies = null;
    try {
        Query query = em.createNativeQuery(sql, Company.class);
        query.setParameter(1, latitude);
        query.setParameter(2, longitude);
        query.setParameter(3, distance / 1000.0); // distance comes in metres; the query works in kilometres
        companies = (List<Company>) query.getResultList();
    } catch (IllegalArgumentException e) {
        System.err.println("Argument is invalid " + e);
    } catch (PersistenceException e) {
        System.err.println("PersistenceException: " + e.getMessage());
    }
    return companies;
}
The method is called from a singleton EJB, and I'm using Payara Server with Postgres and EclipseLink. Everything runs locally, so I expected the connection to the database to be much faster. I've also tried the Postgres earthdistance extension, but it was even slower (> 1800 ms). I'm quite new to programming, and especially to Java EE, so I might have done something wrong along the way :)
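For reference, here is a minimal sketch of how the time spent inside the query itself could be separated from EclipseLink's entity mapping (same em, sql and parameters as in the method above; the timing code is purely diagnostic and not part of the real method):
// Diagnostic only: compare the raw query (rows as Object[]) with the mapped
// query (rows materialized into Company entities) to see where the time goes.
long t0 = System.nanoTime();
List<?> rawRows = em.createNativeQuery(sql)
        .setParameter(1, latitude)
        .setParameter(2, longitude)
        .setParameter(3, distance / 1000.0)
        .getResultList();
long t1 = System.nanoTime();

List<?> entities = em.createNativeQuery(sql, Company.class)
        .setParameter(1, latitude)
        .setParameter(2, longitude)
        .setParameter(3, distance / 1000.0)
        .getResultList();
long t2 = System.nanoTime();

System.out.printf("raw rows: %d in %d ms, mapped entities: %d in %d ms%n",
        rawRows.size(), (t1 - t0) / 1_000_000,
        entities.size(), (t2 - t1) / 1_000_000);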
Thanks in advance.

Related

Project taking too long to build and run after adding complex query

I have integrated the SQLite.swift framework into one of my Swift projects, and everything was working fine until I added the query below. After adding it, the project takes too long to build; I waited for 30 minutes and it still hadn't compiled.
do
{
let stmt = try DB!.prepare ("SELECT e." + ENDPOINT_ID + " as _id, lk." + HUB_ID + ", e." + X_ENDPOINT_ID + ", e." + ENDPOINT_DESC + ", e." + ENDPOINT_TYPE_ID +", et." + ENDPOINT_STATUS_MIN + ", et." + ENDPOINT_STATUS_MAX + ", e." + ENDPOINT_STATUS + " FROM " + TABLE_ENDPOINT + " as e INNER JOIN " + TABLE_ENDPOINT_TYPE + " as et " +
" ON e." + ENDPOINT_TYPE_ID + " = et." + ENDPOINT_TYPE_ID +
" INNER JOIN " + TABLE_LINKING + " as lk " +
" ON e." + ENDPOINT_ID + " = lk." + ENDPOINT_ID +
" INNER JOIN " + TABLE_NODE + " as n " +
" ON lk." + NODE_ID + " = n." + NODE_ID +
" INNER JOIN " + TABLE_NODE_TYPE + " as nt " +
" ON n." + NODE_TYPE_ID + " = nt." + NODE_TYPE_ID +
" WHERE lk." + SECTION_ID + "=" + section_Id +
" AND nt." + NODE_CATEGORY + " = "S" " +
" ORDER BY e." + ENDPOINT_ID + " ASC")
let arr = Array(try stmt.run())
print("\(arr)")
return arr
} catch {
print("failed: \(error)")
return []
}
If I comment out the above code, the project builds and runs in barely a minute, but with this code it takes an extremely long time. I have waited almost 30 minutes and the project still hasn't compiled, nor does it throw any error.
Thanks in advance for your help.
The Swift compiler has difficulty dealing with large string-concatenation expressions like this. See also this question for further hints.
I would recommend splitting up the SQL and building the string variable in small steps:
var sql = "SELECT e." + ENDPOINT_ID
sql = sql + HUB_ID + ", e."
sql = sql + X_ENDPOINT_ID + ", e."
...
let stmt = try DB!.prepare(sql)
Alternatively, SQLite.swift's type-safe query builder avoids building SQL strings altogether:
let query = TABLE_ENDPOINT
    .select(TABLE_ENDPOINT[ENDPOINT_ID], TABLE_LINKING[HUB_ID], TABLE_ENDPOINT[ETCT_ENDPOINT_ID],
            TABLE_ENDPOINT[ENDPOINT_DESC], TABLE_ENDPOINT[ENDPOINT_TYPE_ID],
            TABLE_ENDPOINT_TYPE[ENDPOINT_STATUS_MIN], TABLE_ENDPOINT_TYPE[ENDPOINT_STATUS_MAX],
            TABLE_ENDPOINT[ENDPOINT_STATUS])
    .join(TABLE_ENDPOINT_TYPE, on: TABLE_ENDPOINT[ENDPOINT_TYPE_ID] == TABLE_ENDPOINT_TYPE[ENDPOINT_TYPE_ID])
    .join(TABLE_LINKING, on: TABLE_ENDPOINT[ENDPOINT_ID] == TABLE_LINKING[ENDPOINT_ID])
    .join(TABLE_NODE, on: TABLE_LINKING[NODE_ID] == TABLE_NODE[NODE_ID])
    .join(TABLE_NODE_TYPE, on: TABLE_NODE[NODE_TYPE_ID] == TABLE_NODE_TYPE[NODE_TYPE_ID])
    .filter(TABLE_LINKING[SECTION_ID] == section_Id && TABLE_NODE_TYPE[NODE_CATEGORY] == "S")
    .order(TABLE_ENDPOINT[ENDPOINT_ID].asc)
let arr = Array(try DB!.prepare(query))

Spark UDF optimization for Graph Database (Neo4j) inserts

This is the first question I am posting, so apologies if I miss some info or the formatting is mediocre. I can update it if required.
I will try to add as many details as possible. I have a not-so-optimized Spark job which converts RDBMS data into graph nodes and relationships in Neo4j.
To do this, here are the steps I follow:
Create a denormalized DataFrame 'data' with Spark SQL and joins.
For each row in 'data', run a graphInsert function which does the following:
a. Read the contents of the row.
b. Formulate a Neo4j Cypher query (we use the MERGE command so that only one City node, e.g. Chicago, is created in Neo4j even when Chicago appears in multiple rows of the RDBMS table).
c. Connect to Neo4j.
d. Execute the query.
e. Disconnect from Neo4j.
Here is the list of problems I am facing.
Inserts are slow.
I know a MERGE query is slower than CREATE, but is there another way to do this instead of connecting and disconnecting for every record? This was my first draft of the code, and I am struggling with how to use one connection to insert from multiple threads on different Spark worker nodes; hence connecting and disconnecting for every record.
The job is not scalable. It only runs fine with 1 core. As soon as I run the job with 2 Spark cores, I suddenly get 2 cities with the same name even though I am running MERGE queries, e.g. there are 2 Chicago cities, which defeats the purpose of MERGE. I am assuming that MERGE behaves like "create if not exists".
I don't know whether my implementation is wrong on the Neo4j side or the Spark side. If anyone can direct me to documentation that would help me implement this at a better scale, it would be helpful, as I have a big Spark cluster which I need to utilize to its full potential for this job.
If you'd rather look at code than the algorithm, here is the graphInsert implementation in Scala:
class GraphInsert extends Serializable {
  var case_attributes = new Array[String](4)
  var city_attributes = new Array[String](2)
  var location_attributes = new Array[String](20)
  var incident_attributes = new Array[String](20)
  val prop = new Properties()
  prop.load(getClass().getResourceAsStream("/GraphInsertConnection.properties"))
  // Neo4j connection properties
  val url_neo4j = prop.getProperty("url_neo4j")
  val neo4j_user = prop.getProperty("neo4j_user")
  val neo4j_password = prop.getProperty("neo4j_password")

  def graphInsert(data: Row) {
    val query =
      "MERGE (d:CITY {name:city_attributes(0)})\n" +
      "MERGE (a:CASE { " + case_attributes(0) + ":'" + data(11) + "'," + case_attributes(1) + ":'" + data(13) + "'," + case_attributes(2) + ":'" + data(14) + "'}) \n" +
      "MERGE (b:INCIDENT { " + incident_attributes(0) + ":" + data(0) + "," + incident_attributes(1) + ":" + data(2) + "," + incident_attributes(2) + ":'" + data(3) + "'," + incident_attributes(3) + ":'" + data(8) + "'," + incident_attributes(4) + ":" + data(5) + "," + incident_attributes(5) + ":'" + data(4) + "'," + incident_attributes(6) + ":'" + data(6) + "'," + incident_attributes(7) + ":'" + data(1) + "'," + incident_attributes(8) + ":" + data(7) + "}) \n" +
      "MERGE (c:LOCATION { " + location_attributes(0) + ":" + data(9) + "," + location_attributes(1) + ":" + data(10) + "," + location_attributes(2) + ":'" + data(19) + "'," + location_attributes(3) + ":'" + data(20) + "'," + location_attributes(4) + ":" + data(18) + "," + location_attributes(5) + ":" + data(21) + "," + location_attributes(6) + ":'" + data(17) + "'," + location_attributes(7) + ":" + data(22) + "," + location_attributes(8) + ":" + data(23) + "}) \n" +
      "MERGE (a) - [r1:" + relation_case_incident + "]->(b)-[r2:" + relation_incident_location + "]->(c)-[r3:belongs_to]->(d);"
    println(query)
    try {
      var con = DriverManager.getConnection(url_neo4j, neo4j_user, neo4j_password)
      var stmt = con.createStatement()
      var rs = stmt.executeQuery(query)
      con.close()
    } catch {
      case ex: SQLException => {
        println(ex.getMessage)
      }
    }
  }

  def operations(sqlContext: SQLContext) {
    ....
    // Get 'data' before this step
    city_attributes = entity_metadata.filter(entity_metadata("source_name") === "tb_city").map(x => x.getString(5)).collect()
    case_attributes = entity_metadata.filter(entity_metadata("source_name") === "tb_case_number").map(x => x.getString(5)).collect()
    location_attributes = entity_metadata.filter(entity_metadata("source_name") === "tb_location").map(x => x.getString(5)).collect()
    incident_attributes = entity_metadata.filter(entity_metadata("source_name") === "tb_incident").map(x => x.getString(5)).collect()
    data.foreach(graphInsert)
  }
}

object GraphObject {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("GraphNeo4j")
      .setMaster("xyz")
      .set("spark.cores.max", "2")
      .set("spark.executor.memory", "10g")
    Logger.getLogger("org").setLevel(Level.ERROR)
    Logger.getLogger("akka").setLevel(Level.ERROR)
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val graph = new GraphInsert()
    graph.operations(sqlContext)
  }
}
Whatever you write inside the closure, i.e. whatever needs to be executed on the workers, gets distributed.
You can read more about it here: http://spark.apache.org/docs/latest/programming-guide.html#understanding-closures-a-nameclosureslinka
And as you increase the number of cores, I don't think it should affect the application, because if you do not specify the number of cores, Spark takes the greedy approach anyway. I hope this document helps.
I am done improving the process, but nothing could make it as fast as the LOAD CSV command in Cypher.
Hope this helps someone though:
Using foreachPartition instead of foreach gives a significant gain for this kind of process. Adding periodic commit in Cypher also helped.
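As a rough illustration of the foreachPartition idea (sketched with Spark's Java API and a plain JDBC-style connection like in the question; the RDD, the credentials and the buildMergeQuery helper are placeholders, not code from the original job):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;

public class PartitionedGraphInsert {

    // Placeholder standing in for the MERGE-building logic from graphInsert().
    static String buildMergeQuery(Row row) {
        return "MERGE (d:CITY {name:'" + row.getString(0) + "'})";
    }

    // One Neo4j connection per partition instead of one per row.
    static void insertPartitioned(JavaRDD<Row> rows,
                                  String urlNeo4j, String user, String password) {
        rows.foreachPartition(partition -> {
            // Open the connection once for the whole partition...
            try (Connection con = DriverManager.getConnection(urlNeo4j, user, password);
                 Statement stmt = con.createStatement()) {
                while (partition.hasNext()) {
                    // ...and reuse it for every MERGE in this partition.
                    stmt.execute(buildMergeQuery(partition.next()));
                }
            } // connection closed once, when the partition is done
        });
    }
}
In the Scala code above the same idea would be data.foreachPartition { rows => ... }, with the connection opened before the inner loop instead of inside graphInsert.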

libpq, insert with parameters

I am wondering if I can make parameterized queries directly from C/C++ with libpq instead of building strings, and if so, what should this code look like?
string tblins = "";
tblins = "INSERT INTO " + commtable + " "
"(vdoc, bdoc, mytime, txml) VALUES ("
"'" + cxml.vdoc + "', "
+ cxml.bdoc + ", " //integer
"'" + cxml.mytime + "', "
"'" + cxml.txml + "')";
result = PQexec(conn, tblins.c_str());
Thanks.
Yes, you can use the PQexecParams function as explained in the documentation.
If parameters are used, they are referred to in the command string as $1, $2, etc. nParams is the number of parameters supplied; it is the length of the arrays paramTypes[], paramValues[], paramLengths[], and paramFormats[].

Checking User Agent in Wicket

I am using Wicket 1.5 and I am not able to find the getClientInfo() method on
(WebRequest) RequestCycle.get().getRequest()
Somewhere else I saw this code:
WebClientInfo clientInfo = (WebClientInfo)WebRequestCycle.get().getClientInfo();
But I am not able to see any WebRequestCycle in Wicket 1.5.
Any ideas how to check the user agent in Wicket 1.5?
The easiest way is to use
WebSession.get().getClientInfo().getUserAgent();
On newer Wicket versions (6 or later), you should use:
WebClientInfo clientInfo = new WebClientInfo(getRequestCycle());
System.out.println("Client: " + clientInfo.getUserAgent());
System.out.println("Navigator: " + clientInfo.getProperties().getNavigatorAppName() + ", version " + clientInfo.getProperties().getNavigatorAppVersion() + ", codName: " + clientInfo.getProperties().getNavigatorAppCodeName() + ", plataform: " + clientInfo.getProperties().getNavigatorPlatform() + ", AppCodName: " + clientInfo.getProperties().getNavigatorAppCodeName());
System.out.println("NavigatorUserAgent: " + clientInfo.getProperties().getNavigatorUserAgent());
System.out.println("Tamanho da tela (Width x Height): " + clientInfo.getProperties().getScreenWidth() + " x " + clientInfo.getProperties().getScreenHeight() );
You can also do:
((WebRequest) getRequest()).getHeader("User-Agent")
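For context, a minimal sketch of where such a call could live, e.g. in a hypothetical page class (assuming Wicket 6 or newer; the class name and label ids are made up):
import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;
import org.apache.wicket.protocol.http.request.WebClientInfo;
import org.apache.wicket.request.http.WebRequest;

// Hypothetical page that displays the visitor's user agent.
public class UserAgentPage extends WebPage {

    public UserAgentPage() {
        // Wicket 6+: build client info from the current request cycle
        WebClientInfo clientInfo = new WebClientInfo(getRequestCycle());
        String userAgent = clientInfo.getUserAgent();

        // Or read the raw header straight from the request
        String rawHeader = ((WebRequest) getRequest()).getHeader("User-Agent");

        add(new Label("userAgent", userAgent));
        add(new Label("rawHeader", rawHeader));
    }
}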

Data Access Layer for Analysis Services w/Dynamic MDX

We have a project that uses Analysis Services as its data source. To avoid having to create hundreds of queries for all the selection options we allow, we build our MDX queries with a lot of switches and string concatenation. This is our "Data Access Layer". It is a beast to manage, and the smallest mistake (missing spaces, misspellings) is easy to miss and even easier to introduce accidentally. Does anyone know of a good resource that can help make this more manageable, like a tutorial, white paper, or sample project?
To give you an idea of the case logic I'm talking about (and it goes on and on...):
if (Time == Day)
{
    if (Years == One)
    {
        return (" MEMBER " + CurrentSalesPercent +
                " AS ([Sales % " + YearString + " " + StatusType + "]) ");
    }
    else //2Y
    {
        return (" MEMBER " + CurrentSalesPercent +
                " AS ([Sales % 2Y " + StatusType + "]) ");
    }
}
else if (Time == Week)
{
    if (Years == One)
    {
        return (" MEMBER " + CurrentSalesPercent +
                " AS ([Sales WTD % " + YearString + " " + StatusType + "]) ");
    }
    else //2Y
    {
        return (" MEMBER " + CurrentSalesPercent +
                " AS ([Sales WTD % 2Y " + StatusType + "]) ");
    }
    ...
To be honest, I'm not sure if all the different measures and calculations are correct either. But, that's controlled by another team, so we have a little less influence here.
Thanks!
mkt
Have you looked at the way Microsoft generates MDX? If you have SSRS installed, get "Red Gate Reflector" and disassemble C:\Program Files\Microsoft SQL Server\MSRS10.MSSQLSERVER\Reporting Services\ReportServer\bin\MDXQueryGenerator.dll
Apart from that, pre-canned queries that take parameters seem to be the standard approach :(
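As a rough, language-neutral illustration of the "pre-canned queries that take parameters" idea (sketched here in Java; the enum and method names are made up and not from the project above), the nested switches can collapse into a single parameterized template:
// Hypothetical sketch: replace the nested if/else blocks with one pre-canned
// MEMBER template that takes parameters. Names and enums are illustrative only.
public class MdxMemberTemplates {

    enum Time { DAY, WEEK }
    enum Years { ONE, TWO }

    static String salesPercentMember(Time time, Years years,
                                     String currentSalesPercent,
                                     String yearString, String statusType) {
        // The only structural differences between the four branches are the
        // "WTD " prefix and the year token, so parameterize exactly those.
        String wtd = (time == Time.WEEK) ? "WTD " : "";
        String yearToken = (years == Years.ONE) ? yearString : "2Y";
        return String.format(" MEMBER %s AS ([Sales %s%% %s %s]) ",
                currentSalesPercent, wtd, yearToken, statusType);
    }
}
A call such as salesPercentMember(Time.WEEK, Years.ONE, CurrentSalesPercent, YearString, StatusType) would then replace the whole Week/one-year branch, and the template string becomes the single place where spaces and spelling have to be right.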