I can execute the query below just fine through the web interface. It takes virtually no time at all to finish.
SELECT from Person;
But when I try to do it from my Java application, it takes more than 17 seconds to finish.
The code I'm using is basically these lines:
OrientGraph graph = new OrientGraph("remote:93.x.x.x/test");
OCommandRequest req = graph.command(new OCommandSQL(query));
req.execute();
Could it be that REST requests are that much slower? The web interface is using plocal (I guess), while my Java app uses a remote connection.
Try to run the same query from the console as well.
The time spent in the console should be about the same (just a little slower than in Java).
I did a test inserting 100,000 vertices of class Person. Running the query, the response times were roughly:
Studio = 7.72 sec, Console = 2.043 sec, Java = 1.23 to 1.41 sec
If you get a very different time, perhaps something is wrong in your Java code.
You have shown "OCommandSQL"; check with "OSQLSynchQuery" to see if there is a big difference.
String query = "";
Iterable<Vertex> result;
query = "select from Persona";
//query with OSQLSynchQuery
result = g.command(new OSQLSynchQuery<Vertex>(query)).execute();
List<OrientVertex> listVertex = new ArrayList<OrientVertex>();
CollectionUtils.addAll(listVertex, result.iterator());
//query with OCommandSQL
OCommandRequest req = g.command(new OCommandSQL(query));
req.execute();
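If it helps, here is a rough sketch for timing the two variants side by side (just a sketch; it reuses the g graph instance and the Person class from above, and prints the times in milliseconds):
long start = System.currentTimeMillis();
Iterable<Vertex> viaQuery = g.command(new OSQLSynchQuery<Vertex>("select from Person")).execute();
for (Vertex v : viaQuery) { /* iterate to force the result to be fetched */ }
System.out.println("OSQLSynchQuery: " + (System.currentTimeMillis() - start) + " ms");

start = System.currentTimeMillis();
Iterable<Vertex> viaCommand = g.command(new OCommandSQL("select from Person")).execute();
for (Vertex v : viaCommand) { /* iterate to force the result to be fetched */ }
System.out.println("OCommandSQL: " + (System.currentTimeMillis() - start) + " ms");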
I am looking to use Scala to get faster performance when accessing and downloading an Amazon S3 file. The file comes in as an InputStream and it is large (over 30 million rows).
I have tried this in Python (pandas), but it is too slow. I am hoping to increase the speed with Scala.
So far I am doing this, but it is too slow. Have I hit a bottleneck in the stream, in that I cannot access data from the stream any faster than with the code below?
val obj = amazonS3Client.getObject(bucket_name, file_name)
val reader = new BufferedReader(new InputStreamReader(obj.getObjectContent()))
var list_of_lines = List.empty[String]
var line = reader.readLine
while (line != null) {
  list_of_lines = list_of_lines ::: List(line)
  line = reader.readLine
}
I'm looking for a serious speed improvement over the approach above. Thanks.
I suspect that your performance bottleneck is appending to a (linked) List in that loop:
list_of_lines = list_of_lines ::: List(line)
With 30 million lines, processing them all should take a few hundred trillion times longer than processing one line: ::: copies its entire left-hand operand, so every append copies the whole list built so far and the loop as a whole is O(n²). Even if the first iteration of that loop takes only 1 ns, the full loop works out to several days, not minutes.
Switching to prepending to the List and then reversing at the end should improve the speed of your loop by a factor of more than a million:
while (line != null) {
  list_of_lines = line :: list_of_lines
  line = reader.readLine
}
list_of_lines = list_of_lines.reverse
List is also notoriously memory-inefficient, so for this many elements it may also be worth doing something like this (which is also more idiomatic Scala):
import scala.io.{ Codec, Source }
val obj = amazonS3Client.getObject(bucket_name, file_name)
val source = Source.fromInputStream(obj.getObjectContent())(Codec.defaultCharsetCodec)
val lines = source.getLines().toVector
Vector being more memory-efficient than List should dramatically reduce GC thrashing.
The best way to achieve better performance is to use the TransferManager provided by the AWS SDK for Java. It's a high-level file transfer manager that will automatically parallelise downloads. I'd recommend using SDK v2, but the same can be done with SDK v1. Be aware, though, that SDK v1 comes with limitations: only multipart files can be downloaded in parallel.
You need the following dependency (assuming you are using sbt with Scala). Note, though, that there's no benefit to using Scala over Java with the TransferManager.
libraryDependencies += "software.amazon.awssdk" % "s3-transfer-manager" % "2.17.243-PREVIEW"
Example (Java):
S3TransferManager transferManager = S3TransferManager.create();
FileDownload download =
transferManager.downloadFile(b -> b.destination(Paths.get("myFile.txt"))
.getObjectRequest(req -> req.bucket("bucket").key("key")));
download.completionFuture().join();
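Once the download has completed, the file can be read line by line from local disk instead of building a 30-million-element list in memory. A minimal sketch in plain Java NIO ("myFile.txt" is just the destination used in the example above):
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Stream the downloaded file lazily; only one line is held in memory at a time.
// Files.lines(...) throws IOException, so call this from a method that declares or handles it.
try (Stream<String> lines = Files.lines(Paths.get("myFile.txt"))) {
    lines.forEach(line -> {
        // process each line here
    });
}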
I recommend reading more on the topic here: Introducing Amazon S3 Transfer Manager in the AWS SDK for Java 2.x
I'm working on updating a TYPO3 7.6 installation to 8.7. I'm doing this on my local machine with XAMPP on Windows with PHP 7.2.
I got the backend working. It needed some manual work in the DB, like changing the CType in tt_content for my own content elements as well as filling in the colPos.
However, when I call the page on the frontend, all I get is a timeout:
Fatal error: Maximum execution time of 60 seconds exceeded in
C:\xampp\htdocs\typo3_src-8.7.19\vendor\doctrine\dbal\lib\Doctrine\DBAL\Driver\Mysqli\MysqliStatement.php on line 92
(this does not change if I set max_execution_time to 300)
Edit: I added an echo just before line 92 in the file above; this is the function:
public function __construct(\mysqli $conn, $prepareString)
{
    $this->_conn = $conn;
    echo $prepareString."<br />";
    $this->_stmt = $conn->prepare($prepareString);
    if (false === $this->_stmt) {
        throw new MysqliException($this->_conn->error, $this->_conn->sqlstate, $this->_conn->errno);
    }

    $paramCount = $this->_stmt->param_count;
    if (0 < $paramCount) {
        $this->types = str_repeat('s', $paramCount);
        $this->_bindedValues = array_fill(1, $paramCount, null);
    }
}
What I get is the following statement printed 1000s of times, always exactly the same:
SELECT `tx_fed_page_controller_action_sub`, `t3ver_oid`, `pid`, `uid` FROM `pages` WHERE (uid = 0) AND ((`pages`.`deleted` = 0) AND (`pages`.`hidden` = 0) AND (`pages`.`starttime` <= 1540305000) AND ((`pages`.`endtime` = 0) OR (`pages`.`endtime` > 1540305000)))
Note: I don't have any entry in pages with uid=0. So I am really not sure what this is good for. Does there need to be a page with uid=0?
I enabled slow query logging in MySQL, but nothing gets logged there. I don't get any additional PHP error, nor do I get a log entry in TYPO3.
So right now I am a bit stuck and don't know how to proceed.
I enabled general logging for MySQL, and when I call a page on the frontend I see this SQL query executed over and over again:
SELECT `tx_fed_page_controller_action_sub`, `t3ver_oid`, `pid`, `uid` FROM `pages` WHERE (uid = 0) AND ((`pages`.`deleted` = 0) AND (`pages`.`hidden` = 0) AND (`pages`.`starttime` <= 1540302600) AND ((`pages`.`endtime` = 0) OR (`pages`.`endtime` > 1540302600)))
Executing this query manually gives back an empty result (I don't have any entry in pages with uid=0). I don't know if that means anything.
What options do I have? How can I find whats missing / where the error is?
First: give your PHP more time to run.
In the php.ini configuration, increase the max execution time to 240 seconds.
Be aware that for TYPO3 in production mode, 240 seconds are recommended. If you start the install tool you can do a system check and get information about configuration settings that might need optimization.
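For example, in php.ini (with XAMPP that is usually C:\xampp\php\php.ini; restart Apache afterwards):
max_execution_time = 240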
Second: avoid development mode and use production mode.
Execution is faster, but you will lose the option to debug.
Debugging always costs more time and more memory to prepare all that information; maybe 240 seconds are not enough, and you may even need more memory.
The field tx_fed_page_controller_action_sub comes from an extension; it is not part of the core. Most likely you have flux and fluidpages installed in your system.
Try deactivating those extensions and proceed without them; reintegrate them later if you still need them. A timeout often means that there is some kind of recursion going on. From my experience with flux, it is possible for a content element to have itself set as its own flux_parent, which creates an infinite rendering loop and causes a fatal error once max_execution_time is exceeded.
So, in your case I'd try to find the record that is causing this (it seems to be a page record) and/or the code that initiates the query. You do not need to debug in Doctrine itself :)
I have 7 entity classes to index using Hibernate Search. Having tried both the MassIndexer and flushToIndexes approaches, the indexer churned through the smallest entities, but the largest entities/tables did not finish, even though a MassIndexerProgressMonitor reported that indexing had finished. The process just hangs when it hits 100-200 MB allocated. I want to ensure the indexing process ends properly.
Questions: Is the code correct? Should hibernate or database settings be tuned?
Environment: 64-bit Windows 7, JBoss, Struts2, Hibernate, Hibernate Search, Lucene, SQL Server. The Hibernate Search index is stored on the filesystem.
MassIndexer code sample:
final Session session = HibernateSessionFactory.getSession();
final FullTextSession fullTextSession = Search.getFullTextSession(session);
MassIndexerProgressMonitor monitor = new IndexProgressMonitor("Kanalregister");
fullTextSession.createIndexer()
.purgeAllOnStart(true)
.progressMonitor(monitor)
.batchSizeToLoadObjects(BATCH_SIZE) // 250000
.startAndWait();
flushToIndexes code sample (from the Hibernate reference documentation; seems to index OK, but never ends):
final Session session = HibernateSessionFactory.getSession();
final FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
Transaction t1 = fullTextSession.beginTransaction();
// Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria(Land.class)
.setFetchSize(BATCH_SIZE) // 250000
.scroll(ScrollMode.FORWARD_ONLY);
int index = 0;
while (results.next()) {
index++;
fullTextSession.index(results.get(0)); // index each element
if (index % BATCH_SIZE == 0) {
fullTextSession.flushToIndexes(); // apply changes to indexes
fullTextSession.clear(); // free memory since the queue is processed
}
}
t1.commit();
The code is verified to end when mocking all indexing work, using the following setting in hibernate.cfg.xml:
<property name="hibernate.search.default.worker.backend">blackhole</property>
The code above is validated and correct.
My problem with the console not ending seems to be related to Eclipse, as a printout at the end of main() was indeed displayed.
There were some missing entity classes in my model which were not reported properly. Once I was notified of those and added them to my model, the indexing process ended successfully for the MassIndexer, as evidenced by 3+ files in each directory of the Lucene index.
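As a side note on the tuning question: if the MassIndexer ever stalls again on the large tables, the usual knobs to try first are a much smaller batch size, ignoring the second-level cache and loading entities with a few threads. A hedged sketch (the values are illustrative guesses, not taken from the setup above; check the MassIndexer API of your Hibernate Search version for the exact methods):
fullTextSession.createIndexer()
    .purgeAllOnStart(true)
    .progressMonitor(monitor)
    .batchSizeToLoadObjects(100)     // 250000 objects per batch is likely far too large to hold in memory
    .threadsToLoadObjects(4)         // load entities in parallel
    .cacheMode(CacheMode.IGNORE)     // don't populate the second-level cache while indexing
    .startAndWait();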
I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL/DR Version: Is it possible to cancel an EF4 query that's currently executing?
It looks like you have found a bug in EF, but if you report it to MS it will probably be treated as a documentation bug. Anyway, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
var currentString = param as string;
if (currentString == null)
{
// TODO OMG exception
throw new Exception();
}
AdventureWorks2008R2Entities entities = null;
try // Don't use using because it can cause race condition
{
entities = new AdventureWorks2008R2Entities();
ObjectQuery<Person> query = entities.People
.Include("Password")
.Include("PersonPhone")
.Include("EmailAddress")
.Include("BusinessEntity")
.Include("BusinessEntityContact");
// Improves performance of readonly query where
// objects do not have to be tracked by context
// Edit: But it doesn't work for this query because of includes
// query.MergeOption = MergeOption.NoTracking;
foreach (var record in query
.Where(p => p.LastName.StartsWith(currentString)))
{
// TODO fill some buffer and invoke UI update
}
}
finally
{
if (entities != null)
{
entities.Dispose();
}
}
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
thread.Abort();
This is the result of playing with it for about 30 minutes, so it should probably not be considered a final solution. I'm posting it to at least get some feedback on possible problems caused by this approach. The main points are:
Context is handled inside the thread
Result is not tracked by context
If you kill the thread, the query is terminated and the context is disposed (connection released)
If you kill the thread before you start a new one, you should still be using only one connection.
I checked in SQL Profiler that the query is started and terminated.
Edit:
Btw., another approach is to simply stop the current query from inside the enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
    foreach (T record in query)
    {
        // Handle stop condition somehow
        if (ShouldStop())
        {
            // Once you close enumerator, query is terminated
            yield break;
        }
        yield return record;
    }
}
I have 20 different methods and use a DataReader to read and get results from these functions in the same event. At the top of the page I create the DataReader and then begin to load it step by step (it uses the same connection and the same data access function). Up to the 15th function the DataReader loads without problems, but after the 15th it loads slowly (the record count is about 20-30). When I close the DataReader after the 15th function, this problem doesn't occur. But now, after the 15th function, I have to close the DataReader whenever I execute a function. I don't know why this problem occurs. I posted sample code here.
'Trying method 1
strSQL = "Select * from A"
dr = DB_Gateway.ReadAndBind(strSQL.ToString)
'Trying method 2
strSQL = "Select * from B"
dr = DB_Gateway.ReadAndBind(strSQL.ToString)
'Trying method 15
strSQL = "Select * from K"
dr = DB_Gateway.ReadAndBind(strSQL.ToString)
After the 15th execution, the DataReader begins to load data slowly. When I add dr.Close and execute, I don't have the problem; if I don't, loading about 20 records takes 5 seconds. This is my ReadAndBind function. I am connecting to Oracle 11g. What can cause this problem?
Public Shared Function ReadAndBind(ByVal SQL As String) As OracleDataReader
    Dim oraCommand As New OracleCommand
    With oraCommand
        .Connection = New OracleConnection(CONN_NAME)
        .CommandText = SQL
        Dim dtreader As OracleDataReader
        Try
            .Connection.Open()
            dtreader = .ExecuteReader(CommandBehavior.CloseConnection)
        Catch ex As Exception
            Exception_Save(ex.Message, oraCommand.ToString)
            Throw
        Finally
            '.Connection.Close()
            '.Connection.Dispose()
            oraCommand.Dispose()
            oraCommand = Nothing
        End Try
        Return dtreader
    End With
End Function
No, you are not using the same connection for all the commands; you are opening a new connection for each one. Since you never close them, by the end of the code you will have 20 database connections open at once.
Also, you are not using a single data reader; you are creating a new data reader for each query. When you assign the method result to the dr variable, it's not reusing the data reader, it's throwing away the reference to one reader and replacing it with a new one. It's normal to use one reader for each result, but it means that you have to close each data reader before getting the next, or you will end up with an unreachable object that holds on to a database connection until the garbage collector removes it.
If you close each reader before getting the next, the database connection will be closed and returned to the connection pool so that it can be reused for the next query. Slightly better would be to create a single connection object for the page and use that for each command; that will save a few round trips to the database.