Auto-increment vertex property in OrientDB

I am using OrientDB 2.2.12.
I want to set an auto-increment property on a vertex, e.g.
for (i = 1 to 100) {
    vertex.setProperty("counter", AUTO_INCREMENT_Value(start = 0))
}
I tried to achieve this by creating a sequence:
sequenceLibrary.createSequence(AUTO_INCREMENT_Value, SEQUENCE_TYPE.ORDERED, new OSequence.CreateParams().setStart((long) 0));
for (int i = 1; i <= 100; i++) {
    vertex.setProperty("counter", AUTO_INCREMENT_Value);
    graph.commit();
    graph.shutdown();
}
Though it works fine in a single-threaded system, in a multithreaded system it gives ambiguous results.
I gave each thread a separate OrientGraph instance from OrientGraphFactory, as stated in the official docs:
OrientGraphFactory graphFactory = new OrientGraphFactory(
        "remote:" + IP + ":" + PORT + "/" + appName).setPool(1, 100);
OrientGraph graph = graphFactory.getTx();
Is there any way to achieve this in OrientDB?
Thanks!
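For reference, a minimal sketch of how the next value could be read from the sequence itself via OSequence.next() inside each pooled graph instance. The sequence name "counterSeq", the connection URL and the vertex creation are illustrative assumptions, not taken from the question:
import com.orientechnologies.orient.core.metadata.sequence.OSequence;
import com.orientechnologies.orient.core.metadata.sequence.OSequenceLibrary;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientVertex;

public class SequenceCounterSketch {
    public static void main(String[] args) {
        // illustrative connection settings
        OrientGraphFactory factory = new OrientGraphFactory("remote:localhost:2424/appName")
                .setPool(1, 100);
        OrientGraph graph = factory.getTx();
        try {
            // the sequence is assumed to have been created once, as in the question
            OSequenceLibrary sequences = graph.getRawGraph().getMetadata().getSequenceLibrary();
            OSequence counterSeq = sequences.getSequence("counterSeq");

            for (int i = 1; i <= 100; i++) {
                OrientVertex v = graph.addVertex(null);
                // next() asks the sequence for a fresh value, rather than storing
                // the sequence name itself as the property value
                v.setProperty("counter", counterSeq.next());
                graph.commit();
            }
        } finally {
            graph.shutdown();
        }
    }
}
Whether ORDERED sequences hand out strictly gap-free values to concurrent remote clients is a separate question; the sketch only shows where next() would sit in the loop.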

Related

creating edges by "joining" 2 classes

In OrientDB 3.0 RC1 (TinkerPop/Gremlin community edition) I have 2 vertex classes:
- 'Company' with 1 property 'address'
- 'Address' with 1 property 'address', with a UNIQUE_HASH_INDEX on this property
I need to create edges of class 'Location' from each 'Company' vertex to the corresponding 'Address' vertex based on having the same 'address' property value.
First I tried using Gremlin the following way:
g.V().hasLabel("Company").as("a").
V().hasLabel("Address").as("b").
where("a", eq("b")).by("address").
addE("Location").next()
But the mid-traversal is not hitting the index... I guess the OrientDB-Gremlin implementation is not complete yet, or my query above is not good.
Then I converted the above to use a sideEffect():
g.V().hasLabel("Company").sideEffect{g.V().hasLabel("Address").has("address",it.get().property('address').value()).addE('Location').from(it.get()).next()}
but after quickly adding around 1k edges the query abruptly aborts, and OrientDB logs a lot of warnings like this:
"WARNI {db=ter1050} This database instance has 1280 open command/query result sets, please make sure you close them with OResultSet.close()"
So once again... my query has a problem or I hit a bug.
I didn't find a way to do it in OrientDB SQL either.
I know this can be done using the TinkerPop API in Java, but I was hoping for something simpler.
Try this:
OrientDB orientDB = new OrientDB("remote:localhost/", "<username>", "<password>", OrientDBConfig.defaultConfig());
ODatabaseDocument db = orientDB.open("<db name>", "<username>", "<password>");
OResultSet result = db.command("select from Company");
OResultSet address_class = db.command("select from Address");
List<OResult> company = new ArrayList<OResult>();
List<OResult> address = new ArrayList<OResult>();
while (result.hasNext()) {
    company.add(result.next());
}
while (address_class.hasNext()) {
    address.add(address_class.next());
}
// close the result sets once they are consumed, otherwise OrientDB
// warns about open command/query result sets
result.close();
address_class.close();
for (int i = 0; i < company.size(); i++) {
    String company_address = company.get(i).getProperty("address");
    for (int j = 0; j < address.size(); j++) {
        String addresses = address.get(j).getProperty("address");
        if (company_address.equals(addresses)) {
            // the record ids are available through getIdentity()
            ORID company_rid = company.get(i).getIdentity().get();
            ORID addresses_rid = address.get(j).getIdentity().get();
            db.command("create edge Location from " + company_rid + " to " + addresses_rid).close();
        }
    }
}
db.close();
orientDB.close();
Hope it helps
Regards
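A possible variant of the above, sketched under the same 3.0 API, that lets the UNIQUE_HASH_INDEX on Address.address do the matching instead of the nested in-memory loop. The class, property and placeholder credentials are taken from the question and the answer above; everything else (server URL, class name of the sketch) is an assumption:
import com.orientechnologies.orient.core.db.ODatabaseSession;
import com.orientechnologies.orient.core.db.OrientDB;
import com.orientechnologies.orient.core.db.OrientDBConfig;
import com.orientechnologies.orient.core.sql.executor.OResult;
import com.orientechnologies.orient.core.sql.executor.OResultSet;

public class LinkCompaniesToAddresses {
    public static void main(String[] args) {
        OrientDB orientDB = new OrientDB("remote:localhost", "<username>", "<password>",
                OrientDBConfig.defaultConfig());
        ODatabaseSession db = orientDB.open("<db name>", "<username>", "<password>");
        try (OResultSet companies = db.query("SELECT FROM Company")) {
            while (companies.hasNext()) {
                OResult company = companies.next();
                String addr = company.getProperty("address");
                // equality lookup on the indexed property, one small query per company
                try (OResultSet matches = db.query("SELECT FROM Address WHERE address = ?", addr)) {
                    while (matches.hasNext()) {
                        db.command("CREATE EDGE Location FROM " + company.getIdentity().get()
                                + " TO " + matches.next().getIdentity().get()).close();
                    }
                }
            }
        }
        db.close();
        orientDB.close();
    }
}
The try-with-resources blocks and the .close() on the CREATE EDGE result are there precisely to avoid the "open command/query result sets" warning quoted in the question.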

Database for numerical data from physics simulation

I work in theoretical physics and I do a lot of computer simulations. An important part of my duty is the analysis of the results. I make simulations and store the numerical results in a file with some simple name. Typically I have a lot of data files with very similar names, and after a while I do not remember what parameters a file corresponds to. I was thinking that maybe there exists a better way to store numerical results from a simulation, e.g. some database (SQL, MongoDB, etc.) where I could put some comments about the parameters of the program, names, dates, etc. - a sort of library of numerical data, so I just have everything in one place, well organized. Do you know of anything like this? How do you store your numerical data from computer simulations?
More details
A typical procedure looks like this. Let's say we want to simulate the time evolution of the three-body problem. We have three bodies of different masses interacting through Newtonian forces. I want to test how these objects move in space depending on the relative mass values and the initial positions - 6 parameters. I run the simulation for one choice of parameters and save it in a file: three_body_m1=0p1_m2=0p3_(the rest).dat - all double precision, in total 1+3*3 (3D) columns of data in one file. Then I launch gnuplot, Python, etc. and visualize them. In principle there is no relation between the data from different simulations, but I can use them to make comparison plots.
Within the same Node.js context, you can:
- Stream a big xyz data file to the server using the socket.io-stream + fs modules and save the filename + parameters to a database using the mongodb module (max 1 page of coding, but more for complex server talking).
- If the data fits in RAM and you don't have to save it immediately, you can use the redis module to send everything to the server cache easily (as key-value pairs such as data -> xyzData, parameters -> simulationParameters and user -> name_surname) and read it back at high speed. If other processes on the server need the data as a file, you can stream to a ramdisk instead and get most of the RAM bandwidth as a file cache (needs more RAM of course, but fast).
- mongodb is slow (even with optimizations) for saving millions of particles' xyz data, but it is the easiest and quickest install for parameter saving and sharing.
Using all of them could be better:
- Saving: stream the file to a physical disk using socket.io-stream and fs. Send the parameters to mongodb.
- Loading: check redis to see if the user is registered and whether the data is in the cache; if yes, get it; if no, stream it from the physical disk and also save some of it to redis at the same time.
- Editing: check if the cache entry exists; if yes, edit it (another server-side process can update the physical disk from that cache); if no, update the physical disk directly.
The communication scheme could be:
- The data server talks to the cache server to see if there are any pending writes/reads/edits and consumes jobs from there.
- The compute server talks to the cache server to produce read/write/edit jobs or to consume compute jobs.
- Clients can talk to the cache server for reading only.
- Admins can also place their own data, produce compute jobs, or read stuff.
- The compute server, data server and cache server can easily sit on the same computer, or be moved to other computers thanks to Node.js's awesomeness and its countless modules such as redis, socket.io-stream, fs, ejs, express (for the clients, for example), etc.
- A cache server can offload some data to another cache server and keep a redirection to it (or some mapping of the data to it).
- A cache server can talk to N data servers and M compute servers at the same time, as long as RAM holds.
Do you have a slow network? You can use the gzip module to compress the data on the fly with just 3-5 lines of extra code (at both ends).
You don't have money?
- Node.js works on a Raspberry Pi (as a data server, maybe?).
- An Nvidia GTX 660 can work with an Intel Galileo (compute server?) using Node.js with some extra native modules for OpenCL (could be hard to implement; also, connecting (and powering) the GPU and the Galileo may not be easy, but it should be much faster than a cluster of Raspberry Pi boards for fp32 number crunching).
- Bypass the cache; RAM is expensive for now.
data server cluster        client   client
          \                   \     /
           \                   \   /
    mainframe cache and database server ----- compute cluster
            |                    \
            |                     \
    support cache server         admin
A very simple example to send some files to another computer (or the same one):
var pipeline_n = 8;
var fs = require("fs");

// server part accepting files
{
    var io = require('socket.io').listen(80);
    var ss = require('socket.io-stream');
    var path = require('path');
    var ctr = 0;
    var ctr2 = 0;
    io.of('/user').on('connection', function (socket) {
        var z1 = new Date();
        for (var i = 0; i < pipeline_n; i++) {
            ss(socket).on('data' + i.toString(), function (stream, data) {
                var t1 = new Date();
                stream.pipe(fs.createWriteStream("m://bench_server" + ctr + ".txt"));
                ctr++;
                stream.on("finish", function (p) {
                    var len = stream._readableState.pipes.bytesWritten;
                    var t2 = new Date();
                    ctr2++;
                    if (ctr2 == pipeline_n) {
                        var z2 = new Date();
                        console.log(len * pipeline_n);
                        console.log((z2 - z1));
                        console.log("throughput: " + ((len * pipeline_n) / ((z2 - z1) / 1000.0)) / (1024 * 1024) + " MB/s");
                    }
                });
            });
        }
    });
}

// client or another server part sending a file
// (you can change it to do parts of the same file instead of the same file n times),
// just a dummy file sending code to stress the other server
for (var i = 0; i < pipeline_n; i++) {
    var io = require('socket.io-client');
    var ss = require('socket.io-stream');
    var socket = io.connect('http://127.0.0.1/user');
    var stream = ss.createStream();
    var filename = 'm://bench.txt'; // ramdrive or cluster of hdd raid
    ss(socket).emit('data' + i.toString(), stream, { name: filename });
    fs.createReadStream(filename).pipe(stream);
}
Here is a test of insert vs bulk-insert performance for mongodb (this could be a wrong way to benchmark, but it is simple; just uncomment the part you want to benchmark):
var mongodb = require('mongodb');
var client = mongodb.MongoClient;
var url = 'mongodb://localhost:2019/evdb2';
client.connect(url, function (err, db) {
    if (err) {
        console.log('fail:', err);
    } else {
        console.log('success:', url);
        var collection = db.collection('tablo');
        var bulk = collection.initializeUnorderedBulkOp();
        // close here only while both benchmarks below are commented out;
        // remove this line when you uncomment one of them
        db.close();

        //// benchmark insert
        //var t = 0;
        //t = new Date();
        //var ctr = 0;
        //for (var i = 0; i < 1024 * 64; i++) {
        //    collection.insert({ x: i + 1, y: i, z: i * 10 }, function (e, r) {
        //        ctr++;
        //        if (ctr == 1024 * 64) {
        //            var t2 = 0;
        //            db.close();
        //            t2 = new Date();
        //            console.log("insert-64k: " + 1000.0 / ((t2.getTime() - t.getTime()) / (1024 * 64)) + " insert/s");
        //        }
        //    });
        //}

        //// benchmark bulk insert
        //var t3 = new Date();
        //for (var i = 0; i < 1024 * 64; i++) {
        //    bulk.insert({ x: i + 1, y: i, z: i * 10 });
        //}
        //bulk.execute();
        //var t4 = new Date();
        //console.log("bulk-insert-64k: " + 1000.0 / ((t4.getTime() - t3.getTime()) / (1024 * 64)) + " insert/s");
        //// db.close();
    }
});
Be sure to set up the mongodb and/or redis servers before this. Also run "npm install module_name" for the necessary modules from the Node.js command prompt.

orientdb clusterselection, how can I distribute my records evenly across nodes

OrientDB version 2.1.11.
I have 3 nodes and I want to distribute my records evenly across the nodes. My code is:
ODatabase database = new ODatabaseDocumentTx("remote:node1;node2;node3/mydb").open("root", "1234");
System.out.println("selection:" + database.get(ODatabase.ATTRIBUTES.CLUSTERSELECTION));
database.command(new OCommandSQL("alter class Person clusterSelection round-robin"));
System.out.println("selection:" + database.get(ODatabase.ATTRIBUTES.CLUSTERSELECTION));
for (int i = 0; i < 100; ++i) {
    ODocument document = new ODocument("Person");
    document.field("name", "pengtao.geng" + i);
    document.field("age", 28 + i);
    document.save();
    System.out.println("save " + i);
}
database.close();
However, it doesn't work. I tried to modify the class clusterSelection via Studio, and it doesn't work either.
How can I do this? Could you give me an example? Thank you very much.
In v2.2 OrientDB supports load balancing: http://orientdb.com/docs/2.1/Distributed-Configuration.html#load-balancing. You can use the ROUND_ROBIN_REQUEST strategy with this code (the servers must be in a cluster):
ODatabase database = new ODatabaseDocumentTx("remote:node1;node2;node3/mydb").open("root", "1234");
database.setProperty(OStorageRemote.PARAM_CONNECTION_STRATEGY,
        OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_REQUEST);
for (int i = 0; i < 100; ++i) {
    ODocument document = new ODocument("Person");
    document.field("name", "pengtao.geng" + i);
    document.field("age", 28 + i);
    document.save();
    System.out.println("save " + i);
}
database.close();

Log Fiddler Requests to Database Real-time

Is there any way to log all requests ongoing to a database or can you only log snapshots to a database?
The following example relies upon the Jet OLEDB 4.0 provider, which is not available to 64-bit processes. You can either select another data provider (e.g. SQL Server) or force Fiddler to run in 32-bit mode.
Add the following to the Rules file to create a new menu item.
// Log the currently selected sessions in the list to a database.
// Note: The DB must already exist and you must have permissions to write to it.
public static ToolsAction("Log Selected Sessions")
function DoLogSessions(oSessions: Fiddler.Session[]){
    if (null == oSessions || oSessions.Length < 1){
        MessageBox.Show("Please select some sessions first!");
        return;
    }
    var strMDB = "C:\\log.mdb";
    var cnn = null;
    var sdr = null;
    var cmd = null;
    try
    {
        cnn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + strMDB);
        cnn.Open();
        cmd = new OleDbCommand();
        cmd.Connection = cnn;
        for (var x = 0; x < oSessions.Length; x++){
            var strSQL = "INSERT into tblSessions ([ResponseCode],[URL]) Values (" +
                oSessions[x].responseCode + ", '" + oSessions[x].url + "')";
            cmd.CommandText = strSQL;
            cmd.ExecuteNonQuery();
        }
    }
    catch (ex){
        MessageBox.Show(ex);
    }
    finally
    {
        if (cnn != null ){
            cnn.Close();
        }
    }
}
Note: To use the Database Objects in Fiddler 2.3.9 and below, you'll need to add system.data to the References list inside Tools | Fiddler Options | Extensions | Scripting. In 2.3.9.1 and later, this reference is added automatically.
Then, list the new import at the top of your rules script:
import System.Data.OleDb;
See the FiddlerScript CookBook.

Devart Oracle Entity Framework 4.1 performance

I want to know why code fragment 1 is faster than code fragment 2 when using POCOs with Devart dotConnect for Oracle.
I tried it with 100000 records and Code 1 is way faster than Code 2. Why? I thought "SaveChanges" would clear the buffer, making it faster as there is only 1 connection. Am I wrong?
Code 1:
for (var i = 0; i < 100000; i++)
{
    using (var ctx = new MyDbContext())
    {
        MyObj obj = new MyObj();
        obj.Id = i;
        obj.Name = "Foo " + i;
        ctx.MyObjects.Add(obj);
        ctx.SaveChanges();
    }
}
Code 2:
using (var ctx = new MyDbContext())
{
    for (var i = 0; i < 100000; i++)
    {
        MyObj obj = new MyObj();
        obj.Id = i;
        obj.Name = "Foo " + i;
        ctx.MyObjects.Add(obj);
        ctx.SaveChanges();
    }
}
The first code snippet works faster because the same connection is taken from the pool every time, so there are no performance losses from re-opening it.
In the second case, 100000 objects are gradually added to the context. Slow snapshot-based change tracking is used (if there are no dynamic proxies). This means that on each SaveChanges() the provider must detect whether any of the cached objects have changed, so more and more time is spent on each subsequent iteration.
We recommend trying the following approach. It should perform better than the ones mentioned:
using (var ctx = new MyDbContext())
{
    for (var i = 0; i < 100000; i++)
    {
        MyObj obj = new MyObj();
        obj.Id = i;
        obj.Name = "Foo " + i;
        ctx.MyObjects.Add(obj);
    }
    ctx.SaveChanges();
}
EDIT
If you use an approach that executes a large number of operations within one SaveChanges(), it is also useful to configure the Entity Framework behaviour of the Devart dotConnect for Oracle provider:
// Turn on the Batch Updates mode:
var config = OracleEntityProviderConfig.Instance;
config.DmlOptions.BatchUpdates.Enabled = true;
// If necessary, enable the mode of re-using parameters with the same values:
config.DmlOptions.ReuseParameters = true;
// If the object has a lot of nullable properties and a significant part of them are not set (i.e., null), omitting the explicit insert of NULL values will greatly decrease the size of the generated SQL:
config.DmlOptions.InsertNullBehaviour = InsertNullBehaviour.Omit;
Only some of the options are mentioned here. The full list is available in our article:
http://www.devart.com/blogs/dotconnect/index.php/new-features-of-entity-framework-support-in-dotconnect-providers.html
"Am I wrong to assume that when SaveChanges() is called, all the objects in cache are stored to DB and the cache is cleared, so each loop is independent?"
SaveChanges() sends and commits all changes to the database, but change tracking continues for all entities that are attached to the context. And a new SaveChanges(), if snapshot-based change tracking is used, will start a long process of checking (changed or not?) the values of each property of each object.