how read-through work in ignite - persistence

my cache is empty so sql queries return null.
The read-through means that if the cache is missed, then Ignite will automatically get down to the underlying db(or persistent store) to load the corresponding data.
If there are new data inserted into the underlying db table ,i have to down cache server to load the newly inserted data from the db table automatically or it will sync automatically ?
Is work same as Spring's #Cacheable or work differently.
It looks to me that the answer is no. Cache SQL query don't work as no data in cache but when i tried cache.get in i got following results :
case 1:
System.out.println("data == " + cache.get(new PersonKey("Manish", "Singh")).getPhones());
result ==> data == 1235
case 2 :
PersonKey per = new PersonKey();
per.setFirstname("Manish");
System.out.println("data == " + cache.get(per).getPhones());
throws error:- as following
error image, image2

Read-through semantics can be applied when there is a known set of keys to read. This is not the case with SQL, so in case your data is in an arbitrary 3rd party store (RDBMS, Cassandra, HBase, ...), you have to preload the data into memory prior to running queries.
However, Ignite provides native persistence storage [1] which eliminates this limitation. It allows to use any Ignite APIs without having anything in memory, and this includes SQL queries as well. Data will be fetched into memory on demand while you're using it.
[1] https://apacheignite.readme.io/docs/distributed-persistent-store

When you insert something into the database and it is not in the cache yet, then get operations will retrieve missing values from DB if readThrough is enabled and CacheStore is configured.
But currently it doesn't work this way for SQL queries executed on cache. You should call loadCache first, then values will appear in the cache and will be available for SQL.
When you perform your second get, the exact combination of name and lastname is sought in DB. It is converted into a CQL query containing lastname=null condition, and it fails, because lastname cannot be null.
UPD:
To get all records that have firstname column equal to 'Manish' you can first do loadCache with an appropriate predicate and then run an SQL query on cache.
cache.loadCache((k, v) -> v.lastname.equals("Manish"));
SqlFieldsQuery qry = new SqlFieldsQuery("select firstname, lastname from Person where firstname='Manish'");
try (FieldsQueryCursor<List<?>> cursor = cache.query(qry)) {
for (List<?> row : cursor)
System.out.println("firstname:" + row.get(0) + ", lastname:" + row.get(1));
}
Note that loadCache is a complex operation and requires to run over all records in the DB, so it shouldn't be called too often. You can provide null as a predicate, then all records will be loaded from the database.
Also to make SQL run fast on cache, you should mark firstname field as indexed in QueryEntity configuration.

In your case 2, have you tried specifying lastname as well? By your stack trace it's evident that Cassandra expects it to be not null.

Related

MyBatis API RowBounds Class does full scan of the DB?

When the Class RowBounds in MyBatis API gets data from DB, does it do full scan and then cut the row that is set up by limit and offset parameters? or does it only get the data bound?
If the SQL query contains offset and limit/fetch first n rows only then the resultset will return only data within bounds. Bounds are applied on DB side. OFFSET 10000 LIMIT 20 will produces a (maximum) 20 records resultset.
This is likely what you need.
Rowbound does not alter the SQL query and operates independently. Mybatis works with whole Resultset returned by the DB.
e.g.: RowBounds(10000, 20) will skip first 10000 records of the resultset, then fetch 20 records and stop. But the result size may be MAX_INT.
It does retrieve full data from database however return only the requested number of records into the program. So, no worries for OutOfMemory but query will take long on database side.
Hibernate and Eclipselink, on the other hand pass on the given limit count onto the database and retrieves only required number of records from database. Hibernate achieves this by using database vendor specific SQL construct in its generated SQL. Ex - LIMIT clause in MS-SQL, Rownum for Oracle.
If you want to achieve the same in mybatis, you need to use these constructs yourselves.
It is easy and you can use mybatis conditions to make the SQL specific to any database.

EF 6 Caching withing a context

I have a single DbContext.. First I do:
var all = context.MySet.Where(c=>c.X == 1).ToList();
later (with the same context instance)
var special = context.MySet.Where(c=>(c.X == 1) && (c.Y===1).ToList();
The database is hit AGAIN! Since the first query is guaranteed
to return all of the elements that will exist in the second, why is the DB being hit again?
If you wish to avoid hitting the database again then you could try this;
var special = all.Where(c=>(c.X == 1) && (c.Y===1).ToList();
Since the list of all objects already contains everything you want you can just query that list and the database won't get hit again.
Your link expression is just a query, it only retrieves data when you enumerate it (for example calling .ToList()). You can keep changing the query and hold off actually getting the data until you need it. The entity framework will convert your query into an SQL query in the background and then fetch data.
Avoid writing "ToList()" at the end of every query as this forces the EF to hit the database.
If you only ever what to hit the database once then get the data you need by calling "ToList(), To.Array etc and then work with that collection (in your case the "all" collection) since this is the object holding all the data.

ResultSet exhausted error is coming while accessing value of cassandra table through scala

I had written a simple code to connect to cassandra and fetch the table value in scala .Here is my code
def t()={
var cluster:Cluster=connect("127.0.0.1")
println(cluster)
var session:Session=cluster.connect("podv1");
// var tx= session.execute("CREATE KEYSPACE myfirstcassandradb WITH " +
// "replication = {'class':'SimpleStrategy','replication_factor':1}")
var p=session.execute("select data from section_tbl limit 2;")
session.close()
cluster.close
}
Although create keyspace thing is working(which i had commented) but when i am trying to fetch the data from table it is giving error like
ResultSet[ exhausted: false, Columns[data(varchar)]] .I just started connecting scala to cassandra so may be i am doing some mistake or something.Any help regarding how to get data will be appreciable.
it is giving error like ResultSet[ exhausted: false, Columns[data(varchar)]]
This is not an error, ResultSet is the object returned from execute. exhausted indicates that not all the rows have been consumed by you yet, and Columns indicates the name of each column and its type.
So for example in your case of 'p' being the variable assigned to session.execute, you can do something like:
var p = session.execute("select data from section_tbl limit 2;")
println(p.all())
all() the grabs all the rows associated with your query from Cassandra and captures it as a List[Row]. If you have a really large result set (which in your case you don't since you are limiting to two rows), you may want to iterate of the ResultSet instead, which will allow the driver to use paging to retrieve chunks of Rows at a time from cassandra.
A ResultSet implements java.util.Iteratable<Row>, so you can go all the operations you can with iterable. Refer to the ResultSet api for more operations.

C# Comparing lists of data from two separate databases using LINQ to Entities

I have 2 SQL Server databases, hosted on two different servers. I need to extract data from the first database. Which is going to be a list of integers. Then I need to compare this list against data in multiple tables in the second database. Depending on some conditions, I need to update or insert some records in the second database.
My solution:
(WCF Service/Entity Framework using LINQ to Entities)
Get the list of integers from 1st db, takes less than a second gets 20,942 records
I use the list of integers to compare against table in the second db using the following query:
List<int> pastDueAccts; //Assuming this is the list from Step#1
var matchedAccts = from acct in context.AmAccounts
where pastDueAccts.Contains(acct.ARNumber)
select acct;
This above query is taking so long that it gives a timeout error. Even though the AmAccount table only has ~400 records.
After I get these matchedAccts, I need to update or insert records in a separate table in the second db.
Can someone help me, how I can do step#2 more efficiently? I think the Contains function makes it slow. I tried brute force too, by putting a foreach loop in which I extract one record at a time and do the comparison. Still takes too long and gives timeout error. The database server shows only 30% of the memory has been used.
Profile the sql query being sent to the database by using SQL Profiler. Capture the SQL statement sent to the database and run it in SSMS. You should be able to capture the overhead imposed by Entity Framework at this point. Can you paste the SQL Statement emitted in step #2 in your question?
The query itself is going to have all 20,942 integers in it.
If your AmAccount table will always have a low number of records like that, you could just return the entire list of ARNumbers, compare them to the list, then be specific about which records to return:
List<int> pastDueAccts; //Assuming this is the list from Step#1
List<int> amAcctNumbers = from acct in context.AmAccounts
select acct.ARNumber
//Get a list of integers that are in both lists
var pastDueAmAcctNumbers = pastDueAccts.Intersect(amAcctNumbers);
var pastDueAmAccts = from acct in context.AmAccounts
where pastDueAmAcctNumbers.Contains(acct.ARNumber)
select acct;
You'll still have to worry about how many ids you are supplying to that query, and you might end up needing to retrieve them in batches.
UPDATE
Hopefully somebody has a better answer than this, but with so many records and doing this purely in EF, you could try batching it like I stated earlier:
//Suggest disabling auto detect changes
//Otherwise you will probably have some serious memory issues
//With 2MM+ records
context.Configuration.AutoDetectChangesEnabled = false;
List<int> pastDueAccts; //Assuming this is the list from Step#1
const int batchSize = 100;
for (int i = 0; i < pastDueAccts.Count; i += batchSize)
{
var batch = pastDueAccts.GetRange(i, batchSize);
var pastDueAmAccts = from acct in context.AmAccounts
where batch.Contains(acct.ARNumber)
select acct;
}

Is it correct to scan a table in MySQL using "SELECT * .. LiMIT start, count" without an ORDER BY clause?

Suppose Table X has a 100 tuples.
Will the following approach to scanning X generate all the tuples in TABLE X, in MySQL?
for start in [0, 10, 20, ..., 90]:
print results of "select * from X LIMIT start, 10;"
I ask, because I've been using PostgreSQL, which clearly says that this approach need not work, but there seems to be no such info for MySQL. If it won't, is there a way to return results in a fixed ordering without knowing any other info about the table (like what the primary key fields are)?
I need to scan each tuple in a table in an application, and I want a way to do it without using too much memory in the application (so simply doing a "select * from X" is out).
No, that isn't a safe assumption. Without an ORDER BY clause, there is no guaranteeing that your query will return unique results each time. If this table is properly indexed, adding an ORDER BY (for the index) shouldn't be too expensive.
Edit: Non-ORDER BYed results will sometimes be in the order of the clustered index, but I wouldn't put any money on that!
If you are using Innodb or MyISAM table types, a better approach is to use the HANDLER interface. Only MySQL supports this, but it does what you want:
http://dev.mysql.com/doc/refman/5.0/en/handler.html
Also, the MySQL API supports two modes of retrieving data from the server:
store result: in this mode, as soon as a query is executed, the API retrieves the entire result set before returning to the user code. This can use up a lot of client memory buffering results, but minimises the use of resources on the server.
use result: in this mode, the API pulls results row-by-row and returns control to the user code more frequently. This minimises the use of memory on the client, but can hold locks on the server for longer.
Most of the MySQL APIs for various languages support this in oneform or another. It is usually an argument that can be supplied as when creating the connection, and / or a separate call that can be used against an existing connection to switch it to that mode.
So, in answer to your question - I would do the following:
set the connection to "use result" mode;
select * from X