Fetching Cassandra row keys

Assume a Cassandra datastore with 20 rows, with row keys named "r1" .. "r20".
Questions:
How do I fetch the row keys of the first ten rows (r1 to r10)?
How do I fetch the row keys of the next ten rows (r11 to r20)?
I'm looking for the Cassandra analogy to:
SELECT row_key FROM table LIMIT 0, 10;
SELECT row_key FROM table LIMIT 10, 10;

Take a look at:
list<KeySlice> get_range_slices(keyspace, column_parent, predicate, range, consistency_level)
where your KeyRange tuple is (start_key, end_key) == (r1, r10). A KeyRange also has a count field, so you can page through keys ten at a time.
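As a sketch of how that call can page through keys ten at a time with the raw Thrift API (assuming an open, keyspace-bound Cassandra.Client named client; the column family name is made up). Note that with the default RandomPartitioner the keys come back in token order rather than r1..r20 order, so paging by count plus the last-seen key is the reliable approach:
// Imports assumed: java.nio.ByteBuffer, java.util.List, org.apache.cassandra.thrift.*
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(
        ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 1));
ColumnParent parent = new ColumnParent("MyColumnFamily"); // hypothetical name

// First ten keys: open-ended range, count = 10.
KeyRange first = new KeyRange();
first.setStart_key(new byte[0]);
first.setEnd_key(new byte[0]);
first.setCount(10);
List<KeySlice> page1 = client.get_range_slices(parent, predicate, first, ConsistencyLevel.ONE);

// Next ten: resume from the last key seen. The start key is inclusive,
// so request one extra and drop the first result.
KeyRange next = new KeyRange();
next.setStart_key(page1.get(page1.size() - 1).getKey());
next.setEnd_key(new byte[0]);
next.setCount(11);
List<KeySlice> page2 = client.get_range_slices(parent, predicate, next, ConsistencyLevel.ONE);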

Based on my tests, the rows come back in no meaningful order (unlike columns). CQL 3.0.0 can retrieve row keys, but not distinct ones (there may be a way that I do not know of). In my case I do not know what my key range is, so I tried retrieving all the keys with both Hector and Thrift and sorting them afterwards. In a performance test with 100,000 columns across 200 rows, CQL 3.0.0 took about 500 ms, Hector around 100 ms, and Thrift about 50 ms. My row key here is an integer. The Hector code follows:
public void queryRowkeys() {
    myCluster = HFactory.getOrCreateCluster(CLUSTER_NAME, "127.0.0.1:9160");
    ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
    ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
    myKeyspace = HFactory.createKeyspace(KEYSPACE_NAME, myCluster, ccl);
    RangeSlicesQuery<Integer, Composite, String> rangeSlicesQuery = HFactory.createRangeSlicesQuery(
            myKeyspace, IntegerSerializer.get(), CompositeSerializer.get(), StringSerializer.get());
    long start = System.currentTimeMillis();
    QueryResult<OrderedRows<Integer, Composite, String>> result =
            rangeSlicesQuery.setColumnFamily(CF).setKeys(0, -1).setReturnKeysOnly().execute();
    OrderedRows<Integer, Composite, String> orderedRows = result.get();
    ArrayList<Integer> list = new ArrayList<Integer>();
    for (Row<Integer, Composite, String> row : orderedRows) {
        list.add(row.getKey());
    }
    System.out.println((System.currentTimeMillis() - start));
    Collections.sort(list);
    for (Integer i : list) {
        System.out.println(i);
    }
}
This is the Thrift code:
public void retrieveRows() {
    try {
        transport = new TFramedTransport(new TSocket("localhost", 9160));
        TProtocol protocol = new TBinaryProtocol(transport);
        client = new Cassandra.Client(protocol);
        transport.open();
        client.set_keyspace("prefdb");
        ColumnParent columnParent = new ColumnParent("events");
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 1));
        KeyRange keyRange = new KeyRange(); // get all keys
        keyRange.setStart_key(new byte[0]);
        keyRange.setEnd_key(new byte[0]);
        long start = System.currentTimeMillis();
        List<KeySlice> keySlices = client.get_range_slices(columnParent, predicate, keyRange, ConsistencyLevel.ONE);
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (KeySlice ks : keySlices) {
            list.add(ByteBuffer.wrap(ks.getKey()).getInt());
        }
        Collections.sort(list);
        System.out.println((System.currentTimeMillis() - start));
        for (Integer i : list) {
            System.out.println(i);
        }
        transport.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

First, modify cassandra.yaml (this was with Cassandra 1.1.0) to use an order-preserving partitioner:
partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
Second, define the keyspace and column family:
create keyspace DEMO with placement_strategy =
    'org.apache.cassandra.locator.SimpleStrategy' and
    strategy_options = [{replication_factor:1}];
use DEMO;
create column family Users with comparator = AsciiType and
    key_validation_class = LongType and
    column_metadata = [
        {
            column_name: aaa,
            validation_class: BytesType
        },
        {
            column_name: bbb,
            validation_class: BytesType
        },
        {
            column_name: ccc,
            validation_class: BytesType
        }
    ];
Finally, you can insert data into Cassandra and run range queries over the row keys, since ByteOrderedPartitioner stores rows in key order.
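As a rough Hector sketch of such a key-range query against the Users column family defined above (assuming a keyspace object bound to DEMO, as in the earlier Hector snippet; the bounds 1..10 are illustrative):
RangeSlicesQuery<Long, String, byte[]> query = HFactory.createRangeSlicesQuery(
        keyspace, LongSerializer.get(), StringSerializer.get(), BytesArraySerializer.get());
QueryResult<OrderedRows<Long, String, byte[]>> result = query
        .setColumnFamily("Users")
        .setKeys(1L, 10L) // a bounded key range is meaningful under ByteOrderedPartitioner
        .setReturnKeysOnly()
        .execute();
for (Row<Long, String, byte[]> row : result.get()) {
    System.out.println(row.getKey());
}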


Group By with Entity Framework

I have the following code, and I need to group the rows by name.
// <date, <partid, amount>>
Dictionary<string, Dictionary<int, double>> emSpending = new Dictionary<string, Dictionary<int, double>>();
foreach (Orders order in db.Orders.ToList())
{
    foreach (OrderItems orderitem in order.OrderItems.ToList())
    {
        if (!emSpending.ContainsKey(order.Date.ToString("yyyy-MM")))
            emSpending.Add(order.Date.ToString("yyyy-MM"), new Dictionary<int, double>());
        if (!emSpending[order.Date.ToString("yyyy-MM")].ContainsKey(Convert.ToInt32(orderitem.PartID)))
            emSpending[order.Date.ToString("yyyy-MM")].Add(Convert.ToInt32(orderitem.PartID), 0);
        emSpending[order.Date.ToString("yyyy-MM")][Convert.ToInt32(orderitem.PartID)] += Convert.ToDouble(orderitem.Amount);
    }
}
DataGridViewColumn col1 = new DataGridViewColumn();
col1.CellTemplate = new DataGridViewTextBoxCell();
col1.Name = "Department";
col1.AutoSizeMode = DataGridViewAutoSizeColumnMode.Fill;
col1.HeaderText = "Department";
dgvEMSpending.Columns.Add(col1);
foreach (string date in emSpending.Keys)
{
    DataGridViewColumn col = new DataGridViewColumn();
    col.Name = date;
    col.HeaderText = date;
    col.AutoSizeMode = DataGridViewAutoSizeColumnMode.Fill;
    col.CellTemplate = new DataGridViewTextBoxCell();
    dgvEMSpending.Columns.Add(col);
}
List<string> allKey = emSpending.Keys.ToList();
foreach (string date in allKey)
    if (date == "Department") continue;
    else
    {
        dgvEMSpending.Rows.Add();
        foreach (int partid in emSpending[date].Keys)
        {
            dgvEMSpending.Rows[dgvEMSpending.Rows.Count - 1].Cells[0].Value = db.Parts.Where(x => x.ID == partid).SingleOrDefault().Name.GroupBy(Name);
            for (int i = 1; i < dgvEMSpending.Columns.Count; i++)
            {
                if (!emSpending.ContainsKey(dgvEMSpending.Columns[i].Name))
                    emSpending.Add(dgvEMSpending.Columns[i].Name, new Dictionary<int, double>());
                if (!emSpending[dgvEMSpending.Columns[i].Name].ContainsKey(partid))
                    emSpending[dgvEMSpending.Columns[i].Name].Add(partid, 0);
                double val = emSpending[dgvEMSpending.Columns[i].Name][partid];
                dgvEMSpending.Rows[dgvEMSpending.RowCount - 1].Cells[i].Value = val;
            }
        }
    }
I tried to use GroupBy myself, but something doesn't work: it just outputs the same names repeatedly, and I want them grouped. Please help.
Ok, a few issues to help you out first. This code:
foreach (Orders order in db.Orders.ToList())
{
    foreach (OrderItems orderitem in order.OrderItems.ToList())
    {
        if (!emSpending.ContainsKey(order.Date.ToString("yyyy-MM")))
            emSpending.Add(order.Date.ToString("yyyy-MM"), new Dictionary<int, double>());
        if (!emSpending[order.Date.ToString("yyyy-MM")].ContainsKey(Convert.ToInt32(orderitem.PartID)))
            emSpending[order.Date.ToString("yyyy-MM")].Add(Convert.ToInt32(orderitem.PartID), 0);
        emSpending[order.Date.ToString("yyyy-MM")][Convert.ToInt32(orderitem.PartID)] += Convert.ToDouble(orderitem.Amount);
    }
}
Right off the bat this is going to trip lazy loading on OrderItems. If you have 10 orders (IDs 1-10), you're going to run 11 queries against the database:
SELECT * FROM Orders;
SELECT * FROM OrderItems WHERE OrderId = 1;
SELECT * FROM OrderItems WHERE OrderId = 2;
// ...
SELECT * FROM OrderItems WHERE OrderId = 10;
Now if you have 100 orders or 1000 orders, you should see the problem. At a minimum ensure that if you are touching a collection or reference on entities you are loading, eager load it with Include:
foreach (Orders order in db.Orders.Include(x => x.OrderItems).ToList())
This will run a single query that fetches the Orders and their OrderItems. However, if you have a LOT of rows this is going to take a while and consume a LOT of memory.
The next tip is "only load what you need". You need one field from Order and two fields from OrderItem, so why load everything from both tables?
var orderItemDetails = db.Orders
    .SelectMany(o => o.OrderItems.Select(oi => new { o.Date, oi.PartId, oi.Amount }))
    .ToList();
This would give us just the Order date, and each Part ID and Amount. Now that this data is in memory we can group it to populate your desired dictionary structure without having to iterate over it row by row.
var emSpending = orderItemDetails
    .GroupBy(x => x.Date.ToString("yyyy-MM"))
    .ToDictionary(g => g.Key,
        g => g.GroupBy(y => y.PartId)
              .ToDictionary(g2 => g2.Key, g2 => g2.Sum(z => z.Amount)));
Depending on the types in your entities you may need to insert casts. This first builds the outer dictionary keyed by the yyyy-MM of the order dates, then groups the remaining data for each date by part ID and sums the Amounts.
Now relating to your question, from your code example I'm guessing the problem area you are facing is this line:
dgvEMSpending.Rows[dgvEMSpending.Rows.Count - 1].Cells[0].Value = db.Parts
.Where(x => x.ID == partid)
.SingleOrDefault().Name.GroupBy(Name);
Now the question is: what exactly are you expecting from this? You are fetching a single Part by ID, so how would you expect that to be "grouped"?
If you want to display the Part name instead of the PartId then I believe you would just want to Select the Part Name:
dgvEMSpending.Rows[dgvEMSpending.Rows.Count - 1].Cells[0].Value = db.Parts
.Where(x => x.ID == partid)
.Select(x => x.Name)
.SingleOrDefault();
We can go one better to fetch the Part names for each used product in one hit using our loaded order details:
var partIds = orderItemDetails
.Select(x=> x.PartId)
.Distinct()
.ToList();
var partDetails = db.Parts
.Where(x => partIds.Contains(x.ID))
.ToDictionary(x => x.ID, x => x.Name);
This fetches a dictionary of part names indexed by ID; it would be built outside the loop, after loading orderItemDetails. Now we don't have to go to the DB for every row:
dgvEMSpending.Rows[dgvEMSpending.Rows.Count - 1].Cells[0].Value = partDetails[partid];

Bulk Operations - Adding multiple rows to sheet

I'm attempting to write data from a DataTable to a sheet via the Smartsheet API (using the C# SDK). I have looked at the documentation and I see that it supports bulk operations, but I'm struggling to find an example of that functionality.
I've attempted a workaround: just loop through each record from my source and post that data.
// Get column properties (column id) for the existing Smartsheet and add them
// to a list for the AddRows parameter. Compare with the column names in the
// DataTable to capture the related column id.
var columnArray = getSheet.Columns;
foreach (var column in columnArray)
{
    foreach (DataColumn columnPdiExtract in pdiExtractDataTable.Columns)
    {
        //Console.WriteLine(columnPdiExtract.ColumnName);
        if (column.Title == columnPdiExtract.ColumnName)
        {
            long columnIdValue = column.Id ?? 0;
            //addColumnArrayIdList.Add(columnIdValue);
            addColumnArrayIdList.Add(new KeyValuePair<string, long>(column.Title, columnIdValue));
        }
    }
}
foreach (var columnTitleIdPair in addColumnArrayIdList)
{
    Console.WriteLine(columnTitleIdPair.Key);
    var results = from row in pdiExtractDataTable.AsEnumerable()
                  select row.Field<Double?>(columnTitleIdPair.Key);
    foreach (var record in results)
    {
        Cell[] cells = new Cell[]
        {
            new Cell
            {
                ColumnId = columnTitleIdPair.Value,
                Value = record
            }
        };
        cellRecords = cells.ToList();
        cellRecordsInsert.Add(cellRecords);
    }
    Row rows = new Row
    {
        ToTop = true,
        Cells = cellRecords
    };
    IList<Row> newRows = smartsheet.SheetResources.RowResources.AddRows(sheetId, new Row[] { rows });
}
I expected to generate a value for each cell, append it to the list, and then post it through the Row object. However, my loop is appending the column values diagonally, as A1: 1, B2: 2, C3: 3, instead of A1: 1, B1: 2, C1: 3.
The preference would be to use bulk operations, but without an example I'm a bit at a loss. However, the loop isn't working out either so if anyone has any suggestions I would be very grateful!
Thank you,
Channing
Have you seen the Smartsheet C# sample read / write sheet? That may be a useful reference. It contains an example use of bulk operations that updates multiple rows with a single call.
Taylor,
Thank you for your help. You led me in the right direction and I worked my way through to a solution.
I grouped my column-value list and built the rows for a single bulk AddRows call. I used a for loop, but the elements in each grouping of columns are cleaned and assigned a 0 before this method runs, so every grouping retains the same count of values.
// Pair column and cell values for row building - match
// data source column titles with Smartsheet column titles.
List<Cell> pairedColumnCells = new List<Cell>();
// Accumulate cells.
List<Cell> cellsToImport = new List<Cell>();
// Accumulate rows for addition here.
List<Row> rowsToInsert = new List<Row>();
var groupByCells = PairDataSourceAndSmartsheetColumnToGenerateCells(
    sheet,
    dataSourceDataTable).GroupBy(
        c => c.ColumnId,
        c => c.Value,
        (key, g) => new {
            ColumnId = key, Value = g.ToList<object>()
        });
var countGroupOfCells = groupByCells.FirstOrDefault().Value.Count();
for (int i = 0; i <= countGroupOfCells - 1; i++)
{
    foreach (var groupOfCells in groupByCells)
    {
        var cellListElement = groupOfCells.Value.ElementAt(i);
        var cellToAdd = new Cell
        {
            ColumnId = groupOfCells.ColumnId,
            Value = cellListElement
        };
        cellsToImport.Add(cellToAdd);
    }
    Row rows = new Row
    {
        ToTop = true,
        Cells = cellsToImport
    };
    rowsToInsert.Add(rows);
    cellsToImport = new List<Cell>();
}
return rowsToInsert;

Generate JPQL queries dynamically based on values provided/not provided from multiple choice lists

I have four choice lists on my HTML page, and I retrieve data when choices are selected. How do I dynamically create a JPQL query based on the selections in the choice lists?
In my case, there are 4 choice lists and a user can either select options from all the lists or a combination of them. How do I write my query in this scenario?
My query is something like
SELECT x FROM tablename x WHERE x.column1= :choice1 AND x.column2 = :choice2 AND x.column3 = :choice3 AND x.column4 = :choice4
You may want to try the JPA Criteria API, which is designed for building queries dynamically.
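A minimal sketch of that approach, assuming a JPA EntityManager named em, an entity class MyEntity mapped to the table, and nullable choice1..choice4 variables (all of these names are hypothetical):
// Imports assumed: javax.persistence.criteria.*, java.util.ArrayList, java.util.List
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<MyEntity> cq = cb.createQuery(MyEntity.class);
Root<MyEntity> root = cq.from(MyEntity.class);

// Only add a predicate for the choices the user actually made.
List<Predicate> predicates = new ArrayList<>();
if (choice1 != null) predicates.add(cb.equal(root.get("column1"), choice1));
if (choice2 != null) predicates.add(cb.equal(root.get("column2"), choice2));
if (choice3 != null) predicates.add(cb.equal(root.get("column3"), choice3));
if (choice4 != null) predicates.add(cb.equal(root.get("column4"), choice4));

cq.select(root).where(predicates.toArray(new Predicate[0]));
List<MyEntity> results = em.createQuery(cq).getResultList();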
Very simple: build the query string piece by piece and use a Map to collect the parameters, adding each condition only if the corresponding choice exists. The "WHERE 1 = 1" trick lets every condition start with AND.
Example:
String jpql = "SELECT x FROM tablename x WHERE 1 = 1 ";
Map<String, Object> parameters = new HashMap<>();
if (choice1 != null) {
    jpql += "AND x.column1 = :choice1 ";
    parameters.put("choice1", choice1);
}
if (choice2 != null) {
    jpql += "AND x.column2 = :choice2 ";
    parameters.put("choice2", choice2);
}
Query query = entityManager.createQuery(jpql);
for (Entry<String, Object> entry : parameters.entrySet()) {
    query.setParameter(entry.getKey(), entry.getValue());
}
return query.getResultList();

map reduce function returns id instead of count

I'm applying a map-reduce function but facing an issue: in the case of a single record, it returns the id instead of count = 1.
map_func = """function () {
emit(this.school_id, this.student_id);
}"""
reduce_func = """
function (k, values) {
values.length;
}
"""
If school 100 has only one student, it should return school_id = 100, value = 1, but in this scenario it returns school_id = 100, value = 12 (12 is the student's id in the db). For other records it works fine. I then tried the two-pass approach from the MongoDB cookbook:
map_func = """function () {
emit({this.school_id, this.student_id},{count:1});
}"""
reduce_func = """
function (k, values) {
var count =0 ;
values.forEach(function(v)
{
count += v['count'];
});
return {count:count};
}
"""
map_func2 = """
function() {
emit(this['_id']['school_id'], {count: 1});
}
"""
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
I used this example, but it needs two map-reduce passes, so it took much more time.
It looks like you may be misunderstanding some of the mechanics of mapReduce.
The emit will get called on every document, but reduce will only be called on keys which have more than one value emitted (because the purpose of the reduce function is to merge or reduce an array of results into one).
Your map function is wrong: it needs to emit a key and then the value you want, in this case a count.
Your reduce function needs to reduce these counts (add them up), but it has to work correctly even if it gets called multiple times (to re-reduce previously reduced results).
I recommend reading here for more details.
If you are trying to count the number of students per school:
map = """function () {
    emit(this.school_id, 1);
}"""
reduce = """
function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}
"""

Performance question about Mongo database

Today I tested the Mongo database, but I ran into a performance issue.
After inserting 1.800.000 documents, I tried to sum all the values, but it took 57 s.
Then I tried the same thing in MSSQL and it took 0 s!
Can you give me any tips on what I'm doing wrong?
Is this a Mongo limitation?
static void Main(string[] args)
{
    // Create a default mongo object. This handles our connections to the database.
    // By default, this will connect to localhost, port 27017 which we already have running from earlier.
    var connStr = new MongoConnectionStringBuilder();
    connStr.ConnectTimeout = new TimeSpan(1, 0, 0);
    connStr.SocketTimeout = new TimeSpan(1, 0, 0);
    connStr.Server = new MongoServerAddress("localhost");
    var mongo = MongoServer.Create(connStr);
    // Get the blog database. If it doesn't exist, that's OK because MongoDB will
    // create it for us when we first use it. Awesome!!!
    var db = mongo.GetDatabase("blog");
    var sw = new Stopwatch();
    sw.Start();
    // Get the Post collection. By default, we'll use the name of the class as the collection name.
    // Again, if it doesn't exist, MongoDB will create it when we first use it.
    var collection = db.GetCollection<Post>("Post");
    Console.WriteLine(collection.Count());
    sw.Stop();
    Console.WriteLine("Time: " + sw.Elapsed.TotalSeconds);
    sw.Reset();
    sw.Start();
    var starting = collection.Count();
    var batch = new List<Post>();
    for (var i = starting; i < starting + 200000; i++)
    {
        var post = new Post
        {
            Body = i.ToString(),
            Title = "title " + i.ToString(),
            CharCount = i.ToString().Length,
            CreatedBy = "user",
            ModifiedBy = "user",
            ModifiedOn = DateTime.Now,
            CreatedOn = DateTime.Now
        };
        //collection.Insert<Post>(post);
        batch.Add(post);
    }
    collection.InsertBatch(batch);
    Console.WriteLine(collection.Count());
    sw.Stop();
    Console.WriteLine("Time to insert 200.000 records: " + sw.Elapsed.TotalSeconds);
    //var q = collection.Find(Query.LT("Body", "30000")).ToList();
    //Console.WriteLine(q.Count());
    sw.Reset();
    sw.Start();
    var q2 = collection.AsQueryable<Post>();
    var sum = q2.Sum(p => p.CharCount);
    Console.WriteLine(sum);
    sw.Stop();
    Console.WriteLine("Time to sum '" + q2.Count() + "' Post records: " + sw.Elapsed.TotalSeconds); // PROBLEM: takes 57 s to sum 1.000.000 records
}
}
The performance issue is in the following line:
var q2 = collection.AsQueryable<Post>();
The line above loads all the posts from the Post collection into memory, because this driver does not translate LINQ into database-side queries. In MSSQL it takes under a second because LINQ translates the calculation into SQL that runs inside the database. My guess is that almost all of the 57 seconds are spent loading the data into memory.
To get the best performance out of MongoDB, create extra fields (denormalize the data) and maintain any sums, counters, etc. at write time whenever possible. If that is not possible, use map/reduce or the available aggregate functions, such as group (a good fit for your sum example).
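For illustration, here is a sketch of pushing the sum to the server with map/reduce. It is written against the old 2.x MongoDB Java driver (the contemporary C# driver exposes a similar MapReduce method on its collection class); the database, collection, and field names follow the example above. The whole computation runs inside the database, so only one small result document comes back over the wire:
import com.mongodb.*;

public class SumCharCounts {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DBCollection posts = mongo.getDB("blog").getCollection("Post");

        // Emit every CharCount under a single key, then add them up server-side.
        String map = "function () { emit('total', this.CharCount); }";
        String reduce = "function (key, values) {"
                + "  var sum = 0;"
                + "  for (var i = 0; i < values.length; i++) { sum += values[i]; }"
                + "  return sum;"
                + "}";

        MapReduceCommand cmd = new MapReduceCommand(
                posts, map, reduce, null, MapReduceCommand.OutputType.INLINE, null);
        for (DBObject result : posts.mapReduce(cmd).results()) {
            System.out.println(result); // { "_id" : "total", "value" : <sum> }
        }
    }
}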