Amazon DynamoDB - Joins

I have a question regarding joins in Amazon DynamoDB. Amazon DynamoDB is a NoSQL database and doesn't support joins, so I am looking for an alternative to a join command for DynamoDB tables. I am using DynamoDB with the Android SDK.

There is no way to do joins in DynamoDB.
A DynamoDB table's primary key is either a partition key alone or a composite of partition key and sort key, and you need the partition key to query a table.
Unlike a relational database, DynamoDB does not enforce a schema on non-key attributes, which makes join semantics hard to define, so it's quite complicated to use joins with it.
Instead, query each table individually and use the values from the result to query the other table, as in the sketch below.
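For illustration, here is a minimal sketch of that client-side approach using the AWS SDK for Java (the Android SDK client is very similar); the table names Orders and Products and the attributes customerId and productId are hypothetical:

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemResult;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;

public class ClientSideJoin {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        // 1. Query the first table by its partition key.
        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":cid", new AttributeValue().withS("customer-42"));
        QueryResult orders = client.query(new QueryRequest()
                .withTableName("Orders")
                .withKeyConditionExpression("customerId = :cid")
                .withExpressionAttributeValues(values));

        // 2. For each result, look up the matching item in the second
        //    table -- the "join" happens in application code.
        for (Map<String, AttributeValue> order : orders.getItems()) {
            Map<String, AttributeValue> key = new HashMap<>();
            key.put("productId", order.get("productId"));
            GetItemResult product = client.getItem("Products", key);
            System.out.println(order + " -> " + product.getItem());
        }
    }
}

If the first result set is large, BatchGetItem is the usual way to avoid one round trip per row.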

Since DynamoDB is a NoSQL database, it doesn't support relational operations, so you cannot join tables in DynamoDB. AWS offers other databases that do support the relational model, such as Amazon Aurora.

Disclaimer: I work at Rockset, but I believe it can solve this issue easily. You can't do joins on DynamoDB directly, but you can do them indirectly by integrating DynamoDB with Rockset:
Create an integration between DynamoDB and Rockset, giving Rockset read permissions.
Write your SQL query with the JOIN in the editor.
Save the SQL query as a RESTful endpoint via a Query Lambda in the Rockset console.
In your Android app, make an HTTP request to that endpoint to run your query.
Assuming you have imported all the proper libraries:
public class MainActivity extends AppCompatActivity {
    private String url = "https://api.rs2.usw2.rockset.com/v1/orgs/self/ws/commons/lambdas/LambdaName/versions/versionNumber";
    private String APIKEY = "ApiKey YOUR APIKEY";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        new JSONAsyncTask().execute(url);
    }

    class JSONAsyncTask extends AsyncTask<String, Void, Boolean> {
        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        protected Boolean doInBackground(String... urls) {
            try {
                // POST to the Query Lambda endpoint with the API key header
                HttpPost httpPost = new HttpPost(url);
                HttpClient httpclient = new DefaultHttpClient();
                httpPost.addHeader("Authorization", APIKEY);
                httpPost.addHeader("Content-Type", "application/json");
                HttpResponse response = httpclient.execute(httpPost);
                int status = response.getStatusLine().getStatusCode();
                if (status == 200) {
                    // Read the response body and parse it as JSON
                    HttpEntity entity = response.getEntity();
                    String data = EntityUtils.toString(entity);
                    Log.e("foo", data);
                    JSONObject jsono = new JSONObject(data);
                    return true;
                } else {
                    Log.e("foo", "error" + String.valueOf(status));
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (JSONException e) {
                e.printStackTrace();
            }
            return false;
        }

        protected void onPostExecute(Boolean result) {
        }
    }
}
From there, you'll get your results in the log and can do what you want with that data.
While you can't do JOINs directly on DynamoDB, you can do them indirectly with Rockset if you're building data-driven applications.

DynamoDB is a NoSQL database, and as such you can't do joins.
However, in my experience there isn't anything you can't do if you design your database correctly: use a single table and a combination of primary and secondary keys (see the sketch after the docs link below).
Here are the docs: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html
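As a hedged illustration of that single-table idea (the table name AppTable and the PK/SK layout are assumptions, not anything from the docs above): storing a customer and its orders under the same partition key means one Query returns everything a relational join would have assembled.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

public class SingleTableQuery {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        Table table = new DynamoDB(client).getTable("AppTable");

        // Item layout: one partition holds the customer row and its order rows.
        //   PK = "CUSTOMER#42", SK = "PROFILE"          -> customer attributes
        //   PK = "CUSTOMER#42", SK = "ORDER#2023-01-07" -> one order
        //   PK = "CUSTOMER#42", SK = "ORDER#2023-02-19" -> another order

        // One Query fetches the customer and all of its orders together.
        for (Item item : table.query(new QuerySpec()
                .withKeyConditionExpression("PK = :pk")
                .withValueMap(new ValueMap().withString(":pk", "CUSTOMER#42")))) {
            System.out.println(item.toJSONPretty());
        }
    }
}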

You cannot use joins in DynamoDB, but you can structure your data in a single table using global secondary indexes (GSIs) so that you can query the data in most of the ways you need.
So before designing the database structure, make a list of all the queries you will need and design the table, and especially its indexes, around that list; a GSI query sketch follows below.
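As a minimal sketch of that, querying a hypothetical GSI named orders-by-status-index lets you read items by an attribute that is not part of the table's primary key:

import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;

public class GsiQuery {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":s", new AttributeValue().withS("SHIPPED"));
        Map<String, String> names = new HashMap<>();
        names.put("#st", "status"); // "status" is a reserved word, so alias it

        // withIndexName points the query at the GSI, whose partition key
        // is the "status" attribute of the hypothetical Orders table.
        QueryResult result = client.query(new QueryRequest()
                .withTableName("Orders")
                .withIndexName("orders-by-status-index")
                .withKeyConditionExpression("#st = :s")
                .withExpressionAttributeNames(names)
                .withExpressionAttributeValues(values));

        result.getItems().forEach(System.out::println);
    }
}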

Related

Flink - DynamoDB source

I'm new to working with real-time applications. Currently, I'm using AWS Kinesis/Flink and Scala, and I have the following architecture:
(old architecture diagram)
As you can see, I consume a CSV file using CSVTableSource. Unfortunately, the CSV file became too big for the Flink job; the file is updated daily, and new rows keep being added.
So now I am working on a new architecture, where I want to replace the CSV with DynamoDB.
(new architecture diagram)
My question is: what do you recommend for consuming the DynamoDB table?
PS: I need to do a left outer join between the DynamoDB table and the Kinesis data stream data.
You could use a RichFlatMapFunction to open a DynamoDB client and look up data from DynamoDB. Sample code is given below.
public static class DynamoDBMapper extends RichFlatMapFunction<String, Item> {
    // Concrete types chosen for illustration: look up a String key, emit a DynamoDB Item
    private transient Table table;
    private final String tableName;

    public DynamoDBMapper(String tableName) {
        this.tableName = tableName;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // Initialize the DynamoDB client once per task, not per record
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .build();
        DynamoDB dynamoDB = new DynamoDB(client);
        this.table = dynamoDB.getTable(tableName);
    }

    @Override
    public void flatMap(String key, Collector<Item> out) throws Exception {
        // Execute GetItem for the incoming key ("id" is a hypothetical key attribute);
        // emitting nothing on a miss leaves the outer-join handling to you
        Item item = table.getItem("id", key);
        if (item != null) {
            out.collect(item);
        }
    }
}
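A hedged usage sketch, assuming your job already has a Kinesis-backed DataStream<String> of lookup keys called input, and that the table name is hypothetical:

// Enrich each incoming key with the matching DynamoDB item
DataStream<Item> enriched = input.flatMap(new DynamoDBMapper("myTable"));

For higher throughput, Flink's Async I/O API (AsyncDataStream.unorderedWait with an AsyncFunction) is the usual alternative to this synchronous per-record lookup.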

JBoss Arquillian: How to force the database to throw an exception while running the Arquillian test?

I am implementing a feature where, if there is any exception while writing data into the DB, we retry five times before failing. I have implemented the feature but am not able to test it with an Arquillian test.
We are using JPA with Versant as the database. So far, I have been debugging the Arquillian test and stopping the database once my flow reaches the DB handler code, but this is the worst way of testing.
Do you have any suggestions on how to achieve this?
With JPA in mind, the easiest way is to add a method to your data access layer that lets you run native queries, and then run a query against a nonexistent table or something similar. In my DAO utilities I found a method like this:
public List<?> findByNativeQuery(String nativeQuery, Map<String, Object> args) {
    try {
        final EntityManager em = getEntityManager();
        final Query query = em.createNativeQuery(nativeQuery);
        if (args != null && !args.isEmpty()) {
            // Bind every named parameter passed by the caller
            for (Map.Entry<String, Object> pair : args.entrySet()) {
                query.setParameter(pair.getKey(), pair.getValue());
            }
        }
        return query.getResultList();
    } catch (RuntimeException e) {
        // throw some new Exception(e.getMessage()); // best is to throw a checked exception
        return Collections.emptyList();
    }
}
Native solutions
There is the old trick of dividing by zero in the database. At selection time you could try:
select 1/0 from dual;
At insertion time (you need a table):
insert into test_table (test_number_field) values (1/0);
Pure JPA solution
You can try to utilize the @Version annotation and decrement the version to force an OptimisticLockException; see the sketch below. This exception is thrown in the Java layer rather than in the database, but it fulfills your need.
All of these approaches will make the DB access fail.
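A minimal sketch of the @Version trick, assuming a hypothetical Account entity and a JPA provider that supports optimistic locking; editing a detached copy with a deliberately stale version makes the next merge/flush throw:

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {
    @Id
    private Long id;

    @Version
    private long version;   // JPA checks and increments this on every update

    private long balance;

    public long getVersion() { return version; }
    public void setVersion(long v) { this.version = v; }
    public long getBalance() { return balance; }
    public void setBalance(long b) { this.balance = b; }
}

class RetryFeatureTest {
    // In the test: corrupt the version of a detached copy, then merge it
    // back -- the optimistic lock check fails when the change is flushed.
    void provokeOptimisticLockFailure(EntityManager em) {
        Account acc = em.find(Account.class, 1L);
        em.detach(acc);                        // work on a detached copy
        acc.setVersion(acc.getVersion() - 1);  // simulate a lost update
        acc.setBalance(acc.getBalance() + 10);
        em.merge(acc);                         // stale version detected...
        em.flush();                            // ...throws OptimisticLockException
    }
}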

ServiceStack OrmLite - pre and post execution

We are using the awesome & fast OrmLite (ServiceStack) library as our microORM to connect to our PostgreSQL database.
We have TDE encryption enabled in our PostgreSQL database. To ensure that the relevant data is decrypted before we query, we need to execute the following:
Db.ExecuteSql(string.Format("SELECT pgtde_begin_session('{0}');", tdeKey));
and at the end:
Db.ExecuteSql("SELECT pgtde_end_session();");
Instead of inserting these into each of our RequestDto methods, can we ensure that these SQL statements are executed before and after each call?
You can try using an OrmLite Exec Filter, with something like:
public class PgSqlSecureSessionFilter : OrmLiteExecFilter
{
    // The TDE key has to come from somewhere, e.g. configuration
    private const string tdeKey = "...";

    public override T Exec<T>(IDbConnection db, Func<IDbCommand, T> filter)
    {
        try
        {
            db.Execute("SELECT pgtde_begin_session(@tdeKey)", new { tdeKey });
            return base.Exec(db, filter);
        }
        finally
        {
            db.Execute("SELECT pgtde_end_session();");
        }
    }
}

OrmLiteConfig.ExecFilter = new PgSqlSecureSessionFilter();

Is there a way to use Solr's streaming API with spring data solr?

I have a use case where I need to fetch the IDs of my entire Solr collection. For that, with SolrJ, I use the streaming API like this:
CloudSolrServer server = new CloudSolrServer("zkHost1:2181,zkHost2:2181,zkHost3:2181");
SolrQuery query = new SolrQuery("*:*");
server.queryAndStreamResponse(query, handler);
where handler is a class that implements StreamingResponseCallback, omitted from my code for brevity.
Now, the Spring Data repository abstraction gives me the ability to search by pages and by cursors, but I can't seem to find a way to handle the streaming use case.
Is there a workaround?
SolrTemplate allows access to the underlying SolrClient in a callback style, so you could use that to work around the current limitations.
The result conversion using the MappingSolrConverter available via the SolrTemplate is broken at the moment (I need to check why), but you get the idea of how to do it.
solrTemplate.execute(new SolrCallback<Void>() {
    @Override
    public Void doInSolr(SolrClient solrClient) throws SolrServerException, IOException {
        SolrQuery sq = new SolrQuery("*:*");
        solrClient.queryAndStreamResponse("collection1", sq, new StreamingResponseCallback() {
            @Override
            public void streamSolrDocument(SolrDocument doc) {
                // the bean conversion fails atm
                // ExampleSolrBean bean = solrTemplate.getConverter().read(ExampleSolrBean.class, doc);
                System.out.println(doc);
            }

            @Override
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                // do something useful
            }
        });
        return null;
    }
});

EF Code First - Recreate Database If Model Changes

I'm currently working on a project that uses EF Code First with POCOs. I have 5 POCOs that so far depend on the POCO "User".
The POCO "User" should refer to my already existing Membership table "aspnet_Users" (which I map it to in the OnModelCreating method of the DbContext).
The problem is that I want to take advantage of the "recreate database if model changes" feature that Scott Gu shows at http://weblogs.asp.net/scottgu/archive/2010/07/16/code-first-development-with-entity-framework-4.aspx. The feature basically recreates the database as soon as it sees any change in my POCOs. What I want is to recreate the database but somehow NOT delete the whole database, so that aspnet_Users stays alive. However, that seems impossible, as it either makes a whole new database or replaces the current one.
So my question is: am I doomed to define my database tables by hand, or can I somehow merge my POCOs into my current database and still use the feature without wiping it all?
As of EF Code First in CTP5, this is not possible. Code First will either drop and create your database or not touch it at all. I think in your case you should manually create your full database and then try to come up with an object model that matches the DB.
That said, the EF team is actively working on the feature you are looking for: altering the database instead of recreating it:
Code First Database Evolution (aka Migrations)
I was just able to do this in EF 4.1 with the following considerations:
CodeFirst
DropCreateDatabaseAlways
keeping the same connection string and database name
The database is still deleted and recreated - it has to be for the schema to reflect your model changes - but your data remains intact.
Here's how: you read your database into your in-memory POCO objects, and then, after the POCO objects have successfully made it into memory, you let EF drop and recreate the database. Here is an example:
public class NorthwindDbContextInitializer : DropCreateDatabaseAlways<NorthwindDbContext> {
    /// <summary>
    /// Connection from which to read the data to insert into the new database.
    /// Not the same connection instance as the DbContext's, but it may have the same connection string.
    /// </summary>
    DbConnection connection;
    Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> map;

    public NorthwindDbContextInitializer(DbConnection connection, Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> map = null) {
        this.connection = connection;
        this.map = map ?? ReadDataIntoMemory();
    }

    // Read data into memory BEFORE the database is dropped
    Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable> ReadDataIntoMemory() {
        var map = new Dictionary<Tuple<PropertyInfo, Type>, System.Collections.IEnumerable>();
        switch (connection.State) {
            case System.Data.ConnectionState.Closed:
                connection.Open();
                break;
        }
        using (this.connection) {
            // Find every DbSet<T> property on the context via reflection
            var metaquery = from p in typeof(NorthwindDbContext).GetProperties().Where(p => p.PropertyType.IsGenericType)
                            let elementType = p.PropertyType.GetGenericArguments()[0]
                            let dbsetType = typeof(DbSet<>).MakeGenericType(elementType)
                            where dbsetType.IsAssignableFrom(p.PropertyType)
                            select new Tuple<PropertyInfo, Type>(p, elementType);
            foreach (var tuple in metaquery) {
                map.Add(tuple, ExecuteReader(tuple));
            }
            this.connection.Close();
            Database.Delete(this.connection); // call explicitly, or the framework will complain the connection is in use when it deletes implicitly
        }
        return map;
    }

    protected override void Seed(NorthwindDbContext context) {
        foreach (var keyvalue in this.map) {
            foreach (var obj in (System.Collections.IEnumerable)keyvalue.Value) {
                PropertyInfo p = keyvalue.Key.Item1;
                dynamic dbset = p.GetValue(context, null);
                dbset.Add((dynamic)obj);
            }
        }
        context.SaveChanges();
        base.Seed(context);
    }

    System.Collections.IEnumerable ExecuteReader(Tuple<PropertyInfo, Type> tuple) {
        DbCommand cmd = this.connection.CreateCommand();
        cmd.CommandText = string.Format("select * from [dbo].[{0}]", tuple.Item2.Name);
        DbDataReader reader = cmd.ExecuteReader();
        using (reader) {
            // Materialize the rows through ObjectReader<T>, then force them
            // into an array so nothing is lazily read after the drop
            ConstructorInfo ctor = typeof(Test.ObjectReader<>).MakeGenericType(tuple.Item2)
                                                              .GetConstructors()[0];
            ParameterExpression p = Expression.Parameter(typeof(DbDataReader));
            LambdaExpression newlambda = Expression.Lambda(Expression.New(ctor, p), p);
            var objreader = (System.Collections.IEnumerable)newlambda.Compile().DynamicInvoke(reader);
            MethodCallExpression toArray = Expression.Call(typeof(Enumerable),
                "ToArray",
                new Type[] { tuple.Item2 },
                Expression.Constant(objreader));
            LambdaExpression lambda = Expression.Lambda(toArray, Expression.Parameter(typeof(IEnumerable<>).MakeGenericType(tuple.Item2)));
            var array = (System.Collections.IEnumerable)lambda.Compile().DynamicInvoke(new object[] { objreader });
            return array;
        }
    }
}
This example relies on an ObjectReader class, which you can find here if you need it.
I wouldn't bother with the blog articles, read the documentation.
Finally, I would still suggest you always back up your database before running the initialization (e.g., if the Seed method throws an exception, all your data is only in memory, so you risk losing it once the program terminates). A model change isn't exactly an afterthought action anyway, so be sure to back your data up.
One thing you might consider is to use a 'disconnected' foreign key. You can leave the ASPNETDB alone and just reference the user in your DB using the User key (guid). You can access the logged in user as follows:
MembershipUser currentUser = Membership.GetUser(User.Identity.Name, true /* userIsOnline */);
And then use the User's key as a FK in your DB:
Guid UserId = (Guid) currentUser.ProviderUserKey;
This approach decouples your DB from the ASPNETDB and the associated provider architecturally. However, operationally the data will of course be loosely connected, since the IDs will be in each DB. Note also that there will be no referential constraints, which may or may not be an issue for you.