Generic Querying using Slick - scala

I'm working on an application that uses a generic Slick class to make queries based on information (such as URL, user, password, column count, etc.) provided in metadata files or property files. As a result, I am unable to hardcode any information about the tables I will be accessing. Thus, I will be using a lot of raw SQL queries within Slick, and then filtering and paginating through the data using Slick tools.
My question is this:
In the example provided in Slick's documentation:
import slick.driver.H2Driver.api._
val db = Database.forConfig("h2mem1")
val action = sql"select ID, NAME, AGE from PERSON".as[(Int,String,Int)]
db.run(action)
You can see that action has .as[(Int, String, Int)] at the end, which I'm guessing tells the compiler what to expect. That makes sense. However, in my case that information only comes from outside the source code (the metadata files). Is there any way to have the rows returned from the query be some sort of List or Array that I could access with dynamic information (such as index numbers)? I'd be willing to accept a List[String], for example, to make this less of a type headache.
I'll keep working at it, but as a Slick newbie, I was wondering if anyone more experienced than me would have a solution off the top of their head.
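One possible approach, assuming Slick 3.2 or later (where slick.driver.H2Driver became slick.jdbc.H2Profile), is to supply a custom GetResult that reads every column of the current row as a string, using PositionedResult.numColumns and nextString(). The runtime query text is spliced with #$, which inserts it literally, so it must come from a trusted source such as your own metadata files. The in-memory H2 database, the PERSON table and the scala-cli `//> using` dependency versions are assumptions for illustration, not part of the original setup.

```scala
//> using dep com.typesafe.slick::slick:3.5.1
//> using dep com.h2database:h2:2.2.224

import slick.jdbc.H2Profile.api._
import slick.jdbc.GetResult
import scala.concurrent.Await
import scala.concurrent.duration._

// Reads every column of the current row as a String, so neither the column
// count nor the column types need to be known at compile time.
// SQL NULLs come back as null entries in the list.
implicit val getRowAsStrings: GetResult[List[String]] =
  GetResult(r => List.fill(r.numColumns)(r.nextString()))

// Hypothetical in-memory database; DB_CLOSE_DELAY=-1 keeps it alive between sessions.
val db = Database.forURL("jdbc:h2:mem:demo;DB_CLOSE_DELAY=-1", driver = "org.h2.Driver")

val setup = DBIO.seq(
  sqlu"create table PERSON(ID INT, NAME VARCHAR(50), AGE INT)",
  sqlu"insert into PERSON values (1, 'Alice', 30), (2, 'Bob', 25)"
)
Await.result(db.run(setup), 10.seconds)

// The query text could just as well be read from a metadata/property file.
// #$ splices it verbatim (no bind parameters), so only use trusted input.
def runDynamic(query: String): Vector[List[String]] =
  Await.result(db.run(sql"#$query".as[List[String]]), 10.seconds)

val rows = runDynamic("select ID, NAME, AGE from PERSON order by ID")
// Columns are addressed by index instead of by a compile-time tuple type.
val secondName = rows(1)(1)

db.close()
```

The trade-off is that every value arrives as a String (or null), so any numeric filtering or sorting has to convert values back explicitly.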

Related

Doobie - Streaming an arbitrary SQL query

The problem is quite simple: I have an SQL query obtained from an external source (so neither the query nor the schema (data types) is known at compile time), and I want to create a Stream of "raw" rows (e.g. Array[AnyRef] or similar, thus deferring the actual type-checking to the stream processing).
However, creating a Query0, e.g. via
val query: String = ...
Query0[Array[AnyRef]](query)
.stream
does not work (quite expectedly), since Array[AnyRef] has no Read instance.
The question is: Should I try to construct my own Read instance for the raw row or use more low-level methods (manually dealing with statement/result set etc. APIs)?
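If you take the low-level route, plain JDBC already exposes everything a raw row needs: ResultSetMetaData.getColumnCount gives the width of each row, and ResultSet.getObject(i) returns a column as AnyRef without any compile-time schema. The sketch below is plain JDBC, not doobie's API (it does not show the custom Read option); it wraps the result set in a lazy Iterator[Array[AnyRef]] that could then be adapted into whatever stream type you use. The H2 in-memory database, the ITEMS table and the scala-cli dependency line are assumptions for illustration.

```scala
//> using dep com.h2database:h2:2.2.224

import java.sql.{Connection, DriverManager, ResultSet}

// Lazily turns a ResultSet into raw rows. Each row is an Array[AnyRef] whose
// length is the column count reported by the result set's metadata.
def rawRows(rs: ResultSet): Iterator[Array[AnyRef]] = {
  val columnCount = rs.getMetaData.getColumnCount
  Iterator
    .continually(rs.next())        // advance the cursor on each pull
    .takeWhile(identity)           // stop when there are no more rows
    .map(_ => Array.tabulate[AnyRef](columnCount)(i => rs.getObject(i + 1)))
}

// Usage against a hypothetical in-memory H2 database; the query string is
// treated as opaque text that could come from an external source.
val conn: Connection = DriverManager.getConnection("jdbc:h2:mem:raw")
val stmt = conn.createStatement()
stmt.execute("create table ITEMS(ID INT, LABEL VARCHAR(20))")
stmt.execute("insert into ITEMS values (1, 'a'), (2, 'b')")

val externalQuery = "select ID, LABEL from ITEMS order by ID"
val rs = stmt.executeQuery(externalQuery)
// Materialized here only for the demo; a real stream would consume lazily
// and close the ResultSet, Statement and Connection when it finishes.
val rows = rawRows(rs).map(_.toVector).toVector
rs.close(); stmt.close(); conn.close()
```

Because the iterator reads from a live cursor, it must be fully consumed (or abandoned) before the underlying resources are closed.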

How to construct REST API endpoints with both composite keys and arrays?

Although there are tons of similar questions regarding REST API design, I have a very specific question that I could not find answered in the other similar questions.
Suppose that I am trying to GET a list of devices in the database with Building_Type and Room_Type filters. I would like to pass an array of filters, where each filter contains two fields as a composite key. I've found the standard practice for passing parameter arrays, but I could not find a good way to handle composite keys in the array.
Example:
GET /api/v1/devices?building_type=Educational&room_type=Office
This GETs all rooms with Educational building type and Office room type. However, I am trying to get a list of rooms for multiple composite combinations of {building_type, room_type}.
I am thinking of something like the following:
GET /api/v1/devices?location[]={building_type=Educational,room_type=Office}&location[]={building_type=Commercial,room_type=Office}&location[]={building_type=Educational,room_type=Classroom}
However, this doesn't look like standard practice. I am asking for a better way to design this endpoint. I also don't want to use POST, because this query does not change the state on the server.
Note:
Please note that the following is incorrect, because I need to filter by an array of composite attributes of {building_type, room_type}.
GET /api/v1/devices?building_type[]=Educational&building_type[]=Commercial&room_type[]=Office&room_type[]=Classroom
It depends on what your backend can handle, but I would try an array of objects, like:
GET /api/v1/devices?location[][building_type]=Educational&location[][room_type]=Office&location[][building_type]=Commercial&location[][room_type]=ClassRoom
Rails 6 parses this as I expect:
"location"=>[{"building_type"=>"Educational", "room_type"=>"Office"}, {"building_type"=>"Commercial", "room_type"=>"ClassRoom"}]
But, as this article goes into, libraries don't handle complex object serialization/deserialization into query params consistently. If your backend doesn't like the above, numerically indexing the array should work (though it's more work to construct from your client code):
GET /api/v1/devices?location[0][building_type]=Educational&location[0][room_type]=Office&location[1][building_type]=Commercial&location[1][room_type]=ClassRoom
If you want something that won't be implementation-dependent, you could also consider URL-encoding a JSON string that represents your search query:
GET /api/v1/devices?query=%7B%22locations%22%3A%20%5B%7B%22building_type%22%3A%20%22Educational%22%2C%20%22room_type%22%3A%20%22Office%22%7D%2C%20%7B%20%22building_type%22%3A%20%22Commercial%22%2C%20%22room_type%22%3A%20%22Office%22%7D%5D%7D
Not pretty, but possibly less frustrating.
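Both the numerically indexed form and the URL-encoded JSON form are straightforward to generate on the client. The sketch below uses hypothetical helper names (Location, indexedLocationQuery, jsonLocationQuery) and percent-encodes keys and values with java.net.URLEncoder, so the brackets appear as %5B/%5D; backends that accept the bracket syntax normally decode them back. The hand-rolled JSON escaping only handles quotes and backslashes, which assumes the filter values are simple strings; a real client would use a JSON library.

```scala
import java.net.URLEncoder

// Hypothetical composite filter key: one (building_type, room_type) pair.
final case class Location(buildingType: String, roomType: String)

def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

// location[0][building_type]=...&location[0][room_type]=...&location[1][...]...
def indexedLocationQuery(locations: Seq[Location]): String =
  locations.zipWithIndex.flatMap { case (loc, i) =>
    Seq(
      s"location[$i][building_type]" -> loc.buildingType,
      s"location[$i][room_type]" -> loc.roomType
    )
  }.map { case (k, v) => s"${enc(k)}=${enc(v)}" }.mkString("&")

// Escapes only '\' and '"'; assumes values contain no control characters.
def jsonString(s: String): String =
  "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

// query={"locations":[{"building_type":...,"room_type":...},...]}, URL-encoded.
def jsonLocationQuery(locations: Seq[Location]): String = {
  val items = locations.map { loc =>
    s"""{"building_type":${jsonString(loc.buildingType)},"room_type":${jsonString(loc.roomType)}}"""
  }
  "query=" + enc(items.mkString("""{"locations":[""", ",", "]}"))
}
```

For example, "/api/v1/devices?" + indexedLocationQuery(Seq(Location("Educational", "Office"))) yields the indexed variant; the JSON variant keeps the server-side parsing independent of any framework's bracket conventions.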

GraphQL,Cassandra and denormalization strategy

Would a database like Cassandra and a query language like GraphQL work well together?
Cassandra's ideology is based on optimizing for your queries and denormalizing data. This doesn't seem to mesh well with the GraphQL ideology, where data seems to be accessible at every level of a query.
Example:
Suppose I architect my Cassandra table like so:
User:
  name
  address
  etc. (many properties)
Group:
  id
  name
  user_name (denormalized user, where we generally just need the name of a user)
But with GraphQL, one wouldn't exactly expect a denormalized User:
query getGroup {
  group(id: 1) {
    name
    users {
      name
    }
  }
}
So a couple of things:
1.) This GraphQL query could end up hitting our Cassandra database multiple times (assuming no caching): once to get the group name, and possibly once more for each of the users. But let's say our resolver creates multiple User objects with one Cassandra call.
2.) We can't really build an idiomatic Cassandra database with both denormalization and GraphQL in mind, can we? Otherwise we should expect that certain properties of a User won't be returned to us by the query.
To sum up the question: what's the GraphQL strategy for working with denormalized data? Is it acceptable to omit certain properties that the client thinks are accessible? E.g., the client tries to access the address of a user, but we don't have that at the moment because our data is denormalized. Or should one not even worry about denormalization and just let GraphQL make calls, with a caching mechanism between the DB and GraphQL? E.g., GraphQL first gets the group, then gets the user data for the group id.
This is a side effect of GraphQL: retrieving the data for a query can get quite complex. But as long as the client is requesting only the data it actually needs, and you are smart about your resolvers, the end result will actually be faster.
Consider tools like DataLoader to batch and cache lookups when resolving a query.
As far as omitting certain properties goes, GraphQL validates the response and will report an error, although it will also return the data you gave it. It would probably be better to implement some sort of timeout and throw a more descriptive error if there is an issue retrieving the data.
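The batching idea behind tools like DataLoader can be sketched independently of any GraphQL library: resolvers only register the keys they need, and a single dispatch issues one bulk fetch for all distinct keys and caches the results. The BatchLoader class and the fetchUsers function standing in for a Cassandra call (e.g. one query with an IN clause on the user key) are hypothetical; this is a simplified synchronous sketch, not DataLoader's actual API.

```scala
import scala.collection.mutable

// Hypothetical synchronous batch loader: collects keys, then fetches them
// all with one call and caches the results for later lookups.
final class BatchLoader[K, V](fetchMany: Set[K] => Map[K, V]) {
  private val cache = mutable.Map.empty[K, V]
  private val pending = mutable.LinkedHashSet.empty[K]

  // Registers a key unless it is already cached; duplicates collapse.
  def enqueue(key: K): Unit =
    if (!cache.contains(key)) pending += key

  // Issues a single bulk fetch for everything registered so far.
  def dispatch(): Unit =
    if (pending.nonEmpty) {
      cache ++= fetchMany(pending.toSet)
      pending.clear()
    }

  def get(key: K): Option[V] = cache.get(key)
}

final case class User(name: String, address: String)

// Stand-in for the database; counts how many round trips were made.
var fetchCalls = 0
def fetchUsers(names: Set[String]): Map[String, User] = {
  fetchCalls += 1
  names.map(n => n -> User(n, s"$n's address")).toMap
}

val users = new BatchLoader[String, User](fetchUsers)

// Resolving group.users: register every member first, then fetch once.
def resolveGroupUsers(memberNames: Seq[String]): Seq[User] = {
  memberNames.foreach(users.enqueue)
  users.dispatch()
  memberNames.flatMap(users.get)
}

val first = resolveGroupUsers(Seq("ann", "bob", "ann"))
val second = resolveGroupUsers(Seq("bob")) // served from cache, no new fetch
```

With this shape, the group query costs one lookup for the group plus one batched lookup for all its users, regardless of group size.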

How do we do a select query using the phantom driver without a table definition?

I have a stream of data coming from Spark Streaming, which I need to process and finally store in Cassandra. Earlier I was trying to use the Spark Cassandra connector, but it doesn't give access to the SparkStreaming Context object on the workers, so I have to use a separate Cassandra Scala driver. Hence, I ended up with phantom. Now, my question is: I have already defined the column family in Cassandra, so how do I do select and update queries from Scala?
I have followed this documentation (link1), but I don't understand why we need to give the table definition on the client (Scala code) side. Why can't we just give the keyspace, contact points and column family and be done with it?
object CustomConnector {
  val hosts = Seq("IP1", "IP2")
  val Connector = ContactPoints(hosts).keySpace("KEYSPACE_NAME")
}

realTimeAgg.foreachRDD { x =>
  if (x.toLocalIterator.nonEmpty) {
    x.foreachPartition { partition =>
      // How do I achieve select/insert in the Cassandra table here using phantom?
    }
  }
}
This is not yet possible using phantom. We are actively working on phantom-spark to allow you to do this, but at this point in time it is still a few months away.
In the interim, you will have to rely on the Spark Cassandra connector and use its non-type-safe API to achieve this. It's a less convenient setup, but this will be resolved in the very near future.

Breeze: complex graph returns only 1 collection

I have a physician graph that looks something like this:
The query I use to get data from a WebApi backend looks like this:
var query = new breeze.EntityQuery().from("Physicians")
.expand("ContactInfo")
.expand("ContactInfo.Phones")
.expand("ContactInfo.Addresses")
.expand("PhysicianNotes")
.expand("PhysicianSpecialties")
.where("ContactInfo.LastName", "startsWith", lastInitial).take(5);
(Note: ContactInfo is an alias for the People object.)
What I find is that if I request ContactInfo.Phones to be expanded, I get just the phones and no Notes or Specialties. If I comment out the phones, I get ContactInfo.Addresses and no other collections. If I comment out ContactInfo along with Phones and Addresses, I get Notes only, etc. Essentially, it seems like I can only get one collection at a time.
So, is this a built-in "don't let the programmer shoot himself in the foot" safeguard, or do I have to enable something?
Or is this graph too complicated? Should I consider a NoSQL object store?
Thanks
You need to put all your expand paths into a single expand clause, like this:
var query = new breeze.EntityQuery().from("Physicians")
.expand("ContactInfo, ContactInfo.Phones, ContactInfo.Addresses, PhysicianNotes, PhysicianSpecialties")
.where("ContactInfo.LastName", "startsWith", lastInitial).take(5);
You can see the documentation here: http://www.breezejs.com/sites/all/apidocs/classes/EntityQuery.html#method_expand
JY told you HOW. But BEWARE of performance consequences ... both on the data tier and over the wire. You can die a miserable death by grabbing too widely and deeply at once.
I saw the take(5) in his sample. That is crucial for restraining a runaway request (something you really must do also on the server). In general, I would reserve extended graph fetches of this kind for queries that pulled a single root entity. If I'm presenting a list for selection and I need data from different parts of the entity graph, I'd use a projection to get exactly what I need to display (assuming, of course, that there is no SQL View readily available for this purpose).
If any of the related items are reference lists (color, status, states, ...), consider bringing them into cache separately in a preparation step. Don't include them in the expand; Breeze will connect them on the client to your queried entities automatically.
Finally, as a matter of syntax, you don't have to repeat the name of a segment. When you write "ContactInfo.Phones", you get both ContactInfos and Phones, so you don't need to specify "ContactInfo" by itself.