Elasticsearch - Scala case class with a #timestamp field

I want to save data into Elasticsearch using Spark.
I use this connector: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html#spark-installation
I can save data using the saveToEsWithMeta method on an RDD with a case class. But I have a problem when I want to set a field named #timestamp: I added an attribute named #timestamp to my case class, but that attribute is saved under the name '$attimestamp' in Elasticsearch instead of '#timestamp'.
I found a workaround using a Map instead of a case class, but do you know of a solution using a case class?
Thanks a lot,
Benoît

Maybe try this from the documentation you linked to:
For cases where the id (or other metadata fields like ttl or
timestamp) of the document needs to be specified, one can do so by
setting the appropriate mapping namely es.mapping.id. Following the
previous example, to indicate to Elasticsearch to use the field id as
the document id, update the RDD configuration (it is also possible to
set the property on the SparkConf though due to its global effect it
is discouraged):
EsSpark.saveToEs(rdd, "spark/docs", Map("es.mapping.id" -> "id"))
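If the goal is only to control the timestamp metadata, the same mapping mechanism should also work with a case class, so the Scala field does not need to be called #timestamp (or @timestamp) at all. A minimal sketch, assuming elasticsearch-hadoop's Spark support and a hypothetical Trade case class; es.mapping.timestamp names the case-class field the connector should use as the document timestamp:

import org.elasticsearch.spark._

case class Trade(id: String, price: Double, eventTime: String)

// assumes an existing SparkContext called sc
val rdd = sc.makeRDD(Seq(Trade("1", 50.5, "2015-01-01T12:00:00")))

// The connector takes the document id and timestamp from the named fields,
// so the case class never needs a field literally named "#timestamp".
rdd.saveToEs("spark/trades", Map(
  "es.mapping.id"        -> "id",
  "es.mapping.timestamp" -> "eventTime"
))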

Related

How to dynamically create index on JSON Object properties (JSON Object props are also dynamic)

I have a scenario where I want to dynamically create an index on the keys of a JSON object (the JSON object's attributes will vary). I am able to store the JSON object in the index (by implementing a FieldBridge).
eg1: preference:{"sport":"football", "music":"pop"}
eg2: preference:{"sport":"cricket", "music":"jazz", "cuisine":"mexican"}
But I am unable to query the individual fields like:
preference.sport
or preference.cuisine
Is there any way / configuration in Hibernate Search through which we can achieve that?
If your fields are dynamic, there is no pre-defined schema and Hibernate Search is unable to determine how to query these fields. There are significant differences in how a match query should be executed on a text field or a date field, for example.
For that reason, you cannot use the Hibernate Search Query DSL to build your queries.
However, you can use native APIs.
If you're using the Lucene integration, just creating the relevant queries yourself will work fine (as long as you create the right one):
new TermQuery(new Term("sport", "value"))
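Wired into Hibernate Search, that might look roughly like the following sketch; the UserProfile entity and the Hibernate session are assumptions:

import org.apache.lucene.index.Term
import org.apache.lucene.search.TermQuery
import org.hibernate.search.Search

// Obtain a FullTextSession from an existing Hibernate Session (assumed here).
val fullTextSession = Search.getFullTextSession(session)

// Build the native Lucene query yourself; the Hibernate Search DSL is not involved.
val luceneQuery = new TermQuery(new Term("preference.sport", "cricket"))

val results = fullTextSession
  .createFullTextQuery(luceneQuery, classOf[UserProfile])
  .list()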
If you're using the experimental Elasticsearch integration, you can use org.hibernate.search.elasticsearch.ElasticsearchQueries.fromJson( ... ). You will have to write the whole query as JSON, though, and will not be able to take advantage of the Hibernate Search QueryBuilder at all, even for queries on statically defined fields. See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_queries
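For the Elasticsearch path, a rough sketch along the same lines (again assuming a hypothetical UserProfile entity and the createFullTextQuery overload that accepts a query descriptor):

import org.hibernate.search.elasticsearch.ElasticsearchQueries
import org.hibernate.search.Search

val fullTextSession = Search.getFullTextSession(session)

// The whole query is raw Elasticsearch JSON; the Query DSL is not used at all.
val esQuery = ElasticsearchQueries.fromJson(
  """{ "query": { "match": { "preference.cuisine": "mexican" } } }""")

val results = fullTextSession
  .createFullTextQuery(esQuery, classOf[UserProfile])
  .list()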
Better support for native queries, as well as dynamic fields with pre-defined types, which would be targetable in the Query DSL, is planned for Hibernate Search 6, but it's not there yet. See HSEARCH-3273.

Partial inserts with Cassandra and Phantom DSL

I'm building a simple Scala Play app which stores data in a Cassandra DB using the Phantom DSL driver for Scala. One of the nice features of Cassandra is that you can do partial updates i.e. so long as you provide the key columns, you do not have to provide values for all the other columns in the table. Cassandra will merge the data into your existing record based on the key.
Unfortunately, it seems this doesn't work with Phantom DSL. I have a table with several columns, and I want to be able to do an update, specifying values just for the key and one of the data columns, and let Cassandra merge this into the record as usual, while leaving all the other data columns for that record unchanged.
But Phantom DSL overwrites existing columns with null if you don't specify values in your insert/update statement.
Does anybody know of a work-around for this? I don't want to have to read/write all the data columns every time, as eventually the data columns will be quite large.
FYI I'm using the same approach to my Phantom coding as in these examples:
https://github.com/thiagoandrade6/cassandra-phantom/blob/master/src/main/scala/com/cassandra/phantom/modeling/model/GenericSongsModel.scala
It would be great to see some code, but partial updates are possible with Phantom. Phantom uses an immutable builder; it will not overwrite anything with null by default. If you don't specify a value for a column, it won't do anything with it.
database.table.update.where(_.id eqs id).modify(_.bla setTo "newValue")
will produce a query that touches only the columns you have explicitly set; nothing else is set to null. Please provide some code examples, because your problem seems really strange: queries don't keep track of table columns to automatically add in what's missing.
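For illustration, a minimal sketch of such a partial update; the database object, table and column names are hypothetical, and the usual implicit session/keyspace are assumed to be in scope via the provider trait:

import java.util.UUID
import com.websudos.phantom.dsl._

val songId = UUID.randomUUID()

// Only `title` appears in the generated UPDATE statement; every other data
// column of the row keeps its current value.
database.songs
  .update
  .where(_.id eqs songId)
  .modify(_.title setTo "new title")
  .future()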
Update
If you would like to delete column values, i.e. effectively set them to null inside Cassandra, phantom offers a different syntax for that:
database.table.delete(_.col1, _.col2).where(_.id eqs id)
Furthermore, you can even delete map entries in the same fashion:
database.table.delete(_.props("test"), _.props("test2")).where(_.id eqs id)
This assumes props is a MapColumn[Table, Record, String, _]; props.apply(key: T) is typesafe, so it will respect the key type you define for the map column.

Discard values while inserting and updating data using Slick

I am using Slick with Play 2.
I have multiple fields in the database which are managed by the database itself. I don't want to set them on insert or update, but I do want to get them when reading the values.
For example, suppose I have
case class MappedDummyTable(id: Int, .. 20 other fields, modified_time: Option[Timestamp])
which maps the Dummy table in the database. modified_time is managed by the database.
The problem is that during insert or update I create an instance of MappedDummyTable without the modified_time attribute and pass it to Slick for create/update, like
TableQuery[MappedDummyTable].insert(instanceOfMappedDummyTable)
For this, Slick creates the query as
INSERT INTO MappedDummyTable (id, ...., modified_time) VALUES (1, ...., null)
and sets modified_time to NULL, which I don't want. I want Slick to ignore these fields when creating and updating.
For updating, I can do
TableQuery[MappedDummyTable].map(fieldsToBeUpdated).update(values)
but this leads to 20-odd fields in the map method, which looks ugly.
Is there any better way?
Update:
The best solution that I found was using multiple projections. I created one projection to read the values and another to insert and update the data.
Maybe you need to write some triggers on the table if you don't want to write code like row => (row.id, ...other 20 fields).
Or try using None instead of null?
I believe that the solution of mapping the non-default fields is the only way to do it with Slick. To make it less ugly you can define a function ignoreDefaults on MappedDummyTable that returns only the non-default values, and a function in the companion object of the MappedDummyTable case class that returns the corresponding projection:
TableQuery[MappedDummyTable].map(MappedDummyTable.ignoreDefaults).insert(instanceOfMappedDummyTable.ignoreDefaults)
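For illustration, a rough sketch of the two-projection approach with Slick 3.x; the Dummy case class, the table name and the column names are assumptions:

import java.sql.Timestamp
import slick.jdbc.PostgresProfile.api._  // or the profile you actually use

case class Dummy(id: Int, name: String, modifiedTime: Option[Timestamp])

class DummyTable(tag: Tag) extends Table[Dummy](tag, "dummy") {
  def id = column[Int]("id", O.PrimaryKey)
  def name = column[String]("name")
  def modifiedTime = column[Option[Timestamp]]("modified_time")

  // Full projection used for reads: includes the database-managed column.
  def * = (id, name, modifiedTime) <> (Dummy.tupled, Dummy.unapply)

  // Write projection: only the columns the application is allowed to set.
  def writable = (id, name)
}

val dummies = TableQuery[DummyTable]

// Inserts and updates go through the write projection, so modified_time is
// never mentioned in the generated SQL and keeps its database-managed value.
val insertAction = dummies.map(_.writable) += ((1, "first"))
val updateAction = dummies.filter(_.id === 1).map(_.writable).update((1, "renamed"))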

Neo4j batch import __types__ index for Spring Data

I have a problem with an initial data import that I later want to use with Spring Data. I use the Neo4j (CSV) Batch Importer (https://github.com/jexp/batch-import). The main problem is the use of the '__types__' index.
It seems that Spring Data uses 'className' as the key for that index, but the field is called '__type__'. I tried to declare the field as __type__:string:__types__, but that creates a field __type__ and an index __types__ whose key is the field name (__type__).
Is it possible to set 'className' as the key for the index, or to change Spring Data's behavior to use '__type__' as the index key?
Thank you.
Good question - right now it is not possible to have a different key for the index field and the type value in the compound approach.
But you can specify a separate index file (see the links below) and provide the file for the types index with a column className whose value is the FQN of the entity class.
https://github.com/jexp/batch-import#explicit-indexing
https://github.com/jexp/batch-import#examples-1

Rogue query orderAsc with a field chosen dynamically by name

I am using Rogue/Lift Mongo Record to query MongoDB. I am trying to build different queries according to the sort field name, so I have the name of the field I want to sort the results by as a string.
I have tried to use Record.fieldByName in orderAsc:
...query.orderAsc (elem => elem.fieldByName(columnName).open_!)
but I obtain "no type parameter for orderAsc".
How can I make this work? Honestly, all the type-level programming in Rogue is quite difficult to follow.
Thanks
The problem is that you cannot easily generate a query dynamically with Rogue. As a solution I used the Lift MongoDB APIs directly, which allow the use of plain strings (without compile-time checking) for this kind of operation that requires dynamic sorting.
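For example, something along these lines (a loose sketch: Venue is a hypothetical lift-mongodb-record MetaRecord, and the findAll(query, sort) overload taking JObject arguments is an assumption about the lift-mongodb-record API):

import net.liftweb.json.JsonDSL._

// The sort field is a plain string, so it can be chosen at runtime.
val columnName = "name"

val sorted = Venue.findAll(
  ("active" -> true),  // query, built with the JSON DSL
  (columnName -> 1)    // ascending sort on the dynamically chosen field
)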