How to persist a document in JSON format using elasticsearch-dsl

I am trying to update an existing Elasticsearch data pipeline and would like to use elasticsearch-dsl more fully. In the current process we create a document as a JSON object and then use requests to PUT the object to the relevant Elasticsearch index.
I would now like to use the elasticsearch-dsl save method, but I am struggling to understand how to do that when my document is constructed as a JSON object.
Current Process:
# import_script.py
from objects_handler import ObjectsHandler

index = 'objects'
doc = {"title": "A title", "Description": "Description", "uniqueID": "1234"}
doc_id = doc["uniqueID"]
elastic_url = 'http://elastic:changeme@localhost:9200/' + index + '/_doc/' + doc_id
api = ObjectsHandler()
api.put(elastic_url, doc)
# objects_handler.py
import requests


class ObjectsHandler():
    def put(self, url, object):
        result = requests.put(url, json=object)
        if result.status_code != requests.codes.ok:
            print(result.text)
            result.raise_for_status()
Rather than using this PUT method, I would like to tap into the Document.save functionality available in the DSL, but I can't translate the examples in the API documentation to my use case.
I have amended my ObjectsHandler so that it can create the objects index:
# objects_handler.py
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Document, Text, connections

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   http_auth='elastic:changeme')
connections.create_connection(es)


class Object(Document):
    physicalDescription = Text()
    title = Text()
    uniqueID = Text()

    class Index:
        name = 'objects'
        using = es


class ObjectsHandler():
    def init_mapping(self, index):
        Object.init(using=es, index=index)
This successfully creates an index when I call api.init_mapping(index) from the importer script.
The documentation has this as an example for persisting the individual documents, where Article is the equivalent to my Object class:
# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()
Is it possible for me to use this methodology to persist my pre-constructed JSON object doc, rather than specifying individual attributes? I also need to be able to specify that the document id is the doc's uniqueID.
I've extended my ObjectsHandler to include a save_doc method:
def save_doc(self, document, doc_id, index):
    new_obj = Object(meta={'id': doc_id},
                     title="hello", uniqueID=doc_id,
                     physicalDescription="blah")
    new_obj.save()
which does successfully save the object with uniqueID as the id, but I am unable to make use of the JSON object passed into the method as document.
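One approach that looks like it should work here (a minimal, untested sketch, assuming the keys of the incoming dict line up with the fields declared on Object) is to unpack the dict straight into the Document constructor:

# Hypothetical variant of save_doc: unpack the pre-built dict into the
# Document subclass instead of hard-coding each field.
def save_doc(self, document, doc_id, index):
    new_obj = Object(meta={'id': doc_id}, **document)
    new_obj.save(using=es, index=index)

Keys in the dict that are not declared on Object would still be sent to Elasticsearch (subject to the index's dynamic mapping), so the payload may be worth validating first.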

I've had some success with this by using the elasticsearch-py bulk support rather than elasticsearch-dsl.
The following resources were super helpful:
Blog - Bulk insert from json objects
SO Answer, showing different ways to add keywords in a bulk action
Elastic documentation on bulk imports
In my question I was referring to a single doc:
doc = {"title": "A title", "Description": "Description", "uniqueID": "1234"}
I actually have a list of one or more docs, e.g.:
documents = [{"title": "A title", "Description": "Description", "uniqueID": "1234"}, {"title": "Another title", "Description": "Another description", "uniqueID": "1235"}]
I build up a body for the bulk import and append the id:
bulk_body = []
for document in documents:
    bulk_body.append({'index': {'_id': document["uniqueID"]}})
    bulk_body.append(document)
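For the two example documents above, this loop produces a body in the raw bulk format, i.e. an action line followed by its document source, repeated for each doc:

# Resulting bulk_body for the two example documents:
bulk_body = [
    {'index': {'_id': '1234'}},
    {"title": "A title", "Description": "Description", "uniqueID": "1234"},
    {'index': {'_id': '1235'}},
    {"title": "Another title", "Description": "Another description", "uniqueID": "1235"},
]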
I then call my new save_docs method, which passes the body to the bulk API:
api_handler.save_docs(bulk_body, 'objects')
with my objects_handler.py file looking like:
# objects_handler.py
import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk  # optional here; see the helpers.bulk() sketch below
from elasticsearch_dsl import Document, Text, connections

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   http_auth='elastic:changeme')
connections.create_connection(es)


class Object(Document):
    physicalDescription = Text()
    title = Text()
    uniqueID = Text()

    class Index:
        name = 'objects'
        using = es


class ObjectsHandler():
    def init_mapping(self, index):
        Object.init(using=es, index=index)

    def save_docs(self, docs, index):
        print("Attempting to index the list of docs via the bulk API")
        # The body is already in the raw bulk format (alternating action
        # and source lines), so it is passed to es.bulk() rather than to
        # the helpers.bulk() wrapper.
        resp = es.bulk(index='objects', body=docs)
        print("es.bulk() RESPONSE:", resp)
        print("es.bulk() RESPONSE:", json.dumps(resp, indent=4))
This works for a single doc in JSON format as well as for multiple docs.
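Since the resources above describe the bulk helpers while the handler ends up calling the raw es.bulk API, here is a rough, untested sketch (reusing the es client and documents list from above) of what the same import might look like with helpers.bulk, which takes a flat list of action dicts rather than the alternating action/source body:

from elasticsearch.helpers import bulk

# Each action dict carries the target index, the document id and the
# pre-built JSON doc as its source.
actions = [
    {
        "_index": "objects",
        "_id": document["uniqueID"],
        "_source": document,
    }
    for document in documents
]

success, errors = bulk(es, actions)
print("helpers.bulk() indexed %d docs, errors: %s" % (success, errors))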

Related

Extracting data from JSON files

I have three JSON files: friends, followers and tweets.
friends: contains the information about friends and their tweets
followers: contains information about followers and their tweets
tweets: contains all tweets
I want to extract the following info and store it in a MongoDB collection named "friends":
id_str,
name,
description,
favorites_count,
followers_count,
friends_count,
language,
location,
screen_name,
url,
utc_offset
The tricky part for me is the requirement that "each user (friend or follower) must contain its tweets in a new field tweet".
Any suggestions on how to achieve that?
Here is what I am doing at the moment:
JsonSlurper slurper = new JsonSlurper()
def friends = slurper.parseText(new File('./friends.json').text)
def followers = slurper.parseText(new File('./followers.json').text)
def tweets = slurper.parseText(new File('./tweets.json').text)

friends.users.forEach { fr ->
    def frnds = mongo.friends << [
        [
            id_str: fr.id_str,
            name: fr.name,
            description: fr.description,
            favorites_count: fr.favourite_count,
            followers_count: fr.followers_count,
            friends_count: fr.friends_count,
            language: fr.language,
            location: fr.location,
            screen_name: fr.screen_name,
            url: fr.url,
            utc_offset: fr.utc_offset
        ]
    ]
}
Error: Exception in thread "main" groovy.lang.MissingPropertyException: No such property: friends for class
You can use the Mongoose populate method to display/store your user object.
For example:
followers: [{
    type: Schema.Types.ObjectId,
    ref: 'follower'
}]
You can store a reference to the user id in an array: only the ids are stored in the followers array, and you can later populate all of those ids into full objects, so try using ref in your Mongoose model.
This might look a little confusing at first, so have a look at the Mongoose populate method
and also take a look at this video tutorial.
Hope it helped!

Is it possible to extract a set of database rows with RestTemplate?

I am having difficulties getting multiple datasets out of my database with RestTemplate. I have many routines that extract a single row, with a format like:
IndicatorModel indicatorModel = restTemplate.getForObject(URL + id,
IndicatorModel.class);
and they work fine. However, if I try to extract a set of data, such as:
Map<String, List<S_ServiceCoreTypeModel>> coreTypesMap =
restTemplate.getForObject(URL + id, Map.class);
this returns values in a
Map<String, LinkedHashMap<>>
format. Is there an easy way to return a List<> or Set<> in the desired format?
Fundamentally, the issue is that your Java object model does not match the structure of your JSON document. You are attempting to deserialize a single JSON element into a Java List. Your JSON document looks like:
{
    "serviceCoreTypes": [
        {
            "serviceCoreType": {
                "name": "ALL",
                "description": "All",
                "dateCreated": "2016-06-23 14:46:32.09",
                "dateModified": "2016-06-23 14:46:32.09",
                "deleted": false,
                "id": 1
            }
        },
        {
            "serviceCoreType": {
                "name": "HSI",
                "description": "High-speed Internet",
                "dateCreated": "2016-06-23 14:47:31.317",
                "dateModified": "2016-06-23 14:47:31.317",
                "deleted": false,
                "id": 2
            }
        }
    ]
}
But you cannot turn a serviceCoreTypes element into a List; you can only turn a JSON array into a List. For instance, if you removed the unnecessary wrapper elements from your JSON and your input document looked like:
[
    {
        "name": "ALL",
        "description": "All",
        "dateCreated": "2016-06-23 14:46:32.09",
        "dateModified": "2016-06-23 14:46:32.09",
        "deleted": false,
        "id": 1
    },
    {
        "name": "HSI",
        "description": "High-speed Internet",
        "dateCreated": "2016-06-23 14:47:31.317",
        "dateModified": "2016-06-23 14:47:31.317",
        "deleted": false,
        "id": 2
    }
]
You should be able to then deserialize THAT into a List<S_ServiceCoreTypeModel>. Alternatively, if you cannot change the JSON structure, you could create a Java object model that models the JSON document by creating some wrapper classes. Something like:
class ServiceCoreTypes {
    List<ServiceCoreTypeWrapper> serviceCoreTypes;
    ...
}

class ServiceCoreTypeWrapper {
    ServiceCoreType serviceCoreType;
    ...
}

class ServiceCoreType {
    String name;
    String description;
    ...
}
I'm assuming you don't actually mean a database, but rather a RESTful service, as you're using RestTemplate.
The problem you're facing is that you want to get a Collection back, but the getForObject method can only take in a single type parameter and cannot figure out what the type of the returned collection is.
I'd encourage you to consider using RestTemplate.exchange(...),
which should allow you to request and receive back a collection type.
I have a solution that works, for now at least. I would prefer a solution such as the one proposed by Ben, where I can get the HTTP response body as a list of items in the format I chose, but at least here I can extract each individual item from the JSON node. The code:
S_ServiceCoreTypeModel endModel;
RestTemplate restTemplate = new RestTemplate();
JsonNode node = restTemplate.getForObject(URL, JsonNode.class);
JsonNode allNodes = node.get("serviceCoreTypes");
JsonNode oneNode = allNodes.get(1);
ObjectMapper objectMapper = new ObjectMapper();
endModel = objectMapper.readValue(oneNode.toString(), S_ServiceCoreTypeModel.class);
If anyone has thoughts on how to make Ben's solution work, I would love to hear it.

How to document response fields for an object as Map(HashMap)

I am doing documentation for a REST service returning an object like this:
Map<String, HashMap<Long, String>>
and I can find no way to describe the response fields for such an object.
Let's have a look at my code.
The service:
@RequestMapping(value = "/data", method = RequestMethod.GET)
public Map<String, HashMap<Long, String>> getData()
{
    Map<String, HashMap<Long, String>> list = dao.getData();
    return list;
}
My unit-test-based documentation:
@Test
public void testData() throws Exception
{
    TestUtils.beginTestLog(log, "testData");
    RestDocumentationResultHandler document = document(SNIPPET_NAME_PATTERN, preprocessResponse(prettyPrint()));
    // document.snippets(
    //     responseFields(
    //         fieldWithPath("key").description("key description").type("String"),
    //         fieldWithPath("value").description("value as a Hashmap").type("String"),
    //         fieldWithPath("value.key").description("value.key description").type("String"),
    //         fieldWithPath("value.value").description("value.value description").type("String")
    //     )
    // );
    String token = TestUtils.performLogin(mockMvc, "user", "password");
    mockMvc
        .perform(get(APP_BUILD_NAME + "/svc/data").contextPath(APP_BUILD_NAME)
            .header("TOKEN", token)
        )
        .andExpect(status().is(200))
        .andExpect(content().contentType("application/json;charset=UTF-8"))
        .andExpect(jsonPath("$").isMap())
        .andDo(document);
    TestUtils.endTestLog(log, "testData");
}
As you can see, the code for the response fields is commented out since I haven't found a solution for it yet. I am working on that, but I would really appreciate your help. Thank you in advance.
Your JSON contains a huge number of different fields. There look to be over 1000 different entries in the map. Each of those entries is itself a map with a single key-value pair, and those keys all appear to vary as well. Potentially, that gives you over 2000 fields to document:
cancel
cancel.56284
year
year.41685
segment_de_clientele
segment_de_clientele.120705
…
This structure makes the response hard to document and is also a strong indicator that it will be hard for clients to consume. Ideally, you would restructure the JSON so that each entry has the same keys and only the values vary from entry to entry. Something like this, for example:
{
    "translations": [ {
        "name": "cancel",
        "id": 56284,
        "text": "Exit"
    }, {
        "name": "year",
        "id": 41685,
        "text": "Year"
    }, {
        "name": "segment_de_clientele",
        "id": 120705,
        "text": "Client segment"
    }]
}
This would mean that you only have a handful of fields to document:
translations[]
translations[].name
translations[].id
translations[].text
If that's not possible, then I'd stop trying to use the response fields snippet to document the structure of the response. Instead, you should describe its structure manually in your main Asciidoctor source file.
There are two options:
1. Change the Map to a List of objects so that the response fields can be described easily.
2. Put the description manually into the index.adoc file.
In my case I went for option 2 because I had to stick with the Map.

Apigility: How to render embedded objects?

How do I render embedded objects in Apigility? For example, if I have a 'user' object and it composes a 'country' object, should I be rendering the 'country' object as an embedded object? And how should I do this?
I am using the Zend\Stdlib\Hydrator\ArraySerializable. My getArrayCopy() method simply returns an array of properties that I want exposed. The array keys are the property names. The array values are the property values. In the case of user->country, the value is an object, not a scalar.
When I return the user object from UserResource->fetch(), here's how it is rendered:
{
    "id": "1",
    "firstName": "Joe",
    "lastName": "Bloggs",
    "status": "Active",
    "email": "test@example.com",
    "country": {
        "code": "AU",
        "name": "Australia"
    },
    "settings": "0",
    "_links": {
        "self": {
            "href": "http://api.mydomain.local/users/1"
        }
    }
}
Note that 'country' is not in an _embedded field. If it is supposed to be in _embedded, I would have thought that Apigility would automatically do that (since it automatically adds the _links object).
As a related issue, how do I go about returning other rel links, such as back, forward, etc?
The easiest way to get Apigility to render embedded resources is when there is an API/resource associated with the embedded object. What I mean for your example is that you'd have an API resource that has a country entity. In that case, if your getArrayCopy returned the CountryEntity, Apigility would render it automatically as an embedded resource.
If your getArrayCopy is returning country as an array with code and name, you'll end up with what you saw.
For the other part, the rel links for first, last, prev and next will come from the fetchAll method when you return a Paginator. Your collection extends from this already, but it needs an adapter. The code could look something like this:
public function fetchAll($params)
{
    // Return a \Zend\Db\Select object that will retrieve the
    // stuff you want from the database
    $select = $this->service->fetchAll($params);

    $entityClass = $this->getEntityClass();
    $entity = new $entityClass();
    $hydrator = new \Zend\Stdlib\Hydrator\ArraySerializable();
    $prototype = new \Zend\Db\ResultSet\HydratingResultSet($hydrator, $entity);
    $paginator = new \Zend\Paginator\Adapter\DbSelect($select, $this->sql, $prototype);

    $collectionClass = $this->getCollectionClass();
    return new $collectionClass($paginator);
}
There are other paginator adapters as well, such as an ArrayAdapter which will take in an array of any size and then paginate it so you only get the desired number of results. The downside to this, if you use it with database results, is that you'll potentially be retrieving and discarding a lot of results. The DbSelect paginator will modify the $select object to add the limit and order clause automatically, so you only retrieve the bits you need. There are also adapters if you're using DbTableGateway, Iterators or even callbacks. You can also implement your own of course.
Hope this helps. If you have more specific needs or clarification, please comment and I'll do my best.
I posted this example on github.
https://github.com/martins-smb/apigility-renderCollection-example
Hope this helps.

Using backbone collections with a rest route that returns an object

Looking at the example code
var accounts = new Backbone.Collection;
accounts.url = '/accounts';
accounts.fetch();
This works if the route returns an array:
[{id:1, name:'bob'}, {id:2, name:'joe'}]
but the REST service I'm using returns an object like this:
{
    items: [{id: 1, name: 'bob'}, {id: 2, name: 'joe'}],
    page: 1,
    href: '/accounts'
}
How do I go about telling Backbone.Collection that the collection is in items?
The parse function seems appropriate.
From the documentation:
http://backbonejs.org/
"When fetching raw JSON data from an API, a Collection will automatically populate itself with data formatted as an array, while a Model will automatically populate itself with data formatted as an object:
[{"id": 1}] ..... populates a Collection with one model.
{"id": 1} ....... populates a Model with one attribute.
However, it's fairly common to encounter APIs that return data in a different format than what Backbone expects. For example, consider fetching a Collection from an API that returns the real data array wrapped in metadata:
{
    "page": 1,
    "limit": 10,
    "total": 2,
    "books": [
        {"id": 1, "title": "Pride and Prejudice"},
        {"id": 4, "title": "The Great Gatsby"}
    ]
}
In the above example data, a Collection should populate using the "books" array rather than the root object structure. This difference is easily reconciled using a parse method that returns (or transforms) the desired portion of API data:
var Books = Backbone.Collection.extend({
    url: '/books',
    parse: function(data) {
        return data.books;
    }
});
"
Hope it helps.