Using Elasticsearch with MongoDB River for searching PDFs

I want to search PDF files by content, but the result does not contain readable PDF text. It looks like the following:
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "mongoindex",
        "_type": "files",
        "_id": "532595b8f37d5cc2d64a517d",
        "_score": 1.0,
        "_source": {
          "content": {
            "content_type": "application/pdf",
            "title": "D:/sample.pdf",
            "content": "JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhlbi1VUykgPj4NCmVuZG9iag0KMiAwIG9iag0",
            "filename": "D:/sample.pdf",
            "contentType": "application/pdf",
            "md5": "afe70f97bce7876e39aa43f71dc7266f",
            "length": 82441,
            "chunkSize": 262144,
            "uploadDate": "2014-03-16T12:14:48.542Z",
            "metadata": {}
          }
        }
      }
    ]
  }
}
Could you please help me find my mistake?
Here is the link I used:
http://v.bartko.info/?p=463

Your attachment has been encoded in Base64; the string starting with JVBERi0x in the content field is the raw PDF bytes, not extracted text.
You need to decode it at the client level.
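For example, a minimal client-side sketch in Python (the file names search_response.json and decoded.pdf are just placeholders for this illustration; the field path mirrors the response above):

import base64
import json

# The search response shown above, saved to a file for this sketch
# (in practice, use whatever your Elasticsearch client returns).
with open("search_response.json") as f:
    resp = json.load(f)

# The attachment lives under _source.content.content as a Base64 string.
encoded = resp["hits"]["hits"][0]["_source"]["content"]["content"]

# Decode it back into the original PDF bytes and write them to disk.
with open("decoded.pdf", "wb") as out:
    out.write(base64.b64decode(encoded))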

Related

Grafana JSON API datasource - parse response

I successfully get data from the datasource using this plugin: https://grafana.github.io/grafana-json-datasource/
My response looks like this (I can see it in the Chrome console log):
{
  "count": 4,
  "results": [
    {
      "time": "2022-12-06T17:52:30.142Z",
      "deviceId": "3021EF",
      "humidity": 49,
      "messageId": "1ed82601-e98a-4b78-a580-0c8a5ddd1e99"
    },
    {
      "time": "2022-12-06T17:52:30.142Z",
      "humidity": 45,
      "deviceId": "3021EF",
      "messageId": "1ed82601-e98a-4b78-a580-0c8a5ddd1e98"
    },
    {
      "time": "2022-12-06T18:27:34.768Z",
      "deviceId": "3021EF",
      "humidity": 49,
      "messageId": "1ed82601-e98a-4b78-a580-0c8a5ddd1e97"
    },
    {
      "time": "2022-12-06T18:27:34.768Z",
      "deviceId": "3021EF",
      "temperature": 21.6,
      "messageId": "1ed82601-e98a-4b78-a580-0c8a5ddd1e96"
    }
  ],
  "fields": [
    "time",
    "deviceId",
    "humidity",
    "messageId"
  ]
}
But I cannot parse this data using Explore.
What is wrong?

ElasticSearch autocomplete for keywords from a string

My document looks like:
"hits": {
  "total": 4,
  "max_score": 1,
  "hits": [
    {
      "_index": "test_db2",
      "_type": "test",
      "_id": "1",
      "_score": 1,
      "_source": {
        "name": "very cool shoes",
        "price": 26
      }
    },
    {
      "_index": "test_db2",
      "_type": "test",
      "_id": "2",
      "_score": 1,
      "_source": {
        "name": "great shampoo",
        "price": 15
      }
    },
    {
      "_index": "test_db2",
      "_type": "test",
      "_id": "3",
      "_score": 1,
      "_source": {
        "name": "shirt",
        "price": 25
      }
    }
  ]
}
How do I create autocomplete in Elasticsearch? For example:
I type the word "sh" into the input, and after that I should see results like
shoes
shampoo
shirt
.....
Example of what I need
Take a look at ngrams. Or actually, edge ngrams are probably all you need.
Qbox has a couple of blog posts about setting up autocomplete with ngrams, so for a more in-depth discussion I would refer you to these:
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
But just very quickly, this should get you started.
First I set up the index:
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "stop",
            "kstem",
            "edgengram_filter"
          ]
        }
      },
      "filter": {
        "edgengram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "price": {
          "type": "integer"
        }
      }
    }
  }
}
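As a quick sanity check (this request is not part of the original answer; it uses the standard _analyze API, which on ES 1.x accepts the analyzer and text as query-string parameters), you can confirm that the autocomplete analyzer emits edge ngrams:
GET /test_index/_analyze?analyzer=autocomplete&text=shoes
For "shoes" this should return tokens along the lines of sh, sho, and shoe, since kstem reduces "shoes" to "shoe" before the edge ngram filter runs.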
Then I indexed your documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"name": "very cool shoes","price": 26}
{"index":{"_id":2}}
{"name": "great shampoo","price": 15}
{"index":{"_id":3}}
{"name": "shirt","price": 25}
Now I can get autocomplete results with a simple match query:
POST /test_index/_search
{
  "query": {
    "match": {
      "name": "sh"
    }
  }
}
which returns:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_score": 0.30685282,
        "_source": {
          "name": "shirt",
          "price": 25
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.19178301,
        "_source": {
          "name": "great shampoo",
          "price": 15
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "name": "very cool shoes",
          "price": 26
        }
      }
    ]
  }
}
Here's the code I used to test it:
http://sense.qbox.io/gist/0886488ddfb045c69eed67b15e9734187c8b2491

How to get dates from Elasticsearch in one format?

I have created an index like this:
PUT twitter
PUT twitter/_mapping/myType
{
  "myType": {
    "properties": {
      "message": {
        "type": "date",
        "date_detection": true,
        "store": true
      }
    }
  }
}
Then I put several documents:
POST twitter/myType
{
  "message": 123456
}
I have this document and others with message values "123456", -123456, "2014-01-01", "-123456" (note the difference between string and numeric values here). Only the document with value "12#3454" failed to index.
So now I execute:
GET twitter/myType/_search?pretty=true&q=*:*
And the results are:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter",
        "_type": "myType",
        "_id": "AU5JvFsHvhUOO_5MdfCv",
        "_score": 1,
        "_source": {
          "message": -123456
        }
      },
      {
        "_index": "twitter",
        "_type": "myType",
        "_id": "AU5Ju6aOvhUOO_5MdfCs",
        "_score": 1,
        "_source": {
          "message": "123456"
        }
      },
      {
        "_index": "twitter",
        "_type": "myType",
        "_id": "AU5Ju0KOvhUOO_5MdfCq",
        "_score": 1,
        "_source": {
          "message": "2014-01-01"
        }
      },
      {
        "_index": "twitter",
        "_type": "myType",
        "_id": "AU5JvDiGvhUOO_5MdfCu",
        "_score": 1,
        "_source": {
          "message": "-123456"
        }
      }
    ]
  }
}
Why do I get these values in the date field instead of a string value (ISODateTimeFormat.dateOptionalTimeParser)? Is there a way to get all dates in one format (e.g. string or millis)?
Elasticsearch version is 1.4.3.
That's the _source you are seeing, meaning the exact JSON you indexed: no formatting, nothing.
If you want to see what ES actually indexed (meaning the date in milliseconds), you can use fielddata_fields:
GET /twitter/myType/_search
{
  "query": {
    "match_all": {}
  },
  "fielddata_fields": [
    "message"
  ]
}
And the answer to your question is that this is not available out-of-the-box. You need to use script_fields:
GET /twitter/myType/_search
{
  "query": {
    "match_all": {}
  },
  "fielddata_fields": [
    "message"
  ],
  "_source": "*",
  "script_fields": {
    "my_script": {
      "script": "new Date(doc[\"message\"].value)"
    }
  }
}
Also, your mapping is wrong: date_detection should be put on the type, not on the field:
PUT twitter
{
  "mappings": {
    "myType": {
      "date_detection": true,
      "properties": {
        "message": {
          "type": "date",
          "store": true
        }
      }
    }
  }
}
And from the output below you'll see how ES treats those numbers you put in there:
{
  "_index": "twitter",
  "_type": "myType",
  "_id": "AU5J93Q-I7tQJ10g6jk5",
  "_score": 1,
  "_source": {
    "message": "123456"
  },
  "fields": {
    "message": [
      3833727840000000
    ],
    "my_script": [
      "123456-01-01T00:00:00.000Z"
    ]
  }
},
{
  "_index": "twitter",
  "_type": "myType",
  "_id": "AU5J93Q-I7tQJ10g6jk4",
  "_score": 1,
  "_source": {
    "message": 123456
  },
  "fields": {
    "message": [
      123456
    ],
    "my_script": [
      "1970-01-01T00:02:03.456Z"
    ]
  }
},
{
  "_index": "twitter",
  "_type": "myType",
  "_id": "AU5J93Q-I7tQJ10g6jk8",
  "_score": 1,
  "_source": {
    "message": "-123456"
  },
  "fields": {
    "message": [
      -3958062278400000
    ],
    "my_script": [
      "-123456-01-01T00:00:00.000Z"
    ]
  }
},
{
  "_index": "twitter",
  "_type": "myType",
  "_id": "AU5J93Q-I7tQJ10g6jk7",
  "_score": 1,
  "_source": {
    "message": "2014-01-01"
  },
  "fields": {
    "message": [
      1388534400000
    ],
    "my_script": [
      "2014-01-01T00:00:00.000Z"
    ]
  }
},
{
  "_index": "twitter",
  "_type": "myType",
  "_id": "AU5J93Q-I7tQJ10g6jk6",
  "_score": 1,
  "_source": {
    "message": -123456
  },
  "fields": {
    "message": [
      -123456
    ],
    "my_script": [
      "1969-12-31T23:57:56.544Z"
    ]
  }
}
]

Custom API in Elasticsearch

I'm building my first RESTful API and thought I'd try Elasticsearch as a base. Is there a way to customize the API in Elasticsearch to return only certain fields from the results of a query? For instance, if I have data with fname, lname, city, state, zip, and email, and I only want to return a list of fnames and cities for every query matching the city field. So something like this:
curl -XPOST "http://localhost:9200/custom_call/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Toronto",
      "fields": ["city"]
    }
  }
}'
Would ideally return something like:
{
  "took": 52,
  "timed_out": false,
  "_shards": {
    "total": 35,
    "successful": 35,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.375,
    "hits": [
      {
        "_index": "persons",
        "_type": "person",
        "_id": "6",
        "_score": 0.375,
        "_source": {
          "fname": "Bob",
          "city": "Toronto"
        }
      },
      {
        "_index": "persons",
        "_type": "person",
        "_id": "13",
        "_score": 0.375,
        "_source": {
          "fname": "Sue",
          "city": "Toronto"
        }
      },
      {
        "_index": "persons",
        "_type": "person",
        "_id": "21",
        "_score": 0.375,
        "_source": {
          "fname": "Jose",
          "city": "Toronto"
        }
      }
    ]
  }
}
Not sure if Elasticsearch is set up to do this, or even if you would want it to. This is my first foray into building a RESTful API. I figure if NPR and StackOverflow like it, it's worth a shot! Thanks for the help.
Yes, you can; it doesn't look like you tried to find this out on your own.
Here is how to do that:
POST localhost:9200/index/type/_search
{
  "query": {
    "query_string": {
      "query": "Toronto",
      "fields": ["city"]
    }
  },
  "_source": ["fields_you_want_to_get"]
}
The term you are looking for is source filtering.
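Applied to the example in the question (the persons index and person type are taken from the sample response above, so adjust them to your setup), the full request would look something like:
curl -XPOST "http://localhost:9200/persons/person/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Toronto",
      "fields": ["city"]
    }
  },
  "_source": ["fname", "city"]
}'
Each hit's _source will then contain only fname and city, which matches the response shape you sketched.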

Elasticsearch nested query

I'm new to Elasticsearch. I managed to set it up and import a recordset from my MongoDB collection using the river plugin. For a start, I want to query against the "desc" field but just can't manage to get the query right; I'm not sure if the problem is driven by the way the index was defined. Can anyone help, please?
A sample record in Elasticsearch looks like this:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 107209,
    "max_score": 1,
    "hits": [
      {
        "_index": "shiv",
        "_type": "shiv",
        "_id": "iG1eIzN7RGO7hFfxTlnLuA",
        "_score": 1,
        "_source": {
          "_id": {
            "$oid": "50901d7f485bf7bd1c000021"
          },
          "brand": "",
          "category": {
            "$ref": "categories",
            "$id": {
              "$oid": "4fbd2221758cb11d14000174"
            }
          },
          "comments": [],
          "count_comment": 0,
          "count_fav": 2,
          "count_hotness": 1.46,
          "count_rekick": 0,
          "count_share": 0,
          "country": {
            "$ref": "countries",
            "$id": {
              "$oid": "4fec98f7758cb18c6e0002c9"
            }
          },
          "currency": "pound",
          "desc": "A men's automatic watch, this Seamaster Bond model features a Co-Axial escapement and date function. Its blue dial is teamed with a stainless steel case and bracelet for a look that's sporty and refined.",
          "gender": "male",
          "ident": "omega-seamaster-diver-bond-men-s-automatic-watch---ernest-jones-1351622015",
          "img_url": "http://s7ondemand4.scene7.com/is/image/Signet/5735793?$detail$",
          "lifestyles": [
            {
              "$ref": "lifestyles",
              "$id": {
                "$oid": "508ff6ca485bf73112000060"
              }
            }
          ],
          "location": "United Kingdom",
          "owner": {
            "$ref": "accounts",
            "$id": {
              "$oid": "50742fd8485bf74b7a00213f"
            }
          },
          "price": 2400,
          "store": "ernestjones.co.uk",
          "tags": [
            "ernest-jones",
            "bond"
          ],
          "timestamp_creation": 1351622015,
          "timestamp_exp": 1356825600,
          "timestamp_update": 1351622015,
          "title": "Omega Seamaster Diver Bond men's automatic watch - Ernest Jones",
          "url": "http%3A%2F%2Fwww.ernestjones.co.uk%2Fwebstore%2Fd%2F5735793%2Fomega%20seamaster%20diver%20bond%20men%27s%20automatic%20watch%2F%3Futm_source%3Dgooglebase%26utm_medium%3Dfeedmanager%26cm_mmc%3DFroogle-_-CKB-_-nurses_fobs-_-watches%26cm_mmca1%3Domega%26cm_mmca2%3Dmale%26cm_mmca3%3Dadult"
        }
      }
    ]
  }
}
The mapping of the index "shiv" looks like this:
{
  "shiv": {
    "properties": {
      "$oid": {
        "type": "string"
      }
    }
  }
}
Thanks again
There are lots of ways to query; have you tried a match query?
Using curl or a REST client of your choice...
http://[host]:9200/[index_name]/[doc_type]/_search
{
  "query": {
    "match": {
      "desc": "some value you want to find in desc"
    }
  }
}
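For example, to run it against the sample document above (the host, index, and type mirror that sample; "Seamaster" is just an illustrative search term):
curl -XPOST "http://localhost:9200/shiv/shiv/_search" -d'
{
  "query": {
    "match": {
      "desc": "Seamaster"
    }
  }
}'
This should bring back the watch document shown above, since its desc field contains "Seamaster".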