Getting a JSONParseException when indexing fields from a MongoDB collection in Solr using DataImportHandler

I am seeing this exception while trying to index data from a MongoDB collection:
Exception while processing: products document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: com.mongodb.util.JSONParseException:
{idStr,name,code,description,price,brand,size,color}
^
at org.apache.solr.handler.dataimport.MongoEntityProcessor.initQuery(MongoEntityProcessor.java:46)
at org.apache.solr.handler.dataimport.MongoEntityProcessor.nextRow(MongoEntityProcessor.java:54)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:244)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:481)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:462)
Caused by: com.mongodb.util.JSONParseException:
{idStr,name,code,description,price,brand,size,color}
^
at com.mongodb.util.JSONParser.parseString(JSON.java:387)
The following is my data-source-config file, in the dataimport directory of my core's conf folder:
<dataConfig>
<dataSource name="mymongodb" type="MongoDataSource" database="mongodb://*.*.*.*/testdb" />
<document name="data">
<entity
name="products"
processor="MongoEntityProcessor"
query="{idStr,name,code,description,price,brand,size,color}"
collection="products"
datasource="mymongodb"
transformer="MongoMapperTransformer" >
<field column="idstr" name="idstr" mongoField="idStr"/>
<field column="name" name="name" mongoField="name"/>
<field column="code" name="code" mongoField="code"/>
<field column="description" name="description" mongoField="description"/>
<field column="price" name="price" mongoField="price"/>
<field column="brand" name="brand" mongoField="brand"/>
<field column="size" name="size" mongoField="size"/>
<field column="color" name="color" mongoField="color"/>
<entity
name="categories"
processor="MongoEntityProcessor"
query="{'idStr':'${categories.idstr}'}"
collection="categories"
datasource="mymongodb"
transformer="MongoMapperTransformer">
<field column="type" name="type" mongoField="type"/>
</entity>
</entity>
</document>
</dataConfig>
I am trying to join the idStr field of the categories collection with the idStr field of the products collection (indexed as idstr), and to get the fields above (name, description, ... from products and the type field from categories).
Any comments or solutions for this exception would be really appreciated. Thanks!

Your Solr field is declared as idstr, but you are referencing it as idStr (a camel-case difference) in the query attribute of dataConfig.

I was able to resolve this.
The following is the working configuration in the data-source-config file:
<entity
name="products"
query="select idStr,name,code,description,price,brand,size,color from products">
<field name="prodidStr" column="idStr" />
<field name="name" column="name" />
<field name="code" column="code" />
<field name="description" column="description" />
<field name="price" column="price" />
<field name="brand" column="brand" />
<field name="size" column="size" />
<field name="color" column="color" />
<entity
name="categories"
dataSource="mongod"
query="select idStr,ancestors from categories where idStr = '${products.idStr}'">
<field name="catidStr" column="idStr" />
<field name="ancestors" column="ancestors" />
</entity>
</entity>
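The working configuration replaces the brace-wrapped field list with an explicit select. The stack trace shows the parser rejecting `{idStr,name,...}` itself. Braces suggest a JSON document, but a document needs key/value pairs, and a bare list of fields to return is what MongoDB would express as a projection (`{"field": 1, ...}`). The Python sketch below illustrates that distinction. It uses Python's `json` module as a stand-in for the driver's parser, which is more lenient about quoting, so treat it as an approximation. `is_json_document` and `projection_for` are hypothetical helpers, not part of the MongoDB or DIH APIs.

```python
import json


def is_json_document(text):
    """Return True if text parses as a JSON object, i.e. a key/value document."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def projection_for(fields):
    """Build a MongoDB-style projection document ({"field": 1, ...}) as JSON text."""
    return json.dumps({name.strip(): 1 for name in fields})


# The query attribute from the failing data-config.
query_attr = "{idStr,name,code,description,price,brand,size,color}"
fields = query_attr.strip("{}").split(",")

print(is_json_document(query_attr))  # False: field names without values
print(is_json_document("{}"))        # True: an empty filter document
print(projection_for(fields))        # the same field list, written as a projection
```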

Related

xpath extract field name and "column" name from jdo mapping

This is my first time dealing with XPath and XML data. I have the XPath query below, which I put together from some Stack Overflow answers. I want to extract all the field names and their column names:
with t(x) as (
values
('<?xml version="1.0" encoding="UTF-8"?>
<mapping>
<package name="mypackage">
<class name="mytable">
<jdbc-class-map type="base" pk-column="id" table="public.mytable" />
<jdbc-version-ind type="version-number" column="version" />
<jdbc-class-ind type="myclass" column="jdoclass" />
<field name="majorVersion">
<jdbc-field-map type="value" column="majorversion" />
</field>
<field name="minorVersion">
<jdbc-field-map type="value" column="minorversion" />
</field>
<field name="patchVersion">
<jdbc-field-map type="value" column="patchversion" />
</field>
<field name="version">
<jdbc-field-map type="value" column="version0" />
</field>
<field name="webAddress">
<jdbc-field-map type="value" column="webaddress" />
</field>
</class>
</package>
</mapping>'::xml)
)
select
unnest(xpath('./package/class/field/text()', x)) as "fieldname",
unnest(xpath('./package/class/field/jdbc-field-map/text()', x)) as "columns"
from t
The above query returns fieldname as empty and columns as null. I understand there is some problem with the XPath expressions.
I expect to see the field name and column lists:
fieldName columns
--------------------------
majorversion majorversion
minorversion minorversion
...
If you want to turn XML into a "table", this is typically done much more easily with xmltable():
select info.*
from t
cross join xmltable('/mapping/package/class/field' passing x
columns fieldname text path '@name',
"column" text path './jdbc-field-map/@column') as info
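For comparison outside the database, the same "one row per field element" idea can be sketched with Python's standard `xml.etree.ElementTree`. The point it illustrates is the one xmltable() relies on: the field name and the column live in attributes (`@name`, `@column`), not in text nodes, so `text()` is the wrong thing to select. This is an illustrative sketch on a trimmed copy of the question's XML, not a replacement for the PostgreSQL query. `field_columns` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the mapping from the question (XML declaration omitted).
MAPPING = """<mapping>
<package name="mypackage">
<class name="mytable">
<jdbc-class-map type="base" pk-column="id" table="public.mytable" />
<jdbc-version-ind type="version-number" column="version" />
<field name="majorVersion">
<jdbc-field-map type="value" column="majorversion" />
</field>
<field name="minorVersion">
<jdbc-field-map type="value" column="minorversion" />
</field>
<field name="webAddress">
<jdbc-field-map type="value" column="webaddress" />
</field>
</class>
</package>
</mapping>"""


def field_columns(xml_text):
    """Return (field name, column) pairs, one per <field> element.

    Like the xmltable() call, each match of the row path becomes one row,
    and each output column is read from an attribute relative to that row.
    """
    root = ET.fromstring(xml_text)  # root is <mapping>, so paths start below it
    rows = []
    for field in root.findall("./package/class/field"):
        fmap = field.find("jdbc-field-map")
        column = fmap.get("column") if fmap is not None else None
        rows.append((field.get("name"), column))
    return rows


for name, column in field_columns(MAPPING):
    print(name, column)
```

Note that jdbc-version-ind also carries a column attribute but is not picked up, because the row path only matches field elements.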
I was able to achieve the result with:
with myTempTable(myXmlColumn) as (
values ('<?xml version="1.0" encoding="UTF-8"?>
<mapping>
<package name="mypackage">
<class name="mytable">
<jdbc-class-map type="base" pk-column="id" table="public.mytable" />
<jdbc-version-ind type="version-number" column="version" />
<jdbc-class-ind type="myclass" column="jdoclass" />
<field name="majorVersion">
<jdbc-field-map type="value" column="majorversion" />
</field>
<field name="minorVersion">
<jdbc-field-map type="value" column="minorversion" />
</field>
<field name="patchVersion">
<jdbc-field-map type="value" column="patchversion" />
</field>
<field name="version">
<jdbc-field-map type="value" column="version0" />
</field>
<field name="webAddress">
<jdbc-field-map type="value" column="webaddress" />
</field>
</class>
</package>
</mapping>'::xml))
SELECT
unnest(xpath('//package/class/field/jdbc-field-map/@column', myTempTable.myXmlColumn))::text AS columns,
unnest(xpath('//package/class/field//@name', myTempTable.myXmlColumn))::text AS fieldName
FROM myTempTable
Result:
columns fieldName
--------------------------
"majorversion" "majorVersion"
"minorversion" "minorVersion"
"patchversion" "patchVersion"
"version0" "version"
"webaddress" "webAddress"

Solr Dataimport nested entity from PostgreSQL

I want to create a nested entity with DataImportHandler.
I use Solr 8.6, PostgreSQL 12, and OpenJDK 11.
My config (schema.xml) looks like this:
<schema name="products" version="1.5">
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_root_" type="int" indexed="true" stored="false"/>
<uniqueKey>id</uniqueKey>
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="price" type="float" indexed="true" required="true" stored="true"/>
<field name="categories" type="int" indexed="false" stored="true" required="true" multiValued="true"/>
<field name="pictures" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="pid" type="int" indexed="true" stored="true" />
<field name="previewUrl " type="string" indexed="true" stored="true" />
</schema>
data-config.xml
<dataConfig>
<dataSource type="JdbcDataSource"
driver="org.postgresql.Driver"
url="jdbc:postgresql://${db.host}/myDB"
user="user"
password="myPassword"
/>
<document>
<entity name="products"
pk="id"
transformer="DateFormatTransformer"
query="SELECT * from products"
deltaQuery="SELECT id FROM products WHERE updated > '${dataimporter.last_index_time}'::timestamp"
deltaImportQuery="SELECT * FROM products WHERE id=${dataimporter.delta.id}"
/>
<field column="id" name="id"/>
<field column="price" name="price"/>
<entity name="categories"
query="SELECT category_id FROM product_category WHERE product_id='${products.id}'">
<field column="category_id" name="categories"/>
</entity>
<entity name="pictures"
child="true"
pk="pid"
query="SELECT * FROM pictures WHERE product_id='${products.id}'"
>
<field column="id" name="pid"/>
<field column="preview_url" name="previewUrl"/>
</entity>
</entity>
</document>
</dataConfig>
This is the result I expect:
[
{
"id":1,
"price": 10,
"categories": [1, 2],
"pictures": [
{
"pid":1,
"previewUrl":"/url"
},
{
"pid":2,
"previewUrl":"/url"
}
],
"_version_":1674819829308063744
}
]
But I get the following error:
org.apache.solr.common.SolrException: [doc=null] missing required field: price
What am I doing wrong?
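One way to reason about this error is to check each document in the expected structure against the schema's required fields. The sketch below assumes that child documents (here the pictures) are stored as separate documents and validated against the same schema as their parent, which is my understanding of Solr's nested-document indexing. Under that assumption, required fields such as price apply to the children too. `report_missing` is a hypothetical helper, not a Solr API.

```python
# Fields marked required="true" in the question's schema.xml.
REQUIRED = {"id", "price", "categories"}

# The expected nested document from the question (minus _version_).
expected = {
    "id": 1,
    "price": 10,
    "categories": [1, 2],
    "pictures": [
        {"pid": 1, "previewUrl": "/url"},
        {"pid": 2, "previewUrl": "/url"},
    ],
}


def report_missing(doc, required, child_key="pictures"):
    """Check the parent and every child against the same required set,
    as a shared schema would, and report what each document lacks."""
    report = {}
    parent_missing = sorted(required - set(doc))
    if parent_missing:
        report["parent"] = parent_missing
    for i, child in enumerate(doc.get(child_key, [])):
        child_missing = sorted(required - set(child))
        if child_missing:
            report[f"{child_key}[{i}]"] = child_missing
    return report


print(report_missing(expected, REQUIRED))
```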

FIX 4.2 Tag Not Defined

20180216-17:21:04.640 : 8=FIX.4.2;9=115;35=V;34=3;49=SNDJ;52=20180216-17:21:04.640;56=BROKER;55=EUR/USD;146=1;262=676;263=1;264=1;265=1;266=Y;267=1;269=0;10=061;
20180216-17:21:04.641 : 8=FIX.4.2;9=119;35=3;34=3;49=BROKER;52=20180216-17:21:04.641;56=SNDJ;45=3;58=Tag not defined for this message type;371=55;372=V;373=2;10=237;
I receive 'Tag not defined for this message type' rejects (35=3) when attempting to send 35=V messages. I have tried both adding and removing ValidateUserDefinedFields and ValidateFieldsOutOfOrder.
I have removed the group structure and re-added it, redefined the types of both Symbol and NoRelatedSym (as string, symbol, int, numingroup, etc.), and changed the symbol being sent to EURUSD, TEST, etc., but nothing works.
Have I missed something very simple here? It seems related to the fact that the request message puts the symbol tag ahead of the group, but I do not know why.
MarketDataRequest.h:
FIELD_SET(*this, FIX::NoRelatedSym);
class NoRelatedSym: public FIX::Group
{
public:
NoRelatedSym() :
FIX::Group(146,55,FIX::message_order(55,65,48,22,167,200,205,201,202,206,231,223,207,106,348,349,107,350,351,336,0)) {}
FIELD_SET(*this, FIX::Symbol);
....
};
My current FIX 4.2 .xml set up for MarketDataRequest messages looks like:
<message name='MarketDataRequest' msgtype='V' msgcat='app'>
<field name='MDReqID' required='Y' />
<field name='SubscriptionRequestType' required='Y' />
<field name='MarketDepth' required='Y' />
<field name='MDUpdateType' required='N' />
<field name='AggregatedBook' required='N' />
<group name='NoMDEntryTypes' required='Y'>
<field name='MDEntryType' required='Y' />
</group>
<group name='NoRelatedSym' required='Y'>
<field name='Symbol' required='Y' />
<field name='SymbolSfx' required='N' />
<field name='SecurityID' required='N' />
<field name='IDSource' required='N' />
<field name='SecurityType' required='N' />
<field name='MaturityMonthYear' required='N' />
<field name='MaturityDay' required='N' />
<field name='PutOrCall' required='N' />
<field name='StrikePrice' required='N' />
<field name='OptAttribute' required='N' />
<field name='ContractMultiplier' required='N' />
<field name='CouponRate' required='N' />
<field name='SecurityExchange' required='N' />
<field name='Issuer' required='N' />
<field name='EncodedIssuerLen' required='N' />
<field name='EncodedIssuer' required='N' />
<field name='SecurityDesc' required='N' />
<field name='EncodedSecurityDescLen' required='N' />
<field name='EncodedSecurityDesc' required='N' />
<field name='TradingSessionID' required='N' />
</group>
</message>
Configuration Settings:
[DEFAULT]
BeginString=FIX.4.2
ReconnectInterval=60
SocketAcceptPort=7091
SenderCompID=SNDJ
TargetCompID=BROKER
SocketNodelay=Y
PersistMessage=Y
FileStorePath=cache
FileLogPath=log
[SESSION]
ConnectionType=acceptor
StartTime=00:30:00
EndTime=23:30:00
ReconnectInterval=30
HeartBtInt=15
SocketAcceptPort=7091
SocketReuseAddress=Y
DataDictionary=spec/FIX42.xml
AppDataDictionary=spec/FIX42.xml
SenderCompID=BROKER
TargetCompID=SNDJ
FileStorePath=cache
FileLogPath=log
[SESSION]
BeginString=FIX.4.2
ConnectionType=initiator
StartTime=00:30:00
EndTime=23:30:00
ReconnectInterval=15
HeartBtInt=15
SocketConnectPort=7091
SocketConnectHost=127.0.0.1
DataDictionary=spec/FIX42.xml
AppDataDictionary=spec/FIX42.xml
SenderCompID=SNDJ
TargetCompID=BROKER
FileStorePath=cache
FileLogPath=log
Thanks
The message that you are sending is invalid per your own DD.
Look at the first body fields after the header ends:
55=EUR/USD;146=1;262=676;...
That 55 field is supposed to be inside the 146 repeating group, but its placement puts it prior to the group.
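The ordering problem described above can be checked mechanically. Below is a simplified Python sketch, not QuickFIX code, that parses the logged messages and reports whether the NoRelatedSym delimiter field Symbol (55) sits outside the 146 group. The logs print the SOH field separator as ';', so the sketch splits on ';'. It only looks at the start of the first group instance and ignores nested groups, so treat it as an illustration. The helper names are hypothetical.

```python
def parse_fix(raw, sep=";"):
    """Split a FIX message into ordered (tag, value) pairs.
    The logs above show the SOH separator as ';'."""
    pairs = []
    for item in raw.strip(sep).split(sep):
        tag, _, value = item.partition("=")
        pairs.append((int(tag), value))
    return pairs


def delimiter_outside_group(pairs, count_tag=146, delimiter_tag=55):
    """Simplified check: a repeating group must start with its delimiter field
    immediately after the count field. Returns True if the delimiter appears
    before the count field or does not directly follow it."""
    tags = [tag for tag, _ in pairs]
    if count_tag not in tags:
        return delimiter_tag in tags
    start = tags.index(count_tag)
    first_in_group = tags[start + 1] if start + 1 < len(tags) else None
    return delimiter_tag in tags[:start] or first_in_group != delimiter_tag


sent = ("8=FIX.4.2;9=115;35=V;34=3;49=SNDJ;52=20180216-17:21:04.640;56=BROKER;"
        "55=EUR/USD;146=1;262=676;263=1;264=1;265=1;266=Y;267=1;269=0;10=061;")
print(delimiter_outside_group(parse_fix(sent)))  # True: 55 precedes 146

# The reject names the offending tag: RefTagID (371), RefMsgType (372),
# SessionRejectReason (373).
reject = ("8=FIX.4.2;9=119;35=3;34=3;49=BROKER;52=20180216-17:21:04.641;56=SNDJ;"
          "45=3;58=Tag not defined for this message type;371=55;372=V;373=2;10=237;")
fields = dict(parse_fix(reject))
print(fields[371], fields[372], fields[373])
```

A body with the group in the position the data dictionary defines (for example `...;267=1;269=0;146=1;55=EUR/USD`) passes the check.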
I suspect your config may be to blame. If you update your question to include the config, I will probably be able to see what's wrong and update this answer.
UPDATE:
You are missing UseDataDictionary=Y from your config. That's not the cause of this problem, but you need it to receive messages correctly.
Also, you don't need AppDataDictionary -- that's only for FIX 5+.

Data type not working in Solr

I want to fetch records, including a date field, from Cassandra into Solr. Here is my code:
in dataconfig.xml:
<entity name="artist" query="SELECT artist_id, name, email, total_jobs, created FROM artist_list">
<field column="artist_id" template="ARTIST_${artist.artist_id}" name="id"/>
<field column="created" name="artist_created" />
</entity>
in schema.xml:
<fieldType name="tdate" class="solr.TrieDoubleField" omitNorms="true" />
<field name="artist_created" type="tdate" indexed="false" stored="true"/>
But the result did not contain the created field. Can anyone tell me what the problem is? Thanks very much!
You are defining the tdate data type as solr.TrieDoubleField. That's why the result doesn't contain the artist_created data.
Change your schema to:
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<field name="artist_created" type="date" indexed="false" stored="true"/>
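With a date type in place, the indexed value has to be something Solr can parse as a date. Solr date fields use ISO-8601 in UTC with a trailing Z (for example 1995-12-31T23:59:59Z). If the created values ever need normalising before indexing, a small Python sketch of that conversion could look like the following. `to_solr_date` is a hypothetical helper, and whether you need it at all depends on how the Cassandra driver returns the column.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt):
    """Format a datetime as the ISO-8601 UTC string Solr date fields accept.
    Naive datetimes are assumed to already be in UTC; fractional seconds
    are dropped for simplicity."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_date(datetime(2018, 2, 16, 17, 21, 4)))
print(to_solr_date(datetime(2018, 2, 16, 18, 21, 4, tzinfo=timezone(timedelta(hours=1)))))
```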

Solr indexing of MongoDB collection

Suppose I have a test application representing a list of friends. The application uses a collection where all documents have the following format:
_id : ObjectId("someString"),
name : "George",
description : "some text",
age : 35,
friends : {
[
{
name: "Peter",
age: 30,
town: {
name_town: "Paris",
country: "France"
}
},
{
name: "Thomas",
age: 25,
town: {
name_town: "Berlin",
country: "Germany"
}
}, ... // more friends
]
}
... // more documents
How can I describe such a collection in schema.xml? I need to produce facet queries like "Give me the countries where George's friends live". Another use case might be "Return all documents (persons) whose friend is 30 years old", etc.
My initial idea is to mark the "friends" attribute as a text field with this schema.xml definition:
<fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
....
<field name="friends" type="text_wslc" indexed="true" stored="true" />
and then search for, e.g., the words "age" and "30" in the text, but that is not a very reliable solution.
Please leave aside the logically ill-formed design of the collection; it is only an example of a similar problem I am facing.
Any help or idea will be highly appreciated.
EDIT:
Sample 'schema.xml'
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="text-schema" version="1.5">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
<fieldType name="trInt" class="solr.TrieIntField" precisionStep="0" omitNorms="true" />
<fieldType name="text_p" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="_id" type="string" indexed="true" stored="true" required="true" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_ts" type="long" indexed="true" stored="true"/>
<field name="ns" type="string" indexed="true" stored="true"/>
<field name="description" type="text_p" indexed="true" stored="true" />
<field name="name" type="text_p" indexed="true" stored="true" />
<field name="age" type="trInt" indexed="true" stored="true" />
<field name="friends" type="text_p" indexed="true" stored="true" /> <!-- Here is the problem - when the type is text_p, all fields are considered as a text; optimal solution would be something like "collection" tag to mark name_town and town as descendant of the field 'friends' but unfortunately, this is not how the solr works-->
<field name="town" type="text_p" indexed="true" stored="true"/>
<field name="name_town" type="string" indexed="true" stored="true"/>
<field name="town" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>_id</uniqueKey>
As Solr is document-centric, you will need to flatten your data as much as you can. Based on the sample you have given, I would create a schema.xml like the one below.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="friends" version="1.0">
<fields>
<field name="id"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="name"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="description"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="age"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="town"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="townRaw"
type="string" indexed="true" stored="true" multiValued="false" />
<field name="country"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="countryRaw"
type="string" indexed="true" stored="true" multiValued="false" />
<field name="friends"
type="int" indexed="true" stored="true" multiValued="true" />
</fields>
<copyField source="country" dest="countryRaw" />
<copyField source="town" dest="townRaw" />
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField"
precisionStep="0" positionIncrementGap="0" />
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
</types>
</schema>
I would model each person as a document of its own. The relationship between two persons is modelled via the friends attribute, which holds an array of IDs. So at index time you need to fetch the IDs of all of a person's friends and put them into that field.
Most of the other fields are straightforward. The interesting ones are the two Raw fields. Since you want to facet on the country, you need the country value unchanged, or at least optimized for faceting. Field types usually differ depending on their purpose (searching, faceting, autosuggesting, etc.). Here, countryRaw and townRaw are string fields, so the country and town values are indexed exactly as given.
Now to your use cases,
Give me countries, where George's friends live
This can be done with faceting. You could:
query the friends field for the ID of George
facet on countryRaw
Such a query would look like q=friends:1&rows=0&facet=true&facet.field=countryRaw&facet.mincount=1
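To make the facet step concrete, here is a small in-memory Python model of what that request computes. It is a simulation for illustration only, not Solr code. The IDs are made up, George's location is not given in the question so it is left out, and the sketch assumes that Peter and Thomas list George back as a friend, since q=friends:1 finds the people whose friends field contains George's ID.

```python
from collections import Counter

# Hypothetical flattened documents following the schema above:
# one document per person, friends stored as a list of person IDs.
people = [
    {"id": 1, "name": "George", "age": 35, "friends": [2, 3]},
    {"id": 2, "name": "Peter", "age": 30, "countryRaw": "France", "friends": [1]},
    {"id": 3, "name": "Thomas", "age": 25, "countryRaw": "Germany", "friends": [1]},
]


def facet_counts(docs, matches, field):
    """Mimic facet.field=<field>&facet.mincount=1 over the documents matching
    the query (rows=0 means only these counts come back)."""
    return dict(Counter(d[field] for d in docs if matches(d) and field in d))


# q=friends:1 selects everyone whose friends field contains George's ID.
print(facet_counts(people, lambda d: 1 in d["friends"], "countryRaw"))
```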
Return all documents(persons), whose friend is 30 years old.
This one is harder. First, you will need Solr's join feature, which you configure in your solrconfig.xml.
<config>
<!-- loads of other stuff -->
<queryParser name="join" class="org.apache.solr.search.JoinQParserPlugin" />
<!-- loads of other stuff -->
</config>
The corresponding join query would look like this: q={!join from=id to=friends}age:[30 TO *]
This works as follows:
with age:[30 TO *] you search for all persons that are of age 30 or older
then you take their IDs and join them on the friends attribute of all other persons
this will return you all persons that have the ids matched by the initial query within their friends attribute
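The three steps above can be modelled in memory to see which documents such a join returns. This is a hedged Python simulation of the join semantics, not Solr's implementation. The documents and IDs are hypothetical, built from the ages in the question.

```python
# Hypothetical flattened person documents (IDs are illustrative).
people = [
    {"id": 1, "name": "George", "age": 35, "friends": [2, 3]},
    {"id": 2, "name": "Peter", "age": 30, "friends": [1]},
    {"id": 3, "name": "Thomas", "age": 25, "friends": [1]},
]


def join(docs, inner, from_field="id", to_field="friends"):
    """Mimic {!join from=id to=friends}<inner>: collect the from_field values
    of documents matching the inner query, then return the documents whose
    to_field contains any of those values."""
    keys = {d[from_field] for d in docs if inner(d)}
    return [d for d in docs if keys.intersection(d.get(to_field, []))]


# {!join from=id to=friends}age:[30 TO *]
print([d["name"] for d in join(people, lambda d: d["age"] >= 30)])
# {!join from=id to=friends}age:30 (exactly 30, as the question asks)
print([d["name"] for d in join(people, lambda d: d["age"] == 30)])
```

With these ages, the range query returns everyone, because each person has at least one friend aged 30 or older. The exact-age variant returns only George, the one person who lists Peter as a friend.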
Since I did not write this off the top of my head, you may want to have a look at my solrsample project on GitHub. I have added a test case there that deals with this question:
https://github.com/chriseverty/solrsample/blob/master/src/main/java/de/cheffe/solrsample/FriendJoinTest.java