I am new to Solr and MongoDB.
I am trying to index data from MongoDB into Solr using DataImportHandler, but I could not find the exact steps to follow.
Could you please help me with the exact steps to index MongoDB into Solr using DataImportHandler?
Solr version: 4.6.0
MongoDB version: 2.2.7
This is a late answer, but people might still find it useful.
Below are the steps for importing data from MongoDB into Solr 4.7.0 using DataImportHandler.
Step 1:
Assume that your MongoDB instance has the following database and collection:
Database Name: Test
Collection Name: sample
The sample collection has the following documents:
db.sample.find()
{ "_id" : ObjectId("54c0c6666ee638a21198793b"), "Name" : "Rahul", "EmpNumber" : 452123 }
{ "_id" : ObjectId("54c0c7486ee638a21198793c"), "Name" : "Manohar", "EmpNumber" : 784521 }
Step 2:
Create a lib folder in your Solr home folder (the one that contains the bin and collection1 folders).
Add the jar files below to the lib folder. solr-mongo-importer has to be downloaded separately, and the solr-dataimporthandler jar should match your Solr version.
- solr-dataimporthandler-4.7.0.jar
- solr-mongo-importer-1.0.0.jar
- mongo-java-driver-2.10.1.jar (the MongoDB Java driver)
Step 3:
Declare the Solr fields in schema.xml (this assumes that id is already defined, as it is by default).
Add the following fields to schema.xml inside the <fields> </fields> element:
<field name="Name" type="text_general" indexed="true" stored="true"/>
<field name="EmployeeNumber" type="int" indexed="true" stored="true"/>
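Step 3 relies on id and the unique key already being defined. For reference, the Solr 4.x example schema declares them roughly as follows; treat this as a sketch and check your own schema.xml rather than copying it verbatim:

```xml
<!-- Inside <fields>: the document id, a single-valued string -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

<!-- Outside <fields>: tells Solr which field uniquely identifies a document -->
<uniqueKey>id</uniqueKey>
```

Because the data-config.xml below maps the MongoDB _id column to name="id", each MongoDB document ends up as one Solr document keyed by its ObjectId value.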
Step 4:
Register the data-config file in solrconfig.xml by adding the following code inside the <config> </config> element:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>
Step 5:
Create a data-config.xml file in collection1\conf\ (the folder that by default holds solrconfig.xml and schema.xml):
data-config.xml
<?xml version="1.0"?>
<dataConfig>
    <dataSource name="MyMongo" type="MongoDataSource" database="Test" />
    <document name="import">
        <!-- if query="" then it imports everything -->
        <entity processor="MongoEntityProcessor"
                query="{Name:'Rahul'}"
                collection="sample"
                datasource="MyMongo"
                transformer="MongoMapperTransformer" name="sample_entity">
            <!-- If the mongoField name and the field declared in schema.xml are the same, there is no need to declare it below.
                 If they differ, you have to map the mongoField to the field in schema.xml
                 (e.g. mongoField="EmpNumber" to name="EmployeeNumber"). -->
            <field column="_id" name="id"/>
            <field column="EmpNumber" name="EmployeeNumber" mongoField="EmpNumber"/>
        </entity>
    </document>
</dataConfig>
Step 6:
Assuming Solr (I used port 8080) and MongoDB are running, open http://localhost:8080/solr/dataimport?command=full-import in your browser to import the data from MongoDB into Solr.
The MongoDB fields _id, Name and EmpNumber are imported into Solr as id, Name and EmployeeNumber.
You can check the result at http://localhost:8080/solr/query?q=*:*
You can try using SolrMongoImporter. It requires adding two libraries to your Solr project and creating a data-config.xml.
You will probably also need to load the following libraries in your solrconfig.xml if you don't have them already:
<lib dir="../../../contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
If you have followed everything above and are still facing the issue, check whether you have multiple copies of the jar in different locations, for example:
{solr-home}/dist/solr-dataimporthandler-8.10.0.jar
{solr-home}/server/libs/solr-dataimporthandler-8.10.0.jar
{solr-home}/server/solr/core/libs/solr-dataimporthandler-8.10.0.jar
If so, remove the jar files from everywhere but the location you have configured in the solrconfig.xml file.
We are using a standalone-vdb.xml file to create a VDB and then make it accessible to other users through Jupyter.
Based on the XML file below, we created the view "customer_view" from the table "Export2.customer_table", and both are accessible from Jupyter.
However, we only want the views to be accessible, not the physical tables.
Which property can be used to hide the tables and expose only the views to end users?
Does anyone have a clue which property does that? I searched the documentation but couldn't find any mention of it.
We are using WildFly Full 17.0.1 through the HAL management interface, in a Docker container environment, with a PostgreSQL database.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vdb name="stock" version="1">
    <description>The VDB</description>
    <property name="UseConnectorMetadata" value="true" />
    <model visible="true" name="Export2">
        <property name="importer.useFullSchemaName" value="false"/>
        <property name="importer.schemaPattern" value="public"/>
        <property name="importer.tableTypes" value="TABLES,VIEW"/>
        <source name="stockDS" translator-name="postgresql" connection-jndi-name="java:jboss/datasources/stockDS"/>
    </model>
    <model visible="true" name="Data" type="VIRTUAL">
        <metadata type="DDL"><![CDATA[
            CREATE VIEW customer_view (
                field_names string,
                field_description string
            ) AS
            SELECT variable_name, variable_description
            FROM Export2.customer_table;
        ]]> </metadata>
    </model>
    <data-role name="RoleA" any-authenticated="true">
        <description>Allow Reads and Writes to tables and procedures</description>
        <permission>
            <resource-name>Export2.customer_table</resource-name>
            <allow-create>true</allow-create>
            <allow-read>true</allow-read>
            <allow-update>true</allow-update>
        </permission>
        <mapped-role-name>Admin</mapped-role-name>
    </data-role>
</vdb>
See http://teiid.github.io/teiid-documents/master/content/reference/r_xml-deployment-mode.html
You need to define the model with visible set to false, like this:
<model visible="false" name="Export2">
Note that this only removes the metadata from the APIs; anyone who knows the schema can still use the same connection to query the tables and see the data. If you want to prevent that, you need to use data roles (data security policies) to deny access.
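A sketch combining both suggestions for the VDB in the question: Export2 is hidden from metadata, and a data role (the name ViewReader is hypothetical) grants read access only to the view. As far as I understand Teiid's data roles, permissions are checked against the objects named in the user's query, not against the view definition, so a query on Data.customer_view should still be allowed while a direct query on Export2.customer_table is rejected. The existing RoleA permission on Export2.customer_table would have to be removed for this to hold; verify the behavior against the data roles documentation for your Teiid version.

```xml
<!-- Physical model: metadata hidden from clients -->
<model visible="false" name="Export2">
    <!-- properties and <source> element unchanged from the question -->
</model>

<!-- Hypothetical role: read-only access to the view only.
     No permission is granted on Export2, so direct queries against it should be denied. -->
<data-role name="ViewReader" any-authenticated="true">
    <description>Read-only access to views in the Data model</description>
    <permission>
        <resource-name>Data.customer_view</resource-name>
        <allow-read>true</allow-read>
    </permission>
</data-role>
```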
I am trying to upgrade a single-node Solr 6.2.1 instance to a SolrCloud setup using Solr 6.6. The issue I am facing is that when performing a data import from MongoDB using solr-mongo-importer-1.1.0.jar and mongo-java-driver-2.14.3.jar, the _id field is imported as "_id":"org.bson.types.ObjectId:585a53d109ed44343743ebd1" instead of "_id":"585a53d109ed44343743ebd1" as in the Solr 6.2.1 instance. (The jar versions are the same in both cases.)
The schema contains the following (same in both versions):
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="_id" type="string" indexed="true" stored="true"/>
Is there any change in the fieldType in the new version or am I missing something?
Solr fields should be declared in schema.xml.
Note that id may already be defined.
Alternatively, try mapping the column to a Solr field with the name attribute:
<field column="_id" name="id"/>
<field column="OtherNumber" name="OtherNumber" mongoField="OthNumber"/>
OR
Try using this directive to make _id the unique key:
<uniqueKey>_id</uniqueKey>
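Neither suggestion strips the org.bson.types.ObjectId: prefix directly. One possible workaround, not taken from the original answers and untested here: convert the ObjectId to a plain string inside DIH, so Solr never receives the raw ObjectId object. A plausible but unconfirmed explanation for the difference between versions is that in SolrCloud the document is forwarded between nodes in javabin format, which writes unknown object types as ClassName:value. The sketch uses DIH's standard TemplateTransformer and assumes an entity named sample_entity with the collection and data source names from the first answer's data-config.xml; adapt those names to your own config.

```xml
<entity processor="MongoEntityProcessor"
        query=""
        collection="sample"
        datasource="MyMongo"
        transformer="MongoMapperTransformer,TemplateTransformer"
        name="sample_entity">
    <!-- TemplateTransformer overwrites _id with the string obtained by resolving
         ${sample_entity._id}, so a String rather than an ObjectId is sent to Solr -->
    <field column="_id" template="${sample_entity._id}"/>
</entity>
```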
Hope it helps!
I am trying to increase the number of threads in the embedded Jetty running in Karaf. I am changing jetty.xml with the following properties, as described in a post I found:
<Configure class="org.eclipse.jetty.server.Server">
    <Call name="addConnector">
        <Arg>
            <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
                <Set name="minThreads">10</Set>
                <Set name="maxThreads">1000</Set>
            </New>
        </Arg>
    </Call>
</Configure>
I also have an org.ops4j.pax.web.cfg file in Karaf with the following property:
org.ops4j.pax.web.config.file=${karaf.home}/jetty.xml
so that the external Jetty configuration is used. But I am not able to increase or decrease the default thread pool size of the server. What am I missing?
With Pax Web 4.2.0 and later, these settings can be configured via Configuration Admin (in Karaf, for example in etc/org.ops4j.pax.web.cfg). The following three new settings can be used:
org.ops4j.pax.web.server.maxThreads
org.ops4j.pax.web.server.minThreads
org.ops4j.pax.web.server.idleTimeout
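On Pax Web versions older than 4.2.0, a likely problem with the jetty.xml in the question is that it passes a QueuedThreadPool to addConnector, which expects a connector, not a thread pool. A sketch of an alternative, assuming the Server already holds its default QueuedThreadPool (the case for Jetty 8 and 9 as far as I know): fetch the existing pool and change its limits instead of creating a new one.

```xml
<Configure class="org.eclipse.jetty.server.Server">
    <!-- Calls getThreadPool() on the server and adjusts the existing pool
         instead of creating a new one -->
    <Get name="ThreadPool">
        <Set name="minThreads" type="int">10</Set>
        <Set name="maxThreads" type="int">1000</Set>
    </Get>
</Configure>
```

On Pax Web 4.2.0 or later, the configuration properties listed above avoid the need for a custom jetty.xml for this.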
Can anyone help me with configuring/creating a custom data source using WSO2 4.0.2?
Here is a sample wso2-dss-connector for MongoDB (link: https://github.com/wso2/wso2-dss-connectors/tree/master/mongodb). How do I deploy this with WSO2? Building this project produces a jar, so how do I integrate it with WSO2 to create a custom data source?
I am new to WSO2 and didn't get a clear picture from the official docs.
Thanks in advance
If you want a connector for MongoDB, you can build the jar from the link you mentioned, put it in the DSS_HOME/repository/components/dropins folder and restart the server. Once the jar is in dropins, you can use the following data service descriptor file (.dbs) to test it.
In this example, we create a data source called "mongo_ds" and a query called mongo_find, which retrieves a set of Document elements.
<config id="mongo_ds">
    <property name="custom_query_datasource_class">org.wso2.dss.connectors.mongodb.MongoDBDataSource</property>
    <property name="custom_datasource_props">
        <property name="servers">localhost</property>
        <property name="database">mydb</property>
    </property>
</config>
<query id="mongo_find" useConfig="mongo_ds">
    <expression>things.find()</expression>
    <result element="Documents" rowName="Document">
        <element column="document" name="Data" xsdType="string"/>
    </result>
</query>
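The config and query elements need to sit inside a data service definition and be exposed through an operation before a client can call them. A minimal sketch of a complete .dbs file; the service name MongoDataService and operation name mongo_find_op are hypothetical:

```xml
<data name="MongoDataService">
    <config id="mongo_ds">
        <property name="custom_query_datasource_class">org.wso2.dss.connectors.mongodb.MongoDBDataSource</property>
        <property name="custom_datasource_props">
            <property name="servers">localhost</property>
            <property name="database">mydb</property>
        </property>
    </config>
    <query id="mongo_find" useConfig="mongo_ds">
        <expression>things.find()</expression>
        <result element="Documents" rowName="Document">
            <element column="document" name="Data" xsdType="string"/>
        </result>
    </query>
    <!-- Exposes the mongo_find query as a callable service operation -->
    <operation name="mongo_find_op">
        <call-query href="mongo_find"/>
    </operation>
</data>
```

Such a file is normally deployed by copying it to DSS_HOME/repository/deployment/server/dataservices/.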
If you want to write custom data sources for other data stores, please refer to the following guide.
I recently switched from unmanaged ODP.NET to managed ODP.NET (in conjunction with Entity Framework).
The unmanaged driver was working fine after adding the necessary information to the web.config section. I could add the stored procedures and generate the complex types using Function Import > Get Column Information (I'm trying to import a stored procedure with an OUT ref cursor parameter).
After the switch, the config section was updated to the new format and everything works at runtime (so the format is correct).
However, when I try to generate the complex types again (or add a new Function Import), I just get a System.NotSupportedException ("The specified type is not supported by this selector") without any indication of which type or selector it is.
Google has turned up nothing, and my thread on the Oracle Forums has not received any response either.
Versions:
ODP.Net (ODAC) : v12.1 (Production release; DLL v4.121.1.0)
EF v5
.NET v4.5
Config file (trimmed a bit):
<configSections>
    <section name="oracle.manageddataaccess.client" type="OracleInternal.Common.ODPMSectionHandler, Oracle.ManagedDataAccess"/>
</configSections>
<oracle.manageddataaccess.client>
    <version number="*">
        <edmMappings>
            <edmMapping dataType="number">
                <add name="bool" precision="1"/>
                <add name="byte" precision="2" />
                <add name="int16" precision="5" />
                <add name="int32" precision="10" />
                <add name="int64" precision="38" />
            </edmMapping>
        </edmMappings>
        <implicitRefCursor>
            <storedProcedure schema="ECOM" name="SHP_API_ORDERS.CREATE_ORDER">
                <refCursor name="O_RS">
                    <bindInfo mode="Output"/>
                    <metadata columnOrdinal="0" columnName="COL1" nativeDataType="Number" providerType="Decimal" allowDBNull="false" numericPrecision="10" numericScale="0" />
                    <metadata columnOrdinal="1" columnName="COL2" nativeDataType="Date" providerType="Date" allowDBNull="true" />
                    <metadata columnOrdinal="2" columnName="COL3" nativeDataType="Varchar2" providerType="Varchar2" allowDBNull="false" columnSize="10" />
                </refCursor>
            </storedProcedure>
        </implicitRefCursor>
    </version>
</oracle.manageddataaccess.client>
<entityFramework>
    <defaultConnectionFactory type="System.Data.Entity.Infrastructure.SqlConnectionFactory, EntityFramework" />
</entityFramework>
<system.data>
    <DbProviderFactories>
        <remove invariant="Oracle.ManagedDataAccess.Client" />
        <add name="ODP.NET, Managed Driver"
             invariant="Oracle.ManagedDataAccess.Client"
             description="Oracle Data Provider for .NET, Managed Driver"
             type="Oracle.ManagedDataAccess.Client.OracleClientFactory, Oracle.ManagedDataAccess, Version=4.121.1.0, Culture=neutral, PublicKeyToken=89b483f429c47342" />
    </DbProviderFactories>
</system.data>
The implicit ref cursor config file format is different between Unmanaged ODP.NET and Managed ODP.NET. That might be part of your problem.
To save yourself from pulling your hair out, install the latest Oracle Developer Tools for Visual Studio (ODT) and use the new feature that automatically generates this config:
1) Install ODT 12.1 if you haven't already
2) Find the stored procedure in server explorer, right click it and run it, and enter input parameters.
3) For the output ref cursor that represents the return value for your Entity Function, choose "Add to Config" checkbox.
4) Then select either "Show Config" (and then cut and paste) or "Add to Config".
Here is a screenshot of what I am talking about:
http://i.imgur.com/t1BfmUP.gif
If this doesn't fix the problem, play around with that boolean mapping. I am not 100% sure of this as of this writing, but I remember hearing that support for booleans is another difference between managed and unmanaged ODP.NET. I'm sure it's buried in the release notes or doc somewhere.
Christian Shay
Oracle
Two things worth trying, which might solve the issue:
1) Make sure the case of the schema name, stored procedure name and column names in the config matches the case used in Oracle.
2) Try mapping the native type to a more conformant provider type. For example, for the first column COL1, map the Number(10,0) nativeDataType to an Int32 providerType (as enforced by your edmMapping) instead of the Decimal you currently have. Do the same for the other columns (for example, remove the column lengths) until the error goes away or you get a different one.
I got the same error, and I think my problem was a providerType of Double or Decimal. However, I got one working that has your three column types. Your problem is that a Number(10,0) should have a providerType of "Int64".
Stored Procedure:
create or replace PROCEDURE "PROC_ESCC_FIELDS" (p_recordset OUT SYS_REFCURSOR)
AS
BEGIN
OPEN p_recordset FOR
SELECT COL1, COL2, COL3
FROM MyTable;
END PROC_ESCC_FIELDS;
This works and returns the cursor:
<oracle.manageddataaccess.client>
    <version number="*">
        <implicitRefCursor>
            <storedProcedure schema="SERFIS" name="PROC_ESCC_FIELDS">
                <refCursor name="P_RECORDSET">
                    <bindInfo mode="Output" />
                    <metadata columnOrdinal="0" columnName="COL1" providerType="Int64" allowDBNull="false" nativeDataType="Number" />
                    <metadata columnOrdinal="1" columnName="COL2" providerType="Date" allowDBNull="true" nativeDataType="Date" />
                    <metadata columnOrdinal="2" columnName="COL3" providerType="Varchar2" allowDBNull="false" nativeDataType="Varchar2" />
                </refCursor>
            </storedProcedure>
        </implicitRefCursor>
    </version>
</oracle.manageddataaccess.client>
A list of the valid providerType and nativeDataType enum values is available in the ODP.NET documentation.