How to use Solr on PostgreSQL and index a table

I am new to Solr and specifically need to crawl an existing database table and generate search results from it.
Every online example or tutorial I have found so far only explains how to hand Solr documents to index, with no indication of how to do the same with a database.
Can anyone please explain the steps to achieve this?
Links like this wiki show everything with a JDBC driver and MySQL, so I even doubt whether Solr supports this with .NET. My tech boundaries are C# and PostgreSQL.

You have already stumbled upon the included JDBC support, but you have to use the PostgreSQL JDBC driver. The example is otherwise identical to the MySQL one: use the proper URL for PostgreSQL instead and reference the JDBC driver (which will depend on which PostgreSQL JDBC driver you use).
jdbc:postgresql://localhost/test
This is a configuration option in Solr, and isn't related to .NET or other external dependencies.
However, the other option is to write the indexing code yourself. This is often a good solution, as it makes it easier to pre-process the content and apply certain logic before storing it in Solr. For .NET you have SolrNet, a Solr client that makes it easy both to query Solr and to submit documents to it.
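As a rough illustration of the "write the indexing code yourself" route, here is a minimal sketch. It is in Python only to keep it short; with SolrNet in C# the flow would be the same (read rows, pre-process, submit documents). The core name `products`, the column names and the sample rows are assumptions made up for the example, and the request is built but not sent, so no running Solr is needed to try it.

```python
import json
import urllib.request

# Hypothetical: a local Solr core named "products". Adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"


def rows_to_docs(columns, rows):
    """Turn database rows (tuples) into Solr documents (dicts)."""
    docs = []
    for row in rows:
        doc = dict(zip(columns, row))
        # Example pre-processing step: trim strings and drop NULL columns
        # so they are not sent to Solr as empty fields.
        doc = {k: v.strip() if isinstance(v, str) else v
               for k, v in doc.items() if v is not None}
        docs.append(doc)
    return docs


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) the HTTP POST that adds docs to Solr."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# In a real indexer the rows would come from a PostgreSQL cursor
# (for example cursor.fetchall()); here they are hard-coded.
columns = ["id", "name", "description"]
rows = [(1, " Widget ", "A small widget"), (2, "Gadget", None)]

docs = rows_to_docs(columns, rows)
request = build_update_request(docs)
# urllib.request.urlopen(request) would submit the batch to a running Solr.
```

Keeping the row-to-document step as its own function is what makes this route attractive: any cleanup or business logic goes there, independent of how the rows are fetched or how the batch is sent.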

Related

Using Slick with Kudu/Impala

Kudu tables can be accessed via Impala, and therefore via its JDBC driver. Thanks to that, they are accessible through the standard Java/Scala JDBC API. I was wondering whether it is possible to use Slick for this. Or, if not, is there any other high-level Scala DB framework that supports Impala/Kudu?
Slick can be used with any JDBC database
http://slick.lightbend.com/doc/3.3.0/database.html
At least for me, Slick is not fully compatible with Impala/Kudu. Using Slick, I cannot modify database entities: I cannot create, update or delete any item. It only works for reading data.
There are two ways you could use Slick with an arbitrary JDBC driver (and SQL dialect).
The first is to use low-level JDBC calls. The SimpleDBIO class gives you access to a JDBC connection:
val getAutoCommit = SimpleDBIO[Boolean](_.connection.getAutoCommit)
That example is from the Slick manual.
However, I think you're more interested in working at a higher level than that. In that case, for Slick, you'd need to implement a custom Profile. If Impala is similar enough to an existing database profile, you may be able to extend an existing profile and adjust it to account for any differences. For example, this would allow you to customize how SQL is formatted for Impala, how timestamps are represented, how column names are quoted. The documentation on Porting SQL from Other Database Systems to Impala would give you an idea of what needs to change in a driver.
Or, if not, is there any other high-level Scala DB framework that supports Impala/Kudu?
None of the mainstream libraries seem to support Impala as a feature. Having said that, the Doobie documentation mentions customising connections for Hive. So it may be worth quickly trying Doobie to see if you can query and insert, for example.

Any way to get Meteor using a native ACID compliant db?

I am seriously considering the Meteor framework for building every POC and app in the future... but I can't do without an ACID-compliant database, as I have a few use cases involving multi-document atomic transactions that require this compliance.
Meteor relies heavily on MongoDB's syntax and storage engine at the moment (which means there is no transaction-related syntax available...).
I am currently evaluating solutions that would provide this ACID capability:
Using a MySQL native driver for Meteor (different syntax than MongoDB?)
Using a PostgreSQL native driver for Meteor (SQL syntax)
Using TokuMX (a MongoDB fork with ACID compliance; same syntax as MongoDB apart from the transaction-related commands that would need to be added)
Those 3 solutions are good candidates for the Meteor roadmap as shown here
What are the pros and cons of those solutions? Which is the most advanced?
What else would you consider as a way to keep Meteor while storing documents in a NoSQL-like, ACID-compliant database?
sqlAndMeteor
If you are like me, you love Meteor but hate Mongo. In Meteor's Trello Roadmap (https://trello.com/b/hjBDflxp/meteor-roadmap), the most voted feature is SQL Support, either PostgreSQL or MySQL.
Since there is no date for that in Meteor, here I summarize the partial solutions I have found.
1.- Use SQL only for client-side queries.
Let's face it, Mongo sucks at common data operations, so being able to use SQL to query data (with JOINs, GROUP BY and so on) would relieve a lot of pain. There are packages that let you use SQL on the client, at least for queries. The simplest one is an old (2010) utility, SQLike (http://www.thomasfrank.se/sqlike.html). The new player in town is AlaSQL, which is actively developed by #agershun (https://github.com/agershun/alasql). SQLike's advantage is its size, only 10 KB. AlaSQL is a lot more powerful, of course, but for using SQL to replace Mongo syntax in unions and aggregations, SQLike is OK.
With both of them you can do something like this in your helper:
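productsSold: function () {
  // Fetch plain arrays from the Mongo collections so alasql can query them.
  var customerSalesHistory = salesHistory.find({customerId: Session.get('currentCustomer')}).fetch();
  var items = products.find().fetch();
  // Each "?" is bound, in order, to an array from the parameter list.
  return alasql("select items.name, sales.ordered as sumaVentas from ? sales, ? items " +
    "where items.Id = sales.itemId", [customerSalesHistory, items]);
}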
2.- Experiment with direct SQL support.
Some packages try to replace Mongo (and minimongo) with MySQL or PostgreSQL. #numtel's MySQL package is meteor-mysql (https://github.com/numtel/meteor-mysql), and the PostgreSQL one is meteor-pg (https://github.com/numtel/meteor-pg). Both are good attempts to solve the problem, but they still have some issues and are somewhat cumbersome to adapt.
A team from Hack Reactor has formed Meteor Stream, and its first product is a PostgreSql integration with Meteor, meteor-postgres (https://github.com/meteor-stream/meteor-postgres). It looks very good and uses alaSql on the client to replace minimongo.
Both approaches are good, but they have some problems:
They break deployment to Meteor.
They are very, very young and nowhere near production ready, AFAIK.
They still require tweaks to the usual pub-sub syntax we are used to, which could raise compatibility issues with other Meteor packages.
3.- Still use Mongo, but as a simple repository for your MySQL database.
This option keeps all of Meteor's characteristics and uses Mongo as a temporary repository for your MySQL or PostgreSQL database.
A brilliant attempt at that is mysql-shadow by #perak (https://github.com/perak/mysql-shadow). It does what it says: it keeps Mongo synchronized both ways with MySQL and lets you work with your data in MySQL.
The bad news is that the developer will not continue maintaining it, but what is done is enough for simple scenarios where you don't have complex triggers that update other tables or things like that.
For full-featured synchronization you can use SymmetricDS (http://www.symmetricds.org), a very well tested database replicator. This involves setting up a new Java server, of course, but it is by far the best way to be sure that you can turn your Mongo database into a simple repository for your real MySQL, PostgreSQL, SQL Server or Informix database. I have yet to try it myself.
For now MySQL Shadow seems like a good enough solution.
One advantage of this approach is that you can still use all standard Meteor features, packages, Meteor deployment and so on. You don't have to do anything but set up the sync mechanism, and you are not breaking anything.
Also, if someday the Meteor team spends some of the dollars it has raised on SQL integration, your app is more likely to work as is.
If MySQL works for you, I've used the meteor-mysql package and it works well.
I finally forged my own conclusion...
I will have TWO platforms :
a Meteor front with business data in PostgreSQL and some front data or easy-to-replicate data in MongoDB
a Java data backend (server2server only) handling all atomic operations on my business data in PostgreSQL...plus technical adapters (SAP, Salesforce), a BPMN 2.0 workflow engine (Actility), and any registered SOA needed from other systems
Any comments are still very welcome and will be considered and answered.

Data virtualization with SQL Server DB using Marklogic

I would like to use data from a SQL Server database in MarkLogic without physically moving it. I have read about data virtualization in MarkLogic but cannot find any example or documentation explaining how to go about it. Please point me to any reference that may help.
I have already tried reading data using MLSAM. Is this the only way, and does it count as virtualization?
MarkLogic introduced the concept of Views to allow data visualization tools to connect to MarkLogic through ODBC, executing SQL against MarkLogic. These views are fed from XML content within MarkLogic through range indexes. So, I think that is the other way around for what you are looking for. In general, MarkLogic will need data inside its own databases, to allow indexing it.
MLSAM can be a way to pull such data in, executing SQL statements from within XQuery against external sources (contrary to xdmp:sql, which runs against the Views inside MarkLogic). Tools like RecordLoader, XQsync, and XMLSh might be worth looking at as well. See
http://developer.marklogic.com/code
HTH!

Do VoltDB or NuoDB need ElasticSearch?

Currently I am working on a project that uses PostgreSQL + ElasticSearch. However, I recently found VoltDB, and I was wondering whether we would still need ElasticSearch for searches with VoltDB.
If I understand correctly, ElasticSearch takes the data from PostgreSQL or another relational database and reindexes it, so that queries run faster than they would against the relational database's own indexes. The relational database stays the source of truth because ElasticSearch is not ACID compliant, so the data stored in it cannot be completely trusted.
VoltDB is very fast and is excellent at parallelizing work across hardware resources. It doesn't contain any kind of full-text-indexing functionality. Any kind of full-text search on VoltDB will be at least mostly brute force. That doesn't mean it won't meet your needs, but it really depends on the kind of queries you want to run.
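To make the "brute force" point concrete, here is a toy sketch in Python (not VoltDB or ElasticSearch code) contrasting a scan over every row with a lookup in an inverted index, which is the kind of structure a full-text search server maintains. The sample rows and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: id -> text column.
ROWS = {
    1: "fast in-memory sql database",
    2: "full text search server",
    3: "search the database with sql",
}


def brute_force_search(rows, term):
    """Scan every row and test it, as a database without a text index must."""
    return sorted(row_id for row_id, text in rows.items()
                  if term in text.split())


def build_inverted_index(rows):
    """Map each word to the set of row ids containing it (built once)."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.split():
            index[word].add(row_id)
    return index


def indexed_search(index, term):
    """Look the term up directly instead of touching every row."""
    return sorted(index.get(term, set()))


index = build_inverted_index(ROWS)
print(brute_force_search(ROWS, "search"))  # [2, 3]
print(indexed_search(index, "search"))     # [2, 3]
```

Both calls return the same rows. The difference is cost: the scan grows with the size of the table on every query, while the index pays that cost once at build time, which is why a very fast database can still be a poor fit for full-text queries.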
Based on my (limited) knowledge of ElasticSearch, it appears that it is a search server that would work in conjunction with the database and is used primarily to search and index document files.
If this is correct, I don't think NuoDB would be a replacement for ElasticSearch, but it could likely work in conjunction with it, much as PostgreSQL does.
Also, similar to Volt, NuoDB doesn't have full-text-indexing functionality.

Mongodb access through sql like syntax

Is there any library that lets me access MongoDB using SQL-like syntax?
Example
use db
select * from table1
insert into table1 values (a,b,c)
delete from table
select a,b,count(*) from table1 group by a,b
select a.field1,b.field2 from a,b where a.id=b.id
Thanks
Raman
The learning curve is small only if you are only doing extremely simple sql queries. If the extent of your SQL querying is "select * from X", then MongoDB looks like a brilliant idea to cut through all the too-complicated SQL. But if you need to perform left outer joins, test for null, check for ranges, subselects, grouping and summation, then you will soon end up with a round concave dent in your desk after being moved to Mongo. The sick punchline is that half the time, the thing you are trying to do can't be done in the Mongo interface. Mongo represents a bold new world where instead of databases doing things like aggregation and query optimization, it just stores data and all the magic is done by retrieving everything, slowly, storing it in app memory, and doing all that stuff in code instead.
YES!
A company called UnityJDBC makes a JDBC driver for MongoDB. Unlike the Mongo Java driver, this JDBC driver lets you run SQL queries against MongoDB, and it works with any Java application that uses JDBC.
To download this driver, go to:
http://www.unityjdbc.com/mongojdbc/mongo_jdbc.php
It's free to download too!
hope this helps
MoSQL might satisfy your needs. It'll require you to run a new PostgreSQL instance but from there you can query your entire Mongo dataset with SQL.
"MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL."
Have a look at this recent project: http://www.mongosql.com/. I've been looking at it over the last few weeks and it looks very promising.
For those of you who have questioned the usefulness of SQL against MongoDB, consider the large number of not-very-technical users in many organizations, like business analysts, who may know SQL, but don't want to make the leap to JavaScript and JSON. Tools like mongoSQL can help push the adoption of MongoDB in an organization.
There are a few solutions out there, but nearly all of them fail to truly represent the MongoDB data model in a way that the "relationally" minded ODBC/JDBC applications and users desire/require. A commercial product was recently released that addresses these challenges:
ODBC:
http://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/mongodb
JDBC:
http://www.progress.com/products/datadirect-connect/jdbc-drivers/data-sources/mongodb
To address the need for ODBC/JDBC (SQL) access...While there are strong arguments for writing new applications using Mongo's clients, there is still a strong need in the marketplace for quality ODBC/JDBC and SQL based access to MongoDB. This need largely arises from all the reporting, analytic, and BI applications that rely on ODBC/JDBC connectivity and do not offer native integration with MongoDB.
Free NoSQL Viewer supports conversion of SQL queries to MongoDB shell syntax. Furthermore, in SQL Viewer you can even use SQL SELECT statements to query MongoDB collections data without knowing MongoDB query syntax. Check out NoSQL Viewer here www.spviewer.com/nosqlviewer.html
MongoDB and its current drivers do not support SQL-like syntax directly.
However, all of these operations are easy to perform with the driver-specific operations.
Here is a brief mapping of MongoDB operations to the corresponding SQL-like queries:
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
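As a rough sketch of what such a mapping looks like for the statements in the question, here they are as the plain dicts and lists a Python driver such as pymongo would take. The literal values are made up, the output field names (`count`, `b_docs`) are arbitrary choices, and the join relies on the `$lookup` aggregation stage, which only exists in MongoDB 3.2 and later.

```python
# Query documents and pipelines are plain dicts/lists, so they can be built
# and inspected without a running MongoDB.

# select * from table1
find_all = {}

# insert into table1 values (a, b, c)
# MongoDB documents need named fields; the values here are placeholders.
new_document = {"a": 1, "b": 2, "c": 3}

# delete from table
# An empty filter matches every document in the collection.
delete_all = {}

# select a, b, count(*) from table1 group by a, b
group_pipeline = [
    {"$group": {"_id": {"a": "$a", "b": "$b"}, "count": {"$sum": 1}}},
]

# select a.field1, b.field2 from a, b where a.id = b.id
# Run against collection "a"; $lookup pulls in matching documents from "b".
join_pipeline = [
    {"$lookup": {"from": "b", "localField": "id",
                 "foreignField": "id", "as": "b_docs"}},
    {"$unwind": "$b_docs"},
    {"$project": {"_id": 0, "field1": 1, "field2": "$b_docs.field2"}},
]

# With pymongo these would be passed as, for example:
#   db.table1.find(find_all)
#   db.table1.insert_one(new_document)
#   db["table"].delete_many(delete_all)
#   db.table1.aggregate(group_pipeline)
#   db.a.aggregate(join_pipeline)
```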
There are a couple projects underway to emulate a SQL interface for MongoDB. While they provide a familiar interface, in general they should be avoided. They operate on a fundamentally flawed premise in that they parse strings and translate them into method calls.
Once you work with MongoDB you will find the approach of using classes and methods a much more accessible interface as it works exactly like all other parts of your application. Yes there is a small learning curve as you first start, but for the most part, the interface in MongoDB works how you would expect it to.