Importing mail messages into Solr (no IMAP) - email

How would I go about importing email message text files into Solr without connecting to IMAP? For instance, if I use the API, how should I set it up to index a mail message correctly?
I am new to configuring Solr. I am aware of the MailEntityProcessor, but my application is bespoke and does not use IMAP directly. I would like to import the raw text email message into Solr as they arrive.
Any help will be much appreciated.

Write a program that gathers information from whatever source you need (like your email system), constructs a Solr document, and batches up documents to send to Solr.
Solr comes with a Java client (SolrJ) that is actually part of Solr itself, and many third-party clients have been written for other programming languages.

Related

Query message store of mirth connect

Can I use mirth connect to store millions of HL7v2 messages (pipe delimited) and query them programmatically by our third party software application at a later point of time?
What's the best way to do that? Is mirth's REST API capable to query its message store efficently?
Unfortunatly I need a running mirth connect instance to browse the REST API documentation according to the manual at page 368. (If it wouldn't require to have a running instance of mirth to browse the documentation of the REST API I wouldn't have asked that question. Is there a mirth connect instance available on the internet to play with? Or would somebody be so kind to post the relevant REST API documentation for that question?)
So far, those are the scenarios I came up yet:
Mirth is integration engine, and its strength is processing messages. Browsing historical messages can be at times difficult or slow, depending on the storage settings for the channel and whether or not you take care to pull additional information out during processing to store in "custom metadata" fields. The custom metadata fields are not indexed by default, but you can add your own (mirth supports several back-end databases, including postgres, mysql, oracle, and mssql.) Searching the message content basically involves doing a full-text search and scanning. Filter options to reduce scan time, apart from the custom metadata you create, are mostly related to the message properties (datetime received, status, etc..) and not the content.
So, I would not recommend it for the use-case you are suggesting.
However, Mirth could definitely be used to convert your messages (batched from files or live) to xml which could be put in a database designed to handle and query large volumes of xml documents. I assume when you say HL7 you mean the ER7 (pipe delimited) format of HL7v2. Mirth automatically does the conversion to xml for those types of messages as they are handled as xml during processing. You could easily create a new parent node that holds both the converted xml and the original message string as children.
If the database you choose has a JDBC driver, Java SDK, or HTTP/REST API, mirth can likely directly insert the converted messages for you as it processes them.
There are two misconceptions here:
HL7v2 message is triggered by the real-world event, called the trigger event, on the placer (sender) side. It expects some activity to happen on the filler (receiver) side by either confirming the message, replying with the query response, etc. I.e., HL7v2 supports data flow among systems.
Mirth Connect is HL7 interface engine aimed at transforming incoming feeds in one format (e.g., HL7v2 in ER7 format) into outgoing feeds in another format (which could be another HL7v2, or XML, or database, etc.). It does not store anything except a configured portion of messages for audit purposes.
Now, to implement a solution you outlined, Mirth Connect or any other transformation mechanism has to implement two flows: receive, convert if needed and store incoming messages; provide an interface to query those messages.
This is obviously can be done with Mirth Connect but your initial question if Mirth is capable in storing millions of records is incorrect. In fact it's recommended to keep as less messages as possible to speed up Mirth processing (each processed message is stored in the Mirth internal database several times depending on configuration). Thus, all transformed messages are going into the external public or private message storage exactly as shown on your diagrams.

Apache CXF user input sanitation

I am developing some Apache CFX Web Services (note: I am not using Spring) and I am concern about injections and security in general. I think I am safe when it comes to SQLi once the code reaches my database, but how can I sanitise the input that comes from SOAP?
(ie. How can I prevent any other type of injection? Is there a way to read the raw number of bytes the user has sent? Becuse when objects are sent it is very unflexible to go one by one manually checking the size of each attribute)
Ideally I am looking for a library that supports the sanitisation of user input from SOAP.

is it possible to write record as NO-UNDO in transaction?

we are making some loging issue, where we need write the logentries in the DB. But the process run in a transaction and by rollback are our new logentries also deleted. can I make a write in DB out of the transaction? something like write in temptable with NO-UNDO option...? that the new logentries still remain in DB...?
Another possibility would be to use an app server. Transactions on app server sessions are independent from transactions in the original session (that's what the optional and redundant "DISTINCT TRANSACTION" syntax is all about).
Another option would be to use a simple messaging system. One very easy to setup and use option is STOMP. It is platform neutral and very easy to get going with.
Julian Lyndon-Smith posted the following on PEG about a month ago, and it really is as easy to setup and use as he says (I've tried it, I used ApacheMQ which is also very easy to setup and use):
Following on from presentations in Boston and Finland, dot.r is
pleased to announce the open source Stomp project, available
immediately.
Download from either http://www.dotr.com or
https://bitbucket.org/jmls/stomp , the dot.r stomp programs allow you
to connect your progress session to any other application or service
that is connected to the same message broker.
Open source, free message brokers that support Stomp are:
Fuse
(http://fusesource.com/products/fuse-mq-enterprise/) [a Progress company now owned by Red Hat inc]
Fuse MQ Enterprise is a standards-based, open source messaging platform that deploys with a very small footprint. The lack of license
fees combined with high-performance, reliable messaging that can be
used with any development environment provides a solution that
supports integration everywhere
ActiveMQ
Apache ActiveMQ (tm) (http://activemq.apache.org/)is the most popular
and powerful open source messaging and Integration Patterns server. Apache
ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes
with easy to use Enterprise Integration Patterns and many advanced features
while fully supporting JMS 1.1 and J2EE 1.4.
Apache ActiveMQ is released under the Apache 2.0 License.
RabbitMQ
RabbitMQ is a message broker. The principal idea is pretty simple: it
accepts and forwards messages. You can think about it as a post
office: when you send mail to the post box you're pretty sure that Mr.
Postman will eventually deliver the mail to your recipient. Using this
metaphor RabbitMQ is a post box, a post office and a postman.
The major difference between RabbitMQ and the post office is the fact
that it doesn't deal with paper, instead it accepts, stores and
forwards binary blobs of data - messages.
Please feel free to log any issues on the
https://bitbucket.org/jmls/stomp issue system, and fork the project in
order to commit back all those new features that you are going to add
...
dot.r Stomp uses the permissive MIT licence
(http://en.wikipedia.org/wiki/MIT_License)
Have fun, enjoy !
Julian
Every change to the database must be part of a transaction. If you do not explicitly start one it will be implicitly started for you and scoped to the next outer block with transaction capabilities.
However and although I would not recommend you to, work with sub-transactions. You can invoke a sub transaction by explicitly specifying a DO TRANSACTION within the transaction scope. Although the database will never know about it, the client can roll back the sub transaction while the database can commit the transaction.
But in order to implement something like this you must master the concepts of transaction scope, block behavior and error handling.
RealHeavyDude.
Write your log entries to a no-undo temp-table.
When the code will commit a transaction, or transactions aren't active (transactionID = ?) have your code write the log entries out.
I don't think there is any way to do this in ABL as you planned either efficiently (sprinkling temp-table flushes or other tidbits all over the place is gross) or reliably (what if the application crashes with an un-flushed temp-table?), as others have mentioned. I would suggest making your complicated logging less coupled to your app by making the database writes asynchronous, occurring outside of your application if possible.
Since you're on Windows, you could change your logging to use the .NET log4net library instead of ABL constructs. log4net has a few appenders that would be useful:
AdoNetAppender which lets you log directly to a database
RemoteSyslogAppender which uses the syslog protocol, letting you log to an external Unix syslog or rsyslog daemon (rsyslog supports writing log messages to databases)
UDPAppender which sends the log messages via UDP packets somewhere else to be handled (e.g. a logFaces server, which supports writing to databases)
If you must do it in ABL then you could use a named output stream specifically for your log messages (OUTPUT TO STREAM) which writes to a specific location where an external process is listening to handle it. This file could be a pipe created by something like mkfifo or just a regular text file that is monitored for changes with inotify (not sure what the Windows equivalents of these are). This external process would handle parsing the messages and writing them to the database (basically re-inventing rsyslog).
I like the no-undo temp-table idea, just be sure to put the database write part in a "FINALLY" block in case of unhandled exceptions.

Maintaining a distributed contact list and accessing in an email client

I am working on a project. I have many databases in which email contacts are maintained. Each database is location on a different server and each of them is used by different personnels to send emails. I want to unify these contacts, keep them sync and available to all the personnel who broadcast mails.
My solution: This is a distributed solution. Maintain continuously updated XML file on each server. When a broadcast mail is composed, fetch all files, merge them etc.
My question: How can I integrate this solution with an email client. Which opensource client is suitable for this? I want the phonebook to be updated when compose button is clicked.
I am open to other suggestions. May be some opensource solution that works out of the box for me.

What's the easiest way to process incoming email and add it to a queue on a Linux server?

What's the easiest way to process incoming email? Our objective is to get email into a Resque queue. We've explored and integrated a lot of options, like piping email through Postfix into Ruby (which turned out to be unreliable), piping email through Google App Engine back to our server (which turned out to be shaky), and using Sendgrid (which is expensive.)
I'm trying to explore other ways to get email processed. Any ideas?
If the email is received and hosted by an IMAP-capable server (Gmail is good enough), then the messages can be retrieved and processed once and again from almost any programming language using standard libraries.