Actions:
|
2011-02-23 15:37 AEST by Arthur Barrett - Server should have integration with Unstructured Information Management Architecture (UIMA).
UIMA is an industry standard for content analytics (an OASIS standard), apparently the only standard.
From Wikipedia: an example is a logistics analysis software system that could convert unstructured data
such as repair logs and service notes into relational tables. These tables can then be used by automated
tools to detect maintenance or manufacturing problems.
I think this is the sort of feature that commercial users of SCCM would find useful - and therefore a
good future enhancement.
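To make the Wikipedia example above concrete, here is a rough sketch in plain Python (not the UIMA API). It turns free-text repair notes into rows of a relational table and then queries for repeat repairs. The log format, the regular expression and the table layout are all assumptions made up for illustration.

```python
import re
import sqlite3

# Hypothetical free-text repair notes; the format is an assumption.
REPAIR_LOGS = [
    "2011-02-01 unit A17: replaced hydraulic pump after leak",
    "2011-02-03 unit B02: tightened fuel line",
    "2011-02-09 unit A17: replaced hydraulic pump again",
]

# Assumed pattern: date, unit id, an action verb, then a two-word part name.
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) unit (?P<unit>\w+): "
    r"(?P<action>replaced|tightened) (?P<part>[a-z]+ [a-z]+)"
)


def load_repairs(lines):
    """Parse log lines and load the matches into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE repairs (date TEXT, unit TEXT, action TEXT, part TEXT)")
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # lines that do not fit the pattern are skipped
            db.execute(
                "INSERT INTO repairs VALUES (?, ?, ?, ?)",
                (match["date"], match["unit"], match["action"], match["part"]),
            )
    return db


def repeat_repairs(db):
    """Return (unit, part, count) for parts repaired more than once on a unit."""
    return db.execute(
        "SELECT unit, part, COUNT(*) FROM repairs "
        "GROUP BY unit, part HAVING COUNT(*) > 1"
    ).fetchall()
```

Calling repeat_repairs(load_repairs(REPAIR_LOGS)) flags the hydraulic pump on unit A17, which is the kind of maintenance problem an automated tool could then pick up.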
The UIMA overview diagram even includes source code and issue management:
http://uima.apache.org/images/UimaIs.png
Previously the Bugzilla/Bonsai/Tinderbox toolchain included glimpse, which we proposed as a search
solution / knowledge management tool for customers, including at meetings in LA back in 2004. Glimpse
was always challenging because it was very Unix-centric (though we did have it running on our old NT4
server with Bugzilla 2.18 and CVSNT) and commercially licensed (dual licensed, from memory). The UIMA
alternative is under an Apache license, so it is friendly to both commercial and LGPL/GPL use.
UIMA consists of:
* components
* infrastructure
* frameworks
The UIMA components include:
* annotators - these extract structured information from unstructured data
* repositories
The UIMA infrastructure includes:
* tooling
* server
The UIMA framework is available in C++, plus Java and UIMA-AS/JMS (Java Message
Service/ActiveMQ).
The frameworks run the components. Additional infrastructure support components include a simple
server which makes the results of UIMA processing available in a simple, XML-based format (i.e. as a REST
service).
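As a rough idea of what "a simple, XML-based format" could look like for our purposes, here is a sketch that serialises analysis results to XML. The element and attribute names (result, annotation, kind, begin, end) are invented for illustration and are not the actual output format of the UIMA simple server.

```python
import xml.etree.ElementTree as ET


def annotations_to_xml(annotations):
    """Serialise (kind, begin, end, text) tuples as one XML string.

    The element and attribute names are hypothetical, not UIMA's own.
    """
    root = ET.Element("result")
    for kind, begin, end, text in annotations:
        node = ET.SubElement(
            root, "annotation", kind=kind, begin=str(begin), end=str(end)
        )
        node.text = text  # the covered text becomes the element body
    return ET.tostring(root, encoding="unicode")
```

A client such as a web front end would only need an XML parser to consume this, which is the attraction of exposing the results over REST.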
The major goal of UIMA is to transform unstructured information to structured information by
orchestrating analysis engines to detect entities or relations and thus to build the bridge between the
unstructured and the structured world.
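The orchestration idea can be sketched without UIMA itself: each analysis engine looks at the same document and adds typed annotations with character offsets. The following is plain Python, not the UIMA API. The Document class is only a stand-in for what UIMA calls the CAS, and the two regex annotators and their type names are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # hypothetical type name, e.g. "IssueRef" or "FileName"
    begin: int  # character offset where the covered text starts
    end: int    # character offset just past the covered text
    text: str


@dataclass
class Document:
    """Stand-in for a UIMA CAS: the text plus the annotations added so far."""
    text: str
    annotations: list = field(default_factory=list)


def regex_annotator(kind, pattern):
    """Build an annotator that records every match of `pattern` as `kind`."""
    compiled = re.compile(pattern)

    def annotate(doc):
        for m in compiled.finditer(doc.text):
            doc.annotations.append(Annotation(kind, m.start(), m.end(), m.group()))

    return annotate


def run_pipeline(text, annotators):
    """Run each annotator in order over one shared document."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


# Assumed example engines: issue references and C++ file names.
PIPELINE = [
    regex_annotator("IssueRef", r"bug \d+"),
    regex_annotator("FileName", r"\b\w+\.(?:cpp|h)\b"),
]
```

Running run_pipeline("Fixes bug 1234 in server.cpp and server.h", PIPELINE) yields one IssueRef and two FileName annotations, i.e. structured records derived from an unstructured commit message.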
Apparently there is already a Tika Analysis Engine for UIMA, so perhaps we could run Tika on any incoming
committed files?
http://tika.apache.org/0.9/formats.html
and
http://uima.apache.org/sandbox.html#tika.annotator
So in short:
* we integrate with Tika and the Tika Annotator during commit
(basically offering a UIMA Collection Reader)
* we integrate with UIMA Analysis in cvscontrol/evsmanager to use the results
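The two steps above might hang together roughly as follows. This is a sketch only: the function names are hypothetical, extract_text is a placeholder for the Tika step (it just decodes plain text), and the "bug NNN" annotator stands in for a real analysis engine. It assumes the commit trigger can hand us each committed file as a (name, raw bytes) pair.

```python
import re


def collection_reader(files):
    """Collection-reader stand-in: yield (name, text) for each committed file."""
    for name, raw in files:
        yield name, extract_text(raw)


def extract_text(raw):
    """Placeholder for the Tika step; here it only decodes plain text."""
    return raw.decode("utf-8", errors="replace")


def find_issue_refs(text):
    """Toy annotator: collect issue numbers written as 'bug NNN'."""
    return [int(n) for n in re.findall(r"bug (\d+)", text, flags=re.IGNORECASE)]


def analyse_commit(files):
    """Return {file name: [issue numbers]} for a client tool to query later."""
    return {name: find_issue_refs(text) for name, text in collection_reader(files)}
```

For example, analyse_commit([("notes.txt", b"See Bug 42 and bug 7")]) maps notes.txt to the issues 42 and 7. A client such as cvscontrol/evsmanager would then read these stored results rather than re-running the analysis.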
Based on research conducted today, I think that loading the data into UIMA is already practical
- but querying it and displaying the results is still a distant promise, see:
SemanticSearch
http://www.alphaworks.ibm.com/tech/uima |