Towards Document Process Preservation: Xerox Launches Document Process Modelling Technology 'Xeproc©'
by Thierry Jacquin, Hervé Déjean and Jean-Pierre Chanod
Developed at the Xerox Research Centre Europe in the context of the EU Integrated Project SHAMAN, Xeproc© technology lets you define and design document processes while producing an abstract representation that is independent of the implementation. These representations capture the intent behind the workflow and can be preserved for reuse in future, as yet unknown, infrastructures. Xeproc© is available under the Eclipse Public License.
Xeproc© technology can be used to build a wide range of applications based on document processing, including transformation, extraction, indexing and navigation. It can be easily integrated with more global business processes and customized to match specific requirements and infrastructures. In the spirit of service-oriented architecture (SOA), Xeproc© embeds references to services and documents and provides loose coupling not only to services but also to data resources, with respect to both their location and format.
Xeproc© was developed in the context of the Integrated Project SHAMAN (Sustaining Heritage Access through Multivalent Archiving), co-funded by the European Union within the FP7 Framework. SHAMAN aims at developing a long-term digital preservation framework and tools to analyse, ingest, manage, access and reuse digital objects.
More specifically, within the context of SHAMAN and digital preservation, Xeproc© models XML pipelines and XML validation checkpoints. These models capture the intent behind the workflow irrespective of its implementation at a given point in time. Because the abstract representations are preserved, Xeproc© models can be seen as independent specifications to be instantiated and deployed over time, as technology evolves. When associated with the appropriate components, these logical and persistent descriptions can be interpreted or translated into any SOA orchestration language to produce logically structured documents (typically XML), which make explicit how the content of the source document is logically and semantically organized.
Available on Eclipse 3.5.1 under the Eclipse Public License, Xeproc© combines a domain-specific language (DSL), an associated graphic designer and extension APIs (application programming interfaces).
The Xeproc© DSL: extensible, easy to use and focussed
The Xeproc© Domain-Specific Language (DSL) is used to describe the document process you want to design. It specifies a chain of processing steps, which may point to components such as document services or project-specific resources. All components take a document as input and generate another document as output.
Figure 1: representation of a designed process with Xeproc©.
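The chain-of-steps idea described above can be illustrated with a minimal Python sketch. This is not Xeproc© code or its DSL: the names Step, run_chain, wrap_paragraphs and mark_title are hypothetical, and a document is assumed to be an in-memory XML element. The sketch only shows the contract stated in the text, namely that every component takes a document as input and generates another document as output.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a "document" is modelled as an XML element tree.
Document = ET.Element


@dataclass
class Step:
    """One processing step: a name plus a component mapping document to document."""
    name: str
    component: Callable[[Document], Document]


def run_chain(steps: List[Step], source: Document) -> Document:
    """Feed the output document of each step into the next step."""
    doc = source
    for step in steps:
        doc = step.component(doc)
    return doc


def wrap_paragraphs(doc: Document) -> Document:
    """Hypothetical component: turn each non-empty text line into a <p> element."""
    out = ET.Element("document")
    for line in (doc.text or "").splitlines():
        if line.strip():
            ET.SubElement(out, "p").text = line.strip()
    return out


def mark_title(doc: Document) -> Document:
    """Hypothetical component: relabel the first paragraph as <title>."""
    out = ET.Element("document")
    for index, paragraph in enumerate(doc):
        tag = "title" if index == 0 else "p"
        ET.SubElement(out, tag).text = paragraph.text
    return out


source = ET.Element("raw")
source.text = "Annual report\nFirst paragraph.\nSecond paragraph."

chain = [Step("segment", wrap_paragraphs), Step("title", mark_title)]
result = run_chain(chain, source)
print(ET.tostring(result, encoding="unicode"))
```

Because each step depends only on the document it receives, a step can be swapped for another implementation without touching the rest of the chain, which is the property that makes an implementation-independent description possible.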
To take full advantage of Xeproc©, the designer links processing steps with validation resources. While validations are traditionally exploited just before deployment, the Xeproc© Designer is conceived in such a way that they are exploited throughout the design process. Thanks to a continuous monitoring mechanism, validations not only verify but also specify, and lead the design process from the specification to instantiation.
In addition, processing steps can be linked to visualization specifications, highlighting selected outputs. These views, which are captured on demand and throughout the entire monitoring of the process, make it easier to identify and pinpoint errors, undertake corrections or consult the relevant experts.
The Xeproc© DSL is open enough to support any document format, validation syntax and resource location.
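As a rough illustration of how validations attached to processing steps can both specify and verify, the following Python sketch runs a validator after each step and records what it finds. All names here (run_with_checkpoints, has_single_title, add_title, identity) are hypothetical and do not come from Xeproc©; the validator is a plain function standing in for whatever validation syntax a project actually uses.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

Component = Callable[[ET.Element], ET.Element]
# Assumption: a validator returns a list of problems; an empty list means "valid".
Validator = Callable[[ET.Element], List[str]]
StepSpec = Tuple[str, Component, Optional[Validator]]


def has_single_title(doc: ET.Element) -> List[str]:
    """Hypothetical checkpoint: the output must contain exactly one <title>."""
    count = len(doc.findall("title"))
    return [] if count == 1 else [f"expected 1 <title>, found {count}"]


def run_with_checkpoints(steps: List[StepSpec], source: ET.Element):
    """Run each step, validate its output, and stop at the first failing checkpoint."""
    report = []
    doc = source
    for name, component, validator in steps:
        doc = component(doc)
        problems = validator(doc) if validator else []
        report.append((name, problems))
        if problems:
            break  # the report pinpoints the step that needs attention
    return doc, report


def identity(doc: ET.Element) -> ET.Element:
    """Placeholder for a step that has been specified but not yet implemented."""
    return doc


def add_title(doc: ET.Element) -> ET.Element:
    """Hypothetical implementation: promote the first <p> to <title>."""
    out = ET.Element("document")
    ET.SubElement(out, "title").text = doc.findtext("p", default="")
    out.extend(doc.findall("p")[1:])
    return out


source = ET.fromstring("<document><p>Annual report</p><p>First paragraph.</p></document>")

# Design time: the validation exists before the component does, so it acts as a specification.
_, draft_report = run_with_checkpoints([("add title", identity, has_single_title)], source)
# Later: the same validation verifies the real component.
_, final_report = run_with_checkpoints([("add title", add_title, has_single_title)], source)

print(draft_report)
print(final_report)
```

In the first run the checkpoint fails against the placeholder, which tells the designer what the step still has to deliver; in the second run the same checkpoint passes unchanged against the implemented component.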
The Xeproc© Graphic Designer
The Xeproc© Graphic Designer is a user-friendly Eclipse plug-in editor which allows the user to manipulate abstract representations of objects relevant to the Xeproc© application domain.
The Designer provides an intuitive representation of underlying Xeproc© models and the ability to draw, rearrange and tune document-processing chains. This is achieved by combining project-specific resources (processing components, validations and views) with generic document services organized in a palette. The processing elements are represented as boxes, intermediate documents as arrows and validation constraints and views as icons on boxes.
The Designer was generated from the Xeproc© model using the EMF/GMF (Eclipse Modelling Framework and Graphical Modelling Framework) technologies provided by Eclipse. Model-Driven Architecture methodologies supported by the Object Management Group were applied.
In a typical document transformation project, the team will create an Eclipse project, share it amongst all the technical partners and initialize it with reference resources such as documents, requirements and the schemas to validate against. The process designer will then consider the context and customize the palette with the components considered useful, obtained from an update site. From there, he or she will start building the process, dragging and dropping from the component palette or from the project workspace to quickly draw specific logical and persistent pipelines for document analysis and transformation.
The extension APIs allow the palette to be enlarged with components and associated validations and views published as document services. Such resources need to stipulate the interpretation engine responsible for injecting the input documents.
The extension APIs also allow you to extend your Xeproc© Designer with new resource processors, be they component, validation or view engines. Each uploaded resource then declares the processor type it requires at runtime, and the Xeproc© Designer dynamically delegates the operation to the matching processor, provided that it is plugged in.
This coupling of resource with interpreter supports a wide variety of combinations, including WSDL/SoapClient, main.java/JVM, python.py/python.exe, XSLT Stylesheet/XSLT processor, or any other pairing of a resource with an engine able to interpret it. Mechanisms are provided to plug processing resources and processing engines into the Xeproc© Designer.
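The pairing of a resource with the processor that interprets it can be sketched as a small registry. The following Python example is an illustration only and does not reflect the actual Xeproc© extension APIs: Resource, ProcessorRegistry, plug_in and run are hypothetical names, documents are plain strings, and the engines are toy stand-ins that do not invoke a real XSLT processor or Python interpreter.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Resource:
    """A processing resource that declares the processor type it needs at runtime."""
    location: str
    processor_type: str


# Assumption: an engine receives the resource and an input document, and returns a document.
Engine = Callable[[Resource, str], str]


class ProcessorRegistry:
    """Keeps track of plugged-in engines and delegates each resource to the right one."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def plug_in(self, processor_type: str, engine: Engine) -> None:
        self._engines[processor_type] = engine

    def run(self, resource: Resource, document: str) -> str:
        engine = self._engines.get(resource.processor_type)
        if engine is None:
            raise LookupError(
                f"no processor plugged in for type {resource.processor_type!r}"
            )
        return engine(resource, document)


registry = ProcessorRegistry()
# Toy engines: they only tag the document to show which engine handled it.
registry.plug_in("xslt", lambda res, doc: f"[xslt:{res.location}] {doc}")
registry.plug_in("python", lambda res, doc: f"[python:{res.location}] {doc}")

stylesheet = Resource(location="to-html.xsl", processor_type="xslt")
output = registry.run(stylesheet, "<doc/>")
print(output)
```

Keeping the processor type as data on the resource, instead of hard-wiring a call to a specific engine, is what allows a new kind of engine to be added later without changing the resources or the process description that refers to them.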
Xerox Research Centre Europe, France
Tel: +33 4 76 61 50 75