myTea vLab Architecture

Overview

Open, Open, Open!

Our goal is to provide myTea as an open platform for e-Science developers. To that end, the system is deployed as a plug-in style client-server architecture, as shown in Figure 7. This approach lets application developers in the e-Science space either use the services provided by the myTea environment by writing wrappers for existing applications, or integrate myTea services directly into their own application architectures. Communication between the myTea service and the client tools is implemented using Java RMI; tools in the myTea environment can communicate with the myTea system using either Java RMI or Web Services.
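Because client tools talk to the myTea service over Java RMI, it may help to see what a minimal RMI remote interface for posting events could look like. The sketch below exports a service and calls it through its RMI stub inside a single JVM, purely for demonstration. The interface name `EventService`, its methods and the example LSID are hypothetical and not taken from myTea; a real deployment would typically also bind the stub in an RMI registry so that client tools on other machines can look it up.

```java
import java.io.UncheckedIOException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface; the name and methods are illustrative, not myTea's real API.
interface EventService extends Remote {
    // Records an event and returns an identifier for it.
    String postEvent(String title, String annotation, String dataRef) throws RemoteException;

    // Returns how many events have been recorded.
    int eventCount() throws RemoteException;
}

// Server-side implementation; a real service would write to the triple store instead of a list.
class InMemoryEventService implements EventService {
    private final List<String> events = new ArrayList<>();

    @Override
    public synchronized String postEvent(String title, String annotation, String dataRef) {
        events.add(title + " | " + annotation + " | " + dataRef);
        return "event-" + events.size();
    }

    @Override
    public synchronized int eventCount() {
        return events.size();
    }
}

class RmiDemo {
    // Exports the service, calls it through its RMI stub as a client tool would, then unexports it.
    static String runDemo() {
        // Put the loopback address in the stub so the demo does not depend on DNS.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        InMemoryEventService impl = new InMemoryEventService();
        try {
            // Port 0 lets RMI pick a free port; the returned stub is a dynamic proxy.
            EventService stub = (EventService) UnicastRemoteObject.exportObject(impl, 0);
            try {
                String id = stub.postEvent("Sequences retrieved", "Candidate homologues",
                        "urn:lsid:example.org:sequence:1");
                return id + ", total events: " + stub.eventCount();
            } finally {
                UnicastRemoteObject.unexportObject(impl, true);
            }
        } catch (RemoteException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints "event-1, total events: 1"
    }
}
```

Declaring every remote method with `throws RemoteException` forces a client wrapper to handle network failures explicitly; a Web Services front end could expose the same operations to tools that do not use RMI.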
Semantic Web Technologies

In terms of implementation, we use Semantic Web technologies and languages for data communication, discovery and storage. This includes triple stores for storage, RDF for describing the data, ontologies to support inference over the data, and OWL to describe the ontologies. For storage, we use a local triple store whose contents are described by a combination of three ontologies: the myGrid ontology, the myTea ontology, and what we call the myTea-BioJava ontology, which is based on the BioJava class hierarchy. The myTea ontology represents concepts unique to myTea, such as jobs and sequence collections. The myTea-BioJava ontology is a bioinformatics ontology that uses the properties of the well-known and widely used BioJava classes (such as sequence data) exactly as BioJava stores them in memory, but expressed as triples and hence semantically accessible. Because the store uses both the myGrid ontology and our BioJava ontology, any application written for the widely used myGrid workflows and myGrid data stores, or that uses the BioJava libraries, can also easily access data in the myTea data store.

The rationale for using a Semantic Web approach rather than a database alone lies in its potential to make it easier for application developers to connect researchers with other data sources, and researchers with other researchers. For instance, we are connecting concepts from a variety of services that we wish to integrate in the Bench. By defining these concepts in ontologies and holding them in the triple store, we make it easy for developers to build on top of these collections and to infer new knowledge from what is stored. When data asserted into the triple store is annotated with terms from one of our ontologies, the triple store can infer links between data items automatically, rather than requiring someone to create each link manually. This is a powerful effect.
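To make the idea of automatically inferred links concrete, here is a toy in-memory triple store that forward-chains two RDFS subclass rules. It is a sketch under stated assumptions: it is not Sesame or any real triple store, an OWL reasoner does far more than these two rules, and class names such as `mytea:AlignmentJob` are hypothetical rather than terms from the myTea ontology.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One RDF-style statement: subject, predicate, object.
record Triple(String s, String p, String o) {}

// Toy in-memory store; not Sesame, just enough to show inference.
class TinyTripleStore {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    private final Set<Triple> triples = new LinkedHashSet<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    boolean contains(String s, String p, String o) {
        return triples.contains(new Triple(s, p, o));
    }

    // Forward-chains two RDFS rules until nothing new can be derived:
    //   (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    //   (x type A)       and (A subClassOf B)  =>  (x type B)
    void infer() {
        boolean changed = true;
        while (changed) {
            List<Triple> derived = new ArrayList<>();
            for (Triple t1 : triples) {
                if (!t1.p().equals(TYPE) && !t1.p().equals(SUBCLASS)) continue;
                for (Triple t2 : triples) {
                    // Both rules share one pattern: t1's object is the subclass in t2.
                    if (t2.p().equals(SUBCLASS) && t2.s().equals(t1.o())) {
                        derived.add(new Triple(t1.s(), t1.p(), t2.o()));
                    }
                }
            }
            // Stop once a full pass adds no new triples.
            changed = triples.addAll(derived);
        }
    }

    // Returns the distinct subjects of all stored or inferred triples matching (?, p, o).
    List<String> subjectsOf(String p, String o) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples) {
            if (t.p().equals(p) && t.o().equals(o) && !out.contains(t.s())) {
                out.add(t.s());
            }
        }
        return out;
    }
}
```

For example, after asserting only that `mytea:AlignmentJob` is a subclass of `mytea:AnalysisJob`, that `mytea:AnalysisJob` is a subclass of `mytea:Job`, and that `ex:job42` has type `mytea:AlignmentJob`, a query for instances of `mytea:Job` returns `ex:job42` once `infer()` has run, without anyone adding that link by hand.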
For example, when a bioinformatician reviews his or her experiment holdings, the aggregation of triples through mechanisms such as the Life Science Identifier (LSID) [4] might show that much activity is centered on a particular sequence. This can reveal the importance of that sequence, and the context and semantic annotation recorded by the myTea system can help him or her understand why.

Client Side

The client-side interface to the myTea architecture consists of three distinct components, the Events API, the Job API and the Data API, as shown in Figure 3, above. In the scenario of a scientist processing a sequence, data flows through the client-side architecture as follows. First, the sequence data is stored using the myTea Data API. The Events API is then used to generate an event recording that the sequence data was retrieved from the Web; this event is associated with the data in the data store automatically. The user then runs a process on the sequences, such as an alignment Web service that tries to align the sequences automatically; this is executed using the Bench's Job API. Finally, the application generates another event recording that the user has performed an alignment, and stores the results using the Data API.

Events API

The Events API allows the client application to post event notifications to the myTea environment. These events are recorded in the data store and used by the myTea system to generate the user reports. An example of an event might be "a collection of sequences was created", accompanied by annotations in which the scientist explains the reasoning behind this, and a link to the data in the form of a URI, a file path, an LSID, or a MyTeaID.

Job API

The Job API allows external applications to execute jobs through the myTea environment using data stored in the myTea repository (or any data specified externally).
Applications using the myTea environment can also execute jobs within applications that implement this API.

Data API

The Data API allows applications to store data in, and retrieve data from, the local myTea data store. The data store uses the myTea ontology, the myGrid ontology and LSIDs to provide as unified an approach as possible to classifying objects within the bioinformatics domain.

Server Side

Report Creator

The report creation system works on the server side, because the Report is created from events registered with the myTea system through the Events API. Events consist mainly of a meaningful title, an annotation added by the scientist, and data associated with the event. An example might be "a number of sequences were downloaded from a database". The scientist then puts together a Report template, which is a structured display of selected events. The Report Generator then takes the template and fills it in with data stored in the myTea data store or with the contents of files or Web pages. Scientists can use the Report as a record of the work they have done recently or in the past, or as a means of creating reports for, say, their supervisor.

Job Executor

The job executor provides the means by which jobs (which can be local applications, Web services or myGrid workflows) can be given data and executed, and their results retrieved. The advantage of executing jobs through the myTea environment is that the chain of provenance between the source data and any final results can be maintained throughout an entire project, without having to specify in advance which jobs will be run. This gives the research scientist flexibility in his or her work practices.

Data Store

The data store is a triple store based on the Sesame API [3]. Data is stored, and inferences are made across it, using the myGrid and myTea OWL ontologies.
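The Report Creator flow described above (events carrying a title, an annotation and a data link; a template selecting events; a generator filling in the data) can be sketched as follows. This is a minimal illustration under assumptions: the `Event` and `TemplateSection` types, the plain-text output format, the data references such as `mytea:coll1`, and the resolver function standing in for the myTea data store, files and Web pages are all hypothetical, not the actual myTea Report Generator.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// An event: a meaningful title, the scientist's annotation, and a reference to its data.
record Event(String title, String annotation, String dataRef) {}

// One part of a hypothetical report template: a heading plus a rule selecting which events it shows.
record TemplateSection(String heading, Predicate<Event> selects) {}

class ReportGenerator {
    // Stand-in for the data store, files or Web pages: maps a data reference to
    // displayable content, or returns null if nothing is known about it.
    private final Function<String, String> resolver;

    ReportGenerator(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Walks the template in order and lists the selected events under each heading.
    String generate(List<TemplateSection> template, List<Event> events) {
        StringBuilder out = new StringBuilder();
        for (TemplateSection section : template) {
            out.append("== ").append(section.heading()).append(" ==\n");
            for (Event e : events) {
                if (!section.selects().test(e)) continue;
                out.append("- ").append(e.title()).append(": ").append(e.annotation()).append('\n');
                // Fill in the associated data, if the resolver knows about it.
                String content = resolver.apply(e.dataRef());
                if (content != null) {
                    out.append("  data: ").append(content).append('\n');
                }
            }
        }
        return out.toString();
    }
}
```

For instance, a template with a "Data gathering" section that selects events whose titles start with "Sequences" would list a download event together with whatever the resolver returns for its data reference, while an event whose data cannot be resolved is listed without a data line.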