Background

Literature and learning from the past

Scientific knowledge has been accumulating for hundreds--even thousands--of years. This "accumulation" takes place almost entirely in written texts, most recently in the scientific literature.

When that accumulated knowledge can be accessed, we can develop it further--all science takes place with reference to what is already known (or believed). However, the scientific literature is growing so rapidly that it has become essentially impossible for human researchers to stay abreast of it, even within their own specialties.

This mismatch between the enormous availability of knowledge and the striking difficulty that researchers have in accessing it is the root motivation for the Biomedical Linked Annotation Hackathons (BLAH). While there are other community efforts with a similar motivation, the BLAH series puts an emphasis on linking individually developed efforts and resources for literature mining, and on developing an infrastructure that may enable a breakthrough in the productivity of the community.

Compared to conventional conferences, hackathons put an emphasis on implementing things in a free, collaborative culture, which often ends up producing publicly useful resources. It could be said that the recent emergence of hackathon-style conferences in academic communities reflects the growing importance of engineering for science.

The goal of the BLAH hackathon series is to bring together leading experts, active researchers, and developers in literature mining to collaborate on linking literature mining efforts and resources, so that they can more easily interoperate with each other.

Structured databases

As human knowledge accumulates, instant access to the relevant pieces of knowledge becomes more and more important. Many structured databases have been developed to make such access efficient. Life science in particular is rich in public databases, e.g., Entrez Gene, UniProt, and PubChem.

To fully benefit from the various databases, however, relevant entities across multiple databases need to be interlinked.

Linked Data

Linked Data (LD) is emerging as a new way of publishing data. LD enables relevant pieces of data across multiple databases to be linked to each other through a standard protocol. It may be said that, while the development of structured databases (mostly relational databases) increased the amount of data held in databases, the technology of LD and the Semantic Web is now significantly improving the linkage between those pieces of data.
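The linking idea can be illustrated with a minimal sketch. It assumes that each database publishes its statements as (subject, predicate, object) triples and that the same entity is given the same identifier everywhere. The example.org URIs, the predicate names, and the helper objects_of are hypothetical and are not taken from any real database.

```python
# Toy triple store: every statement is (subject, predicate, object).
# All URIs below are hypothetical placeholders, not real database identifiers.
GENE_DB = {
    ("http://example.org/gene/G1",
     "http://example.org/vocab/encodes",
     "http://example.org/protein/P1"),
}
COMPOUND_DB = {
    ("http://example.org/protein/P1",
     "http://example.org/vocab/bindsTo",
     "http://example.org/compound/C1"),
}


def objects_of(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Merging the two datasets is a plain set union, because both use the same
# triple shape and refer to the shared protein by the same URI.
merged = GENE_DB | COMPOUND_DB

# Follow the links: gene -> protein (first dataset) -> compound (second dataset).
proteins = objects_of(merged, "http://example.org/gene/G1",
                      "http://example.org/vocab/encodes")
compounds = [c for p in proteins
             for c in objects_of(merged, p, "http://example.org/vocab/bindsTo")]
print(compounds)
```

The point of the sketch is only that a shared identifier lets a query cross database boundaries; real Linked Data relies on standards such as RDF and dereferenceable URIs rather than in-memory sets.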

Compared to knowledge represented in the scientific literature, however, the pieces of knowledge in structured databases or linked data often lack their context, e.g., the experimental environment.

Linked Annotation

Often, the literature is the only place where the context of an individual piece of data can be found, and this is why database curators want to record references to relevant pieces of literature, e.g., PMIDs, for individual data entries. Still, many databases lack such references, or have largely incomplete ones. Finding the context of database entries in the literature and linking the two is thus an important task that raises the utility of databases. From the perspective of the literature, this task is called annotation (plus normalization or grounding); it improves the accessibility of the content of the literature and increases the opportunities for mining across literature and structured databases.

Google Maps vs Linked Annotation

Linked Annotation is conceptually similar to Google Maps, which links various entities and structures to 2-dimensional unstructured data (a map).

Linked Annotation links various entities and their structures to 1-dimensional unstructured data (text).
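As a rough illustration, an annotation on 1-dimensional text can be thought of as a character span plus the identifier of the entity it is linked to. The sketch below assumes that representation; the class Annotation, the helper annotate, the sample sentence, and the "example-db" identifiers are all made up for illustration and do not describe the format of any particular annotation system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    begin: int   # character offset where the span starts (inclusive)
    end: int     # character offset where the span ends (exclusive)
    target: str  # identifier of the linked database entry (hypothetical here)

    def covered_text(self, text: str) -> str:
        """Return the piece of text this annotation points at."""
        return text[self.begin:self.end]


def annotate(text: str, mention: str, target: str) -> Annotation:
    """Link the first occurrence of `mention` in `text` to `target`."""
    begin = text.find(mention)
    if begin < 0:
        raise ValueError(f"{mention!r} does not occur in the text")
    return Annotation(begin, begin + len(mention), target)


# The text itself is left untouched; annotations live beside it and refer
# to it only by position, much like a pin refers to map coordinates.
text = "ProteinA binds ProteinB."
annotations = [
    annotate(text, "ProteinA", "example-db:0001"),
    annotate(text, "ProteinB", "example-db:0002"),
]

for a in annotations:
    print(a.begin, a.end, a.covered_text(text), "->", a.target)
```

Because the annotations only store positions and identifiers, several independent annotation sets can be attached to the same text without interfering with each other.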

PubAnnotation

We regard Google Maps as one of the most successful public-sourced annotation systems: users can easily create (geographical) annotations and share them with anyone else.

PubAnnotation (http://pubannotation.org) is an annotation repository developed to implement a Google Maps-like system for public-sourcing and publishing annotations to the literature.

BLAH

BLAH is organized to develop linked literature annotation as a community effort. Over the last decades, the BioNLP community has made substantial progress in producing various annotations to the biomedical literature. Now it is time to put more effort into improving the accessibility of these invaluable resources. Through the linked annotation effort, we believe that both the accessibility and the productivity of annotation may be significantly improved.