<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet title="XSL formatting" type="text/xsl" href="http://blog.isavoir.com/feed/rss2/xslt" ?><rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>DNA MANIA</title>
  <link>http://blog.isavoir.com/</link>
  <description>Bioinformatics, Text Mining, Biological Text Mining, Named entity recognition, Genomics, Systems Biology, Semantic, Computational Biology, Semantic Web, Knowledge management, Biomedicine, Ontology, Thesaurus, Terminology, Corpora, Content management</description>
  <language>en</language>
  <pubDate>Mon, 05 May 2008 16:11:09 +0200</pubDate>
  <copyright>Copyright © 2007 iSavoir, all rights reserved</copyright>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Dotclear</generator>
  
    
  <item>
    <title>Ontology, thesaurus, taxonomy, meta-model and the Semantic Web</title>
    <link>http://blog.isavoir.com/post/2007/04/08/Why-do-we-need-a-controlled-vocabulary</link>
    <guid isPermaLink="false">urn:md5:cf4d0b2f910c5acacc9e343783276afa</guid>
    <pubDate>Sun, 08 Apr 2007 14:50:00 +0200</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Semantic</category>
        <category>controlled vocabulary</category><category>meta-model</category><category>ontology</category><category>taxonomy</category><category>thesaurus</category>    
    <description>&lt;p&gt;The first question a newcomer usually asks is: why have a
vocabulary at all? For centuries, people have liked to organize, divide,
structure and rank things hierarchically. Sometimes this urge to classify goes so
far that one loses sight of the original purpose, as often happened among the
naturalists of the 19th century. &lt;strong&gt;Having a dedicated vocabulary to
describe a field lets you organize your
knowledge.&lt;/strong&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2&gt;Ontology&lt;/h2&gt;
&lt;p&gt;Let us deal with a first sore point right away, so that we can set it aside
and concentrate on the subject. The word &lt;q&gt;ontology&lt;/q&gt;, as defined in
most dictionaries, has no connection with computing.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/ontology&quot;&gt;ontology&lt;/a&gt; |änˈtäləjē| noun&lt;br /&gt;
The branch of metaphysics dealing with the nature of being.&lt;br /&gt;
ORIGIN early 18th cent.: from modern Latin ontologia, from Greek ōn, ont-
‘being’ + -logy.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In philosophy.&lt;/strong&gt; The part of metaphysics that studies
being as being, independently of its particular determinations.&lt;br /&gt;
“&lt;a href=&quot;http://en.wikipedia.org/wiki/Being_and_Nothingness&quot;&gt;Being and
Nothingness: An Essay on Phenomenological Ontology&lt;/a&gt;”, Jean-Paul Sartre.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The Semantic Web and information science communities have adopted this
term. Through what route, I do not know, but it is a fact, much to the
displeasure of many philosophers.&lt;/p&gt;
&lt;h2&gt;Difference between controlled vocabularies, thesauri, taxonomies and
ontologies&lt;/h2&gt;
&lt;p&gt;A &lt;strong&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/controlled%20vocabulary&quot;&gt;controlled
vocabulary&lt;/a&gt;&lt;/strong&gt; is a list of terms that have been enumerated
explicitly. This list is controlled by and is available from a controlled
vocabulary registration authority. All terms in a controlled vocabulary should
have an unambiguous, non-redundant definition. This is a design goal that may
not be true in practice. It depends on how strict the controlled vocabulary
registration authority is regarding registration of terms into a controlled
vocabulary. At a minimum, the following two rules should be enforced:&lt;/p&gt;
&lt;pre&gt;
  1. If the same term is commonly used to mean different concepts in different contexts, then its name is explicitly qualified to resolve this ambiguity.
  2. If multiple terms are used to mean the same thing, one of the terms is identified as the preferred term in the controlled vocabulary and the other terms are listed as synonyms or aliases.
&lt;/pre&gt;
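&lt;p&gt;The two rules above can be illustrated with a minimal Python sketch. It is a toy, not a registration authority. The ControlledVocabulary class, the &amp;quot;term (context)&amp;quot; qualifier syntax and the sample terms are hypothetical choices made for this example.&lt;/p&gt;
&lt;pre&gt;
```python
class ControlledVocabulary:
    """Toy controlled vocabulary (hypothetical API) enforcing the two minimum rules."""

    def __init__(self):
        self.preferred = {}  # qualified preferred term -> definition
        self.aliases = {}    # synonym or alias -> qualified preferred term

    def register(self, term, definition, context=None):
        # Rule 1: a term used in several contexts is qualified, e.g. "cell (biology)".
        name = f"{term} ({context})" if context else term
        if name in self.preferred or name in self.aliases:
            raise ValueError(f"ambiguous or redundant term: {name!r}")
        self.preferred[name] = definition
        return name

    def add_synonym(self, synonym, preferred_name):
        # Rule 2: other terms for the same concept point at the preferred term.
        if preferred_name not in self.preferred:
            raise KeyError(preferred_name)
        if synonym in self.preferred or synonym in self.aliases:
            raise ValueError(f"term already registered: {synonym!r}")
        self.aliases[synonym] = preferred_name

    def resolve(self, term):
        """Return the preferred term for any registered term or alias."""
        if term in self.preferred:
            return term
        return self.aliases[term]


vocab = ControlledVocabulary()
bio = vocab.register("cell", "Basic structural unit of living organisms", "biology")
vocab.register("cell", "Single compartment of a spreadsheet grid", "spreadsheet")
vocab.add_synonym("biological cell", bio)
print(vocab.resolve("biological cell"))  # cell (biology)
```
&lt;/pre&gt;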
&lt;p&gt;A &lt;strong&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/taxonomy&quot;&gt;taxonomy&lt;/a&gt;&lt;/strong&gt; is a collection of
controlled vocabulary terms organized into a hierarchical structure. Each term
in a taxonomy is in one or more parent-child relationships to other terms in
the taxonomy. There may be different types of parent-child relationships in a
taxonomy (e.g., whole-part, genus-species, type-instance), but good practice
requires all the parent-child relationships under a single parent to be of the
same type. Some taxonomies allow poly-hierarchy, which means that a term can have
multiple parents. This means that if a term appears in multiple places in a
taxonomy, then it is the same term. Specifically, if a term has children in one
place in a taxonomy, then it has the same children in every other place where
it appears.&lt;/p&gt;
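&lt;p&gt;A minimal Python sketch shows why poly-hierarchy keeps a term's children identical wherever it appears. The term is stored once, and each parent-child relationship only points to it, so every path reaches the same node. The terms and the dictionary-based layout are illustrative assumptions.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Each term is stored once; a parent-child edge only references it.
children = defaultdict(set)
parents = defaultdict(set)


def add_edge(parent, child):
    """Record one parent-child relationship in both directions."""
    children[parent].add(child)
    parents[child].add(parent)


# Hypothetical taxonomy in which "kinase" has two parents (poly-hierarchy).
add_edge("enzyme", "kinase")
add_edge("signalling protein", "kinase")
add_edge("kinase", "tyrosine kinase")
add_edge("kinase", "serine/threonine kinase")


def paths_to(term):
    """Every root-to-term path, i.e. every place the term appears."""
    if not parents[term]:
        return [[term]]
    return [path + [term]
            for parent in sorted(parents[term])
            for path in paths_to(parent)]


for path in paths_to("kinase"):
    # Whichever path leads here, it is the same node with the same children.
    print(" > ".join(path), "->", sorted(children[path[-1]]))
```
&lt;/pre&gt;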
&lt;p&gt;A &lt;strong&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/thesaurus&quot;&gt;thesaurus&lt;/a&gt;&lt;/strong&gt; is a networked
collection of controlled vocabulary terms. This means that a thesaurus uses
associative relationships in addition to parent-child relationships. The
expressiveness of the associative relationships in a thesaurus varies and can be
as simple as “related to term” as in term A is related to term B.&lt;/p&gt;
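&lt;p&gt;The extra associative layer can be sketched as follows. The Thesaurus class is hypothetical. The BT (broader term), NT (narrower term) and RT (related term) labels follow common thesaurus practice, and the sample terms are made up. Hierarchical links are stored together with their inverses, while the associative link is symmetric.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus (hypothetical): hierarchical BT/NT plus associative RT links."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> broader terms (BT)
        self.narrower = defaultdict(set)  # term -> narrower terms (NT)
        self.related = defaultdict(set)   # term -> related terms (RT)

    def add_broader(self, term, broader_term):
        # Hierarchical link, stored in both directions with inverse labels.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a, b):
        # Associative link: "A is related to B" implies "B is related to A".
        self.related[a].add(b)
        self.related[b].add(a)

    def describe(self, term):
        return {
            "BT": sorted(self.broader[term]),
            "NT": sorted(self.narrower[term]),
            "RT": sorted(self.related[term]),
        }


t = Thesaurus()
t.add_broader("promoter", "regulatory sequence")
t.add_broader("enhancer", "regulatory sequence")
t.add_related("promoter", "transcription factor")
print(t.describe("promoter"))
# {'BT': ['regulatory sequence'], 'NT': [], 'RT': ['transcription factor']}
print(t.describe("transcription factor")["RT"])  # ['promoter']
```
&lt;/pre&gt;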
&lt;p&gt;People use the word ontology to mean different things, e.g. glossaries &amp;amp;
data dictionaries, thesauri &amp;amp; taxonomies, schemas &amp;amp; data models, and
formal ontologies &amp;amp; inference. A formal ontology is a controlled vocabulary
expressed in an ontology representation language. This language has a grammar
for using vocabulary terms to express something meaningful within a specified
domain of interest. The grammar contains formal constraints (e.g., specifies
what it means to be a well-formed statement, assertion, query, etc.) on how
terms in the ontology’s controlled vocabulary can be used together.&lt;/p&gt;
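&lt;p&gt;A deliberately tiny Python sketch makes the idea of a grammar more concrete. Here the only formal constraints are domain and range restrictions: each relation states which class of term may appear as its subject and which as its object. The classes, relations and the is_well_formed helper are hypothetical. Real ontology representation languages such as OWL are far more expressive.&lt;/p&gt;
&lt;pre&gt;
```python
# Controlled vocabulary: each (hypothetical) term is typed with a class.
TERMS = {
    "BRCA1": "Gene",
    "TP53": "Gene",
    "p53": "Protein",
    "breast cancer": "Disease",
}

# Grammar: relation -> (allowed subject class, allowed object class).
RELATIONS = {
    "encodes": ("Gene", "Protein"),
    "associated_with": ("Gene", "Disease"),
}


def is_well_formed(subject, relation, obj):
    """Check a (subject, relation, object) statement against the grammar."""
    if relation not in RELATIONS:
        return False  # relation is not part of the vocabulary
    if subject not in TERMS or obj not in TERMS:
        return False  # a term outside the controlled vocabulary
    domain, range_ = RELATIONS[relation]
    return TERMS[subject] == domain and TERMS[obj] == range_


print(is_well_formed("TP53", "encodes", "p53"))  # True
print(is_well_formed("p53", "encodes", "TP53"))  # False: classes swapped
print(is_well_formed("BRCA1", "associated_with", "breast cancer"))  # True
```
&lt;/pre&gt;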
&lt;p&gt;People make commitments to use a specific controlled vocabulary or ontology
for a domain of interest. Enforcement of an ontology’s grammar may be rigorous
or lax. Frequently, the grammar for a &amp;quot;light-weight&amp;quot; ontology is not completely
specified, i.e., it has implicit rules that are not explicitly documented.&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/meta-model&quot;&gt;meta-model&lt;/a&gt;&lt;/strong&gt; is an explicit
model of the constructs and rules needed to build specific models within a
domain of interest. A valid meta-model is an ontology, but not all ontologies
are modeled explicitly as meta-models. A meta-model can be viewed from three
different perspectives:&lt;/p&gt;
&lt;pre&gt;
  1. as a set of building blocks and rules used to build models
  2. as a model of a domain of interest, and
  3. as an instance of another model.
&lt;/pre&gt;
&lt;p&gt;When comparing meta-models to ontologies, we are talking about meta-models
as models (perspective 2).&lt;/p&gt;
&lt;p&gt;Note: Meta-modeling as a domain of interest can have its own ontology. For
example, the CDIF Family of Standards, which contains the CDIF Meta-meta-model
along with rules for modeling and extensibility and transfer format, is such an
ontology. When modelers use a modeling tool to construct models, they are
making a commitment to use the ontology implemented in the modeling tool. This
model making ontology is usually called a meta-model, with “model making” as
its domain of interest.&lt;/p&gt;
&lt;p&gt;Bottom line: Taxonomies and Thesauri may relate terms in a controlled
vocabulary via parent-child and associative relationships, but do not contain
explicit grammar rules to constrain how to use controlled vocabulary terms to
express (model) something meaningful within a domain of interest. A meta-model
is an ontology used by modelers. People make commitments to use a specific
controlled vocabulary or ontology for a domain of interest.&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/04/08/Why-do-we-need-a-controlled-vocabulary#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/04/08/Why-do-we-need-a-controlled-vocabulary#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/96215</wfw:commentRss>
      </item>
    
  <item>
    <title>Semantic Web: which tools for the Power? Part I</title>
    <link>http://blog.isavoir.com/post/2007/04/01/Semantic-Web%3A-which-tools-for-the-Power-Part-I</link>
    <guid isPermaLink="false">urn:md5:1d4e0a60f30c3f82e86cc7d827749ba9</guid>
    <pubDate>Sun, 01 Apr 2007 23:40:00 +0200</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Semantic</category>
        <category>FOAF</category><category>Semantic Web tools</category><category>SIOC</category>    
    <description>    &lt;p&gt;Well, well, well. One day, a friend of mine told me about Semantic Web tools
and technologies. It's all about how machines can figure out, interpret and use
natural-language content on the Web. There are lots of rather complex languages
for dealing with this information. You can have a look &lt;a href=&quot;http://en.wikipedia.org/wiki/Semantic_Web&quot; hreflang=&quot;en&quot;&gt;here&lt;/a&gt; to discover
the big cake! Still, I think the idea of having machines manage web content
could be sexy. When I read through that link, I thought: what kind of soup is
this?&lt;/p&gt;
&lt;p&gt;So I think the first step, in my case, should be to build massive, free,
open repositories of information in a collaborative way.&lt;/p&gt;
&lt;p&gt;My latest blog contact, &lt;a href=&quot;http://pbeltrao.blogspot.com/&quot;&gt;Pedro
Beltrao&lt;/a&gt; (thanks, Pedro), gave me an example of this kind of repository:
&lt;a href=&quot;http://www.freebase.com/&quot;&gt;freebase&lt;/a&gt;. I don't have an invitation
yet; I hope to keep you posted in the coming weeks.&lt;/p&gt;
&lt;p&gt;Back to my friend. In the end, I asked him where the frontier lies between
Semantic Web tools and Web 2.0 tools.&lt;/p&gt;
&lt;p&gt;We will discuss this point next week...&lt;/p&gt;
&lt;p&gt;Some new vocabulary you can try out on your girlfriend :) Good luck!&lt;/p&gt;
&lt;p&gt;- FOAF (&lt;a href=&quot;http://www.foaf-project.org/&quot;&gt;http://www.foaf-project.org/&lt;/a&gt;)&lt;br /&gt;
- SIOC (&lt;a href=&quot;http://sioc-project.org/&quot;&gt;http://sioc-project.org/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Christophe&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/04/01/Semantic-Web%3A-which-tools-for-the-Power-Part-I#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/04/01/Semantic-Web%3A-which-tools-for-the-Power-Part-I#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/93954</wfw:commentRss>
      </item>
    
  <item>
    <title>Bringing Semantic Web to real life?</title>
    <link>http://blog.isavoir.com/post/2007/03/25/Bringing-Semantic-Web-to-real-life</link>
    <guid isPermaLink="false">urn:md5:7bd0de9f0449df7543a21fe442619e0f</guid>
    <pubDate>Sun, 25 Mar 2007 23:24:00 +0200</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Semantic</category>
        <category>Web 2.0</category>    
    <description>    &lt;p&gt;Thanks to the cool apps with &lt;a href=&quot;http://blog.isavoir.com/tag/Web%202.0&quot;&gt;Web 2.0&lt;/a&gt; features
that promote people's thinking, I have come to believe there is a road to the
Semantic Web, aka Web 3.0. The potential applicability of the Semantic Web is
very broad.&lt;/p&gt;
&lt;p&gt;So, what's the Semantic Web? Wikipedia defines the &lt;a href=&quot;http://en.wikipedia.org/wiki/Semantic_web&quot; hreflang=&quot;en&quot;&gt;Semantic Web&lt;/a&gt; as a
project that intends to create a universal medium for information sharing by
putting documents with computer-processable meaning on the Web. The core
idea is to create metadata describing the data, which will enable
computers to process the meaning of things. Once computers are equipped with
semantics, they will be capable of solving complex semantic optimization
problems. In other words, computers will help us sift through information and
spare us time-consuming tasks.&lt;/p&gt;
&lt;p&gt;The Semantic Web is a personalized Web. So far, people have used mediation
tools such as annotations to turn information into flexible, open information.
They have also used persistent personal preferences, and even relevant data as
filters. That's actually the key. Other tools include Knowledge Discovery,
Knowledge Management and Service-Oriented Architectures (SOA). A further
challenge is to give all of us end-user apps that integrate, combine and infer
the information needed to assist people in performing tasks.&lt;/p&gt;
&lt;p&gt;Could the Semantic Web give the Power to the People? Not quite, but one
day...&lt;/p&gt;
&lt;p&gt;Christophe&lt;/p&gt;
&lt;p&gt;Next week: Semantic Web: which tools for the power?&lt;/p&gt;
&lt;p&gt;Further Information:&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;- &lt;a href=&quot;http://www.ryerson.ca/~dgrimsha/courses/cps720_02/resources/Scientific%20American%20The%20Semantic%20Web.htm&quot; hreflang=&quot;en&quot;&gt;The Semantic Web&lt;/a&gt; by Tim Berners-Lee, James Hendler and Ora Lassila.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;- &lt;a href=&quot;http://jena.sourceforge.net/&quot; hreflang=&quot;en&quot;&gt;Jena&lt;/a&gt;.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;- &lt;a href=&quot;http://www.joseki.org/&quot; hreflang=&quot;en&quot;&gt;Joseki&lt;/a&gt;.&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/03/25/Bringing-Semantic-Web-to-real-life#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/03/25/Bringing-Semantic-Web-to-real-life#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/91877</wfw:commentRss>
      </item>
    
  <item>
    <title>How Google Book Search can change Academic Science</title>
    <link>http://blog.isavoir.com/post/2007/03/23/How-Google-Book-Search-can-change-Academic-Science</link>
    <guid isPermaLink="false">urn:md5:f27622b95a31ca5fa550df4a95bcd7d9</guid>
    <pubDate>Fri, 23 Mar 2007 13:17:00 +0100</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Open Science</category>
        <category>Education</category><category>Scholarship</category>    
    <description>    &lt;p&gt;Following up on a post I picked up on &lt;a href=&quot;http://radar.oreilly.com/archives/2007/03/how_google_book.html&quot; hreflang=&quot;eng&quot;&gt;O'Reilly Radar&lt;/a&gt;, here is what I truly think is the future of scholarship and
education.&lt;/p&gt;
&lt;p&gt;Google Book Search has undoubtedly been useful, but Google doesn't seem to be
digitising classic books in law, medicine and the sciences, many of which are
famous and now in the public domain.&lt;/p&gt;
&lt;p&gt;Instead, the publishers who transcribed and published them have put them in
Google Book Search, where only a few pages or lines can be accessed.&lt;/p&gt;
&lt;p&gt;Books are horribly expensive, especially classic legal tomes. Their market is
smaller than that of computer books, so they are priced high to remain
economically viable for publishers.&lt;/p&gt;
&lt;p&gt;In the same post, a &lt;a href=&quot;http://landscape.blogspot.com/2007/03/how-google-books-is-changing-academic.html&quot; hreflang=&quot;eng&quot;&gt;Berkeley grad student disses the experience of the Berkeley
library system and lauds Google&lt;/a&gt;. Jo Guldi, the author of that blog entry,
wrote:&lt;/p&gt;
&lt;p&gt;&amp;quot;I was idly trying a search on 'roads' to see what sort of a literature would
turn up for the period of my dissertation research, 1740-1850.&lt;/p&gt;
&lt;p&gt;&amp;quot;I didn't expect much. I've spent the last two years wandering through the
Yale, Harvard, and California libraries, the British Library, Britain's
National Archives, and the immense reserves of North American Inter Library
Loan reading every book on London, pavement, or travel I could get my hands
on.&amp;quot;&lt;/p&gt;
&lt;p&gt;For academic historians, this is research turbocharged by online access to
full-text books.&lt;/p&gt;
&lt;p&gt;What about books in the sciences, law, medicine and technology, Mr Google?&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/03/23/How-Google-Book-Search-can-change-Academic-Science#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/03/23/How-Google-Book-Search-can-change-Academic-Science#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/90873</wfw:commentRss>
      </item>
    
  <item>
    <title>Excellent Overview of Benefits of RDF and SPARQL</title>
    <link>http://blog.isavoir.com/post/2007/03/19/Excellent-Overview-of-Benefits-of-RDF-and-SPARQL</link>
    <guid isPermaLink="false">urn:md5:db0d9de067b30ead87671333040ac998</guid>
    <pubDate>Mon, 19 Mar 2007 14:13:00 +0100</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Semantic</category>
        <category>RDF</category><category>Semantic Web</category><category>SPARQL</category>    
    <description>    &lt;p&gt;This article on &lt;a href=&quot;http://www.xml.com/pub/a/2007/03/14/a-relational-view-of-the-semantic-web.html?CMP=OTC-TY3388567169&amp;amp;ATT=A+Relational+View+of+the+Semantic+Web&quot; hreflang=&quot;eng&quot;&gt;XML.com&lt;/a&gt; is a very good summary of the benefits of &lt;a href=&quot;http://blog.isavoir.com/tag/RDF&quot;&gt;RDF&lt;/a&gt; and &lt;a href=&quot;http://blog.isavoir.com/tag/SPARQL&quot;&gt;SPARQL&lt;/a&gt; -- two of the key
technologies of the emerging &lt;a href=&quot;http://blog.isavoir.com/tag/Semantic%20Web&quot;&gt;Semantic
Web&lt;/a&gt;.&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/03/19/Excellent-Overview-of-Benefits-of-RDF-and-SPARQL#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/03/19/Excellent-Overview-of-Benefits-of-RDF-and-SPARQL#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/89593</wfw:commentRss>
      </item>
    
  <item>
    <title>Is Information Extraction from the scientific literature ready for Life Science?</title>
    <link>http://blog.isavoir.com/post/2007/03/19/Is-Information-Extraction-IE-from-the-scientific-litterature-ready-for-Life-science</link>
    <guid isPermaLink="false">urn:md5:c391034b37a9022837287421fb3c64d0</guid>
    <pubDate>Mon, 19 Mar 2007 13:30:00 +0100</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Text Mining</category>
        <category>information Extraction</category><category>Natural language processing</category><category>NLP</category><category>Ontology</category><category>Ontology-driven information extraction</category>    
    <description>&lt;p&gt;&amp;quot;For the average biologist, hands-on literature mining currently means a
keyword search in PubMed. However, methods for extracting biomedical facts from
the scientific literature have improved considerably, and the associated tools
will probably soon be used in many laboratories to automatically annotate and
analyse the growing number of system-wide experimental data sets.&amp;quot;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Extract from Nature Reviews Genetics: &amp;quot;Literature mining for the biologist:
from information retrieval to biological discovery&amp;quot; by Jensen, Saric and Bork, 2006.&lt;/p&gt;
&lt;p&gt;Simply put, &lt;a href=&quot;http://blog.isavoir.com/tag/Information%20extraction&quot;&gt;Information
extraction&lt;/a&gt; (IE) accomplishes these tasks:&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;* Take natural language text from a document source, and extract the
essential facts about one or more predefined fact types.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;* Represent each fact as a template whose slots are filled on the basis of
what is found in the text.&lt;br /&gt;&lt;/p&gt;
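&lt;p&gt;The two tasks above can be sketched with a single hand-written pattern. This illustration rests on strong assumptions and does not show how any particular IE system works. The fact type (a protein interaction), the regular expression, the naive sentence splitter and the slot names agent, action and target are all hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
import re
from dataclasses import dataclass


@dataclass
class InteractionFact:
    """Template for one (hypothetical) predefined fact type; each field is a slot."""
    agent: str
    action: str
    target: str
    sentence: str


# Hypothetical surface pattern for "X binds to Y", "X inhibits Y", "X activates Y".
PATTERN = re.compile(
    r"([A-Z][A-Za-z0-9-]+)\s+(binds to|inhibits|activates)\s+([A-Z][A-Za-z0-9-]+)"
)


def extract_facts(text):
    """Fill one template per pattern match, sentence by sentence."""
    facts = []
    # Naive sentence splitter: runs of text ending in . ! or ?
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]
    for sentence in sentences:
        for agent, action, target in PATTERN.findall(sentence):
            facts.append(InteractionFact(agent, action, target, sentence))
    return facts


text = "MDM2 binds to TP53. This complex is degraded. Nutlin inhibits MDM2."
for fact in extract_facts(text):
    print(fact.agent, "|", fact.action, "|", fact.target)
# MDM2 | binds to | TP53
# Nutlin | inhibits | MDM2
```
&lt;/pre&gt;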
&lt;p&gt;IE is typically carried out in support of other tasks and usually forms part
of an application or a pipeline of processes. The results of IE are stored in
databases for querying or data mining, integrated into knowledge bases to allow
reasoning, or presented to users for annotation or curation
tasks.&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;IE is thus an application of &lt;a href=&quot;http://blog.isavoir.com/tag/Natural%20language%20processing&quot;&gt;Natural language processing&lt;/a&gt;
(&lt;a href=&quot;http://blog.isavoir.com/tag/NLP&quot;&gt;NLP&lt;/a&gt;). As the term implies, the goal is to extract
information from text without requiring the end user to read the text. In
contrast, information retrieval (IR), as performed by a search engine, is the
activity of finding documents that answer an information need with the help
of an index.&lt;/p&gt;
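&lt;p&gt;The contrast with IR can be illustrated with a toy inverted index. A query returns the identifiers of matching documents, and the user still has to read them to get at the facts. This is a minimal sketch. The mini-corpus, the whitespace tokenization and the AND semantics of the search are assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical mini-corpus of abstracts keyed by document id.
DOCS = {
    "doc1": "MDM2 binds to TP53 and promotes its degradation",
    "doc2": "BRCA1 mutations are associated with breast cancer",
    "doc3": "TP53 is mutated in many cancers",
}


def build_index(docs):
    """Inverted index: lowercase token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(DOCS)
print(search(index, "TP53"))          # ['doc1', 'doc3']
print(search(index, "TP53 mutated"))  # ['doc3']
```
&lt;/pre&gt;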
&lt;p&gt;IE has dealt primarily with news sources and, more recently, with
scientific publications. In the sciences, a general-language grammar and
dictionary are not enough. Scientific fields use many technical terms, and only a
few of them are found in everyday language. To some extent, such terms can be
listed in auxiliary terminologies. However, automatic term recognition (ATR) is
also useful for IE, because it can extract named entities on the basis of their
internal structure.&lt;/p&gt;
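&lt;p&gt;As a rough illustration of recognising terms from their internal structure rather than from a dictionary, the Python sketch below flags two kinds of token. Tokens that mix capital letters and digits are treated as gene or protein symbols, and tokens ending in &amp;quot;-ase&amp;quot; as enzyme names. These cues and the small stop list are hypothetical and deliberately naive. Real ATR methods are considerably more sophisticated.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Hypothetical internal-structure cues (not taken from any published ATR system).
GENE_LIKE = re.compile(r"^(?=.*[A-Z])(?=.*\d)[A-Za-z0-9-]{2,10}$")  # e.g. TP53, IL-2
ENZYME_LIKE = re.compile(r"^[a-z]+ase$")  # e.g. kinase, polymerase
STOPLIST = {"base", "case", "phase", "release", "increase", "decrease"}


def candidate_terms(text):
    """Return (token, cue) pairs for tokens whose form suggests a technical term."""
    found = []
    for token in re.findall(r"[A-Za-z0-9-]+", text):
        if GENE_LIKE.match(token):
            found.append((token, "gene/protein symbol"))
        elif ENZYME_LIKE.match(token.lower()) and token.lower() not in STOPLIST:
            found.append((token, "enzyme name"))
    return found


print(candidate_terms("In this case, TP53 activates a kinase and IL-2 release increases."))
# [('TP53', 'gene/protein symbol'), ('kinase', 'enzyme name'), ('IL-2', 'gene/protein symbol')]
```
&lt;/pre&gt;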
&lt;p&gt;Whatever IE approaches were used in the past, scientific fields,
especially biology and biomedicine, are poorly served by IE systems that
do not make use of &lt;a href=&quot;http://blog.isavoir.com/tag/Ontology&quot;&gt;ontologies&lt;/a&gt; and linguistic
lexicons. The best example is &lt;a href=&quot;http://www.ims.uni-stuttgart.de/projekte/GenIE/&quot; hreflang=&quot;eng&quot;&gt;GenIE (&amp;quot;Genome
Information Extraction&amp;quot;)&lt;/a&gt; from the institute for computational linguistics at
the University of Stuttgart. They use ontology-driven information extraction
technologies that go beyond extracting simple facts from sentences. Their aim
is to deal with anaphoric references, so that information from individual
sentences can be merged or a relation established between events.&lt;/p&gt;
&lt;p&gt;For instance, suppose one sentence explicitly refers to a binding action,
and the following sentence points to the regulation of gene expression due to
the interaction between binding factors and promoter sequences. The
dependency between the two events should then be captured.&lt;/p&gt;
&lt;p&gt;A must-read: &lt;a href=&quot;http://www.nature.com/nrg/journal/v7/n2/abs/nrg1768.html;jsessionid=C3EA31280579A569ED0ED327B540FA2F&quot; hreflang=&quot;eng&quot;&gt;&amp;quot;Literature mining for the biologist: from information
retrieval to biological discovery&amp;quot; by Jensen, Saric and Bork, Nature Reviews Genetics,
2006.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;DNA MANIA&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/03/19/Is-Information-Extraction-IE-from-the-scientific-litterature-ready-for-Life-science#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/03/19/Is-Information-Extraction-IE-from-the-scientific-litterature-ready-for-Life-science#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/89581</wfw:commentRss>
      </item>
    
  <item>
    <title>Open Text Mining Interface</title>
    <link>http://blog.isavoir.com/post/2007/03/11/Open-Text-Mining-Interface</link>
    <guid isPermaLink="false">urn:md5:0e6f4cd04f00c7a2e35ae08f5a25d884</guid>
    <pubDate>Sun, 11 Mar 2007 15:04:00 +0100</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>Text Mining</category>
        <category>Open Science</category><category>Open Text Mining Interface</category><category>OTMI</category><category>RSS feeds</category><category>Text Mining</category>    
    <description>&lt;p&gt;&lt;img src=&quot;http://blog.isavoir.com/public/otmi.gif&quot; alt=&quot;OTMI&quot; style=&quot;float:left; margin: 0 1em 1em 0;&quot; /&gt; Nature might not quite be in the Open
Publishing business like PLoS, but it is an important player nevertheless. I
hope the OTMI gets picked up by other publications. It would be nice to have a
publication data standard, and as one of the top two scientific journals, Nature
has the clout to make this happen. Being able to mine journals and search for
information is invaluable (open or otherwise), and using standard formats like
OPML is an excellent idea.&lt;/p&gt;
&lt;h3&gt;Being able to share&lt;/h3&gt;
&lt;p&gt;The Open Text Mining Interface (OTMI) is an initiative from Nature
Publishing Group (NPG). It aims to enable scholarly publishers, among others,
to disclose their full text for indexing and text-mining purposes but without
giving it away in a form that is readily human-readable. Here is their &lt;a href=&quot;http://blog.isavoir.com/post/2007/03/11/&quot; hreflang=&quot;en&quot;&gt;wiki page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Open Text Mining Interface provides for a range of structured disclosure
options, from word vectors (list of word occurrences with frequency counts) and
the presentation of text 'snippets' out of narrative order, to the presentation
of full text in &amp;quot;raw&amp;quot; or &amp;quot;reduced&amp;quot; form.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/Open%20Text%20Mining%20Interface&quot;&gt;Open Text Mining
Interface&lt;/a&gt;&lt;/strong&gt; (OTMI) aims to bring &lt;strong&gt;&lt;a href=&quot;http://blog.isavoir.com/tag/text%20mining&quot;&gt;text mining&lt;/a&gt;&lt;/strong&gt; capabilities to scientific
publications, something researchers have awaited for decades. The initial demo on Nature's
&lt;a href=&quot;http://blogs.nature.com/wp/nascent/&quot; hreflang=&quot;fr&quot;&gt;Nascent&lt;/a&gt; blog uses the &lt;a href=&quot;http://www.nature.com/nature/journal/v440/n7083/index.html&quot; hreflang=&quot;fr&quot;&gt;23
March issue of Nature&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Embedded in the HTML of the abstract and full-text file for each article is
a tag like this:&lt;/p&gt;
&lt;pre&gt;
   &amp;lt;link rel=&amp;quot;OTMI&amp;quot; type=&amp;quot;application/atom+xml&amp;quot; href=&amp;quot;../otmi/otmi-nature04614.xml&amp;quot;/&amp;gt;
&lt;/pre&gt;
&lt;p&gt;which points to an &lt;a href=&quot;http://www.nature.com/nature/journal/v440/n7083/otmi/otmi-nature04614.xml&quot; hreflang=&quot;fr&quot;&gt;OTMI file&lt;/a&gt;: a machine-readable representation of the text.
(Technically, it's an Atom Entry document with various XML namespace extensions
that allow additional information to be included.) As I write this, the example
files for the test issue contain the following information:&lt;/p&gt;
&lt;pre&gt;
   * Bibliographic details (of the kind you might also find in the table of contents &lt;a href=&quot;http://blog.isavoir.com/tag/RSS%20feeds&quot;&gt;RSS feeds&lt;/a&gt;)
   * Word vectors. That is, a list of all the words that appear in the article and the number of occurrences. (There's also a stop-word list of very common words that have been excluded.) This enables the construction of the most basic types of search index.
   * 'Snippets'. Basically sentences, presented out of order, which allows more sophisticated indexing and text mining (e.g., the kind that looks out for common constructions such as &amp;quot;A binds to B&amp;quot; or &amp;quot;X inhibits Y&amp;quot;), but not, of course, anything that looks across sentence boundaries.
&lt;/pre&gt;
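&lt;p&gt;The word-vector and snippet disclosures listed above can be approximated in a few lines of Python. This sketch rests on stated assumptions and is not the OTMI algorithm. The word and sentence regular expressions, the tiny stop-word list and the fixed shuffle seed are hypothetical stand-ins for whatever an actual OTMI file declares.&lt;/p&gt;
&lt;pre&gt;
```python
import random
import re
from collections import Counter

# Hypothetical tokenizer definitions; a real OTMI file declares its own regexes.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]")
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}  # illustrative only


def word_vector(text):
    """Word -> occurrence count with stop words removed (input for a basic index)."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return Counter(w for w in words if w not in STOP_WORDS)


def snippets(text, seed=0):
    """Sentences shuffled out of narrative order (seeded, so the order is repeatable)."""
    sentences = [s.strip() for s in SENTENCE_RE.findall(text)]
    random.Random(seed).shuffle(sentences)
    return sentences


article = ("MDM2 binds to TP53. The complex is degraded. "
           "Nutlin inhibits MDM2 and stabilises TP53.")
print(word_vector(article).most_common(2))  # [('mdm2', 2), ('tp53', 2)]
for snippet in snippets(article):
    print(snippet)
```
&lt;/pre&gt;
&lt;p&gt;A fixed seed keeps the snippet order reproducible while still breaking the narrative order. A publisher might prefer a secret or per-article seed, since anyone who knows the seed could restore the original order.&lt;/p&gt;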
&lt;p&gt;Note that for both words and sentences (actually quite hard concepts to
define in strict computational terms), the algorithms used to tokenize the text
are defined in the OTMI file using regular expressions. Anyone, or anything,
examining the file can therefore in principle know exactly how the text was
processed to create the respective lists. Note also that the word vectors will
usually be redundant if you have the sentences, but both are included for the
purposes of this demo (and who knows, maybe some people will find it useful to
have both).&lt;/p&gt;
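&lt;p&gt;To consume these files, a harvester first has to find them through the link tag embedded in each article page, shown earlier. A minimal sketch using only the Python standard library (html.parser and urllib.parse.urljoin) follows. Matching the rel value case-insensitively and the names OTMILinkFinder and find_otmi_files are assumptions of this example.&lt;/p&gt;
&lt;pre&gt;
```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OTMILinkFinder(HTMLParser):
    """Collect href values of link elements whose rel includes "OTMI" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as link/ also arrive here via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "otmi" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_otmi_files(html_text, page_url):
    """Return absolute URLs of OTMI files referenced from an HTML page."""
    finder = OTMILinkFinder()
    finder.feed(html_text)
    finder.close()
    # Relative hrefs such as ../otmi/... are resolved against the page URL.
    return [urljoin(page_url, href) for href in finder.hrefs]
```
&lt;/pre&gt;
&lt;p&gt;Suppose the tag shown earlier sits on a page at http://www.nature.com/nature/journal/v440/n7083/abs/nature04614.html (an assumed location). The relative href then resolves to http://www.nature.com/nature/journal/v440/n7083/otmi/otmi-nature04614.xml, the OTMI file linked above.&lt;/p&gt;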
&lt;p&gt;There are still a lot of things that could be improved here. For
example:&lt;/p&gt;
&lt;pre&gt;
  1. Allow for text from different sections of an article (e.g., abstract, figure legends) to be labelled as such.
  2. Allow for text to be presented in normal human-readable form for publishers who are willing to provide this.
  3. Add a list of cited articles, providing at least DOIs but perhaps other information too. This would, of course, open up the content to citation analysis.
  4. Add references to the &lt;a href=&quot;http://blog.isavoir.com/tag/OTMI&quot;&gt;OTMI&lt;/a&gt; files from the corresponding RSS feed items (and from the log-in page where content is access-controlled).
  5. Add references to a common stop-word list instead of repeating it in each OTMI file.
  6. Add rights information.
  7. Add references to associated data files and/or database entries.
  8. Provide an actual spec. ;)
&lt;/pre&gt;
&lt;p&gt;They intend to make at least some of these changes (and perhaps others
besides) over the coming weeks, so expect the example files to change before
your eyes. There's also an even more basic issue around whether an Atom entry
document is the right starting point. For example, perhaps an RDF/XML format
would be more useful, at least to some people.&lt;/p&gt;
&lt;p&gt;The example of RSS shows how powerful a relatively simple common standard
can be when it comes to aggregating content from multiple sources (even when
it's messed up as badly as RSS ;). So maybe an approach like OTMI (or a better
one dreamt up by someone else) can help those who want to index and text-mine
scientific and other content. Like RSS, I think publishers might also come to
see this as a kind of advert for their content because it should help
interested readers to discover it. And on the basis that something is always
better than nothing, it also doesn't force publishers to give away the
human-readable form of their content. They can limit themselves to snippets or
even just word vectors if they want to.&lt;/p&gt;
&lt;p&gt;Frederic&lt;/p&gt;</description>
    
    
    
          <comments>http://blog.isavoir.com/post/2007/03/11/Open-Text-Mining-Interface#comment-form</comments>
      <wfw:comment>http://blog.isavoir.com/post/2007/03/11/Open-Text-Mining-Interface#comment-form</wfw:comment>
      <wfw:commentRss>http://blog.isavoir.com/feed/rss2/comments/87151</wfw:commentRss>
      </item>
    
  <item>
    <title>Welcome, DNA maniacs!</title>
    <link>http://blog.isavoir.com/post/2007/03/11/first</link>
    <guid isPermaLink="false">urn:md5:795b74b2e69baa9ed6e3f33ae7ad506e</guid>
    <pubDate>Sun, 11 Mar 2007 13:14:00 +0000</pubDate>
    <dc:creator>Frédéric</dc:creator>
        <category>News</category>
        <category>Bioinformatic</category><category>Molecular Biology</category><category>Semantic</category><category>Semantic Web</category><category>Text Mining</category><category>Web</category>    
    <description>    This is our blog, mainly about our interests in &lt;a href=&quot;http://blog.isavoir.com/tag/Bioinformatic&quot;&gt;Bioinformatics&lt;/a&gt;, &lt;a href=&quot;http://blog.isavoir.com/tag/Molecular%20Biology&quot;&gt;Molecular Biology&lt;/a&gt;, &lt;a href=&quot;http://blog.isavoir.com/tag/Text%20Mining&quot;&gt;Text Mining&lt;/a&gt; and the &lt;a href=&quot;http://blog.isavoir.com/tag/Semantic&quot;&gt;Semantic&lt;/a&gt;
&lt;a href=&quot;http://blog.isavoir.com/tag/Web&quot;&gt;Web&lt;/a&gt;.&lt;br /&gt;
We hope you will find interesting things on this blog.&lt;br /&gt;
&lt;br /&gt;
Our blog's name, &amp;quot;DNA Mania&amp;quot;, was taken from a humorous post on &lt;a hreflang=&quot;eng&quot; href=&quot;http://www.nodalpoint.org/blog/duncan&quot;&gt;Duncan's Blog&lt;/a&gt;, which we
loved to read. In this post, he pointed to some amazing books. I
enjoyed &lt;span class=&quot;sans&quot;&gt;&lt;a hreflang=&quot;eng&quot; href=&quot;http://www.amazon.co.uk/exec/obidos/ASIN/0199295735&quot;&gt;The Music of Life:
Biology Beyond the Genome&lt;/a&gt;&lt;/span&gt; &lt;span class=&quot;sans&quot;&gt;by Denis Noble&lt;/span&gt;
&lt;strong style=&quot;font-weight: normal;&quot; class=&quot;sans&quot;&gt;(2006) and&lt;/strong&gt;
&lt;span class=&quot;sans&quot;&gt;realized we were DNA maniacs :-)&lt;/span&gt;
&lt;p&gt;Frederic &amp;amp; Christophe&lt;/p&gt;</description>
    
    
    
      </item>
    
</channel>
</rss>