|
Scanned cover pages |
|
Editorial
(p. 3),
Alexander Gelbukh |
|
SPECIAL ISSUE:
NATURAL LANGUAGE PROCESSING AND KNOWLEDGE
MANAGEMENT
Guest
Editor: Alexander Gelbukh |
| 1. |
Axel-Cyrille Ngonga Ngomo and Frank Schumacher
(Germany)
Disentangling
the Wikipedia Category Graph for Corpus
Extraction (pp. 5-10)
In several areas of research such as
knowledge management and natural language
processing, domain-specific corpora are required
for tasks such as terminology extraction and
ontology learning. The investigations presented
herein are based on the assumption that
Wikipedia can be used for the purpose of corpus
extraction. It presents the advantage of
possessing a semantic layer, which should ease
the extraction of domain-specific corpora. Yet,
as the Wikipedia category graph is scale-free,
it cannot be used as is for these purposes.
In this paper, we propose a novel approach to
graph clustering called BorderFlow, which we use
and evaluate on the Wikipedia category graph.
Additional possible applications of these
results in the area of information retrieval are
presented. |
| 2. |
In this paper, we propose semantic enterprise
search as a promising technical methodology for
improving accessibility to institutional
knowledge. We briefly discuss the nature of
knowledge and ignorance with respect to web-based
information retrieval before introducing our
particular view on semantic search as a tight
fusion of search engine and semantic web
technologies, based on semantic annotations and
the concept of intra-institutionwise distributed
extensibility while still maintaining free
keyword search functionality. Consequently, our
architecture implementation makes strong use of
the Aperture and Lucene software frameworks but
introduces the novel concept of "RDF documents".
Because our prototype system is not yet
complete, we are not able to provide performance
statistics but instead we present a concise
example scenario.
|
| 3. |
Sergey Yablonsky
(Russia)
Semantic Web Framework for
Development of Very Large Ontologies
(pp. 19-26)
This
paper deals with the development of the Semantic
Web framework for very large ontologies. The
Semantic Web is often associated with specific
XML-based standards for semantics, such as RDF
and OWL. Application of lexical ontologies such
as WordNet and others for different tasks on the
Semantic Web requires their representation in
RDF and/or OWL formats with the possibility of
different ontology mappings, semantic workflows,
services and other semantic technologies.
|
| 4. |
Saïd Radhouani, Claire-Lise Mottaz Jiang, and
Gilles Falquet
(Switzerland)
FlexIR: a
Domain-Specific Information Retrieval System
(pp. 27-31)
We present a precise search engine adapted to
professional environments which are
characterized by a domain (e.g. medicine, law,
sport, and so on). In our approach, each domain
has its own terminology (i.e. a set of terms
that denote its concepts: team, player, etc.)
and it is organized along dimensions, such as
person, location, etc. The dimensions, as
described below, are made of concepts and
semantic relationships that represent a
particular perspective or point of view on the
domain. We mainly use the notion of domain
dimension to: i) precisely index document
content, and ii) develop an interactive
interface which allows the user to precisely
describe his or her information need and
therefore precisely access the document
collection.
|
| 5. |
Jianshu Sun, Chong Long, Xiaoyan Zhu,
and Minlie Huang
(China)
Mining
Reviews for Product Comparison and
Recommendation
(pp. 33-40)
As the number of customer reviews on product
service websites grows rapidly, customers spend
considerable time selecting and comparing their
favorite products. Researchers are aware of this
problem, and many studies have investigated
mining opinions from online reviews.
Unfortunately, few previous works give
comparisons or recommendations among the
products. In this paper, we propose an automated
system to address this problem. We first build a
product feature sentiment database from the
reviews. Then we perform the comparison among
various products from both subjective and
objective perspectives on the feature level.
Finally, product recommendations can be
suggested according to the previous comparisons
and an evolution tree constructed from the
reviews. Experimental results demonstrate the
effectiveness of the proposed approach in mining
digital camera reviews. A demo system has been
put into practical use.
|
| 6. |
Cerstin Mahlow and Michael Piotrowski
(Switzerland)
SMM:
Detailed, Structured Morphological Analysis for
Spanish
(pp. 41-48)
We present a morphological analyzer for
Spanish called SMM. SMM is implemented in the
grammar development framework Malaga, which is
based on the formalism of Left-Associative
Grammar. We briefly present the Malaga
framework, describe the implementation decisions
for some interesting morphological phenomena of
Spanish, and report on the evaluation results
from the analysis of corpora. SMM was originally
only designed for analyzing word forms; in this
article we outline two approaches for using SMM
and the facilities provided by Malaga to also
generate verbal paradigms. SMM can also be
embedded into applications by making use of the
Malaga programming interface; we briefly discuss
some application scenarios.
|
| 7. |
Claudiu Mihăilă, Corina Forăscu, and Sabin C.
Buraga
(Romania)
CLAU – A
Service-Oriented System for Complex Language
Alignment: Architectural Aspects
(pp. 49-54)
In recent years,
parallel corpora have become an effective
framework to study how well the linguistic
phenomena and, more specifically, annotation
schemata can be applied when importing the
annotations from one language to the other(s).
In the case of automatic import, evaluation
and correction are best performed by
linguists using specific software. The paper
proposes CLAU – a service-oriented interactive
application allowing users to import, evaluate,
correct, and share XML-based annotations in
parallel texts. The design, general
architecture, and implementation are discussed.
Also, two use cases are presented: temporal
annotations in parallel texts and how CLAU
facilitates social Web interactions between
language scientists.
|
| 8. |
Kamlesh Dutta, Nupur Prakash, and Saroj Kaushik
(India)
Application of Pronominal Divergence and
Anaphora Resolution in English-Hindi Machine
Translation
(pp. 55-58)
So far
the majority of Machine Translation (MT)
research has focused on translation at the level
of individual sentences. For sentence-level
translation, Machine Translation has addressed
various divergence issues for a large variety of
languages; the issue of pronominal divergence
has been presented only recently. Since
the quality of translation
as required by users follows coherent
multi-sentence discourse structure in a specific
context, the pronominal divergence helps us in
understanding the nuances of translation arising
out of disparity in the languages.
Subsequently, using clues from this divergence,
the anaphora resolution system can find the
correct interpretation for the given pronominal
referents and other entities by resolving the
inter-sentential context. In the literature,
researchers have examined the issue and have
proposed ways for its classification and for the
resolution of anaphora. However, for Indic
languages, few studies are available. In
this paper, we discuss different aspects of
pronominal divergence that affect anaphora
resolution in English-Hindi Machine Translation
(EHMT). The study shall be helpful in developing
approaches that can explicitly use
inter-sentential information in order to resolve
specific types of ambiguity and which can
generate coherent multi-sentence discourse
structure in the target language to produce
higher-quality Machine Translation.
|
| 9. |
Hye-Jin Jeong and Yong-Sung Kim
(South Korea)
E-Learning
Content Design and Implementation based on
Learners’ Levels
(pp. 59-63)
Modern techniques of
content design should not depend on restrictions
of schedule and physical space. Still,
learning that depends on content provided
by a server is difficult to implement
effectively without taking learners’ levels
into consideration. Learning should fit the
learners’ abilities. In this study, we propose
methods for developing learning content that
fits individual levels. Evaluations for
individual levels are presented as the first
level and the second level. The first level
presents “evaluation learning” for each
paragraph of the learning, while at the second
level evaluations are carried out through
“Trying the following” and “Trying oneself”.
“Checking Test” as part of the “sum of learning”
is carried out during the first evaluation. Also
“Trying oneself” is carried out as commensurate
learning according to learners’ levels.
|
| 10. |
Pilar Manchón, Carmen del Solar, Gabriel Amores,
and Guillermo Pérez
(Spain)
Modeling Multimodal
Multitasking in a Smart House
(pp. 65-71)
This paper belongs to an ongoing series of
papers presented at different conferences
illustrating the results obtained from the
analysis of the MIMUS corpus. This corpus is the
result of a number of WoZ experiments conducted
at the University of Seville as part of the TALK
Project. The main objective of the MIMUS corpus
was to gather information about different users
and their performance, preferences and usage of
a multimodal multilingual natural dialogue
system in the Smart Home scenario. The focus
group is composed of wheelchair-bound users. In
previous papers, the corpus and all relevant
information related to it have been analyzed in
depth. In this paper, we will focus on
multimodal multitasking during the experiments,
that is, modeling how users may perform more
than one task in parallel. These results may
help us envision the importance of
discriminating complementary vs. independent
simultaneous events in multimodal systems. This
gains more relevance when we take into account
the likelihood of the co-occurrence of these
events, and the fact that humans tend to
multitask when they are sufficiently comfortable
with the tools they are handling.
|
|