|
Editorial
(p. 3), Grigori Sidorov |
|
SPECIAL SECTION: Natural Language Processing |
| 1. |
He Ruifang, Qin Bing, Liu Ting, Liu Yang, and Li
Sheng
(China)
Iterative
Feedback Based Manifold-Ranking for Update
Summary (pp. 5-13)
The update summary, as defined for the new
DUC2007 task, aims to capture evolving information on
a single topic over time. It delivers focused
information to a user who has already read a set
of older documents covering the same topic. This
paper presents a novel manifold-ranking framework
based on an iterative feedback mechanism for this
summary task. The topic set is extended using
the summaries of previous timeslices and the
first sentences of documents in the current
timeslice. The iterative feedback mechanism is
applied to model the dynamically evolving
characteristics and to represent the relay
propagation of information in temporally
evolving data. The modified manifold-ranking process
can also naturally make use of both the
relationships among all the sentences in the
documents and the relationships between the topic
and the sentences. The ranking score obtained for each
sentence in the manifold-ranking
process denotes the importance of the sentence
biased towards the topic, and a greedy
algorithm is then employed to rerank the sentences
in order to remove redundant information. The
summary is produced by choosing the sentences
with high ranking scores. Experiments on the dataset
of the DUC2007 update task demonstrate the
encouraging performance of the proposed
approach. |
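The ranking and reranking steps described in this abstract can be illustrated with a minimal sketch. It assumes the standard manifold-ranking iteration f <- alpha * S * f + (1 - alpha) * y over a cosine-similarity graph, plus a simple multiplicative redundancy penalty; the function names, the bag-of-words similarity, the value of alpha, and the penalty rule are all assumptions made for illustration, and the paper's iterative feedback across timeslices is not modeled here.

```python
import math

def bag(text):
    """Lower-cased bag-of-words counts for one sentence."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts

def cosine(a, b):
    """Cosine similarity between two bag-of-words dicts."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def manifold_rank(texts, seeds, alpha=0.6, iters=100):
    """Score every text by propagating prior scores from the seed (topic) items."""
    n = len(texts)
    bags = [bag(t) for t in texts]
    # Affinity matrix W with a zero diagonal.
    W = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2).
    d = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] and d[j] else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f, W

def greedy_select(scores, W, candidates, k):
    """Pick k items greedily, damping the scores of items similar to each pick."""
    remaining = {i: scores[i] for i in candidates}
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
        for j in remaining:
            remaining[j] *= 1.0 - W[best][j]  # assumed redundancy penalty
    return chosen

texts = [
    "flood relief efforts",                          # index 0: the topic
    "relief efforts continue after the flood",
    "relief efforts continue after the flood today",  # near-duplicate of index 1
    "stock markets closed higher",                   # off-topic
    "volunteers join flood relief",
]
scores, W = manifold_rank(texts, seeds={0})
summary = greedy_select(scores, W, candidates=[1, 2, 3, 4], k=2)
```

In this toy input the off-topic sentence shares no words with the rest, so it receives no propagated score, and the redundancy penalty keeps the two near-duplicate sentences from both entering the summary.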
| 2. |
The selection of words chosen for a query,
crucial for the quality of results obtained by
the query, can be substantially improved by
using various lexical resources. Thus, for
example, morphological dictionaries enable
morphological expansion of queries, which is
very important in highly inflective languages,
such as Serbian. This paper discusses issues
related to improving queries using a rule-based
procedure implemented in WS4LR, a
workstation for manipulating heterogeneous
lexical resources developed by the Human
Language Technology Group at the University of
Belgrade. The procedure is used for automatic
production of lemmas for a morphological
dictionary from a given list of compounds, and
its evaluation on several different sets of data
is given. Several examples illustrate how this
procedure can be used for improvement of queries
for web search engines. Results obtained for
these examples show that the number of documents
obtained through a query by using our approach
can be remarkably increased.
|
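The idea of morphological query expansion mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the toy dictionary, the `expand_query` helper, and the `OR`/`AND` query syntax are hypothetical and are not taken from WS4LR, whose rule-based inflection of compounds is not reproduced here.

```python
# Toy morphological dictionary: lemma -> inflected forms (illustrative entries only).
MORPH_DICT = {
    "kuća": ["kuća", "kuće", "kući", "kuću", "kućom"],
    "grad": ["grad", "grada", "gradu", "gradom", "gradovi"],
}

def expand_query(terms, dictionary=MORPH_DICT):
    """Replace each query term by an OR-group of its known inflected forms."""
    groups = []
    for term in terms:
        forms = dictionary.get(term, [term])  # unknown terms are kept unchanged
        quoted = " OR ".join(f'"{form}"' for form in forms)
        groups.append(f"({quoted})" if len(forms) > 1 else quoted)
    return " AND ".join(groups)

expanded = expand_query(["grad", "xyz"])
```

A search engine that matches exact word forms would then retrieve documents containing any listed form of the lemma, which is how expansion can increase the number of documents returned.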
| 3. |
Asif
Ekbal and Sivaji Bandyopadhyay
(India)
Web-Based
Bengali News Corpus for Lexicon Development and
POS Tagging (pp. 21-30)
Lexicon development and Part of Speech (POS)
tagging are very important for almost all
Natural Language Processing (NLP) applications.
The rapid development of these resources and
tools using machine learning techniques for less
computerized languages requires an appropriately
tagged corpus. We have used a Bengali news
corpus, developed from the web archive of a
widely read Bengali newspaper. The corpus
contains approximately 34 million wordforms.
This corpus is used for lexicon development
without employing extensive knowledge of the
language. We have developed POS taggers
using a Hidden Markov Model (HMM) and a Support
Vector Machine (SVM). The lexicon contains
around 0.128 million entries, and a manual check
yields an accuracy of 79.6%. Initially, the POS
taggers were developed for Bengali and
showed accuracies of 85.56% and 91.23% for
HMM and SVM, respectively. Based on the Bengali
news corpus, we identify various word-level
orthographic features to use in the POS taggers.
The lexicon and a Named Entity Recognition (NER)
system, developed using this corpus, are also
used in POS tagging. The POS taggers are then
evaluated with Hindi and Telugu data. Evaluation
results demonstrate that SVM performs
better than HMM for all three Indian
languages.
|
| 4. |
Maher Daoud and Christian Boitet
(France)
Methods for
Handling Spontaneous E-commerce Arabic SMS:
CATS, an Operational Proof of Concept (pp.
31-41)
The purpose of this paper is to show that it is
necessary and possible to build (multilingual)
NL-based e-commerce systems with mixed
sublanguage and content-oriented methods. The
analysis of the sublanguage and the integration
of content-oriented methods will definitely
increase the accuracy and robustness of the
processing. To verify this assumption, we built
an experimental system as a proof of concept.
The system is an SMS-based classified ads selling
and buying platform. To analyze the sublanguage,
we first used a web-based corpus to build the
basic system. A content representation language
is defined to capture the meaning of a
classified ad post. The semantic grammars for
content extraction are coded using EnCo.
Response generation is based on semantic
matching (of “looking for” and “sell” posts) and
reasoning, and is able to handle “no answer”
situations. CATS is currently deployed in
Jordan by Fastlink (the largest mobile
operator). Testing the content extraction
component on real, noisy free text shows a
90% F-measure.
|
| 5. |
Vimal Mishra and R. B. Mishra
(India)
Study of
Example Based English to Sanskrit Machine
Translation (pp. 43-54)
Example-based machine translation (EBMT) has
emerged as one of the most versatile,
computationally simple and accurate approaches
to machine translation in comparison to rule-based
machine translation (RBMT) and statistics-based
machine translation (SBMT). In this paper,
a comparative view of EBMT and RBMT is presented
on the basis of some specific features. This
paper describes the various research efforts on
example-based machine translation and shows the
various approaches and problems of EBMT. Salient
features of Sanskrit grammar and the comparative
view of Sanskrit and English are presented. The
basic objective of this paper is to show, with
illustrative examples, the divergence between
the Sanskrit and English languages, which can be
considered as representing the divergences
between the order-free and SVO
(Subject-Verb-Object) classes of languages.
Another aspect is to illustrate the different
types of adaptation mechanisms. |
|
REGULAR PAPERS |
| 6. |
Magdalena Marciano Melchor, María Aurora Molina
Vilchis, Juan Carlos Herrera Lozada
(Mexico)
Aberración Óptica
(pp. 55-56)
El estudio de las aberraciones ópticas radica en
la evaluación de las imágenes que produce un
sistema óptico. Este fenómeno se
debe a la geometría del sistema. En este
artículo se tiene la finalidad de presentar en
forma aproximada las ecuaciones analíticas que
describen a un frente de onda esférico afectado
por aberración “coma” en un sistema óptico con
simetría.
Optic Aberration
The study of optical
aberrations is concerned with evaluating the
images produced by an optical system. This
phenomenon is due to the geometry of the
system. In this paper, we present approximate
analytical equations that describe a spherical
wavefront affected by coma aberration in
an optical system with symmetry. |
| 7. |
Yi Wang
(U.K.)
Applying
Dynamic Causal Mining in Retailing (pp.
57-63)
With the fast development of information
technology, retailers are suffering from the
excess of information. Too much information can
be a problem. However, more information creates
more opportunity. In retailing, information is
the key to maximizing revenue. It is now
hard to make timely or effective decisions and
to deliver the right content to the right place, at the
right time and in the right form. This paper is
about managing the information so that the user
can gain clearer insight. It is about
integrating and inventing methods and
techniques. The Semantic Web will provide a
foundation for such a solution. However,
semantics only provide a way of mapping
web content to user-defined annotations.
Not many companies have fully utilized the power
of Internet retailing, because various
technical obstacles have yet to be overcome.
Existing research in e-retailing focuses only on
traditional retailing, including direct and
indirect retailing approaches. This paper
suggests that applying association mining
techniques can further improve the handling of
information overload in a web-oriented retailing
environment. |
| 8. |
Israel Rivera Zarate, Patricia Pérez Romero,
Jesús Pimentel Cruz
(Mexico)
Base de
Conocimientos del Monitoreo de Parámetros
Sanguíneos
(pp. 65-70)
Se propone un sistema capaz de brindar un apoyo
al paciente diabético dado el gran
desconocimiento que la población tiene respecto
a esta enfermedad. La base de conocimientos se
ha tomado gracias a la asesoría de médicos y
laboratoristas clínicos. Esta primera versión del
sistema inteligente utiliza como motor de
inferencia lógica difusa dadas sus
características de manejo de incertidumbre. Este
proyecto permitirá llevar un registro preciso de
los niveles de diferentes parámetros sanguíneos
de un paciente así como generar representaciones
gráficas y estadísticas de control de forma que
permita apoyar en la prevención y toma de
decisiones oportunas de la diabetes.
Knowledge Base for Monitoring
of the Blood Parameters
We propose a system capable of supporting patients
with diabetes, given that the general
population knows little about
this disease. The knowledge base was developed
in cooperation with physicians and clinical
laboratory technicians. This first version of the
system uses a fuzzy logic inference engine and is
thus capable of managing uncertainty. The project
allows keeping precise records of the values of various
blood parameters and generating graphical
representations and control statistics, in order
to support prevention and timely decision making
for patients with diabetes.
|
| 9. |
Maria Botsivaly and Basile Spyropoulos
(Greece)
Supporting
the Continuity of Home Care and the
Bidirectional Exchange of Data among Various
Points of Care by Semantically Annotated Web
Services (pp. 71-78)
In this paper we report, first, the
conceptualization and initial design of a system
that creates a structured subset of data,
concerning the most relevant facts about a
patient’s healthcare, organized and
transportable, in order to be employed during
the post-discharge homecare period, enabling
simultaneously the planning and the optimal
documentation of the provided homecare. Second,
we present the actual development and
implementation of the system according to the
ASTM Continuity of Care Record (CCR)
Specification. Finally, we present the
implementation of a semantic-web-based system,
which aims to facilitate the exchange of
clinical information among various points of
care, and we also present a solution that
provides for the shared understanding of medical
data between diverse information systems and
overcomes the problems of both incompatible
message formats and the use of diverse
vocabularies. |
| 10. |
Mauricio Olguín Carbajal, Israel Rivera Zarate,
Oliver Pozas Quiteria
(Mexico)
Desarrollo de
un Sistema Inmersivo de Realidad Virtual basado
en Cabina Multipersonal y Camino sin Fin
(pp. 79-82)
El presente trabajo reporta los avances del
desarrollo de un sistema inmersivo de realidad
virtual que actualmente se está desarrollando en
el CIDETEC del IPN. El objetivo principal es
generar un sistema de realidad virtual para el
desarrollo de proyectos de realidad virtual de
parte de estudiantes así como de profesores e
investigadores. También se tiene como objetivo
básico el que el CIDETEC pueda contar con un
área para la enseñanza de la realidad virtual en
un ambiente inmersivo.
Development of the System for
Immersing in Virtual Reality based on the
Endless Walking and Multipersonal Cabin
This document reports progress on the
development of an immersive virtual reality
system based on a multi-person cabin. The
project is currently under development at the
CIDETEC of the IPN. The main objective is to
build a virtual reality system for use in
projects by students, professors, and researchers.
Another basic goal of the project is to give
CIDETEC a facility for teaching virtual reality
in an immersive environment. |
| 11. |
Jesús Antonio Álvarez Cedillo, Klauss Michael
Lindig Bos, Gustavo Martínez Romero
(Mexico)
Implementación de Filtros
Digitales Tipo FIR
en FPGA
(pp. 83-87)
En este artículo se hace la
descripción del diseño de un filtro digital tipo
FIR con ocho bits de ancho de
datos. Este sistema ha sido implementado en un
FPGA (SPARTAN 3E de XILINX) y posee un software
que realiza el cálculo de los coeficientes del
filtro y la reconfiguración del hardware. Las
pruebas se realizaron usando el programa MATLAB
para verificar su funcionamiento.
Implementation of Digital Filters of FIR Type in
FPGA
This paper describes the
design of a digital FIR filter with an
eight-bit data width. The system was
implemented on an FPGA (SPARTAN 3E by XILINX) and
includes software that calculates the filter
coefficients and reconfigures the hardware. The
tests were conducted using MATLAB to verify
its operation.
|
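As a rough software illustration of what an eight-bit FIR filter computes, the sketch below designs low-pass coefficients, quantizes them to signed 8-bit values, and filters integer samples. Everything specific here is an assumption made for the example: the windowed-sinc (Hamming) design, the Q1.7 fixed-point format, the tap count, and the function names. The abstract does not describe the paper's actual coefficient calculation or hardware architecture.

```python
import math

def lowpass_coefficients(num_taps, cutoff):
    """Windowed-sinc low-pass taps; cutoff is a fraction of the sampling rate (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC

def quantize_q7(taps):
    """Round taps to signed 8-bit integers (Q1.7), clamped to [-128, 127]."""
    return [max(-128, min(127, round(t * 128))) for t in taps]

def fir_filter(samples, q_taps):
    """Direct-form FIR on signed 8-bit samples with a wide integer accumulator."""
    history = [0] * len(q_taps)
    out = []
    for sample in samples:
        history = [sample] + history[:-1]        # shift register of past inputs
        acc = sum(c * h for c, h in zip(q_taps, history))
        out.append(max(-128, min(127, acc >> 7)))  # rescale from Q7, saturate to 8 bits
    return out

q_taps = quantize_q7(lowpass_coefficients(num_taps=9, cutoff=0.1))
dc_response = fir_filter([100] * 20, q_taps)
```

With a constant input the output settles close to the input level, with a small offset caused by coefficient rounding; this is the kind of behavior a hardware implementation would be checked against in simulation.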
|