"POLIBITS"

Research journal
on
Computer science and computer engineering with applications

Issue 37 (January-June 2008)

Editorial (p. 3), Grigori Sidorov

SPECIAL SECTION: Natural Language Processing

1. He Ruifang, Qin Bing, Liu Ting, Liu Yang, and Li Sheng (China)

Iterative Feedback Based Manifold-Ranking for Update Summary (pp. 5-13)

The update summary, as defined for the new DUC2007 task, aims to capture evolving information about a single topic over time. It delivers focused information to a user who has already read a set of older documents covering the same topic. This paper presents a novel manifold-ranking framework based on an iterative feedback mechanism for this summarization task. The topic set is extended using the summaries of previous timeslices and the first sentences of the documents in the current timeslice. The iterative feedback mechanism models the dynamically evolving character of the data and represents the relay propagation of information in temporally evolving data. The modified manifold-ranking process also makes natural use of both the relationships among all the sentences in the documents and the relationships between the topic and the sentences. The ranking score that each sentence obtains in the manifold-ranking process denotes the importance of the sentence biased towards the topic; a greedy algorithm then reranks the sentences to remove redundant information. The summary is produced by choosing the sentences with the highest ranking scores. Experiments on the DUC2007 update task dataset demonstrate the encouraging performance of the proposed approach.
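As a rough illustration of the two steps named in the abstract above (score propagation over a sentence graph, then greedy reranking against redundancy), here is a minimal Python sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the damping value `alpha`, the penalty rule in `greedy_select`, and all function names are assumptions made for the example, and the iterative feedback across timeslices is not modelled.

```python
import math

def bag(text):
    """Bag-of-words term counts (an assumed, deliberately simple representation)."""
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0.0) + 1.0
    return counts

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def manifold_rank(texts, alpha=0.6, iters=50):
    """texts[0] is the topic; the rest are candidate sentences.
    Iterates f <- alpha * S f + (1 - alpha) * y, with y marking the topic."""
    vecs = [bag(t) for t in texts]
    n = len(vecs)
    W = [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2); isolated nodes get zero rows.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0] + [0.0] * (n - 1)
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f, S

def greedy_select(scores, S, k=2, omega=1.0):
    """Pick k sentences; after each pick, penalise sentences similar to it
    (an assumed redundancy penalty, one of several possible choices)."""
    scores = scores[:]
    remaining = set(range(1, len(scores)))  # index 0 is the topic itself
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: scores[i])
        chosen.append(best)
        remaining.discard(best)
        for j in remaining:
            scores[j] -= omega * S[j][best] * scores[best]
    return chosen

texts = [
    "flood relief update",                 # topic
    "flood relief teams arrived",
    "relief teams arrived at the flood",
    "stock markets closed higher",         # unrelated to the topic
]
scores, S = manifold_rank(texts)
print(greedy_select(scores, S, k=2))
```

Sentences that share no vocabulary with the rest of the graph receive no propagated score, which is how the topic bias shows up in this toy setting.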

2.

The selection of words chosen for a query, crucial for the quality of results obtained by the query, can be substantially improved by using various lexical resources. Thus, for example, morphological dictionaries enable morphological expansion of queries, which is very important in highly inflective languages, such as Serbian. This paper discusses issues related to improvement of queries using a rule based procedure implemented in WS4LR, a workstation for manipulating heterogeneous lexical resources developed by the Human Language Technology Group at the University of Belgrade. The procedure is used for automatic production of lemmas for a morphological dictionary from a given list of compounds, and its evaluation on several different sets of data is given. Several examples illustrate how this procedure can be used for improvement of queries for web search engines. Results obtained for these examples show that the number of documents obtained through a query by using our approach can be remarkably increased.

 

3. Asif Ekbal and Sivaji Bandyopadhyay (India)

Web-Based Bengali News Corpus for Lexicon Development and POS Tagging (pp. 21-30)

Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) applications. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. We have used a Bengali news corpus, developed from the web archive of a widely read Bengali newspaper. The corpus contains approximately 34 million wordforms. This corpus is used for lexicon development without employing extensive knowledge of the language. We have developed the POS taggers using a Hidden Markov Model (HMM) and a Support Vector Machine (SVM). The lexicon contains around 0.128 million entries, and a manual check yields an accuracy of 79.6%. Initially, the POS taggers were developed for Bengali and showed accuracies of 85.56% and 91.23% for HMM and SVM, respectively. Based on the Bengali news corpus, we identify various word-level orthographic features to use in the POS taggers. The lexicon and a Named Entity Recognition (NER) system, developed using this corpus, are also used in POS tagging. The POS taggers are then evaluated with Hindi and Telugu data. Evaluation results demonstrate that SVM performs better than HMM for all three Indian languages.

4. Maher Daoud and Christian Boitet (France)

Methods for Handling Spontaneous E-commerce Arabic SMS: CATS, an Operational Proof of Concept (pp. 31-41)

The purpose of this paper is to show that it is necessary and possible to build (multilingual) NL-based e-commerce systems with mixed sublanguage and content-oriented methods. The analysis of the sublanguage and the integration of content-oriented methods will definitely increase the accuracy and robustness of the processing. To verify this assumption, we built an experimental system as a proof of concept. The system is an SMS-based platform for selling and buying through classified ads. To analyze the sublanguage, we first used a web-based corpus to build the basic system. A content representation language is defined to capture the meaning of a classified ad post. The semantic grammars for content extraction are coded using EnCo. Response generation is based on semantic matching (of “looking for” and “sell” posts) and reasoning, and is able to handle “no answer” situations. CATS is currently deployed in Jordan by Fastlink (the largest mobile operator). Testing the content extraction component with real, noisy free texts shows a 90% F-measure.

 

5. Vimal Mishra and R. B. Mishra (India)

Study of Example Based English to Sanskrit Machine Translation (pp. 43-54)

Example based machine translation (EBMT) has emerged as one of the most versatile, computationally simple and accurate approaches to machine translation in comparison with rule based machine translation (RBMT) and statistical based machine translation (SBMT). In this paper, a comparative view of EBMT and RBMT is presented on the basis of some specific features. The paper describes the various research efforts on EBMT and shows its various approaches and problems. Salient features of Sanskrit grammar and a comparative view of Sanskrit and English are presented. The basic objective of this paper is to show, with illustrative examples, the divergence between Sanskrit and English, which can be considered as representing the divergences between the order-free and SVO (Subject-Verb-Object) classes of languages. Another objective is to illustrate the different types of adaptation mechanisms.

REGULAR PAPERS

6. Magdalena Marciano Melchor, María Aurora Molina Vilchis, Juan Carlos Herrera Lozada (Mexico)

Aberración Óptica (pp. 55-56)

El estudio de las aberraciones ópticas radica en la evaluación de las imágenes que produce un sistema óptico. Este fenómeno se debe a la geometría del sistema. En este artículo se tiene la finalidad de presentar en forma aproximada las ecuaciones analíticas que describen a un frente de onda esférico afectado por aberración “coma” en un sistema óptico con simetría.

Optic Aberration

The study of optical aberrations is based on evaluating the images produced by an optical system. This phenomenon is due to the geometry of the system. In this paper, we present approximate analytical equations that describe a spherical wavefront affected by “coma” aberration in an optical system with symmetry.

7. Yi Wang (U.K.)

Applying Dynamic Causal Mining in Retailing (pp. 57-63)

With the fast development of information technology, retailers are suffering from an excess of information. Too much information can be a problem. However, more information creates more opportunity. In retailing, information is the key to maximizing revenue. It is now hard to make timely or effective decisions and to deliver the right content to the right place, at the right time and in the right form. This paper is about managing information so that the user can gain clearer insight. It is about integrating and inventing methods and techniques. The Semantic Web will provide a foundation for such a solution. However, semantics only provide a way of mapping the content of a web to user-defined annotations. Not many companies have fully utilized the power of Internet retailing, because various technical obstacles have yet to be overcome. The existing research in e-retailing focuses only on traditional retailing, including direct and indirect retailing approaches. This paper suggests that applying association mining techniques can further improve the handling of information overload in a web-oriented retailing environment.

8. Israel Rivera Zarate, Patricia Pérez Romero, Jesús Pimentel Cruz (Mexico)

Base de Conocimientos del Monitoreo de Parámetros Sanguíneos (pp. 65-70)

Se propone un sistema capaz de brindar un apoyo al paciente diabético dado el gran desconocimiento que la población tiene respecto a esta enfermedad. La base de conocimientos se ha tomado gracias a la asesoría de médicos y laboratoristas clínicos. Esta primera versión del sistema inteligente utiliza como motor de inferencia lógica difusa dadas sus características de manejo de incertidumbre. Este proyecto permitirá llevar un registro preciso de los niveles de diferentes parámetros sanguíneos de un paciente así como generar representaciones gráficas y estadísticas de control de forma que permita apoyar en la prevención y toma de decisiones oportunas de la diabetes.

Knowledge Base for Monitoring of the Blood Parameters

We propose a system capable of helping patients with diabetes, given that the general population has little knowledge about this disease. The knowledge base was developed with the advice of physicians and clinical laboratory staff. This first version of the system uses a fuzzy logic inference engine and is thus capable of managing uncertainty. The project allows keeping precise records of the levels of various blood parameters of a patient, as well as generating graphical representations and control statistics, in order to support prevention and timely decision making in diabetes care.

 

9. Maria Botsivaly and Basile Spyropoulos (Greece)

Supporting the Continuity of Home Care and the Bidirectional Exchange of Data among Various Points of Care by Semantically Annotated Web Services (pp. 71-78)

In this paper we report, first, the conceptualization and initial design of a system that creates a structured, organized and transportable subset of data containing the most relevant facts about a patient’s healthcare. This subset is intended for use during the post-discharge homecare period, enabling both the planning and the optimal documentation of the homecare provided. Second, we present the actual development and implementation of the system according to the ASTM Continuity of Care Record (CCR) Specification. Finally, we present the implementation of a semantic-web-based system that aims to facilitate the exchange of clinical information among various points of care. We also present a solution that provides for a shared understanding of medical data between diverse information systems and overcomes the problems of both incompatible message formats and the use of diverse vocabularies.

10. Mauricio Olguín Carbajal, Israel Rivera Zarate, Oliver Pozas Quiteria (Mexico)

Desarrollo de un Sistema Inmersivo de Realidad Virtual basado en Cabina Multipersonal y Camino sin Fin (pp. 79-82)

El presente trabajo reporta los avances del desarrollo de un sistema inmersivo de realidad virtual que actualmente se está desarrollando en el CIDETEC del IPN. El objetivo principal es generar un sistema de realidad virtual para el desarrollo de proyectos de realidad virtual de parte de estudiantes así como de profesores e investigadores. También se tiene como objetivo básico el que el CIDETEC pueda contar con un área para la enseñanza de la realidad virtual en un ambiente inmersivo.

Development of the System for Immersing in Virtual Reality based on the Endless Walking and Multipersonal Cabin

The present document reports progress in the development of an immersive virtual reality system based on a multipersonal cabin. The project is currently under development at the CIDETEC of the IPN. The main objective is to build a virtual reality lab for use in projects by researchers and students at the IPN. Another basic goal of the project is to provide a platform for developing and teaching virtual reality applications.

11. Jesús Antonio Álvarez Cedillo, Klauss Michael Lindig Bos, Gustavo Martínez Romero (Mexico)

Implementación de Filtros Digitales Tipo FIR en FPGA (pp. 83-87)

En este artículo se hace la descripción del diseño de un filtro digital tipo FIR con ocho bits de ancho de datos. Este sistema ha sido implementado en un FPGA (SPARTAN 3E de XILINX) y posee un software que realiza el cálculo de los coeficientes del filtro y la reconfiguración del hardware. Las pruebas se realizaron usando el programa MATLAB para verificar su funcionamiento.

Implementation of Digital Filters of FIR Type in FPGA

This paper describes the design of a digital FIR filter with an eight-bit data width. The system was implemented in an FPGA (SPARTAN 3E by XILINX) and includes software that calculates the filter coefficients and reconfigures the hardware. The experiments were conducted using simulation in MATLAB to verify its operation.
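To make the idea of an FIR filter with eight-bit data concrete, the following Python sketch computes low-pass coefficients, quantizes them to signed eight-bit integers, and applies a direct-form FIR with integer arithmetic. It is only an illustration under stated assumptions, not the authors' design: the Hamming-windowed sinc method, the nine taps, the cutoff of 0.2, the Q7 fixed-point scaling, and every function name are choices made for the example, and Python stands in for the paper's FPGA and MATLAB tooling.

```python
import math

def lowpass_coeffs(num_taps, cutoff):
    """Hamming-windowed sinc low-pass coefficients.
    cutoff is a fraction of the sample rate (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]  # normalise to unity gain at DC

def quantize_q7(coeffs):
    """Scale by 2^7 and clamp to the signed 8-bit range (assumed Q7 format)."""
    return [max(-128, min(127, round(c * 128))) for c in coeffs]

def fir_filter(samples, q_coeffs):
    """Direct-form FIR on signed 8-bit samples with integer accumulation."""
    delay = [0] * len(q_coeffs)
    out = []
    for s in samples:
        delay = [s] + delay[:-1]                      # shift register
        acc = sum(c * d for c, d in zip(q_coeffs, delay))
        y = acc >> 7                                  # undo the Q7 scaling
        out.append(max(-128, min(127, y)))            # saturate to 8 bits
    return out

q = quantize_q7(lowpass_coeffs(9, 0.2))
constant = fir_filter([100] * 32, q)                  # DC input
alternating = fir_filter([100, -100] * 16, q)         # input at half the sample rate
print(q, constant[-1], alternating[-1])
```

A constant input should pass through almost unchanged once the delay line fills, while an input alternating at half the sample rate should be strongly attenuated; how closely this holds depends on the rounding error introduced by the eight-bit coefficients.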