An Integrated Data Model and Web Protocol for Arbitrarily Structured Information

Hdl Handle:
http://hdl.handle.net/11285/572643
Title:
An Integrated Data Model and Web Protocol for Arbitrarily Structured Information
Authors:
Álvarez Cavazos, Francisco
Issue Date:
01/12/2007
Abstract:
Within the Web's data ecosystem dwell applications that consume and produce information with varying degrees of structure, ranging from highly structured business data to the semistructured or unstructured data found in documents that contain a significant amount of text. Current database technology was not designed for the Web; consequently, database communication protocols, query models, and even data models are inadequate for the demands of "data everywhere." Thus, a technique to uniformly store, search, transport, and update the full variety of information in Web or intranet environments has yet to be designed. The Web context requires the data management community to address: (a) data modeling and basic querying that support multiple data models to accommodate many types of data sources, (b) powerful search mechanisms that accept keyword queries and select relevant structured sources that may answer them, and (c) the ability to combine answers from structured and unstructured data in a principled way. Accordingly, this dissertation constructively designs a technique to store, search, transport, and update unstructured and structured information in Web or intranet-based environments: the Relational-text (RELTEX) protocol. Central to the design of the protocol is an integrated model for structured and unstructured data and its associated declarative language interface, namely, the RELTEX model and calculus. The RELTEX model is constructively defined starting from the relational and information retrieval models and their associated retrieval strategies. The model's data items are tuples with structured "columns" and unstructured "fields"; each tuple may further carry idiosyncratic schema in the form of "extension fields," which are tuple-specific name/value pairs.
This flexibility allows the representation of fully unstructured information, fully structured information, and mixtures of the two, such as tables whose tuples have a varying number of fields over time. The RELTEX calculus extends tuple relational calculus with text fields, similarity matches, match ranking, and sort order. Building on the formally defined RELTEX data model and calculus, and based on the architecture of the Web, the RELTEX protocol is defined as a resource-centric protocol to describe and manipulate the data and schema of unstructured and structured data sources. An equivalence mapping between RELTEX and the relational and information retrieval models is provided. The mapping indicates a wide range of applicability for RELTEX and thus demonstrates the model's value. The RELTEX protocol, in turn, is distinguished from other techniques for data access and storage on the Web in that (a) it supports structured and unstructured data manipulation and retrieval, (b) it offers operations to describe and manipulate both the common and the idiosyncratic schema of data items, and (c) it directly federates data items to the Web over a compound key; these properties demonstrate its novelty and value. The feasibility of the RELTEX protocol, model, and calculus is demonstrated with a proof-of-concept implementation. Starting from a motivating scenario, the prototype is used to provide representative examples of data and schema operations. Having shown that the RELTEX protocol and model contribute to the data modeling and basic querying challenge posed by the Web, we expect this dissertation to benefit researchers and practitioners alike by offering a novel, valuable, effective, and feasible technique to store, search, transport, and update unstructured and structured information in the Web environment.
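The abstract describes RELTEX data items as tuples that combine structured columns, unstructured text fields, and tuple-specific extension fields; queries that mix predicates with similarity matches and ranking; and items exposed on the Web through a compound key. The following Python sketch illustrates those ideas only under stated assumptions: the names `DataItem`, `similarity`, `select`, and `resource_uri`, the term-overlap score (a toy stand-in for a real IR ranking function), and the URI layout are all hypothetical and are not the RELTEX model, calculus, or protocol as defined in the dissertation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DataItem:
    """Hypothetical RELTEX-style tuple; names are illustrative, not the thesis's."""
    key: tuple[str, ...]          # compound key, e.g. (source, relation, id)
    columns: dict[str, Any]       # structured values defined by the common schema
    fields: dict[str, str]        # unstructured text fields
    extensions: dict[str, str] = field(default_factory=dict)  # tuple-specific name/value pairs


def similarity(query: str, text: str) -> float:
    """Toy score: fraction of query terms present in the text.
    Stands in for a real IR ranking function (e.g. tf-idf or BM25)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def select(
    items: list[DataItem],
    predicate: Callable[[dict[str, Any]], bool],
    text_field: str,
    query: str,
) -> list[tuple[float, DataItem]]:
    """Hypothetical query: structured predicate AND text similarity, ranked by score."""
    ranked: list[tuple[float, DataItem]] = []
    for item in items:
        if not predicate(item.columns):
            continue  # structured condition on the columns fails
        # Look up the text in the item's fields, falling back to its extension fields.
        text = item.fields.get(text_field) or item.extensions.get(text_field, "")
        score = similarity(query, text)
        if score > 0:
            ranked.append((score, item))
    # Sort by descending score; ties are broken by compound key for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1].key))
    return ranked


def resource_uri(base: str, item: DataItem) -> str:
    """Hypothetical URI scheme exposing an item on the Web through its compound key."""
    return base.rstrip("/") + "/" + "/".join(item.key)


if __name__ == "__main__":
    tickets = [
        DataItem(("crm", "tickets", "1"), {"status": "open"},
                 {"body": "printer jams on every job"}),
        DataItem(("crm", "tickets", "2"), {"status": "open"},
                 {"body": "email client crashes"}, {"device": "laser printer"}),
        DataItem(("crm", "tickets", "3"), {"status": "closed"},
                 {"body": "printer jams after update"}),
    ]

    def is_open(cols: dict[str, Any]) -> bool:
        return cols.get("status") == "open"

    for score, item in select(tickets, is_open, "body", "printer jams"):
        print(score, resource_uri("http://example.org/reltex", item))
    # An extension field is queried the same way as a regular text field.
    print([i.key for _, i in select(tickets, is_open, "device", "printer")])
```

Keeping extension fields in a separate mapping mirrors the abstract's distinction between the common schema (columns and fields) and idiosyncratic, per-tuple schema, while still letting a query reach them by name.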
Keywords:
Web; Arbitrarily (Spanish: Arbitrariamente)
Degree Program:
Graduate Program in Mechatronics and Information Technologies
Advisors:
Dr. José I. Icaza Acereto
Committee Members:
Dr. Juan C. Lavariega Jarquín; Dr. David A. Garza Salazar; Dr. Lorena G. Gómez Martínez; Dr. Susan D. Urban
Degree Level:
Doctor of Philosophy in Information Technologies and Communications Major in Computer Science
School:
Escuela de Investigación Informática
Campus Program:
Campus Monterrey
Discipline:
Ingeniería y Ciencias Aplicadas / Engineering & Applied Sciences
Appears in Collections:
Ciencias Exactas (Exact Sciences)

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Dr. José I. Icaza Acereto | es
dc.contributor.author | Álvarez Cavazos, Francisco | es
dc.date.accessioned | 2015-08-17T11:37:14Z | en
dc.date.available | 2015-08-17T11:37:14Z | en
dc.date.issued | 01/12/2007 | -
dc.identifier.uri | http://hdl.handle.net/11285/572643 | en
dc.description.abstract | (identical to the Abstract above) | en
dc.language.iso | en | en
dc.rights | Open Access | en
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.title | An Integrated Data Model and Web Protocol for Arbitrarily Structured Information | en
dc.type | Tesis de Doctorado (Doctoral Thesis) | es
thesis.degree.grantor | Instituto Tecnológico y de Estudios Superiores de Monterrey | es
thesis.degree.level | Doctor of Philosophy in Information Technologies and Communications Major in Computer Science | en
dc.contributor.committeemember | Dr. Juan C. Lavariega Jarquín | es
dc.contributor.committeemember | Dr. David A. Garza Salazar | es
dc.contributor.committeemember | Dr. Lorena G. Gómez Martínez | es
dc.contributor.committeemember | Dr. Susan D. Urban | es
thesis.degree.discipline | Escuela de Investigación Informática | es
thesis.degree.name | Graduate Program in mechatronics and information technologies | en
dc.subject.keyword | Web | en
dc.subject.keyword | Arbitrariamente (Arbitrarily) | en
thesis.degree.program | Campus Monterrey | es
dc.subject.discipline | Ingeniería y Ciencias Aplicadas / Engineering & Applied Sciences | es
All Items in REPOSITORIO DEL TECNOLOGICO DE MONTERREY are protected by copyright, with all rights reserved, unless otherwise indicated.