Trust Management for Service-Oriented Environments
Zaki Malik • Athman Bouguettaya
Trust Management for Service-Oriented Environments
Zaki Malik
Department of Computer Science, Wayne State University
456 State Hall, 5143 Cass Avenue, Detroit, MI 48202, USA
[email protected]
Athman Bouguettaya
CSIRO ICT Center, Australian National University
Computer Sci. & Information Tech. Bldg., North Road, Canberra, ACT 2601, Australia
[email protected]
ISBN 978-1-4419-0309-9
e-ISBN 978-1-4419-0310-5
DOI 10.1007/978-1-4419-0310-5
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009934434

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To all my family and friends. Zaki Malik
To my parents. Athman Bouguettaya
Foreword
Web services and service-oriented environments are key enablers for the migration of entertainment, business, sociability, science, and health-care from the physical world to the virtual world. The result of this on-going migration is a new “place” very much different from the physical world, one where interconnected services interact with human users, sensors, and embedded devices. Yet for this vision to become reality, trust needs to be addressed, as members of the global e-society today regularly deal with the question of whether they can trust other, unknown parties. Trust is a vital component of Internet-based interactions and service-oriented environments, but all too often it is assumed to be an implicit property that exists in the background rather than an explicit property that is well-defined and quantifiable. While providing trust is challenging in existing computing systems, providing trust in service-oriented environments is much more complex due to the dynamic and adaptable nature of these environments, which are often large scale and span multiple domains. To date, the problem of trust for service-oriented environments has been largely unexplored. This book represents the first comprehensive coverage of the principles, methods, and systems for trust management and evaluation in service-oriented environments. The book provides the key relevant foundations by looking at regulations and common practices, which are the traditional solutions for establishing trust, and then discussing departures from these conventional solutions. The central part of the book is devoted to reputation-based approaches to trust. Reputation techniques are the most widely used techniques for assessing whether parties in a distributed open system can be trusted. The book provides a well-organized, systematic treatment of reputation, organizing the presentation according to the notion of a reputation life cycle. Based on this notion, the book discusses all aspects of reputation management, from the creation of reputation information to reputation assessment. The book also provides a comprehensive coverage of the robustness of the various reputation techniques and an assessment of their performance. Techniques and systems are described in detail and illustrated with extensive examples. This book is an invaluable reference for researchers in academia and industry who are interested in getting acquainted with the foundations of trust for
service-based environments and in exploring open research directions in this important area. Practitioners will find in the book information on recent trends in technology and tools for reputation, which will help them understand the potential practical applications of this technology.

West Lafayette, IN, USA
May 2009
Elisa Bertino
Preface
Service Computing holds the promise of transitioning the field of IT to levels of efficiency and globalization never reached before. The ideal space of operation is without any doubt the Web, where data, documents, and tools/applications are expected to be uniformly represented and managed through the prism of services. The objective is to be able to leverage the vast array of data and applications in the most cost-effective way while providing more opportunities for leveraging previous investments. The standardization efforts have played a key role leading to the early adoption of services as the paradigm of choice for IT investments. However, much remains to be done in terms of providing a robust and efficient infrastructure to manage the whole life-cycle of services. Unfortunately, this is mainly due to “over-standardization” efforts. The last few years have seen too many standards, in many cases competing, springing up from various standards bodies to deal with specific and sometimes narrow requirements of service management, organization, and use. This unfortunate situation has created confusion in the targeted markets and stifled innovation in what could be considered a potentially revolutionizing field. Additionally, this state of affairs has had the effect of hampering the wide and speedy deployment of service-oriented solutions. Service computing is undoubtedly the latest emerging paradigm in the computing “food chain”. Computing has gone through several key stages since the first modern computer was designed. Each computing period was governed by a paradigm. For instance, when meaning was added to the data paradigm, the information paradigm was born. When reasoning was added to information, this led to the transition to the knowledge paradigm. The argument is that we are now transitioning to a new paradigm, called “service”, that is the result of adding action to “knowledge”. This in effect means that services can be thought of as knowledge that is acted upon. It is worth pointing out that each paradigm was supported by a management system that was usually the result of intense research efforts. Our argument is that the time has come to build a foundation for next-generation service management systems using Web services as the technology that embodies the service paradigm. We propose laying the foundation of the Web Service Management System (WSMS). We identify four core components in a generic WSMS: service composition, service
optimization, service change, and service trust. This book provides a comprehensive analysis of service trust, focusing on reputation as a mechanism to implement trust among services. Deploying services on the Web that can be queried and composed requires a dynamic trust infrastructure that adapts to the Web's fluctuations and the inherent lack of any single authority. Reputation techniques are well suited to deal with the distributed and dynamic nature of the Web, coupled with the lack of any single authority for providing trust credentials and enforcement. This book covers all aspects of reputation, including issues related to reputation bootstrapping, credential gathering and computation, interaction models, composed reputation, and reputation propagation, among others. The book provides a holistic treatment of reputation management in service-oriented environments, including analytical models and experiments to assess the proposed solutions.

Detroit, MI, USA
Canberra, Australia
May 2009
Zaki Malik
Athman Bouguettaya
Acknowledgments
The two people who deserve the most thanks are my mother and father. I would like to thank them for everything they have provided. Their unwavering love and affection have been instrumental in all aspects of my life. I am also indebted to my wife for her support and encouragement. She has helped me stay focused through happy and hard times. I thank her for all her patience. I would like to thank my daughter, Ayra, for making my life more joyful. I would also like to thank my sister and her family for all their love. Special thanks go to my brother, Safi, for his continuous encouragement and support in all matters concerning me. He always pushed me to be the best and never hesitated to provide me with the necessary help. My everlasting thanks! Zaki Malik
I would like to acknowledge the support of my family during the preparation of this book: my wife, Malika, and my children, Zakaria, Ayoub, and Mohamed-Islam. I would also like to thank my employer, CSIRO (Australia), for providing me with the environment to successfully finish this work. Athman Bouguettaya
Contents
1 Introduction
   1.1 Trust Management
      1.1.1 Motivation
      1.1.2 Trust
      1.1.3 Traditional Solutions for Establishing Trust
   1.2 The Role of Reputation in Trust Management
   1.3 Example Scenario
   1.4 Book Organization

2 Service-Oriented Environments
   2.1 SOE Interaction Model
      2.1.1 Entities and Interactions
   2.2 Service Interactions: Extension through Ontologies
      2.2.1 Definition of Communities
      2.2.2 Operational Description of Communities
      2.2.3 Interactions

3 Reputation Information Creation
   3.1 Parameters Reflecting the Quality of Web Services
   3.2 Reputation Bootstrapping
      3.2.1 Option I: Adapting Initial Reputation to Majority Behavior
      3.2.2 Option II: Assigned Initial Reputation

4 Reputation Assessment
   4.1 Web Service Reputation
   4.2 Reputation Evaluation Metrics
      4.2.1 Credibility of Raters
      4.2.2 Evaluating Rater Credibility
      4.2.3 Personalized Preferences
      4.2.4 Temporal Sensitivity
      4.2.5 First-hand Knowledge
   4.3 Reputation Assessment with Scarce Ratings
   4.4 Reputation Assessment for Service Compositions
      4.4.1 Share the Fame / Share the Blame

5 Robustness and Performance Study
   5.1 Reputation Bootstrapping Evaluation
      5.1.1 Comparing Bootstrap Approaches
      5.1.2 Evaluating the Adaptive Approach
      5.1.3 Comparing the Reputation Assignment Approaches
   5.2 Rater Credibility Analysis
      5.2.1 Honest Raters in Majority
      5.2.2 Equal Number of Honest and Dishonest Raters
      5.2.3 Dishonest Raters in Majority
   5.3 RATEWeb Evaluation and Comparison
      5.3.1 Honest Raters in Majority
      5.3.2 Equal Number of Honest and Dishonest Raters
      5.3.3 Dishonest Raters in Majority
      5.3.4 Case of Optimistic Consumer
      5.3.5 Case of Pessimistic Consumer
      5.3.6 Measuring Transaction Success
      5.3.7 Changing Rate of Maliciousness
   5.4 Reputation Assessment with Scarce Ratings
      5.4.1 HMM vs ANN
      5.4.2 No Prediction or Personal Experience Input
      5.4.3 No Prediction but Including Personal Experience
   5.5 Composed Services Reputation Evaluation
      5.5.1 Fuzzy Rules for Composition Reputation
      5.5.2 Blame Propagation Effects
      5.5.3 Choosing a Blame Forwarding Strategy
      5.5.4 Impact of Orchestrator's Decision
      5.5.5 Blame Forwarding Impact in Relation to Maliciousness
   5.6 Reputation Assessment Cost Analysis
      5.6.1 Model Performance Comparison
      5.6.2 Performance Analysis

6 Related Work
   6.1 Reputation Systems for E-business
      6.1.1 Incentives-based Solutions
      6.1.2 Credibility-based Solutions
   6.2 Decentralized Reputation Management
      6.2.1 Peer-to-Peer Reputation Systems
      6.2.2 Reputation Management on the Grid
      6.2.3 Reputation Management for Multi-Agents Systems
   6.3 Reputation in Game Theory
   6.4 Reputation Management for the Web
      6.4.1 Search Engines
      6.4.2 Electronic Mail
   6.5 Reputation in Web Services

7 Conclusion

References
Index
Acronyms and Abbreviations
ANN Artificial Neural Network
BIC Bayesian Information Criterion
BPEL Business Process Execution Language
B2B Business-to-Business
B2C Business-to-Consumer
BSR Bootstrap Success Rate
BWA Baum-Welch Algorithm
CB Car Broker Web service
CD Car Dealer Web service
CH Credit History Web service
CORBA Common Object Request Broker Architecture
Cr(tx) Credibility of Service Rater tx
D Defective transaction
FI Financing Web service
HMM Hidden Markov Model
HTML HyperText Markup Language
HTTP HyperText Transfer Protocol
IN Insurance Web service
LC Lemon Check Web service
M Majority Rating
OpSign Operational Significance
OWL Web Ontology Language
ρ Pessimism Factor
P2P Peer-to-Peer
PerEval Personal Evaluation
PSL Privacy Significance Level
QoWS Quality of Web Service
ℜ Rate of Maliciousness
RATEWeb Reputation Assessment for Trust Establishment among Web services
RMS Reputation Management System
RSV Reputation Significance Vector
SOAP Simple Object Access Protocol
SOC Service-Oriented Computing
TP Trip Planner Enterprise
Uf Usefulness Factor
UDDI Universal Description Discovery and Integration
URI Uniform Resource Identifier
V Current Reported Rating
VE Virtual Enterprise
WSDL Web Service Definition Language
WSMS Web Service Management System
XML Extensible Markup Language
Chapter 1
Introduction
The Web was originally created to enable the sharing of information among scientists. Sir Tim Berners-Lee first proposed the idea of creating a “Web of information” when he suggested a solution to integrate three technologies: HTML (HyperText Markup Language), HTTP (HyperText Transfer Protocol), and Web browsers [131]. HTML was proposed as a language to write Web documents. HTTP was proposed as a protocol to transmit Web pages. Web browser software clients were introduced to receive, interpret, and display information. Initially, the Web was mainly perceived as a “worldwide bulletin board,” with data sharing and reuse as its prime objectives. Soon after, governments, businesses, and individual users took active interest in the Web by publishing and accessing information of interest. This resulted in a voluminous amount of data potentially accessible to millions of users. Consequently, effective techniques and tools were required to retrieve quality information from the Web. Over the years, numerous applications have been developed for efficient data access and reuse. These data-centric solutions were based on the premise of an “interactive” Web where users' involvement was necessary for virtually any Web-based transaction. In recent years, Web-related research has shifted from how to access such Web content to exploring new forms of what can be published and accessed on the Web. Application reusability is key to enabling this paradigm shift. The Web that had been essentially a repository of static content has started a steady evolution to become a “vibrant” environment where applications can be automatically invoked by other Web clients. These advances have brought revolutionary changes in building distributed applications [125] with the introduction of the Service-Oriented Computing (SOC) paradigm. The service-oriented Web represents an attractive paradigm for tomorrow's interactions spanning a wide range of domains from e-economy to e-science and e-government. For example, enterprises in the new service Web would no longer represent single monolithic organizations, but rather a loose coupling of smaller applications offered by autonomous providers. A key milestone in this regard has been the introduction of Web services. A Web service is an autonomous, platform-independent computational element that can be described, published, discovered, orchestrated, and programmed using standard
protocols for the purpose of building agile networks of collaborating applications distributed within and across organizational boundaries [106]. Web services have also been defined as:

• A Web service is a “stand-alone application accessible over the Web that can be automatically discovered and invoked by applications (and humans)” or a “loosely coupled application using open, cross-platform standards and which inter-operates across organizational and trust boundaries” [132].
• A Web service is a “business function made available via the Internet by a service provider, and accessible by clients that could be human users or software applications” [26].
• A Web service is a set of related functionalities that can be programmatically accessed and manipulated through the Web. More precisely, a Web service is a self-describing software application that can be advertised, located, and used across the Web using a set of standards such as WSDL, UDDI, and SOAP [106].

Web services are being adopted by businesses, government agencies, and scientific communities [71] [28]. Businesses are increasingly using Web services to automate their interactions both with their customers (B2C) and with each other (B2B). Web services enable businesses to outsource some of the required functionality to other businesses, resulting in a service-oriented enterprise [12] [6]. For example, B2B integration through service composition allows multiple services from different providers to be combined into a value-added composite service [99] [60]. In e-government, work on several research prototypes (e.g., our own WebDG, WebSenior [21, 98], ARGOS [50]) has shown the viability of the Web service approach in providing e-government services. Web services are also being introduced into several established and emerging research areas. Examples include mobile computing [148], grid computing [129] (e.g., the Open Grid Services Architecture (OGSA) [47]), and the statistical sciences (e.g., NISS WebSwap [118]).

A plethora of Web services competing in offering similar functionalities is expected on the new “service Web”. A key requirement then is to provide mechanisms for the quality access and retrieval of services. In essence, the major challenge lies in providing a trust framework for enabling the selection and composition of Web services based on trust parameters. The fundamental rationale behind the need for trust is the necessity to interact with unknown entities. The lack of a global monitoring system for the service-oriented environment exacerbates the problem of trust. By definition, Web services are autonomous (i.e., provided by independent service providers), highly volatile (i.e., of low reliability), and a priori unknown (i.e., new or with no prior history) [99] [106]. Web services have no particular behavior mandated on them and are highly volatile, as they are subject to frequent fluctuations during their lifetime (e.g., unavailability, changes in quality and performance). The inherent open and large-scale nature of Web services means that traditional security approaches alone cannot instill trust in service-oriented environments. There is a growing consensus that the Web service “revolution” will not eventuate until trust-related issues are resolved [18]. Several studies have shown that the lack of trust is the most important reason behind people's reluctance in
using the Web in risk-involving transactions. For many emerging applications, such as e-government, e-commerce, and e-science, Web services will have to automatically determine to which extent they may trust other services before they interact. Hence, resolving trust issues will be a determining factor in optimizing the selection of services. Similarly, trust would be a key criterion for ascertaining privacy preservation in Web service selection. Research results show that reliable reputation systems increase users' trust in the Web [34]. Many online businesses have recognized the importance of reputation systems in improving customers' trust and, consequently, stimulating sales [112]. Examples include eBay and Amazon. In fact, managing reputation has itself become a Web business. Examples of online companies whose prime vocation is reputation collection and dissemination include Bizrate, Epinions, and Citysearch [34]. Reputation systems are particularly vital in online marketplaces, also known as C2C e-auctions. Many of these marketplaces would probably not have survived without reputation systems [34]. Several studies have investigated and, generally, confirmed that, in these types of business environments, reputation systems benefit both sellers and buyers [69] [74]. Reputation is also important in achieving properties of Web-based interaction that traditional security mechanisms do not ensure. Security mechanisms, e.g., encryption, authorization, and authentication, enable a provably secure interaction between Web services and their service requesters. They, however, do not provide the type of functionality that reputation mechanisms provide. For example, authentication only ensures that a party x is actually the party it claims to be; it does not provide any indication or prediction of the quality of the outcome of interacting with that party. Also, authorization only ensures that a given party has the credentials to take a given action. It does not ensure that the party is actually the “best” one to take that action [124]. We anticipate that the deployment of reputation systems will have a significant impact on the growth of different emerging applications such as e-business, e-government, and e-science. Research on reputation is currently gaining momentum. Different aspects of reputation are being investigated in various disciplines including economics, computer science, marketing, law, sociology, and psychology [34, 140, 23, 112, 10, 116, 20, 76, 151, 145]. Despite the abundance of reputation-related literature, little research has focused on the reputation of Web services for establishing trust. An exception is the work in [92], where the authors present an interesting model in which agents disseminate reputations and endorsements for Web services through a specialized agency. Clearly, such a solution, which depends on a central authority, presents scalability and trust challenges. The reputation-based trust model in [95] requires human participation for rating Web services. Agents that report reputation ratings are also assumed to be trusted parties. Similarly, the common agencies to whom these ratings are communicated for sharing/aggregation are also expected to behave honestly. We address these issues in this book. For example, we calculate the reputation of a Web service based on the testimonies of both trusted and malicious raters.
Another feature that seems to be at least implicitly overlooked in the previous models, but that we plan to use in assessing reputation, is the incorporation of “local historical information” with the “global reputation view.”
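To make these two ideas concrete, the following is a minimal sketch of how a consumer-side assessment could discount the testimonies of low-credibility (potentially malicious) raters and blend the result with the consumer's own local history. The function name, the credibility weights, and the blending factor alpha are illustrative assumptions; the actual RATEWeb metrics are developed in Chapter 4.

```python
def aggregate_reputation(global_ratings, local_history, alpha=0.5):
    """Blend community testimonies with the consumer's own experience.

    global_ratings : list of (rating, credibility) pairs; each rating in
                     [0, 1] is one rater's testimony, and its credibility
                     in [0, 1] discounts potentially dishonest raters.
    local_history  : ratings in [0, 1] from the consumer's own past
                     interactions with the provider.
    alpha          : weight given to first-hand (local) knowledge.
    """
    # Credibility-weighted mean: a malicious rater with low credibility
    # contributes little to the global reputation view.
    total_weight = sum(cred for _, cred in global_ratings)
    global_view = (sum(r * cred for r, cred in global_ratings) / total_weight
                   if total_weight > 0 else 0.5)  # neutral default

    if not local_history:
        return global_view
    personal_eval = sum(local_history) / len(local_history)
    # Incorporate local historical information with the global view.
    return alpha * personal_eval + (1 - alpha) * global_view

# Two honest raters and one dishonest rater with low credibility.
rep = aggregate_reputation(
    global_ratings=[(0.9, 0.8), (0.8, 0.9), (0.1, 0.1)],
    local_history=[0.85, 0.9],
)
```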
In this book, the objective is to investigate approaches and techniques for reputation management on the service Web. Our research aims to provide a Reputation Management System (RMS) as an efficient trust framework for the trustworthy selection and composition of Web services. Our focus is a service-oriented environment where Web services can act as both consumers (i.e., requesters) and providers of services. The resulting architecture will reflect this mode of interaction. In this regard, we will construct a formal framework and evaluation methodology for reputation-based Web service interactions. In the following, we present a thorough problem discussion. We first discuss the significance of trust in online interactions. We then discuss the traditional solutions for establishing trust and show how these may not provide a comprehensive solution for the service Web. This is followed by a discussion on the role of reputation in trust management. An example scenario is presented towards the end for better understanding.
1.1 Trust Management

In this section, we present the motivation for our work. Specifically, we discuss the significance of establishing trust on the service Web. We also provide an insight into the traditional solutions for establishing trust and show how these solutions fall short of addressing the problem of trust on the service Web.
1.1.1 Motivation

The last decade has seen explosive growth in the number of “Web users.” The Web is slowly becoming the preferred medium for a myriad of activities. These include, but are not limited to: getting news, buying and selling goods and services (e.g., travel services offered by Expedia, Priceline, Orbitz, etc.), research, e-learning, e-governance, resource utilization, etc. Online communities and marketplaces have thus gained popularity, as they are not limited by geographical and spatial constraints. Moreover, these communities encourage interactions between complete strangers (often based on prior interaction history). To name a few, eBay, Yahoo Auctions, and the Amazon marketplace have fostered such interactions between unfamiliar participants. However, such interactions involve an element of risk: namely, fraudulent behavior. Both providers and consumers face this risk. Fraudulent behavior on the part of the provider includes inconsistent delivery of the goods or services promised, not delivering the promised commodity, or delivering a sub-standard item or service. Similarly, the consumer may engage in fraud by not paying, providing false details of the interaction to damage the provider, etc. The success of the current systems mentioned above (that support interactions between unknown participants) stems from the fact that these systems are centrally
managed. The central authority acts as an intermediary to facilitate the coordination, communication, and cooperation between the participants. However, online interactions are no longer limited to centralized systems. The advent of technology and the growth of the Web have resulted in highly decentralized, self-organizing, and autonomous systems. The “service Web” is expected to be decentralized, where a large spectrum of applications will become available in the form of “services.” Web services would often have to interact with other unknown entities on the service Web. Since the services would be mutually unfamiliar, the decision to engage in a transaction with another Web service (i.e., the process of service selection) would not be based on any prior “first-hand” knowledge. This naturally makes it challenging to establish, a priori, whether and to which extent a given service may be entrusted to carry out the required functionality. Solving this dilemma is the main objective of our work. Specifically, our research aims at establishing trust between Web service consumers and Web service providers.
1.1.2 Trust

Trust has been defined as “an assured reliance on the character, ability, or strength of someone or something.” Establishing trust is therefore a precondition for any transaction [70]. In a service-oriented environment, trust correlates to the ability of a service to perform the required functionality in an acceptable manner. However, the incentives for establishing trust differ according to the role (consumer or provider) a Web service undertakes. For instance, the key incentive to establish trust for a service acting as a consumer is risk reduction. Trust reduces the uncertainty associated with any interaction. It enables a service's clients to make “informed” decisions about the expected behavior of the provider service. From the service provider's perspective, the prime incentive to establish consumers' trust in its service is to establish/increase/maintain its client base. Since the possibility of self-interested and malicious services cannot be overlooked, cooperation among services relies on the production of trust. Moreover, the individual (perhaps malicious) motives driving service behavior to maximize benefits may tempt a service to defect. Hence, service-oriented environments need trust mechanisms that help in answering the question of whom to trust and whom not to before undertaking a transaction.

As we march towards the service Web, Web services will increasingly be the central building blocks in a large spectrum of Web applications. The share of the service Web segment of the Web will become more important. In particular, Web services will be the major enabling technology for B2B interoperability and integration. This will readily extend the service Web to the biggest component of the Web economy: the B2B e-commerce market. According to a report from IDC, worldwide spending on software in support of Web services-based projects will reach $11 billion by 2008; in 2003, this number was only $1.1 billion [62]. Organizations will increasingly provide Web services that will substitute for their traditional Web-based interaction with their audience. Many of these services will handle sensitive
information. The consequence is that establishing trust will become a key requirement for Web services.
1.1.3 Traditional Solutions for Establishing Trust

Figure 1.1 shows four major approaches that have been adopted to enable trust-based interactions on the service Web: (i) regulation, (ii) self-regulation, (iii) third-party certification, and (iv) security technologies. These approaches have attempted to attenuate consumers' mistrust towards service providers. However, they have achieved little success in closing the trust gap between providers and their consumers. The following discussion elicits the insufficiency and/or inadequacy of these approaches for the service Web. We will use an example to illustrate the main points of our discussion.

Assume that PharmacyOne is an online pharmacy that sells brand and generic drugs. PharmacyOne does not actually possess drugs but orders them, on behalf of its customers, from other online pharmacies. Purchasing a drug from PharmacyOne is a four-step process. First, a customer accesses PharmacyOne's Web site to order one or more drugs. The customer specifies the drugs that he/she wants to purchase, the quantities, and his/her location. PharmacyOne then automatically invokes a number of Web services provided by other online pharmacies. The request that PharmacyOne submits to each of these services includes the name and quantities of the drug(s) to be ordered and the zip code of the customer to which the drug(s) would be delivered. The Web service of each of the pharmacies answers PharmacyOne's request with a response that includes the amount which that pharmacy would charge PharmacyOne for delivering the drug(s) to the customer. PharmacyOne then selects the “best” offer, adds its overhead and profit, and displays the result to the customer. If the customer submits a purchase order, PharmacyOne sends the customer's delivery information (name, address, phone) to the selected pharmacy along with the payment for that transaction. PharmacyOne's business model is known to its customers. However, to maintain its competitive edge, PharmacyOne does not reveal the identity of its suppliers to its customers. This example highlights the following issues:

• When selecting the best provider, PharmacyOne typically chooses the one offering the lowest price. This requires that PharmacyOne trust the provider's commitment and ability to (i) deliver the requested drugs to the customer within the promised delays, and, more importantly, (ii) handle the customer's information according to a privacy policy that does not violate PharmacyOne's commitment to the customer.
• Customers do not know which pharmacy will provide the drugs. A customer's consent to the transaction assumes a transitive trust model, i.e., the customer trusts PharmacyOne and any party that PharmacyOne trusts. The trust chain may obviously include further links.
• Customers' trust in PharmacyOne depends on their perception of how its suppliers fulfill the orders. In particular, a violation of a customer's privacy that may be linked to a prior transaction with PharmacyOne would have a direct impact on his/her trust in PharmacyOne. Therefore, PharmacyOne must be able to accurately assess the trustworthiness of its suppliers. It must be able to determine to which extent each supplier is responsible for the aggregated (dis)trust that a given customer has in PharmacyOne.

We now elaborate on the two major factors justifying the proposed research, namely, the inadequacy of traditional trust solutions and the theoretical and experimental evidence for the viability of reputation-based approaches as ascertained in other contexts:
Fig. 1.1 Traditional Trust Approaches
1.1.3.1 Regulations

The emergence of the Web has enabled a new type of cross-country business transaction. In these transactions, it is difficult to enforce contracts by law. This is even more so when these transactions involve Web services. For example, consider a European citizen purchasing a drug from the US-based PharmacyOne of the earlier example. PharmacyOne selects an Asia-based drug supplier to fulfill the request of that citizen. Assume that the provider of the Asia-based service violates the EU citizen's privacy by selling his/her personal information to an Australian drug maker. In this case, it is difficult for the citizen to know that his/her privacy was violated. The laws that are applicable in one country may not be applicable in the other. More importantly, even if the privacy violation is detected, no explicit global privacy
regulation exists to establish responsibilities and repress violators. Overall, regulations are therefore not enforceable on the service Web.

1.1.3.2 Self Regulation

In self-regulation, service providers present a set of commitments to their consumers. The providers are themselves the authors of these commitments, which they advertise to the services' consumers. The provider of a service may or may not implement those commitments. Moreover, providers may unilaterally change their commitments. These changes may be retrospective, i.e., applicable to past interactions. It is therefore not possible to establish trust based on a service provider's commitments. This simple approach was not effective on the traditional Web, and it is likely to be even less effective on the service Web. Service consumers would have no rationale to trust claims presented by unknown service providers if they are not able to check the validity of those claims.

1.1.3.3 Third-party Certification

In third-party certification, service providers get certified by a third party that is normally known and trusted by consumers. This approach clearly does not scale to the service Web. The number of Web services, their workflows' opacity, potential transience, and dynamic composability are factors that make this approach infeasible on the service Web. For example, assume that the service PharmacyOne gets certified by a third party. PharmacyOne's partners, however, may or may not be certified. It is not easy for PharmacyOne's consumers to determine to which extent they may trust PharmacyOne, as their interaction with PharmacyOne may involve partners that are not certified.

1.1.3.4 Security Technologies

Security solutions focus on protecting individuals and organizations from outsider attacks. But security mechanisms, e.g., encryption, authorization, and authentication, are not sufficient to establish trust. Their prime purpose is to secure the interaction between two parties. They do not provide any guarantees about the potential consequences of interacting with a given party. For example, an authentication mechanism may assure a service consumer that the Web service to which it is disclosing sensitive information is actually the service it claims to be. However, the Web service may or may not actually implement the terms of its privacy policy. Similarly, a Web service may use only encrypted messages to interact with its consumers. However, this only ensures that the communication between the service and its consumers is confidential. Here again, this is not a guarantee for the service consumer that the service may be trusted. In summary, trust and security are two different concepts
and should not be mixed. Security is mainly about providing a transaction environment free of intrusions or attacks, whereas trust is the belief that a consumer has in the ability of a provider to deliver. Both concepts are equally important and may be used to support each other.
1.2 The Role of Reputation in Trust Management

In recent years, abundant theoretical and experimental research has studied the role of reputation in establishing trust in various contexts; a brief review is provided in Chapter 6. Reputation has been defined as the confidence in the ability of a specific subject to fulfill a certain task [114]. It is a subjective assessment of a characteristic or an attribute ascribed to one entity by another based on observations or past experiences. Normally, experiences from more than one source are assimilated to derive the reputation. Thus, in the context of Web services, we refer to the aggregated perceptions that the community of service requesters has for a given Web service as service reputation. Reputation is regarded as a predictor of future behavior. This stems from the supposition that past behavior is indicative of future behavior [70]. Any Web service with high reputation would be regarded as one that has performed “satisfactorily” in a consistent manner in the past. This would imply that the service can be trusted to perform as expected in the future as well. In essence, trust is dependent on reputation, as shown in Figure 1.1. Reliable reputation systems increase the user's trust and encourage cooperation. A Web service that has low reputation would not be recommended for business transactions. Thus, reputation-based trust management will ultimately result in eliminating poor performers and motivating honest behavior among Web services. Most research in this area confirms that reliable reputation systems increase users' trust in the Web (e.g., [112, 34]). An eloquent example is the impact of reputation systems on e-commerce applications in general and online auctions in particular. Consider the case of eBay, the world's largest online auctioning site. In 2003, 94.9 million users posted 971 million listings with gross merchandise sales of $24 billion. Several empirical studies attribute much of eBay's commercial success to its reputation mechanism, known as eBay's Feedback Forum, e.g., [113, 57]. This mechanism has been effective in deterring dishonest behavior and, consequently, stimulating eBay's growth.
1.3 Example Scenario

In this section, we provide a scenario to illustrate the need for a reputation management system in a service-oriented environment. The example will be used
throughout the remaining chapters of this book to illustrate the major requirements that will drive our work. Consider a car brokerage application (Figure 1.2) where a company deploys a Car Broker service (CB) that offers a car sale package. To handle users' requests, the CB service may outsource from other Web services. Examples of outsourced services include Car Dealer (CD), Lemon Check (LC), FInancing (FI), Credit History (CH), and INsurance (IN). These services are provided by different vendors. Moreover, it is expected that a large number of vendors would compete with each other to provide the same (or similar) functionality.
Fig. 1.2 Car Brokerage Application
A customer accesses a CB service to buy a car having a specific make, model, year, and mileage. Since car buying involves a number of functionalities, a series of invocations of the above-mentioned services would need to take place. The selection of a service by CB at each invocation step can be done in two ways. In a use mode where the reputation of the component Web services is not considered, the customer would start by invoking CB's sendMePriceQuote operation to get a price quote (step (1)). To get a quote, the CB would transparently interact with a car dealer via CD's priceQuote operation (step (1.1)). If interested in a used car, the customer would check its history report by invoking CB's askForProblemCheck operation (step (2)). This operation is processed by outsourcing from LC's problemCheck operation (step (2.1)). The customer would then apply for financing by invoking the operation applyForFinancing provided by CB (step (3)). Before accepting a financing plan, CB would check the customer's credit by invoking CH's payingHistory operation (step (3.1)). If the credit is positive, CB would invoke the financingQuote operation offered by the financing service (step (3.2)). The customer would finally request an insurance quote through CB's insuranceQuote operation (step (4)). CB would transparently invoke the operation applyforInsurance offered by the insurance
service (step (4.1)). This service would outsource from DH's drivingRecord operation before issuing insurance quotes (step (4.2)). In essence, no guarantees about the delivery of the required functionality could be made before the actual interaction. This implies that any of the services that CB outsources may exhibit irrational behavior. The overall quality of Web service that the customer perceives depends on the behavior of the individual services invoked while answering his/her request. From the customer's perspective, the use case described in Figure 1.2 is obviously far from optimal. It does not provide customers the flexibility to make a reputation-based selection of car brokers. Similarly, since the reputation of component services is not considered, any defaulting service may in turn lower the quality of service delivered by CB. In fact, in such a scenario, it is also difficult for a service provider to make a reputation-based composition that results in a CB with the “best” possible quality from the perspective of that provider.
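The following runnable sketch mirrors this invocation sequence. The operation names come from the scenario above; the stub classes, return values, and the markup CB adds to the dealer's quote are hypothetical stand-ins for real SOAP/WSDL invocations.

```python
class CarDealer:
    def priceQuote(self, spec):                      # step (1.1)
        return 18500.0

class LemonCheck:
    def problemCheck(self, vin):                     # step (2.1)
        return "no problems reported"

class CreditHistory:
    def payingHistory(self, customer):               # step (3.1)
        return True                                  # credit is positive

class Financing:
    def financingQuote(self, amount):                # step (3.2)
        return {"apr": 0.049, "months": 60}

class DrivingHistory:
    def drivingRecord(self, customer):               # step (4.2)
        return "clean record"

class Insurance:
    def applyforInsurance(self, customer, spec):     # step (4.1)
        DrivingHistory().drivingRecord(customer)     # outsourced check
        return {"premium": 92.0}

class CarBroker:
    """CB transparently outsources each step to component services."""
    def __init__(self):
        self.cd, self.lc = CarDealer(), LemonCheck()
        self.ch, self.fi, self.ins = CreditHistory(), Financing(), Insurance()

    def sendMePriceQuote(self, spec):                # step (1)
        return self.cd.priceQuote(spec) * 1.05       # hypothetical markup

    def askForProblemCheck(self, vin):               # step (2)
        return self.lc.problemCheck(vin)

    def applyForFinancing(self, customer, amount):   # step (3)
        if self.ch.payingHistory(customer):
            return self.fi.financingQuote(amount)
        return None                                  # credit not positive

    def insuranceQuote(self, customer, spec):        # step (4)
        return self.ins.applyforInsurance(customer, spec)

cb = CarBroker()
spec, customer = {"make": "Honda", "year": 2005}, "alice"
quote = cb.sendMePriceQuote(spec)                        # price quote
report = cb.askForProblemCheck(vin="1HGBH41JXMN109186")  # history report
plan = cb.applyForFinancing(customer, quote)             # financing plan
premium = cb.insuranceQuote(customer, spec)              # insurance quote
```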
Fig. 1.3 A Reputation-driven Composition and Selection of Car Brokers
Consider now a scenario of a reputation-based selection of car brokers (Figure 1.3). In this scenario, several companies compete to provide services in the car brokerage business. Each functionality (e.g., checking a customer's credit history) may be satisfied by several alternate Web services. These services have varying reputations that may fluctuate over time. In a reputation-aware use case (Figure 1.3), CB providers and customers are aware of the reputations of the Web services. The providers may exploit services' reputations in composing their CBs. Similarly, customers may select the “best” CB based on the different CBs' individual reputations. Customers would be able to select only those CBs that have an acceptable reputation, i.e., whose past interaction history is satisfactory. In this way, the best outcome of the interaction can be expected. Similarly, a CB can ensure that its own reputation is not tarnished due to the incompetence of component services.
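Such reputation-driven selection can be sketched as a simple filter-and-rank step. The threshold value and broker names below are illustrative assumptions, and the reputation scores themselves would come from the assessment model developed in Chapter 4.

```python
def select_provider(candidates, threshold=0.7):
    """candidates: (service_name, reputation) pairs, reputation in [0, 1].

    Keep only services whose past interaction history is satisfactory
    (reputation at or above the consumer's acceptance threshold), then
    pick the best of the acceptable ones.
    """
    acceptable = [(name, rep) for name, rep in candidates if rep >= threshold]
    if not acceptable:
        return None  # no car broker with an acceptable reputation
    return max(acceptable, key=lambda pair: pair[1])[0]

# Three competing car brokers with fluctuating reputations.
brokers = [("CB1", 0.92), ("CB2", 0.55), ("CB3", 0.78)]
print(select_provider(brokers))  # -> CB1
```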
1.4 Book Organization

The remainder of the book is organized as follows.

In Chapter 2, we present an overview of service-oriented environments. We then present the current model of interactions on the service Web. We also define an ontological framework for organizing and describing Web services on the service Web. We use the concept of community to cater for an ontological organization and description of Web services. The developed ontology serves as a “template” for describing communities and Web services.

In Chapter 3, we show how the concept of Quality of Service can be used to create reputation information. We then propose different bootstrapping models for managing the reputation of Web services. We use the concept of community defined in Chapter 2 to define two methods for the bootstrapping process. The first approach relies on cooperation among services, which compute the reputation of newcomers in a P2P manner. The second approach functions under a “super-peer” topology where the community provider is responsible for assigning the newcomer's reputation.

In Chapter 4, we present a heuristics-based model for assessing the reputation of Web services. We define different metrics to aggregate the different consumer ratings in a fair and accurate manner. We also define a Hidden Markov Model-based forecasting model to assess provider reputations for conditions where an adequate number of rater feedbacks is not available. The assessment model does not rely on a centralized repository and is distributed in nature. We do not define a single “system-wide” reputation for a service provider; rather, each consumer is given the control to assess providers' reputations according to its own perception and preference. We also present a model for assessing the reputation of Web services involved in a composition. We analyze and determine the impact of a component service's reputation on the reputation of a composition and vice versa. Our techniques address the problem of reputation management in compositions from two perspectives: (i) ensuring reputation assessment accuracy of component services from the point of view of the composition orchestrator, and (ii) ensuring fairness in decreasing/increasing reputations from the point of view of component services such that no service is wrongfully blamed.

In Chapter 5, we present the results of an extensive performance study of the proposed techniques. The experimental study is holistic in the sense that we have verified the accuracy of all the modules we defined under RATEWeb: reputation bootstrapping, reputation collection models, rater credibility evaluation, reputation assessment for individual and composed services, and reputation prediction using a hidden Markov model.

In Chapter 6, we describe the major techniques proposed for managing the reputation of service providers that are most closely related to the research presented in this book. In Chapter 7, we provide some concluding remarks and discuss directions for future research.
Chapter 2
Service-Oriented Environments
The semantic organization and description of Web services is an important requirement for enabling their automatic selection and composition in service-oriented environments (SOEs). A plethora of Web services is expected to compete in offering similar functionalities in SOEs, or the new service Web in general [153]. The continuously changing and large number of Web services, with service providers potentially located in different parts of the world, calls for techniques to organize Web services in a manner that can be efficiently understood and outsourced. In this chapter, we present a semantics-centered framework for organizing and describing Web services. We introduce the concept of community to cater for an ontological organization and description of Web services. We develop an ontology, called the community ontology, that serves as a template for describing communities.

A Service-Oriented Environment (SOE) is one where participants use the provided infrastructure and technology to create and capture value through interactions. In the following, we define the main components of an SOE, namely participants, infrastructure & technology, and interactions. We only provide a brief overview of the components in relation to this work, and do not delve into their detailed description.

• Participants: An SOE participant can take on one of two roles: (i) consumer and (ii) provider. The technical representation of these roles may differ from one SOE to another. For instance, participants can be human users, agents, or Web services.
• Infrastructure & Technology: SOE infrastructure mainly refers to the network communication model adopted by the participants. The client-server model is one of the earliest and most commonly used models adopted in e-commerce. However, with the advent of technology, other network models such as peer-to-peer, mobile, etc. are being adopted. The Service-Oriented Architecture (SOA) paradigm is driving this change, and as mentioned earlier, the Web services technology is at the center of this change.
• Interactions: Any type of business activity such as buying, selling, delivery, marketing, information sharing, etc. may be termed an interaction. The important thing to note is that the interaction should create some value. Moreover, every
interaction has several attributes that distinguish it from others, e.g., context, quality, etc.
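To fix these notions, the minimal sketch below records SOE participants and their value-creating interactions as plain data. The field names are our own illustrative assumptions, not a prescribed SOE schema.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    role: str          # "consumer" or "provider"; may be a human user,
                       # an agent, or a Web service depending on the SOE

@dataclass
class Interaction:
    consumer: Participant
    provider: Participant
    activity: str      # e.g., "buying", "selling", "information sharing"
    context: str       # attribute distinguishing one interaction from another
    quality: float     # perceived quality of the value created

# Example record of one value-creating interaction.
deal = Interaction(
    consumer=Participant("alice-proxy", "consumer"),
    provider=Participant("CB1", "provider"),
    activity="buying",
    context="car brokerage",
    quality=0.9,
)
```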
2.1 SOE Interaction Model

In this section, we present a model of interactions for the SOE. We first enumerate the key components, i.e., participants, of our model. We then show how these components are related to each other and the modes of interaction between them. Finally, we discuss how services interact in the proposed model.
Fig. 2.1 The Service Web Model
2.1.1 Entities and Interactions

Typical interactions on the service Web involve four types of entities: (i) Web services, (ii) service providers, (iii) service registries, and (iv) service consumers.

• Web services
A Web service is a software application identified by a URI (Uniform Resource Identifier), whose interface and binding are defined, described, and discovered by XML artifacts, and which supports direct interaction with other software applications using XML messages via Internet-based protocols [138]. Conceptually, a Web service may be viewed as a set of operations, where each operation is a
“processing unit” that consumes input values (called its parameters) and generates output values (called the result of that operation's invocation). For the sake of focus and clarity, we assume only a single operation per service. The reputation of the operation and that of the Web service thus refer to the same thing in our model. We define Web services as business functionalities that are:
– Programmatically accessible: Web services are mainly designed to be invoked by other Web services and applications. They are distributed over the Web and accessible via widely deployed protocols such as HTTP and SMTP. Web services must describe their capabilities to other services, including their operations, input and output messages, and the way they can be invoked.
– Loosely coupled: Communication among Web services is document-based. Web services generally communicate with each other by exchanging XML documents. The use of a document-based communication model caters for loosely coupled relationships among Web services.
• Service providers: The service provider is the entity that provides the service, i.e., makes it available to consumers. A service provider may be a business, a government agency, an academic institution, etc. A provider may provide one or more services; a service is provided by a single provider. Providers have publicly known identities. The provider owns the service. It may or may not actually manage the service. For example, the provider of a service may outsource the task of actually operating the service to a third party. Service consumers may or may not be able to discern all the parties involved in delivering a given service. In our model, we do not make a distinction between the service provider and the provided Web service. Thus, when we talk about a service provider, it is the Web service that is actually provided. The terms service provider, provider Web service, and provider are synonymous in our model. A service provider's goal is to increase/maintain its client base. Service providers are expected to behave rationally to achieve their goals. However, since the presence of malicious entities cannot be discounted (similar to the “real world”), service providers may change their behavior dynamically. In other words, service providers may behave honestly in one time instance, and in the next they may act under a malicious motive.
• Service registries: A service registry is a searchable directory that contains a collection of descriptions of Web services. A service registry has two components: a repository of service descriptions and a registry engine that answers the requests sent to the registry by service providers and service consumers. A service registry may be private or public. Any provider may advertise its capabilities by publishing the Web service in a public registry. A private registry may be used only by a limited, known set of providers to publish services. We focus on the use of public registries in our proposed model. Moreover, we assume no limit on the number of registries. In our model, service registries are only used to locate prospective service providers, and the registries do not store any reputation-related information.
• Service consumers: A service consumer is any entity that invokes a Web service, e.g., an intelligent agent, a Web application, or another Web service. A human user may also invoke a Web service, but we assume that each user is represented by a software component (called a proxy) in the system. The proxy is thus responsible for all user communication, and for managing the functional and non-functional requirements of the user. How this is achieved is not the focus of our work. We assume that the user can generate a proxy in one of two ways: (i) implement a custom proxy using a template, or (ii) download a proxy. In privacy-sensitive cases, users may prefer the first technique, while the latter provides ease of use. The template or the actual proxy can be obtained through a registry or portal defined by providers that are generally groups of government agencies, non-profit organizations, and businesses that share a common domain of interest. We refer to such providers as “community providers” in our model. Details follow in Section 2.2. We believe the proxy assumption is reasonable, as environments that require minimal human intervention (e.g., the Semantic Web [16, 46]) would necessitate the use of such proxies [99].
Without loss of generality, we will assume a symmetric interaction model where typical interactions involve two Web services: one that provides some functionality, and another one, the service consumer, that invokes the first to request that functionality. We also use the terms consumer and client interchangeably to refer to a service consumer. Three major standardization initiatives have been submitted to the W3C consortium to support interactions among Web services (Figure 2.1):
• WSDL (Web Services Description Language): WSDL is an XML-based language for describing operational features of Web services. WSDL descriptions are composed of interface and implementation definitions. The interface is an abstract and reusable service definition that can be referenced by multiple implementations. The implementation describes how the interface is implemented by a given service provider.
• UDDI (Universal Description, Discovery and Integration): UDDI defines a programmatic interface for publishing (publication API) and discovering (inquiry API) Web services. The core component of UDDI is the business registry, an XML repository where businesses advertise services so that other businesses can find them. Conceptually, the information provided in a UDDI business registration consists of white pages (contact information), yellow pages (industrial categorization), and green pages (technical information about services).
• SOAP (Simple Object Access Protocol): SOAP is a lightweight messaging framework for exchanging XML-formatted data among Web services. SOAP can be used with a variety of transport protocols such as HTTP, SMTP, and FTP. A SOAP message has a very simple structure: an XML element (called the envelope) with two child elements. The first element, the header, includes features such as security and transactions. The second element, the body, includes the actual exchanged data.
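To make the envelope structure concrete, the following is a minimal sketch (not the book's own material) that assembles a SOAP message with Python's standard xml.etree.ElementTree; the checkEligibility operation and its parameter values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_envelope(body_payload: ET.Element) -> str:
    """Assemble a SOAP message: an Envelope element with Header and Body children."""
    envelope = ET.Element(ET.QName(SOAP_NS, "Envelope"))
    ET.SubElement(envelope, ET.QName(SOAP_NS, "Header"))  # e.g., security, transactions
    body = ET.SubElement(envelope, ET.QName(SOAP_NS, "Body"))
    body.append(body_payload)                             # the actual exchanged data
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical request for a checkEligibility operation (a running example in this book).
payload = ET.Element("checkEligibility")
for name, value in [("income", "45000"), ("familySize", "4"), ("zipCode", "48202")]:
    ET.SubElement(payload, name).text = value
print(build_envelope(payload))
```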
The emergence of tools to describe, advertise, and invoke Web services facilitates the development of Web service-based solutions. However, the use of a tagged language such as XML increases the volume of information to be exchanged among Web services. This might overload the network in the presence of a large number of services, hence penalizing the scalability of the Web service approach. Additionally, SOAP defines only simple data types. Using complex data types may require the XML parser to get the corresponding XML Schema definitions from remote locations. This might add an overhead for processing SOAP messages. The registry presents another scalability issue. A centralized registry might result in a single point of failure and a bottleneck for accessing and publishing Web services. A distributed registry would cater for a more reliable and scalable solution. However, this incurs an additional overhead of managing distributed repositories. An intermediary solution is adopted in UDDI, where the registry is physically replicated over multiple nodes. This solution solves the problem of a centralized registry, but it still requires the nodes to exchange data with each other to maintain registry consistency.
To make a service available to consumers, a provider publishes the service in a service registry. Service publication is an activity in which the provider of a Web service advertises the capabilities of the service by publishing the service's description to a service registry. This description specifies information such as the identity of the provider, the service's address (i.e., URI), its operations, and the number, names, order, and types of each operation's parameters. Each service provider may publish the service in one or more registries with one or more identities. A service may be published in the same registry more than once with different identities. We consider two instances of the same service with different identities as two distinct services.
Service consumers access service registries to discover services. Service discovery is an activity that aims at finding one or more services that may be used to achieve a given functionality. A basic form of service discovery occurs when a client submits a discovery request to a single registry to search for one or more services that may deliver some functionality. The registry engine searches its service directory for one or more services that may be invoked to answer the client's request. It then sends the description of these services to the client. We assume that registries are neutral, i.e., have an impartial policy vis-à-vis the providers of the different services. Typically, the directory of a service registry may contain several services that all may achieve a given functionality. Under the assumption of neutrality, when a registry receives a request from a client, it answers with the list of all the potential services that may answer the client's request. Clients, therefore, must be able to select one among several “equivalent” services. Service selection is a process that determines the “best” service to invoke in a given context; a sketch of this discovery-and-selection flow is given below. In the proposed model, clients are autonomous with regard to service selection. They may use different criteria and may have different service ranking schemes. In some cases, a client may not find any single Web service that may provide a given functionality. It may then initiate a service composition process.
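As a minimal sketch of this publish-discover-select flow (class and attribute names are our own, not part of any registry standard):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ServiceDescription:
    provider_identity: str
    uri: str                 # the service's address
    functionality: str       # advertised capability

class NeutralRegistry:
    """A registry with an impartial policy: it returns ALL matching services."""
    def __init__(self) -> None:
        self._directory: List[ServiceDescription] = []

    def publish(self, description: ServiceDescription) -> None:
        self._directory.append(description)

    def discover(self, functionality: str) -> List[ServiceDescription]:
        return [d for d in self._directory if d.functionality == functionality]

def select(candidates: List[ServiceDescription],
           rank: Callable[[ServiceDescription], float]) -> Optional[ServiceDescription]:
    """Selection is client-side: each consumer applies its own ranking scheme."""
    return max(candidates, key=rank, default=None)
```

A consumer that selects on reputation would simply supply the corresponding rank function; the registry itself never ranks.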
Service composition may be defined as the process of “combining” two or more services (called components) from different providers to produce a value-added composite
service. Service composition may be recursive: a composite service may result from the composition of other atomic (i.e., non-composite) or composite services. Service composition is carried out by entities called service composers. We consider the composer of a composite service as the provider of that service. In our model, service composers are themselves considered Web services, as illustrated in the sketch below. Service consumers invoke a Web service through one of its operations. The consumer provides appropriate values for the operation's parameters, and the service returns an output value as a result of the invocation.
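The recursive character of composition can be sketched as follows; the sequential data-flow between components is an assumption made purely for illustration:

```python
class AtomicService:
    """A non-composite service wrapping a single operation."""
    def __init__(self, operation):
        self._operation = operation

    def invoke(self, **params):
        # assumes the operation returns a dict of named output values
        return self._operation(**params)

class CompositeService:
    """A value-added service; its composer is considered the provider. It is
    itself a service, so it may appear as a component of another composite."""
    def __init__(self, components):
        self._components = components

    def invoke(self, **params):
        result = params
        for component in self._components:  # each component is atomic or composite
            result = component.invoke(**result)
        return result
```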
2.2 Service Interactions: Extension through Ontologies
In this section, we show how RATEWeb extends the existing service interaction model. We propose an ontology-based approach for organizing Web services. Ontologies are poised to play a central role in empowering Web services with semantics. They are increasingly viewed as key to enabling semantics-driven data access and processing [25]. We introduce the concept of community to cater for an ontological organization and description of Web services. A community is a “container” that clumps together Web services related to a specific area of interest (e.g., auto makers, car dealers, etc.). All Web services that belong to a given community share the same area of interest. Communities provide descriptions of desired services (e.g., providing interfaces for insurance services in our running example) without referring to any actual service. We develop an ontology, called community ontology, that serves as a template for describing communities and Web services. A community ontology is a metadata (domain) ontology [147, 117] which provides concepts that allow the description of other concepts (communities and Web services in our case). This domain ontology “also describes concept relationships in the application domain, and facilitates the semantic markups on the domain-specific aspects of Web services such as service categories, semantic types of parameters, etc” [144]. Figure 2.2 outlines the process of creating a community and registering Web services with it. Communities are defined by community providers as instances of the community ontology. Community providers are generally groups of government agencies, non-profit organizations, and businesses that share a common domain of interest. Additional responsibilities of a community provider may include defining a reputation policy that: (i) sets a reputation threshold for members to maintain, (ii) sets rules applicable when a member's reputation goes below the specified threshold, e.g., dissemination within the community of its low reputation, or temporary suspension of its membership, and (iii) defines reputation requirements for new members. A sketch of such a policy follows.
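The sketch below is a minimal illustration of such a reputation policy; the threshold values and sanction names are assumptions for exposition, not prescriptions of the RATEWeb model:

```python
from dataclasses import dataclass

@dataclass
class ReputationPolicy:
    member_threshold: float       # (i) reputation every member must maintain
    newcomer_requirement: float   # (iii) reputation required of new members
    suspend_violators: bool       # (ii) rule applied when below the threshold

    def admit(self, reputation: float) -> bool:
        """Admission check for a prospective member."""
        return reputation >= self.newcomer_requirement

    def sanction(self, reputation: float) -> str:
        """Rule applied to an existing member's current reputation."""
        if reputation >= self.member_threshold:
            return "none"
        # (ii) e.g., disseminate the low reputation, or suspend membership
        return "suspend" if self.suspend_violators else "disseminate"

policy = ReputationPolicy(member_threshold=0.6, newcomer_requirement=0.5,
                          suspend_violators=True)
```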
Fig. 2.2 Community Creation and Service Registration
2.2.1 Definition of Communities
A community $C_i$ is formally defined by a tuple $(Identifier_i, Category_i, G\text{-}operation_i, Members_i)$. The $Identifier_i$ clause contains a unique name and a text description that summarizes $C_i$'s features. $Category_i$ describes the area of interest of the community. All Web services that belong to $C_i$ have the same category as $C_i$'s. $C_i$ is accessible via a set of operations called generic operations. Those are specified in the $G\text{-}operation_i$ clause. Generic operations are “abstract” operations that summarize the major functions needed by $C_i$'s members. Community providers define generic operations based on their expertise in the corresponding area of interest, that is, $C_i$'s category. The term “abstract” means that no implementation is provided for generic operations. Community providers only define an interface for each generic operation $op_{ik}$. This interface could subsequently be used and implemented by community members (i.e., actual Web services) interested in offering $op_{ik}$. We say that those members support or import $op_{ik}$. The execution of $op_{ik}$ hence refers to the execution of an actual operation offered by a member that supports $op_{ik}$. The $Members_i$ clause refers to the list of $C_i$'s members. By being members of $C_i$, Web service providers promise that they will be supporting one or several of $C_i$'s generic operations. We use OWL-S [137] for describing the proposed ontology.
Category
The category of a community $C_i$ is formally defined by a tuple $(Domain_i, Synonyms_i, Specialization_i, Overlapping_i)$. $Domain_i$ gives the area of interest of the community
(e.g., “financing”). It takes its value from a taxonomy of domain names. For flexibility purposes, different communities may adopt different taxonomies to specify their category. We use XML namespaces to prefix categories with the taxonomy in which they are defined. Simply put, XML namespaces provide a method for qualifying element and attribute names used in XML documents by associating them with URI references. $Synonyms_i$ contains a set of alternative domain names for $C_i$. For example, “car” is a synonym of “automobile”. Values assigned to this attribute are taken from the same taxonomy as the one used for domains. $Specialization_i$ is a set of characteristics of $C_i$'s domain. For example, “car” and “quote” are specializations of “financing”. This means that $C_i$ provides finance quote services for cars. Communities are generally not independent. They are linked to each other via inter-ontology relationships. These relationships are specified in the $Overlapping_i$ attribute. $Overlapping_i$ contains the list of categories that overlap with $C_i$'s category. It is used to provide a peer-to-peer topology for connecting communities with related categories. We say that $category_i$ overlaps with $category_j$ if composing $C_i$'s operations with $C_j$'s is “meaningful”. By meaningful, we mean that the composition provides a value-added benefit (in terms of categories). For example, an operation that belongs to a community whose domain is office workers may be composed with another operation that belongs to a community whose domain is insurance. This would enable providing car insurance for office groups. It should be noted that it is the responsibility of the community providers to identify related categories and assign them to the overlapping attribute.
Generic Operations
A generic operation is defined by a set of functional and non-functional attributes. Functional attributes describe syntactic and semantic features of generic operations. Syntactic attributes represent the structure of a generic operation. An example of a syntactic attribute is the list of input and output parameters that define the operation's messages. Semantic attributes refer to the meaning of the operation or its messages. We consider two types of semantic attributes: static and dynamic semantic attributes. Static semantic attributes (or simply static attributes) describe non-computational features of generic operations. Those are semantic attributes that are generally independent of the execution of the operation. An example of a static attribute is the operation's category. Dynamic semantic attributes (or simply dynamic attributes) describe computational features of generic operations. They generally refer to the way and constraints under which the operation is executed. An example of a dynamic attribute is the business logic of the operation, i.e., the results returned by the operation given certain parameters and conditions. Non-functional attributes, also called qualitative attributes, include a set of metrics that measure the quality of the operation. Examples of such attributes include time, availability, security, cost, etc. Two service providers that support the same generic operation may have different values for their qualitative attributes.
Non-functional attributes model the competitive advantage that competitors (i.e., Web services that support the same generic operation) may have over each other.
While defining a community, community providers assign values to some of the attributes of their generic operations. The rest of the attributes are assigned either by service providers or third parties during the registration of Web services with $C_i$. For example, the types of input and output messages (e.g., purchase order, registration confirmation) are defined by community providers. The cost (dollar amount) of executing an operation is service-specific and hence defined by the service provider. The other qualitative attributes (e.g., response time, availability) may be assigned by the consumers, the underlying infrastructure, or third parties (e.g., trusted parties, monitoring agencies). The way those parties determine the values to be assigned (e.g., through monitoring) is out of the scope of this book. It is worth noting that the values of some attributes may be assigned by both community and service providers. For example, the content of an input and output message is given by community providers. However, service providers may modify this content by adding and/or removing parameters to input and output messages.
Community Members
Service providers can, at any time, select a community of interest (based on categories) and register their services with it. We say that those services are members of that community. The registration process requires giving an identifier (WS-ID), name, and description for the Web service. The identifier takes the form of a unique UUID. The description summarizes the main features of the Web service. Service providers specify the list of generic operations supported by their services through the imported attribute. We define three constructs for importing generic operations: projection, extension, and adjustment. The projection and extension constructs allow the deletion and addition of message parameters, respectively. Adjustment enables the modification of the content of operation attributes. The invocation of an imported operation is translated into the invocation of an “actual” service operation. The correspondence between imported and actual operations is done through the mapping attribute. For each imported operation, the provider gives the ID of the corresponding actual operation. It also defines a one-to-one mapping between the imported operation's parameters and the actual operation's parameters. Defining mappings between parameters enables the support of “legacy” Web services: providers do not need to change the message parameters in their actual service code. Assume that a service provider SP offers a given operation op. The following three cases are then possible: (i) if there is a community $C_i$ that contains a generic operation $op_{ik}$ similar to op, SP would import $op_{ik}$ “as is”; (ii) if there is a community $C_i$ that contains a generic operation $op_{ik}$ “closely” similar to op (e.g., op has fewer input parameters than defined in $op_{ik}$), SP would import $op_{ik}$ using the projection, extension, and/or adjustment technique(s); (iii) if no community has an operation similar or “closely” similar to op, SP would define a new community $C_j$ that has op as a generic operation and SP's service as a member. The latter case is similar to the
traditional WSDL/UDDI/SOAP Web service model, where service providers create descriptions for their services. The difference is that, in our case, SP instantiates the attributes and concepts of the community ontology, while in the traditional model, providers define their service descriptions from scratch.
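The following sketch illustrates a community as an instance of the community ontology, and the import of a generic operation; the Python representation is our own simplification of the OWL-S description, and the projection/extension semantics shown (projection deletes parameters, extension adds them) follow the discussion above:

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

@dataclass
class GenericOperation:
    """An 'abstract' operation: an interface only, with no implementation."""
    g_op_id: str
    inputs: List[str]
    outputs: List[str]

@dataclass
class Community:
    """An instance of the (Identifier, Category, G-operation, Members) tuple."""
    identifier: str
    category: str
    g_operations: Dict[str, GenericOperation] = field(default_factory=dict)
    members: List[str] = field(default_factory=list)

def import_operation(op: GenericOperation, project=(), extend=()) -> GenericOperation:
    """Projection removes message parameters; extension adds new ones.
    Adjustment (not shown) would modify the content of operation attributes."""
    inputs = [p for p in op.inputs if p not in project] + list(extend)
    return replace(op, inputs=inputs)
```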
2.2.2 Operational Description of Communities
As mentioned previously, a generic operation is described at four different levels: the syntactic, static semantic, dynamic semantic, and qualitative levels. In this section, we give a detailed description of generic operation attributes for each of those levels.
2.2.2.1 Syntactic Attributes
We define two levels for syntactically describing a generic operation: the message and operation levels. Attributes at the message level describe the structure of the messages defined within the operation, such as the number of parameters within a message. Attributes at the operation level describe general-purpose features of the generic operation, such as the name and ID of the operation.
Message Syntax
Generic operations have input and output messages. Each input or output message contains one or more parameters defined by their names. The name of a parameter is unique within a given message. Let us for example consider a generic operation checkEligibility, which checks a consumer's eligibility for a financing option. The input message of this operation contains income, familySize, and zipCode as parameter names. The output message of checkEligibility has approved and duration as parameter names. Although message parameters are pre-defined by community providers, service providers have the ability to add new parameters or remove pre-defined ones. Therefore, the number of parameters within a message may be changed by service providers. We define two sets $In(op_{ik})$ and $Out(op_{ik})$ for each generic operation $op_{ik}$. $In(op_{ik})$ and $Out(op_{ik})$ contain the list of input parameters' and output parameters' names of $op_{ik}$, respectively. We also define two attributes $NI_{ik}$ and $NO_{ik}$ that give the number of input and output parameters in $op_{ik}$, respectively. For example, in the case where $op_{ik}$ = checkEligibility, $In(op_{ik}) = \{income, familySize, zipCode\}$, $NI_{ik} = 3$, $Out(op_{ik}) = \{approved, duration\}$, and $NO_{ik} = 2$.
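In code, the message-level syntax of checkEligibility reduces to a direct transcription of the sets above:

```python
# Message syntax of the generic operation op_ik = checkEligibility
In_op  = ["income", "familySize", "zipCode"]   # In(op_ik)
Out_op = ["approved", "duration"]              # Out(op_ik)
NI = len(In_op)    # NI_ik = 3 input parameters
NO = len(Out_op)   # NO_ik = 2 output parameters
```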
Operation Syntax
A generic operation has a unique identifier, called G-op-ID, that takes the form of a Universally Unique ID (UUID). A UUID is an identifier that is unique across both space and time. The operation also has a name and a text description that summarizes the operation's features. The binding defines the message formats and protocols used to interact with the operation. An operation may be accessible using several bindings, such as SOAP/HTTP and SOAP/MIME. The binding of an operation is assigned by the service provider. This is in contrast to the rest of the syntactic attributes, whose values are pre-defined by community providers. Indeed, the binding attribute is dependent on the way the generic operation is implemented at the service provider side. A provider may offer SOAP/HTTP access to a generic operation supported by its Web service, while another provider may prefer to use SOAP/MIME for the same operation. The mode of an operation refers to the order according to which its input and output messages are sent and received. It states whether the operation initiates interactions or simply replies to invocations from other services. We define two operation modes: In/Out and Out/In. One of these values is assigned by community providers to each operation. An In/Out operation first receives an input message from a client, processes it (locally, or forwards it to another service), and then returns an output message to the client. An Out/In operation first sends an output message to a server and receives an input message as a result. checkEligibility is an example of an In/Out operation. As specified in the WSDL standard, some operations may be limited to an input or an output message. For example, expirationINS is an operation that automatically notifies consumers about the termination of their insurance.
2.2.2.2 Static Semantic Attributes
The static semantics of a generic operation describe semantic properties that are independent of the execution of the operation. They specify the semantics of the operation itself (e.g., what the operation does) as well as the semantics of the input and output messages defined within the operation.
Message Semantics
Messages must be semantically described so that they can be “correctly” interpreted by service providers and consumers. For that purpose, we associate a message type MT with each message. MT gives the general semantics of the message. For example, a message may represent a “purchase order” or an “invoice”. Vertical ontologies are the ideal concept to describe the type of a message. An example of such an ontology is RosettaNet's PIPs (Partner Interface Processes). The message type does not capture the semantics of message parameters. We define the following attributes to model the semantics of message parameters: data
type, business role, unit, and language. The data type gives the range of values that may be assigned to the parameter. We use XML Schema's built-in data types as the typing system. Built-in types are pre-defined in the XML Schema specification. They can be either primitive or derived. Unlike primitive types (e.g., string, decimal), derived types are defined in terms of other types. For example, integer is derived from the decimal primitive type. The business role gives the type of information conveyed by the message parameter. For example, an address parameter may refer to the first (street address and unit number) or second (city and zip code) line of an address. Another example is that of a price parameter: it may represent a total price or a price without taxes. Business roles take their values from a pre-defined taxonomy. Every parameter would have a well-defined meaning according to that taxonomy. An example of such a taxonomy is RosettaNet's business dictionary. It contains a common vocabulary that can be used to describe business properties. For example, if the price parameter has an “extendedPrice” role (defined in RosettaNet), then it represents a “total price for a product quantity”. For flexibility purposes, different community providers may adopt different taxonomies to specify their parameters' business roles. As for categories, we use XML namespaces to prefix business roles with the taxonomy according to which they are defined. The unit attribute refers to the measurement unit in which the parameter's content is provided. For example, a weight parameter may be expressed in “Kilograms” or “Pounds”. A price parameter may be in “US Dollars”, “Canadian Dollars”, or “Euros”. A time period parameter may be specified in days, weeks, or months. We use standard measurement units (length, area, weight, money code, etc.) to assign values to parameters' units. If a parameter does not have a unit (e.g., address), its unit is equal to “none”. The content of a message parameter may be specified in different languages. For example, a profession parameter may be expressed in English or Spanish. An English-Urdu translation operation takes as input an English word (input parameter) and returns as output its translation in Urdu (output parameter). We adopt the standard taxonomy for languages to specify the value of this attribute. The content of static semantic attributes is assigned by community providers. The data type, unit, and language attributes may be changed by service providers. This is in contrast to the message type and business role, which model the core of the message semantics and hence cannot be altered. Service providers have the flexibility to support a data type, unit, or language different from those specified by community providers. For example, a service provider may decide to support a weight parameter in “Kilograms” although the community providers specified “Pounds” as the measurement unit for this parameter.
Operation Semantics
The static semantics of an operation are defined by the following attributes: serviceability, provider type, consumer type, purpose, and category. These attributes model
the core of the operation's semantics. Hence, they are exclusively assigned by community providers. The serviceability attribute gives the type of assistance provided by the operation. Examples of values for this attribute are “cash”, “in-kind”, “informational”, and “educational”. In e-government, TANF (Temporary Assistance for Needy Families) is an example of a welfare program that provides financial support to needy families. A food stamp is an example of in-kind support available to indigent citizens. Returning the list of senior activity centers is an example of informational support. Enhancing the communication skills of visually impaired people is an example of educational support. Other types of support may be mentioned by assigning the value “other” to this attribute. A generic operation may be supported via one or several provider types. A provider may be a governmental (“federal”, “state”, “local”, and “tribal”) or non-governmental (“non-profit” and “business”) agency. Each generic operation performs a certain functionality for a specific area of interest. This is specified through the purpose and category attributes, respectively. An operation inherits the category of the community in which it is defined. Hence, all operations that belong to the same community share the same category. The purpose attribute describes the goal of the operation. It is defined by four attributes: function, synonyms, specialization, and overlapping. The function describes the business functionality offered by the operation. Examples of functions are “eligibility”, “registration”, and “mentoring”. The synonyms and specialization attributes work as they do for categories. Overlapping contains the list of purposes that overlap with the purpose of the current operation. Let $op_{ik}$ and $op_{jl}$ be two generic operations. We say that $purpose_{ik}$ overlaps with $purpose_{jl}$ if composing $op_{ik}$ with $op_{jl}$ is “meaningful”. By meaningful, we mean that the composition provides a value-added benefit (in terms of purposes). As for categories, it is the responsibility of the community providers to identify related purposes and assign them to the overlapping attribute.
2.2.2.3 Dynamic Semantics
Dynamic semantics allow the description of attributes related to the execution of generic operations. Those attributes may relate the execution of an operation $op_{ik}$ to the execution of other operations (inter-operation attributes) or describe features inherent to the execution of $op_{ik}$ (intra-operation attributes). Inter-operation attributes define the execution order of $op_{ik}$ with respect to other operations. We identify two inter-operation attributes: pre-operation and post-operation, which give the lists of operations whose execution precedes and follows $op_{ik}$'s execution, respectively. The intra-operation attribute, also called behavior, defines the internal business logic of $op_{ik}$. The definition of the aforementioned attributes is based on the notion of execution state described below.
Operation Execution States
The execution of an operation $op_{ik}$ generally goes through four major observable states: Ready, Start, Active, and End. The execution of $op_{ik}$ is in the Ready state if the request for executing $op_{ik}$ has not been made yet. The Start state means that $op_{ik}$'s execution has been initiated. It refers to one of the following events: (i) an input message is sent to $op_{ik}$ if $op_{ik}$'s mode is In/Out; or (ii) an output message has been sent from $op_{ik}$ if $op_{ik}$'s mode is Out/In. We say that $op_{ik}$ is in the Active state if $op_{ik}$ has already been initiated and the corresponding request is being processed. After processing the request, the operation reaches the End state, during which results are returned. It refers to one of the following events: (i) an output message is sent to the client if $op_{ik}$'s mode is In/Out; or (ii) an input message is received from the server if $op_{ik}$'s mode is Out/In. We define a precedence relationship between states, denoted $\rightarrow_t$, as follows: $S_1 \rightarrow_t S_2$ if $S_1$ occurs before $S_2$. The execution states are totally ordered according to $\rightarrow_t$ as follows:
$$Ready \rightarrow_t Start \rightarrow_t Active \rightarrow_t End$$
Pre-Operations
Executing a Web service operation may require going through a pre-defined process that involves the execution of several operations called pre-operations. This pre-defined process is dictated by government regulations or the internal business process of the Web service. For example, citizens must first register with an insurance company via the checkRegistration operation before obtaining a vehicle. Pre-operations may also reflect the business logic of the Web service. Let us consider two generic operations $op_{ik}$ and $op_{jl}$ that belong to the same or different communities. We say that $op_{ik}$ is a pre-operation of $op_{jl}$ if the invocation of $op_{jl}$ is preceded by the execution of $op_{ik}$. We call $op_{ik}$ and $op_{jl}$ the source and target operations, respectively. An operation may have several pre-operations. It may also be the source (i.e., pre-operation) of several operations. We give below a formal definition of the pre-operation relationship.
Definition 2.1 - Pre-operation. Let $op_{ik}$ and $op_{jl}$ be two generic operations. $op_{ik}$ is a pre-operation of $op_{jl}$ if $End(op_{ik}) \rightarrow_t Ready(op_{jl})$.
The definition of a pre-operation relationship includes a source operation $op_{ik}$, a target operation $op_{jl}$, and the condition and mandatory attributes. The condition is a predicate over $op_{ik}$'s input and output parameters. $op_{jl}$ can be invoked only if all its pre-operations have reached their End state and their conditions are true. If no condition is specified for a given pre-operation, then the default value is “true”. The mandatory attribute takes boolean values and specifies whether executing the source operation is mandatory or optional. If this attribute is true, then the relationship between $op_{ik}$ and $op_{jl}$ is obligatory; otherwise, it is recommended. A sketch of the execution states and this invocation rule follows.
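The sketch below encodes the execution states and the pre-operation invocation rule; the PreOperation record is an illustrative structure of our own, not part of the ontology itself:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List

class State(IntEnum):
    """Observable execution states, totally ordered by the precedence ->_t."""
    READY = 0
    START = 1
    ACTIVE = 2
    END = 3

@dataclass
class PreOperation:
    state: State                   # current state of the source operation
    condition: Callable[[], bool]  # predicate over the source's parameters
    mandatory: bool                # obligatory vs. recommended relationship

def may_invoke(pre_operations: List[PreOperation]) -> bool:
    """The target can be invoked only if every mandatory pre-operation has
    reached End and its condition (default 'true') holds."""
    return all(p.state is State.END and p.condition()
               for p in pre_operations if p.mandatory)
```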
Post-Operations
The execution of a given operation may trigger the invocation of other operations called post-operations. For example, a citizen that registers successfully for a car license tag (registerCar) is required to register the car with his/her county of residence (registerCounty). We say that $op_{ik}$ is a post-operation of $op_{jl}$ if the termination of $op_{jl}$ precedes the invocation of $op_{ik}$. We call $op_{jl}$ and $op_{ik}$ the source and target operations, respectively. An operation may have several post-operations. It may also be the target (i.e., post-operation) of several operations. Note that if $op_{ik}$ is a pre-operation of $op_{jl}$, then $op_{jl}$ is not necessarily a post-operation of $op_{ik}$.
Definition 2.2 - Post-operation. Let $op_{ik}$ and $op_{jl}$ be two generic operations. $op_{ik}$ is a post-operation of $op_{jl}$ if $End(op_{jl}) \rightarrow_t Ready(op_{ik})$.
As for pre-operations, we associate a condition and a mandatory attribute with each post-operation relationship. A target operation enters the initiation state if at least one of its source operations has reached its End state and the corresponding condition is true. A post-operation may also be mandatory or optional.
2.2.2.4 Qualitative Properties
Multiple Web services that belong to the same community may import the same generic operation. It is hence important to define a set of attributes that help select the best Web service supporting a given functionality. For this purpose, we define a Quality of Operation (QoP) model based on a set of qualitative attributes that are transversal to all operations, such as cost and response time. The international quality standard ISO 8402 describes quality as “the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs” [99]. We define QoP as a set of non-functional attributes that may impact the quality of the operations imported by a Web service. There are many QoP attributes important to Web service operations. We organize them into three groups of quantifiable attributes based on the type of measurement performed by each attribute: run-time, business, and security. Note that this list is not exhaustive by any means, and is presented only for explanatory purposes.
Run-time Attributes - These attributes enable the measurement of properties that are related to the execution of an operation $op_{ik}$. We identify three run-time attributes: response time, reliability, and availability. The response time measures the expected delay in seconds between the moment when $op_{ik}$ enters the Start state (i.e., $op_{ik}$ is initiated) and the moment when it reaches the End state (i.e., $op_{ik}$ gets or sends the results). $Time(op_{ik})$ is computed using the expression $Time_{process}(op_{ik}) + Time_{results}(op_{ik})$. This means that the response time includes the time to process the operation ($Time_{process}$) and the time to transmit or receive the results ($Time_{results}$). The reliability of $op_{ik}$ is the ability of the operation to be executed within the maximum expected time frame. $Reliability(op_{ik})$ is computed based
on historical data about previous invocations of the operation, using the expression $N_{success}(op_{ik})/N_{invoked}(op_{ik})$, where $N_{success}(op_{ik})$ is the number of times the operation has been successfully executed within the maximum expected time frame and $N_{invoked}(op_{ik})$ is the total number of invocations. The availability is the probability that the operation is accessible. $Availability(op_{ik})$ is measured by the expression $UpTime(op_{ik})/TotalTime(op_{ik})$, where UpTime is the time $op_{ik}$ was accessible during the total measurement time TotalTime. These run-time metrics are sketched in code below.
Business Attributes - These attributes allow the assessment of an operation $op_{ik}$ from a business perspective. We identify two business attributes: cost and regulatory. The cost gives the dollar amount required to execute $op_{ik}$. The regulatory property is a measure of how well $op_{ik}$ is aligned with government regulations. $Regulatory(op_{ik})$ is a value within a range (e.g., between 1 and 10). The lowest value refers to an operation that is highly compliant with government regulations.
Security Attributes - These attributes describe whether the operation $op_{ik}$ is compliant with security requirements. Indeed, service providers collect, store, process, and share information about millions of users who have different preferences regarding the security of their information. We identify four properties related to security and privacy: encryption, authentication, non-repudiation, and confidentiality. Encryption is a boolean that indicates whether $op_{ik}$'s messages are securely exchanged (using encryption techniques) between servers and clients. Authentication is a boolean that states whether $op_{ik}$'s consumers (users and other services) are authenticated (e.g., through passwords). Non-repudiation is a boolean that specifies whether participants (consumers and providers) can deny requesting or delivering the service after the fact. The confidentiality attribute indicates which parties are authorized to access the operation's input and output parameters. $Confidentiality(op_{ik})$ contains $op_{ik}$'s input and output parameters that should not be divulged to external entities (i.e., other than the service provider). If a parameter does not belong to $Confidentiality(op_{ik})$, then no confidentiality constraint is specified on that parameter. Assume that $Confidentiality(op_{ik}) = \{SSN, salary\}$, where SSN and salary are two of $op_{ik}$'s input parameters. The content of this attribute states that those two parameters are kept private by $op_{ik}$'s provider.
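The run-time attributes translate directly into code; this is a sketch, and the guards against empty histories are our own addition:

```python
def response_time(time_process: float, time_results: float) -> float:
    """Time(op_ik) = Time_process(op_ik) + Time_results(op_ik), in seconds."""
    return time_process + time_results

def reliability(n_success: int, n_invoked: int) -> float:
    """Reliability(op_ik) = N_success(op_ik) / N_invoked(op_ik)."""
    return n_success / n_invoked if n_invoked else 0.0

def availability(up_time: float, total_time: float) -> float:
    """Availability(op_ik) = UpTime(op_ik) / TotalTime(op_ik)."""
    return up_time / total_time if total_time else 0.0
```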
2.2.3 Interactions
In RATEWeb, a community is itself a service that is created, advertised, discovered, and invoked as a regular Web service. The providers of a community assign values to the concepts of the community ontology (Figure 2.2, step a). Each concept is defined by a set of attributes. Communities are published in a registry (e.g., UDDI) so that they can be discovered by service providers (Figure 2.2, step b). Service providers (e.g., a car broker provider) identify the community of interest (Figure 2.2, step c) and register their services with it (Figure 2.2, step d). During the
registration of a service WS with a community $C_i$, the service provider specifies the concepts of $C_i$ that are inherited by WS. For example, WS may inherit only some of the operations defined in $C_i$. Admitting a service to a community is subject to the admission rules specified in the community's reputation policy. Moreover, as described in Section 2.1, a provider makes the service available to consumers by publishing the service's description in one or more service registries; two instances of the same service published with different identities are considered two distinct services. Similarly, a Web service may belong to different communities. For example, a composite service (WS4 in Figure 2.2) may outsource operations that have different domains of interest (e.g., auto insurance and finance in our scenario). Since these operations belong to two different communities, the composite service is registered with both the auto insurance and financing communities ($C_1$ and $C_2$ in Figure 2.2). When a service is a member of multiple communities, it must simultaneously fulfill the reputation policies of all of those communities. Service consumers access service registries to discover the communities and providers of their choice. The consumer's query consists of the operations it wants to invoke. The list of operations is matched with different communities' capabilities. It may be the case that the required operations are matched to several different communities. Each community in turn searches its directory for the list of providers that have registered their operations. It then sends the description of these services to the consumer. The registered services may be invoked to answer the consumer's request. We assume that communities and registries are neutral, i.e., have an impartial policy vis-à-vis the providers of the different services. The service consumer then selects the best service from the list provided. In our model, this selection is based on the reputation of each individual service from the list. We assume that when the consumer queries the community for the potential providers' list, then apart from the “normal” details, the returned description also contains a list of past service consumers that possess feedback ratings for the provider being queried. The community thus only acts as a directory of raters and not as a centralized repository of ratings (ratings are kept local with the raters). The consumer may contact these peer consumers to gather the feedback ratings, and in turn assess the providers' reputations. Service consumers then invoke a Web service through one of its listed operations. The consumer provides appropriate values for the operation's parameters, and the service returns an output value as a result of the invocation. At the end of the interaction, the service consumer rates the provider according to some pre-defined quality attributes. The service consumer also informs the community provider that it possesses the feedback ratings for the provider.
These service ratings are used to compute provider reputations accordingly. Note that RATEWeb does not depend on the proposed community-based ratings collection model, which may be replaced with
other models, as in [134, 98, 20, 111]. In the following, we list the main properties of our model that are characteristic of most service-oriented environments, and hence require special solutions to the reputation assessment problem; a sketch of the rating-collection flow follows the list.
• Private Interactions: The transaction details between a service provider and a consumer are hidden from the rest of the community. Only the interacting parties themselves can report the outcome of the transaction (satisfactory vs. unsatisfactory).
• No Centralized/Shared Repository: There is no shared repository where transaction outcome details can be stored. Each participant stores the information itself.
• No Learning Behavior: Each Web service follows its own defection/cooperation strategy, which is geared towards providing the maximum benefit, and may change over time. The services cannot learn a new strategy from their peers.
• Population Dynamics: Services may enter or leave the system at will. The volatility of the environment implies that traditional security technologies such as digital certificates, encryption, etc. prove inadequate for expressing and capturing the uncertainty of trust.
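The rating-collection flow of Section 2.2.3 can be sketched as follows; class and method names are illustrative, and the plain average is only a placeholder for a proper aggregation scheme:

```python
from collections import defaultdict
from statistics import mean

class CommunityDirectory:
    """The community acts only as a directory of raters; ratings stay local."""
    def __init__(self):
        self._raters = defaultdict(list)

    def register_rater(self, provider_id, consumer):
        self._raters[provider_id].append(consumer)

    def raters_for(self, provider_id):
        return self._raters[provider_id]

class Consumer:
    def __init__(self):
        self._local_ratings = {}  # no centralized/shared repository

    def rate(self, provider_id, rating, directory):
        self._local_ratings[provider_id] = rating
        directory.register_rater(provider_id, self)  # announce possession only

    def rating_for(self, provider_id):
        return self._local_ratings.get(provider_id)

def assess_reputation(directory, provider_id):
    """Gather feedback peer-to-peer from past consumers and aggregate it."""
    ratings = [c.rating_for(provider_id) for c in directory.raters_for(provider_id)]
    ratings = [r for r in ratings if r is not None]
    return mean(ratings) if ratings else None
```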
Chapter 3
Reputation Information Creation
The reputation information is generated after the completion of a transaction between two entities. As mentioned in previous chapters, the entity that provides a specific functionality in the transaction is termed the service provider, and the entity requesting that functionality is known as the service consumer in a service-oriented environment. Thus, reputation information is created when a consumer rates the services provided by the service provider according to some pre-defined scale. In this chapter, we provide the details about the various facets of reputation information creation on the service Web and the challenges therein. One of the objectives of our research is to design a new approach for reputation management that avoids the drawbacks of third-party reputation systems based on a centralized reputation manager, and provides a reliable assessment of a service's behavior. In our daily lives, we evaluate the behavior of a subject over several attributes and assess the corresponding reputation according to the perceived quality of each attribute. It is the aggregate of all these quality values that determines the reputation of the subject. Similarly, in our proposed approach we view the reputation of a Web service as a reflection of its quality (QoWS).
3.1 Parameters Reflecting the Quality of Web Services
The quality of service (QoS) is defined as “a set of qualities related to the collective behavior of one or more objects” [19]. In other words, QoS is “a set of quantitative and qualitative characteristics of a system necessary to achieve the required functionality of an application” [135]. We adopt this definition of QoS and extend its application to the domain of Web services with related constraints, similar to prior works such as [88, 75, 93, 78, 49]. The quality of Web service (QoWS) is a mapping between a set of quality parameters and a set of values or ranges of values. There are two types of QoWS parameters: non-exact and exact parameters. Non-exact QoWS parameters are those to which only approximate and often subjective values may be assigned. Examples
include security, privacy preservation, and scalability. Exact QoWS parameters are those that may be measured and to which numerical values may be assigned. Examples include a service's response time, invocation fee, availability, accessibility, reliability, etc. [75, 88]. A Web service's response time measures the delay between the moment when a request is sent and the moment when the service is rendered. The invocation fee is the cost that a service requester pays to the service provider to use the service. A service's availability represents the probability that a service is operating at any given moment and is available to perform its functions. Accessibility is the degree to which a Web service is capable of serving a request. It may be measured by the ratio between the number of requests being sent to the Web service and the number of requests that are effectively served. Note that availability is different from accessibility; a service may be available but, due to its overload, not accessible to some requesters. A service's reliability is the probability that a request is served within a maximum expected time frame defined by the Web service. The number of failures per month or year could be a good measure of reliability. In contrast to availability, reliability is defined in terms of a time interval instead of an instant in time. A highly reliable service is one that will most likely continue to work without interruption during a relatively long period of time. This is a subtle but important difference when compared to availability. If a service is down randomly for one millisecond every hour, it has an availability of over 99.9 percent, but is still highly unreliable. Similarly, a service that never fails but is shut down for two weeks every August has a high reliability but only 96 percent availability [38]. The list of QoWS parameters mentioned above is not exhaustive, and some readers may have different views regarding the definition/applicability of a few parameters. In defining the RATEWeb system, we assume that numerical values can be assigned for each quality parameter that is included in the model [88, 75, 93, 78, 49].
In the context of service-oriented environments, three types of QoWS exist: provider-promised QoWS ($QoWS_p$), consumer-expected QoWS ($QoWS_r$), and service-delivered QoWS ($QoWS_d$). The $QoWS_p$ values are those that are advertised by the service provider through the service registry. Several models have been proposed over recent years for this purpose, ranging from extending the service registries with QoWS information to agent-based frameworks that use ontologies [122, 111, 73, 75, 93, 49]. $QoWS_r$ represents the preference of the service consumer for each quality parameter. $QoWS_d$ represents the actual values that are mapped to the different quality parameters after the consumer interacts with the provider. In other words, $QoWS_d$ represents the consumer's perceived or assigned values for each quality parameter. For example, in Figure 1.2 the Credit History service (CH) may advertise that it is able to provide credit reports of individuals for the “last ten years” (i.e., $QoWS_p$: credit record of last ten years). A Car Broker service (CB) may need to retrieve the credit history of an individual for only the “last seven years” (i.e., $QoWS_r$: credit record of last seven years). Since CH's $QoWS_p$ offer is available along with the service description, CB can see that CH can fulfill its requirement ($QoWS_r$) without actually invoking the service.
Assume that when CB does interact with CH, it finds that CH only delivered the “credit record of last three years” (i.e., $QoWS_d$: credit record of last three years). Clearly this is unacceptable
for CB, and it may not have interacted with CH had it known the true estimate of CH's $QoWS_d$. The reputation of CH provides this estimate. Raters can report their experiences of how much CH's $QoWS_p$ and $QoWS_d$ differed. If this difference is not large, CH is deemed trustworthy, as it delivered what it promised. In contrast, a large difference between $QoWS_p$ and $QoWS_d$ means CH did not deliver according to its promise, and hence it is untrustworthy. $QoWS_d$ is an approximation of the actual quality of the parameters. Many $QoWS_d$ parameters will depend on various factors, such as network traffic, communication infrastructures, etc. Consequently, different consumers may perceive the quality differently even when the provider behaves consistently for all consumers. Thus, an exact estimation of such parameters may not be a realistic assumption. This is in accordance with our real-life reputation gathering methods. For example, to evaluate a subject's reputation, many parameters are assigned rough and subjective quality values instead of exact ones. The human individual then assimilates the varied approximate experiences and assesses a reputation value. Although human intelligence is unmatched and human reputation assessment is a complex process, we intend to emulate the human ability of using approximate quality values for reputation assessment. We assume that consumers agree on the ranges, types, etc. of the values they should assign for each parameter. For instance, the ontologies proposed in [25, 93] can be used for this purpose. How different values are assigned to the $QoWS_d$ parameters is out of the scope of our work. Our focus is only on using these values in the context of reputation.
Let $S$ and $T$ be the set of provider Web services and the set of service consumers, respectively. Let $\Phi$ be the universal set of quality parameters. $\Phi$ may be represented as a $p$-element vector $(\phi_1, \ldots, \phi_p)$, where $\phi_k$ is the $k$th quality parameter. Each Web service $s_j \in S$ advertises a promised quality $QoWS_p(s_j)$ (e.g., by appending the service's $QoWS_p$ to its description in the service registry). $QoWS_p(s_j)$ assigns values or ranges of values to each quality parameter $\phi_k$. When a service requester $t_x \in T$ invokes the service $s_j$, each quality parameter $\phi_k$ in $\Phi$ gets assigned a delivered quality value $\phi_k^{xj}$ (post-transaction completion). For this invocation of service $s_j$, the vector $QoWS_d(s_j, t_x) = \{\phi_1^{xj}, \ldots, \phi_p^{xj}\}$ is called the delivered quality of the Web service. Since the multiple quality parameters used may have varied measuring scales, types, etc., we need to first normalize each $\phi_k$ for accurate quality comparisons. In the following, we show how this can be achieved. For clarity, we will use the notation $\phi_k$ instead of $\phi_k^{xj}$ for the $k$th quality parameter.
Different approaches exist for normalizing attributes in decision-making problems. The widely used Simple Additive Weighting method reaches ranking results very close to those of more sophisticated methods [149]. The QoWS values are scaled to lie in the interval [0, 1]. Different quality attributes have different optimal values in the given interval. For example, if cost is considered as a quality attribute, its optimal value is close to the minimum (0). Alternately, reliability and availability have optimal values nearing the maximum (1). We denote the quality attributes that have optimal values close to 0 by $\phi_k^-$ and the ones that have optimal values close to 1 by $\phi_k^+$. The values are then normalized as:
$$norm(\phi_k^-) = \frac{\phi_k^{max} - \phi_k}{\phi_k^{max} - \phi_k^{min}} \qquad (3.1)$$

$$norm(\phi_k^+) = \frac{\phi_k - \phi_k^{min}}{\phi_k^{max} - \phi_k^{min}} \qquad (3.2)$$
where φ_k^{max} is the maximum value for the kth quality attribute and φ_k^{min} represents the corresponding minimum value.
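To make the normalization concrete, the following is a minimal Python sketch of the SAW-style scaling of Equations 3.1 and 3.2. The function name, the `negative` flag marking cost-like attributes, and the example values are illustrative assumptions, not part of the model.

def normalize(values, negative=False):
    """Scale raw quality values into [0, 1] using Simple Additive Weighting.

    negative=True marks attributes whose optimal value is the minimum
    (e.g., cost); negative=False marks attributes whose optimal value is
    the maximum (e.g., reliability, availability).
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:                      # degenerate case: constant attribute
        return [1.0 for _ in values]
    if negative:                            # Equation 3.1
        return [(v_max - v) / (v_max - v_min) for v in values]
    return [(v - v_min) / (v_max - v_min) for v in values]   # Equation 3.2

# Example: cost is a "negative" attribute, availability a "positive" one.
print(normalize([10.0, 25.0, 40.0], negative=True))   # [1.0, 0.5, 0.0]
print(normalize([0.90, 0.95, 0.99]))                  # [0.0, ~0.56, 1.0]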
3.2 Reputation Bootstrapping

Reputation is a social concept, and its applicability in developing communities or maintaining relationships in social networks has been thoroughly studied [109, 151, 91]. The computer science literature extends this study of reputation to theoretical areas and practical applications, as a means to establish trust [34]. However, most of these works have focused on solutions for reputation storage, collection, and aggregation, incentive-based schemes to encourage feedback reporting, etc. Little attention has been given to the bootstrapping problem, and the majority of the proposed solutions assume a "running system" where reputations already exist [7]. We focus on reputation bootstrapping, and assume that existing solutions for the other facets of reputation management mentioned above are adequate. Comprehensive literature reviews are available in [91, 141, 7].

In what follows, we give a brief overview of the existing solutions for reputation bootstrapping. Since social networks, agent-based communities, auction portals, and P2P systems employ similar and usually overlapping strategies, we provide a generalized discussion of the solutions. The approaches that consider the bootstrapping problem often adopt solutions that may not be fair to all system participants. For instance, a popular approach is based on assigning neutral or default reputation values to newly deployed participants, or newcomers [91]. This approach has the disadvantage that it can favor either existing participants or newcomers. If the initial reputation is set high, existing participants are disadvantaged, as the newcomer gets preference over existing participants who may have worked hard to attain their reputation. This encourages malicious providers to deploy new identities periodically for "white-washing" their bad reputation record. Thus, [156] states that "punishing," i.e., assigning a low initial reputation, is the best alternative. However, existing participants are privileged if low initial values are assigned, as a newcomer may not be able to win a consumer's favor with its low reputation [92, 141].

To the best of our knowledge, only a couple of previous works have attempted to solve the bootstrapping problem without using a default value. For example, the endorsement principle proposed in [92] states that a participant with unknown reputation may acquire reputation through the endorsement of other trusted participants (ones with high credibility), and the endorsee's actions directly affect the endorser's
credibility. However, this technique may prove problematic, as it would not be easy for a newcomer to get itself endorsed by an existing participant. The technique proposed in [44] aggregates all transaction information on first-time interactions with newcomers. The aggregate information enables a consumer to calculate the probability of being cheated by the next newcomer. This adapts well to the current rate of white-washing in the system. Our approach is inspired by the technique of [44]. However, our approach differs from this work in that we do not assume that peer services can monitor each other's interactions. We believe that such a simplifying assumption is unrealistic for the service Web. The expanse of the service Web and privacy considerations are major impediments in this regard.

A Web service exposes an interface describing a collection of operations that are network-accessible through standardized XML messaging [106]. We propose to extend the traditional (publish-discover-access) Web service model, and introduce the concept of community to aid in the bootstrapping process. As mentioned earlier, a community is a "container" that groups together Web services related to a specific area of interest (e.g., auto makers, car dealers). Communities provide descriptions of desired services (e.g., providing interfaces for services) without referring to any actual service. Ontologies serve as templates for describing communities and Web services. An ontology typically consists of a hierarchical description of important concepts in a domain, along with descriptions of their properties. The notion of concept in ontologies is similar to the notion of class in object-oriented programming. Each concept c_i has an associated set of properties P_i = {p_i1, ..., p_im} that describe the different features of the class. An ontology relates classes to each other through ontology relationships; examples of relationships include "subclassOf" and "superclassOf". Communities are defined by community providers as instances of the community ontology (i.e., they assign values to the concepts of the ontology). Community providers are generally groups of government agencies, non-profit organizations, and businesses that share a common domain of interest. In our model, a community is itself a service that is created, advertised, discovered, and invoked as a regular Web service, so that it can be discovered by service providers. Service providers identify the community of interest and register their services with it. We use the Web Ontology Language (OWL) for describing the proposed ontology; however, other Web ontology standards could also be used. Further details on the use of ontologies for describing communities can be found in [15].

In our model, Web services in a particular domain (registered with the same community) may aid each other in assessing the initial reputation of a newcomer [87]. We propose two reputation bootstrapping approaches. The first approach relies on cooperation among services, which compute the reputation of newcomers in a P2P manner. The second approach functions under a "super-peer" topology, where the community provider is responsible for assigning the newcomer's reputation. Details of the two approaches follow.
3.2.1 Option I: Adapting Initial Reputation to Majority Behavior

We propose a reputation bootstrapping technique that adapts according to the behavior of the majority of services. Our approach is inspired by the techniques presented in [45] for P2P systems. However, our approach differs from [45] in that we do not assume that peer services can monitor each other's interactions. We believe that such a simplifying assumption is unrealistic for the service Web. The expanse of the service Web and privacy considerations are major impediments in this regard. Thus, unlike [45], our proposed model does not support a "shared" service interaction history, and the sharing of interaction histories is left at the discretion of the participating services. Also, we do not assume reciprocative actions for Web services (they may engage in one-off transactions). Moreover, we provide options to bootstrap the reputation of newcomers in cases where no service is willing to share its interaction history.

In Figure 3.1, we provide the stepwise details of our proposed framework. A newcomer registers with a community to offer its services (Step 1 in Figure 3.1). After discovering the newcomer, the consumer asks the existing services for the newcomer's reputation (Step 2 in Figure 3.1). Since no service has yet interacted with the newcomer, no reputation record is available (Step 3 in Figure 3.1). At this point, the consumer can bootstrap the newcomer's reputation to decide whether or not to interact with it.
Fig. 3.1 Reputation Bootstrapping Using an Adaptive Approach
Under the proposed mechanism, the consumer can bootstrap the newcomer's reputation according to the rate of maliciousness in the community. The rate of maliciousness (denoted ℜ) is defined as the ratio of the number of transactions in which the providers defect to the total number of transactions. Thus, ℜ lies in the range [0, 1].
A provider's "defection" is measured after each individual transaction by the service consumer (denoted rater). Multiple interactions with the same provider count as different transactions. If the provider performs satisfactorily in the transaction, the rater labels the transaction as "acceptable"; otherwise, the transaction is labeled as "defective." Thus, defection (denoted D) can be represented as a binary value. Since each provider may be rated along several quality attributes, the aggregated value of all the quality attributes can be used to estimate the value of D. For instance, if the aggregated quality value is below a certain threshold, D is true; otherwise it is false.

Since service raters can differ in their total number of transactions and in the number of defective transactions experienced, we can expect a variation in the value of ℜ across different service raters. In essence, the value of ℜ depends on each rater's personal experience, and on the manner in which it estimates D after each transaction. The basic idea of the proposed scheme is for the consumer to assign a high initial reputation value when ℜ is low, and a low reputation value when ℜ is high. This allows the consumer to adapt to the state of the system (i.e., defective vs. acceptable transactions). Formally, for each service consumer i, ℜ is defined as:

$$\Re_i = \frac{D_i}{T_i} \qquad (3.3)$$

where D_i is the number of transactions in which providers have defected for consumer i, and T_i is the total number of transactions that consumer i has undertaken.

Note that in defining ℜ, we use a rater's complete transaction record, instead of using only the transactions conducted with newcomers. It may be argued that ℜ should be measured as the ratio of the number of defective transactions with newcomers to the total number of transactions conducted with newcomers. However, in our opinion this may not produce ideal results, as it invites white-washing. For instance, newcomers can stay honest for the first transaction, and act maliciously thereafter to gain undue advantage (even if only for a couple of transactions). Then they can leave the system, and join later with a new identity. Since the first interaction is labeled satisfactory (for all newcomers), ℜ stays low in the system, ensuring a high reputation-bootstrap value for the newcomers. In contrast, estimating ℜ over all the rater's transactions can discourage white-washing. Since dishonest behavior is a prerequisite of white-washing, estimating ℜ over all the rater's transactions ensures that ℜ increases with every defective transaction. This in turn brings the reputation-bootstrap value for a newcomer down.

Moreover, since the severity of defective transactions varies, the service rater can assign relative weights to the transactions. For example, in a "high impact" transaction where the consumer suffers a huge loss as a consequence of the provider's defection, the consumer may count two (or more) defective transactions instead of one (while increasing the T_i count by only one), to increase ℜ_i. The assignment of such weights is left at the discretion of the service rater.

Obtaining D_i and T_i from the raters poses a privacy risk, as the total transaction volume, transaction identification (defective or not, and with which provider), etc. may reveal sensitive information about the raters. Thus, in our model only ℜ_i is shared between the different consumers, instead of D_i and T_i.
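The following is a minimal Python sketch, under the model just described, of how a rater might maintain a weighted ℜ_i and how a consumer might aggregate the shared ratios into a bootstrap value. The function names, the (defected, weight) record layout, and the use of credibility scores as aggregation weights are illustrative assumptions.

def rate_of_maliciousness(transactions):
    """Compute a rater's R_i = D_i / T_i over its complete record.

    transactions is a list of (defected, weight) pairs; weight > 1 lets the
    rater count a high-impact defection as several defective transactions
    while T_i grows by only one.
    """
    d = sum(weight for defected, weight in transactions if defected)
    t = len(transactions)
    return min(d / t, 1.0) if t else 0.0

def bootstrap_reputation(shared_ratios, credibility, max_rep=1.0):
    """Aggregate the R_i values shared by raters into an initial reputation.

    High system-wide maliciousness yields a low bootstrap value and vice
    versa; credibility maps each rater to its weight in [0, 1].
    """
    total = sum(credibility[r] for r in shared_ratios)
    if total == 0:
        return None                      # fall back to the community provider
    avg_r = sum(ratio * credibility[r]
                for r, ratio in shared_ratios.items()) / total
    return max_rep * (1.0 - avg_r)       # low R -> high initial reputation

# Example: two raters share their ratios; rater "a" is more credible.
ratios = {"a": rate_of_maliciousness([(False, 1), (True, 2), (False, 1)]),
          "b": 0.1}
print(bootstrap_reputation(ratios, {"a": 0.9, "b": 0.5}))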
We also allow the consumers to specify the time period over which ℜ is to be calculated. This allows a consumer to adapt to the "current trend" of the system (depending on how "current" is defined). The consumer can then aggregate the different ratios collected from the raters. A simple solution is to compute a weighted average of all ℜ_i's, as in the sketch above. The weights are decided according to the credibility of the contributing rater, which may be calculated using techniques similar to the ones presented in [91, 7, 64]. This is shown as Step 4 in Figure 3.1, where the credible raters (marked with a "C") are consulted. In Step 5, all ℜ_i's are aggregated, and the consumer decides whether to interact with the newcomer in Step 6a.

If sufficient ratings are not submitted in Step 4, the consumer may turn to the community provider for bootstrapping the newcomer's reputation (Step 6b in Figure 3.1). The community provider may provide the service for free, or charge a nominal fee (Step 7a in Figure 3.1). The community provider can employ a number of techniques in assigning the initial reputation value (Step 7b in Figure 3.1), which will be discussed shortly. The initial reputation value is then communicated to the consumer in Step 8. Based on this, the consumer can make its decision to interact with the newcomer in Step 9. Thus, even in the absence of sufficient ℜ_i values, the consumer is still able to bootstrap the newcomer's reputation. Details of Step 7b follow.
3.2.2 Option II: Assigned Initial Reputation

The community provider may assign a reputation value to each newcomer registering with the community and convey the assigned value to any prospective service consumer. The community provider may employ one of two ways of bootstrapping the newcomer's reputation: assign a default value, or evaluate the newcomer (Cases I and II in Figure 3.2, respectively).

3.2.2.1 Option II - Case I: Default Initial Reputation

We extend the model presented in Figure 3.1 to show how a default value may be assigned as the newcomer's reputation. Upon registration, the newcomer may present some credentials that enable it to buy initial reputation from the community provider. The newcomer may belong to the same service provider group as an existing reputable service. In this case, presenting the authenticated credentials of the existing service may guarantee an initial reputation equal to that of the existing service. This is shown in Steps a, b, and c in Figure 3.2-Case I. Endorsement techniques [92] can also be incorporated in this strategy, where newcomers may present the credentials of any service that is willing to endorse them. Alternatively, an average of all providers' reputations may be assigned. In [44], it was shown that such an averaging
Fig. 3.2 Reputation Bootstrapping through Assignment
technique provides the best results in terms of fairness and accuracy. Without delving into much detail, we also adopt an average model in this case.

3.2.2.2 Option II - Case II: Initial Reputation Evaluation

In the second case, the community provider may be asked to evaluate the newcomer. The evaluation period is determined by the community provider, and the newcomer has no knowledge of the time and number of transactions conducted during the evaluation period. It may happen that normal transactions by other consumers (who may have ascertained the newcomer's reputation through Option I) are also conducted during the evaluation period. Services with high credibilities (known as elders) are asked to evaluate the newcomer. The feedbacks are weighed according to the elders' credibilities, which are
assessed using separate techniques [64]. We assume that some form of incentive (monetary or reputation) is present for services to act as evaluators. For instance, during registration (Step 1 in Figure 3.2-Case II), a newcomer may be asked to pay a monetary amount to cover such expenses (Step 2 in Figure 3.2-Case II). Similarly, reputation incentives may be provided as a means to increase the overall reputation of evaluators throughout the community. We assume that service evaluation is a voluntary process and that services are not penalized for declining the community provider's offer to act as evaluators (Step 3 in Figure 3.2-Case II). However, it is expected that with the mentioned incentive mechanisms in place, services will engage in the evaluation process. Our proposed scheme is inspired by the concept of "reviewers" in the academic world. For example, researchers are often asked by the US National Science Foundation (NSF) to act as referees (panelists) for evaluating different funding proposals. The evaluators are provided with nominal monetary and reputation (among their peers) incentives. Although researchers may decline the NSF offer, finding evaluators is almost always possible.

In most situations, the evaluators may be required to pay for the services of the newcomers. "Credit agencies" may aid in such situations, so that an evaluator does not suffer a monetary loss by acquiring a service it did not need in the first place. The evaluators send their "original" account details to the community provider (Step 4 in Figure 3.2-Case II). The community provider informs the credit agency of the evaluation transaction, and the agency is expected to respond with a "disposable" account number that is expected to last for only one transaction (Step 5 in Figure 3.2-Case II). This is analogous to the service started by American Express, where disposable account numbers were assigned to users wishing to engage in e-commerce transactions without disclosing their actual account details [104]. The community provider communicates the generated account number (acc. no.) to the chosen evaluator to cover the invocation cost (Step 6 in Figure 3.2-Case II). The evaluators collect the required data for the newcomer through interactions (Step 8 in Figure 3.2-Case II). It is expected that some Web service interactions may require the delivery of tangible products to the consumer (Step 8 in Figure 3.2-Case II). When a transaction is completed, the newcomer is informed by the credit agency about the "fake" charged account (Step 9a in Figure 3.2-Case II). Moreover, the evaluators are required to return the delivered product within a desired amount of time (since this is only an evaluation). Since evaluators are not actually charged (disposable accounts), it is in the best interest of the newcomer to accept the returned product (Step 10 in Figure 3.2-Case II). The delivery and return processes are carried out through a delivery agency, which is expected to function in an honest manner since it is not competing with either the newcomer or the evaluator. To ensure transparency, the delivery agency is required to inform the community provider about any product delivery or return (Steps 9a and 11 in Figure 3.2-Case II). This prevents both newcomers and evaluators from lying about the product delivery status. This is analogous to the services provided by delivery agencies such as UPS, FedEx, USPS, etc.
In cases where the evaluator defaults by not sending the product back within the desired amount of time, the community provider can charge the evaluator's original account and pay the newcomer. Since the newcomer has no knowledge of the authenticity of the account
prior to, or during, the transaction, it is expected to behave normally, i.e., the newcomer cannot "pretend" to act fairly in order to mislead the evaluation. Therefore, the evaluation mechanism provides fair and accurate results (at the expense of the evaluation time period).
Chapter 4
Reputation Assessment
In this chapter, we describe the assessment module of the RATEWeb framework: a Reputation Assessment framework for Trust Establishment among Web services. The focus is on providing a comprehensive solution for assessing the reputation of service providers in an accurate, reliable, and decentralized manner. Since reputation forms an integral part of service-oriented environments in relation to the dynamic selection of services, on-the-fly composition of value-added enterprises, and optimization of service tasks, we have chosen Web services as a representative domain. However, RATEWeb can be extended and used in other contexts and domains. The proposed framework takes into account the presence of malicious raters that may exhibit oscillating honest and dishonest behaviors.

Previous solutions for reputation assessment make simplifying assumptions that may not apply in a service-oriented environment. For example, [66] relies on pre-existing trusted parties; in [31] and [20], data needs to be distributed according to a certain statistical distribution; a common set of past providers is required in [146] for evaluating rater credibility; and in [95], human intervention is required, meaning the assessment process is not fully automated. Other similar solutions either do not consider all facets of reputation [1, 65, 116] or are focused primarily on efficiency/performance (rather than functionality) [36]. We develop a simple and holistic solution that provides an automated and adaptive reputation mechanism, whereby reputations are evaluated through a number of heuristics with different perspectives, providing a fair and accurate assessment [86].
4.1 Web Service Reputation

RATEWeb's reputation model is distributed in nature. In contrast to traditional third-party-based approaches for reputation management, no single entity is responsible for collecting, updating, and disseminating the reputation of Web services. Each service consumer records its own perceptions of the reputation of only the services it actually invokes. This perception is called personal evaluation. For each
service s_j that it has invoked, a service consumer t_x maintains a p-element vector PerEval^{xj} representing t_x's perception of s_j's behavior. In addition to all the quality attributes, PerEval^{xj} contains the time-stamp of the interaction between t_x and s_j. The time-stamp is digitally signed by s_j to make sure that t_x does not claim to have values for PerEval^{xj} without a legitimate interaction. Note that digitally signing the time-stamp does not ensure that t_x will report true values for QoWS_d. However, it does ensure the authenticity of the t_x : s_j interaction. For instance, in our running example from Figure 1.2, in the absence of a time-stamp, CB could provide the vector PerEval^{CB:LC} even when it has not interacted with LC. The issue of falsifying the experienced quality will be discussed in the upcoming sections. PerEval_k^{xj} is t_x's perception of service s_j's reputation with regard to the quality parameter φ_k.

Different strategies may be adopted for updating PerEval^{xj}. A simple one is a per-invocation update: upon an invocation of service s_j, the delivered quality QoWS_d is compared to service s_j's promised quality QoWS_p and, if necessary, a reputation updating algorithm is run to compute the new personal evaluation of service s_j. In essence, the personal evaluation reflects the QoWS performance of the provider in the consumer's view.

The personal evaluation PerEval^{xj} represents only consumer t_x's perception of the provider s_j's reputation. Other service consumers may differ or concur with t_x's observation of s_j. A service consumer that inquires about the reputation of a given service provider from its peers may get various differing personal evaluation "feedbacks." To get a correct assessment of the service provider's behavior, all the personal evaluations for s_j need to be aggregated. The aggregation of all personal evaluations to derive a single reputation value is defined as the service provider's assessed reputation in that consumer's view. The service consumers may employ different reputation aggregation techniques; therefore, the assessed reputation of the provider may be different at each consumer. In light of feedback-based reputation models, the personal evaluations (as calculated by different service consumers) can be considered as feedbacks, and the assessed reputation as the aggregation of those feedbacks. Note that the notion of assessed reputation as defined in our model differs from the definition of global reputation in that it is not consistent across all services, i.e., it is an aggregation of all personal evaluations in only consumer t_x's view.

Definition: The reputation of a Web service s_j ∈ S, Reputation(s_j), as viewed by a service consumer t_x that wants to invoke s_j, is the aggregation of all personal evaluations that other service consumers have submitted as ratings for service s_j.

Note that those service consumers which have not interacted with s_j do not contribute to s_j's reputation. Let L denote the set of service consumers which have interacted with s_j in the past and are willing to share their personal evaluations of s_j. We assume that L is not empty, i.e., some service willing to share information can be found. Thus, L ⊆ T with L ≠ ∅, and each service x in L has PerEval_j^x values for s_j. Then, the reputation of s_j, as viewed by a consumer x, is defined as:

$$Reputation(s_j) = \bigwedge_{x \in L} \left( PerEval_j^x \right) \qquad (4.1)$$
where ⋀ represents the aggregation function. It can be as simple as a union of personal evaluations whose output is a real number, or an elaborate process that considers a number of factors to assess a fairly accurate reputation value. An absolute limit on the maximum value (A_max) of Reputation(s_j) may also be placed by the service consumer to dilute the effects of indefinite reputation growth. Equation 4.1 provides a first approximation of how the assessed reputation may be calculated. However, the assessed reputation calculation involves various factors that need to be precisely defined and measured. In the following, we build upon this equation to calculate the assessed reputation score, and provide details of the various reputation evaluation metrics defined.
4.2 Reputation Evaluation Metrics

In RATEWeb, reputation management is a cooperative process in which various Web service consumers participate. Since service raters may differ in their provided reputation ratings, it is necessary to define mechanisms that can aid in deriving an accurate reputation value in the presence of differing feedbacks. We have attempted to define the evaluation metrics such that the reputation of Web services can be captured as accurately as possible. Previous works such as [146, 36, 17, 1, 115, 95] have defined similar metrics. However, none of the existing works have used metrics in such an extensive manner to capture the different facets of reputation, thereby affecting the overall accuracy of the models. RATEWeb is designed in accordance with real-world social network methodologies, which provide better accuracy as they mature, have the ability to evolve, and dynamically evaluate changing conditions [67]. RATEWeb's metrics are defined to capture most (if not all) aspects of social reputation. We believe that all of these factors are essential for the accurate assessment of a provider's reputation. The metrics are:

1. Rater Credibility: We enable the service consumers to base their decisions on the credibility of raters. A service consumer's credibility determines how much other service consumers may trust its reported ratings regarding the reputation of the Web services it has invoked. This allows us to differentiate between service trust and feedback trust. For instance, a service that does not have a high reputation as a provider (low service trust) may be a credible source (high feedback trust) when it comes to judging the behavior of other service providers, and vice versa. The importance of differentiating between service quality and rating quality has been studied before, and it has been shown that reputation models that do not differentiate offer little resistance to various reputation attacks [146].

2. Majority Rating: We provide a feedback-based reputation system in which service consumers can rate the different Web services. The assessed reputation of a service provider is not a mere aggregation but is evaluated on a majority basis.

3. Past Rating History: We allow the credibility scores of raters to be updated based on their past rating history.
4. Personal Experience for Credibility Evaluation: We consider the possibility of a rater defaulting, i.e., providing an incorrect feedback. The consumers can evaluate the honesty of the feedback ratings according to the deviation between their personal experience and the ratings reported by other service consumers (raters).

5. Personal Preferences: We provide a personalized reputation evaluation in which consumers can weigh the different QoWS attributes according to their own preferences.

6. Personal Experience for Reputation Assessment: We allow incorporating first-hand interaction data in calculating final reputation scores.

7. Temporal Sensitivity: We provide mechanisms to address the temporal sensitivity of ratings, where older ratings are given less weight than recent ones.

In the following, we define the above-mentioned evaluation metrics in detail. We also show how these metrics help in evaluating an accurate reputation score for Web services. Note that we use all the defined metrics in unison to evaluate provider reputations, not in isolation from one another.
4.2.1 Credibility of Raters

The foremost drawback of feedback-only based systems is that all ratings are assumed to be honest and unbiased. However, in the real world we clearly distinguish between the testimonies of our sources and weigh the "trusted" ones more than others [130]. A Web service that provides satisfactory service (in accordance with its promised quality QoWS_p) may get incorrect or false ratings from different evaluators due to various malicious motives. In order to cater for such "bad-mouthing" possibilities, a reputation management system should weigh the ratings of highly credible raters more than those of consumers with low credibilities [61, 32, 127, 107, 146]. In RATEWeb, the reputation score of the provider is calculated according to the credibility scores of the raters (used as the weight) [85]. Thus, Equation 4.1 becomes:

$$Reputation(s_j) = \frac{\sum_{x=1}^{L} \left( PerEval_j^x \cdot C_r(x) \right)}{\sum_{x=1}^{L} C_r(x)} \qquad (4.2)$$
where Reputation(s_j) is the assessed reputation of s_j as calculated by the service consumer, and C_r(x) is the credibility of the service rater x as viewed by the service consumer. The credibility of a service rater lies in the interval [0, 1], with 0 identifying a dishonest rater and 1 an honest one. The processes involved in calculating the credibilities of raters are discussed below.
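As a concrete illustration, the following is a minimal Python sketch of the credibility-weighted aggregation of Equation 4.2. It takes the rater credibilities as given (their computation is the subject of the next section); the names and example values are illustrative assumptions.

def assessed_reputation(evaluations, credibility):
    """Equation 4.2: credibility-weighted average of personal evaluations.

    evaluations maps rater id -> PerEval (a scalar rating);
    credibility maps rater id -> C_r(x) in [0, 1].
    """
    total_weight = sum(credibility[x] for x in evaluations)
    if total_weight == 0:
        raise ValueError("no credible raters available")
    return sum(v * credibility[x] for x, v in evaluations.items()) / total_weight

# Example: the dishonest rater "c" barely influences the result.
evals = {"a": 8.0, "b": 7.5, "c": 1.0}
creds = {"a": 0.9, "b": 0.8, "c": 0.05}
print(round(assessed_reputation(evals, creds), 2))   # ~7.57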
4.2.2 Evaluating Rater Credibility

A few existing online systems, such as eBay, Amazon, and Yahoo! Auctions, use a centralized reputation system. Most of these systems rely only on the numerical feedbacks received from different users as a reputation measure, or in some cases supplement these with textual feedbacks also left by the consumers. The reputation values are calculated as simple aggregations of the received ratings, which may not accurately predict the trustworthiness of the providers. For example, in eBay (one of the most heavily used online reputation systems), the buyers and sellers can rate each other on a three-point scale, with +1 for a positive rating, 0 for neutral, and -1 for a negative rating. The transaction participants are also asked to leave a textual feedback rating. The centralized eBay reputation system then computes the reputation as a summation of all negative and positive ratings received. Since humans are involved directly in processing the provided information (reputation value plus textual feedback), the eBay system has been successful [113, 57]. Still, such a ratings system is not accurate: a user with 50 positive feedback ratings will have a reputation value equal to that of a user with 300 positive and 250 negative feedback ratings [82]. The inability of automated systems to reason in a human-like manner means that the textual feedback will not be of great use. Hence, an eBay-like system may not be practical for service-oriented environments.

We can see from the example of eBay's reputation calculation methodology that a simple aggregation of feedback ratings does not accurately reflect a user's reputation. Some other online businesses use an average over all ratings to compute the reputation of a user. For instance, Amazon's auction site uses this method. It allows transaction participants to rate on a scale from 1 to 5. An average of all feedback ratings to date is then calculated to compute an overall reputation score. Thus, a user with ratings of 4, 3, 4, 4, 5, and 3 would have an overall reputation score of 3.8. Although this method is an improvement, it still does not accurately reflect reputation as seen in the real world. Consider another series of ratings received for a Web service provider: 1, 1, 9, 1, 1, 1, 9, 9, and 1. In an averaging model, the overall reputation score would be 3.7. Clearly, this score is not in accordance with the ratings received. Similar situations arise in online reputation systems where the reported ratings are not uniform, either due to differences in raters' actual experiences or due to malicious motives. Thus, designing a ratings system that is robust enough to detect and mitigate the effects of such disparate ratings is a fundamental issue [33, 143].

To overcome the above-mentioned problems, several methods have been proposed in the literature that screen the ratings based on their deviations from the majority opinion. Examples include Beta Deviation Feedback [22], Beta Filtering Feedback [143], Like-mindedness [139], and Entropy-Based Screening [142]. We adopt a similar notion to dilute the effects of unfair or inconsistent ratings. We use a majority rating scheme, in which the "uniformity of ratings" indicates their accuracy. The basic idea of the proposed method is that if a reported rating agrees with the majority opinion, the rater's credibility is increased, and decreased otherwise. Unlike previous models, we do not simply disregard/discard a rating that disagrees
with the majority opinion, but consider the fact that the rating's inconsistency may be the result of an actual experience. Hence, only the credibility of the rater is changed, while the rating is still considered.

We use a data clustering technique to define the majority opinion by grouping similar feedback ratings together [33, 136]. We apply the k-means clustering algorithm [80] to all currently reported ratings to create the clusters. The most densely populated cluster is then labeled the "majority cluster," and the centroid of the majority cluster is taken as the majority rating (denoted M):

$$M = centroid\left(\max_{k}(\Re_k)\right)$$

where k is the total number of clusters, max gives the cluster ℜ with the largest membership, and centroid gives the centroid of that cluster. The Euclidean distance between the majority rating (M) and the reported rating (V) is computed to adjust the rater's credibility. The change in credibility due to the majority rating, denoted M_f, is defined as:

$$M_f = \begin{cases} 1 - \dfrac{\sqrt{\sum_{k=1}^{n}(M - V_k)^2}}{\sigma} & \text{if } \sqrt{\sum_{k=1}^{n}(M - V_k)^2} < \sigma \\[2ex] 1 - \dfrac{\sigma}{\sqrt{\sum_{k=1}^{n}(M - V_k)^2}} & \text{otherwise} \end{cases} \qquad (4.3)$$
where σ is the standard deviation of all the reported ratings. Note that M_f does not denote the rater's credibility (or the weight), but only defines the effect on credibility due to agreement/disagreement with the majority rating. How this effect is applied will be discussed shortly.

There may be cases in which the majority of raters collude to provide an incorrect rating for the provider Web service. Moreover, the outlier raters (ones not belonging to the majority cluster) may be the first to experience the deviant behavior of the providers. Thus, a majority rating scheme alone is not sufficient to accurately measure the reputation of a Web service. We supplement the majority rating scheme by also adjusting the credibility of a service rater based on its past behavior. The historical information provides an estimate of the trustworthiness of the service raters [143, 127]. This trustworthiness is computed by looking at the last assessed reputation value, the present majority rating, and the service consumer's provided rating. Precisely defining what constitutes a credible rating is an interesting and hard research problem by itself [146]. However, we have attempted to define the credibility of Web services in a practical manner, according to the information available to the service consumer. We define a credible rater as one that has performed consistently and accurately, and has proven to be useful (in terms of the ratings provided) over a period of time. Consistency is the behavior of a service that exhibits similar results under standard conditions. We believe that under controlled situations (i.e., other variables being the same), a service consumer's perception of a Web service should not deviate much, but stay consistent over time. We assume the interactions take place at time t and that the service consumer already has a record of the previously assessed reputations (denoted A), which is defined as:
$$A = \bigsqcup_{t-k}^{t-1} Reputation(s_j)_t \qquad (4.4)$$
where Reputation(s_j) is as defined in Equation 4.1 for each time instance t, ⊔ is the aggregation operator, and k is the time duration defined by each service consumer; it can vary from one time instance to the complete past reputation record of s_j. Note that A is not the "personal evaluation" of either the service rater or the service consumer, but the "assessed reputation" calculated by the service consumer at the previous time instance(s). If the provider's behavior has not changed much since the previous time instance, then A and the present rating V should be similar. Thus, the effect on credibility due to agreement/disagreement with the last assessed reputation value (denoted A_f) is defined in a similar manner as Equation 4.3:

$$A_f = \begin{cases} 1 - \dfrac{\sqrt{\sum_{k=1}^{n}(A - V_k)^2}}{\sigma} & \text{if } \sqrt{\sum_{k=1}^{n}(A - V_k)^2} < \sigma \\[2ex] 1 - \dfrac{\sigma}{\sqrt{\sum_{k=1}^{n}(A - V_k)^2}} & \text{otherwise} \end{cases} \qquad (4.5)$$
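The following is a minimal Python sketch of deriving the majority rating via k-means and computing the deviation effects of Equations 4.3 and 4.5, specialized to scalar ratings. The tiny hand-rolled 1-D k-means (a library routine could be substituted), the choice of k, and the example values are illustrative assumptions.

import statistics

def majority_rating(ratings, k=2, iters=20):
    """Label the densest k-means cluster as the majority; return its centroid M."""
    centroids = sorted(ratings)[::max(1, len(ratings) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for r in ratings:
            idx = min(range(len(centroids)), key=lambda i: abs(r - centroids[i]))
            clusters[idx].append(r)
        centroids = [statistics.mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return statistics.mean(max(clusters, key=len))

def deviation_effect(reference, v, sigma):
    """Equations 4.3/4.5: effect on credibility of rating v w.r.t. a reference
    value (M or A), given the standard deviation sigma of all reported ratings.
    For scalar ratings the Euclidean distance reduces to an absolute difference."""
    dist = abs(reference - v)
    if sigma == 0:
        return 1.0
    return 1 - dist / sigma if dist < sigma else 1 - sigma / dist

ratings = [7.8, 8.0, 8.1, 2.0, 7.9]
M = majority_rating(ratings)
sigma = statistics.pstdev(ratings)
print(round(M, 2), round(deviation_effect(M, 2.0, sigma), 2))   # 7.95 0.6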
In real-time situations it is difficult to determine the different factors that cause a change in the state of a Web service. A rater may rate the same service differently without any malicious motive, i.e., accurately (but not consistently with its last report). Thus, the credibility of a rater may change in a number of ways, depending on the values of V, M, and A. The equivalence of the majority rating M, the submitted personal evaluation rating V, and the assessed reputation at the previous time instance A is used in adjusting the service rater's credibility C_r. The general formula is:

$$C_r(x) = C_r(x) \pm \aleph \cdot \Upsilon \qquad (4.6)$$
where ℵ is the credibility adjustment normalizing factor, and ϒ represents the amount of change in credibility due to the equivalence or difference of V with M and A. The sign ± indicates that either + or − can be used, i.e., whether the credibility is incremented or decremented depends on the situation. These situations are described in detail in the upcoming discussion. We place more emphasis on the ratings received in the current time instance than on past ones, similar to previous works [22, 143, 139, 142]. Thus, the equivalence or difference of V with M takes precedence over that of V with A. This can be seen from Equation 4.6, where the + sign with ℵ indicates V ≈ M, while the − sign with ℵ means that V ≠ M. ℵ is defined as:

$$\aleph = C_r(x) \times (1 - |V_x - M|) \qquad (4.7)$$
Equation 4.7 states that the value of the normalizing factor ℵ depends on the credibility of the rater and on the absolute difference between the rater's current feedback and the calculated majority rating. Multiplying by the rater's credibility allows honest raters to have greater influence over the ratings aggregation process, and dishonest raters to lose their credibility quickly in case of a false or malicious rating.
The different values of ϒ are described next.

Adjusting Rater Credibilities: ϒ is made up of M_f and/or A_f, and a "pessimism factor" (ρ). The exact value of ρ is left to the discretion of the service consumer, with the exception that its minimum value should be 2. The lower the value of ρ, the more optimistic the consumer; higher values of ρ suit pessimistic consumers. We define a pessimistic consumer as one that does not trust the raters easily and reduces their credibility drastically on each false feedback. Moreover, honest raters' credibilities are increased at a high rate, meaning that such consumers "make friends" easily. On the other hand, optimistic consumers tend to "forgive" dishonest feedbacks over short periods (dishonesty over long periods is still punished), and it is difficult to attain a high credibility quickly; only prolonged honesty can guarantee a high credibility in this case. V, M, and A can be related to each other in one of four ways, and each condition specifies how M_f and A_f are used in the model. In the following, we explain each case and show how the credibilities are updated in our proposed model using different values of ϒ.

1. The reported reputation value is similar to both the majority rating and the previously assessed reputation, i.e., V ≈ M ≈ A. The equality M ≈ A suggests that the majority of the raters believe that the QoWS of s_j has not changed. The service rater's credibility is updated as:

$$C_r(x) = C_r(x) + \aleph \cdot \left( \frac{|M_f + A_f|}{\rho} \right) \qquad (4.8)$$
Equation 4.8 states that since all factors agree, the credibility is incremented.

2. The reported reputation rating is similar to the majority rating but differs from the previously assessed reputation, i.e., V ≈ M and V ≠ A. In this case, the change in the reputation rating could be due to either of the following: the rater may be colluding with other service consumers (raters) to increase/decrease the reputation of s_j, or the QoWS of s_j may have actually changed since A was last calculated. The service rater's credibility is updated as:

$$C_r(x) = C_r(x) + \aleph \cdot \left( \frac{M_f}{\rho} \right) \qquad (4.9)$$

Equation 4.9 states that since V ≈ M, the credibility is incremented, but the factor V ≠ A limits the increment to M_f/ρ (not as large as in the previous case).

3. The reported reputation value is similar to the previously assessed reputation but differs from the majority rating, i.e., V ≠ M and V ≈ A. The reported value may differ due to one of the following: first, V may be a rating score that is outdated, i.e., the rater may not have the latest score; second, V may be a "false" negative/positive rating for s_j; third, V may be the correct rating, while the other consumers contributing to M are colluding to increase/decrease s_j's reputation. None of these three possibilities should be overlooked. Thus, the service rater's credibility is updated as:
$$C_r(x) = C_r(x) - \aleph \cdot \left( \frac{A_f}{\rho} \right) \qquad (4.10)$$
Equation 4.10 states that since V ≠ M, the credibility is decremented; to cater for the above-mentioned possibilities brought in by the factor V ≈ A, the value subtracted from the previous credibility is limited to A_f/ρ.

4. The reported reputation value is similar to neither the majority rating nor the previously assessed reputation, i.e., V ≠ M and V ≠ A. V may differ from both due to one of the following: first, V may be the first to experience s_j's new behavior; second, V may not know the actual QoWS values; third, V may be lying to increase/decrease s_j's reputation. The service rater's credibility is updated as:

$$C_r(x) = C_r(x) - \aleph \cdot \left( \frac{|M_f + A_f|}{\rho} \right) \qquad (4.11)$$
Equation 4.11 states that the disagreement of all factors means that the rater's credibility is decremented, where the decremented value combines both effects M_f and A_f.

Even with the above-mentioned techniques in place, not every ratings submission that a service consumer receives from service raters may prove useful. In other words, the consumer's own experience (denoted OE) with the provider may differ from the rater's feedback (V). In RATEWeb, after each interaction, apart from rating the provider s_j, the service consumer also evaluates the usefulness of the raters that provided a rating for s_j. If the Euclidean distance between OE and V_i (both representing s_j's assessed reputation) falls below a predefined threshold, V_i is deemed useful; otherwise it is not. The reader may note that the rater and the consumer may disagree depending on some objective feature of the service. RATEWeb avoids such a situation by considering the "personalized preferences" (for different attributes) of both the raters and the consumers; details are presented in the next section.

Service ratings are historical data, i.e., ratings from previous time instances. For example, if the provider is evaluated at time t, then the historical data would contain ratings from times t−1, t−2, ..., t−n. The service consumer would submit a "usefulness rating" at time t for the rater's submission at t−1. The usefulness of a rating is required to calculate a service rater's "propensity to default," i.e., the service rater's tendency to provide false/incorrect ratings. There may also be cases where raters alternate between being useful and not useful over a period of time. Thus, to get a correct estimate of the rater's propensity to default, we compute the ratio of the total number of times the rating submission was useful (k) to the total number of submissions (n). This is similar to the manner in which peer recommendations are evaluated for usefulness in "recommender systems" [72, 128]. The usefulness factor (u_f) is:

$$u_f = \frac{\sum_{i=1}^{k} U_i}{\sum_{x=1}^{n} V_x} \qquad (4.12)$$
where U_i is a submission in which the rater was deemed useful, and V_x denotes the total number of rating submissions by that service. The rater's credibility (calculated using any of Equations 4.8 through 4.11) is then adjusted as:

$$C_r(x) = C_r(x) \cdot u_f \qquad (4.13)$$
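Putting Equations 4.6 through 4.13 together, the following is a minimal Python sketch of the four-case credibility update followed by the usefulness adjustment. The similarity threshold `eps` used to decide V ≈ M and V ≈ A is an illustrative assumption (the text leaves this operationalization to the consumer), as are the example values.

def update_credibility(cred, v, m, a, m_f, a_f, rho=2.0, eps=0.5):
    """Four-case credibility update (Equations 4.6-4.11).

    cred: current C_r(x); v, m, a: rater's rating, majority rating, last
    assessed reputation; m_f, a_f: deviation effects (Eqs. 4.3/4.5);
    rho >= 2 is the pessimism factor; eps is an assumed similarity threshold.
    """
    aleph = cred * (1 - abs(v - m))              # Equation 4.7 (ratings in [0, 1])
    near_m, near_a = abs(v - m) < eps, abs(v - a) < eps
    if near_m and near_a:                        # Case 1 (Eq. 4.8)
        cred += aleph * abs(m_f + a_f) / rho
    elif near_m:                                 # Case 2 (Eq. 4.9)
        cred += aleph * m_f / rho
    elif near_a:                                 # Case 3 (Eq. 4.10)
        cred -= aleph * a_f / rho
    else:                                        # Case 4 (Eq. 4.11)
        cred -= aleph * abs(m_f + a_f) / rho
    return min(max(cred, 0.0), 1.0)              # keep C_r(x) in [0, 1]

def apply_usefulness(cred, useful_count, total_count):
    """Equations 4.12-4.13: scale credibility by the usefulness factor u_f."""
    return cred * (useful_count / total_count) if total_count else cred

# Example: a rater agreeing with both M and A gains credibility.
c = update_credibility(0.6, v=0.8, m=0.82, a=0.78, m_f=0.9, a_f=0.85)
print(round(apply_usefulness(c, useful_count=9, total_count=10), 3))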
4.2.3 Personalized Preferences

Service consumers may vary in their reputation evaluations due to differences in the QoWS attribute preferences over which a Web service is evaluated. For instance, consider reliability, availability, price, security, and privacy as the QoWS attributes over which a Web service's reputation is evaluated. Some service consumers may label Web services with high reliability as more reputable, while others may consider low-priced services more reputable. We allow the service consumers to calculate the reputation scores of the Web services according to their own personal preferences. Each service consumer stores its QoWS attribute preferences in a reputation significance vector (RSV). Since service consumers can change their preferences from one transaction to the other, the RSV is submitted with each ratings submission. The service consumers can then choose either to accept the reputation evaluation scores of the raters or to compute the scores themselves if they have a different RSV. In the latter case, the rater is asked for the individual QoWS attribute values instead of the computed personal evaluations. In this manner, the consumers have the ability to weigh the different attributes according to their own preferences. Let φ_h(s_j, u)^x denote the rating assigned to attribute h by the service rater x for service provider s_j in transaction u, let m denote the total number of attributes, and let RSV_h denote the preference of the service consumer for attribute h. Then, the local reputation of s_j as reported by service rater x is defined as:

$$PerEval_j^x = \frac{\sum_{h=1}^{m} \left( \phi_h(s_j,u)^x \cdot RSV_h \right)}{\sum_{h=1}^{m} RSV_h} \qquad (4.14)$$
4.2.4 Temporal Sensitivity

Reputation information of a service provider decays with time [86, 90]; hence, old reputation data may be of little or no importance. For instance, a Web service performing inconsistently in the past may ameliorate its behavior. Alternatively, a service's performance may degrade over time. Weighing all historical data equally may thus produce incorrect reputation scores. In order to counter such discrepancies, we incorporate temporal sensitivity in our proposed model. The rating submissions are time-stamped to assign more weight to recent observations and less to older ones. This is termed "reputation fading," where older perceptions
gradually fade and fresh ones take their place. We adjust the value of the ratings as:

$$PerEval^{xj} = PerEval^{xj} \cdot f_d \qquad (4.15)$$
where PerEval^{xj} is as defined above and f_d is the reputation fader. In our model, the most recent rating has the fader value 1, while older observations are decremented for each time interval passed. When f_d = 0, the consumer's rating is not considered, as it is outdated. The "time interval" is an assigned factor, which could span anywhere from a single reputation inquiry to ten inquiries or even more. All inquiries that are grouped into one time interval are assigned the same fader value. In this way, the service consumer can define its own degree of temporal sensitivity. For example, a service can omit the fader's effect altogether by assigning it a null value. We propose a fader value calculated as f_d = 1/√P_u, where P_u is the time interval difference between the present time and the time at which the rating was collected from the rater. This allows the convergence of reputation to a very small value as time passes. Note that the consumer can assign a group of ratings collected at different times the same time-stamp, so that they lie in the same time interval. As mentioned earlier, other calculated values for the fader are also acceptable.
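A minimal sketch of the proposed fading schedule, assuming integer time-interval differences; the helper below and its handling of the newest interval are illustrative.

import math

def fader(p_u):
    """Reputation fader f_d = 1 / sqrt(P_u); the newest interval (P_u = 0)
    keeps full weight 1."""
    return 1.0 if p_u <= 0 else 1.0 / math.sqrt(p_u)

# Ratings from 0, 1, 4, and 9 intervals ago.
print([round(fader(p), 2) for p in (0, 1, 4, 9)])   # [1.0, 1.0, 0.5, 0.33]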
4.2.5 First-hand Knowledge

Most service consumers that have interacted with a Web service provider in the past and were satisfied prefer to continue interacting with that particular service. Users seldom switch their basic providers online, for fear of degrading quality. However, Web services are inherently dynamic, and new services (with better QoWS) may be introduced into the system at any time. Moreover, services with low reputation scores may improve their scores. If service consumers only interact with trusted Web services, they may miss better options in terms of QoWS. We therefore allow the service consumers to incorporate their first-hand interaction knowledge in calculating the final reputation score of the Web services. To the best of our knowledge, present-day reputation systems only allow the users to view/derive a reputation value of the provider based solely on the testimonies of different users. The user's own experience is of a subjective nature and is not factored into the reputation value. Usually, users do not consider providers with whom they had a bad experience in the past, even if those providers receive good reputation scores from other users. In RATEWeb, reported ratings are combined with first-hand knowledge to derive the reputation score. This enables the consumer to consider all Web service possibilities and select the best one. Thus, the equation for the assessed reputation calculation becomes:

$$Reputation(s_j) = \frac{\sum_{x=1}^{L} \left[ \frac{\sum_{h=1}^{m} \left( \phi_h(s_j,u)^x \cdot RSV_h \right)}{\sum_{h=1}^{m} RSV_h} \cdot f_d \cdot C_r(x) \right]}{\sum_{x=1}^{L} C_r(x)} \qquad (4.16)$$
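Combining the metrics, the following is a minimal Python sketch of the assessed-reputation computation of Equation 4.16: each rater's per-attribute ratings are re-weighted by the consumer's own RSV (Equation 4.14), faded by f_d, and weighted by credibility. The record layout and example values are illustrative assumptions.

import math

def assessed_reputation_full(submissions, rsv):
    """Equation 4.16 over rater submissions.

    submissions: list of dicts with keys
      'phi'  - per-attribute ratings [phi_1, ..., phi_m],
      'cred' - rater credibility C_r(x),
      'age'  - time-interval difference P_u for the fader f_d = 1/sqrt(P_u).
    rsv: the consumer's reputation significance vector [RSV_1, ..., RSV_m].
    """
    total_cred = sum(s["cred"] for s in submissions)
    if total_cred == 0:
        raise ValueError("no credible submissions")
    acc = 0.0
    for s in submissions:
        per_eval = sum(p * w for p, w in zip(s["phi"], rsv)) / sum(rsv)  # Eq. 4.14
        f_d = 1.0 if s["age"] <= 0 else 1.0 / math.sqrt(s["age"])        # fader
        acc += per_eval * f_d * s["cred"]
    return acc / total_cred

# Example: the consumer weighs reliability twice as much as price.
subs = [{"phi": [0.9, 0.6], "cred": 0.8, "age": 1},
        {"phi": [0.7, 0.9], "cred": 0.5, "age": 4}]
print(round(assessed_reputation_full(subs, rsv=[2.0, 1.0]), 3))   # ~0.64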
Figure 4.1 shows a pictorial representation of the reputation assessment algorithm that uses the metrics defined above. The input to the algorithm is a list of service raters that have interacted with the service provider(s) in the past and thus have reputation ratings for them. Note that the algorithm iterates over the complete list of potential service providers obtained from a UDDI registry. The output of each algorithm invocation is the service provider with the highest reputation. To simplify the representation, we do not show loops or update processes in Figure 4.1.
Fig. 4.1 Reputation Evaluation Using Defined Metrics
At the start of the algorithm, a loop iterates over the list of service raters that have a personal evaluation rating (the last "assessed reputation" from their perspective) for the service provider s_j in question, and reputation ratings are collected. Each service rater returns a vector that comprises a scalar reputation rating, the reputation significance vector (RSV) of the rater, the ratings for the individual attributes, and the reputation calculation time-stamp. The RSV of the rater is then compared with the service consumer's own RSV. If the values are similar, the scalar reputation rating is accepted. If the two RSVs differ, a reputation rating is computed based on the consumer's attribute preferences. This allows the service consumer more flexibility in assimilating reputation ratings from various raters. Thereafter, all ratings are used to compute the majority rating. Then, the credibility of the service rater (C_r) is computed from the majority rating, the last calculated assessed reputation, and the individual rater's rating. Moreover, the credibility is adjusted using the usefulness factor (u_f). The reported reputations (V_i) and each rater's credibility value (C_r(i)) are then used in computing the "weighted reputation rating" (factoring in the user's own past experience, if any). The reputation value is also diluted according to the value of f_d. If this computed reputation value is greater than the reputation value computed for the previous service provider, then the current s_j is labeled as the service with the highest reputation value. When all the s_j's reputations have been assessed, the s_j with the highest reputation is identified.
4.3 Reputation Assessment with Scarce Ratings

Reputation systems rely on the feedbacks or ratings provided by the members of the community for a given subject. At times, due to various reasons, a majority of the members may not be willing to engage in the rating process. In such situations of ratings scarcity, the accuracy of the reputation system may be compromised. To address the issue of ratings scarcity, we introduce the idea of reputation inference. Reputation inference approximates ratings aggregation and predicts the reputation of a given subject based on historical data.

In terms of prediction accuracy, machine learning algorithms have provided better results than other traditional techniques [13]. It is well known that no single prediction method outperforms all others, and that the choice of the prediction method depends on data quality, bias in the training data, and the intended purpose of the prediction [157]. However, Artificial Neural Networks (ANNs) and Hidden Markov Models (HMMs) exhibit a relatively high predictive power, especially with large data sets [152]. Since ANN performance depends greatly on the chosen architecture, and a user may need to perform extensive model training considering almost every feature [152], we prefer to use HMMs for reputation prediction. Moreover, an HMM allows a user more control than an ANN, with comparable accuracy [13]. HMMs have been used successfully in pattern recognition (voice and face), handwriting recognition, natural language domains, DNA sequence analysis, fingerprint matching, prediction of stock market prices, etc. [56, 110]. We build on that success to predict the reputation of Web services through HMMs. We compare the prediction results to ANN-based reputation prediction to show how an HMM provides a better alternative. The proposed model is novel in the sense that no prior research has focused on using either ANNs or HMMs for predicting reputation in a service-oriented environment.

Several reputation systems have been proposed in the literature. The spectrum of these reputation management solutions ranges from "purely statistical" techniques to "heuristics-based" techniques. While statistical techniques focus on providing a sound theory for reputation management, heuristics-based techniques focus on defining a practical model for implementing a robust reputation system. Bayesian systems [143, 103] and belief models [63, 150] are the major examples of purely statistical techniques. Bayesian systems work on binary ratings (honest or dishonest) to assess the reputation, by statistically updating beta probability density functions. In a belief model, a consumer's belief regarding the truth of a ratings statement is also factored into the reputation computation. The techniques for combining beliefs vary from one solution to the other. For example, [150] uses Dempster's Rule, while Subjective Logic is used in [63]. The complexity of purely statistical solutions has prompted researchers to present heuristics-based solutions. These solutions aim to define a practical, robust, and easy-to-understand/construct reputation management system. Some examples include [146, 61, 1, 17]. We present a hybrid solution defining key heuristics, together with a statistical model (an HMM) for reputation assessment.

In our model, we use a simple HMM that outputs one of two reputation symbols. One symbol represents a trustworthy service (i.e., high reputation), while the
The choice of only two symbols is made only to simplify the model's explanation, and the number can easily be extended. Although the reputation values obtained from Equation 4.16 may be continuous, we map the results to two discrete symbols by setting a threshold. For example, on a scale from 0 to 10 (0 being the lowest) with the threshold set at 5, reputation values greater than 5 are treated as trustworthy and those at or below 5 as untrustworthy. Other techniques (such as vector quantization [52]) may also be used. Defining the system with just two reputation symbols suffices for the needs of our application. The service consumer either interacts with a trustworthy service, or it does not interact with an untrustworthy service. The need for "middle-ground" or "fuzzy" decision making arises only in cases where no trustworthy service is available. However, this can also be handled by adding a clause that lets the consumer choose the best provider from the remaining (untrustworthy) services. Our proposed methodology is shown in Figure 4.2. Each service consumer's HMM first trains itself using the feedback provided by its peers. Once a reliable model is developed, the high and low reputations of the services are predicted. In the next step, the service consumer compares all the predicted provider reputations. The provider that has the highest predicted reputation for the next time instance is chosen for interaction. After each interaction, the observed behavior values and current feedback are input to the HMM, and the model is refined. In the following, we provide details of our HMM-based reputation prediction model.
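For illustration, the two-symbol discretization described above might look as follows (the 0-to-10 scale and the threshold of 5 are taken from the example in the text):

```python
def to_symbol(reputation, threshold=5.0):
    """Map a continuous reputation on a 0-to-10 scale to one of the two
    HMM output symbols: 1 = trustworthy, 0 = untrustworthy."""
    return 1 if reputation > threshold else 0
```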
Fig. 4.2 HMMs for Reputation Assessment
Model Details

A reputation system can be thought of as a physical system that can be specified by assigning values to a number of variables that describe the system [110]. When the values of all state variables of the reputation system are known, we say that its state has been specified. Therefore, the state of the reputation system is all that is needed to describe the reputation system at any instant.
Since a reputation system is highly dynamic, it passes from state to state in the course of time. Moreover, the dynamic behavior of Web services also induces changes in the states of their associated reputation systems. Such changes in the states are called state transitions, or simply transitions. We use a Hidden Markov Model (HMM) to capture the different states of a reputation system and to assign probabilities to state transitions. Using this model and the associated state transition probabilities, we then predict the future states of the reputation system (based on historic data). An HMM is a finite state machine in which the observation sequence is a probabilistic function of a fixed number of states. In our case, it provides a probabilistic framework for modeling service reputations. It differs from Markov chains (MCs), where the observation sequence is a deterministic function of the states. The HMM we use here is a discrete-time HMM with N states and M symbols. It consists of an N × N reputation state transition matrix P that defines the transition probability from one state to another (or to itself), an N × M reputation output symbol probability matrix B that gives the probability of generating the different reputation symbols while in a particular reputation state, and an N-dimensional initial reputation state probability vector π that gives the probability of being in a particular reputation state at the start of the hidden Markov process. Hence, a hidden Markov model is denoted ζ = {P, B, π}. The probability (Pr) of generating the reputation sequence given the model ζ can be written as:

$$Pr(y_1^T \mid \zeta) = \pi B(y_1) P B(y_2) \cdots P B(y_T) \qquad (4.17)$$

where $y_1^T = [y_1, y_2, \ldots, y_T]$, each $y_k$ is either the high- or low-reputation symbol (i.e., one of the 2 symbols), and $B(y_k)$ denotes the probability of generating symbol $y_k$ from the different states.
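Equation 4.17 can be evaluated directly as a forward pass. The following sketch applies B(y_k) as a per-state scaling and sums over the final hidden states, which is one standard way to compute the same quantity:

```python
import numpy as np

def sequence_probability(pi, P, B, symbols):
    """Evaluate Pr(y_1^T | zeta) = pi B(y1) P B(y2) ... P B(yT) (Eq. 4.17).

    pi:      (N,) initial reputation state probabilities
    P:       (N, N) reputation state transition matrix
    B:       (N, M) output symbol probability matrix; B[:, y] is the
             probability of emitting symbol y from each state
    symbols: observed sequence of reputation symbols, e.g. [1, 0, 1]
    """
    alpha = pi * B[:, symbols[0]]            # apply B(y1) to pi
    for y in symbols[1:]:
        alpha = (alpha @ P) * B[:, y]        # propagate via P, then emit y_k
    return float(alpha.sum())                # sum over final hidden states
```

For the two-symbol reputation model, `symbols` contains 0s and 1s; the returned likelihood is the same quantity that feeds the BIC computation of Equation 4.18 below.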
The estimation of the total number of states of an HMM is an open question that has yet to be solved satisfactorily. "In the case of HMM, the problem of model selection (and in particular the choice of the number of states in the Markov chain component model) has yet to be satisfactorily solved" [79]. Similarly, in [24] it is stated that "the order estimation problem in an HMM is difficult because of our poor understanding of the maximum likelihood in HMMs." The likelihood of a process, in the case of HMMs, increases as the number of states increases. However, even though a higher-state HMM is expected to perform well, it is not guaranteed to provide optimal results [79, 24]. We have used the Bayesian Information Criterion (BIC) to estimate the number of states for our reputation model. BIC is one of the most accurate and frequently used order estimation techniques [133]. It is defined as:

$$\hat{x} = \min_{M_x}\left[-2\left(\sup \log Pr(y_1^n)\right) + k \log(n)\right] \qquad (4.18)$$
where k is the dimension of an HMM, n is the length of the observed reputation sequence, M_x represents the HMMs with different numbers of states, and x̂ represents the selected HMM with the optimal number of states. Using BIC over past reputations, we train HMMs with different numbers of states and obtain their corresponding log-likelihood (denoted l) values.
The model with the minimum BIC value is then chosen. For instance, Table 4.1 shows the BIC order estimation process for a number of HMMs evaluated using experimental reputation data for different service providers. Since 4.6861 × 10³ is the lowest value obtained, BIC selects a 2-state HMM as the model of choice. Note that the number of HMM states that the BIC estimator selects is specific to the training data. Thus, the number of states that different service consumers obtain for their respective HMMs may also differ (since each consumer aggregates reputations according to his own preferences, knowledge, etc.).

Table 4.1 Estimating BIC values of different HMMs

Model         k    −l           BIC
2-State HMM   4    2321.8056    4.6861 × 10³
3-State HMM   9    2321.7719    4.7391 × 10³
4-State HMM   16   2321.6676    4.8131 × 10³
5-State HMM   25   2320.8490    4.9070 × 10³
6-State HMM   36   2320.4865    5.0230 × 10³
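A sketch of the BIC-based order estimation is shown below. Here train_hmm is a hypothetical stand-in for the Baum-Welch training routine discussed next, and k = N² follows the model dimensions listed in Table 4.1:

```python
import numpy as np

def select_hmm_order(symbols, candidate_states=(2, 3, 4, 5, 6)):
    """Pick the HMM order with the smallest BIC (Equation 4.18).

    train_hmm is assumed to train an HMM with the given number of states
    on the observed reputation sequence and return the trained model
    together with its log-likelihood.
    """
    n = len(symbols)
    best_model, best_bic = None, float("inf")
    for num_states in candidate_states:
        model, log_likelihood = train_hmm(symbols, num_states)
        k = num_states ** 2                # model dimension, as in Table 4.1
        bic = -2.0 * log_likelihood + k * np.log(n)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model
```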
Figure 4.3 shows the log-likelihood (l) values generated for the different HMMs. Note that all the models (the 2-state, 3-state, and 15-state HMMs shown) produce similar l-values upon reaching the local maxima. Moreover, the numbers of iterations required to reach the local maxima are also similar. In terms of accuracy, the different state models show similar results. But since the greater the number of states in an HMM, the greater the complexity of the prediction process [133], we use the minimal-state HMM (the 2-state HMM). We use the Baum-Welch Algorithm (BWA) [14] (a form of the Expectation-Maximization (EM) algorithm) to estimate the parameters of the HMM. For details, the interested reader is referred to [133, 110]. We use the following equations to estimate the parameters of the HMM:

$$p_{ij,p+1} = \frac{s_{ij}(y_1^T)}{\sum_j s_{ij}(y_1^T)} \qquad (4.19)$$

$$b_{j,p+1}(y) = \frac{s_j(y_1^T, y)}{\sum_y s_j(y_1^T, y)} \qquad (4.20)$$

where p and b are as defined earlier, and

$$S_{ij}(y_1^{t+1}) = s_{ij}(y_1^t)\, P(y_{t+1}) + \alpha(y_1^t)\, U_{ij}\, p_{ij,p}\, b_{j,p}(y_t) \qquad (4.21)$$

$$S_j(y_1^{t+1}, y) = s_j(y_1^t, y)\, P(y_{t+1}) + \alpha(y_1^t)\, \delta(y, y_{t+1})\, U_{ij} \qquad (4.22)$$

where $\delta(y, y_t) = 1$ if $y = y_t$, $\delta(y, y_t) = 0$ if $y \neq y_t$, and $U_{ij}$ is an N × N matrix whose ij-th element is one and whose remaining elements are zeros. The initial conditions are defined as $S_{ij}(y_1^1) = 0$ and $S_j(y_1^1, y) = 0$.
[Three plots of log-likelihood (×10⁴) against the number of iterations (0 to 15), one per candidate HMM]
Fig. 4.3 BWA Log-Likelihood Estimation
The generation of the reputation sequence, given the model, is performed using the following procedure. We start from the π vector and select the initial reputation state by generating a uniformly distributed random number (within the desired reputation range). The reputation output symbol is obtained by again generating a uniformly distributed random number and assigning a reputation output symbol according to the probability distribution of the B matrix. The next reputation state is then obtained by again generating a uniformly distributed random number and selecting the next reputation state according to the probability distribution of the previous reputation state in the P matrix. We then go back and find the next reputation output symbol according to the probability distributions of the different reputation symbols in the B matrix. We continue until the required number of reputation output symbols is obtained.

Figure 4.4 shows the steps (simplified for clarity) involved in using an HMM to assess a provider's reputation. To decide on interacting with a particular service provider, feedback from different raters is collected and aggregated (to derive the provider's reputation). In situations where adequate ratings are not present, aggregate reputations from previous instances are used to predict future provider behavior. The reputations, initial conditions (if any), etc. are used as inputs to the BWA to extract HMMs (with different numbers of states). BIC is then used to estimate the optimal number of states for the HMM. The selected model is then used to predict future service reputations, to aid in the interaction decision process. Any subsequent re-estimations are performed on the selected model. Note that we use the HMM to predict changes in the reputations of the providers. This is performed in the discrete sense.
Fig. 4.4 HMM-based Prediction Processes
Once the prediction is performed, a continuous random number is generated, based on the probability distribution of the original reputation when it is in a particular state. The computational complexity of the learning algorithm (BWA) is of the order O · N³, where O is the number of observation symbols and N is the number of states of the model.
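The sequence-generation procedure described above can be sketched as follows; the uniform random draws of the text are realized here through numpy's probability-weighted choice, which is equivalent to sampling via the cumulative distribution:

```python
import numpy as np

def generate_sequence(pi, P, B, length, rng=None):
    """Sample a reputation symbol sequence from zeta = {P, B, pi} using
    the random-draw procedure described above."""
    rng = rng or np.random.default_rng()
    state = rng.choice(len(pi), p=pi)        # initial state drawn from pi
    symbols = []
    for _ in range(length):
        # Emit a symbol from the current state's row of B, then move to
        # the next state according to the current state's row of P.
        symbols.append(int(rng.choice(B.shape[1], p=B[state])))
        state = rng.choice(len(pi), p=P[state])
    return symbols
```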
4.4 Reputation Assessment for Service Compositions

In this section, we provide a framework for managing the reputations of Web services involved in service compositions. We analyze and determine the impact of a component service's reputation on the reputation of a composition and vice versa. Specifically, we propose techniques for the propagation of reputation information throughout the composition to aid all component services in making informed decisions regarding the selection of their respective component services. Our techniques address the problem of reputation management in compositions from two perspectives: (i) ensuring reputation assessment accuracy of component services from the point of view of the composition orchestrator, and (ii) ensuring fairness in decreasing/increasing reputations from the point of view of component services, such that no service is wrongfully blamed.

The ability to create and sustain a competitive advantage is the goal of any business activity. With the globalization of commercial and industrial activities, the realization of this goal requires collaborating and partnering with suppliers, customers, designers, research institutes, and so on, to integrate their individual competencies into a level of competency that is unmatched and difficult to copy or develop [119]. Different collaboration-related concepts, ranging from supply chain management to virtual enterprises (VEs), have thus emerged in the last decade, each describing a different level or format of strategic collaboration. As a result, three types of enterprise models have been defined: internal, stable, and dynamic [126].
Table 4.2 Differences between Traditional Supply Chains and Virtual Enterprises

Functionality            Traditional Supply Chain                Virtual Enterprise
Marketing                Push-based                              Pull-based
Production               Fixed order lineup                      Customer demand-based
Logistics                Mass approach                           Fast, customized
Customer relations       Dealer-managed                          Shared across enterprise
Uncertainty management   Finished goods inventory buffering      Strategic management
Inventory                High stock                              Run-time
Suppliers                Stable relationships, Long lead times   Dynamic, Responsive
An internal enterprise manages almost all business functions itself and does not get involved in outsourcing. Various divisions are formed within the enterprise, with each division specializing in one component of the business. In the 1980s, General Motors followed such a business model, but due to the high maintenance costs it had to cut several of its divisions and change the enterprise model.

A stable enterprise is one in which a moderate level of outsourcing takes place. The lead business forms long-term partnerships with other businesses that either provide inputs to the lead business or distribute its outputs. The list of partners does not change much over the years and stays somewhat stable (hence the name), since the cost of replacing a component business in a stable enterprise is fairly high. Most businesses with a supply chain based model fall under this category.

In a dynamic enterprise, the lead business does not form long-term relationships with its suppliers. The partnerships are short-lived, and different suppliers may be engaged to satisfy customer demands in similar scenarios, as the cost of replacing a component business is minimal [126]. VEs fall under this category. The strength and recent surge in popularity of VEs lies in the fact that providers do not need to keep active inventories of items (or develop all services), and instead "pull" the required item (or service) from the dynamic partner upon customer demand [83]. Other distinguishing features of VEs in comparison with traditional supply chains are listed in Table 4.2.

The selection of competent and trustworthy components is a key issue in designing supply chains, and the problem is compounded in the case of VEs [121]. Studies in supply chain management indicate that comparing in-depth performance reviews of suppliers over multiple criteria can provide a solution to the problem of trust [119]. However, measuring the performance of dynamic enterprises that involve a number of participants (e.g., VEs) has received little attention [121]. In [53], it is suggested that the selection of an optimal mix of partners (for dynamic enterprises) depends both on the past competitive performance of businesses and, importantly, on their ability to work together as a team. This is the approach we follow in developing a reputation-based trust framework for VEs that involve Web service compositions. We focus on the impact of reputation on individual Web services that work together in a "team of services" (composition).
4.4.1 Share the Fame / Share the Blame

In service compositions, service orchestrators need to be aware of their own (i.e., the composition's) reputation so that, in cases where the component services negatively affect the reputation of the orchestrator, it can take suitable actions such as replacing the component service, penalizing the defaulting component, etc. For example, the orchestrator may choose to transfer the blame, incurred in the form of a loss in reputation, to the component service actually responsible. Similarly, the orchestrator should also reward honest component services that cause an increase in the orchestrator's reputation. This recognition can come in the form of a reputation increment for the component services responsible. In light of these requirements, we name the proposed reputation model share the fame / share the blame. In this book we focus on "blame sharing." The findings are also potentially applicable to fame sharing.

A service composition may involve component services that are themselves compositions. Thus, we may have nested compositions, and a composition orchestrator may not be aware of all the services involved. In these situations, blame sharing is not straightforward, as there are many "hidden" variables involved. For example, consider Figure 4.5, where a user plans to attend a conference at City-X, which is totally new to the user. Since the user does not have enough time to invest in planning the whole trip, the Trip Planner (TP) virtual enterprise is consulted. The user communicates some constraints to TP. For example, the user may (a) be interested in touring the city using a bus tour guide (no cruise required), (b) want to watch a movie, (c) plan to go to a good theater in the vicinity, and (d) be keen on exploring the area museums as well. The services selected to meet the user's demands are represented by straight lines in Figure 4.5. Note that Museums, Downtown Tours, Cinema, and Theater are individual services that do not outsource any functionality, while Sightseeing Service, Area Tours, Arts, and Bus Tour are composed services that delegate some part of their assigned work to other services (not all services are shown here). Since the user only interacts with Trip Planner, it is unaware of the other services due to privacy, trade-secret, and similar considerations [114, 83]. Similarly, Trip Planner is unaware of which of its component services is itself a composed service. It may happen that the services chosen by Trip Planner do not provide an optimal combination when invoked together, and the quality (hence reputation) of the Trip Planner composition drops. Note that although a component service's selection is based on its reputation (i.e., highly reputable services are selected), the combination thereof may not prove ideal due to a variety of reasons (such as trade constraints, legislative or geographical incompatibilities, etc.).

We identify the reputation of composed services in a top-down manner. This means that after transaction completion, apart from rating the component services, the invoking service (Trip Planner in the running example) identifies the operations that did not meet its expected QoWS values. Based on this experience, the invoker adjusts the reputation of the component service(s) in its knowledge scope responsible for delivering that operation. For example, suppose Trip Planner identifies that something went wrong with the user's sightseeing experience, which causes a degradation in Trip Planner's reputation.
Fig. 4.5 Sample Composition
We assume that Bus Tour is the real "culprit," but since Trip Planner only communicates with Sightseeing Service, it transfers the blame to Sightseeing Service by reducing its reputation. Since we assume the services can observe their reputations (through indirect means), Sightseeing Service would in turn calculate the difference in its reputation. Based on the percentage decrease in its reputation, Sightseeing Service would then transfer the blame to its component services. Each service that transfers or propagates the blame down the composition chain may deploy a different strategy in calculating the amount of blame to be transferred. For example, Sightseeing Service may decide to decrease the reputation of component services linearly, exponentially, etc. The component services would in turn employ similar procedures to transfer the blame down to their own component service(s). The rationale for employing such a reputation degradation chain is that each component service is held responsible for the composition's reputation degradation. In situations where a single operation branches into several sub-operations, the blame has to be transferred to the actual culprit, and the service(s) responsible for the other sub-operations should not take undue blame. However, since the invoker only rates the operations atomically, the component service cannot determine which sub-operation defaulted. In our example, Sightseeing Service outsources the sightseeing activities to three component services, namely Area Tours, Arts, and Museums. If Trip Planner labels sightseeing as the faulty operation, it will decrement Sightseeing Service's reputation, as it outsources this functionality only from Sightseeing Service. Since sightseeing is outsourced to three services (Area Tours, Arts, and Museums), Sightseeing Service has no way of identifying the culprit sub-operation directly. Hence Area Tours, Arts, and Museums would all be blamed for the fault.
Since, in the discussion above, we assumed Bus Tour was the real culprit, Arts and Museums are blamed unjustly (Area Tours should be responsible, because of Bus Tour). In other words, blaming each component service equally may not prove to be the ideal solution in terms of fairness.
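To make the degradation chain concrete, the following is a minimal sketch (not the authors' implementation) of recursive blame forwarding; the Service structure, the per-component OpSign weighting, and the pluggable strategy function are illustrative assumptions. How much blame each component should actually receive is the subject of the next subsection.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    reputation: float
    # (component service, OpSign of its outsourced sub-operation) pairs
    components: list = field(default_factory=list)

def propagate_blame(service, delta, strategy=lambda d: d):
    """Forward a reputation loss `delta` down the composition chain.
    Each service could plug in its own `strategy` (constant, linear,
    exponential, ...) for the amount it passes on."""
    service.reputation -= delta
    for component, op_sign in service.components:
        propagate_blame(component, strategy(delta) * op_sign, strategy)
```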
Fairness and Accuracy in Blame Propagation

We propose to optimize fairness in blame forwarding through historical knowledge and fuzzy control. The purpose is two-fold: to ascertain the identity of the blameworthy service, and to assess the amount of blame to be assigned to each culprit. Fuzzy control allows blame transfer control laws to be expressed as high-level rules. This simplifies the controller design and facilitates the incorporation of new knowledge, both automated and human. The premise behind our approach is that past defaulters are likely to defect again in a composition.

The architecture of the proposed fuzzy control-based blame propagation system is shown in Figure 4.6. We assume that services can monitor their own reputations throughout the community, in a manner similar to obtaining the reputations of other service providers. The feedback loop includes a differentiator component that computes the change in the orchestrator's reputation for each interval. A negative change in the orchestrator's reputation indicates that one of the component services did not function as expected in the final composition. When the amount of change is above a defined threshold, the orchestrator transfers the blame to the component services involved. A fuzzy controller is used to assess the amount of blame to be transferred (in the form of reputation loss), and the "inferred identity" of the real culprit. After blame is forwarded, the orchestrator monitors the reputations of the component services for a defined time period to ascertain the validity of its decision. This is done using the "Reputation Probing" component. The objective is two-fold: first, to assess whether the blame was assigned to the correct component; second, to establish that blame forwarding helped in correcting the component's behavior. If the component service does not ameliorate its behavior, the amount of blame is increased.

The fuzzy controller's recommendations are guided by a number of rules. These rules are stored in a rule database (rule base), which is part of the controller. The rules are written on the basis of interaction histories, in the form of component service profiles maintained by the orchestrator. A profile consists of the component services invoked by the orchestrator, the interaction time-stamps, the component reputations, and the sequence in which those component services were invoked. Table 4.3 shows such an invocation profile stored at Sightseeing Service (SG) for five services, Museums (MU), Area Tours (AT), Arts (AR), Cuisine (CU), and Other (XX), that SG has invoked in the past. Note that the Time column (t) only shows continuous ascending values for clarity. In reality, these would be replaced by the actual timestamps of the interactions. Since CU and XX are not part of the current composition, Sightseeing Service focuses only on the interactions with MU, AT, and AR in the profile.
Fig. 4.6 Blame Propagation System Architecture
It is interested in finding out how much blame each component service (MU, AT, AR) should receive. The fuzzy controller can use the data in Table 4.3 to make generalizations about each service's behavior. For example, the individual profiles can be deduced as follows. Museums' Profile: (1) High reputation when invoked alone. (2) High reputation when invoked with AR and CU. (3) High reputation when invoked with AT, AR, and XX. (4) Low reputation when invoked with AR and CU. In this case, records (3) and (7) in Table 4.3 seem to conflict. But note that they have different invocation sequences: in (3) AR is invoked before CU, while in (7) CU is invoked before AR. Thus, such cases are treated individually, on a case-by-case basis.

Table 4.3 An Invocation Profile Maintained by SG

t   MU     AT     AR     CU     XX     Sequence
1   0.7                                MU
2          0.8                         AT
3   0.75          0.8    0.3           MU-AR-CU
4          0.11                 0.8    XX-AT
5          0.3    0.77                 AT-AR
6   0.71   0.45   0.81          0.43   AT-MU-XX-AR
7   0.2           0.79   0.67          MU-CU-AR
If both interactions have the same invocation sequence, then the latest reputation score is considered and older ones are discarded. Since CU is not involved in the current composition, the AR-CU sequence is not considered, and the generalized statement that "MU has low reputation (based on the latest record, (7)) when invoked as part of a composition involving AR" is made. Area Tours' Profile: (1) High reputation when invoked alone.
(2) Low reputation when invoked in a composition with XX. (3) Low reputation when invoked in a composition with AR. (4) Low reputation when invoked in a composition with MU, AR, and XX. This leads us to make the generalized statement that AT performs badly when invoked as part of any composition. Arts' Profile: (1) High reputation in all conditions (invoked stand-alone or composed). Based on the evidence that Sightseeing Service has for the three services, it can conclude that in a Museums, Area Tours, Arts composition, Area Tours and Museums are the likely culprits. Thus, MU should be blamed along with AT, and AR should not receive any blame.

The rules for the fuzzy controller are of the "IF...THEN..." form, defined in terms of linguistic variables, which are different from the numerical input or output variables of the controller. Linguistic variables are central to fuzzy control. A linguistic variable exists in one-to-one correspondence with a numeric variable and takes on linguistic values such as very low, low, average, etc. The variables actually take on a "degree of truth" (or certainty) for each possible linguistic value, which is represented as a continuous value between 0 and 1, where 0 is false, 1 is true, and 0.5 represents half-way certain. In Figure 4.6, "Change-in-Component-Service-Reputation" is an example of a linguistic variable corresponding to the numeric variable for the actual change in the component's reputation. In order to perform control, we have to convert values between numeric and linguistic variables and vice versa. This is achieved using membership functions. There are several kinds of membership functions that represent the uncertainty associated with the measurement of variables [108]. Since reputation values run from 0 to 1 in our model, where 0 is completely untrustworthy and 1 is highly trustworthy, we can classify several variables along this range. Different types of membership functions may be used for this purpose, e.g., triangular, trapezoidal, Gaussian, etc. Figure 4.7 shows the membership functions we have used for the five linguistic variables. Both triangular and trapezoidal membership functions are used: in the figure, Very Low and Very High are triangular functions, while the remaining ones are trapezoidal. The membership functions provide the orchestrator's degree of certainty. For instance, when the component service's reputation (the input variable) is less than 0.1, the orchestrator's certainty about the value being Very Low increases with decreasing reputation. When the component's reputation is 0.15, the orchestrator still assesses it to be Very Low, but the degree of certainty is the same as for the reputation being Low (i.e., the orchestrator is not sure whether to classify this reputation value as Very Low or Low). The same holds for the other variables. The number of rules defined using the membership functions depends on the number of inputs of the fuzzy controller. In our case, the inputs are the actual number of component services taking part in the composition. For example, if the orchestrator outsources from 2 component services, the total number of rules will be 2⁵ (2 for the number of inputs and 5 for the linguistic values). Assume the two component services are named CompA and CompB; then the rules are of the form shown in Table 4.4. The fuzzification component of the controller implements the membership functions discussed above and converts the input variables to their linguistic equivalents.
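As an illustration, the following sketches triangular and trapezoidal membership functions of the kind shown in Figure 4.7; the breakpoints a, b, c, d are placeholders, not the authors' exact parameters.

```python
def triangular(x, a, b, c):
    """Degree of truth that rises linearly from a to a peak at b and
    falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Degree of truth that is zero below a, one on [b, c], and zero
    above d, with linear ramps in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)
```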
Fig. 4.7 Membership Functions for Five Variables of Reputation

Table 4.4 Rule Base for Two Component Services (A and B)

Rule   IF CompA's Reputation   AND CompB's Reputation   THEN Blame to CompA   AND Blame to CompB
1      Very Low                Very Low                 Very High             Very High
2      Very Low                Low                      Very High             High
...    ...                     ...                      ...                   ...
32     Low                     High                     High                  None
Fuzzy inference involves evaluating the fuzzy rules to determine which rule(s) best apply (per the IF part) and firing those rules to take different actions (the THEN parts) in terms of a linguistic variable. Details of fuzzy inference can be found in [108]. The de-fuzzification component is complementary to fuzzification and converts the output linguistic variable to a numerical value. The fuzzy controller output guides the orchestrator in assigning the blame to each component service involved (denoted ∂). Since outsourced operations are not equal in terms of impact in a composition (some operations are more important than others), the amount of blame to be transferred should also reflect this. The orchestrator assigns an "operational significance" (OpSign) to each outsourced sub-operation, denoting the importance of the sub-operation in completing the parent operation, with ∑ OpSign_k = 1. Multiplying ∂ and OpSign, we get the amount of blame to be transferred (denoted BoS), relative to the importance of the outsourced operation and the past behavior of the service in a composition. The orchestrator (Sightseeing Service in this case) may use a number of strategies in assessing the blame amount through the fuzzy controller. Assume that Trip Planner's feedback about Sightseeing Service reduces its reputation by a factor ∆. In order to "forward" this blame, Sightseeing Service may use a constant strategy (i.e., the component service's reputation is reduced by at least ∆), a linear strategy (i.e., reduced by at least 2∆), an exponential strategy (i.e., reduced by at least ∆²), etc. The component services will in turn choose a strategy to share the blame with their own component services (if any).
There are no hard and fast rules for deciding which strategy is best. It is thus at the discretion of the invoking service to decide which strategy suits its business model. We propose to use the rate of maliciousness in the service community as a measure of which strategy to use. The rate of maliciousness (denoted ℜ) is defined as the ratio of the number of transactions in which providers defect to the total number of transactions (ℜ thus lies in the range [0, 1]). A provider's "defection" is measured after each individual transaction by the service consumer (denoted rater). If the provider performs satisfactorily in the transaction, the rater can label the transaction "acceptable"; otherwise, the transaction is labeled "defective." Thus, defection (denoted D) can be represented as a binary value. Since service raters can differ in their total number of transactions and in the number of defective transactions experienced, we can expect the value of ℜ to vary across different service raters. In essence, the value of ℜ depends on each rater's personal experience and the manner in which it estimates D after each transaction. The basic idea of the proposed scheme is for the component service invoker to forward less blame when ℜ is low, and higher degrees of blame when ℜ is high. This allows the consumer to adapt to the current state of the system (i.e., defective vs. acceptable transactions). Formally, for each service consumer i, ℜ is defined as:

$$\Re_i = \frac{D_i}{T_i} \qquad (4.23)$$
where D_i is the number of transactions in which providers have defected for consumer i, and T_i is the total number of transactions that consumer i has undertaken. Since the severity of defective transactions varies, the service rater can also assign relative weights to the transactions. For example, in a "high impact" transaction where the consumer suffers a huge loss as a consequence of the provider's defection, the consumer may count two (or more) defective transactions instead of one (while increasing the T_i count by only one), to increase ℜ_i. The assignment of such weights is left to the discretion of the service rater. In the experiments, we evaluate different blame strategies for component services to see which strategy is most efficient in which scenario. The only concern is that the amount of blame forwarded should be in accordance with the deviation of the service's performance from its promised value (as mentioned in [158]). If the constant strategy is chosen, the blame is calculated as ∆ + (∆ × BoS); under the linear strategy, the blame transfer amount is 2∆ + (∆ × BoS); and so on. Hence each component service shares the blame of its invoking service by at least ∆, plus the penalty incurred due to its OpSign and ∂. This ensures that the reputation degradation increases as we traverse down the chain of component services to the real culprit.
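The following sketch combines the strategy choice and the blame computation described above. It is an illustration under stated assumptions: the 0.3/0.7 cut-offs on ℜ for picking a strategy are hypothetical, not values prescribed by the book.

```python
def choose_strategy(r):
    """Pick a blame strategy from the rate of maliciousness (Eq. 4.23).
    The 0.3 / 0.7 cut-offs are illustrative assumptions only."""
    if r < 0.3:
        return "constant"
    if r < 0.7:
        return "linear"
    return "exponential"

def blame_amount(delta, op_sign, fuzzy_blame, strategy):
    """delta: the invoker's own reputation loss; op_sign: operational
    significance (OpSign) of the sub-operation; fuzzy_blame: the
    de-fuzzified controller output (denoted by the partial symbol)."""
    bos = fuzzy_blame * op_sign              # blame scaled by importance
    base = {"constant": delta,               # at least delta
            "linear": 2 * delta,             # at least 2*delta
            "exponential": delta ** 2}       # at least delta^2
    return base[strategy] + delta * bos
```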
Chapter 5
Robustness and Performance Study
In this chapter, we provide details regarding the implementation and performance study of the proposed approach for reputation assessment. The experimental study is holistic in the sense that we have verified the accuracy of all the modules we defined under RATEWeb: reputation bootstrapping, rater credibility evaluation, reputation assessment for individual and composed services, and reputation prediction using a hidden Markov model. In the following, we provide details of each study.
5.1 Reputation Bootstrapping Evaluation

In this section, we provide a detailed evaluation of the reputation bootstrapping techniques proposed in Section 3.2. We simulate interactions on the service Web using 100 Web services that interact with each other over a period of 100 transactions. The default value for the rate of maliciousness in the community (ℜ) is 10%, with the number of credible raters (C) being greater than 60%. For details regarding ℜ and C, please consult Section 3.2.
5.1.1 Comparing Bootstrap Approaches

In the first experiment, we evaluate the accuracy of our proposed "Adaptive" technique in assigning an initial reputation, and compare it with the existing "Punishing" approach and with the case where each newcomer is unconditionally trusted (denoted "None"). We use the average bootstrap success rate of the community to measure each technique's accuracy and effectiveness. The bootstrap success rate (BSR) for each individual service is defined as the difference between the initially assigned reputation (using the respective technique) and the actual reputation (calculated post-transaction completion). In this scenario, we vary the rate of maliciousness (ℜ) in the community
in steps of 10%, and assume that the honesty of newcomers is proportional to ℜ. Note that when each newcomer is assigned a high reputation, services are likely to defect and then whitewash their reputation without any penalties. The "None" plot in Figure 5.1 represents this behavior. As ℜ increases in the community, the BSR drops. When ℜ is greater than 50%, the BSR reaches its minimal value, since a service consumer is not likely to get the service it requires. In the case of "Punishing" each newcomer, there is a possibility of putting legitimate newcomers (non-whitewashers) at a disadvantage. This is shown by the initial drop in the BSR values, where the majority of the newcomers are assigned low reputations when in fact they perform honestly. However, the punishing approach works well for increasing ℜ. This is consistent with previous work [156].
[Plot of bootstrap success rate against the rate of maliciousness (0.1 to 1) for the None, Punishing, and Adaptive approaches]
Fig. 5.1 Comparing Bootstrapping Approaches with Increasing ℜ
We can see that the proposed adaptive approach works well for all cases of ℜ. Note that when ℜ is neither too high nor too low, the adaptive technique suffers in terms of accuracy. However, even in this worst case, the adaptive technique works as effectively as (or better than) other techniques.
5.1.2 Evaluating the Adaptive Approach

In the second experiment, we vary the credibilities of raters (in reporting ℜ_i) to study their effect on the accuracy of the proposed technique. We observe the difference between the actual ℜ and the calculated ℜ (obtained through rater testimonies) as an accuracy measure. We increase ℜ in a linear manner, with the defection in the community increasing at each time instance. Moreover, all the dishonest raters report values that differ by at least three points from the observed values; e.g., on a scale of 1 to 10, if 2 is observed then a value of 5 or higher is reported, and for an actual value of 4 the reported values could be 1 or lower, or 7 or higher. Figure 5.2 shows the comparisons between the actual community ratios and the adaptive ratios for different rater credibilities.
We can see that for the first three cases, i.e., when 10%, 20%, and 40% of the raters are dishonest, the calculated ℜ is close to the actual maliciousness ratio, and the effects of dishonest reports are diluted. On the other hand, when 60% or more of the raters are dishonest, there is considerable deviation in terms of accuracy. The percentage of dishonest raters cannot be determined exactly, as it is a highly dynamic phenomenon. Evidence from previous empirical studies of eBay's reputation system (one of the most widely used reputation systems) suggests that almost 99% of the raters stay honest in the system and provide positive ratings to the providers [113]. Although such a high percentage may be attributed to the nature of eBay's business model (auction) or its reputation model (both parties rate each other), we believe it provides a rough guideline for the number of credible raters in a ratings-based reputation community. Some authors have also proposed approaches for eliciting honest ratings [65, 101].
[Five plots, one each for 10%, 20%, 40%, 60%, and 80% dishonest raters, showing the defective ratio against the number of defective transactions for the actual and adaptive ratios]
Fig. 5.2 Ratio Comparisons with Different Number of Dishonest Raters for the Adaptive Approach
We conclude that the proposed adaptive technique for reputation bootstrapping can tolerate some dishonesty on the part of raters, but in situations where more than 40% of the raters are dishonest, an alternate reputation bootstrapping mechanism may be required.
5.1.3 Comparing the Reputation Assignment Approaches In the third experiment, the different reputation assignment methods are evaluated. We intend to show that reputation assignment through “evaluation” provides the most accurate results. Figure 5.3 shows the reputation values assigned by the community provider for one single service provider using different techniques. For clarity, only the first ten time instances are shown. We show the assigned values for: two constant values (a high one of 7.5 and a low one of 2), average reputation, reputation through referrals, and show how they relate to the actual provider reputations collected through evaluation. Figure 5.3 illustrates the case of an option that is fair to the newcomer in one instance but that may prove not to be so for the next instance. For example, consider the third time instance where the provider’s reputation is evaluated to be 5. If the newcomer is assigned a high constant (7.5), then existing services have the incentive to white-wash their low reputation for a new high one. However, if a low constant (2) value is assigned, then the new service will be disadvantaged despite the white-wash incentive being removed. Similarly, if the average reputation was assigned, it would be 3.3 (way below than the provider’s actual behavior). Bootstrapping reputation through referrals provides the closest results, but in some situations it is also not accurate. Moreover, referrals are expected to be scarce (due to market competition) in service-oriented environments. Thus, we conclude that evaluation provides a suitable alternative for bootstrapping the newcomer’s reputation through assignment.
Fig. 5.3 Bootstrapping Reputation through Assignment
5.2 Rater Credibility Analysis

The basis of any modeling effort can be categorized as either analytical or empirical. Analytical models are constructed to aid the prediction of system behavior from a high-level or generalized perspective. In contrast, empirical analysis is often conducted to study system specifics in a detailed manner [43]. Accurate representation of system behavior through analytical modeling is only possible if the relationships between system elements are well understood and can be represented mathematically. However, "by definition, an analytical model of reality must always be incomplete" [123]. Therefore, empirical evaluation should accompany modeling, so that key elements of the model can be understood and "tweaked" to generate an optimal outcome. In this section, we discuss our analytical model for adjusting rater credibilities. A comprehensive analytical and empirical evaluation of our proposed framework is also presented.

Following the general rule that the credibility of a rater should increase if the rater is perceived as honest and decrease otherwise, we use Equation 4.6 as the basis of our analytical model (the value of ℵ is as defined in Equation 4.7). The credibility of the rater at time instance n is therefore updated as:

$$C_r(t_x, i)_n = \begin{cases} C_r(t_x, i)_{n-1}\,\bigl(1 + \alpha\,(1 - |V_x - M|)\bigr) & \text{if } t_x \text{ is honest} \\ C_r(t_x, i)_{n-1}\,\bigl(1 - \beta\,(1 - |V_x - M|)\bigr) & \text{otherwise} \end{cases} \qquad (5.1)$$

where α and β represent the amount of change in credibility. These variables are assigned constant values in the analytical model, and are used as a replacement for the factors defined in Equations 4.8 through 4.11. Note that for pure analytical analysis, M can be replaced by the statistical mean of all ratings received ($\bar{V}$). However, to keep the comparison of the proposed framework with the generalized analytical model simple, we use M instead.

We use a Gaussian distribution to model the behavior of providers and raters. This enables us to choose a mean value for the provider's performance in each time instance, and to use a finite variance to generate various ratings. Rater behaviors are defined such that honest raters always generate a value close to the provider's original performance value (the provider's real reputation for that time instance is used as the mean of the rater's Gaussian distribution), while dishonest raters provide incorrect values in the majority of cases. The analytical model is evaluated using one hundred (100) raters. We generate performance values for a single provider over 100 rounds, with a corresponding 100 ratings in each round. An individual rating t_x is considered equivalent to the majority rating M if the absolute distance between them is less than 0.15 (on a scale of 0 to 1).
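A minimal sketch of the credibility update of Equation 5.1 follows; the α and β step sizes of 0.1 are illustrative constants only, not the values used in the book's experiments.

```python
def update_credibility(cr_prev, v_x, majority, honest, alpha=0.1, beta=0.1):
    """Credibility update of Equation 5.1: raise credibility when the
    rater is honest, lower it otherwise, scaled by how close the rating
    v_x is to the majority rating."""
    closeness = 1.0 - abs(v_x - majority)
    if honest:
        return cr_prev * (1.0 + alpha * closeness)
    return cr_prev * (1.0 - beta * closeness)
```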
Setup

As part of the empirical analysis, the experiments we conducted provide evidence that our model produces results similar to the general analytical model, and they allow us to assess the robustness of the proposed techniques. The experiments were designed to simulate interactions on the service Web, and were conducted in a closed environment where the actual behavior of service providers is accurately captured, i.e., we can monitor each service's behavior. Provider behaviors generated in the analytical analysis (through a Gaussian distribution) are reused in the simulation to facilitate comparison between the analytical and empirical methods. The reputations of the service providers are then calculated based on the testimonies of different service consumers. In each time instance, some service consumers and providers interact with each other, and at the end of the interaction the service consumers rate the service providers. For experimental and data-gathering purposes, we assume that each service provider offers a single operation that can be invoked by the service consumers. The validity of the proposed metrics is assessed by observing the difference between the actual provider behavior and the calculated reputation. Although system performance related to reputation storage and collection overheads is important, we focus on consumer-centric performance here, i.e., accurately evaluating rater credibilities and hence the reputations of service providers.

Dynamic Provider Behavior

Since the presence of malicious providers cannot be discounted, we evaluate our techniques against different types of provider behavior. The first type of providers perform consistently and do not show large variance in their behavior from one transaction to the next. This means that a provider that performs with high QoWS values continues to do so (indefinitely), and does not engage in any malicious activity to take advantage of the consumers' trust that it may have attained over time. Similarly, a low-performing provider does not amend its behavior and stays in the "same state." The second type of providers change their behavior after a number of transactions, either to (i) take advantage of the consumers' trust, or (ii) ameliorate their behavior. The first case includes strategic providers that aim to build a reputation by performing honestly initially, and then start "milking" [146] the attained reputation. In the experiments discussed below, we focus on providers that perform with consistently high reputations and on providers that change state from good behavior to bad (other cases, such as performing with low QoWS values or changing from low to high behavior, show similar results). These groups (and any combination thereof) cover any behavior that a service provider may exhibit. This ensures that the experiment samples are representative of the real-world environment, which contains a variety of provider behaviors.
Reputation Assessment with Varying Number of Credible Raters

We have divided the service raters into two groups: honest raters (ones with high credibility) and dishonest raters (ones with low credibility). These groups can be related to each other in one of three ways in the environment: the number of honest raters can exceed that of dishonest raters, dishonest raters can out-number honest raters, or honest and dishonest raters can be equal in number. We set the inequalities in rater behaviors (first and third scenarios) to be significant and use a 70-30 ratio imbalance to indicate the inequality. In the following, provider reputations are assessed for each scenario and compared with the actual provider reputations for both the analytical and empirical methods. Note that all raters start with neutral credibility values (0.5 on a 0 to 1 scale, with 1 meaning an honest rater).
[Plots of the reputation assessment comparison (Original, Simulation, Analytical) and the reputation assessment error (Analytical, Simulation) over transactions, for a provider with consistent behavior and for a provider whose behavior changes state]
Fig. 5.4 High Credibility Raters Out-Number Others
5.2.1 Honest Raters in Majority

In the first instance of the experiment, honest raters (ones with high credibility values) out-number dishonest raters. Honest raters report values close to the originals (within the 0.15 threshold), while dishonest raters deviate by at least 0.4 points (on a 0 to 1 scale).
Figure 5.4 shows the effect of the inequality in raters' numbers on the calculation of the provider's reputation, and the amount of error experienced as a result. Since the majority of the raters are honest, dishonest raters have little effect on the reputation assessment process, and only small errors are generated. The error is largest for roughly the first 6 time instances, while the consumer adjusts the dishonest raters' credibilities. However, it is still less than the 0.15 threshold we set for the experiments. Both provider behaviors (consistent and state change) show similar results, as dishonest raters lose credibility early.
0.5
0.6
Original Simulation
0.4
Error
Reputation
0.8
0.4 Analytical
0.3
Simulation
0.2
Analytical
0.2 0
Reputation Assessment Error
0.6
1
0.1
11
15 4
30 45 60 7 10 13 Transactions
75 16
0
90 19
11
15 4
30 45 60 7 10 13 Transactions
75 16
90 19
ÏÐ ÑAÒÓ ÔÕÐÖ³Ó × Ø·ÙØÝAÚç/Ó ÚçÈܧÕAØÝMÒÓ Ñ/Ð ÞMߧÕàáA× ÝM× Ó Ñ?ÚÈéÑAÒÕMÛ³ê Ð Ñ/ëÌäÚAÕ â/× ÝY× Õ× Ñ ì£ÚÑ× ØÕÐ/å æ¸Ó ç/Ø³× ÑÈíÑAÖè Þ Reputation Assessment Comparison Original Analytical
0.6 0.4
0.4 Analytical
0.3
Simulation
0.2
0.2 0
0.5
Simulation Error
Reputation
0.8
Reputation Assessment Error
0.6
1
0.1 11
15 4
30 45 60 7 10 13 Transactions
75 16
90 19
0
11
15 4
30 45 60 7 10 13 Transactions
75 16
90 19
Fig. 5.5 High Credibility Raters and Low Credibility Raters are Equal in Number
5.2.2 Equal Number of Honest and Dishonest Raters

The second instance of the experiment, where honest and dishonest raters are equal in number (50-50 ratio), is shown in Figure 5.5. It can be seen that in this case the analytical method converges rater credibilities near the 15th time instance, while through simulation, honest and dishonest raters are distinguished near the 30th time instance. However, even after this point is reached, the final reputation calculations show a slight variation, due to the dishonest raters' cluster being chosen as the "majority rating" (M).
[Plots of the reputation assessment comparison (Original, Simulation, Analytical) and the reputation assessment error (Analytical, Simulation) over transactions, for consistent and state-changing provider behavior]
Fig. 5.6 Low Credibility Raters Out-Number Others
5.2.3 Dishonest Raters in Majority

In the third instance of the experiment, dishonest raters out-number honest raters in the environment. Figure 5.6 shows the reputations that are calculated in this situation, where rater credibilities are mostly low. Compared to the actual provider behavior, these ratings show some deviation. This is due to the malicious reporting of non-credible raters. Note that collusion among raters is not enforced in these experiments; raters agreeing on a (dishonest) reputation value is only incidental. Since consumer reporting is neither uniform nor honest, assessed reputations deviate from the actual provider performance values. Still, the assessed reputations are fairly consistent and close to the actual reputations. This is mainly due to the incorporation of first-hand knowledge at the end of each transaction, which dilutes the effects of dishonesty to some extent by lowering (dishonest) rater credibilities. However, the overwhelming majority of malicious raters cause M, and hence the assessed reputation, to change in each time instance.
Discussion

We need to make a few observations regarding the techniques above. First, the consumer can base his decision only on the information he has (from the past) and the information he gathers (in the form of feedback ratings).
Second, the credibility of the raters is directly influenced by the number of ratings that are similar to each other and by the previously assessed provider reputation. The proposed techniques emphasize the "majority rating," where agreement with the majority results in a credibility (and hence weight) increment. We believe this is a valid assumption, as malicious raters are likely to be scattered, and an attempt to gain a majority (through collusion) would prove too costly [136]. Third, even if a large majority in one round wrongfully alters a provider's reputation (and rater credibilities), subsequent rounds will detect malicious rating anomalies. If a large number of raters continue to act maliciously for extended periods of time, then such anomalies are hard to detect, as a service consumer cannot determine the actual honesty of the majority of raters. This is in accordance with phenomena observed in real-life social networks [23]. However, the consumer's own personal experience aids in detecting such malicious raters. We believe that the strength of our proposed techniques lies in the ability of a service consumer to identify malicious raters (either in the present round or in subsequent ones), and to assess provider reputations accordingly.
5.3 RATEWeb Evaluation and Comparison

In this section, we provide a comprehensive evaluation (in terms of accuracy) of RATEWeb, and compare its accuracy with that of similar existing work.
Setup

We have created a service environment of one hundred (100) Web services, with one round of interactions between providers and consumers spanning 1000 time instances. We conduct fifteen rounds of experiments and report the average behavior exhibited. Five QoWS attributes, namely reliability, availability, invocation fee, response time, and authentication, are used in the experiments (since using more than one criterion provides a better evaluation of the service provider). The values of these attributes are assigned according to a pre-defined rating scheme. The only data set available (to the best of our knowledge) for measuring the QoS of real Web service invocations [3, 4] is used for generating service provider behaviors. This is similar to the data collection techniques presented in [57] for capturing the behavior of sellers at eBay, and it enables us to validate the applicability of our approach in a realistic setting. Using our running example (Figure 1.3), we have simulated interactions between Car Broker Web services as consumers and Credit History Web services as providers. In each time instance, some service consumers and providers interact with each other, and at the end of the interaction the service consumers rate the service providers. The two parties also record the interaction and digitally sign it. Digitally signing the interaction history helps to avoid the "interaction falsification" attack, in which a consumer provides a rating for a service provider with which it has not interacted. For experimental and data-gathering purposes, we assume that each service provider offers a single operation that can be invoked by the service consumers. The reputation of each provider is then calculated according to our proposed metrics.

Table 5.1 Evaluation Parameters

Parameter                                                         Default Value   Change Value
Number of Web services                                            100             No
Number of transactions in one round                               1000            No
Total experiment rounds                                           15              No
Quality attributes (denoted QoWS) measured                        5               No
Provider behavior groups                                          5               No
% of malicious raters (denoted Sm)                                50%             Yes (Varies)
% of feedbacks in which raters act maliciously (denoted mtrans)   None            Yes (Varies)
Consumers rate the providers according to their own judgment (Rater 0, Rater 1, etc.); these ratings are known as personal evaluations. The personal evaluations are used to compute consumer credibilities and the majority rating. Consequently, the ‘assessed reputation’ of a service provider as viewed by a service consumer is calculated. The assessed reputation for a single provider may vary from consumer to consumer, since it depends on various personalized preferences. In the experiments, we compare the scalar values of the actual provider performance (QoWSd) and the calculated assessed reputation from a single service consumer’s point of view, to assess the robustness of our proposed techniques. The consumers are assumed to function honestly in a transaction (e.g., pay on time, provide required details, etc.). Therefore, the rational strategy for service providers is to act honestly and consistently with high QoWS in order to increase or maintain their client bases. We also study the effects of malicious consumer behavior in the role of a service rater (i.e., reputation feedback provider).

Table 5.1 shows the list of parameters used in the experiments. The parameters that are changed during the experiments are marked in the column labeled “Change Value,” along with the new value. For example, the parameter Sm is changed in some experiments; since the new value is not fixed and differs from one experiment to the other, “Varies” is listed as the new value.

The one hundred service providers are divided into five groups of twenty members each. The groups are created to simulate various malicious behaviors. The first group of providers behaves consistently with high QoWS values, i.e., these providers behave rationally and do not engage in any malicious activity. The second group performs with consistently low QoWS values; these providers are always looking to take advantage of the consumer. The third group performs with high values for the first 500 time instances but then suffers a performance degradation. These are strategic providers that aim to build a reputation by initially performing honestly, and then start “milking” [146] the attained reputation. The fourth group acts in the opposite manner: its providers perform with low values in the beginning, and after the 500th time instance they ameliorate their behavior and start performing with high QoWS attribute values. These providers are ones that learn from their mistakes. The final group of providers performs in a random manner, oscillating between high (performing as promised) and low reputation values (acting maliciously). The five groups (and any combination thereof) cover any behavior that a service provider may exhibit. This ensures that the experiment samples are representative of real-world environments, which contain a variety of provider behaviors.

Similar to previous experiments, service raters are divided into two groups: honest raters (ones with high credibility) and dishonest raters (ones with low credibility). These groups can be related to each other in one of three ways: honest raters can outnumber dishonest raters, the two can be equal in number, or dishonest raters can outnumber honest raters. We set the inequalities in rater behaviors (first and third scenarios) to be significant, using a 75-25 ratio to indicate the imbalance.
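To make the setup concrete, the following minimal Python sketch shows one way the five provider behavior groups described above could be simulated. The group names, the 0-1 performance scale, and the noise term are illustrative assumptions rather than the exact generator used in our experiments.

```python
import random

HIGH, LOW = 0.9, 0.1  # assumed nominal QoWS levels on a 0-1 scale

def provider_performance(group, t, switch_point=500, noise=0.05):
    """Simulated QoWS of a provider from one of the five behavior
    groups at time instance t (illustrative sketch only)."""
    if group == "consistent_high":        # rational, honest providers
        base = HIGH
    elif group == "consistent_low":       # always exploitative providers
        base = LOW
    elif group == "degrading":            # build reputation, then "milk" it
        base = HIGH if t < switch_point else LOW
    elif group == "improving":            # learn from mistakes after t = 500
        base = LOW if t < switch_point else HIGH
    else:                                 # "oscillating": random high/low
        base = random.choice([HIGH, LOW])
    # small perturbation so honest raters still differ slightly in opinion
    return min(1.0, max(0.0, base + random.uniform(-noise, noise)))
```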
In the following, provider reputations are assessed for each scenario and compared with the actual provider reputations.
5.3.1 Honest Raters in Majority

In the first instance of the experiment, honest raters (ones with high credibility values) outnumber dishonest raters. Figure 5.7 shows the effect of this inequality on the calculation of the provider’s reputation. The plots are labeled A through E to indicate the five provider groups defined above. For instance, Figure 5.7-A shows the comparison between the original provider performance (QoWSd) and the assessed reputation for providers that perform consistently with high QoWS values. It can be seen that, due to the high number of honest ratings, the assessed reputations are almost equal to the original provider performance. The small variation between assessed and original reputations is due to the inconsistency introduced by the (honest) differences in opinion among credible raters and the malicious attempts of non-credible raters. This holds for any type of provider behavior (Figure 5.7-A through E).
[Figure omitted: five panels (A-E) plotting Reputation Value against Transactions for providers whose performance is consistently high, consistently low, degrades from high to low, upgrades from low to high, and oscillates.]

Fig. 5.7 Reputation Assessment when High Credibility Raters Out-Number Others
5.3.2 Equal Number of Honest and Dishonest Raters

The second instance of the experiment, in which the numbers of honest and dishonest raters are almost equal, is shown in Figure 5.8. It can be seen that although the assessed reputations are not exact, they are close to the desired value. In terms of accuracy, deviation above or below a certain threshold is acceptable. Since the majority ratings calculated from the reported consumer ratings depend on the generated data clusters, we use a two-point threshold in our experiments, whereby ratings that differ by at most two points are allocated to the same cluster. The centroid of the cluster holding the maximum number of elements is taken as the majority rating for a particular time instance. Note that if two or more clusters have an equal number of members, the tie is broken randomly when choosing the majority rating. Therefore, the majority ratings (and hence reputations) calculated in Figure 5.8 are sometimes closer to the actual performance (high credibility cluster’s centroid chosen as M) and at other times not so close (low credibility cluster’s centroid chosen as M).
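The following sketch illustrates the two-point clustering and random tie-break just described. The single-pass cluster assignment and the use of a plain arithmetic mean as the centroid are simplifying assumptions.

```python
import random

def majority_rating(ratings, threshold=2.0):
    """Cluster ratings that differ by at most `threshold` points and
    return the centroid of the largest cluster as the majority rating M."""
    clusters = []
    for r in sorted(ratings):
        # place the rating in the first cluster whose centroid is close enough
        for c in clusters:
            if abs(r - sum(c) / len(c)) <= threshold:
                c.append(r)
                break
        else:
            clusters.append([r])
    largest = max(len(c) for c in clusters)
    # a tie between equally large clusters is broken randomly
    chosen = random.choice([c for c in clusters if len(c) == largest])
    return sum(chosen) / len(chosen)

# e.g. majority_rating([1, 2, 3, 8, 9]) -> 2.0 (cluster {1, 2, 3} wins)
```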
[Figure omitted: five panels (A-E) plotting Reputation Value against Transactions for the five provider behavior groups.]

Fig. 5.8 Reputation Assessment when High Credibility Raters and Low Credibility Raters are Equal in Number
5.3.3 Dishonest Raters in Majority

In the third experiment, dishonest raters outnumber honest raters in the environment. Figure 5.9 shows the reputations calculated in this situation, where rater credibilities are mostly low. Compared to the actual provider behavior, these ratings show some deviation. This is due to the malicious reporting of non-credible raters. Note that collusion among raters is not considered in these experiments, so raters agreeing on a (dishonest) reputation value is only incidental. Since consumer reporting is not uniform and is dishonest, the assessed reputations deviate from the actual provider performance values. Still, the assessed reputations are fairly consistent and close to the actual reputations.
[Figure omitted: five panels plotting Reputation Value against Transactions for providers whose performance is consistently high, stays low, degrades from high to low, upgrades from low to high, and oscillates.]

Fig. 5.9 Reputation Assessment when Low Credibility Raters Out-Number Others
This consistency is mainly due to the incorporation of first-hand knowledge at the end of each transaction, which dilutes the effects of dishonesty to some extent by lowering (dishonest) rater credibilities. However, the overwhelming majority of malicious raters cause M, and hence the assessed reputation, to change in each time instance. This experiment instance shows the “worst-case” scenario of service raters with no collusion. The effects of collusion are evaluated in the upcoming discussion. Note that in Figures 5.8 and 5.9 at least half of the raters are dishonest. This makes the assessed reputations deviate from the original
values. However, in [143], Whitby et al. suggest that such high numbers of malicious raters are unrealistic in real-world applications, and that a much lower rate of dishonesty should be expected. Hence, we may safely conclude that RATEWeb assesses provider reputations in a fairly accurate manner.
Adjusting Rater Credibilities for Reputation Evaluation

The experiments to this point show the effects of rater credibilities for different provider behaviors. In the following, we list the results of reputation evaluation using different ϒ in Equation 4.6. In particular, we use different ρ values to compute rater credibility and, consequently, provider reputations.
[Figure omitted: five panels plotting Reputation Value against Transactions for the five provider behavior groups, each comparing the original performance against the reputations assessed when honest raters are in the majority and when malicious raters are in the majority.]

Fig. 5.10 Original Performance and Assessed Reputation Comparison Using Low ρ Value (Case of Optimistic Consumer)
5.3.4 Case of Optimistic Consumer

The reputations calculated using low ρ values (optimistic consumer) in Equation 4.16 are compared with the original provider performance, and the results are shown in Figure 5.10. Figure 5.10-A shows the results for a “high performing” provider, i.e., one that performs consistently with high QoWS values. The assessed reputations are shown for two main scenarios: in the first, the majority of raters have high credibility; in the second, malicious raters outnumber honest raters. Since low ρ values are chosen, a rater’s credibility suffers only a small decrement for a dishonest rating report. The first scenario results in calculated reputations that are very close to the original provider performance (shown by a dashed line), since dishonesty is minimal. However, in the second scenario, the large number of malicious raters directly affects the majority rating and hence the final assessed reputation. Therefore, the assessed reputation (shown by a dotted line) is not as close to the original performance. Similarly, the graphs in Figure 5.10-B through E show the comparison for the other service provider groups. Since dishonest raters are not punished heavily, their credibilities do not fall much. This causes their dishonest feedbacks in upcoming transactions to be counted normally, and some disparity between actual and assessed reputations is observed.
5.3.5 Case of Pessimistic Consumer

The comparison between the original provider performance and the reputation assessed using high ρ (pessimistic consumer) values in Equation 4.16 is shown in Figure 5.11. Figure 5.11-C shows the results for a provider that performs with high QoWS values in the beginning and whose performance then drops. Similar to the previous case, the assessed reputations are shown for two main scenarios: in the first, the majority of raters have high credibility; in the second, malicious raters outnumber honest raters. In our proposed model, any deviation from either M or A negatively affects the rater’s credibility. The results in the first scenario are not much different from the previous case, and the assessed reputations are very close to the original provider performance (shown by a dashed line). This is due to the manner in which credibilities are evaluated. However, the results in the second scenario differ from the previous case: using a high ρ value, the assessed reputations are relatively closer to the original provider performance. This is due to the ‘punishing’ behavior applied when a rater’s evaluation differs from the majority rating and the previously assessed reputation (Equations 4.8 through 4.11). The assessed reputations (shown by a dotted line) mostly lie within the two-point threshold, which is an acceptable accuracy. Similarly, the graphs in Figure 5.11-A through E show the comparison for the other service provider groups.
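Since Equations 4.8 through 4.16 are defined in the preceding chapter, the sketch below only illustrates the role of ρ through a hypothetical multiplicative penalty: larger ρ values punish deviations from the majority rating M (or the previously assessed reputation A) more severely. The exact functional form shown here is an assumption for illustration only.

```python
def update_credibility(credibility, rating, majority, rho):
    """Illustrative credibility update (not the exact book equations):
    the penalty grows with both the deviation |rating - M| and the
    consumer's pessimism factor rho."""
    deviation = abs(rating - majority)       # deviation from M (0-1 scale)
    penalty = rho * deviation                # low rho -> optimistic, mild
    updated = credibility * (1.0 - penalty)  # high rho -> pessimistic, harsh
    return min(1.0, max(0.0, updated))

# An optimistic consumer (rho = 0.1) barely punishes a deviating rater,
# while a pessimistic one (rho = 0.9) cuts the same rater's weight sharply:
print(update_credibility(0.8, 0.2, 0.9, 0.1))   # ~0.744
print(update_credibility(0.8, 0.2, 0.9, 0.9))   # ~0.296
```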
[Figure omitted: five panels plotting Reputation Value against Transactions for the five provider behavior groups, comparing the original performance against the reputations assessed when high credibility raters are in the majority and when malicious raters are in the majority.]

Fig. 5.11 Original Performance and Assessed Reputation Comparison Using High ρ Value (Case of Pessimistic Consumer)
RATEWeb Comparison

To further evaluate the effectiveness of the RATEWeb metrics, we compare the accuracy of RATEWeb with the conventional approach (in which rater credibilities are ignored and reputations are mere averages of all ratings), and with a variant of the PeerTrust approach (a popular heuristics-based approach for P2P systems that also considers rater credibilities) [146] catered towards service-oriented needs. We model services’ malicious behaviors by experimenting under two settings, namely “with no collusion” and “with collusion” (similar to the approach in [146]). In the setting with no collusion, malicious service providers cheat during transactions and raters provide dishonest ratings. In the collusive setting, malicious services behave as in the previous setting and, in addition, collude with other services to increase or decrease some provider’s reputation. We change the percentage of malicious raters (denoted Sm) in steps of 10%, and consider a transaction successful if, after the transaction completes, the delivered QoWS is close to the computed reputation. The transaction success rate (TR) is thus defined as the ratio of successful transactions to the total number of transactions in the community.
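A minimal sketch of the TR computation follows; the closeness tolerance is our assumption, as the text does not fix a numeric threshold.

```python
def transaction_success_rate(transactions, tolerance=0.2):
    """TR = successful transactions / total transactions. A transaction
    succeeds if the delivered QoWS is close to the reputation computed
    before the transaction; the tolerance value is assumed."""
    successes = sum(1 for delivered, computed in transactions
                    if abs(delivered - computed) <= tolerance)
    return successes / len(transactions)

# e.g. transaction_success_rate([(0.9, 0.85), (0.2, 0.8)]) -> 0.5
```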
5.3.6 Measuring Transaction Success

Figure 5.12 shows the effects of changing Sm for RATEWeb, PeerTrust-V, and the normal case where no trust system is used, under the two settings. Since the raters provide dishonest ratings all the time, TR drops at a consistent rate, and the two settings exhibit similar results. In the collusive setting, RATEWeb is able to withstand the dishonesty until 50% of the raters are malicious, but the success rate drops thereafter. Since RATEWeb relies only on rater testimonies, when the majority of ratings are dishonest it becomes difficult for the system to assess the “true” reputation: incorrect (majority) ratings are considered credible in each time instance and TR drops. PeerTrust-V, however, is better able to handle collusion at higher percentages of Sm. In the non-collusive setting, the case is somewhat reversed. With increasing Sm, a large number of ratings may differ from each other, which causes raters that deviate from the majority to be labeled as dishonest, diluting the effects of their ratings. Still, TR decreases as Sm increases.
[Figure omitted: two panels (“No Collusion” and “With Collusion”) plotting Transaction Success Rate against the Percentage of Malicious Raters (0.1-0.9) for Normal, PeerTrust-V, and RATEWeb.]

Fig. 5.12 Transaction Success with respect to Percentage of Malicious Raters
Evidence from previous empirical studies of eBay’s reputation system (one of the most widely used reputation systems) suggests that more than 90% of the raters stay honest in the system and provide positive ratings to the providers [113, 64]. Although such a high percentage may be attributed to the nature of eBay’s business model (auctions) or its reputation model (both parties rate each other), we believe it provides a rough guideline for the number of credible raters in a ratings-based reputation community. Moreover, as mentioned earlier, high numbers of malicious raters are unrealistic in real-world applications, and a much lower rate of dishonesty should be expected [143]. Thus, we may conclude that in terms of effectiveness for real-world applications, RATEWeb provides slightly better results than PeerTrust-V.
5.3.7 Changing Rate of Maliciousness

We also vary the number of feedbacks in which raters act dishonestly (denoted the “Rate of Maliciousness,” mtrans) to evaluate RATEWeb’s ability to handle malicious rater behavior. Varying mtrans allows us to model the behavior of raters that attempt to “fool” the system by occasionally providing honest values. Accuracy is measured by estimating the root-mean-square error (RMSE), denoted the reputation error, between the estimated reputations and the actual provider reputations. A low RMSE value indicates better performance. Moreover, we set the percentage of malicious raters in the community (Sm) to 50%.
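The reputation error is the standard root-mean-square error; a short sketch follows (variable names are ours).

```python
import math

def reputation_error(estimated, actual):
    """Root-mean-square error between estimated and actual reputations."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual))
                     / len(estimated))

# e.g. reputation_error([0.8, 0.5, 0.9], [0.9, 0.4, 0.9]) -> ~0.082
```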
[Figure omitted: two panels (“No Collusion” and “With Collusion”) plotting Reputation Error against the Rate of Maliciousness (0.1-1.0) for Normal, PeerTrust-V, and RATEWeb.]

Fig. 5.13 Reputation Accuracy with respect to Rate of Maliciousness
Figure 5.13 shows the reputation error comparisons for the different approaches with variable mtrans. Both PeerTrust-V and RATEWeb outperform the normal approach, with RATEWeb providing slightly better results than PeerTrust-V. In both settings, the reputation error is around its maximum when mtrans constitutes half of the ratings. When fewer than half of the testimonies are false, the effects of these transactions are overlooked due to the honest majority of ratings. Similarly, when mtrans exceeds 50%, rater credibility is negatively affected, and false testimonies are filtered out (since half of the raters are honest). When exactly half of the rating testimonies are malicious, however, raters are able to confuse the system somewhat more. Even so, in the worst case the reputation error is only 0.17. With Sm expected to be less than 50% [143, 113, 64], RATEWeb’s accuracy is deemed acceptable. We believe that RATEWeb outperforms PeerTrust-V because of the way rater credibilities are evaluated in our model [84], and because of the use of “personalized preferences.”
Discussion

The experiments described in this section show that RATEWeb generates service reputations in a fairly consistent, accurate, and effective manner. The process of reputation generation is gradual and does not create any “spikes” due to actual behavior
inconsistencies. The gain and degradation of reputation are both gradual processes. The proposed techniques inhibit a variety of attacks, including interaction falsification, unfair ratings (for both bad-mouthing and complimentary purposes), strategic rating falsification, provider behavior deterioration, incomprehensive provider evaluation, and the staleness of ratings. Since the proposed heuristics work only with the ratings submitted by different consumers, and no trusted third-party agents or services are employed in the reputation assessment procedures, the honesty or dishonesty of the majority of raters directly affects a provider’s reputation [23]. In the experiments, the consumer ratings were based on the actual provider performance to simulate real-world scenarios. Thus, based on the experimental results, we conclude that the proposed RATEWeb metrics can be used for real-world applications.
5.4 Reputation Assessment with Scarce Ratings

In these experiments, HMM-based reputation prediction (for ratings scarcity) is carried out using the assessed reputations data (similar to the data calculated in the previous study). The accuracy of the hidden Markov model is evaluated by observing the variance between the actual behavior and the predicted reputation. We check the validity of our HMM-based reputation prediction by comparing the results with another formal prediction model (ANN) and with an ad hoc model in which no prediction is used. The type of ANN we have used (through MATLAB) incorporates adaptive filtering, which allows the network to adapt at each time step to minimize the error and predict in a relatively short time. The parameters of the HMM (from the processes in Figure 4.4) used in the experiments are shown in Table 5.2.

The generation of the reputation sequence given the model is performed using the following procedure. We start from the π vector and select the initial reputation state by generating a uniformly distributed random number (within the desired reputation range). The reputation output symbol is obtained by again generating a uniformly distributed random number and assigning a reputation output symbol according to the probability distribution of the B matrix. The next reputation state is then obtained by again generating a uniformly distributed random number and selecting the next reputation state according to the probability distribution of the previous reputation state from the P matrix. We then go back and find the reputation output symbol according to the probability distributions of the different reputation symbols using the B matrix. We continue until the required number of reputation output symbols is obtained.

Table 5.2 Parameters of the Chosen HMM

P = | 0.988  0.011 |     B = | 0.051  0.948 |     π = [ 0.999  0.001 ]
    | 0.008  0.991 |         | 0.990  0.009 |
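The generation procedure above translates directly into the following Python sketch, using the two-state parameters from Table 5.2; the inverse-CDF sampling helper is a standard technique, not part of the book’s formulation.

```python
import random

P  = [[0.988, 0.011], [0.008, 0.991]]   # reputation state transitions
B  = [[0.051, 0.948], [0.990, 0.009]]   # output symbol probabilities per state
PI = [0.999, 0.001]                     # initial state distribution (pi)

def sample(dist):
    """Pick an index according to the probability distribution `dist`."""
    u, cum = random.random(), 0.0
    for i, p in enumerate(dist):
        cum += p
        if u <= cum:
            return i
    return len(dist) - 1

def generate_reputation_sequence(length):
    """Generate `length` reputation output symbols from the HMM."""
    state = sample(PI)                      # initial state from pi
    symbols = []
    for _ in range(length):
        symbols.append(sample(B[state]))    # output symbol from B
        state = sample(P[state])            # next state from P
    return symbols
```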
Prediction Models: Note that a hidden Markov process can predict accurate values only when the data has some intrinsic memory, i.e., it shows some pattern. We have observed that completely random data is memoryless and hence is not a good candidate for HMM-based prediction. Similarly, data that stays in one state for extended periods of time does not benefit much from HMM-based prediction, since simpler statistical forecasting such as time-series analysis, regression, etc. can be used to predict accurate values. An HMM is most useful for random data that shows some pattern. We have used a data set in which service reputations oscillate between high (above 5) and low (less than 5). The reputation data is random in that the service provider can shift from a high to a low state and vice versa at any time. The time spent in each state is random and not uniform. However, the data exhibits memory, since the service follows the pattern of staying in one state and moving to the other after
‘some time.’ Note that the service provider does not change states at almost every time instance (as in memoryless systems).
5.4.1 HMM vs ANN

Figure 5.14 shows the original reputation data generated for one service provider, and the comparison of the original data against the HMM-based and ANN-based predicted reputations. The original data is represented by a continuous blue line, while the predicted reputations are shown as dotted red lines. In Figure 5.14-A, the original values for 5000 iterations are shown, while Figures 5.14-B and -C show the zoomed-in values for iterations 1200 to 1600 for explanatory purposes.
[Figure omitted: panel A shows the original reputation data; panels B and C overlay the HMM-based and ANN-based predictions, respectively, on the original data for iterations 1200-1600.]

Fig. 5.14 Predicted Reputation Comparisons
We have trained the HMM and ANN over 1000 iterations and then predicted reputations for 5500 future iterations. The predicted reputations shown in Figure 5.14 are not completely identical to the original ones: there is some “error” associated with each reputation prediction. However, the error is small enough not to disrupt the prediction process. The predicted reputation values obtained using the HMM (Figure 5.14-B) and the ANN (Figure 5.14-C) are very close to the original reputation. Therefore, we can safely conclude that the generated values are representative of the original service behavior, allowing fairly accurate trust decision making. Both prediction models predict the reputation fairly accurately, which shows that both ANN-based and HMM-based methods are viable. Note that the duration of the provider’s “stay” in either state (high or low reputation) is random, and no two intervals are equal. Still, the HMM and ANN are able to predict the reputation fairly accurately. However, the values predicted by the HMM are closer to the original values than those of the ANN. Therefore, HMMs get our preference over ANNs. Moreover, HMMs have a strong mathematical basis, which ANNs lack. Since there is some cost associated with each of the above-mentioned prediction models, either of them helps the service consumer only if the accuracy obtained through reputation prediction exceeds that of reputation values calculated in an ad hoc manner.
[Figure omitted: two panels plotting the evaluated reputation against iterations 1000-2000, before and after peer feedbacks become unavailable.]

Fig. 5.15 Reputation Evaluation without Prediction and Personal Experience
5.4.2 No Prediction or Personal Experience Input

To capture the effects of reputation assessment when no prediction is involved, we have performed the experiments on the same data set of original reputations. Figure 5.15 shows the result of 1000 interactions out of a total of 6500 interactions. The first 1449 interactions are those in which rater feedbacks are present. From the 1450th interaction onwards, no peer feedback is available about the service provider. Since no prediction is involved here, the future reputation evaluations hover around the reputation value observed at the last time instance. Since the service consumer’s own experience is also not factored into the reputation computation, we see an almost stationary reputation graph.
5.4.3 No Prediction but Including Personal Experience

Figure 5.16 shows the effects of incorporating the service consumer’s personal experience in calculating the provider reputation when no peer feedbacks are available and no prediction is used. Around 600 iterations are shown for this case out of a total of 6500 iterations. In Figure 5.16-A, the last set of peer feedbacks is received at the 980th iteration, which gives an overall low value (around 3) for the provider’s reputation. Starting from the 981st interaction, the service consumer incorporates his own personal experience into the last calculated aggregate reputation to reassess the provider’s reputation. Since the consumer’s own testimony is weighed highly, the “general trend” of the evaluated reputation moves towards the original. However, since the majority of feedbacks are still centered around 3, an accurate assessment is not possible and the values are off by some margin. Figure 5.16-B provides a similar picture, but in this case the last feedback-receiving iteration (960) leaves the service consumer with an aggregate reputation of 7. Subsequent reputation evaluations using personal experience provide accurate results when the actual provider performance is high, but inaccurate results when it is low. The reason is similar to the previous case: since the majority of the ratings are around 7, if the service consumer evaluates the provider as a low performer, the general trend of the reputation evaluation moves in that direction, but the high majority rating keeps the assessment inaccurate.

In light of the above experiments, we conclude that using a prediction model, we can assess the reputation of a service provider fairly accurately even if no rater feedbacks are present. We have also seen that HMMs have a slight edge over ANNs in computing service reputations. In contrast, if no prediction model is used, the assessed reputation values are inaccurate.
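A minimal sketch of the re-assessment step described above follows. The specific weight w given to the consumer’s own experience is an illustrative assumption; the text states only that it is weighed highly.

```python
def reassess_reputation(last_aggregate, personal_experience, w=0.7):
    """Blend the consumer's own experience with the last aggregate
    reputation once peer feedbacks are no longer available; w is the
    (assumed) weight of the personal testimony."""
    return w * personal_experience + (1.0 - w) * last_aggregate

# A stale aggregate of 3 combined with an observed performance of 8
# moves the assessment towards the original value but keeps it off
# by some margin, matching the trend in Figure 5.16-A:
print(reassess_reputation(3.0, 8.0))   # 6.5
```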
[Figure omitted: two panels (A and B) plotting the evaluated reputation against iterations, comparing the original reputation with evaluations that blend in the consumer’s personal experience after peer feedbacks stop.]

Fig. 5.16 Reputation Evaluation incorporating Personal Experience without Prediction
5.5 Composed Services Reputation Evaluation

Through the experiments listed in this section, we intend to answer three interrelated questions: (1) Does blame propagation help? (2) If blame propagation is used, which strategy should be used (constant, linear, exponential)? (3) What is the impact of wrongfully blaming a component service, i.e., when the service is not at fault but gets punished anyhow? (We use the reputations assessed using techniques similar to those described in Section 5.3.)
5.5.1 Fuzzy Rules for Composition Reputation

Figure 5.17 shows how the fuzzy rules are fired in our model. Only two component services are invoked by the orchestrator (denoted Component A and B, respectively),
and the graph shows how the change in Component A’s reputation is evaluated. For these experiments, we have put a cap on the amount of change in reputation (0.35), i.e., the component service’s reputation can only be changed within the range [0, 0.35]. As mentioned earlier, the amount of blame to be transferred can be adjusted depending on the domain. In Figure 5.17, we can see that when Component B’s reputation is high (≥ 0.8) compared to Component A’s reputation (say 0.2), Component A’s change in reputation is at its maximum (0.35), i.e., A takes most of the blame while B is not penalized. However, as A’s reputation increases, its blame, or change-in-reputation, decreases (B’s graph is similar). The other cases behave analogously.
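The fuzzy rule base itself is defined earlier in the book; the sketch below is only a crisp approximation of the behavior visible in Figure 5.17. The scaling of the blame by the reputation gap, and the gap at which the cap is reached, are our assumptions.

```python
MAX_CHANGE = 0.35       # cap on the change in a component's reputation
FULL_BLAME_GAP = 0.6    # assumed reputation gap at which the cap is reached

def change_in_reputation(rep_a, rep_b):
    """Crisp approximation of the fuzzy rules behind Figure 5.17: the
    larger Component B's reputation is relative to Component A's, the
    more of the blame A absorbs, up to MAX_CHANGE."""
    gap = max(0.0, rep_b - rep_a)
    return MAX_CHANGE * min(1.0, gap / FULL_BLAME_GAP)

print(change_in_reputation(0.2, 0.8))   # 0.35 -> A takes most of the blame
print(change_in_reputation(0.6, 0.8))   # ~0.12 -> blame shrinks as A improves
```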
[Figure omitted: surface plot of Component A’s change in reputation against Component A’s and Component B’s reputations.]

Fig. 5.17 Change in Component A’s Reputation as per the Defined Rules
5.5.2 Blame Propagation Effects

Figure 5.18 shows the effects of blame propagation on the services involved in a composition. We assume that each service can obtain a measure of its “true” reputation. After any dissatisfying transaction, the composition orchestrator Web service (acting as the invoker) may leave negative feedback for the component service, thereby reducing its reputation. Since reputation is defined as an aggregation of a number of feedbacks, the effect of a single composition orchestrator may not be profound. However, for experimental purposes we assume that negative feedback from even one rater reduces the reputation of the component service. Moreover, if more than one operation is outsourced from a service, the invoker leaves feedback for each operation. This allows the component service to know which of its operations defaulted in the invoker’s view. The reputation building process in Figure 5.18 is shown for three scenarios, namely share the blame, usual, and no awareness. The process is shown for a single service in a composition of ten or more services for each scenario. The services experience a (forced) drop in their reputation at the 650th, 1350th, and 2000th transaction.
Fig. 5.18 Reputation Propagation Effects on a Composed Service’s Reputation
In the Share the Blame scenario, the composed service identifies the faulty service based on the information received, and propagates the “blame” to its components. In these experiments we force the assumption that once a service becomes aware of its reputation loss due to blame propagation (reputation assessment through direct monitoring is different from this), it does not repeat its mistake and rectifies its behavior. This causes a gradual elevation in the composed service’s reputation right after it suffers a degradation. In the Usual case, the service does not identify the faulty service immediately, and thus experiences a loss in its reputation due to the faulty operation for extended periods. Upon identification and new service selection, the composed service starts its gradual reputation ascent. In the third case, since the service is not aware of the fault, it keeps repeating its mistake and hence keeps losing reputation. When the reputation reaches a very low value, the service is not selected again, and its (low) reputation remains constant. We can clearly see from Figure 5.18 that our proposed technique produces favorable results in terms of the composition’s reputation.
Fig. 5.19 Strategy-Choosing Effects when Provider Acts Immediately
Fig. 5.20 Strategy-Choosing Effects when Provider Acts on a Moderate Number of Complaints
5.5.3 Choosing a Blame Forwarding Strategy

The next set of experiments targets question (2) stated above. We focus on a single component service and assume that the other services in the composition are trustworthy and maintain a consistent reputation. The experiments are conducted for 20 transactions, with one time iteration encompassing one transaction. Previous work in organizational behavior economics classifies service providers that receive complaints from consumers into four classes [55]: providers that (i) immediately change their negative behavior, (ii) act upon receiving a moderate number of complaints, (iii) act upon receiving a high number of complaints, and (iv) ignore the complaints and do not change their behavior. We study the behavior of the component service provider according to these four classes to see how each blaming strategy works. It is assumed that once the service provider rectifies its behavior, its reputation increases in a consistent manner. In each experiment shown, the composition starts with the highest reputation (1.0) and performs honestly for the first 4 transactions. Around this time, the service provider’s performance drops below 0.9 (this is enforced), which prompts the orchestrator to adjust the component services’ reputations. The bar graphs show the composition’s reputation along three axes: the number of transactions is represented along the x-axis, the number of complaints along the y-axis, and the reputation value along the z-axis. Figure 5.19 shows the behavior of the provider that rectifies its behavior as soon as it receives a complaint from the invoker. Since the provider changes its behavior immediately, the three blame forwarding strategies work in a similar manner, and the composition is able to “bounce back” to its original reputation state quickly. In contrast, a provider that does not change its behavior at all would cause the reputation of the composition to decrease consistently.
The rate of drop in the composition’s reputation depends on the chosen strategy. Figure 5.20 shows the change in the composition’s reputation when providers change their behavior upon a moderate number of complaints (chosen to be 24 complaints in the experiments). Under the exponential strategy the composition’s reputation drops drastically: by the time the provider acts upon reaching a moderate number of complaints, the composition’s reputation has dropped to about 0.2. The linear and constant strategies fare a little better in terms of reputation degradation, with the reputation only dropping to about 0.6. Note, however, that in our experiments the provider is chosen after each transaction (even when its reputation is decreasing), which may not be applicable in real-world scenarios. Thus, in critical applications, an exponential strategy may provide the best results for the composition invoker, as it decreases the reputation of the provider quickly, meaning that the chances of this provider being selected become low.
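The exact forms of the three strategies are defined earlier in the book; the sketch below shows assumed shapes consistent with the behavior observed in Figures 5.19 through 5.21 (the base decrement and growth rate are illustrative).

```python
def complaint_penalty(strategy, n_complaints, base=0.02, rate=1.3):
    """Assumed shapes for the three blame forwarding strategies: the
    reputation decrement applied after the n-th complaint."""
    if strategy == "constant":
        return base                           # same decrement every time
    if strategy == "linear":
        return base * n_complaints            # decrement grows linearly
    if strategy == "exponential":
        return base * (rate ** n_complaints)  # decrement grows exponentially
    raise ValueError("unknown strategy")
```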
Fig. 5.21 Strategy-Choosing Effects when Provider Acts on a High Number of Complaints
Figure 5.21 shows the case for providers that change their behavior upon receiving a high number of complaints (chosen to be 45 complaints in the experiments). Similar to the previous case, under the exponential strategy the composition’s reputation drops drastically until the required number of complaints is reached. Only then does the provider rectify its behavior and the reputation building process start. Similar results are shown for the constant and linear strategies. Although the exponential strategy is the “quickest” to identify faults and rectify the composition’s reputation through blame forwarding, it may have disadvantages. For instance, if a component service is wrongfully accused or forwarded the blame, that service may decide not to participate in the composition. This implies that the
composition’s reputation may not recover easily, as the blameworthy service remains part of the composition while an honest service is taken out (for which another trustworthy component service with equal reputation may not be found), thereby reducing the composition’s reputation. Therefore, a composition orchestrator/service invoker should decide which strategy works best for its application.
5.5.4 Impact of Orchestrator’s Decision
As mentioned earlier, the reputation propagation results depend on the composition orchestrator’s identification of the faulty operation. The operation that the orchestrator identifies may not be the “actual cause” of the degradation of the composition’s reputation. Figure 5.22 shows the impact of a correct or incorrect decision on the part of the composition orchestrator on the reputation of the composition. Here also, the composition experiences a drop in reputation at the 650th, 1350th, and 2000th transaction (enforced). When the faulty operation is identified correctly, the composition reevaluates itself (through propagation) quickly and the reputation is maintained or increased. Alternatively, the composition takes some time to recover from a wrong decision. The reputation keeps declining in the face of the wrong decision for some time, and only when the correct fault is detected does the reputation start increasing. Although the proposed technique provides a reasonable solution, the decision of the composition orchestrator plays an important role in propagating, and hence defining, the correct reputation.
[Figure omitted: plot of Reputation (0-1) against Transactions (up to 2500), comparing the composition orchestrator’s right and wrong decisions.]

Fig. 5.22 Impact on the Composition’s Reputation because of Right/Wrong Decision
5.5.5 Blame Forwarding Impact in Relation to Maliciousness

The overall impact of the proposed techniques for blame propagation is shown in Figure 5.23. In these experiments we vary the number of malicious (low-QoWS) component services in the composition. The total number of component services is constant; we only change their delivered QoWS to distinguish between a “good” component and a malicious one. We compute the Reputation Error (shown on the y-axis) as the difference between the QoWS expected from the component’s reputation and the actual QoWS delivered. The plot in Figure 5.23 shows three cases. The Normal graph represents the case where no reputation information is used by the composition. Thus, with every new transaction the same malicious services are chosen (as no reputation distinction is made). The end result is that as the number of malicious components in the community increases, the reputation error also increases. The second plot represents the case where reputation information is used only to select services, and no blame propagation (through reputation adjustment of component services) is employed. The results are similar to the previous case, with the exception that the incorporation of reputation information reduces the error, as low-performing components may not be selected after a while. The third plot represents the case where our proposed techniques for blame sharing are used. Clearly, this provides the best overall results, as malicious components are quickly identified by the respective orchestrators and not selected again in the composition. Therefore, we conclude that the proposed blame sharing approach through reputation propagation proves fairly useful in identifying and weeding out malicious component services.
[Figure omitted: plot of Reputation Error against the Percentage of Malicious Component Services (0.1-0.9) for Normal, Blame Sharing, and No Blame Sharing.]

Fig. 5.23 Impact on the Composition’s Reputation in Relation to Number of Malicious Components
5.6 Reputation Assessment Cost Analysis

The objective of the experiments described in this section is to understand the runtime overhead of deploying RATEWeb and to see how well it scales. Runtime overhead mainly involves the cost of retrieving the required information (in the form of feedbacks) and the time it takes to assimilate all the gathered information. Thus, the total cost is directly influenced by the reputation collection model used. For experimental purposes we define three such models for service-oriented environments and compare their costs. Note that each collection model merits its own extensive discussion, which is not the objective of this book; we merely use these models to analyze the reputation computation costs. In the following, a brief overview of each collection model is presented first, followed by a series of cost analysis experiments.
Fig. 5.24 Publish-Subscribe Collection Model
Publish-Subscribe Model

In this model, consumers can publish, in a repository/registry, the list of providers they have interacted with (and whose interaction experience they are willing to share). This allows other consumers to look for raters of a specific service provider. It is assumed that service registries and consumers have operations defined that facilitate updating interaction lists and discovering ratings, respectively. For instance, similar to previous works that add QoS information to the UDDI along with service descriptions [122, 111, 73, 75, 93, 49],
we could also add the IDs of the providers the consumer is willing to share information about (i.e., provide feedback on). Figure 5.24 shows the step-wise details of the process. In the first step, a consumer interacts with a provider. It then updates its interaction list to include the provider it just interacted with (and retains the interaction QoWS values). When another consumer looks for prospective providers (step 3), it can look for other consumers (raters) that have interacted with those providers (step 4). Since the actual QoWS values reside with the rater, the consumer interacts with the rater (step 5) to retrieve the desired rating.
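A sketch of steps 3-5 might look as follows; the registry and rater object model (lookup_raters, get_rating, aggregate) is purely illustrative and not an API prescribed by the model.

```python
def assess_via_publish_subscribe(registry, consumer, provider_id):
    """Sketch of steps 3-5 of the publish-subscribe model: discover the
    raters of a provider through the registry, pull each rating directly
    from the rater, and aggregate."""
    raters = registry.lookup_raters(provider_id)             # step 4
    ratings = [r.get_rating(provider_id) for r in raters]    # step 5
    return consumer.aggregate(ratings)                       # RATEWeb assessment
```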
Fig. 5.25 Community Broadcast Collection Model
Community Broadcast Model

In the community broadcast model, we use ontologies to define ratings communities. Any service consumer that intends to publish or obtain reputation ratings is required to register with a community. This is done on a voluntary basis, and only registered consumers can obtain and share ratings. At the completion of a service request (consumer-provider interaction), the consumer disseminates the interaction QoWS values (i.e., the rating) within the community. We use a broadcast-based approach in which each registered consumer receives the ratings. Figure 5.25 shows the step-wise details of the process. In the first step, consumers register with the community. After a consumer interacts with the provider (step 2), it broadcasts the interaction ratings to the whole community (step 3). Other service consumers that discover a list of prospective providers through the service registry (step 4) can use the ratings they received in step 3 to assess the provider reputations accordingly (step 5).
Note that a major drawback of this model is the possibility of “useless” traffic in the network (as services that do not need a provider’s reputation receive it anyhow).
[Figure omitted: step-wise diagram of the credibility-based collection model, showing consumers forming credible rater groups, interacting with providers, querying credible raters for provider reputation, and forwarding unanswered requests to further credible groups.]

Fig. 5.26 Credibility-Based Collection Model
Credibility-Based Model

In the credibility-based model, service consumers form rating “cliques” or groups. A service consumer maintains a set of credible raters and requests ratings for a given provider only from this set. It may happen that the credible raters do not have the required ratings. In this case, the ratings request is forwarded to the set of credibles of each credible service that does not store the required ratings. The requesting consumer can place a limit on the number of hops before a ratings query is exhausted. Figure 5.26 shows the step-wise details of the process. In the first step, consumers form “credible rater” groups (bootstrapping may be experience-based or through random assignment). After a consumer interacts with a provider (step 2), it does not disseminate the QoWS information, but holds it for others to query (step 3). Other service consumers that discover a list of prospective providers through the service registry (step 4) ask their “credible raters” whether they have ratings for those providers (step 5a). If a rater in the group has the rating, it is communicated (step 5); otherwise the ratings query is forwarded by the credible raters to their own lists of credible groups (steps 5b and 5c). This is similar to existing P2P architectures, with the added constraint that only trustworthy raters are consulted. The major benefit of this approach is that the probability of getting only trustworthy ratings is high.
However, a drawback is that ratings may become scarce, as it is not highly likely that services in the credible set have interacted with the provider in question. For the experiments, we have created a service environment in which the reputation of a single service provider is to be assessed using RATEWeb, from a single consumer’s point of view.
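Before turning to the experiments, the hop-limited forwarding logic of the credibility-based model might be sketched as follows; the rater object model (a `credibles` list and a `ratings` dictionary keyed by provider ID) is an illustrative assumption.

```python
def query_ratings(rater, provider_id, depth_limit=4, _depth=1):
    """Ask a rater's credible group for ratings of `provider_id`,
    forwarding the query to each member's own credible group until a
    rating is found or the hop limit is exhausted. A production version
    would also track visited raters to avoid cycles."""
    collected = []
    for peer in rater.credibles:
        if provider_id in peer.ratings:       # peer interacted with provider
            collected.append(peer.ratings[provider_id])
        elif _depth < depth_limit:            # forward to next credible group
            collected.extend(query_ratings(peer, provider_id,
                                           depth_limit, _depth + 1))
    return collected
```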
5.6.1 Model Performance Comparison

In the first experiment, we vary the number of service raters from 50 to 1500 and observe the performance of each collection model described above. The service consumer gathers all the ratings and assesses the provider’s reputation. We measure the total time (in seconds) RATEWeb takes to assess the reputation of the provider. This includes (i) the rating lookup cost and (ii) the reputation assessment cost. For the former, we generate simple Web services that, once invoked, only provide the rating for the provider. The ratings are generated in a random manner (since RATEWeb accuracy is not the objective here). For generating the large number of Web services, we use the in-house developed Web Service Benchmark Framework (WSBF) [97].

Table 5.3 Performance Evaluation Parameters

Parameter                                      Default Value
Number of Raters                               50
Rater Response Time                            75ms
Registry Response Time                         50ms
Reputation Assessment Time for 50 ratings      86ms
Broadcast Time for 50 raters                   0.5s
Publish-Subscribe requests / sec               20
Depth of Credibility Groups                    4
Number of Communities                          1
We have used the default parameter values listed in Table 5.3. For the Publish-Subscribe model, we assume that once a consumer gets the list of raters from the registry, it can retrieve the ratings from 20 raters in one second. The rater response time for sending the ratings is set to 75ms (when no congestion is present). The order in which the ratings arrive is therefore not important (since all take the same time). The rest of the time is spent processing the ratings (i.e., majority rating calculation, credibility updates, and reputation assessment). For instance, in Table 5.3 the assessment time for 50 ratings is 86ms. In the Credibility-Based model we assume that only four “hops” are required among the credibility groups and that all raters are part of one of these groups. In Table 5.3, we define this maximum number of credibility groups that can be queried (e.g., if the consumer’s credible raters do not possess the rating, the query is forwarded to the second group) as the depth of the group. For the Community-Broadcast model we assume that all services belong to a single community (upcoming experiments change this simplifying assumption), and that it takes 0.5 seconds for the ratings of 50 raters to be disseminated in the community (i.e., to reach all 50 participants).
[Figure omitted: plot of Reputation Computation Time (seconds) against the Number of Raters (50-1500) for the Publish-Subscribe, Credibility-Based, and Community-Broadcast models.]

Fig. 5.27 RATEWeb Performance Analysis
Figure 5.27 shows the comparison between the three models. We can see that the Community-Broadcast model exhibits the worst performance. This is due to the fact that as the number of raters in the community increases, the overhead of transmitting the ratings to all participants also increases. The graph for the Credibility-Based model shows variations along an upward trend. We note that in this model, since not all raters are honest, not all ratings are required by the consumer. Therefore, we generate credibility groups for the raters with a maximum size of one-third of the total number of raters. Once the consumer receives the required number of ratings, it can start the aggregation process. The variations in the Credibility-Based model are due to the random generation of credible groups, which causes the required ratings to be found sometimes at depth 1 and at other times at depths 2, 3, or 4. The Publish-Subscribe model shows the best performance. Since we do not consider registry or rater bottleneck effects (for retrieving rater identities and ratings, respectively), only the time required to access the ratings and aggregate them is counted. Note that the results are for a consumer able to retrieve 20 ratings simultaneously, and the performance is directly affected if this number is changed.
5.6.2 Performance Analysis

Figure 5.28 shows three experiments in which we vary different parameters to see how the reputation assessment time is affected. In the first experiment, we vary the number of consumers (in the Publish-Subscribe model) that try to retrieve the ratings for the provider at the same time (Figure 5.28-A). We simulate simultaneous access by multiplying the number of consumers by the average time for one consumer and adding it to the total time. In real situations this number may be a little lower, depending on the multi-processing abilities of the rater. However, for experimental purposes, we believe that a simple summation of all retrieval times is acceptable. We can see that the “bottleneck” effects experienced by the consumer, both at the registry (to obtain rater lists) and at the rater (ratings retrieval), increase the total time experienced by the consumer. In the second experiment, we vary the number of communities, and the number of members per community, for the
Community Broadcast model (Figure 5.28-B). Here we assume that the consumer is registered with all available communities (2, 5, 30, etc.), and that it starts reputation aggregation only after the ratings are disseminated among all community participants. We can see that it takes about 19 seconds for ratings to be disseminated among 1500 participants for a rater registered with 30 communities. Although this is a large number, we believe that such a scenario may not occur in real-life situations, and services may register with fewer than 10 communities at a time (the total computation time for this case is around 6 seconds). In the third experiment, we vary the depth of the credible rater groups by ensuring that the required number of ratings (one-third for this case) is retrieved only after the defined depth is queried. We can see that a consumer needs to wait a little longer under this model, due to the limited availability of the required ratings.
Fig. 5.28 RATEWeb Performance With Changing Default Values
In the experiments above, we have used very basic versions of each dissemination protocol. Although the performance times are acceptable, we believe that these times can be improved with protocol refinements. We have seen that the Publish-Subscribe model provides the best times, followed by the Credibility-Based model. The number of messages exchanged, and hence the network load, in these models is low compared to the Community-Broadcast model. However, when the number of raters is low, the times of the three models are very comparable. The choice of the model (in regard to computation time and reputation accuracy) will thus depend on the domain and the type of interactions for which the services are deployed.
Chapter 6
Related Work
Reputation management involves several components, including modeling, data collection, data storage, communication, assessment, and reputation safeguards. Over the years, several research initiatives have worked on these problems. These efforts have not been limited to a single field: varied disciplines including economics, computer science, marketing, politics, sociology, and psychology have studied reputation in several contexts [34]. In the recent past, these research activities have gained momentum. In computer science, reputation has been studied both in theoretical areas and in practical applications. Theoretical areas where reputation has been studied include game theory, Bayesian networks [140], and social networks [23]. Theoretical literature that addressed reputation focused on proving properties of reputation-based systems. For example, results from game theory demonstrate that there are inherent limitations to the effectiveness of reputation systems when participants are allowed to start over with new names [112]. In [58], the authors study the dynamics of reputation, i.e., growth, decay, oscillation, and equilibria. Practical literature on reputation is mainly concerned with the applications of reputation. Major applications where reputation has been effectively used include e-business, peer-to-peer (P2P) networks, grid computing systems [8], multi-agent systems [116], Web search engines, and ad-hoc network routing [20, 76]. Work on some of these applications is discussed in this chapter.
6.1 Reputation Systems for E-business

Experimental studies confirm the effect of reputation systems on the growth of e-business. In fact, some studies conclude that many online businesses would probably not have survived without reputation systems [34]. These studies focused, in particular, on how reputation systems affect online auctions, also known as C2C e-auctions. Many of these studies have investigated and, generally, confirmed the positive impact of reputation systems on online shoppers' decisions to engage in business transactions. This, for example, was the result of a study of
eBay's reputation system [77]. More interestingly, reputation systems have proved to benefit both sellers and buyers [69]. This was shown through experimental studies that analyzed the relationship between a seller's reputation, auction closing prices, and the probability that an auction will receive at least one bid [35]. Traditional quality assurance mechanisms fall short of establishing trust in online trading communities [35]. Therefore, reputation has long been seen as a valid alternative for establishing trust between the different actors in e-business settings. The success of the online auction site eBay has been attributed to its reputation system. It uses a reputation-based trust scheme where, after each transaction, buyers and sellers rate each other using the Feedback Forum. Reputation profiles are designed to predict future performance and help users decide whom to transact with [113]. The system is intended to remove the inherent risk associated with online interactions involving unknown buyers and sellers. Most of the reputation systems that have been proposed for e-business scenarios follow eBay's feedback-based model. However, there are situations where a "feedback only" model may not produce correct results. In the following, we provide a detailed discussion of alternative solutions that attempt to alleviate the problems of such models.
6.1.1 Incentives-based Solutions

Feedback-based systems suffer from their own problems. Imbalance in feedback reporting occurs when negative feedbacks significantly outnumber positive feedbacks or vice-versa. It may result from factors other than deception. A case where negative feedbacks may outnumber positive ones arises in P2P environments, where a positive feedback from one peer about another may decrease the first peer's own reputation. Obviously, in this case peers have little or no incentive to provide positive feedbacks about other peers. An example of research that addresses this issue is [65]. The authors present a decentralized reputation mechanism that is incentive-compatible, i.e., in which a peer does not decrease its own reputation by reporting positive ratings of other peers. Most reputation systems tend to suffer in situations where positive feedbacks disproportionately outnumber negative feedbacks. Typically, these systems rely on voluntarily provided information, which creates strong incentives to free ride. In [113], a solution is proposed where individuals are paid for providing feedbacks that turn out to be accurate. Such markets for evaluations would rely on micro-payments from potential buyers to past buyers. In auction markets, current potential buyers could check sellers' reputations, but only at a cost.
6.1.2 Credibility-based Solutions

Most of the reputation research in e-business assumes that feedback is always given honestly and without bias, and pays little attention to the situation where
peers may conspire to provide false ratings. A few proposals have attempted to address the issue of feedback quality. The proposal for computing and using reputation for Internet ratings by Chen et al. [27] differentiates the ratings by computing a reputation for each rater based on the quality and quantity of the ratings it gives. However, the method is based on the assumption that ratings are of good quality if they are consistent with the majority opinion. Adversaries who submit fake or misleading feedbacks can still gain a good reputation as raters in this method simply by submitting a large number of feedbacks and becoming the majority opinion. Dellarocas [33] proposed mechanisms to combat two types of cheating behavior in feedback submission. The basic idea is to detect and filter out exceptions in certain scenarios using cluster-filtering techniques. The technique can be applied to feedback-based reputation systems to filter out suspicious ratings before aggregation. In comparison, our trust model is more general. We use the credibility of the feedback source as one of the basic trust parameters when evaluating the trustworthiness of peers. Other examples of research include [151] and [72]. In [151], the authors address the problem of deception in testimony propagation and aggregation. The objective is to protect against spurious ratings generated by malicious agents. In [72], the authors study the effectiveness of "shill attacks" on recommendation-based reputation systems. These attacks are usually carried out either to promote the attackers' products and services, or to lower the popularity of competitors' products and services. In [39], the authors investigate whether source credibility theory can support reputation mechanisms in B2B electronic commerce. In contrast with B2C electronic marketplaces, the raters in B2B communities are skilled and connected, necessitating a reputation mechanism that accounts for the relationship between the user and the rater. The authors present TrustBuilder, a prototype rating tool that incorporates a methodology to calculate a weighted rating based on source credibility theory. In the proposed methodology, the weights of a rater's ratings depend on user preferences instead of rater behavior, which decreases the amount of data required to calibrate the model. To evaluate their system, the authors asked industry practitioners to evaluate bids from service providers using a credibility-weighted tool as well as a standard unweighted tool. The experiment showed that the use of a credibility-weighted tool led to increased user confidence as well as more varied evaluations. Examples of research addressing the accuracy issue include [145] and [102]. In [145], the authors consider reputation in the context of peer-to-peer e-commerce communities. They show that models based solely on feedback from other peers in the community are inaccurate and ineffective. Their solution extends this model into one where two additional criteria are used to determine the reputation of peers, namely, the total number of transactions a peer performs with other peers and the credibility of the feedback source. In [102], the authors propose exponential smoothing, a reputation assessment approach that provides a sustained incentive for sellers to behave honestly over time. The authors note that existing feedback systems, e.g., eBay's accumulative score mechanism and Amazon.com's average score mechanism, do not achieve this goal. They propose a design for a reputation system based on this idea.
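As an illustration, a minimal sketch of reputation update by exponential smoothing follows; the smoothing factor and the exact update form are generic assumptions for illustration, not the specific design of [102].

def update_reputation(current, new_rating, alpha=0.3):
    # Recent behavior always carries weight alpha, so a seller cannot
    # coast on a reputation built long ago: a sustained incentive to
    # keep behaving honestly.
    return alpha * new_rating + (1 - alpha) * current

reputation = 0.9
for rating in [0.9, 0.2, 0.2]:       # the provider starts misbehaving
    reputation = update_reputation(reputation, rating)
print(round(reputation, 3))          # 0.543: decays quickly toward 0.2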
In [155], the authors present two collaborative reputation mechanisms for online communities: Sporas and Histos.
Sporas is a mechanism for loosely connected communities where a global reputation value is generated for each member. In contrast, Histos provides a personalized notion of reputation: it evaluates the reputation of a member based on the identity of the member requesting the reputation information.
6.2 Decentralized Reputation Management

Most traditional online communities are centralized systems in which a single authority is responsible for mediating the transactions between participants. Technological advancements are bringing a shift in this context, and online interactions are no longer limited to centralized environments. Decentralized systems, in which any node may interact directly with any other node to achieve a given goal, are swiftly gaining popularity. In this section, we discuss the role of reputation in the context of decentralized systems.
6.2.1 Peer-to-Peer Reputation Systems

A Peer-to-Peer (P2P) system is a decentralized network in which a peer may request a resource from another peer that provides it. The basic use of reputation in P2P networks is to identify reputable nodes so that non-reputable nodes may be prevented from affecting the system. The idea is to deploy reputation systems that provide reliable reputation information about peers. A peer can then use this information in decision making, e.g., whom to download a file from [54, 30, 66]. Examples of applications based on this concept include P2P anonymity systems (e.g., anonymous remailers [37]) and P2P resource sharing networks. In [1], the authors propose a decentralized reputation system for P2P networks. Their system assumes that most network peers are honest, and reputations in the system are expressed as complaints. Though the method works well, it is not robust to dynamic peer personalities. In [30], the authors propose an approach to security in P2P information sharing environments where peers keep track of other peers' reputations and may share their reputation information with other peers. A distributed polling algorithm is proposed for reputation sharing. The algorithm has two phases. In the polling phase, a peer p polls its peers to inquire about the reputation of selected providers that offer the requested content. This is achieved by broadcasting a Poll message requesting each peer's opinion about the providers. All peers may answer p's request. The poller p then uses the opinions received from the voters to decide which source to use to download the requested content (see the sketch below). In [90], the authors study the advantages and disadvantages of resource selection techniques based on peer reputation. They analyze the effect of limited reputation information sharing on the efficiency and load distribution of a P2P system. A noticeable result of the study is that limited reputation sharing can reduce the number of failed transactions by a factor of 20.
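The sketch below illustrates the two-phase polling idea of [30]; message passing is abstracted as method calls, and the Peer class and its vote interface are hypothetical, introduced only for illustration.

from collections import defaultdict

class Peer:
    # A toy peer whose opinion is the set of providers it considers good.
    def __init__(self, trusted):
        self.trusted = set(trusted)

    def vote(self, candidates):
        # Phase 1 response: vote for every candidate this peer trusts.
        return [c for c in candidates if c in self.trusted]

def choose_source(peers, candidates):
    # Phase 2: broadcast a Poll (abstracted here as method calls),
    # tally the returned votes, and pick the best-supported provider.
    votes = defaultdict(int)
    for peer in peers:
        for provider in peer.vote(candidates):
            votes[provider] += 1
    return max(candidates, key=lambda c: votes[c])

peers = [Peer({"srvA"}), Peer({"srvA", "srvB"}), Peer({"srvA"})]
print(choose_source(peers, ["srvA", "srvB"]))   # srvA: 3 votes to 1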
In [146], the authors present PeerTrust, a system that uses reputation to estimate the trustworthiness of peers. To decouple feedback trust from service trust, peers use a personalized similarity measure to more heavily weigh the opinions of peers who have provided similar ratings for a common set of past partners. In a large P2P system, however, finding a statistically significant set of such past partners is likely to be difficult. As a consequence, peers will often have to make choices among a set of candidates for which there is no information. In [140], the authors propose a Bayesian network-based trust model and a method for building reputation based on recommendations in P2P networks. The evaluation of the model using a simulation of a P2P file sharing system shows that a system where peers communicate their experiences (recommendations) outperforms one where peers do not share recommendations. In [54], the authors propose a reputation system for decentralized unstructured P2P content sharing networks like Gnutella. The purpose is to help well-reputed peers make decisions about whom to serve content to and whom to request content from. In [89], the authors compare two identity infrastructures for P2P resource-sharing environments. The first uses a centralized login server that ties nodes' network pseudo-identities to their real-world identities. In the second approach, each node generates its own identity. The authors show that the first approach provides better support for reputation management by preventing nodes from changing identities. They also show that the second approach provides a higher level of anonymity. In [66], the authors present a system, called EigenTrust, that computes and publishes a global reputation rating for each node in a network using an algorithm similar to Google's PageRank [105]. Each peer is associated with a global trust value that reflects the experiences of all other peers in the network with the target peer. Peers use these trust values to choose whom to download from; as a consequence, the community identifies and isolates malicious peers from the network. The limitation of EigenTrust is that it assumes the existence of pre-trusted peers in the network. In [37], the authors report on their experience in designing reputation mechanisms for anonymous remailing and anonymous publishing systems. In anonymous publishing, for example, a set of servers forms a P2P network in which they may anonymously publish documents. The rule is that any server that publishes must also provide some disk space to store other servers' material for a certain time. The authors use a reputation system that prevents servers from cheating by dropping data early. PowerTrust [159] is a "distributed version" of EigenTrust. It observes that the relationship between users and feedbacks on eBay follows a power-law distribution, and exploits the observation that most feedback comes from a few "power" nodes to construct a robust and scalable trust modeling scheme. In PowerTrust, nodes rate each interaction and compute local trust values. These values are then aggregated to evaluate global trust through random walks in the system. Once power nodes are identified, they are used in a subsequent look-ahead random walk, based on a Markov chain, to update the global trust values.
Power nodes are used to assess the reputation of providers in a "system-wide absolute" manner. This is in contrast with our approach, where each consumer maintains control over the aggregation of ratings to define a provider's reputation.
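A compact sketch of the EigenTrust-style global trust computation follows; this simplified version omits the pre-trusted-peer bias term and the distributed execution of the actual algorithm, and the trust matrix values are illustrative.

import numpy as np

def global_trust(local, iterations=50):
    # local[i][j] = peer i's normalized trust in peer j (rows sum to 1).
    n = local.shape[0]
    t = np.ones(n) / n              # start from a uniform distribution
    for _ in range(iterations):
        t = local.T @ t             # aggregate trust, weighted by raters
    return t

C = np.array([[0.0, 0.9, 0.1],
              [0.8, 0.0, 0.2],
              [0.9, 0.1, 0.0]])
print(global_trust(C))  # peer 2, distrusted by both others, scores lowest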
Moreover, PowerTrust requires a structured overlay (for its DHT), and its algorithms depend on this architecture. In contrast, service-oriented environments, and the Web in general, do not exhibit such structure. PRIDE [36] is a P2P reputation framework that uses an elicitation-storage protocol for the exchange of recommendations. Each peer maintains a certificate authority that is responsible for the peer's identity certificate. An IP-based safeguard (IBS) is used to counter possible dishonesty, since self-certification can allow a peer to generate a large number of identities for malicious reasons. A simple arithmetic average of the recommendations received by a service provider is proposed to assess peer reputation. As mentioned earlier, such an approach, based solely on the sum of negative and positive ratings, is vulnerable to unfair rating attacks and hence is inappropriate. The XRep system proposed in [31] uses a combination of peer-based reputations and resource-based reputations to evaluate a peer's honesty. In this scheme, storage overheads are substantially high when incorporating resource-based reputations, as the number of resources is significantly larger than the number of peers. Moreover, the experiments consider a Zipf (non-uniform) distribution of resources and peers. However, it is not practical to consider a single resource to be widespread enough to have a sufficient number of voters in the system. Similar to our approach, XRep uses cluster computing to weigh feedbacks and detect malicious parties. However, no formalized trust metric is discussed. The P-Grid approach proposed in [1] assumes that most peers in the network are honest. Reputations in the system are based solely on complaints. Though the method works well in the proposed scheme, it is not robust. For instance, security concerns can arise if a peer ends up storing its own information; this is stated to be a rare case, and redundancy is proposed to ensure data integrity. Moreover, since trust is represented only in binary values, the robustness is questionable: an agent is either completely trustworthy or completely untrustworthy. In our proposed system, the varying degrees of reputation solve this problem. REGRET [116] is a reputation system that adopts a sociological approach for computing reputation in multi-agent societies in an e-commerce environment. Similar to our approach, where the nature of the community affects the service's reputation, REGRET employs both individual and social components of social evaluations, where the social dimension refers to reputation inherited by individuals from the groups they belong to. However, the proposed scheme requires a minimum number of interactions to make correct evaluations of reputation. It is likely that partners will not interact the minimum number of times needed to provide a reliable reputation value. Moreover, the problem of malicious raters is not studied. In [65], a system of Reputation-agents (R-agents) that buy and sell reputation information on prospective partners is proposed. System participants choose R-agents on the basis of their personal experiences with the success rate of a particular R-agent, and truthfulness is computed as statistical similarity. In other words, if a rating is the same as the last report about that agent, then it is deemed honest. Therefore, if a significant proportion of the agents are lying, then the definition of truth is inverted.
This is similar to our approach. However, unlike our approach, no checks are placed to counter maliciousness. Zacharia and Maes [155] present SPORAS, a centralized reputation system that establishes reputations for users in an online community based on the aggregation of ratings given by users after each transaction. Reputation in SPORAS aims to predict the future performance of the users. In order to make accurate predictions using a small computational space, a recursive and adaptive algorithm for updating reputation is used. Reputation is calculated continuously from the previous value of reputation, which is reinforced or weakened depending on the ratings obtained. This aggregation method allows newer ratings to count more than older ones. Because SPORAS is a centralized reputation system, it is not viable for virtual organizations (VOs), where partners need personalized reputation values calculated from the assembled ratings of those they already trust rather than those they do not know. Although the assumption made in SPORAS, that reputation values depend on the reputation of the entity providing the feedback, is correct, it mixes two different dimensions of reputation. While a user may be reputed as completely unable to cheat on deals, that same user may nonetheless be a bad evaluator of other users. That is, being an excellent service provider does not mean being an honest evaluator.
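A simplified sketch of a SPORAS-style recursive update is shown below; the damping function follows the general shape of the scheme described in [155], but the constants and exact parameterization here are assumptions for illustration.

import math

D = 3000.0       # maximum reputation value (assumed scale)
THETA = 10.0     # effective number of ratings remembered (assumed)
SIGMA = 400.0    # slope of the damping function (assumed)

def damping(rep):
    # Decreases as reputation grows, so the reputations of
    # well-established users change slowly.
    return 1.0 - 1.0 / (1.0 + math.exp(-(rep - D) / SIGMA))

def sporas_update(rep, rating, rater_rep):
    # rating is in [0, 1]; rater_rep is the rating user's own
    # reputation in [0, D], so reputable raters move the value more.
    expected = rep / D
    return rep + (1.0 / THETA) * damping(rep) * rater_rep * (rating - expected)

rep = 300.0
rep = sporas_update(rep, rating=0.8, rater_rep=2500.0)
print(round(rep, 1))   # about 474.8: a good rating from a reputable rater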
6.2.2 Reputation Management on the Grid

A grid is an infrastructure for accessing and using heterogeneous resources across domains and organizations. In [1], the authors address the issue of scalability in reputation-based trust management in decentralized systems. They propose to use a scalable access method, called P-Grid, to store the trust information that peers have about their past interactions with each other. Their system assumes that most network peers are honest, and reputations are expressed and judged through complaints received from fellow peers. Despite its applicability, the system has a few flaws; for instance, it is not robust to dynamic peer personalities. Several other researchers have considered the use of reputation to improve the selection of resources in grids. In [5], the author presents an architecture and an algorithm to establish trust in different entities on a grid (services, resources, and users). The architecture is based on a hierarchical reputation model that has three levels: entity trust, institution trust, and virtual organization trust. Trust plays a distinct role in grid systems. It has an impact on several aspects including security, privacy, and performance. For example, in [10, 9], the authors present a model to integrate trust into resource management for grid systems. To illustrate, they show how the computing performance in a grid system can improve using a resource management algorithm that incorporates trust. In the proposed approach, trust is used to avoid the overhead incurred by traditional security mechanisms. The approach assesses trust through reputation. The grid system is divided into grid domains. Each grid domain is managed by a single administrative authority and corresponds to a group
of resources and a group of clients. Reputation is associated with domains rather than individual entities. At time t, the trust relationship between a domain Di and a domain Dj is computed based on the direct relationship between Di and Dj at time t, as well as the reputation of Dj at time t.
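A minimal sketch of this domain-level trust computation follows; the weighted-sum form and the weight alpha are assumptions made here for illustration, as [10, 9] define their own aggregation.

def domain_trust(direct, reputation, alpha=0.6):
    # direct: Di's own experience with Dj at time t, in [0, 1]
    # reputation: aggregate of what other domains report about Dj, in [0, 1]
    # alpha: weight on first-hand experience (assumed value)
    return alpha * direct + (1 - alpha) * reputation

print(domain_trust(0.9, 0.4))  # 0.7: direct experience dominates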
6.2.3 Reputation Management for Multi-Agent Systems

A multi-agent system is a form of decentralized system in which many intelligent, autonomous agents interact with each other. The agents may cooperate to achieve a common goal or be engaged in selfish interactions [59]. The effects of reputation have also been studied in the context of multi-agent systems. As mentioned earlier, most reputation systems rely on the feedbacks of the participants. Some research in multi-agent systems does not rely on individual feedbacks. An example is the method proposed in [109], which uses the "position" of an agent within the community's social network to determine its reputation. In [92], the authors present a reputation model where agents disseminate reputations and endorsements for Web services through a specialized agency. Another approach that has been studied is that of introducing incentive-compatible reputation mechanisms. In [65], a method is proposed that provides a rationale for agents to share reputation information truthfully. The key idea is that each agent a may buy reputation information about another agent b from a special agent, called an R-agent, that manages agents' reputations. Reputation information is considered an item with a value: an agent that buys reputation information from the R-agent at a cost c1 may also sell reputation information to the R-agent at a price c2. In [95, 94, 93], the authors describe an agent-based system where agents act as proxies to collect information on and build a reputation of a Web service, and then disburse endorsements of the service. The authors present an approach that provides a conceptual model for reputation that captures the semantics of attributes. The semantics includes characteristics that describe how a given attribute contributes to the overall rating of a service provider and how its contribution decays over time. The approach thus applies both to reputations and to explicit endorsements of a service provider by another party. The reputation of agents in a marketplace setting has also been studied in the literature. In [154], the authors study the effect of agent reputation on item pricing. In the model, agents do much of the bidding and negotiating on behalf of their users. Users' reputations are established through collaborative mechanisms. The distinguishing feature of the proposed scheme is the use of dynamic pricing algorithms, which aim to obtain an accurate price during the bootstrapping phase, where all users start with minimal reputations. The mechanisms that are introduced provide a successful solution for reaching stable market equilibrium conditions, and reputation followers achieve this more effectively.
6.3 Reputation in Game Theory

Game theory is an established field in various disciplines including economics, social sciences, and mathematics. It is based on games of strategy, and various game-theoretic approaches have been used in behavior analysis. Reputation has also been studied in the context of game theory. Most of the work in this regard has focused on repeated games in which some players are uncertain about the payoff structures of their opponents [2]. The effects of reputation have been studied in modeling different types of games; examples include [48, 41, 40]. The issue studied in [48] is maintaining a reputation when strategies are imperfectly observed. When a long-run player's stage-game strategy is perfectly observed by his opponents, his investments (in his reputation) have a direct and predictable effect on the evaluation of his reputation. The authors study the effects of reputation in games with a single long-run player whose choice of stage-game strategy is imperfectly observed by his opponents. In [41], the authors construct a game where "reputation is bad," i.e., the desire to avoid a bad reputation eliminates all possibilities for profitable interactions between a long-lived agent and a sequence of short-lived principals. In [40], this is generalized to a class of games where the "bad reputation" property holds. The key properties are that participation is optional for the short-run players, and that every action of the long-run player that makes the short-run players want to participate has a chance of being interpreted as a signal that the long-run player is "bad." The effects of applying reputation through game-theoretic approaches have also been studied, with a focus on using game-theoretic modeling to understand and analyze economic, social, political, and psychological phenomena. For example, in [11], the authors study the relationship between inflation and reputation in market economies: the effect that a government's reputation, as perceived by the public, has on the development of inflationary trends in the economy. Another typical problem where game theory was instrumental is the entrance deterrence problem, also known as Selten's chain store game [120]. In this game, a multi-market monopolist faces a finite sequence of potential entrants. Each entrant observes the actions taken in the previous markets and chooses whether to enter the market monopolized by the incumbent firm. If there is an entry, the monopolist has two options: to fight the entry or to accommodate. Accommodation is the best short-run response for the incumbent, whereas fighting is an optimal policy only if it deters some of the future potential entrants. The ability of the incumbent to establish a reputation of either of the two types (i.e., fighter or accommodator) is key in determining the outcome of the game. In [100], the authors present an analysis related to this game. Their analysis suggests that if a firm is threatened by several potential entrants, then predation may be rational against early entrants, even if it is costly when viewed in isolation. Another reputation-related issue studied in game theory is the effect of changing identities on reputation systems. Results from game theory demonstrate that there are inherent limitations to the effectiveness of reputation systems when participants are allowed to start over with new names [112].
6.4 Reputation Management for the Web

Reputation has also been used to assess the quality of information on the World Wide Web. This information is mainly accessed through Web search engines or communicated through email and similar mechanisms. Consequently, the bulk of research on Web reputation systems has been conducted in these two major directions. In the following, we discuss some of these works.
6.4.1 Search Engines

Reputation has been used in the area of Web-based information retrieval, where information access and retrieval depend on the reputation of the Web-accessible information source or service provider. In this regard, assessing the reputation of the source is a major challenge. The problem is exacerbated when the preferences of the users accessing the information must be taken into consideration. In [59], the authors present a solution for reputation assessment by quantifying parameters on the nodes of a graph. In their solution, each node represents an information item and the directed arcs link pairs of nodes (a, b), where node b's credibility depends on node a. The credibility is a numerical quantity attached to the node, and nodes with a large number of incoming arcs correspond to information items with high credibility. This formulation has been adopted in some Web search engines (e.g., Google) to determine the reputation of data and, consequently, to rank search results. In [68], the authors address the issue of measuring the reputation of Web sites. Their work aims at answering the question of whether certain types of search tools yield sites that are perceived to be more reputable, i.e., authoritative and trustworthy, than others.
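Returning to the graph formulation of [59], a PageRank-style toy sketch of the idea follows: credibility flows along incoming arcs, with a small base score at every node. The damping factor and the example graph are illustrative assumptions.

def credibility(nodes, arcs, rounds=20, d=0.85):
    # arcs: (a, b) pairs meaning b's credibility depends on a. Every node
    # keeps a small base score; the rest flows along outgoing arcs.
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_degree = {n: sum(1 for a, _ in arcs if a == n) for n in nodes}
    for _ in range(rounds):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for a, b in arcs:
            new[b] += d * score[a] / out_degree[a]
        score = new
    return score

arcs = [("a", "c"), ("b", "c"), ("c", "a")]
print(credibility(["a", "b", "c"], arcs))
# "c", with two incoming arcs, ends up the most credible item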
6.4.2 Electronic Mail

In recent times, reputation-based spam filters have been introduced for email systems. These filters provide an effective alternative to traditional anti-spam techniques, which are generally based on content inspection. Messages are evaluated based on the behavior of the sender: the system keeps track of whether a sender typically engages in good behavior (such as sending legitimate e-mail messages), bad behavior (such as sending spam or malicious code), or something in between [29]. Earlier spam filters used simple lists (blacklists and whitelists) to categorize senders. Blacklists contained the IP addresses of known spammers, phishers, and virus senders, and whitelists contained the IP addresses of senders known to be legitimate. These systems quickly became unable to cope with more sophisticated forms of spam. The next-generation spam filters improved on the inadequacies of their
predecessors by introducing dynamic black and white lists, automatic propagation of IP lists, and message scoring, i.e., assigning a score to each message based on the likelihood of it being legitimate. For instance, TrustedSource is one such reputation-based anti-spam system. It is able to score all 4.2 billion IP addresses across the Internet on a spectrum from good to bad, depending on both sender history and message characteristics. An algorithmic method for scoring emails based on reputation relationships inferred from the network is presented in [51]. The proposed algorithm has been used in an email client called TrustMail. The client uses a reputation network to derive the importance that a given message has for a given user. It complements spam filters by helping users automatically identify good messages that might otherwise be indistinguishable from unwanted ones.
6.5 Reputation in Web Services

Despite the abundance of reputation-related literature, little research has focused on the reputation of Web services. The authors of [96] presented a distributed model for Web service reputation. The model enables a service's clients to use their past interactions with that service to improve future decisions. It also enables services' clients to share their experiences from past interactions with Web services. Agents associated with each Web service act as proxies to collect information on, and build a reputation of, the Web service. The authors present an approach that provides a conceptual model for reputation that captures the semantics of attributes. The semantics includes characteristics that describe how a given attribute contributes to the overall rating of a service provider and how its contribution decays over time. In [93], the authors present an ontology model to aid in establishing trust in Web services. The trust model is refined in [94] and [95], where a trust model based on a shared conceptualization of quality of service (QoS) attributes is presented. The model shares the need for ontologies with our presented model. However, it lacks some important features that are central to our proposed model. The reputation-based trust model in [95] lacks complete automation of feedback reporting: human participation is necessary for rating Web services. Moreover, all agents that report reputation ratings are assumed to be trustworthy. Similarly, the common agencies to whom these ratings are communicated for sharing/aggregation are also expected to behave honestly. In our model, no such simplifying assumptions are made. We calculate the reputation of a Web service based on the testimonies of both trusted and malicious raters. We have provided an elaborate method to measure the credibility of service raters. The credibility-based scheme allows us to assign more weight to trustworthy testimonies than to untrustworthy ones. This feature was deemed "future work" in [95]. Another feature that is absent in the previous models, but present in ours, is the incorporation of "local historical information" with the "global reputation view." Moreover, our model allows Web services to
personalize their attribute preferences as in [95]. However, our model also accepts single reputation values (calculated from attribute aggregations) if the preferences of the service requester are similar to those of the rater. This reduces the computation time for reputation, which matters in real-time situations such as Web service selection and service query optimization. In [46], a principal might trust an object if that object is trusted by a third party that is trusted by the given principal. This is similar to the notion of endorsement proposed in [92]. A key difference between the two approaches is that [46] captures policies for endorsement and delegation, whereas [92] seeks to capture service attributes and how they can be combined to support various policies. In [81], the authors present a framework for reputation-based service selection in semantic grids. The framework consists of a matchmaking service, a composer service, and a reputation manager service (RMS). Service consumers provide their ratings of services to the RMS, which computes the reputation of a service based on the ratings received from different users. In [42], the authors propose a Web service discovery method that considers both the functionality and the behavior of Web services, while providing a scalable reputation model for ranking the discovered services. The method operates over a peer-to-peer system, thus avoiding the inherent problems of centralized systems such as limited scalability, a single point of failure, and high maintenance cost. Reputation has also been considered in the context of service composition. Most of the research on this issue focused on the "choreographical aspects" of composition. As service composition became largely understood, the trustworthy composition of Web services became a challenge [124]. The prime idea is to use service reputations to enable the composition of trustworthy services. A summary of the related work in comparison with RATEWeb is shown in Table 6.1.
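As a minimal illustration of the credibility-weighted aggregation at the core of our model, consider the sketch below; variable names are illustrative, and the full metrics are defined in the earlier chapters.

def weighted_reputation(ratings, credibilities):
    # ratings[i]: rater i's feedback in [0, 1]
    # credibilities[i]: assessed credibility of rater i in [0, 1]
    total = sum(credibilities)
    if total == 0:
        return 0.5                  # no credible testimony: neutral value
    return sum(r * c for r, c in zip(ratings, credibilities)) / total

# two honest raters outweigh one malicious, low-credibility rater
print(weighted_reputation([0.8, 0.9, 0.1], [0.9, 0.8, 0.1]))   # about 0.81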
Table 6.1 Related Work Summary: In comparison with RATEWeb

System: PeerTrust [146]
Limitation: • Requires a common set of past providers • Feedback trust and service trust are coupled together
RATEWeb: • Not required • Rater credibility (feedback trust) is assessed separately from provider trust (service trust)

System: EigenTrust [66]
Limitation: • Assumes pre-trusted peers
RATEWeb: • Credibility depends on rating honesty

System: PowerTrust [159]
Limitation: • Power nodes have high credibility • Assumes a power-law distribution • Requires a structured overlay • System-wide reputation
RATEWeb: • No distribution assumed • Not required • User-centric reputation

System: PRIDE [36]
Limitation: • IP-based safeguard (IBS) is used to counter possible dishonesty • Based on a simple average of negative and positive ratings • Focused primarily on efficiency/performance
RATEWeb: • Service identities are not "strict" (i.e., not tied to IPs) • Based on defined metrics • Focus on functionality

System: XRep [31]
Limitation: • No formalized trust metric • Data needs to be in a Zipf distribution
RATEWeb: • Formal metrics discussed • Services (and data) can be spread; no distribution assumption is made

System: P-Grid [1]
Limitation: • Assumes that most peers in the network are honest • Representation of trust in binary values only
RATEWeb: • Honest and dishonest services co-exist (in any number) • Reputation (hence trust) measured on a continuous scale from 0 to 1

System: REGRET [116]
Limitation: • Requires a minimum number of interactions to make correct evaluations of reputation • Rater dishonesty not discussed
RATEWeb: • No minimum number of interactions required • Rater dishonesty assessment is an integral part of the approach

System: R-agents [65]
Limitation: • Ratings similar to the last report are deemed honest
RATEWeb: • Rater credibility assessment incorporates this approach and extends it

System: CONFIDANT [20]
Limitation: • Neighbors monitor node behavior and maintain their reputation • A Beta distribution is assumed for node behavior
RATEWeb: • Services cannot monitor their peers (normal Web-based interactions) • No distribution is assumed

System: QoS in Overlay Networks [115]
Limitation: • Requires a minimum number of interactions to make correct evaluations of reputation • Feedback trust and service trust are coupled together
RATEWeb: • No minimum number of interactions required • Rater credibility (feedback trust) is assessed separately from provider trust (service trust)

System: Web Services [94, 95]
Limitation: • Human feedback reporting • Raters are assumed to be trustworthy • Ratings are communicated to common agencies for aggregation, which are expected to be honest
RATEWeb: • No human intervention required • Raters consist of both honest and dishonest services • No central/common agency consulted for ratings aggregation (each service performs this locally)
Chapter 7
Conclusion
There is increasing consensus that Web services will be the silver bullet of the next generation of the Web, the service Web. In this respect, Service Computing is expected to be at the core of the emerging field of "Services Science." The focus of this book is on a key enabler of service computing: a trust framework among Web services. The goal of this research is to provide a trust management framework for Web services interactions. The aim is to develop techniques for defining, ascertaining, and managing reputation among Web services as a key mechanism for managing trust. The reputation of Web services (in delivering services) is the key parameter used to establish this trust. A peer-to-peer approach is proposed to assess the reputation of a priori unknown Web services. Previous solutions for reputation assessment make simplifying assumptions that may not apply on the Semantic Web (e.g., reliance on pre-existing trusted parties, specific data distributions, a common set of providers for similarity-based trust, requiring human intervention, focusing on efficiency/performance rather than functionality, etc.). We develop a simple and holistic solution that overcomes the deficiencies of previous solutions and provides an automated, adaptive, and decentralized mechanism, whereby reputations are evaluated through a number of heuristics with different perspectives, providing a fair and accurate assessment. We define new methods for the creation, collection, and assessment of reputation information that are robust in the face of a variety of attacks. Figure 7.1 shows a high-level overview of our framework. We define an ontology-based approach for organizing and describing Web services, where an ontology refers to the hierarchical description of important concepts in a domain, along with descriptions of the properties of those concepts. We introduce the notion of "community" as an instance of the ontology that we have developed, to cluster Web services based on their domain of interest. Communities are also used to bootstrap the reputation of new participants and to set rules for when a member's reputation falls below a specified threshold (e.g., information dissemination, membership suspension, etc.). In the traditional Web services interaction model, provider selection is not trust-based: Web service consumers discover a list of providers that can deliver the required functionality through service registries, select a provider arbitrarily, and then invoke, i.e., communicate with, the provider, expecting to retrieve the desired results.
Fig. 7.1 A High-Level View of RATEWeb
We extend this model to include a service provider's reputation as the fundamental selection criterion. Service consumers gather feedback on providers from their peers, assimilate this information to derive the corresponding reputations, and sort the providers according to the assessed reputations. The higher the reputation of a service provider, the better the service is expected to behave. Service consumers then invoke the best available Web service and, at the end of the interaction, rate the provider according to pre-determined criteria. These service ratings are used to compute provider reputations accordingly. We define the reputation of a Web service as a reflection of its quality. Quality of service is defined as a set of quantitative and qualitative characteristics of a system, necessary to achieve the required functionality of an application. We adopt a similar definition and extend its application to the domain of Web services. We define a model that shows how several Quality of Web Service (QoWS) parameters (e.g., reliability, availability, throughput, etc.) can be measured and processed by Web services in an automatic manner to capture a service's non-functional attributes. Service providers publish their "promised" QoWS values in the service registry along with the service descriptions. After transaction completion, by observing the variation between the promised QoWS and the actually "delivered" QoWS,
reputation ratings are created. We define original models for the collection of these ratings, extending existing approaches and defining new ones (e.g., community-based). We define novel metrics to aggregate the different feedbacks from raters, assessing a service provider's reputation in an accurate manner. For situations where rater feedbacks are scarce, we use statistical forecasting (in particular, a Hidden Markov Model) to ascertain the provider's reputation. Our simple, unique, and original techniques inhibit a variety of attacks, including interaction falsification, unfair ratings for bad-mouthing and complimentary purposes, strategic rating falsification, provider behavior deterioration, rating staleness and scarcity, and incomprehensive provider evaluation. We conducted an extensive experimental study of the proposed framework to show that, in comparison with existing works, our framework (RATEWeb) provides better accuracy. Our distributed approach enables the selection and composition of autonomous and dynamic sets of Web services using reputation as a key parameter for ascertaining trust. This work is a major contribution to the emerging field of services science. Establishing a trusted environment for Web service interactions will facilitate the deployment of applications such as E-commerce (e.g., B2B applications), E-government (e.g., social services), and E-science (e.g., Grid computing).
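As a minimal illustration of rating creation from the variation between promised and delivered QoWS, consider the sketch below; the attribute names, weights, and ratio-based scoring are assumptions for illustration, not the exact metrics of the framework.

promised  = {"reliability": 0.99, "availability": 0.99, "throughput": 100.0}
delivered = {"reliability": 0.95, "availability": 0.90, "throughput": 80.0}
weights   = {"reliability": 0.5,  "availability": 0.3,  "throughput": 0.2}

def rate_transaction(promised, delivered, weights):
    # Score each attribute by delivered/promised (capped at 1), then
    # combine using the consumer's personalized attribute weights.
    return sum(w * min(delivered[a] / promised[a], 1.0)
               for a, w in weights.items())

print(round(rate_transaction(promised, delivered, weights), 3))   # 0.913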
Future Research Directions

The Web has revolutionized the way we discover, acquire, process, and use information. New models, methods, and tools for building a variety of Web-based information systems are continuously being developed. In this regard, the Services Computing paradigm presents a probable direction for the future of the Web. Due to the Web's open nature, the extent of the heterogeneities involved in service-oriented environments, and the variety of online participants, we believe that security, privacy, and trust are key requirements that must be fulfilled before the true potential of the service Web can be realized. The development of reputation management technologies will hence play a central role in shaping individual and business behavior online. According to Dellarocas and Resnick [35], reputation systems will act as quality assurance mechanisms that shape social, economic, and political online environments. In the following, we list some open research problems related to reputation and trust management in open networked environments.

• Trustworthy Dynamic Composition of Web Services: The number of services to be integrated may be large and continuously changing on the service Web. Web service composition requires flexibility to dynamically adapt to changes that may occur in partners' applications. Participants must be able to respond rapidly to changes where both operational (e.g., server load) and market (e.g., changes in regulations) environments are not easily predictable. Additionally, the competitive nature of the Web makes possible the availability of alternate services that provide "similar" functions. As mentioned earlier in the book, businesses should team up with the "best" available services at any given time. They need to form
short-term relationships and then disband when it is no longer profitable to stay together. This form of partnership does not assume any a priori trading relationship. The support of dynamic composition based on trust will facilitate the establishment of on-demand and real-time partnerships. Services will not be statically bound to each other; new trustworthy partners with relevant features should be dynamically discovered and assembled. Currently, relationships among component services are mostly established at development time. While technologies such as SOAP, WSDL, and UDDI provide capabilities for defining Web services, they are clearly not sufficient to facilitate the establishment of dynamic business relationships based on trust. More research effort is needed to enable the creation of trustworthy dynamic relationships.
• Trust Support of Mobile Services: The past years have witnessed a boom in wireless technologies. Sophisticated wireless devices such as cellular phones and PDAs are now available at affordable prices. Emerging technologies, including 3G and 4G (third and fourth generation), are under development to increase the bandwidth of wireless channels. However, most of the proposed Web service concepts cannot, or may not, be easily applied to mobile services. This is due to the peculiarities of wireless environments, including limited bandwidth, unbalanced client-server communication, limited power supply, and frequent unavailability of wireless networks. For example, using UDDI for discovering Web services requires multiple costly round-trips over wireless networks. Invoking Web services using SOAP may increase mobile hosts' power consumption and waiting time. This calls for new techniques to adapt Web services to the wireless world. Moreover, trust computation involving a number of rater feedbacks is expensive, and may not be feasible in mobile environments. Hence, new trust models for such environments need to be researched and developed.
• Secure Service Compositions: The sharing of privately owned data through and across remote applications, while respecting access control restrictions, poses a fundamental challenge in building loosely coupled distributed systems (e.g., Web service compositions). We believe that exploring how services hosted at different sites can be composed while respecting data access requirements is an issue that requires immediate attention from the research community. A possible direction is to investigate the applicability of cryptographic techniques (such as private information retrieval) and reputation in enforcing these requirements.
• Portable Reputations: With the proliferation of user accounts, interest in single sign-on is gradually increasing. Ease of use in managing several accounts is the main reason behind this surge in popularity. Along similar lines, we propose researching the idea of portable reputation, in which reputation attained in one domain or context can be used in a different one with minimal human intervention.
• Reputation Change Detection: A fundamental issue in detecting a change in the reputation of a Web service is to distinguish between ephemeral and persistent changes in a provider's behavior. The former must be accurately identified, and its impact on the provider's long-term reputation must be minimized. For example, a technical incident at a server running a Web service may cause the service to
be unavailable for some time. This should not be interpreted as a trigger to lower the provider's reputation. The unavailability may prompt the service requester to change the reputation rating immediately, or it may decide to wait for a period of time before updating the rating. Another approach is to put the suspected service on probation as a temporary measure. This information can be associated with the reputation information and weighed according to the service requester's preferences.
• Trust on the Social Semantic Web: Semantic Web technologies are slowly beginning to influence our online activities. Semantics-enabled applications for annotating and sharing pictures, videos, and blogs are gaining acceptance, and are believed to be becoming integral components of the "social Web." However, for the widespread adoption of this new paradigm, solutions for providing security, preserving privacy, and enabling trust need to be developed. An initial direction for this work is exploring how trust can be established in an open environment like the social Semantic Web, and how semantic technologies can be used to resolve privacy issues.
• Trustworthy Sensor Networks: Sensor networks have mainly been used to collect data for a defined environment. The applications developed for these networks are usually "closed," with a single entity (e.g., a company, a government agency, or a research institution) maintaining control. In recent years, there has been a shift from this model, and generic sensor networks are now being deployed to decouple the sensor network from the applications using it. In this model, the sensor network query system is designed to answer queries from any application. We believe that next-generation sensor network applications will require customizable architectures that give developers the ability to select and integrate individual components from several networks in an efficient manner. Therefore, trustworthy service-oriented sensor networks for building open, interoperable, and customizable solutions need to be developed.
References
1. K. Aberer and Z. Despotovic. Managing Trust in a Peer-2-Peer Information System. In Proc. of the ACM Conference on Information and Knowledge Management (CIKM), pages 310–317, Atlanta, Georgia, 2001.
2. K. Aberer and Z. Despotovic. On Reputation in Game Theory - Application to Online Settings. Working Paper, 2004.
3. E. Al-Masri and Q. H. Mahmoud. Discovering the Best Web Service. In 16th International Conference on World Wide Web (WWW), pages 1257–1258, 2007.
4. E. Al-Masri and Q. H. Mahmoud. QoS-based Discovery and Ranking of Web Services. In IEEE 16th International Conference on Computer Communications and Networks (ICCCN), pages 529–534, 2007.
5. B. K. Alunkal. Grid Eigen Trust - A Framework for Computing Reputation in Grids. Master's thesis, Computer Science Department, Illinois Institute of Technology, Chicago, December 2003.
6. S. Angelov, S. Till, and P. Grefen. Dynamic and secure B2B e-contract update management. In EC '05: Proceedings of the 6th ACM Conference on Electronic Commerce, pages 19–28, New York, NY, USA, 2005. ACM Press.
7. D. Artz and Y. Gil. A survey of trust in computer science and the semantic web. Web Semantics Journal, 5(2):58–71, 2007.
8. F. Azzedin and M. Maheswaran. Evolving and Managing Trust in Grid Computing Systems. In Proc. of the IEEE Canadian Conference on Electrical and Computer Engineering, pages 1424–1429, May 2002.
9. F. Azzedin and M. Maheswaran. Integrating Trust into Grid Resource Management Systems. In Proc. of the 31st International Conference on Parallel Processing (ICPP), pages 47–54, Vancouver, BC, Canada, August 20-23, 2002.
10. F. Azzedin and M. Maheswaran. Towards Trust-Aware Resource Management in Grid Computing Systems. In Proc. of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid), pages 452–457, Berlin, Germany, May 22-24, 2002.
11. D. Backus and J. Driffill. Inflation and Reputation. American Economic Review, 75(3):530–538, June 1985.
12. Y. Baghdadi. A web services-based business interactions manager to support electronic commerce applications. In ICEC '05: Proceedings of the 7th International Conference on Electronic Commerce, pages 435–445, New York, NY, USA, 2005. ACM Press.
13. P. Baldi, S. Brunak, K. Chauvin, and H. Nielsen. Assessing the accuracy of prediction algorithms for classification: An overview. Bioinformatics, 16:412, 2000.
14. L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1):164–171, 1970.
15. B. Benatallah, M. Dumas, Q. Z. Sheng, and A. H. Ngu. Declarative composition and peer-to-peer provisioning of dynamic web services. In 18th International Conference on Data Engineering (ICDE), pages 297–308, 2002.
16. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34–43, May 2001.
17. E. Bertino, E. Ferrari, and A. C. Squicciarini. Trust-X: A Peer-to-Peer Framework for Trust Establishment. IEEE Transactions on Knowledge and Data Engineering, 16(7):827–842, 2004.
18. K. Birman. The untrustworthy web services revolution. IEEE Computer, 39(2):113–115, February 2006.
19. ISO Standards Body. Information Technology: Quality of Service Framework. Technical Report ISO/IEC JTC1/SC21 DIS 13236 ICS 35.020, 1995.
20. S. Buchegger and J.-Y. Le Boudec. Performance Analysis of the CONFIDANT Protocol. In Proc. of the 3rd ACM Intl. Symposium on Mobile Ad Hoc Networking and Computing, pages 226–236, June 9-11, 2002.
21. A. Bouguettaya, M. Ouzzani, B. Medjahed, and J. Cameron. Managing Government Databases. Computer, 34(2):56–64, February 2001.
22. S. Buchegger and J.-Y. Le Boudec. A robust reputation system for P2P and mobile ad-hoc networks. In Proceedings of the Second Workshop on the Economics of Peer-to-Peer Systems, 2004.
23. V. Buskens. Social Networks and the Effect of Reputation on Cooperation. In Proc. of the 6th Intl. Conf. on Social Dilemmas, page 42, 1998.
24. O. Cappe, E. Moulines, and T. Ryden. Inference in Hidden Markov Models. Springer, 2005.
25. S. Casare and J. Sichman. Towards a functional ontology of reputation. In AAMAS '05: Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 505–511, New York, NY, USA, 2005. ACM Press.
26. F. Casati and M.-C. Shan. Models and Languages for Describing and Discovering E-Services. In Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD), page 626, New York, NY, USA, 2001. ACM.
27. M. Chen and J. P. Singh. Computing and using reputations for internet ratings. In 3rd ACM Conference on Electronic Commerce, pages 154–162, New York, NY, USA, 2001. ACM.
28. A. P. Ciganek, M. N. Haines, and W. Haseman. Horizontal and vertical factors influencing the adoption of web services. In HICSS '06: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, volume 6, pages 109c–109c, January 2006.
29. CipherTrust. TrustedSource: The Next-Generation Reputation System. Technical report, February 2005.
30. E. Damiani, S. Vimercati, S. Paraboschi, and P. Samarati. Managing and Sharing Servents' Reputations in P2P Systems. IEEE Trans. on Knowledge and Data Engineering, 15(4):840–854, 2003.
31. E. Damiani, S. D. C. D. Vimercati, S. Paraboschi, P. Samarati, and F. Violante. A reputation-based approach for choosing reliable resources in peer-to-peer networks. In ACM Conference on Computer and Communications Security, pages 207–216, 2002.
32. J. Delgado and N. Ishii. Memory-Based Weighted-Majority Prediction for Recommender Systems. In ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, 1999.
33. C. Dellarocas. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. In EC '00: Proceedings of the 2nd ACM Conference on Electronic Commerce, pages 150–157, New York, NY, USA, 2000. ACM.
34. C. Dellarocas. The Digitalization of Word-of-Mouth: Promise and Challenges of Online Feedback Mechanisms. Management Science, 49(10):1407–1424, 2003.
35. C. Dellarocas and P. Resnick. Online reputation mechanisms: A roadmap for future research. In Summary Report of the First Interdisciplinary Symposium on Online Reputation Mechanisms, MIT, Boston, USA, April 26-27, 2003.
References
129
36. P. Dewan and P. Dasgupta. Pride: Peer-to-peer reputation infrastructure for decentralized environments. In WWW (Alternate Track Papers & Posters), pages 480–481, 2004. 37. R. Dingledine, N. Mathewson, and P. Syverson. Reputation in P2P Anonymity Systems. In Workshop on Economics of Peer-to-Peer Systems, June 2003. 38. C. Dislis. Improving service availability via low-outage upgrades. In 26th Annual International Computer Software and Applications Conference (COMPSAC), pages 989–993, 2002. 39. M. A. Ekstrom, H. C. Bjornsson, and C. I. Nass. A Reputation Mechanism for B2B Electronic Commerce that Accounts for Rater Credibility. Journal of Organizational Computing and Electronic Commerce, 15(1):1–18, 2005. 40. J. C. Ely, D. Fudenberg, and D. Levine. When is Reputation Bad? Working Paper, 2005. 41. J. C. Ely and J. Valimaki. Bad Reputation. Quarterly Journal of Economics, 118:785–814, 2003. 42. F. Emekci, O. D. Sahin, D. Agrawal, and A. El-Abbadi. A peer-to-peer framework for web service discovery with ranking. pages 192–199, July 2004. 43. J. Falker and J. Kuchar. Analytical and Empirical Analysis of Restricting Airspace. In 4th. USA/Europe Air Traffic Management R&D Seminar, Santa Fe, NM, December 2001. 44. M. Feldman and J. Chuang. The Evolution of Cooperation Under Cheap Pseudonyms. In Seventh IEEE International Conference on E-Commerce Technology, CeC 2005, pages 284– 291, 2005. 45. M. Feldman, K. Lai, I. Stoica, and J. Chuang. Robust incentive techniques for peer-to-peer networks. In EC ’04: Proceedings of the 5th ACM conference on Electronic commerce, pages 102–111, New York, NY, USA, 2004. ACM Press. 46. T. W. Finin and A. Joshi. Agents, Trust, and Information Access on the Semantic Web. ACM SIGMOD Record, 31(4):30–35, 2002. 47. I. Foster, C. Kesselman, J. Nick, and S. Tuecke. Grid Services for Distributed System Integration. IEEE Computer, 35(6), June 2002. 48. D. Fudenberg and D. K. Levine. Maintaining a Reputation when Strategies are Imperfectly Observed. Review of Economics Studies, 59:561–579, 1992. 49. D. Garcia and M. Toledo. A web service architecture providing qos management. In LAWEB ’06: Proceedings of the Fourth Latin American Web Congress (LA-WEB’06), pages 189–198, Washington, DC, USA, 2006. IEEE Computer Society. 50. G. Giuliano. ARGOS: Dynamic Composition of Web Services for Goods Movement Analysis and Planning. Technical report, Univ. of Southern California, 2003. 51. J. Golbeck and J. Hendler. Reputation Network Analysis for Email Filtering. In Proc. of the 1st Conference on Email and Anti-Spam, Mountain View, California, July 30-31 2004. 52. R. M. Gray. Vector Quantization. IEEE ASSP Magazine, pages 4—29, 1984. 53. A. Gunasekaran and E.W.T. Ngaib. Build-to-Order Supply Chain Management: A Literature Review and Framework for Development. Journal of Operations Management, 23:423–451, 2005. 54. M. Gupta, P. Judge, and M. Ammar. A Reputation System for P2P Networks. In Proc. of the 13th Intl. Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 144–152, Monterey, CA, USA, 2003. 55. S. Hansen, J. Swan, and T. Powers. Encouraging Friendly Complaint Behavior in Industrial Markets: Preventing a Loss of Customers and Reputation. Journal of Industrial Marketing Management, 25:271–281, 1996. 56. M. R. Hassan and B. Nath. Stockmarket forecasting using hidden markov model: A new approach. In ISDA, pages 192–196, 2005. 57. D. Houser and J. Wooders. Reputation in Auctions: Theory, and Evidence from eBay. 
Journal of Economics and Management Strategy, 15(2):353–370, 2005. 58. B. A. Huberman and F. Wu. The dynamics of reputations. Computing in Economics and Finance 2003 18, Society for Computational Economics, August 2003. 59. M. N. Huhns and D. A. Buell. Trusted autonomy. IEEE Internet Computing, 6(3):92–95, 2002. 60. R. Hull and J. Su. Tools for composite web services: A short overview. SIGMOD Record, 34(2):86–95, 2005.
130
References
61. T. D. Huynh, N. R. Jennings, and N. R. Shadbolt. Certified reputation: How an agent can trust a stranger. In AAMAS ’06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1217–1224, New York, NY, USA, 2006. ACM Press. 62. IDC. Worldwide Web Services Software Forecast, 2004-2008: Cautious Adoption Continues. Technical Report 31079, 2004. 63. A. Josang. A logic for uncertain probabilities. International Journal of Uncertainity, Fuzziness and Knowledge-Based Systems, 9(3):279–311, 2001. 64. A. Josang, R. Ismail, and C. Boyd. A survey of trust and reputation systems for online service provision. Decision Support Systems Journal, 43(2):618–644, 2007. 65. R. Jurca and B. Faltings. An Incentive Compatible Reputation Mechanism. In Proc. of the 2003 IEEE Intl. Conf. on E-Commerce, pages 285–292, June 2003. 66. S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The eigentrust algorithm for reputation management in p2p networks. In WWW ’03: Proceedings of the 12th International Conference on World Wide Web, pages 640–651, New York, NY, USA, 2003. ACM. 67. P. Kannan. Designing, managing and evaluating a social network based Information City for innovation success. In M. S. Thesis, University of Missouri–Rolla, 2006. 68. G. Keast, E. G. Toms, and J. Cherry. Measuring the Reputation of Web Sites: A Preliminary Exploration. In Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, pages 77–78, Roanoke, Virginia, United States, 2001. 69. C. Kesler. Experimental Games for the Design of Reputation Management Systems. IBM Systems Journal, 42(3):498–506, 2003. 70. P. Kollock. The Production of Trust in Online Markets. Advances in Group Processes, 16:99–123, 1999. 71. H. Kreger. Fulfilling the Web Services Promise. Communications of the ACM, 46(6):29–ff, 2003. 72. S. K. Lam and J. Riedl. Shilling Recommender Systems for Fun and Profit. In Proc. of the 13th International World Wide Web Conference (WWW), pages 393–402, New York, NY, USA, 2004. 73. D. Lamanna, J. Skene, and W. Emmerich. Slang: A language for defining service level agreements. In FTDCS ’03: Proceedings of the The Ninth IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS’03), page 100, Washington, DC, USA, 2003. IEEE Computer Society. 74. Z. Li and S. Wang. The foundation of e-commerce: Social reputation system–a comparison between america and china. In ICEC ’05: Proceedings of the 7th International Conference on Electronic Commerce, pages 230–232, New York, NY, USA, 2005. ACM Press. 75. Y. Liu, A. Ngu, and L. Zheng. Qos computation and policing in dynamic web service selection. In WWW Alt. ’04: Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, pages 66–73, New York, NY, USA, 2004. ACM Press. 76. Y. Liu and Y. R. Yang. Reputation propagation and agreement in mobile ad-hoc networks. IEEE Wireless Communications and Networking Conference, WCNC 2003, 3:1510–1515, March 2003. 77. D. Lucking-Reily, D. Bryan, N. Prasad, and D. Reeves. Pennies from EBay: The Determinants of Price in Online Auctions. Journal of Industrial Economics, 55(2):223–233, March 2007. 78. W. Ma, V. Tosic, B. Esfandiari, and B. Pagurek. Extending apache axis for monitoring of web service offerings. In BSN ’05: Proceedings of the IEEE EEE05 international workshop on Business services networks, pages 7–7, Piscataway, NJ, USA, 2005. IEEE Press. 79. I. L. MacDonald and W. Zucchini. 
Hidden Markov and Other Models for Discrete-valued Time Series, page 52. Chapman and Hill, 1997. 80. J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, Berkeley, University of California Press, 1967.
References
131
81. S. Majithia, A. S. Ali, O. F. Rana, and D. W. Walker. Reputation-Based Semantic Service Discovery. In Proc. of the 13th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE), pages 297–302, 2004. 82. R. Malaga. Web-Based Reputation Management Systems: Problems and Suggested Solutions. Electronic Commerce Research, 1(1):403–417, 2001. 83. Z. Malik and A. Bouguettaya. Preserving Trade Secrets Between Competitors in B2B Interactions. International Journal of Cooperative Information Systems, 14(2-3):265–297, June/September 2005. 84. Z. Malik and A. Bouguettaya. Evaluating Rater Credibility for Reputation Assessment of Web Services. In 8th International Conference on Web Information Systems Engineering (WISE 07), pages 38–49, December 2007. 85. Z. Malik and A. Bouguettaya. Rater Credibility Assessment in Web Services Interactions. WWW Journal, 12(1):3–25, March 2009. 86. Z. Malik and A. Bouguettaya. RATEWeb: Reputation Assessment for Trust Establishment among Web Services. VLDB Journal (ISSN-Print:1066-8888, ISSN-Online:0949-877X), February 2009. 87. Z. Malik and A. Bouguettaya. Reputation Bootstrapping for Trust Establishment among Web Services. IEEE Internet Computing, 13(1):40–47, January 2009. 88. A. Mani and A. Nagarajan. Understanding quality of service for web services, 2002. http://www-128.ibm.com/developerworks/library/ws-quality.html. 89. S. Marti and H. Garcia-Molina. Identity Crisis: Anonymity vs. Reputation in P2P Systems. In Proc. of the 3rd International Conference on Peer-to-Peer Computing, pages 134–141, Linkoping, Sweden, 2003. 90. S. Marti and H. Garcia-Molina. Limited Reputation Sharing in P2P Systems. In Proc. of the 5th ACM Conference on Electronic Commerce, pages 91–101, New York, NY, USA, May 2004. 91. S. Marti and H. Garcia-Molina. Taxonomy of trust: Categorizing p2p reputation systems. Computer Networks, 50(4):472–484, 2006. 92. E. M. Maximilien and M. P. Singh. Reputation and Endorsement for Web services. SIGecom Exchanges, 3(1):24–31, December 2002. 93. E. M. Maximilien and M. P. Singh. A Framework and Ontology for Dynamic Web Services Selection. IEEE Internet Computing, 8(5):84–93, September-October 2004. 94. E. M. Maximilien and M. P. Singh. Toward autonomic web services trust and selection. In ICSOC ’04: Proceedings of the 2nd International Conference on Service Oriented Computing, pages 212–221, New York, NY, USA, 2004. ACM. 95. E. M. Maximilien and M. P. Singh. Agent-based trust model involving multiple qualities. In AAMAS 05: Proceedings of 4th International Autonomous Agents and Multi Agent Systems, pages 519–526, New York, NY, USA, 2005. ACM. 96. E. M. Maximillien and M.P. Singh. Conceptual Model of Web Service Reputation. SIGMOD Record, 31(4):36–41, December 2002. 97. B. Medjahed. Semantic Web Enabled Composition of Web Services. PhD thesis, Department of Computer Science, Virginia Tech, January 2004. 98. B. Medjahed and A. Bouguettaya. Customized delivery of e-government web services. IEEE Intelligent Systems, 20(6):77–84, Nov.-Dec. 2005. 99. B. Medjahed, A. Bouguettaya, and A. Elmagarmid. Composing Web Services on the Semantic Web. The VLDB Journal, 12(4):333–351, 2003. 100. P. Milgrom and J. Roberts. Predation, Reputation, and Entry Deterrence. Journal of Economic Theory, pages 280–294, 1982. 101. N. Miller, P. Resnick, and R. Zeckhauser. Eliciting informative feedback: The peer-prediction method. Management Science Journal, 51(9):1359–1373, 2005. 102. F. Ming, T. Yong, and A. B. Whinston. 
Evaluation and Design of Online Cooperative Feedback Mechanisms for Reputation Management. IEEE Transactions on Knowledge and Data Engineering, 17(2):244–254, Feb. 2005.
132
References
103. L. Mui, M. Mohtashemi, and A. Halberstadt. A computational model of trust and reputation . In Proceedings of the 35th Annual Hawaii International Conference on System Sciences, pages 2431–2439, January 2002. 104. CNet News. AmEx Unveils Disposable Credit Card Numbers. htt p : //news.com.com/2100 − 1017 − 245428.html, 2006. 105. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998. 106. M. P. Papazoglou and D. Georgakopoulos. Serive-Oriented Computing. Communcications of the ACM, 46(10):25–65, 2003. 107. S. Park, L. Liu, C. Pu, M. Srivatsa, and J. Zhang. Resilient trust management for web service integration. In ICWS ’05: Proceedings of the IEEE International Conference on Web Services, pages 499–506, Washington, DC, USA, 2005. IEEE Computer Society. 108. K. M. Passino and S. Yurkovich. Fuzzy Control. Addison Wesley Publishing Company, 1 edition, September 1997. 109. J. M. Pujol, R. Sanguesa, and J. Delgado. Extracting Reputation in Multi-agent Systems by Means of Social Network Topology. In Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 467–474, Bologna, Italy, 2002. 110. L.R. Rabiner and B. H. Juang. An introduction to hidden markov models. IEEE ASSP Magazine, 3(1):4–16, 1986. 111. S. Ran. A model for web services discovery with qos. SIGecom Exchanges, 4(1):1–10, 2003. 112. P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman. Reputation Systems. Communications of the ACM, 43(12):45–48, 2000. 113. P. Resnick and R. Zeckhauser. Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System, volume 11 of Advances in Applied Microeconomics, pages 1–26. Elsevier Science, Amsterdam, 2002. 114. A. Rezgui, A. Bouguettaya, and Z. Malik. A Reputation-based Approach to Preserving Privacy in Web Services. In The 4th VLDB Workshop on Technologies for E-Services (TES’03), volume 2819, pages 91–103, Berlin, Germany, September 2003. 115. B. G. Rocha, V. Almeida, and D. Guedes. Increasing qos in selfish overlay networks. IEEE Internet Computing, 10(3):24–31, May-June 2006. 116. J. Sabater and C. Sierra. Reputation and social network analysis in multi-agent systems. In Proceedings of the First Intlernational Joint Conference on Autonomous Agents and Multiagent Systems, pages 475–482, Bologna, Italy, 2003. 117. M. Sabou, C. Wroe, C. Goble, and G. Mishne. Learning Domain Ontologies for Web Service Descriptions: an Experiment in Bioinformatics. In WWW ’05: Proceedings of the 14th International World Wide Web Conference, pages 190–198, New York, NY, USA, 2005. ACM. 118. A. Sanil, S. Gomatam, and A. F. Karr. NISS WebSwap: A Web Service for Data Swapping. Journal of Statictical Science, 8(7):1–12, 2003. 119. J. Sarkis and R. P. Sundarraj. Evaluation of Enterprise Information Technologies: A Decision Model Incorporating Strategic and Operational Issues. IEEE Transactions on Systems, Man, and Cybernetics, 36(2):260–273, 2006. 120. R. Selten. The Chain Store Paradox. Theory and Decision Journal, 9:127–159, 1978. 121. D. Y. Sha and Z. H. Che. Virtual Integration with a Multi-Criteria Partner Selection Model for the Multi-Echelon Manufacturing System. The International Journal of Advanced Manufacturing Technology, 25(7,8):793–802, 2005. 122. A. ShaikhAli, O. F. Rana, R. Al-Ali, and D. W. Walker. Uddie: An extended registry for web services. 
SAINT-W 03: Proceedings of the Symposium on Applications and the Internet Workshops, page 85, 2003. 123. D. A. Simunic. Theory is Too Important to be Left to the Theorists. Canadian Academic Accounting Association, 2008. 124. M. P. Singh. Trustworthy Service Composition: Challenges and Research Questions. In Proceedings of the Conference on Autonomous Agents and Multi-Agent Systems, Workshop on Deception, Fraud and Trust in Agent Societies, pages 39–52, 2002.
References
133
125. M. P. Singh and M. Huhns. Service-Oriented Computing: Semantics, Processes, Agents. Wiley, USA, 2004. 126. C. C. Snow, R. E. Miles, and H. J. Coleman. Managing 21st Century Network Organizations. Organizational Dynamics, 20(3):5–20, 1992. 127. J. D. Sonnek and J. B. Weissman. A Quantitative Comparison of Reputation Systems in the Grid. In The 6th IEEE/ACM International Workshop on Grid Computing, pages 242–249, November 2005. 128. N. Sundaresan. Online trust and reputation systems. In EC ’07: Proceedings of the 8th ACM conference on Electronic commerce, pages 366–367, New York, NY, USA, 2007. ACM Press. 129. F. Taiani, M. Hiltunen, and R. Schlichting. The impact of web service integration on grid performance. In HPDC-14: Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing, volume 14, pages 24–27, 2005. 130. M. Tennenholtz. Reputation systems: An axiomatic approach. In AUAI ’04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pages 544–551, Arlington, Virginia, United States, 2004. AUAI Press. 131. Tim Berners-Lee. Information Management: A Proposal. March 1989. 132. S. Tsur, S. Abiteboul, R. Agrawal, U. Dayal, J. Klein, and G. Weikum. Are Web Services the Next Revolution in e-Commerce? (Panel). In VLDB ’01: Proceedings of the 27th International Conference on Very Large Data Bases, pages 614–617, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc. 133. William Turin. Digital Transmission Systems: Performance Analysis and Modeling. McGraw-Hill, New York, 1998. 134. Y. B. Udupi and M. P. Singh. Information sharing among autonomous agents in referral networks systems. In 6th International Workshop on Agents and Peer-to-Peer Computing, May 2007. 135. A. Vogel, B. Kerherve, G. Bochmann, and J. Gecsei. Distributed multimedia and qos: A survey. IEEE Multimedia, 2(1):10–18, 1995. 136. L.-H. Vu, M. Hauswirth, and K. Aberer. QoS-based Service Selection and Ranking with Trust and Reputation Management. In 13th International Conference on Cooperative Information Systems (CoopIS 2005), Oct 31 - Nov 4 2005. 137. W3C. DAML Services. http://www.daml.org/services/owl-s. 138. W3C. Web Services Description Requirements, October 2002. http://www.w3.org/TR/wsdesc-reqs. 139. K. Walsh and E. G. Sirer. Fighting peer-to-peer spam and decoys with object reputation. In P2PECON ’05: Proceedings of the 2005 ACM SIGCOMM workshop on Economics of peer-to-peer systems, pages 138–143, New York, NY, USA, 2005. ACM Press. 140. Y. Wang and J. Vassileva. Trust and reputation model in peer-to-peer networks. In Proceedings of the 3rd International Conference on Peer-to-Peer Computing, pages 150–158, September 2003. 141. Y. Wang and J. Vassileva. Toward Trust and Reputation Based Web Service Selection: A Survey. International Transactions on Systems Science and Applications, special Issue on New Tendencies on Web Services and Multi-Agent Systems, 3(2):118–132, 2007. 142. J. Weng, C. Miao, and A. Goh. Protecting Online Rating Systems from Unfair Ratings, volume 3592 of Trust, Privacy and Security in Digital Business. LNCS, pages 50–59. Springer Berlin / Heidelberg, August 2005. 143. A. Whitby, A. Josang, and J. Indulska. Filtering Out Unfair Ratings in Bayesian Reputation Systems. The Icfain Journal of Management Research, 4(2):48–64, February 2005. 144. W. Wu, A. Doan, C. Yu, and W. Meng. Bootstrapping Domain Ontology for Semantic Web Services from Source Web Sites. In VLDB Workshop on Technologies for E-Services, pages 11–22, 2005. 145. L. 
Xiong and L. Liu. A Reputation-based Trust Model for Peer-to-Peer E-Commerce Communities. In Proceedings of the 2003 IEEE International Conference on E-Commerce, pages 275 –284, June 2003.
134
References
146. L. Xiong and L. Liu. PeerTrust: Supporting Reputation-based Trust for Peer-to-Peer Electronic Communities. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(7):843–857, July 2004. 147. J. Yang, L. Wang, S. Zhang, X. Sui, N. Zhang, and Z. Xu. Building Domain Ontology Based on Web Data and Generic Ontology. In IEEE/ACM International Conference on Web Intelligence (WI 2004), pages 686–689, September 2004. 148. X. Yang, A. Bouguettaya, B. Medjahed, H. Long, and W. He. Organizing and accessing web services on air. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 33(6):742–757, Nov. 2003. 149. K. P. Yoon and C-L. Hwang. Multiple Attribute Decision Making: An Introduction. Sage Publications, 2006. 150. B. Yu and M. P. Singh. An evidential model of distributed reputation management. In AAMAS ’02: Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems, pages 294–301, New York, NY, USA, 2002. ACM Press. 151. B. Yu and M. P. Singh. Detecting Deception in Reputation Management. In AAMAS ’03: Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multiagent Systems, pages 73–80, New York, NY, USA, 2003. ACM. 152. K. Yu, N. Petrovsky, C. Schonbach, J. Koh, and V. Brusic. Methods for Prediction of Peptide Binding to MHC Molecules: A Comparative Study. Journal of Molecular Medicine, 8(3):137–148, 2002. 153. Q. Yu, X. Liu, A. Bouguettaya, and B. Medjahed. Deploying and Managing Web services: Issues, Solutions, and Directions. The VLDB Journal, 17(3):537–572, 2008. 154. G. Zacharia, T. Evgeniou, A. Moukas, P. Boufounos, and P. Maes. Economics of Dynamic Pricing in a Reputation Brokered Agent Mediated Marketplace. Electronic Commerce Research, 1(1-2):85–100, 2001. 155. G. Zacharia and P. Maes. Trust Management Through Reputation Mechanisms. Applied Artificial Intelligence Journal, (14):881–907, 2000. 156. G. Zacharia, A. Moukas, and P. Maes. Collaborative Reputation Mechanisms in Electronic Marketplaces. In HICSS ’99: Proceedings of the 32nd Annual Hawaii International Conference on System Sciences, page 8026, Washington, DC, USA, 1999. IEEE Computer Society. 157. C. Zhang. An improved fully connected hidden markov model for rational vaccine design. Masters Thesis, Department of Computer Science, University of Saskatchewan, February 2005. 158. Q. Zheng and X. Zhang. Study of virtual organizations using multi-agent systems. In Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1147– 1178, The Netherlands, July 2005. ACM. 159. R. Zhou and K. Hwang. Powertrust: A robust and scalable reputation system for trusted peerto-peer computing. IEEE Transactions on Parallel and Distributed Systems, 18(4):460–473, 2007.
Index
acronyms and abbreviations, xix
centralized reputation system, 47
collusion, 46
community, 18
composition, 17, 60, 95
credibility, 45, 69
fuzzy controller, 66
Hidden Markov Models, 55
ontology, 18
orchestrator, 62
performance study, 69
quality of service, 31
RATEWeb, 43, 79
ratings scarcity, 55, 90
regulations, 8
reputation, 9, 31, 44
  bootstrapping, 69
security, 8
service consumer, 16
service discovery, 17
service oriented computing, 1
service provider, 15
service registry, 15
service Web, 121
service-oriented environment, 13
SOAP, 2, 16
symbols, list of, xix
third-party certification, 8
trust, 5
UDDI, 2, 16
virtual enterprises, 60
Web, 1
Web service, 1, 14
WSDL, 2, 16