The following is a sortable list of my publications (all peer-reviewed), the associated abstracts, and links to the papers whenever possible. For a full list of publications, please see my Google Scholar page.
Companies such as Zappos.com and Amazon.com provide financial incentives for newer employees to quit. The premise is that workers who accept this offer are misaligned with their company culture, which will therefore negatively affect quality over time. Could this pay-to-quit incentive scheme align workers in online labor markets? We conduct five empirical experiments evaluating different pay-to-quit incentives with crowdworkers and evaluate their effects on mean task accuracy, retention rate, and improvement in mean task accuracy. We find that the number of times a user is prompted for the inducement, the type and frequency of performance feedback given to participants, the type of incentive, and the amount offered can help retain high-performing workers while encouraging poor-performing workers to quit early. When we combine the best features from our experiments and examine their aggregate effectiveness, mean task accuracy is improved by 28.3%. Last, we also find that certain demographics contribute to the effectiveness of pay-to-quit incentives.
Sporting events broadcast on television or through the internet are often supplemented with statistics and background information on each player. This information is typically only available for sporting events followed by a large number of spectators. Here we describe an Android-based augmented reality (AR) tool built on the Tesseract API that can store and provide augmented information about each participant in nearly any sporting event. This AR tool provides a more engaging spectator experience for viewing professional and amateur events alike. We also describe the preliminary field tests we have conducted, some identified limitations of our approach, and how we plan to address each in future work.
Pooling is a document sampling strategy commonly used to collect relevance judgments when multiple retrieval/ranking algorithms are involved. A fixed number of top-ranking documents from each algorithm form a pool. Traditionally, expensive experts judge the pool of documents for relevance. We propose and test two hybrid algorithms as alternatives that reduce assessment costs and are effective. The machine part selects documents to judge from the full set of retrieved documents. The human part uses inexpensive crowd workers to make judgments. We present a clustered and a non-clustered approach for document selection and two experiments testing our algorithms. The first is designed to be statistically robust, controlling for variations across crowd workers, collections, domains and topics. The second is designed along natural lines and investigates more topics. Our results demonstrate that high quality can be achieved at low cost. Moreover, this can be done by judging far fewer documents than with pooling. Precision, recall, F-scores and LAM are very strong, indicating that our algorithms with crowdsourcing offer viable alternatives to collecting judgments via pooling with expert assessments.
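As a rough illustration of the baseline pooling strategy described in this abstract (not the hybrid algorithms the paper proposes), the following minimal Python sketch forms a pool from the top-ranked documents of several systems. The function name `build_pool`, the system names, and the document IDs are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` documents from each ranked list."""
    pool: Set[str] = set()
    for ranking in runs.values():
        # Only the first `depth` documents of each system enter the pool.
        pool.update(ranking[:depth])
    return pool


# Hypothetical ranked lists from three retrieval systems (best document first).
runs = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d3", "d5", "d1", "d6"],
    "system_c": ["d7", "d2", "d8", "d1"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Only the documents in the pool are sent to assessors, so the judging cost grows with the pool depth and the number of contributing systems.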
In Computer Supported Cooperative Work (CSCW), many tasks require exclusive access to a shared resource by a single collaborator. Similarly, in distributed systems, mutual exclusion is required to ensure concurrency in a resource shared among several processes. These resource allocation algorithms can be divided into two genres: token-based and permission-based. To date, few empirical studies have evaluated token-based collaborative behavior in CSCW tasks. We examine four token-based protocols on a task which requires participants to properly order a series of screenshots obtained from ten short films. Using teams of 3, 4, and 5 participants who are collectively incentivized to perform the task as quickly as possible, we evaluate the effects of team size and token-based protocol on task completion and participant satisfaction across 600 sessions. Our study determined that task satisfaction was negatively correlated with team size and positively correlated with the perception of “fairness”, or lack of potential bias, of each protocol.
In this paper, we examine the Keynesian Beauty Contest, a well-known examination of rational agents used to explain the role of consensus predictions in decision making such as price fluctuations in equity markets. Using a game, we study the crowd's ability to judge relevance for both images and textual documents. In addition to asking participants to determine if a document is relevant, we also ask them to rank all choices. One group of participants (N = 137) was asked to make judgments based on their own assessment while another group of participants (N = 137) was asked to make judgments based on their estimate of a consensus decision. In addition to measuring recall and precision, our game also uses rank-biased overlap (RBO) to compare each participant's ranked list with the overall consensus decision. Results show the group asked to make ranking decisions based on their estimate of consensus had significantly higher recall for judging relevance in text documents and significantly higher recall and precision when judging relevance for a set of images. We believe this has implications for the determination of consensus across multiple contexts.
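For readers unfamiliar with rank-biased overlap, the sketch below computes a truncated form of RBO between two ranked lists in Python. It is an illustration only: the abstract does not say which RBO variant or persistence value the study used, so the function name `rbo_truncated`, the default `p=0.9`, and the example lists are all assumptions.

```python
from typing import Hashable, Sequence


def rbo_truncated(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Truncated rank-biased overlap, evaluated down to the shorter list's length.

    At each depth d, the agreement is the fraction of items shared by the two
    top-d prefixes; agreements are weighted geometrically by p**(d - 1).
    """
    depth = min(len(s), len(t))
    seen_s, seen_t = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_s.add(s[d - 1])
        seen_t.add(t[d - 1])
        agreement = len(seen_s & seen_t) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score


# Hypothetical participant ranking compared with a consensus ranking.
participant = ["img3", "img1", "img2", "img4"]
consensus = ["img1", "img3", "img2", "img5"]
print(rbo_truncated(participant, consensus, p=0.9))
```

Because the sum stops at a finite depth, two identical lists of length k score 1 - p**k rather than 1; smaller values of `p` place more weight on agreement near the top of the rankings.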
Crowdsourcing has rapidly developed as a mechanism to accomplish tasks that are easy for humans to accomplish but are challenging for machines. However, unlike machines, humans need to be cajoled to perform tasks, usually through some type of incentive. Since participants from the crowd are typically anonymous and have no expectation of an ongoing work relationship with a task requester, the types of incentives offered to workers are usually short-term monetary bonuses, which have had an inconclusive impact on crowdsourcing worker quality. In this paper, we explore the notion that the risk attitude of crowdsourcing workers may play an important role in the effectiveness of incentives on task accuracy. Traditional utility theories, such as prospect theory, depend on decisions made relative to a singular reference point, whereas the tri-reference point (TRP) theory holds that three reference points impact decision making. Using the TRP theory as a guide, we develop a game that provides workers with three reference points and subsequently explores the role of multiple reference points on worker risk aversion and task accuracy.
Book Chapter Handbook of Human Computation, 2013, pp 205-214 | ISBN-13: 978-1-4614-8805-7
Human computation techniques, such as crowdsourcing and games, have demonstrated their ability to accomplish portions of information retrieval (IR) tasks that machine-based techniques find challenging. Query refinement is one such IR task that may benefit from human involvement. We conduct an experiment that evaluates the contributions of participants from Amazon Mechanical Turk (N = 40). Each of our crowd participants is randomly assigned to use one of two query interfaces: a traditional web-based interface or a game-based interface. We ask each participant to manually construct queries to respond to a set of OHSUMED information needs and we calculate their resulting recall and precision. Those using a web interface are provided feedback on their initial queries and asked to use this information to reformulate their original queries. Game interface users are provided with instant scoring and asked to refine their queries based on their scores. In our experiment, crowdsourcing-based methods in general provide a significant improvement over machine algorithmic methods, and among crowdsourcing methods, games provide a better mean average precision (MAP) for query reformulations as compared to a non-game interface.
Book Chapter Advances in Information Retrieval, Volume 7814, 2013, pp 495-506 | ISBN-13: 978-3-642-36973-5
Human computation techniques have demonstrated their ability to accomplish portions of tasks that machine-based techniques find difficult. Query refinement is a task that may benefit from human involvement. We conduct an experiment that evaluates the contributions of two user types: student participants and crowdworkers hired from an online labor market. Human participants are assigned to use one of two query interfaces: a traditional web-based interface or a game-based interface. We ask each group to manually construct queries to respond to TREC information needs and calculate their resulting recall and precision. Traditional web interface users are provided feedback on their initial queries and asked to use this information to reformulate their original queries. Game interface users are provided with instant scoring and asked to refine their queries based on their scores. We measure the resulting feedback-based improvement on each group and compare the results from human computation techniques to machine-based algorithms.
Conference Paper Proceedings of the IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT'12, Macau, China), December, 2012, pp 390-394
In repeated multi-agent constant-sum games, each player's objective is to maximize control over a finite set of resources. We introduce Tens potter, an easy-to-use, publicly available game designed to allow human players to compete as agents against a machine algorithm. The algorithm learns play strategies from humans, reduces them to nine basic strategies, and uses this knowledge to build and adapt its collusion strategy. We use a tournament format to test our algorithm against human players as well as against other established multi-agent algorithms taken from the literature. Through these tournament experiments, we demonstrate how learning techniques adapted using human computation - information obtained from both human and machine inputs - can contribute to the development of an algorithm able to defeat two well-established multi-agent machine algorithms in tournament play.
Conference Paper Proceedings of the 7th Nordic Conference on Human-Computer Interaction (NordiCHI'12, Copenhagen, Denmark), October 2012, pp 554-557
In this paper, we introduce VisualizIR, a game where players identify relevant document terms that match predefined categories. VisualizIR evaluates players on accuracy, recall, and precision against an established gold standard, a pooled consensus of judgments made by other players, or a weighted combination of the two. The annotated document can then be viewed in any XML-compatible browser, allowing for quick identification of terms in the document related to each category. Here we describe some of the playability design tradeoffs made during the game's development, as well as our findings from two experiments conducted using VisualizIR output.
Conference Paper Proceedings of the 75th Annual Meeting of the American Society for Information Science and Technology (ASIS&T'12, Baltimore, Maryland, USA), October 2012, pp 1-10
Crowdsourcing and Games with a Purpose (GWAP) have each received considerable attention in recent years. These two human computation mechanisms assist with tasks that cannot be solved by computers alone. Despite this increased attention, much of this transformation has been limited to a few aspects of Information Retrieval (IR). In this paper, we examine these two mechanisms’ applicability to IR. Using an IR model, we apply criteria to determine the suitability of these crowdsourcing and GWAP mechanisms to each step of the model. Our analysis illustrates that these mechanisms can apply to several of these steps with good returns.
Book Chapter Security and Privacy in Social Networks, Springer, 2012 | ISBN-13: 978-1-461-44138-0
Crowdsourcing has received considerable attention for its ability to provide researchers and task requesters with an inexpensive, quick and easy method to complete repetitive tasks and utilize human intellect. Most reports have expressed the merits of crowdsourcing; however, little discussion has been reported on the potential to utilize the crowd to accomplish unethical tasks. In this chapter, we start with a survey on crowdsourcing ethics which illustrates the crowd's reluctance to perform unethical tasks. We then conduct an experiment with crowdsourcing workers to explore selected influential factors that might encourage them to knowingly violate the ethical norms of privacy.
Technical Paper Proceedings of the JDCL Technical Bulletin, Volume 8, Number 2, September 2012
Crowdsourcing and games with a purpose (GWAP) have each received considerable attention in recent years. These two human computation mechanisms aid humans in solving tasks that either cannot be solved or are difficult to solve using machines. Despite this increased attention, much of this transformation has been limited to a few areas of information retrieval (IR). In this paper, we examine these two mechanisms’ applicability to IR. Using a traditional, or “core”, IR model, we break the model into distinct steps, evaluate the literature, and define several essential criteria for the suitability of crowdsourcing and games for each step. After finding the most suitable steps using these criteria, we choose one step and design an experiment to empirically evaluate how these mechanisms can benefit IR tasks.
Conference Paper Proceedings of 2012 IEEE Fourth International Conference on Social Computing (SocialCom'12, Amsterdam, Netherlands), September, 2012, pp 904-909
Although a vast majority of crowdsourcing tasks are for ethical purposes, the anonymity and global reach of online labor markets also create a clearinghouse for unethical crowdsourcing tasks. Recent studies show a majority of students have engaged in academic dishonesty using the Internet, and a growing number find this behavior acceptable. We conduct a study to see if crowd workers will provide solutions to exams and homework assignments, and knowingly permit these solutions to be used for this purpose. For those who don't agree, we examine if additional financial incentives can entice them. Our findings indicate most crowd workers are willing to permit the use of their work; however, for those who are unwilling, additional financial incentives have little effect on altering their decision.
Conference Paper Proceedings of the 35th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR'12, Portland, Oregon, USA), August 2012, pp 871-880
Crowdsourcing is a market of steadily-growing importance upon which both academia and industry increasingly rely. However, this market appears to be inherently infested with a significant share of malicious workers who try to maximise their profits through cheating or sloppiness. This serves to undermine the very merits crowdsourcing has come to represent. Based on previous experience as well as psychological insights, we propose the use of a game in order to attract and retain a larger share of reliable workers to frequently-requested crowdsourcing tasks such as relevance assessments and clustering. In a large-scale comparative study conducted using recent TREC data, we investigate the performance of traditional HIT designs and a game-based alternative that is able to achieve high quality at significantly lower pay rates, facing fewer malicious submissions.
Conference Paper Proceedings of the 4th AAAI Human Computation Workshop (HCOMP'12, Toronto, Canada), July, 2012, pp 87-93
Websites that encourage consumers to research, rate, and review products online have become an increasingly important factor in purchase decisions. This increased importance has been accompanied by a growth in deceptive opinion spam - fraudulent reviews written with the intent to sound authentic and mislead consumers. In this study, we pool deceptive reviews solicited through crowdsourcing with actual reviews obtained from product review websites. We then explore several human- and machine-based assessment methods to spot deceptive opinion spam in our pooled review set. We find that the combination of human-based assessment methods with easily-obtained statistical information generated from the review text outperforms detection methods using human assessors alone.
Conference Paper Proceedings of the First International WWW Workshop on Crowdsourcing Web Search (CrowdSearch 2012, Lyon, France), April, 2012, pp 48-53
As the amount of user-generated content (UGC) on websites such as YouTube has experienced explosive growth, the demand for searching for relevant content has expanded at a similar pace. Unfortunately, the minimal production effort required and the decentralization of content make these searches problematic. In addition, most UGC search efforts rely on notoriously noisy user-supplied tags and comments. In this paper, we examine UGC search strategies on YouTube using video requests from several knowledge markets such as Yahoo! Answers. We compare crowdsourcing and student search efforts to YouTube’s own search interface and apply these strategies to different types of information needs, ranging from easy to difficult. We evaluate our findings using two different assessment methods and discuss how the relative time and financial costs of these three search strategies affect our results.
Conference Paper Proceedings of 2011 IEEE Third International Conference on Social Computing (SocialCom'11, Boston, MA, USA), October, 2011, pp 1314-1317
Several recent studies have examined the merits of crowdsourcing to aid in completing repetitive or complex tasks requiring human computation. In comparison, scant attention has been paid to the use of crowdsourcing for the purpose of meeting unethical objectives, which may or may not be known to the participants. In this paper, we explore the potential for crowdsourcing to be used to bypass commonly established ethical standards for personal or professional gain.
Conference Paper Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF'11, Amsterdam, Netherlands), September, 2011, pp 107-118
As video-sharing websites such as YouTube proliferate, the ability to rapidly translate video clips into multiple languages has become an essential component for enhancing their global reach and impact. Moreover, the ability to provide closed captioning in a variety of languages is paramount to reach a wider variety of viewers. We investigate the importance of visual context clues by comparing transcripts of multimedia clips (which allow transcriptionists to make use of visual context clues in their translations) with their corresponding written transcripts (which do not). Additionally, we contrast translations produced using crowdsourcing workers with those made by professional translators on cost and quality. Finally, we evaluate several genres of multimedia to examine the effects of visual context clues on each and demonstrate the results through heat maps.
Conference Paper Proceedings of the SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR'11, Beijing, China) (Demo/Poster)
Crowdsourcing has become a market of steadily growing importance on which both academia and industry rely increasingly heavily. However, this market appears to be inherently infested with a significant share of malicious workers who try to maximise their profits through cheating or sloppiness. This serves to undermine the very merits crowdsourcing has come to represent. Based on previous experience as well as psychological insights, we propose the use of a game in crowdsourcing scenarios in order to attract and retain a larger share of entertainment seekers to relevance assessment tasks.
Conference Paper Proceedings of the WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining (CSDM'11, Hong Kong, China)
Many human resource tasks, such as screening a large number of job candidates, are labor-intensive and rely on subjective evaluation, making them excellent candidates for crowdsourcing. We run several experiments in which workers on the Amazon Mechanical Turk platform conduct resume reviews. We then apply several incentive-based models and examine their effects. Next, we assess the accuracy measures of our incentive models against a gold standard and ascertain which incentives provide the best results. We find that some incentives actually encourage quality if the task is designed appropriately.
Book Chapter Current Challenges in Patent Information Retrieval. The Information Retrieval Series, Vol. 29, 1st Edition, 2011 | ISBN-13: 978-3-642-19230-2
Searching patent collections to determine whether a given patent application has related prior art patents is non-trivial and often requires extensive manpower. When time is constrained, an automatically generated, ranked list of prior art patents associated with a given patent application decreases search costs and improves search efficiency. One may view the discovery of this prior art patent set as a problem of finding patents ‘related’ to the patent application. To accomplish this, we examine whether semantic relations between patent classification codes can aid in the recognition of related prior art patents. We explore similarity measures for hierarchically ordered patent classes and subclasses for this purpose. Next, we examine various patent feature-weighting schemes to achieve the best similarities between our patent applications and related prior art patents. Finally, we provide a method and demonstrate that patent prior art searches can successfully be used as an aid in patent ranking.
Conference Paper Proceedings of the 3rd International Workshop on Patent Information Retrieval (PaIR'10, Toronto, Canada), pp 27-32
Patent classification systems are used to help scrutinize patent applications for possible violations of the novelty and non-obviousness/inventive steps of a patentability test. There are several different patent classification systems in use today, each with a different underlying philosophy and approach. We compare the two most widely-used patent classification systems -- the IPC and USPC -- and examine their ability to help re-rank patents based on similarity. We observed a significant improvement in MAP, Recall@100, and nDCG when using these systems to re-rank our retrieved document set, demonstrating their overall utility in patent searches.
Conference Paper Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC'10, Carnegie Mellon University, Pittsburgh, PA, USA), pp 414-419
News event modeling and tracking in the social web is the task of discovering which news events individuals in social communities are most interested in, how much discussion these events generate, and tracking these discussions over time. The task could provide informative summaries of what has happened in the real world, yield important knowledge of the most important events from the crowd's perspective, and reveal their temporal evolutionary trends. Latent Dirichlet Allocation (LDA) has been used intensively for modeling and tracking events (or topics) in text streams. However, the event models discovered by this bottom-up approach have limitations such as a lack of semantic correspondence to real world events. Besides, they do not scale well to large datasets. This paper proposes a novel latent Dirichlet framework for event modeling and tracking. Our approach takes into account ontological knowledge on events that exist in the real world to guide the modeling and tracking processes. Therefore, event models extracted from the social web by our approach are always meaningful and semantically match real world events. Practically, our approach requires only a single scan over the dataset to model and track events and hence scales well with dataset size.
Conference Paper Proceedings of the 2nd International Workshop on Patent Information Retrieval (PaIR'09, Hong Kong, China), pp 29-32 | doi: 10.1145/1651343.1651350 (Poster)
Searches on patents to determine prior art violations are often cumbersome and require extensive manpower to accomplish successfully. When time is constrained, an automatically generated list of candidate patents may decrease search costs and improve search efficiency. We examine whether semantic relations inferred from the pseudo-hierarchy of patent classifications can contribute to the recognition of related patents. We examine a similarity measure for hierarchically-ordered patent classes and subclasses and return a ranked list of candidate patents, using a similarity measure that has demonstrated its effectiveness when applied to WordNet ontologies. We then demonstrate that this ranked list of candidate patents allows us to better constrain the effort needed to examine for prior art violations on a target patent.
Conference Paper Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media Data Challenge Workshop (ICWSM-DCW'09, San Jose, California, USA), pp 24-32
Event tracking is the task of discovering temporal patterns of events from text streams. Existing approaches for event tracking have two limitations: scalability and their inability to rule out non-relevant portions within texts in the stream ‘relevant’ to the event of interest. In this study, we propose a novel approach to tackle these limitations. To demonstrate our approach, we track news events across a collection of weblogs spanning a two-month time period. In particular, we track variations in the intensity of discussion on a given event over time. We demonstrate that our model is capable of tracking both events and sub-events at a finer granularity. We also present in this paper our analysis of the blog dataset distributed by the conference organizers.