My research, broadly defined, examines how quality data can be obtained from a variety of sources, turned into meaningful information, and used to convey knowledge to the public for decision making. Much of it focuses on solving real-world problems and builds on my previous industry experience working on data-intensive projects in fields ranging from entertainment to government to defense.
First, I evaluate how meaningful data can be obtained from different inputs while satisfying three often-competing constraints: high quality, high speed, and low cost. The sources of this data are often humans (via crowdsourcing) or mechanical sensors.
Once data is obtained, it is processed and turned into information that can be used in decision making. Frequently, the amount of data to evaluate is substantial, and in future work I will examine the tradeoff between collecting more data and making better decisions. Another research challenge is assuring data privacy in collections where the data is not fully understood.
Conveying this information as usable knowledge to decision makers is the final stage. One key aspect is understanding the user's constraints, and how she needs to use this knowledge to make decisions. This can be done using interactive visualizations, business intelligence and data analytics, as well as techniques in machine learning and data mining, depending on the data and the need.
Crowdsourcing, when tasks are designed appropriately, can get work done quickly, cheaply, and with impressive quality. The challenge for researchers is designing tasks that hit the sweet spot between cost, time, and accuracy/quality. An important ingredient is the type of incentive offered, which may be intrinsic (taking part for the fun or challenge entailed), extrinsic (taking part for an external reward, often financial), or a combination of the two.
There are a number of ways to implement incentives across different workflow contexts. For example, gamification is one type of intrinsic incentive that works well in certain contexts, such as learning, and for tedious tasks such as image labeling. In other situations, however, intrinsic motivators do not work as well as extrinsic ones. The challenge is to match the right incentive to the process and to the type of task.
Machine algorithms are good at performing calculations quickly and accurately. Humans are good at seeing the bigger picture and applying real-world knowledge, but they cannot operate at the scale of machines. The ideal hybrid human-machine design would let each agent focus on what it does best: machines on speed and accuracy, and humans on complex tasks that require decision-making computers cannot (yet) perform.
One challenge to this model is that machines need to know when they don't know the answer, or when they would perform a task more poorly than humans. Because human decisions are stochastic, nuanced, occasionally irrational, and far more variable, machines cannot make these determinations in the same manner. This calls for machine learning methods and other artificial intelligence and data science techniques that attempt to mimic human decision making. The more machines understand about the humans involved in a task, the more confidence we can have in their human-like decisions. But which features matter most in which context?
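One common lever for the cost/quality tradeoff (a general illustration, not a method specific to this research) is redundancy: assign the same task to k workers and take a majority vote. The sketch below assumes workers answer a binary question independently, each correct with the same probability p, and are paid a flat price per answer; the values of `p` and `price` are hypothetical.

```python
from math import comb


def majority_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent workers,
    each correct with probability p, gives the right binary answer."""
    if k % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    need = k // 2 + 1  # smallest strict majority
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


def cost(k: int, price_per_answer: float) -> float:
    """Total payment when each of k workers is paid the same flat price."""
    return k * price_per_answer


if __name__ == "__main__":
    p = 0.7       # hypothetical per-worker accuracy
    price = 0.05  # hypothetical payment per answer, in dollars
    for k in (1, 3, 5, 7, 9):
        print(f"k={k}: accuracy={majority_accuracy(p, k):.3f}, cost=${cost(k, price):.2f}")
```

With p = 0.7, going from one worker to three raises accuracy from 0.700 to 0.784, and five workers reach about 0.837: cost grows linearly with k while each extra pair of workers buys less accuracy. Real crowds violate the independence and equal-accuracy assumptions, which is part of what makes task design hard.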
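A minimal sketch of one simple way a machine can "know when it doesn't know": route items whose model confidence falls below a threshold to human workers and keep the rest. The `Prediction` record, the `route` function, and the 0.9 threshold are hypothetical; they are not a specific system from this research.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's estimated probability that `label` is correct


def route(
    predictions: list[Prediction], threshold: float = 0.9
) -> tuple[list[Prediction], list[Prediction]]:
    """Keep confident predictions for the machine; defer the rest to humans."""
    machine, human = [], []
    for pred in predictions:
        if pred.confidence >= threshold:
            machine.append(pred)  # confident enough to answer without a human
        else:
            human.append(pred)    # machine "knows it doesn't know": send to the crowd
    return machine, human


if __name__ == "__main__":
    preds = [
        Prediction("img-1", "cat", 0.97),
        Prediction("img-2", "dog", 0.55),
        Prediction("img-3", "cat", 0.91),
    ]
    auto, deferred = route(preds, threshold=0.9)
    print([p.item_id for p in auto])      # ['img-1', 'img-3']
    print([p.item_id for p in deferred])  # ['img-2']
```

This rule is only as good as the confidence scores behind it: if the model is poorly calibrated, it will keep items it gets wrong. The threshold itself trades the machine's error rate against the cost and latency of human review.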
In some regions of the world, the infrastructure is advanced enough that citizens can expect a fast, "always on" Internet connection, a browser-enabled smartphone, and an interface that adequately serves their information needs. In many other regions, however, these expectations are far from reality. Smartphones are beyond the financial reach of many citizens, so few own them; the mobile phones in use are often a decade old and operated in harsh environments; mobile and Internet connections are unreliable; and a sizeable share of users are illiterate or unable to use basic phone functions.
The processes and incentives that work in the developed world do not translate well to these conditions. When systems are unreliable or frustrating to use because of poorly designed infrastructure, their utility drops and the processes built on them become less efficient; in other words, people abandon them. From an economic standpoint, this leaves both the user and the provider worse off. We examine ways in which systems can work with limited infrastructure, using interfaces designed with users in the developing world in mind, to increase adoption, efficiency, and satisfaction.
From translating rare novels to gathering information on the best route from location A to location B, the crowd is indispensable for gathering this data quickly. However, making only the aggregated information public, rather than each crowd member's individual contribution, requires privacy-preserving techniques.
Translating literature between two languages, for example, requires a consistent tone and voice. This may be easier to achieve with a single translator; however, the single-translator method forgoes the advantages a divide-and-conquer approach would provide: speed, low cost, and a way to address concerns about intellectual property piracy. In this study, we examine the building blocks of privacy-preserving crowdwork mechanisms so that each worker learns only the aggregated information.
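One standard building block for revealing only an aggregate is additive secret sharing. This is a swapped-in illustration and not necessarily the mechanism used in this study. In the minimal sketch below, each worker splits a numeric contribution (for example, a hypothetical travel-time estimate for one route segment) into random shares modulo a large prime, and each share goes to a different aggregator. Each aggregator sums only the shares it receives, and combining those partial sums reveals the total but no individual value. The sketch assumes the aggregators do not collude and that the true sum stays below the modulus. All parties are simulated in one process here.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime modulus; the true total must stay below it


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME  # forces the shares to sum to `value`
    return shares + [last]


def aggregate(contributions: list[int], n_parties: int = 3) -> int:
    """Sum private contributions; each aggregator only sees random-looking shares."""
    partial = [0] * n_parties  # partial[j]: running sum held by aggregator j
    for value in contributions:
        for j, s in enumerate(share(value, n_parties)):
            partial[j] = (partial[j] + s) % PRIME
    # Combining the aggregators' partial sums reveals only the total.
    return sum(partial) % PRIME


if __name__ == "__main__":
    travel_minutes = [12, 15, 9]  # hypothetical private estimates from three workers
    total = aggregate(travel_minutes)
    print(total, total / len(travel_minutes))  # 36 12.0
```

Any single aggregator's view is uniformly random, so it learns nothing about an individual worker's estimate. The published average is still computed exactly. Protecting against colluding aggregators, or against inferences drawn from the aggregate itself, would require additional techniques.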